// Insight

Can large language models trade? A simulated market, and a warning about correlation

May 5, 2025 · 8 min read

agent-based-models · market-simulation · systemic-risk · emergence

Lopez-Lira ran an experiment that sounds like a thought experiment and is not. He built a stock market and populated it entirely with large language models. No humans, no hand-coded trading rules, just LLM agents given roles and a shared order book, left to trade against one another. The paper asks the question in its title literally: can these models trade? The answer is yes. The more useful answer is that a market made only of LLMs reproduces the textbook stylized facts of real markets, including the ones we would rather it did not.

The setup

Agent-based markets are an old tool in finance. You populate a simulated exchange with simple programmed agents (a value trader here, a momentum chaser there, a market maker quoting both sides) and then watch what prices the interaction produces. The contribution here is replacing the simple rule-based agents with language models. Each agent is an LLM handed a mandate (value, momentum, or market-making) plus the visible state of the market, and asked to decide what to do. The orders clear against a persistent limit order book, the same mechanism a real continuous market runs on.

A market populated only by LLM agents

No human or hand-coded strategy sets the price. Each agent is an LLM acting on its assigned mandate and the visible market state. The loop is the one a real continuous market runs: decide, submit, clear, observe, repeat.
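To make the loop concrete, here is a minimal sketch of a limit order book with price-time priority and one round of decide, submit, clear. It is an illustration only: the `OrderBook` class, its `submit` method, and `run_round` are hypothetical names, the matching rule is an assumption, and the `decide` callables are fixed stubs standing in for the LLM agents. Nothing here is taken from the paper's actual engine.

```python
import heapq
from itertools import count


class OrderBook:
    """Toy limit order book for one asset (price-time priority assumed)."""

    def __init__(self):
        self.bids = []    # max-heap via negated price: (-price, seq, qty)
        self.asks = []    # min-heap: (price, seq, qty)
        self.trades = []  # executed (price, qty) pairs, oldest first
        self._seq = count()  # arrival counter, breaks ties by time

    def submit(self, side, price, qty):
        """Match an incoming limit order, then rest any unfilled remainder."""
        if side == "buy":
            # Lift resting asks priced at or below the buy limit.
            while qty > 0 and self.asks and self.asks[0][0] <= price:
                ask_price, seq, ask_qty = heapq.heappop(self.asks)
                fill = min(qty, ask_qty)
                self.trades.append((ask_price, fill))
                qty -= fill
                if ask_qty > fill:  # put back the unfilled part, same priority
                    heapq.heappush(self.asks, (ask_price, seq, ask_qty - fill))
            if qty > 0:
                heapq.heappush(self.bids, (-price, next(self._seq), qty))
        else:
            # Hit resting bids priced at or above the sell limit.
            while qty > 0 and self.bids and -self.bids[0][0] >= price:
                neg_bid, seq, bid_qty = heapq.heappop(self.bids)
                fill = min(qty, bid_qty)
                self.trades.append((-neg_bid, fill))
                qty -= fill
                if bid_qty > fill:
                    heapq.heappush(self.bids, (neg_bid, seq, bid_qty - fill))
            if qty > 0:
                heapq.heappush(self.asks, (price, next(self._seq), qty))


def run_round(book, agents):
    """One pass of decide -> submit -> clear; agents see book.trades next round."""
    for decide in agents:
        order = decide(book)  # stand-in for an LLM agent's decision
        if order is not None:
            book.submit(*order)


# Three stub agents with fixed orders: two sellers, then a buyer.
book = OrderBook()
agents = [
    lambda b: ("sell", 101, 5),
    lambda b: ("sell", 100, 5),
    lambda b: ("buy", 101, 7),
]
run_round(book, agents)
```

In this run the buyer takes the cheaper ask first and then part of the dearer one, leaving three units resting at 101. The book persists between rounds, which is what lets each agent's next decision depend on what the others just did.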

The first thing worth noting is that the agents behave coherently. A model told to trade on value buys when price falls below its estimate of worth and sells when it rises above. A model told to chase momentum does the opposite: it buys into a rising price and sells into a falling one. The market makers post two-sided quotes and earn the spread. The roles are not decorative. The agents follow them with enough consistency that the market organizes into the familiar ecology of a real one, with distinct participants pursuing distinct objectives.
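The difference between the two mandates is easiest to see as rules. The sketch below is a rule-based stand-in, not how the paper's agents work (those are LLMs reasoning from a prompt). The function names, the `lookback` parameter, and the price series are all hypothetical.

```python
def value_signal(price, fair_value):
    """Value mandate: buy below the estimate of worth, sell above it."""
    if price < fair_value:
        return "buy"
    if price > fair_value:
        return "sell"
    return "hold"


def momentum_signal(prices, lookback=3):
    """Momentum mandate: buy after a rise over the lookback window, sell after a fall."""
    if len(prices) <= lookback:
        return "hold"  # not enough history to measure a trend
    change = prices[-1] - prices[-1 - lookback]
    if change > 0:
        return "buy"
    if change < 0:
        return "sell"
    return "hold"


# A falling price that has dropped below an assumed fair value of 100.
prices = [100, 99, 97, 95]
value_view = value_signal(prices[-1], fair_value=100)
momentum_view = momentum_signal(prices)
```

On the same falling price the value rule wants to buy and the momentum rule wants to sell. That disagreement is what gives each side a counterparty.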

What emerged

The reason the paper matters is what the interaction produces without anyone scripting it. The simulated market reproduces several of the stylized facts that define real markets. Prices discover value: when an agent has information others lack, trading moves the price toward the value that information implies, the mechanism that makes markets informative in the first place. Markets underreact to news, adjusting toward a new fair value gradually rather than instantly, a pattern documented for decades in real equities. Market makers provide liquidity strategically, widening quotes when the flow looks informed and tightening them when it looks benign.

And the market produces bubbles. Under the right conditions the agents bid prices well above any defensible estimate of value, ride the move, and let the price eventually break. Nobody coded a bubble. It emerged from agents reacting to each other’s behavior, the same feedback that produces bubbles among humans. That a market of language models manufactures the pathologies of a market of people is the result that should make a practitioner lean in.

This is the appeal of the agent-based approach, the reason it has survived as a research tool. You cannot rerun 2008 to see what a different policy would have done. You can rerun a simulation a thousand times under controlled conditions and watch which knobs produce which behavior. A simulation that reproduces real stylized facts is one whose knobs are worth studying, because the mapping from cause to outcome has some claim to realism.

The result a desk should sit with

The finding I keep returning to is not the bubbles. It is correlation. The agents in this market are language models, and when they share a base model and a similar prompt, they reason along similar paths. Given the same news, they tend to reach the same decision. Their orders line up. They buy together and sell together. The paper treats this as a tunable property of the simulation, and the fact that it can be tuned is the warning.

Why shared models become correlated traders

The simulation makes correlation a knob you can turn. In the real market, the knob is set by how many desks run the same model on the same kind of prompt. Diversity of strategy is what absorbs shocks. A monoculture removes it.

Map that onto where the industry is heading. A handful of base models underpin most of what desks are building. The prompts that turn a base model into a trading or research assistant are converging on similar shapes, because the good patterns get copied. If many desks deploy models built on the same weights and similar prompts, the simulation says their decisions will correlate, and correlated decisions are exactly how liquidity evaporates in a crowd. A market is stable partly because its participants disagree. Disagreement is what puts a buyer on the other side of your sell. A monoculture of models that reason alike removes the disagreement, and removes it precisely in the moments of stress when everyone is reading the same headline the same way.
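The diversification argument can be made precise with a standard identity. It is offered here as a stylized illustration, not as a result from the paper. Assume N desks whose net order imbalances x_i share a variance σ² and a common pairwise correlation ρ. All three symbols are assumptions introduced for this sketch. The variance of the average imbalance is then:

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)
  = \sigma^2\left(\rho + \frac{1-\rho}{N}\right)
  \;\xrightarrow{\;N\to\infty\;}\; \rho\,\sigma^2
```

With ρ near zero, adding desks averages the imbalance away. With ρ near one, a hundred desks net out like a single one. The floor ρσ² is the part of the crowd's order flow that no amount of headcount diversifies.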

The cost shows up in one place above all: the downside tail.

What correlation does to the loss tail

A stylized loss distribution for a market of traders, drawn to make the mechanism visible rather than to report a measured number. A diverse market keeps the left tail thin, because in a sell-off someone with a different model is usually willing to take the other side. As more desks run the same weights off the same prompts, their selling lines up and the shaded crash tail fattens. The center of the distribution, the ordinary day, barely moves. The tail is where the monoculture is paid for, and the tail is exactly what a risk system is built to survive.
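The same mechanism can be reproduced in a few lines. This is a toy Monte Carlo under stated assumptions, not the paper's model and not a calibrated estimate. Each desk sells on a given day when a Gaussian signal falls below a threshold, and the signal mixes a common component (the shared model's reading) with a desk-specific one through a correlation parameter `rho`. The function names, the one-factor structure, and every parameter value are hypothetical.

```python
import math
import random


def sell_fractions(rho, n_desks=50, n_days=4000, threshold=-1.0, seed=0):
    """Fraction of desks selling each day under a one-factor Gaussian signal."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_days):
        common = rng.gauss(0.0, 1.0)  # the reading every desk shares
        sellers = 0
        for _ in range(n_desks):
            own = rng.gauss(0.0, 1.0)  # the desk's independent view
            signal = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * own
            if signal < threshold:
                sellers += 1
        fractions.append(sellers / n_desks)
    return fractions


def summarize(fractions, q=0.99):
    """Return (mean, q-quantile) of the daily sell fraction."""
    ordered = sorted(fractions)
    mean = sum(ordered) / len(ordered)
    tail = ordered[int(q * (len(ordered) - 1))]
    return mean, tail


diverse_mean, diverse_tail = summarize(sell_fractions(rho=0.05))
mono_mean, mono_tail = summarize(sell_fractions(rho=0.80))
print(f"diverse:     mean={diverse_mean:.3f}  99th pct={diverse_tail:.3f}")
print(f"monoculture: mean={mono_mean:.3f}  99th pct={mono_tail:.3f}")
```

Each desk sells with the same marginal probability whatever the value of `rho`, so the average share of sellers is essentially unchanged between the two runs. What changes is the worst one percent of days, where under high `rho` most of the market is on the same side at once.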

This is not a new worry in kind. Quant funds running similar factor models contributed to the August 2007 quant quake, when crowded positions unwound into each other and a diversified-on-paper book turned out to be one trade held by many. The model monoculture is the same hazard with a new cause. What the simulation adds is a clean demonstration that the correlation follows from the shared reasoning itself rather than from any explicit coordination. The agents never collude. They simply think alike because they were built alike, and thinking alike is enough.

What I would take from it

I read this as a research instrument rather than a forecast. The practitioner’s job is to ask what it implies for how we deploy. Three things.

First, treat model and prompt diversity as a risk parameter, not an aesthetic preference. If a desk’s edge depends on seeing what others miss, running the same model as everyone else erodes that edge by construction, and worse, ties your behavior to theirs in a drawdown. There is a real argument for deliberately using a different base model, a different prompt, or a human override precisely to decorrelate from the crowd.

Second, stress-test for correlated-agent behavior the way you stress-test for correlated positions. The relevant scenario is not just a market shock. It is a market shock that every model-driven desk reads the same way at the same moment, thinning liquidity faster than any single book would predict. That tail is fatter in a world of shared models, and belongs in the scenario set.

Third, keep the warning proportionate. This is a simulation, and simulations are arguments rather than evidence. The agents are simpler than a real desk’s full stack, the market is stylized, and no live venue yet runs on pure LLM flow. The result does not prove a crash is coming. It demonstrates a mechanism, cleanly enough that ignoring it would be the mistake. The honest reading is that the correlation hazard is real, its magnitude is unknown, and hedging against it through diversity is cheap.

The paper’s title asks whether language models can trade. They can, well enough to rebuild a market’s virtues and its vices from scratch. The part worth carrying out of the lab is the reminder that a market full of identical reasoners is a fragile one. The industry is, quietly, building toward exactly that.

A market of LLM agents reproduces price discovery, underreaction, and bubbles with nobody scripting them. The lesson for a desk is the correlation: shared models off similar prompts reason alike, trade alike, and thin liquidity exactly when stress hits. Treat model diversity as a risk parameter.
