Tim Frenzel

// Insight

Kimi k1.5: the second proof that RL makes reasoning

Tags: reasoning, reinforcement-learning, long-context

DeepSeek-R1 was not the only reasoning model to land in January 2025. Kimi k1.5, from Moonshot AI, reached o1-level reasoning through reinforcement learning at the same time, by a different route than R1. Two independent teams landing RL-driven reasoning in the same month is the real signal: this is a method, not a one-lab fluke.

Kimi’s recipe is notable for what it leaves out. No Monte Carlo tree search, no separate value function, no process reward model, none of the elaborate machinery that reasoning systems were assumed to require. Instead it scales reinforcement learning with long context, letting the model think across long chains while rewarding correct answers. The simplicity is the point. When two labs reach the same capability with stripped-down RL, the lesson is that the capability comes from the reinforcement learning itself rather than the scaffolding around it.
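To make "rewarding correct answers" concrete, here is a minimal sketch of an outcome-only reward. It is an illustration of the idea, not Kimi's actual reward code: the `Answer:` marker, the exact-match check, and both function names are assumptions made for the example. The point to notice is that nothing inside the reasoning chain gets scored, only the final answer.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is a hypothetical output format for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None


def outcome_reward(response: str, reference: str) -> float:
    """Score only the final answer: 1.0 if it matches the reference, else 0.0.

    The reasoning steps before the answer are never inspected, so there is
    no process reward model and no value function in this picture.
    """
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0
```

A long, winding chain and a short one earn the same reward if they end on the same correct answer, which is why the model is free to think for as long as the context allows.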

Kimi k1.5: long chain-of-thought vs short, AIME 2024 (%)
Long-CoT: 77.5
Short-CoT: 60.8

The long2short trick

Kimi adds one idea worth a quant’s attention: long2short. A model that reasons in long chains is accurate but slow and expensive to run. Kimi uses the long-chain model to train a short-chain one, transferring much of the reasoning quality into a model that answers more briefly. On AIME 2024 the long-chain version scores 77.5%, matching o1; the short-chain version scores 60.8%. The 16.7-point gap is the price of compression. The short version still beats the strong non-reasoning models by a wide margin, which is the payoff: you can buy back most of the speed without giving up all the reasoning.
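One simple way a long-chain model could supply training data for a short-chain one is to sample several answers per question and keep the shortest one that is still correct. The sketch below shows that selection step only. It is an assumption about how such a transfer might be set up, not a description of Kimi's pipeline, and the `Sample` type and `shortest_correct` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One sampled response from the long-chain model (hypothetical type)."""
    text: str
    answer: str
    num_tokens: int


def shortest_correct(samples: List[Sample], reference: str) -> Optional[Sample]:
    """Keep the shortest sample whose final answer matches the reference.

    Returns None when no sample is correct, so the question contributes
    nothing to the short-chain training set.
    """
    correct = [s for s in samples if s.answer == reference]
    if not correct:
        return None
    return min(correct, key=lambda s: s.num_tokens)
```

The samples that survive this filter are brief and right, which is the behavior the short-chain model is meant to learn.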

For a desk, that is a usable lever rather than a curiosity. When you put a reasoning model into a pipeline, chain length is a cost dial. Keep the long, expensive reasoning for the genuinely hard cases, and run a distilled short-chain version on the routine ones. It is the same triage logic that governs any expensive resource, applied to how long the model is allowed to think.
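The triage can be sketched as a small router. Everything here is illustrative: the model labels, the token budgets, the 0.7 threshold, and the idea that a difficulty score between 0 and 1 is available upstream are all assumptions for the example, not figures from Kimi.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Route:
    """Where a task goes and how long the model may think (hypothetical)."""
    model: str
    max_thinking_tokens: int


# Illustrative budgets only; real values depend on the models and the workload.
LONG = Route(model="long-cot", max_thinking_tokens=32_000)
SHORT = Route(model="short-cot", max_thinking_tokens=2_000)


def route(difficulty: float, threshold: float = 0.7) -> Route:
    """Send hard tasks to the long-chain model and the rest to the short one."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return LONG if difficulty >= threshold else SHORT


def thinking_budget(difficulties: Iterable[float], threshold: float = 0.7) -> int:
    """Worst-case thinking tokens for a batch under the routing rule."""
    return sum(route(d, threshold).max_thinking_tokens for d in difficulties)
```

Moving the threshold is the cost dial: raise it and more of the batch runs on the cheap short chain, lower it and more of the batch gets the expensive long one.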

One difference from R1 is worth noting for a desk that reads charts and tables. Kimi k1.5 is multimodal. It reasons over images as well as text, and posts 74.9 on MathVista, a visual-math benchmark. For financial work that is not incidental. A great deal of the information in filings and research lives in tables, charts, and exhibits. A reasoning model that can look at them, rather than only read a text extraction, is closer to what the job actually needs. R1’s reasoning is text-first. Kimi is a reminder that the recipe extends to the visual material finance is full of.

What the convergence tells a desk

There is a planning lesson in the timing. When a capability appears from one lab, a cautious desk waits to see whether it holds. When two labs reproduce it independently within a month, and one of them ships the weights openly, the capability has crossed from rumor to infrastructure. You can put reasoning agents on a research roadmap and expect the tools to be there, improving and getting cheaper, rather than betting on a single vendor’s continued goodwill. For a function that has to justify its dependencies to a risk committee, reproducibility across independent teams is exactly the evidence that de-risks the bet.

The caveat is the familiar one. These are competition-math and coding scores, the friendliest terrain for reasoning models. They say less about messy domain work than the headline numbers suggest. Kimi matters less as a model to deploy today than as confirmation that the reasoning recipe is real, general, and here to build on. For a desk planning a research-agent roadmap, that confirmation is worth more than any single benchmark number, because it says the capability will keep arriving from several directions rather than hinging on one lab.

Kimi k1.5 is the second independent proof that reinforcement learning alone produces reasoning, no elaborate search machinery required. Its long2short trick lets a desk trade chain length against cost on purpose.
