Tim Frenzel

// Insight

Agentic RAG: when retrieval learns to loop

RAG, agents, retrieval

Standard retrieval-augmented generation runs once: fetch some passages, generate an answer, done. Agentic RAG: A Survey maps the next step, where the model stops being a pipeline and starts behaving like an agent. The shift is from one shot to a loop: the model plans what to retrieve, fetches it, critiques its own draft, and goes back for more when the evidence is thin. For a research assistant, that loop is the difference between a confident wrong answer and a corrected one.

The limitation of standard RAG is structural. It is a fixed left-to-right flow that cannot recover from its own mistakes. If the first retrieval pulls the wrong passage, the model answers from the wrong passage, fluently. The field-guide stack of contextual chunking, finance-tuned embeddings, and reranking makes that first retrieval far better. It is still one shot. When a question needs two hops, or when the first results reveal what you actually should have asked for, a single pass has no move left to make.

Agentic RAG: a loop, not a single shot
[Diagram: Query → Plan: decide what to retrieve → Retrieve: choose the source or tool → Draft answer → Reflect: is the evidence enough? → Final answer. Loop from Reflect: Refine the query and retrieve again.]
Standard RAG runs left to right once. Agentic RAG adds the reflect-and-iterate step. A bad first retrieval is caught and corrected instead of producing a confident wrong answer.

The four patterns

The survey organizes the field around four agentic patterns, worth knowing because they are the building blocks you assemble rather than a single monolithic design.

The four agentic patterns the survey names
[Diagram: Agentic RAG branches into four patterns. Planning: break the task into retrieval steps. Tool use: pick the right source or API. Reflection: critique and revise the answer. Multi-agent: split roles across agents.]
The taxonomy sorts real systems by how many of these they combine, how they are controlled, and how much autonomy each agent holds. Reflection is the one that most separates an agentic system from a pipeline.

Planning breaks a complex question into a sequence of retrieval steps instead of one query. Tool use lets the agent choose where to look, a vector store, a SQL database, a live API, rather than being wired to a single source. Reflection has the agent judge its own answer and the evidence behind it, then revise. Multi-agent collaboration splits the work across specialized agents, one to retrieve, one to synthesize, one to check. The survey’s contribution is a principled taxonomy that sorts real systems along a few axes: how many agents are involved, how control flows between them, how much autonomy each is given, and how knowledge is represented. That framing is more useful than a list of named systems, because it lets you place a new design and reason about its trade-offs.
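To make planning and tool use concrete, here is a minimal sketch of a plan being dispatched to different sources. Everything in it is an assumption for illustration: `Step`, `TOOLS`, `plan`, and `run_plan` are hypothetical names, the planner is a hard-coded stand-in for a model call, and the two tools return placeholder strings rather than querying anything real.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str   # which source this step should query
    query: str  # what to ask that source


# Hypothetical tools: each stands in for a real vector store, database, or API.
def search_filings(query: str) -> str:
    return f"[filings] passages for: {query}"


def query_holdings(query: str) -> str:
    return f"[holdings] rows for: {query}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "filings": search_filings,
    "holdings": query_holdings,
}


def plan(question: str) -> List[Step]:
    # Stand-in for an LLM planner: a fixed two-hop plan instead of one query.
    return [
        Step("holdings", f"positions relevant to: {question}"),
        Step("filings", f"supplier disclosures relevant to: {question}"),
    ]


def run_plan(question: str) -> List[str]:
    evidence = []
    for step in plan(question):
        tool = TOOLS[step.tool]  # tool use: dispatch to the chosen source
        evidence.append(tool(step.query))
    return evidence


evidence = run_plan("supplier warning exposure")
for item in evidence:
    print(item)
```

The point of the split is that the plan is data: it can be logged, inspected, and capped before any source is touched.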

What does it look like as a research assistant?

This is where the pattern earns its place on a desk. A useful research assistant does not answer from the first thing it finds. It decides which filing, dataset, or note to pull, reads what comes back, notices a gap, and goes after the missing piece. Asked which names in a book are exposed to a supplier warning, it plans the sub-questions, queries the holdings, cross-references the filings, checks its own list against the retrieved evidence, and only then commits to an answer. That is agentic RAG. It is much closer to how an analyst actually works than a single similarity search. It maps onto the same profile-memory-reflection structure the trading-agent survey described from the other direction.

The value is not autonomy for its own sake. It is recoverability. A one-shot system that retrieves the wrong quarter is simply wrong. An agentic system that retrieves the wrong quarter can notice the mismatch between the question and the evidence, and fix it before answering. On financial documents, where the near-identical distractor is the dominant failure mode, that self-correction is worth a great deal. The same loop handles the questions a one-shot system mangles: a year-over-year change that needs two filings, a definition that must be resolved before the number means anything, a claim that should be checked against a second source before it is repeated.

The cost: reliability and compute

The loop is not free, and the costs are exactly the ones a quant should weigh before building one. Every iteration adds model calls, retrieval, latency, and spend, which matters at scale. Worse, every added step is another place to fail. A chain of probabilistic steps that all have to go right is less reliable than any single step. The same compounding shows up across repeated runs, which is the pass^k problem the agent-reliability work keeps surfacing: require the same task to succeed on all eight of eight runs and the success rate falls sharply. A reflection loop with no stopping rule introduces its own failure, where the system second-guesses a correct answer into a wrong one, or simply spins.
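The compounding is easy to see with a little arithmetic. The sketch below assumes steps and runs fail independently, which real systems only approximate, and the 95% and 90% figures are made-up inputs, not measurements. The function names are hypothetical, and pass^k is taken here to mean the probability that all k runs of a task succeed.

```python
def chain_success(step_success: float, n_steps: int) -> float:
    """Probability that n independent steps all succeed."""
    return step_success ** n_steps


def pass_hat_k(task_success: float, k: int) -> float:
    """Probability that k independent runs of the same task all succeed."""
    return task_success ** k


# Five steps at an assumed 95% each: the chain is well below any single step.
print(round(chain_success(0.95, 5), 3))  # 0.774

# A task that succeeds 90% of the time, required to succeed on all 8 runs.
print(round(pass_hat_k(0.90, 8), 3))     # 0.43
```

Under these assumptions, a system that looks fine on a single run is right on all eight runs less than half the time.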

The survey is candid that evaluation, coordination, memory, and governance are open problems. That is the honest way of saying these systems are powerful and not yet easy to make reliable. For a regulated desk, the governance gap is the one to sit with: a system that decides for itself what to retrieve and when to stop is harder to audit than a fixed pipeline. The controls have to be designed in rather than bolted on.

How I would build it

Incrementally, starting from a solid one-shot stack. Get retrieval right first, with the field-guide moves, because an agentic loop on top of bad retrieval just iterates over garbage more expensively. Then add the cheapest high-value pattern, reflection: have the system check whether its evidence actually supports its answer, and retrieve again if not, with a hard cap on iterations so it cannot spin. Add planning when questions genuinely need multiple hops, rather than by default. Reach for multiple agents last, when the work clearly splits into distinct roles, because every agent you add multiplies the reliability problem rather than dividing it.
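A minimal sketch of that reflection step with a hard cap might look like the following. All names are hypothetical: `retrieve`, `draft`, `supports`, and `refine` are stand-ins for real retrieval, generation, an evidence check, and query rewriting, and the toy versions below only simulate the wrong-quarter case with string matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    answer: str
    iterations: int
    supported: bool  # False means the cap was hit before the check passed


def answer_with_reflection(
    question: str,
    retrieve: Callable[[str], List[str]],
    draft: Callable[[str, List[str]], str],
    supports: Callable[[str, List[str]], bool],
    refine: Callable[[str, List[str]], str],
    max_iterations: int = 3,
) -> Result:
    query = question
    answer = ""
    for i in range(1, max_iterations + 1):  # hard cap: the loop cannot spin
        evidence = retrieve(query)
        answer = draft(question, evidence)
        if supports(question, evidence):    # reflection: does evidence fit?
            return Result(answer, i, True)
        query = refine(question, evidence)  # otherwise go back for more
    return Result(answer, max_iterations, False)


# Toy stand-ins: the retriever returns the near-identical Q2 passage
# unless the query pins the period explicitly.
def retrieve(query: str) -> List[str]:
    if "period=Q3 2023" in query:
        return ["Q3 2023 revenue was 12"]
    return ["Q2 2023 revenue was 10"]


def draft(question: str, evidence: List[str]) -> str:
    return evidence[0]


def supports(question: str, evidence: List[str]) -> bool:
    return all("Q3 2023" in passage for passage in evidence)


def refine(question: str, evidence: List[str]) -> str:
    return question + " period=Q3 2023"


result = answer_with_reflection(
    "What was Q3 2023 revenue?", retrieve, draft, supports, refine
)
print(result)
```

In this toy run the first pull returns the wrong quarter, the check fails, and the second pull fixes it. When the cap is hit instead, the result comes back flagged as unsupported, so the caller can decline to answer rather than repeat a guess.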

The discipline is the same one that governs any agent system. Decompose only where the task has real seams, instrument every step so you can see where it failed, and put a stopping rule on every loop. The goal is a research assistant that loops when it helps, not one that loops for its own sake.

The bottom line

Agentic RAG is the right direction for a research assistant. The survey is a useful map of the territory. The core idea, adding a plan-retrieve-reflect-iterate loop on top of retrieval, fixes the one thing one-shot RAG cannot do: recover from a bad first pull. The patterns it names, planning, tool use, reflection, and multi-agent collaboration, are the parts you compose. The catch is reliability and cost, both of which grow with every loop and every agent. Build from a solid retrieval base, add reflection first, cap the iterations, and add agents only where the work truly splits. Done that way, the loop is what turns a search box into an analyst.

Agentic RAG adds the move standard RAG lacks: a loop to catch and fix a bad retrieval. Build it on solid one-shot retrieval, add reflection first, cap the iterations, and add agents only where the work truly splits.
