
GraphRAG: the retrieval that answers the question flat RAG cannot

September 23, 2024 · 6 min read

Tags: RAG, knowledge-graph, open-source

Ask a vector-RAG system a local question and it does fine. Ask it a global one, the kind that spans a whole corpus, and it falls apart. GraphRAG, now open on GitHub, is Microsoft’s answer to the second kind. In Microsoft’s reported evaluation, it wins against naive RAG roughly 70 to 80% of the time on comprehensiveness and diversity for whole-corpus questions. For a research desk, that is the gap between a document lookup and a tool that actually reasons across your filings.

The distinction matters because the two questions are different problems. “What did this company say about its supply chain in the last 10-K” is a local question: the answer sits in a passage, and vector search finds it. “What are the recurring supply-chain risks across my entire coverage universe” is a global question. No single passage holds the answer. It has to be assembled from many documents. That assembly is exactly what flat retrieval cannot do.

How it works

GraphRAG builds structure before it answers. An LLM reads the corpus and extracts a knowledge graph: entities as nodes, relationships as edges. The system then finds communities, clusters of densely connected nodes, and summarizes each one. Those summaries are hierarchical, from the highest-level themes of the dataset down to granular sub-topics.
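To make the structure concrete, here is a minimal sketch of the first two steps, using only the standard library. The triples are invented stand-ins for what an LLM extraction pass might return, and connected components are a deliberately crude substitute for real community detection, which would also split dense clusters inside one component and build a hierarchy. None of the names below come from the GraphRAG codebase.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, standing in for LLM extraction output.
triples = [
    ("AcmeCorp", "buys_from", "ChipFab"),
    ("BetaInc", "buys_from", "ChipFab"),
    ("GammaBank", "lends_to", "DeltaShipping"),
]


def build_graph(triples):
    """Entities become nodes; each relationship becomes an undirected edge."""
    adjacency = defaultdict(set)
    for subject, _relation, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def find_communities(adjacency):
    """Group nodes into connected components (a simplification of community detection)."""
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


graph = build_graph(triples)
communities = find_communities(graph)
print(communities)
```

Each community would then be handed to an LLM for a summary, and it is those summaries, not the raw passages, that global queries run against.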

[Figure: GraphRAG: structure first, then answer. A global query is answered by mapping it across community summaries and reducing the partial answers into one, rather than retrieving a handful of passages.]

When a global query arrives, GraphRAG does not retrieve passages. It maps the question across the relevant community summaries, generates a partial answer from each, and reduces those into a final response. The graph is what lets it see connections that live across documents rather than inside one.
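The map-and-reduce shape is easier to see in code. This is a toy sketch, not GraphRAG's implementation: the summaries are made up, and both `map_step` and `reduce_step` are hypothetical stand-ins for LLM calls, replaced here with keyword overlap so the example runs on its own.

```python
# Hypothetical community summaries; in a real system an LLM writes these.
community_summaries = {
    "semiconductors": "Several names depend on one foundry for advanced chips.",
    "shipping": "Port congestion raises freight costs for importers.",
    "banking": "Regional lenders report stable deposit flows.",
}


def keywords(text):
    """Crude keyword filter: lowercase words longer than four characters."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return {word for word in cleaned.split() if len(word) > 4}


def map_step(question_terms, summary):
    """Stand-in for an LLM call: score one summary and return a partial answer, or None."""
    overlap = len(keywords(summary) & question_terms)
    return (overlap, summary) if overlap else None


def reduce_step(partials):
    """Stand-in for the final LLM call: order partial answers by score and join them."""
    ranked = sorted(partials, key=lambda partial: partial[0], reverse=True)
    return " ".join(text for _score, text in ranked)


def answer_global(question, summaries):
    terms = keywords(question)
    candidates = (map_step(terms, summary) for summary in summaries.values())
    return reduce_step([partial for partial in candidates if partial])


result = answer_global("Which supply risks involve chips or freight?", community_summaries)
print(result)
```

The point of the shape is that every community gets a chance to contribute, and irrelevant ones drop out at the map step rather than never being looked at.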

Why a quant should care

This is the retrieval pattern that matches how a research analyst actually thinks. The valuable questions are rarely about one filing. They are about relationships: which of my names share a supplier, which counterparties are linked through ownership, which companies are exposed to the same regulatory change. Those links are spread across documents by construction. A graph is the natural structure for them. Flat vector RAG, ranking passages by similarity, has no way to represent a chain of relationships at all.

The cost profile is better than it looks, too. Microsoft’s blog post reports that answering from the highest-level community summaries reaches competitive quality at roughly 2 to 3% of the token cost of summarizing the underlying source text directly. You build the graph once, then query the compact summaries many times. For a corpus you interrogate repeatedly, that amortization is the difference between a research tool you can afford to run and one you cannot.

What this looks like on a desk

Make it concrete. Suppose your coverage universe is two hundred names and you want to know which of them are exposed to a single supplier that just issued a warning. A vector search cannot answer that. There is no passage anywhere that lists the answer, because the answer is a pattern spread across two hundred separate filings, each mentioning the supplier in its own language. GraphRAG can answer it. The supplier is a node. Every company that names it connects to that node. The query becomes a walk over the graph rather than a search for a passage.
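A sketch of that walk, under heavy assumptions: the company names, the `supplied_by` relation, and the alias table are all invented for illustration, and the hard part (getting an LLM to extract and reconcile the mentions) is assumed to have happened already.

```python
from collections import defaultdict

# Hypothetical alias table: maps surface forms found in filings to one canonical node.
ALIASES = {"Chip Fab Ltd.": "ChipFab", "ChipFab Inc": "ChipFab"}

# Hypothetical extracted mentions: (company, relation, supplier as written in the filing).
mentions = [
    ("AcmeCorp", "supplied_by", "Chip Fab Ltd."),
    ("BetaInc", "supplied_by", "ChipFab Inc"),
    ("BetaInc", "supplied_by", "WaferCo"),
    ("GammaLtd", "supplied_by", "WaferCo"),
]


def build_supplier_index(mentions):
    """Attach every company to the canonical supplier node it names."""
    customers_of = defaultdict(set)
    for company, relation, supplier in mentions:
        if relation == "supplied_by":
            customers_of[ALIASES.get(supplier, supplier)].add(company)
    return customers_of


def exposed_to(index, supplier):
    """One hop over the graph: every company connected to the supplier node."""
    return sorted(index.get(supplier, set()))


index = build_supplier_index(mentions)
print(exposed_to(index, "ChipFab"))  # ['AcmeCorp', 'BetaInc']
```

Once the mentions resolve to one node, the query is a dictionary lookup. All the difficulty lives in the extraction and the alias resolution, which is where the skepticism further down applies.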

The same structure handles the questions a risk committee actually asks. Which of my positions share a common counterparty? Which names in the book are exposed to the same regulatory change? Which companies sit downstream of the same single point of failure in a supply chain? Each of these is a relationship that lives across documents, invisible to retrieval that ranks passages by similarity. The graph makes the relationship a first-class object you can query.

There is a hybrid worth building, because the two retrieval styles are complements. Keep vector RAG for the local, single-document questions, where it is cheap and accurate. Route the global, relational questions to GraphRAG. A simple classifier on the incoming question, or a router agent, can decide which path a query takes. The result is a system that answers both the passage-level question and the pattern-level one, each with the retrieval method suited to it.
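The simplest possible router is a handful of cue words. The sketch below is only that: the cue list is my own guess at what signals a global question, and a production router would more likely be a trained classifier or an LLM call. The return values `"graphrag"` and `"vector"` are placeholder labels, not real API names.

```python
import re

# Hypothetical cues suggesting a question spans the corpus rather than one document.
GLOBAL_CUES = re.compile(
    r"\b(across|entire|which of|shared|share a|recurring|universe)\b"
)


def route(question):
    """Send corpus-wide, relational questions to GraphRAG and the rest to vector RAG."""
    if GLOBAL_CUES.search(question.lower()):
        return "graphrag"
    return "vector"


local_q = "What did this company say about its supply chain in the last 10-K?"
global_q = "What are the recurring supply-chain risks across my entire coverage universe?"
print(route(local_q), route(global_q))
```

A keyword router will misroute some questions, so it is worth logging its decisions and reviewing the ones that produced poor answers.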

The build economics decide whether this is worth it for a given corpus. A research base of filings you query repeatedly amortizes the graph-construction cost over thousands of questions, which makes it cheap per query. The 2-to-3% token cost of answering from high-level community summaries, once the graph exists, is the number that makes repeated querying affordable. A corpus you touch once does not justify the build. The question to ask before committing is how many times you will query the graph against how often it changes.
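That question reduces to a break-even count. The sketch below works it through with invented token figures: only the ratio (a graph query at about 2.5% of the direct-summarization cost, inside the reported 2-to-3% range) comes from the discussion above, and the build cost and baseline are placeholders you would replace with your own measurements.

```python
def cost_per_query(build_tokens, query_tokens, queries_per_build):
    """Average tokens per query once the build is spread over the queries it serves."""
    return build_tokens / queries_per_build + query_tokens


def break_even_queries(build_tokens, graph_query_tokens, baseline_query_tokens):
    """Smallest number of queries per build at which the graph route is strictly cheaper."""
    saving = baseline_query_tokens - graph_query_tokens
    if saving <= 0:
        return None  # the graph never pays for itself
    return build_tokens // saving + 1


# All figures are hypothetical placeholders, in tokens.
BUILD = 50_000_000        # one-off graph construction and community summaries
BASELINE = 1_000_000      # summarizing source text directly, per query
GRAPH_QUERY = 25_000      # 2.5% of the baseline, per query

n = break_even_queries(BUILD, GRAPH_QUERY, BASELINE)
print(n, cost_per_query(BUILD, GRAPH_QUERY, n))
```

If the corpus changes often enough that you rebuild before reaching that count, the graph costs more than it saves.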

Where to keep your skepticism

The graph is only as good as the extraction. An LLM building the knowledge graph can miss an entity, invent a relationship, or merge two distinct companies with similar names, and every downstream answer inherits that error. The construction step is a model-generated artifact, which means it needs the same validation as any other model output: spot-check the entities and edges before you trust a conclusion drawn from them.

The build cost is real as well. Extracting a graph and summarizing every community across a large corpus is an up-front LLM expense. It has to be redone as the corpus grows. For a static research base that you query often, the math works. For a fast-moving corpus that changes daily, the rebuild cost is the thing to measure before you commit.

The validation has to scale, which is the unglamorous part. Spot-checking a few entities is fine for a demo. For a production graph over a coverage universe, you want a systematic sample of the extracted nodes and edges, checked against the source, with an error rate you track over time. Weight that sample toward the rare and the consequential rather than spreading it uniformly. The rare relationship is the high-value one. A common, well-documented link shows up in many places and survives a few extraction errors. An unusual cross-holding or an obscure supplier dependency appears once. If the extraction misses it, the graph is silent on exactly the question you most wanted answered.
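One way to weight the sample toward the rare is to make an edge's chance of review inversely proportional to how many times it was mentioned. The sketch below does that with a standard weighted-sampling-without-replacement trick (random keys raised to the inverse weight); the edge format, the `1 / mentions` weighting, and the example edges are all assumptions of mine, not anything GraphRAG provides.

```python
import random


def sample_for_review(edges, k, seed=0):
    """Pick k distinct edges, favoring rarely mentioned ones.

    edges: list of (source, relation, target, mention_count) tuples.
    Weight is 1 / mention_count; key = u ** (1 / weight) = u ** mention_count.
    """
    rng = random.Random(seed)
    keyed = [(rng.random() ** edge[3], edge) for edge in edges]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [edge for _key, edge in keyed[:k]]


def error_rate(verdicts):
    """Share of reviewed edges a human marked wrong (True means confirmed against source)."""
    return sum(not ok for ok in verdicts) / len(verdicts)


# Hypothetical extracted edges with mention counts.
edges = [
    ("AcmeCorp", "supplied_by", "ChipFab", 40),
    ("BetaInc", "supplied_by", "ChipFab", 25),
    ("GammaLtd", "cross_holding", "DeltaCo", 1),
    ("EpsilonPLC", "supplied_by", "ObscureParts", 1),
]

to_review = sample_for_review(edges, k=2)
print(to_review, error_rate([True, True, False, True]))
```

Track the error rate per build. A rising rate on the rare edges is the early warning that matters most.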

How I would use it

For the questions flat RAG genuinely cannot answer, and only those. Keep vector retrieval for local lookups, where it is cheaper and perfectly good. Reach for GraphRAG when the question is global and relational: cross-document risk, supply-chain exposure, ownership and counterparty links across a universe. Validate the extracted graph before you act on what it tells you. The contribution here is a retrieval structure that finally matches the shape of the hardest research questions, which is a real step, as long as you check the graph it built to get there. Used that way, GraphRAG is the missing half of a retrieval stack rather than a replacement for the half you already have.

Use vector RAG for the question inside one document and GraphRAG for the question across all of them: cross-document relationships are a graph problem, not a similarity-search one.
