Anvit Blog

What Is Agentic RAG? A Plain-English Explainer

June 29, 2026 · 7 min read

If you have used a modern "chat with your documents" tool, you have used RAG — even if nobody told you what it was called.

RAG stands for Retrieval-Augmented Generation. It is the technique that lets an AI answer questions about documents it was not trained on. Agentic RAG is an evolution of that technique that produces noticeably better answers for complex questions.

This post explains both, without requiring any AI background.

Figure: Anvit's agentic RAG pipeline, from User Query through Query Router, Hybrid Retrieval, Relevance Evaluator, LLM Generation with Native Tools, and Self-Critique Loop to Final Response.

The problem RAG solves

Language models like Gemma 4 are trained on large amounts of text, but that training has a cutoff date and does not include your documents. You cannot ask a base language model "What does clause 12 of my contract say?" because it has never seen your contract.

The naive solution is to paste the entire document into the model's context. This works for short documents but fails for anything longer — models have context limits, and stuffing a large document into the prompt produces slow, expensive, and often inaccurate responses.

RAG solves this by retrieving only the relevant parts of your document before generation.

How basic RAG works

  1. Chunk: split the document into segments (paragraphs, sections)
  2. Embed: convert each chunk into a vector — a mathematical representation of its meaning
  3. Store: save all vectors in an index alongside the original text
  4. Query: when you ask a question, embed the question and find the chunks with the most similar vectors
  5. Generate: pass those chunks and your question to the language model and generate an answer
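The five steps above can be sketched in a few lines of Python. This is a toy illustration, not Anvit's implementation: the `embed` function is a bag-of-words stand-in for a real neural embedding model, every function name is made up for this example, and the final step only builds the prompt that would be handed to a language model.

```python
import math
import re
from collections import Counter

def chunk(document: str) -> list[str]:
    """Step 1: split on blank lines so each paragraph becomes a chunk."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def embed(text: str) -> Counter:
    """Step 2 (toy stand-in): a bag-of-words count vector.
    A real system would call a neural embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(document: str) -> list[tuple[Counter, str]]:
    """Step 3: store each vector alongside its original text."""
    return [(embed(c), c) for c in chunk(document)]

def retrieve(index: list[tuple[Counter, str]], question: str, k: int = 2) -> list[str]:
    """Step 4: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(chunks: list[str], question: str) -> str:
    """Step 5: assemble what would be passed to the language model."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

doc = (
    "Refund policy: refunds are issued within 30 days.\n\n"
    "Shipping takes five business days.\n\n"
    "Support is available by email."
)
index = build_index(doc)
top = retrieve(index, "What is the refund policy?", k=1)
print(build_prompt(top, "What is the refund policy?"))
```

Because the toy embedding only counts shared words, it shows the weakness described next even more sharply than a real embedding model would: a question phrased as "Can I get my money back?" shares no words with the refund chunk and would not find it.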

This works well for simple, direct questions. "What is the refund policy?" → retrieve the refund section → generate an answer. Clean.

The limitation is step 4. Vector similarity (dense retrieval) can miss chunks that are relevant but use different words from your question. It also struggles with complex questions that require pulling information from multiple parts of a document.

What agentic RAG adds

Agentic RAG treats the retrieval process as an active, multi-step reasoning task rather than a single lookup. The key additions are:

Query routing

Before retrieving anything, the system analyses your question and decides how to handle it. Simple factual questions go through a fast direct retrieval path. Complex, multi-part questions are routed to the full agentic pipeline.
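As a rough illustration, a router can be as simple as a heuristic that looks at question length and multi-part cue words. The cue list, the word threshold, and the `route` function below are assumptions made up for this sketch; a production router might well use the language model itself to classify the question.

```python
import re

# Hypothetical cues suggesting that a question has several parts.
MULTI_PART_CUES = re.compile(r"\b(and|then|compare|each|versus)\b", re.IGNORECASE)

def route(question: str, max_simple_words: int = 12) -> str:
    """Return "direct" for short single-part questions, otherwise "agentic"."""
    is_short = len(question.split()) <= max_simple_words
    has_cue = MULTI_PART_CUES.search(question) is not None
    return "direct" if is_short and not has_cue else "agentic"

print(route("What is the deadline?"))
print(route("Compare the risks in section 2 and the mitigations in section 4"))
```

The first call takes the fast path and the second is sent to the full pipeline.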

Query decomposition

For complex queries, the system decomposes your question into sub-queries and runs retrieval independently for each. If you ask "Summarise the risks in the executive summary and explain how the mitigation plan in section 4 addresses each one" — that is two separate retrieval tasks. A simple RAG system tries to answer with one retrieval pass and usually fails. An agentic system handles each part separately and synthesises the results.
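Here is a minimal sketch of that idea. Splitting on the word "and" is a deliberately crude stand-in for decomposition (a real system would more likely ask the language model to write the sub-queries), and `fake_retrieve` is a hypothetical retriever that exists only to make the example runnable.

```python
import re
from typing import Callable

def decompose(question: str) -> list[str]:
    """Toy decomposition: split the question on the conjunction 'and'."""
    parts = re.split(r"\s+and\s+", question.strip())
    return [p.strip() for p in parts if p.strip()]

def retrieve_for_all(
    question: str, retrieve: Callable[[str], list[str]]
) -> dict[str, list[str]]:
    """Run retrieval independently per sub-query, keeping results grouped."""
    return {sub: retrieve(sub) for sub in decompose(question)}

def fake_retrieve(sub_query: str) -> list[str]:
    # Hypothetical retriever: returns a label instead of real document text.
    return ["executive summary chunk"] if "risks" in sub_query else ["section 4 chunk"]

question = (
    "Summarise the risks in the executive summary and explain how "
    "the mitigation plan in section 4 addresses each one"
)
for sub_query, chunks in retrieve_for_all(question, fake_retrieve).items():
    print(sub_query, "->", chunks)
```

Keeping the results grouped by sub-query lets the generation step see which chunks were fetched for which part of the question before it synthesises a single answer.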

Hybrid search

Rather than relying only on vector similarity, retrieval combines two methods: lexical (keyword) matching and vector similarity.

Both produce ranked candidate lists. Reciprocal Rank Fusion (RRF) merges them into a single ranking. The weight between the two methods shifts dynamically — queries about tables or numbers lean more toward lexical matching, while conceptual questions lean more toward vector similarity.
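To make the fusion step concrete: standard RRF gives each chunk a score of 1 / (k + rank) in every list it appears in and sums those scores, with k commonly set to 60. The sketch below adds a per-method weight on top of that. The weighting scheme, the 0.7 and 0.3 values, and the `lexical_weight` heuristic are assumptions for illustration, not Anvit's actual tuning.

```python
import re

def lexical_weight(query: str) -> float:
    """Hypothetical heuristic: lean lexical when the query mentions numbers or tables."""
    return 0.7 if re.search(r"\d|\btable\b", query, re.IGNORECASE) else 0.3

def weighted_rrf(lexical: list[str], dense: list[str], w_lex: float, k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk ids with weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for weight, ranking in ((w_lex, lexical), (1.0 - w_lex, dense)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda c: scores[c], reverse=True)

lexical_hits = ["c3", "c1", "c2"]  # ranked by keyword match
dense_hits = ["c1", "c2", "c4"]    # ranked by vector similarity

for query in ("What is the total in table 2?", "What is the overall tone?"):
    print(query, weighted_rrf(lexical_hits, dense_hits, lexical_weight(query)))
```

Chunks that appear in both lists (c1 and c2) rise to the top either way. What changes with the weight is the tail: the table question ranks the lexical-only hit c3 above the dense-only hit c4, and the conceptual question reverses that order.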

Corrective RAG (CRAG)

Before generating an answer, the system evaluates whether the retrieved chunks actually contain useful information. Three outcomes are possible:

This corrective step significantly reduces cases where the model invents an answer because its retrieved context was incomplete.
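A relevance check of this kind can be sketched as a function that grades the retrieved chunks and returns one of three labels. Everything here is an assumption for illustration: the labels "sufficient", "partial" and "insufficient" are placeholders rather than Anvit's actual outcomes, and the word-coverage score stands in for what would normally be a model-based relevance judgement.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set used for the toy coverage score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def grade_context(
    question: str, chunks: list[str], high: float = 0.7, low: float = 0.3
) -> str:
    """Grade retrieved chunks by the share of question words they cover."""
    q = tokens(question)
    if not q or not chunks:
        return "insufficient"
    seen: set[str] = set()
    for c in chunks:
        seen |= tokens(c)
    coverage = len(q & seen) / len(q)
    if coverage >= high:
        return "sufficient"    # generate from this context
    if coverage >= low:
        return "partial"       # keep it, but retrieve more first
    return "insufficient"      # discard and retrieve again

print(grade_context("refund policy days", ["Refund policy: refunds are issued within 30 days."]))
print(grade_context("refund policy days", ["Shipping takes five business days."]))
print(grade_context("refund policy days", []))
```

The useful property is that generation is gated: the model is only asked to answer once the context has passed a check, instead of being handed whatever the first retrieval returned.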

Self-critique

After an answer is generated, the system checks whether it is actually sufficient. If it detects gaps — for example, the answer referenced a section that was not fully retrieved — it generates a targeted follow-up query, retrieves the missing context, and re-generates a more complete answer.
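The loop itself is short once the pieces exist. In this sketch the three callables (`generate`, `find_gap`, `retrieve`) are hypothetical hooks that a real system would back with the language model and the retriever, and the stubs at the bottom exist only so the example runs. The cap on rounds is also an assumption, added so the loop cannot run forever.

```python
from typing import Callable, Optional

def answer_with_critique(
    question: str,
    context: list[str],
    generate: Callable[[str, list[str]], str],
    find_gap: Callable[[str, list[str]], Optional[str]],
    retrieve: Callable[[str], list[str]],
    max_rounds: int = 2,
) -> str:
    """Generate, check for gaps, fetch missing context, and regenerate."""
    answer = generate(question, context)
    for _ in range(max_rounds):
        follow_up = find_gap(answer, context)
        if follow_up is None:  # the critique found nothing missing
            break
        context = context + retrieve(follow_up)
        answer = generate(question, context)
    return answer

# Stubs standing in for the model and the retriever.
def stub_generate(question: str, context: list[str]) -> str:
    return f"answer from {len(context)} chunks"

def stub_find_gap(answer: str, context: list[str]) -> Optional[str]:
    has_section_4 = any("section 4" in c for c in context)
    return None if has_section_4 else "section 4"

def stub_retrieve(query: str) -> list[str]:
    return [f"{query} text"]

result = answer_with_critique(
    "How are the risks mitigated?", ["summary text"],
    stub_generate, stub_find_gap, stub_retrieve,
)
print(result)
```

With these stubs the first answer is built from one chunk, the critique asks for section 4, and the second answer is built from two chunks, at which point the critique is satisfied and the loop stops.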

Native tool calling

During generation, the model can autonomously call retrieval tools mid-response. If Gemma 4 determines while writing its answer that it needs a specific section it does not have in context, it can call searchDocuments() or getDocumentSection() to fetch it on the fly. This allows the model to be genuinely self-directed rather than limited to the context it was initially given.
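On the application side, tool calling usually comes down to a registry that maps tool names to functions, plus a dispatcher that runs whatever call the model emits. The sketch below reuses the two tool names mentioned above, but their parameters, return types, the in-memory `SECTIONS` data, and the shape of the call dictionary are all assumptions for illustration, not Anvit's actual API.

```python
from typing import Any, Callable

# Hypothetical in-memory document, keyed by section number.
SECTIONS = {
    "1": "Executive summary of project risks.",
    "4": "Mitigation plan for supplier delays.",
}

def searchDocuments(query: str) -> list[str]:
    """Return every section containing at least one word of the query."""
    words = query.lower().split()
    return [text for text in SECTIONS.values() if any(w in text.lower() for w in words)]

def getDocumentSection(section_id: str) -> str:
    """Return one section by id, or an empty string if it does not exist."""
    return SECTIONS.get(section_id, "")

TOOLS: dict[str, Callable[..., Any]] = {
    "searchDocuments": searchDocuments,
    "getDocumentSection": getDocumentSection,
}

def run_tool_call(call: dict[str, Any]) -> Any:
    """Dispatch a tool call of the assumed form {"tool": name, "args": {...}}."""
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["args"])

# A call the model might emit mid-response when it needs section 4.
print(run_tool_call({"tool": "getDocumentSection", "args": {"section_id": "4"}}))
```

The result is appended to the model's context and generation continues, which is what lets the model fetch a missing section instead of guessing at its contents.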

How Anvit implements this on-device

Anvit runs a full agentic RAG pipeline entirely on your Android phone:

Everything runs locally. No query, no document content, and no intermediate reasoning step leaves your device.

When does it actually matter?

Simple factual questions ("What is the deadline?", "Who is the counterparty?") work fine with basic RAG. For these, the agentic pipeline runs fast and the extra steps are invisible.

The difference shows up for:

Think Mode in Anvit surfaces the reasoning steps, so you can see exactly how the model approached a complex question.


RAG went from a research concept to the standard architecture for document AI in a few years. Agentic RAG — with corrective retrieval, self-critique, and tool calling — is the current best practice. Running it on-device removes the last reason to send your documents to the cloud. Try it on Android.
