Is Anvit really fully offline?

Yes. After the AI model downloads once, Anvit needs no internet connection — it even works in airplane mode. No connection is required to use it.

Where do my documents and chats go?

Nowhere. Your PDFs, Word files, questions and chat history never leave your device. There is no account, no server, and no analytics.

What file types can I chat with?

Both PDF and Word (.docx) documents. Organize them into collections and ask questions across one library or all of them.

What AI model powers Anvit?

Google's open Gemma 4 model (E2B or E4B) runs on-device via LiteRT-LM, with EmbeddingGemma or Gecko handling document search.

Can I trust the answers?

Every answer cites the exact source passages it used, and Think Mode lets you watch the model reason step by step — so you can verify, not just trust.

How does Anvit read and index my documents?

When you add a document, Anvit parses its structure — headings, sections, tables, and lists — then splits it into section-aware chunks of up to 8,000 tokens each. Tables are kept intact as structured chunks so data rows aren't mixed with prose. Repeated page furniture like page numbers and company footers is stripped out. Each chunk remembers the section heading it came from, so when you ask a question Anvit can retrieve the exact passage and tell you precisely where in the document it found the answer.

Is there an iOS version?

Anvit is live on Android today. An iOS version is in active development — add your email on the site and you'll be the first to know when it launches.

Hierarchical Chunking vs Semantic Chunking — Why Structure Matters for Document AI

Before a language model can answer questions about your documents, those documents have to be split into pieces — chunks — that fit inside the model's context window. The strategy used to create those chunks has a larger effect on answer quality than almost any other design decision in a document AI system.

There are two dominant approaches: semantic chunking and hierarchical chunking. They produce very different results on real business documents.

How semantic chunking works

Semantic chunking splits text based on content boundaries: sentence endings, paragraph breaks, or in more sophisticated implementations, embedding-based similarity between adjacent sentences. The idea is to keep semantically related sentences together in the same chunk.

The input is a stream of text. The output is a flat list of chunks, each with a character or token count and a similarity score between adjacent spans. No document structure is considered.

It works reasonably well for continuous prose — think articles or essays, where content flows linearly and each paragraph contains a standalone idea. It falls short for the kinds of structured documents people actually need to query: contracts, financial reports, technical specifications, policy documents.

What goes wrong with semantic chunking on structured documents

Consider a financial report with this structure:

Revenue Overview
  Product Revenue
    Q1 Results ← table: 12 rows, 4 columns
    Q2 Results ← table: 12 rows, 4 columns
  Service Revenue
    Annual Comparison ← table: 8 rows, 6 columns

A semantic chunker operating on the raw text stream has no knowledge of this structure. It sees a sequence of characters. Its chunks might:

Split a table mid-row because a token boundary fell in the middle of a data row
Merge unrelated sections because two adjacent paragraphs from different sections happen to be semantically similar
Strip all context from a chunk — a chunk containing the number 4,821 has no way to record that it came from the "Q1 Product Revenue" table in the "Revenue Overview" chapter

When that detached number gets retrieved and passed to the language model, the model has to guess at its meaning. Frequently it guesses wrong, or hedges, or hallucinates a plausible-sounding context.

How hierarchical chunking works

Hierarchical chunking treats the document as a tree before it produces any chunks. The first pass builds a section tree from the document's own structural signals. The second pass walks that tree and emits chunks that are bounded by section boundaries — never crossing them.

Every chunk produced carries the full path from the document root down to the section it came from. A number from the Q1 revenue table does not arrive at the model alone. It arrives with the path ["Revenue Overview", "Product Revenue", "Q1 Results"] attached.

How Anvit builds the section tree

Anvit's document pipeline works in two stages.

Stage 1: Parse → SectionNode tree

For PDFs, Anvit uses PDFBox to extract every text run with its rendered font size and position. Before classifying any headings, it samples the first 50 text runs and computes the median font size as the document's baseFontSize. Heading level is then determined by how far a run's font size exceeds that baseline:

More than 6 pt above baseline → Heading level 1
More than 3 pt above baseline → Heading level 2
Bold or more than 1 pt above baseline, with fewer than 120 characters → Heading level 3

Each new heading pushes a SectionNode onto a stack, and existing nodes at the same or deeper level are popped first. The result is a proper section tree that mirrors the visual hierarchy a human reader would recognise.

For Word documents, Anvit uses Apache POI and reads the paragraph's Word style name directly. Paragraphs with style Heading1 become level 1 nodes, Heading2 become level 2, down to Heading4. No font size inference needed — the document's own style system is the authoritative source.

Tables are detected separately. In PDFs, a row of text runs where adjacent runs have a gap wider than 15 points is treated as a table row. Two or more such rows in sequence become a Block.Table. In Word documents, XWPFTable elements are parsed directly. Either way, tables are kept intact and attached to the section they belong to.

Lists are detected from bullet characters (•, ‣, *, -) and numbered prefixes, or from Word's numID field. They are accumulated and flushed as a Block.ListBlock when the list ends.

Stage 2: SectionNode tree → DocumentChunk list

The HierarchicalChunker traverses the section tree and produces the final chunks. Each chunk carries a hierarchyPath — a list of heading strings from the document root to the section the chunk came from.

Key behaviours during this traversal:

Tables always become their own chunks. A table is never merged with surrounding text. If a table exceeds the token limit (8,000 tokens by default), it is split into row groups, each prefixed with the original column headers so the model can always read a row in context.
Large lists get their own grouped chunks, linked by a shared groupId, so the retrieval system can fetch all parts of a list when any one item is relevant.
Plain text blocks within the same section are accumulated up to the token limit, then flushed. Adjacent text chunks from the same section that are individually small are merged up to 1,500 tokens — avoiding the fragmentation problem where a short paragraph becomes an isolated chunk with too little context to be useful.
Empty heading nodes (a heading with no body text and no children) are recovered as plain text content under the parent section rather than becoming orphaned empty chunks.

What the model receives

With semantic chunking, retrieval returns a flat string. The model sees text, with no information about where in the document it came from.

With hierarchical chunking, retrieval returns a chunk plus its full path. For the same retrieved passage, the model sees:

[Revenue Overview > Product Revenue > Q1 Results]

Quarter	Region	Revenue	Growth
Q1 2024	North	4,821	+12%
Q1 2024	South	3,104	+8%
...	...	...	...

The section path is passed in the retrieval context. The model knows it is reading a data table from the Q1 results subsection of product revenue, not a random number from an unknown part of the document.

Where the difference shows up

For simple lookups ("What is the CEO's name?", "What is the deadline in clause 4?"), both approaches often produce identical answers. The heading context does not change much when the answer is an isolated fact.

The gap opens on four query types:

1. Section-scoped questions. "Summarise the risks in section 3." Semantic chunking may retrieve paragraphs from multiple unrelated sections that happen to use the word "risk". Hierarchical chunking retrieves only chunks whose path includes section 3.

2. Table questions. "What was product revenue in Q1?" Semantic chunking may split the table or strip the column headers. Hierarchical chunking returns the full intact table with its headers, inside the correct section context.

3. Comparison questions. "How do the Q1 and Q2 results differ?" These require pulling two separate sections. With flat chunks, the model cannot reliably distinguish a Q1 chunk from a Q2 chunk if both contain similar numbers. With hierarchy paths, the sections are unambiguous.

4. "According to section X" questions. Any question that references a specific section by name requires the retrieval system to know which chunks belong to that section. Hierarchical chunking makes this exact: the path is stored with each chunk and can be filtered directly.

The tradeoff

Hierarchical chunking requires a parsing stage that semantic chunking skips. The parser has to understand the document's structural signals — font sizes, style names, table geometry. This is harder to implement and produces imperfect results on some documents (heavily scanned PDFs, files without heading styles, documents with unusual formatting).

Semantic chunking is more robust to unusual inputs because it operates only on text, not on structure. It is also simpler to implement and can work on any plain text.

For documents with clear structure — which describes the majority of business documents — hierarchical chunking produces substantially better retrieval. For unstructured text where no heading hierarchy exists, the two approaches converge.

Anvit uses hierarchical chunking for PDFs and Word documents because the documents people most need to query privately — contracts, reports, policies, specifications — are almost always structured. The parsing cost is paid once, at index time, on-device.

The chunking strategy is invisible to users but shapes every answer the model produces. See it in action on Android.