// capabilities · retrieval & knowledge layer

Retrieval that holds up at agent scale.

Vector search alone is not a knowledge layer. Production agents need hybrid retrieval, structured grounding, and context budgets that don't blow up the second a document grows past 200 pages.

// principles

What we hold true about retrieval.

/01

Hybrid retrieval by default

BM25 + dense vectors + structured filters, scored together. Pure vector search loses to keyword on entity-heavy queries; pure keyword loses on conceptual queries. We design the merge once and tune the weights per surface.

/02

Document parsing is the long pole

PDFs, call transcripts, regulatory filings, scanned forms — each surface needs its own parsing topology before any embedding step runs. We've seen 60% of retrieval quality come from the parser, not the embeddings.

/03

Context budgets, not context windows

Agents share a context budget across plan, retrieve, and execute. We split the budget per node and enforce hard caps. Otherwise one retrieval node consumes the whole window and starves the executor.

/04

Provenance every chunk

Every retrieved chunk carries doc_id, page, version, and timestamp. When the executor cites a fact, the citation resolves back to the source. Auditable by default — not retrofitted.

/05

Refresh windows, not periodic re-indexes

We tag chunks with a max-age. Stale chunks fall out of retrieval automatically; fresh chunks get priority. Beats re-indexing the whole corpus on a cron schedule, especially for compliance-heavy domains.

// stack we reach for

The default retrieval stack.

We don't lead with a fixed stack — every layer is chosen to match your data and your latency budget. Below is what we reach for first; we swap pieces out when the data demands it.

L01Parsingunstructured.io · custom OCR pipelines · LayoutLM for scanned forms
L02Chunkingsemantic + structural · domain-tuned splitters · 256–1024 token windows
L03Embeddingstext-embedding-3-large · cohere-embed-v3 · domain fine-tunes when needed
L04Vector storepgvector · Qdrant · Pinecone — chosen for the workload, not the brand
L05RerankerCohere rerank · BGE-reranker · custom cross-encoders for high-stakes surfaces
L06Citation layerstructured chunk metadata · resolves doc_id → page → paragraph

// anti-patterns we keep seeing

"Just put it in the vector store." Skipping parsing and chunking design is how teams end up with a vector store full of 16-page concatenated PDFs that all retrieve the same thing.
Pure-vector retrieval on entity-heavy queries. Hybrid retrieval is not optional once your corpus contains product SKUs, customer IDs, or contract numbers.
No provenance. If your executor can't cite the source of a fact, you can't pass a compliance review. We've never seen this added cheaply after the fact.
Stale corpus drift. Vector stores happily return chunks from documents that have been superseded twice. Refresh windows or hard age caps are the only fix.

// next

Run a discovery sprint on your retrieval layer.

Discovery format Book a session