// capabilities · retrieval & knowledge layer

Retrieval that holds up at agent scale.

Vector search alone is not a knowledge layer. Production agents need hybrid retrieval, structured grounding, and context budgets that don't blow up the second a document grows past 200 pages.

// principles

What we hold true about retrieval.

/01

Hybrid retrieval by default

BM25 + dense vectors + structured filters, scored together. Pure vector search loses to keyword on entity-heavy queries; pure keyword loses on conceptual queries. We design the merge once and tune the weights per surface.
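
A minimal sketch of the merge step. It assumes the keyword index and the vector index have already returned raw scores for each candidate. The names `Candidate` and `hybrid_rank`, the exact-match filter semantics, and the choice of min-max normalization with a weighted sum are all illustrative. They are not a description of our production merge. Reciprocal rank fusion is a common alternative.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    doc_id: str
    bm25: float = 0.0   # lexical score (0 if the keyword index didn't return it)
    dense: float = 0.0  # similarity score from the vector index
    meta: dict = field(default_factory=dict)  # structured fields used for filtering


def _min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1] so BM25 and vector scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread means no ranking signal from this retriever.
        return [1.0 if hi > 0 else 0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(candidates: list[Candidate], filters: dict,
                w_bm25: float = 0.5, w_dense: float = 0.5, k: int = 10) -> list[str]:
    # 1. Structured filters are hard constraints: drop anything that fails them.
    kept = [c for c in candidates
            if all(c.meta.get(key) == val for key, val in filters.items())]
    if not kept:
        return []
    # 2. Normalize each signal independently, then blend with per-surface weights.
    bm = _min_max([c.bm25 for c in kept])
    dn = _min_max([c.dense for c in kept])
    scored = [(w_bm25 * b + w_dense * d, c.doc_id) for b, d, c in zip(bm, dn, kept)]
    # 3. Highest fused score first; ties broken by doc_id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

The two weights are the per-surface knobs. An entity-heavy surface such as SKU lookup would lean on `w_bm25`, and a conceptual Q&A surface would lean on `w_dense`. With the same candidates, shifting weight between them can flip the order.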

/02

Document parsing is the long pole

PDFs, call transcripts, regulatory filings, scanned forms — each surface needs its own parsing topology before any embedding step runs. We've seen 60% of retrieval quality come from the parser, not the embeddings.

/03

Context budgets, not context windows

Agents share a context budget across plan, retrieve, and execute. We split the budget per node and enforce hard caps. Otherwise one retrieval node consumes the whole window and starves the executor.
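
A sketch of per-node budgets with hard caps. `BudgetPlan`, `fit_to_cap`, the node names, and the fractional shares are hypothetical. `count_tokens` splits on whitespace purely as a stand-in for the model's real tokenizer.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens stand in for the model's tokenizer.
    return len(text.split())


@dataclass(frozen=True)
class BudgetPlan:
    total: int    # total tokens available to the agent run
    shares: dict  # node name -> fraction of `total` (hypothetical split)

    def __post_init__(self):
        # Per-node caps must not add up to more than the whole budget.
        if sum(self.shares.values()) > 1.0 + 1e-9:
            raise ValueError("node shares exceed the total budget")

    def cap(self, node: str) -> int:
        return int(self.total * self.shares[node])


def fit_to_cap(chunks: list[str], cap: int) -> list[str]:
    """Keep ranked chunks in order until the node's hard cap would be exceeded."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > cap:
            break  # hard cap: never borrow from another node's share
        kept.append(chunk)
        used += cost
    return kept
```

Stopping at the first chunk that overflows keeps rank order strict. A variant could skip the oversized chunk and keep filling the cap with smaller, lower-ranked ones.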

/04

Provenance on every chunk

Every retrieved chunk carries doc_id, page, version, and timestamp. When the executor cites a fact, the citation resolves back to the source. Auditable by default — not retrofitted.
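
A sketch of chunk-level provenance that carries the four fields named above. The `doc_id@version#p<page>` citation format and the `resolve` helper are assumptions made for illustration. The point is that any citation the executor emits must match a chunk that was actually retrieved. If it does not, resolution fails loudly instead of passing silently.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical citation format: doc_id@version#p<page>
CITATION_RE = re.compile(r"^(?P<doc_id>[^@#]+)@(?P<version>[^#]+)#p(?P<page>\d+)$")


@dataclass(frozen=True)
class Chunk:
    text: str
    doc_id: str
    page: int
    version: str
    timestamp: datetime  # when this version of the source was ingested

    def citation(self) -> str:
        return f"{self.doc_id}@{self.version}#p{self.page}"


def resolve(citation: str, retrieved: list[Chunk]) -> list[Chunk]:
    """Return the retrieved chunks a citation points at; fail loudly if none."""
    m = CITATION_RE.match(citation)
    if m is None:
        raise ValueError(f"malformed citation: {citation!r}")
    page = int(m["page"])
    hits = [c for c in retrieved
            if c.doc_id == m["doc_id"] and c.version == m["version"] and c.page == page]
    if not hits:
        raise LookupError(f"citation matches no retrieved chunk: {citation!r}")
    return hits
```

Keying on `version` as well as `doc_id` means a citation to a superseded document version cannot quietly resolve against the current one.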

/05

Refresh windows, not periodic re-indexes

We tag chunks with a max-age. Stale chunks fall out of retrieval automatically; fresh chunks get priority. Beats re-indexing the whole corpus on a cron schedule, especially for compliance-heavy domains.
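
A sketch of refresh windows applied at query time. It assumes each chunk stores its ingest time and its own `max_age`. The linear freshness boost and the `freshness_boost` parameter are hypothetical, and any monotone decay would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TaggedChunk:
    chunk_id: str
    score: float          # relevance score from the retriever
    ingested_at: datetime
    max_age: timedelta    # per-chunk refresh window, set at ingest time


def apply_refresh_window(chunks: list[TaggedChunk], now: datetime,
                         freshness_boost: float = 0.2) -> list[str]:
    ranked = []
    for c in chunks:
        age = now - c.ingested_at
        if age > c.max_age:
            continue  # stale: falls out of retrieval, no re-index needed
        # Assumed linear boost: (1 + boost) at age 0, tapering to 1 at max_age.
        freshness = 1.0 - age / c.max_age  # timedelta / timedelta -> float
        ranked.append((c.score * (1.0 + freshness_boost * freshness), c.chunk_id))
    ranked.sort(key=lambda r: (-r[0], r[1]))
    return [chunk_id for _, chunk_id in ranked]
```

In the example below, a one-day-old chunk outranks a slightly more relevant chunk that is near the end of its window, and a 40-day-old chunk with a 30-day window never appears at all.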

// stack we reach for

The default retrieval stack.

We don't lead with a fixed stack — every layer is chosen to match your data and your latency budget. Below is what we reach for first; we swap pieces out when the data demands it.

  • L01 · Parsing: unstructured.io · custom OCR pipelines · LayoutLM for scanned forms
  • L02 · Chunking: semantic + structural · domain-tuned splitters · 256–1024 token windows
  • L03 · Embeddings: text-embedding-3-large · cohere-embed-v3 · domain fine-tunes when needed
  • L04 · Vector store: pgvector · Qdrant · Pinecone (chosen for the workload, not the brand)
  • L05 · Reranker: Cohere rerank · BGE-reranker · custom cross-encoders for high-stakes surfaces
  • L06 · Citation layer: structured chunk metadata · resolves doc_id → page → paragraph
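
A sketch of the structural half of chunking. Paragraphs, which this sketch assumes are separated by blank lines, are packed greedily into windows. Only a paragraph longer than the window is split mid-text. `structural_chunks` and the whitespace token count are hypothetical stand-ins. Semantic splitting and domain-tuned boundary rules are not shown. `window` would be set somewhere in the 256–1024 range for each surface.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens approximate model tokens.
    return len(text.split())


def structural_chunks(text: str, window: int = 512) -> list[str]:
    """Greedily pack paragraphs into chunks of at most `window` tokens.

    Blank lines are treated as structural boundaries, so a chunk never starts
    or ends mid-paragraph unless the paragraph itself exceeds the window.
    Whitespace inside a paragraph is normalized to single spaces.
    """
    units = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = para.split()
        # Oversized paragraph: hard-split into window-sized slices.
        for i in range(0, len(words), window):
            units.append(" ".join(words[i:i + window]))

    chunks, current, used = [], [], 0
    for unit in units:
        n = count_tokens(unit)
        if current and used + n > window:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(unit)
        used += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because each chunk is assembled from whole paragraphs, one long PDF becomes many distinct, retrievable units instead of a single concatenated blob.
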

// anti-patterns we keep seeing

  • "Just put it in the vector store." Skipping parsing and chunking design is how teams end up with a vector store full of 16-page concatenated PDFs that all retrieve the same thing.
  • Pure-vector retrieval on entity-heavy queries. Hybrid retrieval is not optional once your corpus contains product SKUs, customer IDs, or contract numbers.
  • No provenance. If your executor can't cite the source of a fact, you can't pass a compliance review. We've never seen this added cheaply after the fact.
  • Stale corpus drift. Vector stores happily return chunks from documents that have been superseded twice. Refresh windows or hard age caps are the only fix.

// next

Run a discovery sprint on your retrieval layer.