Hybrid retrieval by default
BM25 + dense vectors + structured filters, scored together. Pure vector search loses to keyword on entity-heavy queries; pure keyword loses on conceptual queries. We design the merge once and tune the weights per surface.
Vector search alone is not a knowledge layer. Production agents need hybrid retrieval, structured grounding, and context budgets that don't blow up the second a document grows past 200 pages.
BM25 + dense vectors + structured filters, scored together. Pure vector search loses to keyword on entity-heavy queries; pure keyword loses on conceptual queries. We design the merge once and tune the weights per surface.
PDFs, call transcripts, regulatory filings, scanned forms — each surface needs its own parsing topology before any embedding step runs. We've seen 60% of retrieval quality come from the parser, not the embeddings.
Agents share a context budget across plan, retrieve, and execute. We split the budget per node and enforce hard caps. Otherwise one retrieval node consumes the whole window and starves the executor.
Every retrieved chunk carries doc_id, page, version, and timestamp. When the executor cites a fact, the citation resolves back to the source. Auditable by default — not retrofitted.
We tag chunks with a max-age. Stale chunks fall out of retrieval automatically; fresh chunks get priority. Beats re-indexing the whole corpus on a cron schedule, especially for compliance-heavy domains.
We don't lead with a fixed stack — every layer is chosen to match your data and your latency budget. Below is what we reach for first; we swap pieces out when the data demands it.
Run a discovery sprint on your retrieval layer.