We've shipped 14 LangGraph topologies into production over the last 12 months. Four patterns showed up everywhere; a fifth only worked in dev. This is the field-tested set: what we now reach for by default, and the one we removed from the playbook.
1. Plan-then-Act with a bounded retry edge
Every production engagement we've run has converged on this shape. A planner agent emits a typed plan; an executor steps through it; failures route back to the planner via an explicit retry edge with a hard cap (we use n=2). Beyond the cap, the graph escalates to a designated terminal node — never throws.
┌──────────┐    ok    ┌──────────┐
│ planner  ├─────────▶│ executor │─────▶ done
└──────────┘          └────┬─────┘
     ▲                     │ err
     │                     ▼
     └────── retry ────────┤ n ≤ 2
                           │
                           ▼
                      ┌──────────┐
                      │ escalate │
                      └──────────┘

Why bounded matters: LLM failures may look random, but they are correlated by intent class, so an input that failed once is likely to fail again on retry. Without a hard cap you either burn the token budget on an infinite retry loop or flood your alerting at 3am. Bounded retry plus an explicit escalation node gives you a single observable place where the topology gives up, which is exactly the thing your on-call wants.
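The routing decision behind the retry edge can be sketched as a small pure function. This is a minimal sketch, not our production code: the state fields (`error`, `retries`) and the function name are assumptions made for illustration, and only the cap of 2 comes from the text above.

```python
from typing import Literal, Optional, TypedDict

MAX_RETRIES = 2  # the hard cap described above (n=2)


class RunState(TypedDict):
    """Hypothetical graph state; the field names are assumptions."""
    error: Optional[str]  # set by the executor when a step fails, else None
    retries: int          # how many times the run has already gone back to the planner


def route_after_executor(state: RunState) -> Literal["done", "planner", "escalate"]:
    """Choose the next node once the executor has finished a pass."""
    if state["error"] is None:
        return "done"
    if state["retries"] < MAX_RETRIES:
        # Retry edge. Whoever handles the retry must increment `retries`,
        # otherwise the cap never triggers.
        return "planner"
    # Cap reached: hand off to the terminal node instead of raising.
    return "escalate"


for retries in range(4):
    print(retries, route_after_executor({"error": "tool timeout", "retries": retries}))
```

In LangGraph, a router of this shape is the kind of function you would pass to `add_conditional_edges` on the executor node; the wiring is omitted here so the sketch runs without the library.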
2. Guarded fanout
When latency matters and you have independent sub-tasks (research, retrieval, policy lookup), run them in parallel and merge through a deterministic guardrail node. The merge is not a summarizer — it validates.
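# Assumes `graph` is a LangGraph StateGraph and that the "plan", "executor",
# and "escalate" nodes are registered elsewhere.
graph.add_node("research", research_agent)
graph.add_node("retrieval", retrieval_agent)
graph.add_node("policy_lookup", policy_agent)
graph.add_node("guardrail_merge", deterministic_merge)  # no LLM

# Fan out: the three sub-tasks run in parallel after planning.
graph.add_edge("plan", "research")
graph.add_edge("plan", "retrieval")
graph.add_edge("plan", "policy_lookup")

# Fan in: every branch feeds the merge node.
graph.add_edge("research", "guardrail_merge")
graph.add_edge("retrieval", "guardrail_merge")
graph.add_edge("policy_lookup", "guardrail_merge")

# Route on the merge result. route_after_merge is a hypothetical router that
# returns "executor" when all checks pass and "escalate" otherwise.
graph.add_conditional_edges("guardrail_merge", route_after_merge, {"executor": "executor", "escalate": "escalate"})

The guardrail_merge node is the load-bearing piece. It checks that every parent has produced output, that no output violates the policy library, and that the merged context fits within the executor's input window. If any of those checks fails, it routes to escalate, never to executor. This gives you the speed of fanout with the safety guarantees of sequential.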
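The three checks can be sketched as one deterministic function. This is an illustration under stated assumptions rather than our actual merge node: the signature, the `violates_policy` callback, and the use of a character count as a stand-in for the executor's token window are all hypothetical.

```python
from typing import Callable, Mapping

# Branch names taken from the fanout above.
REQUIRED_PARENTS = ("research", "retrieval", "policy_lookup")


def deterministic_merge(
    outputs: Mapping[str, str],
    violates_policy: Callable[[str], bool],  # hypothetical policy-library check
    max_chars: int,                          # crude stand-in for the input window
) -> tuple[str, str]:
    """Return (next_node, merged_context). No LLM call: same input, same result."""
    # Check 1: every parent branch produced non-empty output.
    if any(not outputs.get(name) for name in REQUIRED_PARENTS):
        return "escalate", ""

    # Check 2: no branch output violates the policy library.
    if any(violates_policy(outputs[name]) for name in REQUIRED_PARENTS):
        return "escalate", ""

    # Check 3: the merged context fits the executor's input window.
    merged = "\n\n".join(f"[{name}]\n{outputs[name]}" for name in REQUIRED_PARENTS)
    if len(merged) > max_chars:
        return "escalate", ""

    return "executor", merged


branches = {"research": "notes", "retrieval": "docs", "policy_lookup": "rules"}
next_node, _ = deterministic_merge(branches, lambda text: False, max_chars=1_000)
print(next_node)
```

The merge concatenates in a fixed order and never rewrites branch output, which is what keeps it a validator and not a summarizer.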
3. Reviewer-as-Edge
Human approval is a graph node with the same shape as every other node: typed input, typed output, timeout, SLA. It is not a webhook callback. It is not an out-of-band Slack message. It is not a queue your ops team checks once a day.
We see teams reach for callbacks because they're easier to wire up. The cost shows up six weeks later when something breaks and nobody can replay what the reviewer saw — because the input was lost, the timeout was implicit, and the state was held in a queue table that's already been garbage-collected.
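One way to picture a review step with typed input, typed output, and an explicit timeout is the sketch below. Every name in it (`ReviewRequest`, `ReviewDecision`, `reviewer_node`, the `poll` callback) is hypothetical, and the blocking poll loop is only for illustration; a real deployment would more likely suspend the graph and resume it when the decision arrives.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ReviewRequest:
    request_id: str
    payload: str      # exactly what the reviewer is shown; persist it for replay
    timeout_s: float  # explicit, not implied by a queue's retention policy


@dataclass(frozen=True)
class ReviewDecision:
    request_id: str
    outcome: str      # "approved", "rejected", or "timed_out"
    reviewer: Optional[str] = None


def reviewer_node(
    request: ReviewRequest,
    poll: Callable[[str], Optional[ReviewDecision]],  # hypothetical decision lookup
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 1.0,
) -> ReviewDecision:
    """Wait for a decision until the timeout elapses, then return a typed timeout."""
    deadline = now() + request.timeout_s
    while now() < deadline:
        decision = poll(request.request_id)
        if decision is not None:
            return decision
        sleep(interval_s)
    # A timeout is a normal, typed output that downstream edges can route on.
    return ReviewDecision(request.request_id, "timed_out")
```

Because the request is an immutable value, it can be written to the trace before the reviewer sees it, which is what makes the "what did the reviewer see" question answerable later.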
4. Replay-safe trace
Every edge produces a single idempotent trace event keyed by (edge_id, input_hash). Trace events live in your observability stack, not in graph state. When an engagement breaks at 3am, replay-from-trace beats reconstruct-from-logs by an order of magnitude. Each event records:
- edge_id — stable identifier for the edge in the topology
- input_hash — sha256 of the deterministic input bundle
- agent_id + agent_version — what produced the output
- policy_versions — which guardrail policies were live at the time
- elapsed_ms — wall clock, not just billed tokens
When the audit team asks why claim #29481 routed to fast-track six weeks after the fact, you point at the trace. If you can't reconstruct the exact decision path from the trace alone, you don't have replay safety — you have logs.
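The event shape and the idempotency key can be sketched as follows. The field names mirror the list above; the canonical-JSON hashing scheme and the in-memory `TraceSink` are assumptions standing in for whatever your observability stack provides.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


def input_hash(bundle: dict[str, Any]) -> str:
    """sha256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TraceEvent:
    edge_id: str
    input_hash: str
    agent_id: str
    agent_version: str
    policy_versions: tuple[str, ...]
    elapsed_ms: int


class TraceSink:
    """In-memory stand-in for an observability backend (hypothetical)."""

    def __init__(self) -> None:
        self._events: dict[tuple[str, str], TraceEvent] = {}

    def emit(self, event: TraceEvent) -> bool:
        """Store one event per (edge_id, input_hash); return False on a duplicate."""
        key = (event.edge_id, event.input_hash)
        if key in self._events:
            return False
        self._events[key] = event
        return True

    def __len__(self) -> int:
        return len(self._events)


sink = TraceSink()
digest = input_hash({"intent": "example", "plan_step": 1})
event = TraceEvent("planner->executor", digest, "planner", "1.0.0", ("policy-v1",), 42)
print(sink.emit(event), sink.emit(event), len(sink))
```

Re-emitting the same event on a retried edge is a no-op, so a replay cannot double-count or overwrite what was recorded the first time.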
Anti-pattern: dynamic agent spawning
We tried this twice. In dev it looked elegant: an intent classifier produces a label, and the graph spawns a specialized agent for that label. In prod it gave us O(intent_classes) agents to debug, version, and observe — and the long tail of rare intents produced agents that ran maybe twice a quarter, well below the threshold for keeping their prompts current.
What works instead: a small fixed set of agents (we land between three and six), each with a wide enough tool scope to handle its class of work, and a policy library that's authored separately from the agent prompts. Versioning the policies is easier than versioning prompts; the agents stay stable; the audit story stays coherent.
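A fixed agent set with a separately versioned policy library might look like the sketch below. The agent names, intent labels, policy names, and version strings are all invented for illustration; only the structure (a static intent-to-agent map, with policies kept outside the agents) reflects the approach described above.

```python
from dataclasses import dataclass

# Hypothetical fixed agent set; nothing is spawned at runtime.
AGENTS = ("intake", "research", "resolution")

# Many intents map onto few agents. Labels here are illustrative.
INTENT_TO_AGENT = {
    "new_claim": "intake",
    "status_check": "intake",
    "coverage_question": "research",
    "dispute": "resolution",
}

DEFAULT_AGENT = "resolution"  # rare or unseen intents share one broad-scope agent


@dataclass(frozen=True)
class Policy:
    name: str
    version: str
    applies_to: tuple[str, ...]  # agent ids this policy governs


# The policy library is versioned on its own; agent prompts do not embed it.
POLICIES = (
    Policy("pii_redaction", "2.1.0", AGENTS),
    Policy("payout_limits", "1.4.0", ("resolution",)),
)


def select_agent(intent: str) -> str:
    """Route an intent label to one of the fixed agents."""
    return INTENT_TO_AGENT.get(intent, DEFAULT_AGENT)


def policies_for(agent_id: str) -> list[Policy]:
    """Look up the live policies for an agent at call time."""
    return [p for p in POLICIES if agent_id in p.applies_to]


print(select_agent("dispute"), [p.name for p in policies_for("resolution")])
```

Adding a rare intent becomes a one-line change to the map, and a policy bump changes a version string without touching any agent prompt.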
Reaching for the set
Most engagements use three of the four: plan-then-act, replay-safe trace, and either guarded fanout or reviewer-as-edge depending on whether the constraint is latency or compliance. If you're building agentic systems and reaching for something exotic, ask first whether one of these four already covers it. One usually does.