How Factory's Droid CLI routes between 3 models to cut costs 20-25% — with automatic escalation when the cheap model gets stuck.
Despite a 58-model registry, the "auto" router chooses between exactly 3 models — one premium, two cheap. The cost savings come from routing routine tasks away from the expensive model.
Before each LLM request, a gate function decides which model handles it. On the first turn, it runs the full classifier. On subsequent turns, it uses the cached result, unless something changes.
Most turns: reuse the model picked on turn 1. No classifier call, so no added latency.
If the cached model can't handle images but this turn includes images → instant upgrade to opus-4-7.
If the org revokes the cached model mid-session → clear the cache and run the full classifier again.
If the classifier times out, crashes, or returns no viable candidates → always fall back to opus-4-7.
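The gate's decision order can be sketched as below. This is a minimal illustration, not the shipped code: `TurnContext`, `routeTurn`, and the callback names are hypothetical, and the classifier is modeled as a synchronous call for brevity, where a real one would be asynchronous.

```typescript
// Hypothetical types and names; only the decision order follows the text above.
type ModelId = string;

interface TurnContext {
  cachedModel: ModelId | null; // model picked on an earlier turn, if any
  hasImages: boolean; // does this turn include image input?
  supportsImages: (m: ModelId) => boolean;
  isAllowedByOrg: (m: ModelId) => boolean;
  runClassifier: () => ModelId | null; // null = no viable candidate
}

const FALLBACK_MODEL: ModelId = "claude-opus-4-7";

function routeTurn(ctx: TurnContext): ModelId {
  let cached = ctx.cachedModel;

  // Policy change: the org revoked the cached model, so drop the cache.
  if (cached !== null && !ctx.isAllowedByOrg(cached)) {
    cached = null;
  }

  if (cached !== null) {
    // Capability gap: images on this turn, but the cached model cannot read them.
    if (ctx.hasImages && !ctx.supportsImages(cached)) {
      return FALLBACK_MODEL;
    }
    // Common case: reuse the cached pick without calling the classifier.
    return cached;
  }

  // First turn, or cache cleared: run the classifier, falling back on any failure.
  try {
    return ctx.runClassifier() ?? FALLBACK_MODEL;
  } catch {
    return FALLBACK_MODEL;
  }
}
```

Putting the fallback in a single place means every failure mode (timeout, crash, empty result) lands on the premium model rather than on an untested cheap one.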
gpt-5.4-mini (0.3x cost) reads the task, scores each candidate 0–1 on predicted first-attempt success rate, then a deterministic selector picks the cheapest model that clears the 0.7 threshold.
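The deterministic selection step might look like the sketch below. The `Candidate` shape, the function name, and the cost multipliers in the usage note are assumptions made for illustration. The 0.7 threshold comes from the text, and treating a score of exactly 0.7 as passing is also an assumption.

```typescript
// Hypothetical candidate shape; the source does not show the real one.
interface Candidate {
  id: string;
  costMultiplier: number; // relative price, lower is cheaper
}

const SUCCESS_THRESHOLD = 0.7;

function pickCheapestViable(
  candidates: Candidate[],
  scores: Record<string, number>,
  fallbackId: string,
): string {
  // Keep candidates whose predicted success clears the threshold,
  // then order them from cheapest to most expensive.
  const viable = candidates
    .filter((c) => (scores[c.id] ?? 0) >= SUCCESS_THRESHOLD)
    .sort((a, b) => a.costMultiplier - b.costMultiplier);

  const cheapest = viable[0];
  return cheapest ? cheapest.id : fallbackId;
}
```

With made-up multipliers of 1.0, 0.4, and 0.2 for `opus`, `kimi`, and `minimax`, and scores of 0.92, 0.78, and 0.71, all three candidates clear the bar and the selector returns `minimax`. Keeping this step outside the LLM makes the choice reproducible: the classifier only supplies scores, and the cost comparison is plain code.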
<session> context block. Budget: 16K chars max, 2K head + 2K tail.
gpt-5.4-mini scores each candidate. 10s timeout, 2048 max tokens. Returns JSON: {"scores": {"opus": 0.92, "kimi": 0.78, "minimax": 0.71}}
Set effectiveFactoryRouterModel, lock the provider via the x-api-provider header, and cache the result for subsequent turns.
Each candidate model includes a capability card with real benchmark scores. The classifier pattern-matches the task against the examples on those cards.
Two mechanisms work together: the model is taught to self-recognize failure patterns, and a background scanner watches for thrashing phrases and injects a nudge.
If ≥ 3 matches in the last 5 assistant messages and UpgradeSessionModel hasn't been called recently, a <system-reminder> is injected telling the model to call the upgrade tool.
The tool description teaches the model: "Call this when you catch yourself saying 'let me try a different approach' on the 2nd attempt. This is not admitting defeat — it's correct resource allocation."
Background scanner (signals.ts) counts phrases → injects a <system-reminder> nudge → model calls UpgradeSessionModel({}) → one-way upgrade.
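A rough sketch of the scanner rule described above: at least 3 matches in the last 5 assistant messages, and no recent upgrade call. The phrase list is illustrative only, since the real list in signals.ts is not shown here. Counting one hit per pattern per message is also an assumption, as are all names in the block.

```typescript
// Illustrative phrase list; the actual patterns are not given in the text.
const THRASH_PATTERNS: RegExp[] = [
  /let me try a different approach/i,
  /that didn't work/i,
];

const MESSAGE_WINDOW = 5; // how many recent assistant messages to inspect
const MIN_MATCHES = 3; // matches needed before nudging

function shouldNudge(
  assistantMessages: string[],
  upgradeCalledRecently: boolean,
): boolean {
  // Never nudge again right after the model already asked for an upgrade.
  if (upgradeCalledRecently) return false;

  const recent = assistantMessages.slice(-MESSAGE_WINDOW);
  let matches = 0;
  for (const message of recent) {
    for (const pattern of THRASH_PATTERNS) {
      // Patterns have no `g` flag, so test() keeps no state between calls.
      if (pattern.test(message)) matches++;
    }
  }
  return matches >= MIN_MATCHES;
}
```

When `shouldNudge` returns true, the caller would inject the <system-reminder> described above. The scanner itself never switches models; it only prompts the model to call the upgrade tool.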
When there's no upgrade path (model already at max tier, or target blocked by policy), UpgradeSessionModel is removed from the tool list entirely. The LLM literally cannot see it.
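Hiding the tool can be as simple as filtering the tool list before each request. The sketch below assumes a hypothetical `ToolDef` shape and a caller that already knows the upgrade target, with `null` meaning the model is already at the top tier.

```typescript
// Hypothetical tool shape; only the tool name comes from the text.
interface ToolDef {
  name: string;
}

const UPGRADE_TOOL_NAME = "UpgradeSessionModel";

function visibleTools(
  tools: ToolDef[],
  upgradeTarget: string | null, // null = no higher tier exists
  isAllowedByPolicy: (model: string) => boolean,
): ToolDef[] {
  const canUpgrade = upgradeTarget !== null && isAllowedByPolicy(upgradeTarget);
  // Without an upgrade path, drop the tool so the model never sees it.
  return canUpgrade ? tools : tools.filter((t) => t.name !== UPGRADE_TOOL_NAME);
}
```

Removing the tool, rather than letting a call fail, avoids wasting a turn on an upgrade request that can never succeed.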
9 of 11 upgrade paths converge on claude-opus-4-7. There is no downgrade. Cross-provider switches trigger conversation compaction.
Session model stays "auto". Only effectiveFactoryRouterModel changes. No compaction if same provider family. Cached for all future turns.
Session model permanently changes. Cross-provider switches (e.g. factory → anthropic) trigger conversation serialization and compaction. Provider lock updates.
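One way to read the two upgrade behaviours above is as a single state transition that branches on whether the session is in "auto" mode. The sketch below is an interpretation, not recovered code: `SessionState`, `applyUpgrade`, and the `compact` flag are hypothetical, and it assumes compaction is needed exactly when the provider changes.

```typescript
// Hypothetical session shape; field names other than
// effectiveFactoryRouterModel are assumptions.
interface SessionState {
  sessionModel: string; // "auto" while the router is in charge
  effectiveFactoryRouterModel: string | null;
  provider: string; // current provider lock
}

interface UpgradeResult {
  state: SessionState;
  compact: boolean; // true when the conversation must be compacted
}

function applyUpgrade(
  state: SessionState,
  targetModel: string,
  targetProvider: string,
): UpgradeResult {
  const crossProvider = targetProvider !== state.provider;

  if (state.sessionModel === "auto") {
    // Router-managed session: the visible session model stays "auto" and
    // only the effective model changes.
    return {
      state: { ...state, effectiveFactoryRouterModel: targetModel, provider: targetProvider },
      compact: crossProvider,
    };
  }

  // Explicit model: the session model itself changes permanently.
  return {
    state: { ...state, sessionModel: targetModel, provider: targetProvider },
    compact: crossProvider,
  };
}
```

Returning a new state object instead of mutating the old one keeps the one-way nature of the upgrade easy to audit: there is no code path that restores the previous model.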
<system-reminder> escalation hint injected.
The model calls UpgradeSessionModel({}). A cross-provider switch triggers conversation compaction. All subsequent turns use opus.
The droid binary is a Bun single-executable: the JS application is embedded as literal UTF-8 text. Minified but not obfuscated. Variable names are mangled, but all string constants survive.
Survives: string literals, enum values, error messages, log statements, metric names, JSON schemas, prompt text, file paths.
Lost: variable names (mangled to T, R, H...), type annotations, comments, formatting. Sourcemaps have empty "names":[] arrays.
Inferred: function purposes (from log messages), data flow (from error strings), architecture (from sourcemap paths + metric names).