Reverse-engineered from binary

Factory Router

How Factory's Droid CLI routes between 3 models to cut costs 20-25% — with automatic escalation when the cheap model gets stuck.

droid v0.140.0 · Mach-O arm64 · Bun 1.3.14 · 58 models in registry

Three models. That's it.

Despite a 58-model registry, the "auto" router chooses between exactly 3 models — one premium, two cheap. The cost savings come from routing routine tasks away from the expensive model.

Model            Provider   Cost per 1M tokens   Relative   Tier
claude-opus-4-7  Anthropic  $15/$75              2.0x       premium
kimi-k2.6        Moonshot   $0.96/$4.00          0.4x       standard
minimax-m2.7     MiniMax    $0.30/$1.20          0.1x       standard

Every message goes through e7h

Before each LLM request, a gate function decides which model handles it. On the first turn, it runs the full classifier. On subsequent turns, it uses the cached result — unless something changes.

Fast path

Cache Hit

Most turns. Reuse the model picked on turn 1. No classifier call, zero latency overhead.

Image upgrade

Auto-escalate

If the cached model can't handle images but this turn includes them → instant upgrade to claude-opus-4-7.

Policy change

Invalidate & Re-route

Org revoked the cached model mid-session → clear cache, run full classifier again.

Safety net

Fallback to Opus

Classifier times out, crashes, or returns no viable candidates → always falls back to claude-opus-4-7.
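
The four branches above can be sketched as a single gate function. This is a minimal reconstruction, not the code behind e7h: every type and function name here (`gate`, `RouterState`, `GateDeps`, and so on) is hypothetical, the classifier is modeled as a synchronous call for brevity, and whether the real router caches the image upgrade or the fallback result is an assumption.

```typescript
// Hypothetical sketch of the per-turn gate. Only the four branches
// (cache hit, image upgrade, policy invalidation, fallback) come from the binary.
type Reason = "cache-hit" | "image-upgrade" | "classified" | "fallback";

interface Decision { model: string; reason: Reason; }
interface Turn { hasImages: boolean; }
interface RouterState { cachedModel: string | null; }

interface GateDeps {
  supportsImages(model: string): boolean;
  isAllowed(model: string): boolean;      // org policy check
  classify(turn: Turn): string | null;    // null = no viable candidate; may throw
}

const FALLBACK_MODEL = "claude-opus-4-7";

function gate(state: RouterState, turn: Turn, deps: GateDeps): Decision {
  // Policy change: the org revoked the cached model, so drop the cache.
  if (state.cachedModel !== null && !deps.isAllowed(state.cachedModel)) {
    state.cachedModel = null;
  }

  const cached = state.cachedModel;
  if (cached !== null) {
    // Image upgrade: the cached model cannot read this turn's images.
    if (turn.hasImages && !deps.supportsImages(cached)) {
      state.cachedModel = FALLBACK_MODEL; // assumption: the upgrade is cached
      return { model: FALLBACK_MODEL, reason: "image-upgrade" };
    }
    // Fast path: reuse the earlier pick without calling the classifier.
    return { model: cached, reason: "cache-hit" };
  }

  // First turn, or cache invalidated: run the classifier.
  let picked: string | null;
  try {
    picked = deps.classify(turn);
  } catch {
    picked = null; // a timeout or crash is treated like "no viable candidate"
  }
  if (picked === null) {
    // Assumption: the fallback is not cached, so the next turn retries.
    return { model: FALLBACK_MODEL, reason: "fallback" };
  }
  state.cachedModel = picked;
  return { model: picked, reason: "classified" };
}
```

The ordering matters in this sketch: the policy check runs before the cache lookup, so a revoked model can never be served from cache.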

A cheap LLM picks the model

gpt-5.4-mini (0.3x cost) reads the task, scores each candidate 0–1 on predicted first-attempt success rate, then a deterministic selector picks the cheapest model that clears the 0.7 threshold.

1. Extract Context Signals
Scans conversation for: current message, images, tool call history (last 10), failed tools, turn count, surface type, recent messages (last 6), system info.
2. Build Classifier Prompt
Assembles: scoring rubric + model capability cards (with real eval examples) + org routing guidance + <session> context block. Budget: 16K chars max, 2K head + 2K tail.
3. Call Classifier Model
gpt-5.4-mini scores each candidate. 10s timeout, 2048 max tokens. Returns JSON: {"scores": {"opus": 0.92, "kimi": 0.78, "minimax": 0.71}}
4. Cost-Optimal Selection
Filter candidates ≥ 0.7 → sort by cost ascending → pick cheapest. If none ≥ 0.7 → pick highest score regardless. The cheapest viable model wins, not the best.
5. Lock & Send
Set effectiveFactoryRouterModel, lock provider via x-api-provider header, cache for subsequent turns.
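
Step 4 is simple enough to sketch directly. The version below assumes the classifier's JSON has already been parsed into a score map; `Candidate`, `selectModel`, and the tie-breaking behavior (first candidate wins on equal cost) are illustrative choices, not names or details recovered from the binary.

```typescript
// Hypothetical sketch of the deterministic selector (step 4).
interface Candidate {
  id: string;           // key used in the classifier's "scores" object
  relativeCost: number; // cost multiplier, e.g. 0.1 for minimax-m2.7
}

const SCORE_THRESHOLD = 0.7;

function selectModel(
  candidates: Candidate[],
  scores: Record<string, number>,
): string | null {
  // Keep only candidates the classifier actually scored.
  const scored: Array<Candidate & { score: number }> = [];
  for (const c of candidates) {
    const score = scores[c.id];
    if (typeof score === "number") scored.push({ ...c, score });
  }
  if (scored.length === 0) return null; // caller falls back to the premium model

  // Cheapest model that clears the threshold wins.
  const viable = scored.filter((c) => c.score >= SCORE_THRESHOLD);
  if (viable.length > 0) {
    return viable.reduce((best, c) =>
      c.relativeCost < best.relativeCost ? c : best,
    ).id;
  }

  // Nothing clears the threshold: take the highest score regardless of cost.
  return scored.reduce((best, c) => (c.score > best.score ? c : best)).id;
}
```

With the example response from step 3 (opus 0.92, kimi 0.78, minimax 0.71) and relative costs of 2.0, 0.4, and 0.1, this returns "minimax": it barely clears 0.7, but it is the cheapest model that does.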

The classifier doesn't guess — it has a cheat sheet

Each candidate model includes a capability card with real benchmark scores. The classifier pattern-matches the task against these examples:

// kimi-k2.6 model card (verbatim from binary)
score_examples:
  "Build a gRPC server with CRUD operations"       0.95  // strength
  "Summarize log files by date range into CSV"     0.95
  "Recover a secret from rewritten git history"    0.80
  "Implement a statistical sampling algorithm"     0.50  // uncertain
  "Fix a legacy Java binary format parser"         0.15  // weakness
  "Implement a cryptanalytic attack"               0.10
  "Write x86-64 assembly for a protocol parser"    0.05  // hard fail

When the cheap model thrashes, the system intervenes

Two mechanisms work together: the model is taught to self-recognize failure patterns, and a background scanner watches for thrashing phrases and injects a nudge.

17 stuck phrases, scanned in the last 5 assistant messages

"different approach"
"another approach"
"try a different"
"try another"
"let me reconsider"
"let me rethink"
"need to rethink"
"actually, wait"
"wait, actually"
"hmm, wait"
"on second thought"
"isn't working"
"that didn't work"
"this doesn't work"
"i'm missing something"
"must be missing"
"step back"

If ≥ 3 matches in the last 5 assistant messages and UpgradeSessionModel hasn't been called recently, a <system-reminder> is injected telling the model to call the upgrade tool.
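
A scanner with this behavior could look like the sketch below. The phrase list, the window of 5, and the threshold of 3 come from the binary; the matching rules are assumptions (case-insensitive substring match, each phrase counted at most once per message), as are the names `Message`, `countStuckPhrases`, and `shouldNudge`.

```typescript
// Hypothetical sketch of the stuck-phrase scanner.
const STUCK_PHRASES: readonly string[] = [
  "different approach", "another approach", "try a different", "try another",
  "let me reconsider", "let me rethink", "need to rethink",
  "actually, wait", "wait, actually", "hmm, wait", "on second thought",
  "isn't working", "that didn't work", "this doesn't work",
  "i'm missing something", "must be missing", "step back",
];

const WINDOW = 5;      // assistant messages inspected
const MIN_MATCHES = 3; // matches needed before nudging

interface Message {
  role: "user" | "assistant";
  text: string;
}

function countStuckPhrases(messages: Message[]): number {
  const recent = messages.filter((m) => m.role === "assistant").slice(-WINDOW);
  let matches = 0;
  for (const m of recent) {
    const text = m.text.toLowerCase();
    for (const phrase of STUCK_PHRASES) {
      if (text.includes(phrase)) matches += 1;
    }
  }
  return matches;
}

// True when a <system-reminder> nudge should be injected.
function shouldNudge(messages: Message[], upgradeCalledRecently: boolean): boolean {
  return !upgradeCalledRecently && countStuckPhrases(messages) >= MIN_MATCHES;
}
```

Note that overlapping phrases add up quickly under these assumptions: "let me try a different approach" alone matches both "different approach" and "try a different".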

Passive — Tool Description

Self-awareness instruction

The tool description teaches the model: "Call this when you catch yourself saying 'let me try a different approach' on the 2nd attempt. This is not admitting defeat — it's correct resource allocation."

Active — Phrase Scanner

Forced escalation hint

Background scanner (signals.ts) counts phrases → injects a <system-reminder> nudge → model calls UpgradeSessionModel({}) → one-way upgrade.

The tool is dynamically hidden

When there's no upgrade path (model already at max tier, or target blocked by policy), UpgradeSessionModel is removed from the tool list entirely. The LLM literally cannot see it.
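
In code, hiding the tool amounts to filtering the tool list before each request. The sketch below assumes an upgrade-path lookup and a policy check are available; `Tool`, `visibleTools`, and both parameters are hypothetical stand-ins for whatever the binary actually uses.

```typescript
// Hypothetical sketch: drop UpgradeSessionModel when no upgrade is possible.
interface Tool {
  name: string;
}

const UPGRADE_TOOL = "UpgradeSessionModel";

function visibleTools(
  tools: Tool[],
  currentModel: string,
  upgradePaths: Map<string, string>,     // from-model -> to-model
  isAllowed: (model: string) => boolean, // org policy check
): Tool[] {
  const target = upgradePaths.get(currentModel);
  // No path means the model is already at its top tier;
  // a blocked target means policy forbids the jump.
  const canUpgrade = target !== undefined && isAllowed(target);
  return canUpgrade ? tools : tools.filter((t) => t.name !== UPGRADE_TOOL);
}
```

Filtering the list, rather than rejecting the call after the fact, means the model never spends a turn on an upgrade that cannot happen.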

One-way escalation to the apex model

9 of 11 upgrade paths converge on claude-opus-4-7. There is no downgrade. Cross-provider switches trigger conversation compaction.

From                      To               Cost jump
minimax-m2.7 (0.1x)       claude-opus-4-7  20x
kimi-k2.6 (0.4x)          claude-opus-4-7  5x
claude-sonnet-4-6 (1.2x)  claude-opus-4-7  1.67x
gpt-5.4-mini (0.3x)       gpt-5.4          3.3x
gemini-3-flash (0.2x)     gemini-3.1-pro   4x

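
The "Cost jump" column is just the ratio of the two relative-cost multipliers. A quick check for the three paths that end at claude-opus-4-7 follows; the multipliers for gpt-5.4 and gemini-3.1-pro are not quoted in this write-up, so they are left out rather than guessed. `RELATIVE_COST` and `costJump` are illustrative names.

```typescript
// Relative-cost multipliers quoted above (target multipliers for the
// gpt and gemini paths are not stated, so they are omitted).
const RELATIVE_COST: Record<string, number> = {
  "minimax-m2.7": 0.1,
  "kimi-k2.6": 0.4,
  "claude-sonnet-4-6": 1.2,
  "claude-opus-4-7": 2.0,
};

// Ratio of target cost to source cost, rounded to two decimals.
function costJump(from: string, to: string): number {
  const fromCost = RELATIVE_COST[from];
  const toCost = RELATIVE_COST[to];
  if (fromCost === undefined || toCost === undefined) {
    throw new Error(`unknown model: ${from} or ${to}`);
  }
  return Math.round((toCost / fromCost) * 100) / 100;
}

console.log(costJump("minimax-m2.7", "claude-opus-4-7"));      // 20
console.log(costJump("kimi-k2.6", "claude-opus-4-7"));         // 5
console.log(costJump("claude-sonnet-4-6", "claude-opus-4-7")); // 1.67
```
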
Path A — Router Session

Soft switch

Session model stays "auto". Only effectiveFactoryRouterModel changes. No compaction if same provider family. Cached for all future turns.

Path B — Concrete Session

Hard switch

Session model permanently changes. Cross-provider switches (e.g. factory → anthropic) trigger conversation serialization and compaction. Provider lock updates.

A session from cheap to premium

Turn 1 · minimax-m2.7 · 0.1x
User: "fix the typo in README.md" → classifier scores minimax 0.85 → cheapest viable → task completed.
Turns 2–4 · minimax-m2.7 · 0.1x
Cache hits. No classifier calls. Simple follow-ups handled at minimum cost.
Turn 5 · minimax-m2.7 · 0.1x
User asks something harder. Minimax starts struggling: "let me try a different approach"
Turns 6–7 · stuck detection triggers
"actually, wait, that didn't work", "let me rethink this" → 3+ stuck phrases detected → <system-reminder> escalation hint injected.
Turn 7 · upgrade → claude-opus-4-7 · 2.0x
Model calls UpgradeSessionModel({}). Cross-provider switch triggers conversation compaction. All subsequent turns use opus.
Turns 8+ · claude-opus-4-7 · 2.0x
Task completed successfully with the premium model. No further routing changes.

6 turns at 0.1x cost
~20x savings on those turns
0.7 score threshold
10s classifier timeout

How this was extracted

The droid binary is a Bun single-executable — the JS application is embedded as literal UTF-8 text. Minified but not obfuscated. Variable names are mangled, but all string constants survive.

# 1. Find a string constant in the binary
grep -boa 'FACTORY_ROUTER' droid
# → 62886569:FACTORY_ROUTER

# 2. Extract surrounding JS at that byte offset
dd if=droid bs=1 skip=62886400 count=500 | strings -n 5
# → IR.FACTORY_ROUTER="auto"})(AE||={});
# → ((H)=>{H.Concrete="concrete";H.Router="router"})(yMT||={});

# 3. Sourcemap paths reveal original file structure
strings droid | grep 'model-router/' | sort -u
# → packages/droid-core/src/model-router/router.ts
# → packages/droid-core/src/model-router/signals.ts
# → packages/droid-core/src/model-router/selector.ts

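
The same two steps can be scripted. The sketch below is a rough TypeScript equivalent of the `grep -boa` and `strings -n 5` calls, operating on bytes already loaded into memory. The function names are made up for illustration, and unlike `strings` it counts only bytes 0x20–0x7e as printable.

```typescript
// Byte offsets of every occurrence of an ASCII needle (like `grep -boa`).
function findAscii(haystack: Uint8Array, needle: string): number[] {
  const offsets: number[] = [];
  if (needle.length === 0) return offsets;
  for (let i = 0; i + needle.length <= haystack.length; i++) {
    let j = 0;
    while (j < needle.length && haystack[i + j] === needle.charCodeAt(j)) j++;
    if (j === needle.length) offsets.push(i);
  }
  return offsets;
}

// Printable ASCII runs of at least minLen characters (like `strings -n`).
function printableRuns(bytes: Uint8Array, minLen = 5): string[] {
  const runs: string[] = [];
  let current = "";
  // Iterate one past the end so the final run is flushed.
  for (let i = 0; i <= bytes.length; i++) {
    const b = i < bytes.length ? bytes[i] : 0;
    if (b >= 0x20 && b <= 0x7e) {
      current += String.fromCharCode(b);
    } else {
      if (current.length >= minLen) runs.push(current);
      current = "";
    }
  }
  return runs;
}

// Text surrounding a hit: `before` bytes ahead of the offset, `count` bytes total.
function stringsAround(bytes: Uint8Array, offset: number, before: number, count: number): string[] {
  const start = Math.max(0, offset - before);
  return printableRuns(bytes.subarray(start, start + count));
}
```

Under Node or Bun, the bytes could come from `readFileSync("droid")` in `node:fs`, which returns a `Buffer` (a `Uint8Array` subclass); `stringsAround(bytes, 62886569, 169, 500)` would then cover the same window as the `dd` command above.
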
Preserved

What we can see

String literals, enum values, error messages, log statements, metric names, JSON schemas, prompt text, file paths.

Destroyed

What's lost

Variable names (mangled to T, R, H...), type annotations, comments, formatting. Sourcemaps have empty "names":[] arrays.

Reconstructed

What we inferred

Function purposes (from log messages), data flow (from error strings), architecture (from sourcemap paths + metric names).