Routing the models, fixing the floor

03/05/2026 · Field Notes

Most of the work that matters is invisible. A note on building a model routing system, learning to live in the MD files, and treating breakdown as design feedback.

A respectable share of the studio's work this week happened in files most users will never open.

The job, on paper, was to fix the local models. The job, in practice, was to build a routing system: a structural decision about which model handles which class of request, with fallbacks for rate limits, model-specific quirks, and the difference between a query that needs a frontier model and one that does not.

Three more local models came online. The architecture now has graceful fallbacks. Token spend is meaningfully lower because the cheaper and locally hosted options pick up the requests they are good enough for, leaving the frontier models for the work that genuinely needs them.

None of this is visible from the outside. All of it is structurally significant.

The more interesting shift this week was in my relationship to the system's MD files. I had been treating them as configuration — things to be edited reluctantly when something broke. I am beginning to see them, instead, as the control surface of the studio. The MD files are where the system's behaviour is described, where its corrections live, where its self-improvement loops attach. To live in the MD files is to live in the source of the system's behaviour, not its symptoms.

I was, frankly, intimidated by them. I am no longer.

A few things I want to mark for the record.

Patience as system optimisation. The more I understand how the agent infrastructure functions, the easier it becomes to read breakdown not as failure but as design feedback. A failed call is information about the routing logic. A wrong response is information about the prompt scaffolding. Frustration is the early warning that the system has not yet absorbed a class of input. Patience, here, is not virtue. It is technique.

Continuous-improvement loops at the file level. The architecture now feeds its own corrections back into the MD layer, which then changes how the next call is shaped. This is the slow compounding work that makes a system go from useful to indispensable.

Invisible work is the work. Solo operators in particular have a habit of confusing visible output with progress. A week spent in the MD files produces nothing that photographs well. It also is the difference between a system that survives next quarter and one that does not.

I keep coming back to a line: the structure of your tools encodes the structure of your thinking. When I work in the MD files, what I am really doing is editing the conditions of my own future attention. That is not configuration. That is craft.

The MD files are not the configuration. They are fossilised thought.