Material Considerations

Devlog #7: Slowly But Surely

It’s been one of those stretches where I feel like I’m spinning plates, yet when I take a step back the threads line up neatly. I’ve been stress-testing The Planner’s Assistant against a real major application, building up local AI capacity, and exploring how to ground models in planning theory. All of it points to the same idea: if planning AI is going to stick, it has to be done properly.


Stress-testing on Earl’s Court

The Earl’s Court application (~3.3GB of PDFs across 147 documents) has become my benchmark. It’s not an abstract dataset; it’s the kind of oversized, messy application that case officers know all too well. Running my codebase against it has been equal parts frustrating and illuminating.

Stress-testing like this is slow and occasionally demoralising, but it’s also the only way to build tools that survive contact with reality. Earl’s Court is exposing the cracks that actually matter.
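To give a flavour of what “running the codebase against it” involves, here’s a minimal sketch of the kind of corpus sweep I mean. It’s illustrative only: the pdfs/ directory is a stand-in, and it uses pypdf rather than my actual extraction stack.

```python
from pathlib import Path
from pypdf import PdfReader

corpus = Path("pdfs")   # hypothetical location of the 147 documents
failures = []

for pdf in sorted(corpus.glob("*.pdf")):
    try:
        reader = PdfReader(pdf)
        text = "".join(page.extract_text() or "" for page in reader.pages)
        print(f"{pdf.name}: {len(reader.pages)} pages, {len(text)} chars extracted")
    except Exception as exc:  # scanned, encrypted, or simply malformed files
        failures.append((pdf.name, exc))

# The failure list is the interesting output: these are the cracks that matter.
for name, exc in failures:
    print(f"FAILED {name}: {exc}")
```

Even a loop this simple surfaces the real-world mess: scanned plans with no text layer, encrypted files, documents that crash the parser outright.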


Local AI as Infrastructure

Those tests hammered home another point: local inference isn’t optional. On my setup the NPU (48 TOPS) maxed out, the GPU strained at the edges, and still it ground through. If I had to rely on API calls for jobs like this, costs would spiral and stability would suffer.

That’s why I’ve treated new hardware as an investment, not a luxury: it’s what lets me run jobs like Earl’s Court end to end on my own machine, along the lines of the sketch below.

Some people upgrade wardrobes; I upgrade hardware. Humour aside, the serious point is this: local capacity is what makes planning AI practical and sustainable.
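To make “local inference” concrete, this is roughly the shape of it, sketched with llama-cpp-python. The model path, context size, and prompt are placeholders, not my actual setup.

```python
from llama_cpp import Llama

# A quantised GGUF model running entirely on local hardware: no API calls,
# no per-token billing, no network dependency. The path is a placeholder.
llm = Llama(
    model_path="models/planning-8b-q4.gguf",
    n_ctx=8192,        # long planning documents need long context
    n_gpu_layers=-1,   # offload as many layers as the GPU will take
)

out = llm(
    "Summarise the key material considerations raised in the text below:\n...",
    max_tokens=512,
)
print(out["choices"][0]["text"])
```

Once the weights are on disk, the marginal cost of another 147-document run is electricity, not an invoice.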


Grounding in Planning Theory

But hardware alone isn’t enough. The bigger intellectual thread is how to make models reason credibly. That’s where distillation comes in: feeding long-context models the National Planning Policy Framework (NPPF), Planning Inspectorate (PINS) precedents, and the core planning texts. Not site-specific policies, which are too narrow and too biased, but the backbone of UK planning itself.

This isn’t only about cutting down hallucinations. It’s about making the model reason from the framework itself, so its answers trace back to national policy and precedent rather than to surface patterns in its training data.

Hardware and methodology meet here. The compute makes the distillation experiments possible. The distillation makes the compute meaningful.
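As a sketch of the first practical step, here’s one way the distillation corpus itself might be assembled. Everything here is assumed for illustration: plain-text copies of the source material in a texts/ directory, a naive fixed-size chunking, placeholder filenames.

```python
import json
from pathlib import Path

CHUNK = 4000  # characters per training example; a placeholder, not a tuned value

with open("distill_corpus.jsonl", "w", encoding="utf-8") as out:
    for doc in sorted(Path("texts").glob("*.txt")):  # e.g. nppf.txt, pins_decisions.txt
        body = doc.read_text(encoding="utf-8")
        for i in range(0, len(body), CHUNK):
            record = {"source": doc.stem, "text": body[i : i + CHUNK]}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
```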


Why Fine-Tuning Matters More Than Wrappers

It’s easy to think a Retrieval-Augmented Generation (RAG) system wrapped around a chatbot is “good enough.” Fetch some snippets, let the model summarise, job done. But that’s brittle, and it risks souring planning AI’s reputation before it even begins. Fine-tuning on planning theory is a slower but far more credible path:

Depth vs. Recall: a fine-tuned model internalises planning theory, instead of leaning on whichever snippets a retriever happens to surface.

Consistency: the same question gets the same grounded reasoning each time, rather than varying with retrieval luck.

Professional Credibility: answers that trace back to the NPPF and the core texts are ones planners can stand behind, not just plausible-sounding summaries.

Sustainability: a model that carries the theory with it runs locally, without per-query API costs or dependence.

Wrappers make quick demos. Fine-tuning makes credible tools. If the goal is professional adoption, methodology is the product.
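For what “fine-tuning” means mechanically, here’s a minimal parameter-efficient sketch using Hugging Face’s peft library. The base model, ranks, and target modules are all placeholders chosen to show the shape of the approach, not my actual configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-3.1-8B"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# LoRA trains a small set of low-rank adapter weights instead of the full
# model, which is what keeps a fine-tune feasible on local hardware.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

The adapter weights are small enough to train, version, and ship alongside the base model, which is exactly the sustainability argument above.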


One Thread

So while it might look like I’m juggling different things — pipelines, hardware tinkering, distillation — it’s really one continuous arc. Testing against live applications. Building the capacity to run it locally. Grounding models so they reason like planners rather than autocomplete engines.

The pace is slow by design. That’s what it takes to make planning AI robust, transparent, and worth taking seriously.


Next Steps