Why Off-the-Shelf AI Tools Fall Short in Planning
When new “AI for planning” demos appear, it’s tempting to believe the problem is solved: upload a Local Plan, ask a question, get an instant answer. To a casual observer, that looks revolutionary. But the reality is more complicated. Planning is not just about retrieving text; it is about judgement, balance, and navigating a tangle of exceptions, caveats, and competing aims.
What makes planning challenging for AI
Planning documents are structured in ways that make sense to officers, inspectors, and practitioners — but not necessarily to a machine:
- Not everything carries equal weight. Some statements are decisive, while others are supporting or contextual.
- Policies interact. One policy rarely applies in isolation. They cross-reference, contradict, or condition each other in subtle ways.
- Balance is everything. A decision often rests not on one policy but on how competing aims are weighed against each other.
Generic language models can repeat text back, but they don’t instinctively understand these dynamics. Without careful structuring, they risk treating every paragraph as equally important, or offering confident but shallow answers.
The quirks of language models
Large language models are pattern recognisers. They can appear fluent, but left to themselves:
- They can invent details when the source material runs out (so-called hallucinations).
- They struggle to hold the full span of a long document in view at once: context windows are finite, and detail buried deep in a plan is easily lost.
- They don’t know which kinds of reasoning are legally or procedurally expected unless that structure is built around them.
The raw model is an engine, but it needs a gearbox, steering, and brakes to function in planning. That supporting structure — sometimes called scaffolding or context engineering — is what separates a useful planning tool from a novelty demo.
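To make the idea of scaffolding concrete, here is a minimal sketch in Python of the kind of structure that might sit around the model. Everything in it is illustrative: the `PolicyExtract` fields, the weight labels, and the prompt wording are assumptions for the sketch, not a description of any particular product.

```python
from dataclasses import dataclass

@dataclass
class PolicyExtract:
    """One retrieved passage, annotated before it ever reaches the model."""
    reference: str   # e.g. "Local Plan Policy H3" (illustrative)
    text: str        # the passage itself
    weight: str      # "decisive", "supporting" or "contextual"

def build_prompt(question: str, extracts: list[PolicyExtract]) -> str:
    """Assemble a structured prompt: annotated extracts plus explicit
    instructions to weigh them, rather than a bare 'summarise this'."""
    lines = [f"Question: {question}", "", "Relevant policy extracts:"]
    for e in extracts:
        lines.append(f"- [{e.weight}] {e.reference}: {e.text}")
    lines += [
        "",
        "Set out the planning balance explicitly: which extracts are decisive,",
        "which are supporting, how they interact, and where judgement is needed.",
        "Cite each extract by reference, and say so if the extracts supplied",
        "are not enough to answer.",
    ]
    return "\n".join(lines)
```

The point is not the particular wording, but that weight, interaction, and the expected form of reasoning are imposed on the model from outside rather than hoped for.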
Why wrappers aren’t enough
A number of emerging products are little more than wrappers: a search engine that fetches policy text, then feeds it into a model like ChatGPT for summarising. That may look slick in a demo, but it falls apart under scrutiny:
- They rely on a generic model. If the underlying system has never been trained on planning concepts, it cannot recognise what makes a paragraph decisive or marginal. You’re effectively asking a general-purpose chatbot to improvise as if it were a planning officer.
- They can’t reason across conflicts. Wrappers may retrieve Policy A and Policy B, but they won’t show how to weigh them against each other. That’s the core of planning, and it gets lost.
- They lack durability. Policies change. If the wrapper is just piping in raw text, it won’t adapt gracefully. Yesterday’s retrievals quickly become misleading.
- They encourage shallow trust. Because the answers look fluent, it’s easy to mistake surface polish for real reasoning. But beneath the wrapper, there’s often nothing more than word-search plus paraphrase.
In short: wrappers repackage a generic model, but they don’t change its limits. Without re-engineering the underlying model and surrounding it with proper scaffolding, the system cannot rise to the level of planning reasoning.
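For contrast, the wrapper pattern described above often amounts to little more than the sketch below. `search` and `complete` are placeholders for whatever retrieval service and model API a vendor happens to use, not real library calls.

```python
def wrapper_answer(question: str, search, complete) -> str:
    """Retrieve some text, then ask a generic model to paraphrase it.

    Note what is missing: no sense of which passage is decisive, no handling
    of policies that condition or contradict one another, no check that the
    retrieved text is still current, and no trace of how any conclusion
    was reached.
    """
    passages = search(question, top_k=5)                 # keyword or vector search
    context = "\n\n".join(p["text"] for p in passages)   # concatenate as-is
    return complete(f"Using the text below, answer: {question}\n\n{context}")
```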
What planners and councils should be asking
Procurement teams and officers should cut through the marketing by asking:
- Which model underpins this system — and has it been trained on planning law, policy, or casework?
- How does the system adapt when policies are updated or replaced?
- Can it show how it weighed one policy against another, rather than just listing them?
- Does it distinguish between supporting commentary and decisive text?
- How are precedents used — and how does the system avoid being trapped by outdated ones?
- What safeguards are in place against hallucinations or overconfident errors?
- Can the outputs be audited, and would they stand up to scrutiny in committee or appeal?
If the vendor can’t answer clearly, the product is probably just a wrapper — and you should treat its results with caution.
A better way forward
AI can support planning, but only if designed with the system’s complexity in mind:
- Transparency. Show sources, reasoning, and limits clearly.
- Faithful scaffolding. Build in the structures that reflect cross-references, procedural expectations, and the need to balance competing aims (a rough sketch of what this could mean follows this list).
- Domain grounding. Models need to be trained or fine-tuned on the substance of planning theory and practice. Without this grounding, they risk importing generic reasoning patterns that miss the nuances of the system.
- Augmentation, not replacement. These tools should amplify officers’ ability to navigate complexity, not offer glib shortcuts.
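As a rough sketch of the scaffolding point, the snippet below treats each policy as a first-class record rather than loose text, so that cross-references and currency can be tracked. The field names, statuses, and the `current_policies` helper are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Policy:
    """Illustrative record for one policy (field names are assumptions)."""
    reference: str                    # e.g. "Policy GB1"
    status: str                       # "adopted", "emerging" or "superseded"
    superseded_by: Optional[str] = None
    conditions: list[str] = field(default_factory=list)  # policies it qualifies
    tensions: list[str] = field(default_factory=list)    # policies it pulls against

def current_policies(policies: list[Policy]) -> list[Policy]:
    """Only adopted, non-superseded policies should reach the model, so that
    yesterday's framework cannot quietly shape today's answer."""
    return [p for p in policies
            if p.status == "adopted" and p.superseded_by is None]
```

Keeping interactions and status explicit is what lets a system show how policies were weighed, and adapt when they change, rather than relying on whatever text happened to be retrieved.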
Fine-tuning matters because it teaches a model the distinctive “rules of the game” in planning: what counts as significant, how to balance policy trade-offs, and how inspectors and officers frame their reasoning. Generic models may know a lot about language, but without this specialist grounding they will stumble over the very points that make planning decisions defensible.
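As an indication of what that grounding could look like as data, the snippet below shows the general shape of instruction-style fine-tuning examples. The content is invented for the sketch; real examples would need to be drawn, carefully and lawfully, from decision notices, appeal decisions, and officer reports.

```python
# Illustrative shape of instruction-style fine-tuning examples (invented content).
training_examples = [
    {
        "instruction": (
            "Policy GB1 restricts development in the green belt; Policy H2 "
            "requires the council to meet its housing target. Set out how the "
            "conflict should be approached."
        ),
        "response": (
            "Green belt harm carries substantial weight by definition. The "
            "question is whether the benefits relied on, including the "
            "contribution to housing supply, amount to very special "
            "circumstances that clearly outweigh that harm. State the balance "
            "explicitly rather than quoting either policy in isolation."
        ),
    },
    # ... many more pairs covering weight, balance and procedural framing
]
```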
An example: policy conflicts in practice
Consider a housing application in an area of designated green belt. A generic model might simply quote back the restrictive wording of green belt policy, or, conversely, cite the local housing target. A model fine-tuned on planning practice would recognise the need to weigh these policies together: to test whether “very special circumstances” apply, to consider precedent from appeal decisions, and to show how inspectors typically reason through such conflicts. But crucially, it would also be designed to adapt when the policy framework changes, rather than locking in yesterday’s precedents. That shift — from static retrieval to dynamic, context-aware reasoning — is what makes the difference between a useful assistant and an unreliable shortcut.
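One way to picture the difference is in the shape of the output. A wrapper returns a paragraph of paraphrase; a well-scaffolded system could return something closer to the structure below, with sources, weighting, and open questions made explicit. The references and values here are invented for illustration.

```python
# Illustrative structured output for the green belt example (invented values).
assessment = {
    "question": "Residential development of 40 dwellings on green belt land",
    "policies_engaged": [
        {"reference": "Policy GB1", "direction": "against", "weight": "substantial"},
        {"reference": "Policy H2", "direction": "for", "weight": "significant"},
    ],
    "key_test": "Do very special circumstances clearly outweigh green belt harm?",
    "relevant_precedent": ["(appeal references would be listed here)"],
    "balance": "Harm to the green belt attracts substantial weight; the housing "
               "shortfall weighs in favour but must clearly outweigh that harm.",
    "unresolved": ["Extent of impact on openness", "Whether the shortfall is disputed"],
    "sources": ["Local Plan policy extracts", "national green belt policy"],
}
```

The value lies not in these particular fields but in the fact that every element can be checked: a committee member or inspector can see what was weighed, on what basis, and what the tool could not determine.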
Done right, AI can free planners from repetitive work and help surface key issues faster. Done wrong, it risks undermining trust before the technology has even had a chance to prove its worth.