🧠 Concept Note: Agentic Retrieval for Structured, Multimodal Reasoning in Planning AI
This design sets out a domain-specific architecture for agentic retrieval, built to support structured, explainable, and spatially grounded reasoning within the discretionary planning system.
The core principle is that retrieval is directed by the reasoning task itself, not by semantic similarity alone. Each stage of the planning judgement process — e.g. “heritage impact,” “residential amenity,” or “transport access” — is treated as an explicit reasoning node. Each node issues a formal Intent: a declarative request for context, shaped by policy themes, spatial overlays, precedent patterns, and available visual materials. This Intent can recursively request further enrichment until predefined coverage criteria are met.
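As a minimal sketch of this principle (all names here are hypothetical, not part of the specification), a reasoning node might issue an `Intent` and then widen it until its coverage criteria are met:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    """Declarative request for context issued by a reasoning node (illustrative shape)."""
    theme: str               # e.g. "heritage impact"
    policy_tiers: list[str]  # e.g. ["local", "national"]
    spatial_scope_m: int     # search radius around the site

def reason(node_theme, retrieve, is_covered, max_rounds=3):
    """Issue an Intent, then recursively enrich it until coverage criteria pass."""
    intent = Intent(theme=node_theme, policy_tiers=["local"], spatial_scope_m=250)
    evidence = []
    for _ in range(max_rounds):
        evidence.extend(retrieve(intent))
        if is_covered(evidence):
            break
        # Enrichment step: broaden the declarative request and retry.
        intent.policy_tiers = ["local", "national"]
        intent.spatial_scope_m *= 2
    return evidence
```

The `retrieve` and `is_covered` callables stand in for the agentic retrieval layer and the node's satisfaction test, respectively.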
The system integrates:
- A Master Reasoning Model, implemented as a stateful LLM that controls the flow of judgement stages, evaluates sufficiency, and ultimately synthesises the final officer-style report based on structured intermediate reasoning outputs;
- A Reasoning Graph, defining the procedural structure of officer reports as a finite-state decision space;
- Agentic Retrieval, which adapts dynamically to the needs of each reasoning node, issuing structured queries (e.g. `policy('design', [local, national])`, `image('elevation')`, or `appeal('amenity', 500m)`) and assessing sufficiency against task-specific coverage rules;
- An Enrichment Loop, which iteratively expands and refines the context window until the reasoning model confirms that its evidence base is complete;
- Subsidiary Agents, each representing a specialised mode of reasoning (e.g. visual assessment, quantitative policy compliance, precedent clustering), capable of issuing their own sub-reports, performing cross-validation, or submitting summary analyses to the master model for integration;
- Multimodal Retrieval, drawing not only from structured databases and embedded text, but also from geo-anchored photographs, street-view imagery, aerial overlays, elevation drawings, and application plans;
- Vision–Language Integration, using CLIP-style embeddings, GPT-4o-class VLMs, or the current state of the art in multimodal models to interpret visual context, compare proposals against existing site conditions, and evaluate alignment with policy-led design expectations;
- Hybrid Search, blending keyword-based retrieval (e.g. BM25 or equivalent) with semantic vector search (e.g. Instructor-XL or any similar embedding model), ontology-based filtering, and spatial joins via PostGIS;
- Optional Web Search Modules, triggered when internal coverage is lacking — e.g. to source SPDs, historic guidance, or third-party data via `web(keyword, domain?)` queries;
- Dynamic Prompt Generation, rendering task-specific prompts for each LLM or VLM call, incorporating structured citation formats, token-budgeted text+image bundles, and reasoning instructions tailored to the judgement stage;
- A Fully Traceable Explainability Stack, where every output element — whether a paragraph or a visual evaluation — is trace-linked through its retrieval query, source object, and model interaction.
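The structured query forms shown in the list above could be realised as thin wrapper functions; the sketch below assumes simple list-of-dict backends (`policy_index`, `appeal_index` are invented stand-ins, not part of the design):

```python
def policy(theme, tiers, *, policy_index):
    """policy('design', ['local', 'national']): matching policy clauses by theme and tier."""
    return [p for p in policy_index if p["theme"] == theme and p["tier"] in tiers]

def appeal(theme, radius_m, *, appeal_index):
    """appeal('amenity', 500): precedent appeals on a theme within a radius (metres)."""
    return [a for a in appeal_index
            if a["theme"] == theme and a["distance_m"] <= radius_m]
```

In a real deployment these would dispatch to the hybrid search stack (keyword, vector, ontology, spatial) rather than filter in-memory lists.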
Rather than collapsing all evidence into a single embedding space, the system separates retrieval logic, reasoning logic, and multimodal synthesis. Retrieval is progressive and functional; reasoning is introspective and stateful; synthesis is auditable, citation-heavy, and modular.
This architecture enables planning AI to go beyond search, summary, or static templates — supporting defensible, context-aware reasoning that can be interrogated, overridden, and improved by human officers. It may be used as a tool for structured discretion, or — in specific, bounded scenarios — as a system of partial automation.
🔒 Scope of Novelty / Prior Art Markers
To defend the openness of this concept against proprietary claims, the following design features are published as explicit prior art:
1. Declarative Retrieval Contracts
Reasoning nodes emit structured `Intent` objects, specifying the type, theme, scope, and geometry of documents needed. This formal contract is inspectable, testable, and versioned.
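One possible concrete shape for such a contract, assuming a frozen dataclass serialised to JSON so it can be inspected, diffed, and version-controlled (field names are illustrative):

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RetrievalContract:
    """Illustrative Intent contract: type, theme, scope, and geometry of documents needed."""
    version: str            # contract schema version, so intents are replayable
    doc_type: str           # e.g. "policy", "appeal", "image"
    theme: str              # e.g. "design"
    scope: tuple[str, ...]  # e.g. ("local", "national")
    geometry_wkt: str       # search area as well-known text, ready for a spatial join

    def to_json(self) -> str:
        """Serialise deterministically for logging, inspection, and testing."""
        return json.dumps(asdict(self), sort_keys=True)
```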
2. Reasoning Satisfaction Tests
Each node declares itself satisfied only after coverage tests are passed — e.g. policy tiers, constraint types, precedent count. These thresholds may be rule-based or heuristic. The key innovation is that sufficiency is evaluated as part of the reasoning process itself, using predefined or dynamic criteria relevant to each judgement stage.
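A rule-based satisfaction test of this kind might be as simple as the following sketch (thresholds and field names are assumptions for illustration):

```python
def satisfied(evidence,
              required_tiers=frozenset({"local", "national"}),
              min_precedents=2):
    """Sufficiency check: every required policy tier must be represented
    and enough precedent decisions must have been retrieved."""
    tiers_seen = {e["tier"] for e in evidence if e["kind"] == "policy"}
    precedents = sum(1 for e in evidence if e["kind"] == "appeal")
    return required_tiers <= tiers_seen and precedents >= min_precedents
```

A heuristic variant could instead ask the reasoning model itself whether the bundle supports the judgement at hand; the architectural point is only that the check runs inside the reasoning loop.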
3. Multimodal Substitution Logic
When key context types are unavailable (e.g. outdated photos), the system actively substitutes with appeals, elevations, or web-sourced imagery via fallback functions (e.g. `web_image()`).
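Substitution logic of this kind reduces to an ordered fallback chain; in this sketch the source callables (recent photos, elevations, `web_image`) are hypothetical stand-ins passed in by the caller:

```python
def site_imagery(site_id, sources):
    """Try each imagery source in priority order, e.g.
    [recent_photos, elevations, web_image], returning the first hit."""
    for fetch in sources:
        result = fetch(site_id)
        if result is not None:
            return result
    return None  # nothing available: the reasoning node must flag the evidential gap
```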
4. Hybrid Search as Default Composition
Retrieval is always composed from multiple modalities:
- Keyword-based methods (e.g. BM25 or any term-weighted alternative)
- Semantic vector similarity (e.g. Instructor-XL or equivalent)
- Ontology filters (e.g. `theme: design`, `tier: local`)
- Spatial queries (e.g. `within 250m`, `intersects Green Belt`)
Specific tools may vary; the architectural commitment is to hybrid, constraint-aware retrieval as default logic.
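One well-known way to compose ranked lists from several retrieval modalities is reciprocal rank fusion; the sketch below is a standard RRF implementation (the constant `k=60` is the commonly used default), offered as one possible composition strategy rather than a prescribed one:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked result lists from keyword, vector, ontology-filtered,
    and spatial retrieval into a single ordering by summed 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Ontology filters and spatial predicates would typically be applied as hard constraints before fusion, so only admissible documents compete for rank.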
5. Explainability Graph Format
All outputs participate in a provenance DAG (directed acyclic graph) connecting:
source_text/image → retrieval_call → prompt_id → model_output_section
This trace is queryable and forms the backbone of the audit log.
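A minimal sketch of such a provenance DAG, with edges stored as plain records and a backward walk from any output section to its sources (field and node names are invented for illustration):

```python
# Each edge links an artefact to the artefact derived from it.
trace = [
    {"from": "policy_doc_17", "to": "retrieval_042", "edge": "retrieved_by"},
    {"from": "retrieval_042", "to": "prompt_9a",     "edge": "bundled_into"},
    {"from": "prompt_9a",     "to": "report_sec_3",  "edge": "produced"},
]

def provenance(node, edges):
    """Walk the DAG backwards from an output section to its ultimate sources."""
    chain = []
    for parent in (e["from"] for e in edges if e["to"] == node):
        chain.append(parent)
        chain.extend(provenance(parent, edges))
    return chain
```

In practice these edges would live in a database so the audit log is queryable, but the acyclic linkage is the essential property.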
6. Traceable Prompt Assembly
Prompt generation is treated as a first-class, logged operation. Each prompt is linked to its reasoning node, context bundle, and eventual model output via `prompt_id`. This includes multimodal inputs and token prioritisation strategies.
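As a sketch, a content-derived `prompt_id` makes assembly both logged and reproducible (the hashing scheme and log shape here are assumptions, not part of the specification):

```python
import hashlib

def assemble_prompt(node_id, context_bundle, instructions, log):
    """Render a task-specific prompt and record it as a first-class operation,
    keyed by a hash of its content so identical assemblies share an id."""
    body = instructions + "\n\n" + "\n---\n".join(context_bundle)
    prompt_id = hashlib.sha256(body.encode()).hexdigest()[:12]
    log.append({"prompt_id": prompt_id, "node": node_id,
                "context_items": len(context_bundle)})
    return prompt_id, body
```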
7. Delegated Reasoning via Subsidiary Agents
In complex reasoning stages, the master model may invoke subsidiary agents capable of specialised tasks (e.g. visual assessment, appeal synthesis, viability modelling). These agents may issue their own reasoning traces, perform validation checks, or submit structured summaries to be integrated into the final decision output.
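Delegation with cross-validation can be sketched as follows; the agent callables are hypothetical stand-ins (real subsidiary agents would be model-backed), and the single `verdict` field is a simplification:

```python
def delegate(stage, evidence, agents):
    """Run each subsidiary agent on a reasoning stage, collect its structured
    sub-report, and flag disagreement for the master model to resolve."""
    reports = {name: agent(stage, evidence) for name, agent in agents.items()}
    verdicts = {r["verdict"] for r in reports.values()}
    return {"stage": stage, "reports": reports,
            "needs_review": len(verdicts) > 1}  # cross-validation disagreement
```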
8. Stateful LLM-Controlled Reasoning Chain
The master reasoning model is not a fixed rule engine but a stateful language model capable of interpreting intents, evaluating sufficiency, integrating subordinate responses, and composing the final report with full transparency and traceability.
Together, these features define a domain-specific architecture for structured reasoning with planning data — including policies, site constraints, applications, images, and precedent decisions. Any system claiming novelty in task-directed retrieval, contextual prompt assembly, or structured multimodal reasoning for planning should be tested against this public declaration.
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
You are free to share and adapt this work for non-commercial purposes, provided you give appropriate credit, indicate if changes were made, and distribute any contributions under the same license.