🧠 Concept Note: Visuospatial Reasoning for Structured Planning Judgement
This design outlines a domain-specific architecture for visuospatial reasoning within The Planner’s Assistant — extending beyond policy text and constraint overlays to interpret the physical, visible, and spatial characteristics of sites and proposals.
The core principle is that spatial and visual understanding must be treated as first-class reasoning inputs, not merely supplemental evidence. Officers rely on proximity, character, setting, and visual cues — therefore the AI must too. Visuospatial reasoning bridges the gap between GIS data, visual media, and interpretive planning judgement.
The system integrates:
- A GISAgent, capable of executing proximity analysis, constraint intersections, buffer zone logic, and visibility inferences from PostGIS using structured spatial queries;
- A VisionLanguageAgent, capable of comparing design imagery with site photos, surrounding context, and design policies using multimodal prompts (image + text);
- A Prompted Reasoning Layer, where each report node — e.g. "heritage impact" or "design character" — issues an Intent requesting spatial or visual interpretation, not just document retrieval;
- A Tool Layer, exposing modular spatial utilities (e.g. `tool_find_nearest_features`, `tool_describe_overlap`) that agents can call as functions, enabling progressive spatial computation (see the sketch following this list);
- A Multimodal LLM Interface, allowing structured prompts to be issued to models such as Gemini 2.5 Pro or GPT-4o, and visually aware text to be received back;
- A Report Trace Engine, where the output of spatial/visual agents is rendered into interpretable natural language, trace-linked back to source geometry, image, and model prompt.
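
As a sketch of how the Tool Layer might be realised, the functions below wrap PostGIS queries behind the two utilities named above. Only the two function names come from this design; the `constraints` table, its columns, the spatial reference handling, and the return shapes are illustrative assumptions.

```python
# Illustrative sketch only. `conn` is assumed to be a DB-API connection
# (e.g. psycopg2) to a PostGIS database; the "constraints" table and its
# columns are hypothetical.

def tool_find_nearest_features(conn, site_wkt: str, layer: str, limit: int = 5) -> list[dict]:
    """Return the nearest features in `layer` to the site, with distances in metres."""
    sql = """
        SELECT label,
               ST_Distance(geom::geography, ST_GeomFromText(%s, 4326)::geography) AS distance_m
        FROM constraints
        WHERE layer = %s
        ORDER BY geom <-> ST_GeomFromText(%s, 4326)
        LIMIT %s;
    """
    with conn.cursor() as cur:
        cur.execute(sql, (site_wkt, layer, site_wkt, limit))
        return [{"label": r[0], "distance_m": round(r[1], 1)} for r in cur.fetchall()]

def tool_describe_overlap(conn, site_wkt: str, layer: str) -> dict:
    """Return the proportion of the site geometry covered by features in `layer`."""
    sql = """
        SELECT COALESCE(SUM(ST_Area(ST_Intersection(geom, ST_GeomFromText(%s, 4326)))), 0)
               / NULLIF(ST_Area(ST_GeomFromText(%s, 4326)), 0) AS overlap_ratio
        FROM constraints
        WHERE layer = %s
          AND ST_Intersects(geom, ST_GeomFromText(%s, 4326));
    """
    with conn.cursor() as cur:
        cur.execute(sql, (site_wkt, site_wkt, layer, site_wkt))
        return {"layer": layer, "overlap_ratio": float(cur.fetchone()[0] or 0.0)}
```

Because each tool returns structured values rather than prose, an agent can call them progressively and still leave a machine-readable trace.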
Rather than reducing spatial inputs to flat labels or checkboxes, this architecture treats them as interpretable reasoning steps within a chain of structured planning logic. The result is officer-style reporting that includes:
- Grounded spatial reasoning (e.g. "within 5m of a listed building")
- Visual evaluation of design character (e.g. "brick tones appear discordant with street context")
- Integration with policy clauses and site metadata
- Traceable evidence pathways for each conclusion (see the example below)
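
For instance, a single finding in such a report might carry its evidence pathway explicitly. The field names and values below are invented for illustration, not a fixed schema.

```json
{
  "finding": "The proposed extension lies within 5m of a listed building.",
  "evidence": {
    "spatial": {
      "tool": "tool_find_nearest_features",
      "layer": "listed_buildings",
      "distance_m": 4.6
    },
    "trace_id": "heritage-impact-node"
  }
}
```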
🔒 Scope of Novelty / Prior Art Markers
To prevent enclosure or proprietary claims, the following are published as explicit design features:
Visuospatial Intent Objects: Reasoning nodes emit `Intent` objects that request spatial or visual assessments, including geometry inputs, image references, thematic relevance (e.g. "heritage", "character"), and desired output types (e.g. "narrative", "score"); a sketch of one such object follows below.

Agentic Spatial Reasoning: Dedicated agents structure PostGIS queries, interpret geometric relationships, and return interpretable summaries (e.g. "within 15m of Flood Zone 2") rather than raw data.
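
A minimal sketch of what an `Intent` object and its resolution by a spatial agent might look like; apart from the `Intent` name itself, the field names and the agent interface are assumptions for illustration.

```python
# Illustrative only: field names and the resolution flow are assumptions,
# not a published schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Intent:
    theme: str                           # e.g. "heritage", "character"
    assessment_type: str                 # "spatial" or "visual"
    geometry_wkt: Optional[str] = None   # site or proposal geometry
    image_refs: list[str] = field(default_factory=list)
    output_type: str = "narrative"       # e.g. "narrative", "score"

# A reasoning node might emit:
intent = Intent(theme="flood_risk", assessment_type="spatial",
                geometry_wkt="POLYGON((...))")

# ...and a GISAgent would resolve it into an interpretable summary rather than
# raw geometry, e.g. "The site boundary lies within 15m of Flood Zone 2."
```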
Vision–Language Interpretation of Proposals: Design assessments are not rule-based but are issued as multimodal prompts to vision-capable LLMs. These prompts integrate plans, elevations, site photos, and textual principles.
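
One way such a prompt could be issued, assuming an OpenAI-style multimodal chat API; the prompt wording, image URLs, and model choice are illustrative rather than the project's actual implementation.

```python
# Sketch assuming an OpenAI-style multimodal chat endpoint.
from openai import OpenAI

client = OpenAI()

prompt = (
    "You are assisting a planning officer. Compare the proposed elevation with the "
    "street-scene photograph. Comment on materials, scale and roofline against the "
    "principle that new development should respond to the prevailing character of the street."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": "https://example.org/elevation_a.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.org/street_context.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```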
Modular Tool Layer for Spatial Logic: Spatial reasoning tools are exposed as discrete, chainable functions (e.g. distance buffers, overlap ratios, directionality vectors), enabling both agent use and direct tracing.
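
Because each tool returns structured data, an agent (or a trace inspector) can chain them. The sketch below composes the two hypothetical tools from earlier into a single flood-exposure check; the threshold and layer name are invented for illustration.

```python
# Hypothetical chaining of the tool layer sketched earlier.
def assess_flood_exposure(conn, site_wkt: str) -> dict:
    nearest = tool_find_nearest_features(conn, site_wkt, layer="flood_zone_2", limit=1)
    overlap = tool_describe_overlap(conn, site_wkt, layer="flood_zone_2")
    return {
        "within_15m": bool(nearest) and nearest[0]["distance_m"] <= 15.0,
        "overlap_ratio": overlap["overlap_ratio"],
        "trace": ["tool_find_nearest_features", "tool_describe_overlap"],
    }
```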
Image–Geometry Linking and Metadata Graphs: Images are stored with metadata including capture angle, location, and associated features. This enables visual queries like "compare elevation A with frontage context B" or "assess the setting of image C with respect to geometry D".
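
A record along these lines would be enough to resolve such queries to concrete image-geometry pairs; the field names are assumptions, not a published schema.

```python
# Illustrative metadata record for an image linked to site geometry.
from dataclasses import dataclass

@dataclass
class SiteImage:
    image_id: str
    uri: str
    capture_point_wkt: str         # where the photograph was taken
    bearing_degrees: float         # direction the camera was facing
    linked_feature_ids: list[str]  # geometries the image depicts (elevation, frontage, etc.)
```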
Structured Visual Assessments: Outputs from the VisionLanguageAgent are returned as structured JSON with components like `design_match_score`, `conflicting_materials`, or `setting_clash_comments`, suitable for downstream use in reasoning chains or UI display; an example is sketched below.

Traceable Visual Reasoning in Reports: Each visually informed output in a report paragraph is linked to its source image, spatial context, and the model interaction that generated it, allowing planners to inspect or override it with confidence.
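
An assessment returned in this form might look like the following. The three component names come from the design above; the values and the trace block are invented for illustration.

```json
{
  "design_match_score": 0.42,
  "conflicting_materials": ["smooth red brick against the prevailing buff stock brick"],
  "setting_clash_comments": "The flat parapet interrupts the consistent pitched-roof rhythm of the terrace.",
  "trace": {
    "images": ["elevation_a.png", "street_context_02.jpg"],
    "model": "gpt-4o",
    "prompt_id": "design-character-001"
  }
}
```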
Integrated Multimodal Planning Reasoning: Spatial and visual reasoning are not standalone modules; they are integrated into the procedural structure of the planning report via `NodeProcessor`, acting as evidence nodes in the overall judgement chain.
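
One way a `NodeProcessor` might consume these agents as evidence nodes. Only `NodeProcessor` and `Intent` are named in this design; the methods and wiring below are assumptions.

```python
# Illustrative sketch: method names and wiring are assumptions.
class NodeProcessor:
    def __init__(self, gis_agent, vision_agent):
        self.gis_agent = gis_agent
        self.vision_agent = vision_agent

    def process(self, node) -> dict:
        """Resolve a report node's intents into trace-linked evidence."""
        evidence = []
        for intent in node.intents:
            agent = self.gis_agent if intent.assessment_type == "spatial" else self.vision_agent
            evidence.append(agent.resolve(intent))
        return {"node_id": node.node_id, "evidence": evidence}
```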
This architecture supports a shift from rule-matching AI to place-aware, judgment-capable AI — one that sees and reasons about space the way a planner does.
It reflects a belief that good planning is grounded, situated, and interpretable, and that the AI supporting it should be too.
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
You are free to share and adapt this work for non-commercial purposes, provided you give appropriate credit, indicate if changes were made, and distribute any contributions under the same license.