Visual Agentic System for Spatial Metric Query Answering in Remote Sensing Images

Date
2025
Publisher
The Eurographics Association
Abstract
Accurately measuring real-world object dimensions from Remote Sensing (RS) images is crucial for applications such as geospatial analysis and urban planning. Traditional Vision-Language Models (VLMs) struggle with spatial reasoning, while end-to-end remote sensing VLMs are often limited to predefined tasks such as image captioning. In this paper, we propose a visual agentic system for spatial metric query answering that dynamically integrates code-generation agents with a grounded remote sensing VLM and a Vision Specialist. Our system autonomously identifies reference objects, infers scale factors, and performs spatial measurements through structured subroutines. Experiments demonstrate that our approach estimates footprint areas more accurately than state-of-the-art large language models with vision capabilities.
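The abstract does not say how the scale factor is computed. A minimal sketch, assuming a nadir image with square pixels, one uniform scale, and a reference object whose real-world length is known, shows the arithmetic behind the last two steps: divide the reference object's real length by its length in pixels to get metres per pixel, then multiply a footprint's pixel count by the square of that factor. `ReferenceObject`, `infer_scale`, and `footprint_area_m2` are hypothetical names, and the 4.5 m car length is an illustrative assumption, not a value from the paper.

```python
from dataclasses import dataclass


@dataclass
class ReferenceObject:
    """Hypothetical record for a reference object found in the image."""
    label: str
    real_length_m: float  # assumed real-world length, e.g. a typical car
    pixel_length: float   # length of the same object measured in pixels


def infer_scale(ref: ReferenceObject) -> float:
    """Return the scale factor in metres per pixel."""
    if ref.pixel_length <= 0:
        raise ValueError("pixel_length must be positive")
    return ref.real_length_m / ref.pixel_length


def footprint_area_m2(mask: list[list[bool]], metres_per_pixel: float) -> float:
    """Convert a binary footprint mask into a real-world area in m^2.

    Assumes a nadir view with square pixels and one uniform scale, so each
    foreground pixel covers metres_per_pixel ** 2 square metres.
    """
    pixel_count = sum(sum(1 for value in row if value) for row in mask)
    return pixel_count * metres_per_pixel ** 2


# Illustrative values only: a 4.5 m car spanning 30 px gives 0.15 m/px.
car = ReferenceObject("car", real_length_m=4.5, pixel_length=30.0)
scale = infer_scale(car)
roof_mask = [[True] * 50 for _ in range(40)]  # 2000 foreground pixels
area = footprint_area_m2(roof_mask, scale)
print(f"{area:.1f}")  # prints 45.0
```

Because area scales with the square of the metres-per-pixel factor, a 10% error in the assumed reference length becomes roughly a 21% error in the estimated footprint, which is why the choice of reference object matters.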
CCS Concepts: Computing methodologies → Scene Understanding; Image Segmentation; Object Identification

@inproceedings{10.2312:egp.20251028,
  booktitle = {Eurographics 2025 - Posters},
  editor = {Günther, Tobias and Montazeri, Zahra},
  title = {{Visual Agentic System for Spatial Metric Query Answering in Remote Sensing Images}},
  author = {Wang, Yinghao and Wang, Cheng},
  year = {2025},
  publisher = {The Eurographics Association},
  ISSN = {1017-4656},
  ISBN = {978-3-03868-269-1},
  DOI = {10.2312/egp.20251028}
}