News

GeoSkill: An Evolving Skill-Graph Framework for Enhanced Visual Geolocation in Vision-Language Models

GeoSkill: An Evolving Skill-Graph Framework for Enhanced Visual Geolocation in Vision-Language Models

Vision-language models (VLMs) have demonstrated promising capabilities in image geolocation tasks. However, a significant limitation persists: their lack of structured geographic reasoning and the capacity for autonomous self-evolution. Current methodologies predominantly rely on implicit parametric memory, which often leads to the exploitation of outdated knowledge and the generation of hallucinated reasoning. Furthermore, the inference process in existing systems is typically "one-off," lacking the essential feedback loops required for self-evolution based on reasoning outcomes.

To address these critical issues, a training-free framework named GeoSkill has been proposed, built upon an evolving Skill-Graph. The operational mechanism of GeoSkill unfolds in several key phases. Initially, the Skill-Graph is populated by refining human expert trajectories into atomic, natural-language skills. For subsequent execution, GeoSkill employs an inference model to perform direct reasoning, which is explicitly guided by the current state of the Skill-Graph. A crucial component for continuous growth is the Autonomous Evolution mechanism. This mechanism utilizes a larger model to conduct multiple reasoning rollouts on extensive image-coordinate pairs, sourced from web-scale data and validated through real-world reasoning.

By meticulously analyzing both successful and failed trajectories from these rollouts, the autonomous evolution mechanism iteratively synthesizes and prunes skills. This process effectively expands the Skill-Graph and corrects inherent geographic biases, all accomplished remarkably without requiring any parameter updates. Experimental evaluations confirm GeoSkill's promising performance in both geolocation accuracy and reasoning faithfulness when benchmarked against the GeoRC dataset. Furthermore, the framework demonstrates superior generalization capabilities across a wide range of diverse external datasets. A particularly notable outcome is how the autonomous evolution mechanism fosters the emergence of novel, verifiable skills. This significantly enhances the system's cognitive understanding of real-world geographic knowledge, moving beyond the limitations of isolated case studies.

↗ Read original source