
Claude Code-Driven Scenario Mining for the Argoverse 2 Challenge


In autonomous driving, mining safety-critical edge cases (such as aggressive cut-ins or emergency braking) from massive sensor datasets remains a core bottleneck. For the CVPR 2026 Argoverse 2 Scenario Mining Challenge, researchers presented a four-stage automated pipeline that shows the potential of AI agents for domain-specific code generation and logical verification.

The proposed system replaces tedious manual rule drafting with a closed-loop pipeline of autonomous generation, metric-driven filtering, code review, and visual validation:

Stage 1: Autonomous Code Generation. The framework uses a Claude Code agent backed by the GLM 5.1 model. The agent takes a natural-language definition of a scenario mining task and outputs an executable Python search script, translating a complex real-world traffic scenario into a programmatic query over the dataset.
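The source does not show the scripts the agent produces. As a purely hypothetical illustration of the kind of search logic Stage 1 targets, the sketch below flags emergency braking in a toy track format. The function name `find_hard_braking`, the 4.0 m/s^2 deceleration threshold, and the `(timestamp_ns, speed_mps)` input layout are all assumptions for illustration, not the Argoverse 2 API or the authors' generated code.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: track_id -> time-sorted list of (timestamp_ns, speed_mps).
Track = List[Tuple[int, float]]

def find_hard_braking(tracks: Dict[str, Track], decel_threshold: float = 4.0) -> Dict[str, List[int]]:
    """Return, for each track, the timestamps at which deceleration between
    consecutive samples exceeds decel_threshold (m/s^2, illustrative value)."""
    matches: Dict[str, List[int]] = {}
    for track_id, samples in tracks.items():
        hits: List[int] = []
        for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
            dt = (t1 - t0) / 1e9  # nanoseconds -> seconds
            if dt <= 0:
                continue  # skip duplicate or out-of-order samples
            if (v0 - v1) / dt > decel_threshold:
                hits.append(t1)
        if hits:
            matches[track_id] = hits
    return matches

tracks = {
    "car_1": [(0, 15.0), (100_000_000, 14.0), (200_000_000, 13.9)],
    "car_2": [(0, 10.0), (100_000_000, 10.0)],
}
print(find_hard_braking(tracks))  # {'car_1': [100000000]}
```

Returning matched timestamps per track, rather than a single yes/no per log, keeps the output at the granularity that a timestamp-level metric can score.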

Stage 2: Iterative Dataset Screening. To improve the quality of the mining queries, the system applies a strict screening step based on Timestamp Balanced Accuracy (TBA). With a high TBA threshold of 0.8, it iteratively selects high-quality few-shot examples and feeds them back into the agent's context, so that later generations improve on earlier ones.
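A minimal sketch of the screening step, assuming TBA is the standard balanced accuracy (the mean of the true-positive and true-negative rates) computed over per-timestamp binary labels, and assuming the 0.8 threshold is inclusive. The helper names, the candidate dictionary fields, and the handling of logs with only one class are hypothetical choices, not details from the source.

```python
from typing import Dict, List, Sequence

def timestamp_balanced_accuracy(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Balanced accuracy over per-timestamp labels: mean of TPR and TNR.
    Assumption: when one class is absent, only the available rate is used."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must be aligned per timestamp")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    pos = sum(1 for t in truth if t)
    neg = len(truth) - pos
    rates: List[float] = []
    if pos:
        rates.append(tp / pos)  # true-positive rate (recall)
    if neg:
        rates.append(tn / neg)  # true-negative rate (specificity)
    return sum(rates) / len(rates) if rates else 0.0

def select_few_shot(candidates: List[Dict], threshold: float = 0.8) -> List[Dict]:
    """Keep only candidate examples whose TBA meets the threshold."""
    return [c for c in candidates
            if timestamp_balanced_accuracy(c["pred"], c["truth"]) >= threshold]

truth = [True, True, False, False]
pool = [
    {"description": "ego brakes hard", "pred": [True, False, False, False], "truth": truth},
    {"description": "vehicle cuts in", "pred": [True, True, False, False], "truth": truth},
]
print([c["description"] for c in select_few_shot(pool)])  # ['vehicle cuts in']
```

The first candidate misses half of the positive timestamps, so its TBA is 0.75 and it is rejected even though its plain accuracy is also 0.75; balanced accuracy keeps a script that never fires from scoring well on logs where the scenario is rare.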

Stage 3: Semantic Code Review. Because code stability is paramount in autonomous driving, the team runs a separate, isolated Claude Code session dedicated solely to semantic review of the generated code, catching edge-case bugs and syntax errors before execution.
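The source describes an LLM-based reviewer; it does not mention any static pre-check. As a hedged complement, not the authors' method, a cheap pass with Python's standard `ast` module could reject scripts that fail to parse or lack an expected entry function before they reach the reviewer session. The entry-point name `find_scenario` is a hypothetical convention.

```python
import ast
from typing import List

def precheck(source: str, entry_point: str = "find_scenario") -> List[str]:
    """Cheap static checks on a generated script before semantic review.
    Returns a list of problems; an empty list means the script passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error at line {err.lineno}: {err.msg}"]
    # Collect the names of all function definitions anywhere in the script.
    defined = {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}
    if entry_point not in defined:
        return [f"missing entry point '{entry_point}'"]
    return []

print(precheck("def find_scenario(tracks):\n    return []\n"))  # []
print(precheck("def other(tracks):\n    return []\n"))  # ["missing entry point 'find_scenario'"]
```

A check like this only catches structural failures; logic errors such as a wrong sign on a relative heading still need the semantic review described above.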

Stage 4: Multimodal Scene Verification. To eliminate false positives, the system adds a verification pass with the vision-language model Qwen3-VL. It analyzes multi-view video feeds and 3D point clouds from the Argoverse 2 dataset, performing a "seeing-is-believing" visual check that filters out erroneous semantic matches.
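The source does not say how the model's judgments are aggregated. A minimal sketch, assuming a per-view yes/no vote combined by an agreement fraction, is shown below; it covers only the camera side, not point clouds. `Candidate`, `verify`, `min_agree`, and the stub judge are hypothetical, and a real judge would wrap an actual Qwen3-VL call whose interface is not specified here. The camera names are used only as labels.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    log_id: str
    description: str
    timestamps: List[int]

# Hypothetical judge: would wrap a VLM query for one camera view and return
# True if the model says the clip matches the candidate's description.
Judge = Callable[[Candidate, str], bool]

def verify(candidates: List[Candidate], views: List[str], judge: Judge,
           min_agree: float = 0.5) -> List[Candidate]:
    """Keep candidates confirmed by at least `min_agree` of the views."""
    kept: List[Candidate] = []
    for cand in candidates:
        votes = [judge(cand, view) for view in views]
        if votes and sum(votes) / len(votes) >= min_agree:
            kept.append(cand)
    return kept

# Stub judge standing in for the VLM, for illustration only.
def stub_judge(cand: Candidate, view: str) -> bool:
    return cand.log_id == "log_a" and view != "ring_rear_left"

views = ["ring_front_center", "ring_front_left", "ring_rear_left"]
cands = [Candidate("log_a", "vehicle cuts in", [0]), Candidate("log_b", "vehicle cuts in", [0])]
print([c.log_id for c in verify(cands, views, stub_judge)])  # ['log_a']
```

Voting across views tolerates a scenario that is visible in only some cameras; raising `min_agree` trades recall for fewer false positives.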

[AgentUpdate Depth Analysis] This submission is a notable example of AI agents applied to high-stakes, domain-specific engineering problems. Scenario mining for autonomous vehicles has traditionally relied on heuristics and manually tuned spatiotemporal rules. By pairing Claude Code's code generation with GLM 5.1's reasoning, and adding Qwen3-VL's multimodal perception of the physical world, this pipeline shows a mature multi-agent cooperative architecture. Its "generate-review-verify" loop illustrates how specialized agents, backed by safety guardrails and multimodal cross-checks, can outperform a single monolithic model. As the AI agent ecosystem matures, we expect more compound AI systems like this one to take over manual software and data engineering work in specialized sectors such as aerospace, robotics, and logistics.