CAMEO: A Quality-Aware Multi-Agent Framework for Feedback-Driven Conditional Image Editing

Conditional image editing, which aims to modify a source image based on textual prompts and optional reference guidance, is crucial for scenarios demanding strict structural control. Examples include inserting anomalies into driving scenes or executing complex human pose transformations.

Despite recent advances in large-scale editing models such as Seedream and Nano Banana, most existing approaches rely on a single-step generation paradigm. This paradigm typically lacks explicit quality control, can deviate excessively from the original image, and often produces structural artifacts or modifications that are inconsistent with the surrounding environment. Achieving acceptable results usually requires extensive manual prompt tuning.

To address these limitations, researchers have introduced CAMEO, a structured multi-agent framework. CAMEO redefines conditional editing as a quality-aware, feedback-driven process, rather than a one-shot generation task. It decomposes the editing process into coordinated stages: planning, structured prompting, hypothesis generation, and adaptive reference grounding, where external guidance is invoked only when task complexity requires it.
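The staged decomposition described above can be illustrated with a minimal sketch. It is not CAMEO's actual implementation: the names `EditPlan`, `plan_edit`, `build_prompt`, and `run_pipeline`, the toy complexity score, the 0.5 threshold, and the ordering of reference grounding before prompting are all assumptions made for illustration. The editing model and the reference lookup are passed in as plain callables, so no real model API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EditPlan:
    """Hypothetical container for the output of the planning stage."""
    instruction: str
    steps: List[str]
    complexity: float  # assumed score in [0, 1]; higher means a harder edit


def plan_edit(instruction: str) -> EditPlan:
    # Toy planner (assumption): one step per comma-separated clause,
    # with complexity growing with the number of steps.
    steps = [s.strip() for s in instruction.split(",") if s.strip()]
    complexity = min(1.0, len(steps) / 4)
    return EditPlan(instruction, steps, complexity)


def build_prompt(plan: EditPlan, reference: Optional[str] = None) -> str:
    # Structured prompting: numbered steps, plus a reference line if present.
    lines = [f"{i}. {step}" for i, step in enumerate(plan.steps, start=1)]
    if reference is not None:
        lines.append(f"reference: {reference}")
    return "\n".join(lines)


def run_pipeline(
    instruction: str,
    fetch_reference: Callable[[EditPlan], str],
    generate: Callable[[str, int], str],
    n_hypotheses: int = 3,
    threshold: float = 0.5,
) -> List[str]:
    plan = plan_edit(instruction)
    # Adaptive reference grounding: external guidance is fetched only
    # when the (assumed) complexity score exceeds the threshold.
    reference = fetch_reference(plan) if plan.complexity > threshold else None
    prompt = build_prompt(plan, reference)
    # Hypothesis generation: several candidate edits from the same prompt.
    return [generate(prompt, seed) for seed in range(n_hypotheses)]
```

In this sketch a one-clause instruction never triggers the reference lookup, while a multi-clause pose change does, which mirrors the idea of invoking guidance only when the task is complex enough.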

To overcome the absence of intrinsic quality control in existing methods, CAMEO embeds evaluation directly within the editing loop. Intermediate results are iteratively refined through structured feedback, forming a closed-loop process that progressively corrects structural and contextual inconsistencies. CAMEO was evaluated on anomaly insertion and human pose switching tasks. Across multiple editing backbones and independent evaluation models, it achieved a 20% higher win rate on average than several state-of-the-art models, indicating improved robustness, controllability, and structural reliability for conditional image editing.
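The closed-loop refinement idea can be sketched as a simple generate, evaluate, and revise loop. This is an illustrative assumption, not the procedure from the paper: the function name `refine`, the acceptance threshold, the round limit, and the way feedback is appended to the prompt are all hypothetical, and the editor and evaluator are supplied as callables rather than tied to any specific model.

```python
from typing import Callable, List, Optional, Tuple


def refine(
    prompt: str,
    edit: Callable[[str], str],
    evaluate: Callable[[str], Tuple[float, str]],
    max_rounds: int = 3,
    accept: float = 0.8,
) -> Tuple[Optional[str], float, List[float]]:
    """Iteratively edit, score, and revise the prompt using feedback.

    `edit` maps a prompt to an edited image (a string handle here);
    `evaluate` returns (quality score, textual feedback). Both are
    placeholders for whatever editor and evaluator are actually used.
    """
    best_image: Optional[str] = None
    best_score = float("-inf")
    history: List[float] = []

    for _ in range(max_rounds):
        image = edit(prompt)
        score, feedback = evaluate(image)
        history.append(score)

        # Keep the best candidate seen so far, since a later round
        # is not guaranteed to improve on an earlier one.
        if score > best_score:
            best_image, best_score = image, score

        # Stop early once the result is judged good enough.
        if score >= accept:
            break

        # Fold the evaluator's feedback into the next prompt (assumed format).
        prompt = f"{prompt}\nFix: {feedback}"

    return best_image, best_score, history
```

Tracking the best-scoring candidate, rather than returning the last one, is a design choice in this sketch: it guards against a revision round that makes the edit worse.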

