Zoom-IQA operates like an expert: (1) hypothesizing flaws, (2) grounding them with crops, and (3) verifying degradations for reliable quality assessment.
Answer content: {"bbox_2d": [0.29, 0.24, 0.88, 0.73], "rating": 3.56, "tool": "crop"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 3.47, "tool": "final"}
Answer content: {"bbox_2d": [0.37, 0.12, 0.96, 0.7], "rating": 2.98, "tool": "crop"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 2.89, "tool": "final"}
Answer content: {"bbox_2d": [0.2, 0.15, 0.88, 0.68], "rating": 3.79, "tool": "crop"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 3.79, "tool": "final"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 4.18, "tool": "final"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 2.29, "tool": "final"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 3.46, "tool": "final"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 1.92, "tool": "final"}
Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 2.59, "tool": "final"}
Answer content: {"bbox_2d": [0.0, 0.37, 1.0, 0.96], "rating": 2.76, "tool": "crop"}
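The answer records above all carry the same three fields: `bbox_2d`, `rating`, and `tool`. As a minimal sketch of how such a record might be consumed, the Python below parses one line and converts a crop box to pixel coordinates. It assumes that `bbox_2d` holds normalized `[x1, y1, x2, y2]` coordinates in the range 0 to 1 and that the all-zero box on `"final"` answers is a placeholder; neither is stated on this page. The names `Answer`, `parse_answer`, and `to_pixel_box` are hypothetical helpers, not part of any Zoom-IQA release.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

PREFIX = "Answer content:"


@dataclass
class Answer:
    bbox_2d: Tuple[float, float, float, float]
    rating: float
    tool: str


def parse_answer(line: str) -> Answer:
    """Parse one 'Answer content: {...}' line into an Answer record."""
    if not line.startswith(PREFIX):
        raise ValueError("line does not start with 'Answer content:'")
    payload = json.loads(line[len(PREFIX):].strip())
    x1, y1, x2, y2 = (float(v) for v in payload["bbox_2d"])
    return Answer((x1, y1, x2, y2), float(payload["rating"]), str(payload["tool"]))


def to_pixel_box(answer: Answer, width: int, height: int) -> Optional[Tuple[int, int, int, int]]:
    """Scale a crop box to pixels; return None for non-crop answers.

    Assumption: bbox_2d is normalized [x1, y1, x2, y2] in [0, 1].
    """
    if answer.tool != "crop":
        return None
    x1, y1, x2, y2 = answer.bbox_2d
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))
```

Under these assumptions, a `"crop"` answer yields a box that could be handed to an image library's crop routine before the next turn, while a `"final"` answer ends the dialogue and its `rating` is taken as the quality score.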
Our pipeline, built with Gemini-2.5-pro, is designed to avoid visual hallucination by providing rationales that are verifiably grounded in visual regions.