Zoom-IQA
Image Quality Assessment with
Reliable Region-Aware Reasoning

Zoom-IQA operates like an expert: (1) hypothesizing flaws, (2) grounding them with crops, and (3) verifying degradations for reliable quality assessment.

Qualitative Results on Image Quality Assessment
Zoom-IQA demonstrates superior image quality assessment capabilities across various real-world scenarios. Explore our results below.


Qualitative Results on Reasoning-guided Image Restoration
Zoom-IQA demonstrates superior downstream capabilities in guiding image restoration across various real-world scenarios. Explore our results below.


Zoom-IQA
1
Region-Aware Reasoning. Zoom-IQA reasons about image quality at the regional level, assessing quality across the different regions of an image. This is most visible in the Directions for Improvement and Issues to Avoid parts of its rationale.
2
Iterative Refinement. Zoom-IQA can iteratively refine the restoration process, using its textual rationale as guidance in the way an expert would (hypothesizing flaws, grounding them with crops, verifying degradations).
3
Zero-Shot Application for Downstream Tasks. Zoom-IQA's superior reasoning capability allows zero-shot application to downstream tasks such as image restoration. The Directions for Improvement and Issues to Avoid in the textual rationale can be used directly as the positive prompt and the negative prompt, respectively, to guide the restoration process.
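The prompt mapping described above can be sketched as follows. This is a minimal illustration, assuming the rationale is plain text whose sections are introduced by the headings "Directions for Improvement:" and "Issues to Avoid:". The exact output layout, the helper names `extract_section` and `rationale_to_prompts`, and the sample rationale are hypothetical and are not taken from Zoom-IQA.

```python
import re

# Assumed section headings; the real rationale layout may differ.
SECTION_HEADINGS = ("Directions for Improvement", "Issues to Avoid")


def extract_section(rationale: str, heading: str) -> str:
    """Return the text after `heading:` up to the next known heading or the end."""
    others = "|".join(re.escape(h) for h in SECTION_HEADINGS if h != heading)
    pattern = rf"{re.escape(heading)}\s*:\s*(.*?)(?=(?:{others})\s*:|\Z)"
    match = re.search(pattern, rationale, flags=re.DOTALL)
    # Collapse internal whitespace so the result is a single-line prompt.
    return " ".join(match.group(1).split()) if match else ""


def rationale_to_prompts(rationale: str) -> dict:
    """Map the two rationale sections to positive and negative prompts."""
    return {
        "positive_prompt": extract_section(rationale, SECTION_HEADINGS[0]),
        "negative_prompt": extract_section(rationale, SECTION_HEADINGS[1]),
    }


# Hypothetical rationale text, for illustration only.
sample_rationale = (
    "Overall: the image is soft.\n"
    "Directions for Improvement: sharpen the fur texture, recover detail in the eyes.\n"
    "Issues to Avoid: over-smoothing, halo artifacts around edges.\n"
)
prompts = rationale_to_prompts(sample_rationale)
```

The resulting `positive_prompt` and `negative_prompt` strings could then be passed to any prompt-conditioned restoration model that accepts both kinds of prompt; which model that is remains outside the scope of this sketch.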
Zoom-IQA teaser
Zoom-IQA framework
Key Highlights
1
Grounded Quality Rationale Learning. We first use our GR-IQA dataset to teach the VLM the foundational "how-to" skills: grounding textual rationales in visual regions and executing the "zoom" action.
2
Self-Guided Exploration. To enable iterative refinement without exhaustive supervision, we use reinforcement learning to learn a dynamic policy for when and how to deploy these skills. Specifically, we adopt Group Relative Policy Optimization (GRPO) as the optimization framework, which facilitates stable and efficient policy exploration.
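To make the "group relative" part of GRPO concrete, the sketch below standardizes the rewards of several responses sampled for the same input, so each response is scored relative to its own group. The reward values are hypothetical; how Zoom-IQA actually defines its rewards is not described here, and the use of the population standard deviation and the `eps` term are illustrative choices.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is rewarded only for being better than its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same image.
group_rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(group_rewards)
```

Responses above the group mean receive positive advantages and those below it receive negative ones; when all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.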
DATASET CURATION PIPELINE

Our pipeline uses Gemini-2.5-pro to generate rationales and is designed to avoid visual hallucination by ensuring that those rationales are verifiably grounded in visual regions.

1
Data Generation. We prompt the closed-source VLM, Gemini-2.5-pro, on IQA dataset images using a structured prompt. This compels the VLM to generate a two-part response: a textual rationale and a JSON action.
2
Visual Reliance Filtering (VRF). VRF enforces grounding by measuring how much the generated output shifts when the image is provided versus when it is withheld.
3
Hint-Augmented Consistency Filtering (HACF). HACF filters hallucination-like descriptions at the sentence level to maintain fluency.
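One way to picture the Visual Reliance Filtering step above is as a comparison of how likely a rationale is with and without the image. The sketch below is an assumption-laden illustration, not the pipeline's actual metric: it assumes per-token log-probabilities of the same rationale are available under both conditions, and the function names, the mean-difference score, and the `threshold` value are all hypothetical.

```python
def visual_reliance_score(logp_with_image, logp_without_image):
    """Mean per-token log-probability gain from seeing the image.

    A larger score means the rationale depends more on the image.
    (Hypothetical metric, chosen for illustration.)
    """
    if not logp_with_image or len(logp_with_image) != len(logp_without_image):
        raise ValueError("expected two non-empty sequences of equal length")
    diffs = [w - wo for w, wo in zip(logp_with_image, logp_without_image)]
    return sum(diffs) / len(diffs)


def keep_sample(score, threshold=0.5):
    """Keep a sample only if its output shifts enough when the image is present."""
    return score >= threshold  # threshold is an assumed value


# Hypothetical log-probabilities for a two-token rationale.
score = visual_reliance_score([-0.2, -0.3], [-1.2, -1.3])
```

A rationale that is almost equally likely without the image scores near zero and would be dropped under this rule, since it is probably driven by language priors instead of visual evidence.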
Zoom-IQA dataset
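For the data generation step of the pipeline above, a two-part response (textual rationale plus JSON action) has to be split before it can be filtered. The sketch below assumes the JSON object comes last in the response; the field names `action` and `bbox`, the sample response, and the helper `split_response` are hypothetical and are not the format used by Zoom-IQA.

```python
import json


def split_response(response: str):
    """Split a response into (rationale, action), assuming the JSON object comes last.

    Returns (full_text, None) when no trailing JSON object is found.
    """
    decoder = json.JSONDecoder()
    # Try every '{' from left to right and accept the first object
    # that parses cleanly and runs to the end of the response.
    for start, ch in enumerate(response):
        if ch != "{":
            continue
        try:
            action, end = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            continue
        if isinstance(action, dict) and response[end:].strip() == "":
            return response[:start].strip(), action
    return response.strip(), None


# Hypothetical response, for illustration only.
sample_response = (
    "The fur near the left ear looks blurred; zoom in to verify.\n"
    '{"action": "zoom", "bbox": [0.1, 0.2, 0.5, 0.6]}'
)
rationale, action = split_response(sample_response)
```

Scanning for a trailing object, instead of splitting on a fixed delimiter, keeps the parser tolerant of braces that happen to appear inside the rationale text.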