Zoom-IQA

Image Quality Assessment with Reliable Region-Aware Reasoning

Zoom-IQA operates like an expert: (1) hypothesizing flaws, (2) grounding with crops, and (3) verifying degradations for reliable quality assessment.

📄 Paper 💻 Code (Soon) 🤗 Model (Soon) 📊 Dataset (Soon)

Guoqiang Liang¹ Jianyi Wang¹ Zhonghua Wu² Shangchen Zhou¹,✉

S-Lab, Nanyang Technological University¹

SenseTime Research²

Qualitative Results on Image Quality Assessment

Zoom-IQA demonstrates superior image quality assessment capabilities across various real-world scenarios. Explore our results below.

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The image suffers from motion blur, most prominently affecting the main subject, the rickshaw, and the people moving around it. Exposure is uneven, with some bright areas on the pavement and shadows obscuring details elsewhere. Overall, the technical quality is fair due to motion artifacts. Directions for Improvement: Capturing the facial expressions and clothing textures of the people walking near the rickshaw with precise focus would yield a highly detailed scene. Issues to Avoid: Extreme motion blur severely softens the details of the rickshaw's canopy and the moving figures, including the woman in the striped shirt and the man pushing the cart. Decision & Rationale: I assign a moderate score because while the composition is dynamic, the motion blur compromises the image integrity. A crop is necessary to confirm the extent and nature of the motion blur, specifically examining how much detail is lost in the faces and clothing of the people interacting with the rickshaw versus the clearer foreground pavement. The bounding box [0.29, 0.24, 0.88, 0.73] isolates the main area of motion artifact and sharpness comparison.

Answer content: {"bbox_2d": [0.29, 0.24, 0.88, 0.73], "rating": 3.56, "tool": "crop"}

Here is the crop of the image. You are analyzing the original image and the provided crop for a final quality assessment.

./zoomiqa/iqa1_crop.jpg

Crop Inspection Summary: While the foreground pavement remains sharp, the crop clearly shows how the motion blur renders the woman's face and clothing (right side) and the man's lower body (left side) indistinct. The final quality remains fair. Directions for Improvement: Achieving sharp focus on the woman's striped shirt and the passenger's legs within the rickshaw would greatly improve the scene's detail. The spokes of the large wheel should exhibit precise, crisp detail without streaking. Issues to Avoid: Severe motion blur renders the woman's face and hair indistinguishable and smears the clothing textures on both the man and the woman. Final Decision & Rationale: The crop confirmed this issue, revealing that the motion blur is indeed extreme on the faces and clothing of the people around the rickshaw, confirming the primary flaw. The final rating is 3.47.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 3.47, "tool": "final"}

zoomiqa/iqa1.jpg

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The overall sharpness is poor due to significant motion blur and general softness affecting all areas, particularly the vehicle. Exposure issues exist, with blown-out highlights in the background sky area. Overall quality is assessed as fair. Directions for Improvement: Achieving crisp clarity on the dense green shrubbery in the midground would enhance depth perception and texture visibility. Furthermore, ensuring balanced illumination throughout the scene would reveal subtle details within the bright background foliage. Issues to Avoid: Excessive general softness degrades the definition of the approaching black vehicle, making its license plate and body lines indistinct. Blown highlights completely wash out any meaningful detail in the bright background sky area. Decision & Rationale: The primary uncertainty lies in assessing the extent of sharpness loss on crucial elements like the vehicle's license plate and the detailed textures of the foreground foliage. A crop is necessary to confirm the severity of the lack of fine detail registration.

Answer content: {"bbox_2d": [0.37, 0.12, 0.96, 0.7], "rating": 2.98, "tool": "crop"}

Here is the crop of the image. You are analyzing the original image and the provided crop for a final quality assessment.

./zoomiqa/iqa2_crop.jpg

Crop Inspection Summary: The crop, focused on the approaching vehicle and the foreground foliage, confirms the initial diagnosis of severe motion blur and general lack of sharpness. Fine details on the vehicle's license plate and the dense shrubbery are completely obscured, validating the assessment that the image suffers from significant clarity degradation. The final quality verdict remains fair. Directions for Improvement: Achieving sharp rendering of the dense green leaves in the midground would provide excellent depth and texture definition. Optimal focus registration on the approaching black vehicle would ensure the license plate characters and the headlight details are perfectly crisp and legible. Issues to Avoid: Significant motion blur renders the green foliage indistinct and mushy, lacking fine leaf detail. Excessive general softness degrades the definition of the vehicle's front grille and logo, making them appear indistinct blobs. Final Decision & Rationale: The crop confirmed that the image suffers from pervasive motion blur and general softness, which severely degrades the primary subjects. Since the findings confirmed the initial assessment of poor quality without revealing any unexpected strengths or weaknesses, the final rating is slightly adjusted but remains consistent with the initial judgment.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 2.89, "tool": "final"}

zoomiqa/iqa2.jpg

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The image suffers from general softness across the frame, particularly affecting the main subjects drawing at the table. Digital noise is also present, especially in shadowed areas and mid-tones. The overall quality is considered good but lacks sharpness and clarity. Directions for Improvement: Achieving sharp focus on the faces of the children, especially the girl in the pink jacket and the girl in the yellow jacket, would greatly enhance the viewer engagement. Optimal illumination would render crisp details within the coloring books and the art supplies organized on the wooden table surface. Issues to Avoid: Significant lack of sharpness diminishes the fine detail definition in the foreground subjects' clothing and hair. Excessive digital grain obscures subtle textures in the darker areas, such as the boy's red jacket and the shadows beneath the tables. Decision & Rationale: Given the overall composition is clear and the primary issue is generalized softness rather than severe motion blur or critical focus failure, a localized crop is beneficial to confirm the extent of the resolution loss and digital grain. I will crop the central area focusing on the three visible children and the immediate table surface to better assess the sharpness of their features and the clarity of the artwork.

Answer content: {"bbox_2d": [0.2, 0.15, 0.88, 0.68], "rating": 3.79, "tool": "crop"}

Here is the crop of the image. You are analyzing the original image and the provided crop for a final quality assessment.

./zoomiqa/iqa3_crop.jpg

Crop Inspection Summary: The crop, focused on the main subjects, confirms the initial diagnosis of generalized softness and moderate digital grain. Fine details on the faces, such as the eyes and hair strands, lack crisp definition due to missed focus or slight motion blur. The digital grain is clearly visible in the mid-tones and shadows, particularly on the jackets and skin. Directions for Improvement: Optimal sharpness would render the texture of the fluffy pink jacket with clarity. The details of the coloring book pages would be perfectly crisp, showing every pencil line without blurring. Issues to Avoid: The faces of all four children suffer from generalized softness, confirming missed focus points. Excessive digital grain obscures fine detail in the dark hair and shadows around the subjects. Final Decision & Rationale: My initial rating was 3.79, and the reason for the crop was to confirm the extent of resolution loss and digital grain. The crop confirmed that the image suffers from softness and moderate digital grain. Since the issues were neither significantly better nor worse than anticipated, the final rating is confirmed. The final score is 3.79.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 3.79, "tool": "final"}

zoomiqa/iqa3.jpg

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The overall composition and color fidelity are excellent, showcasing vibrant hues and intricate geometric patterns effectively. However, the image suffers from generalized softness across the entire frame, indicating slight missed focus or low resolution capture. Minor digital artifacts, likely resulting from compression or post-processing, are present along some high-contrast edges. Overall quality is considered good. Directions for Improvement: Achieving superior clarity would involve ensuring crisp definition around the edges of the teal and orange modular units in the foreground, maximizing the tactile appearance of the material. Optimal sharpness would reveal the texture and depth within the folds of the green and blue components throughout the mid-ground. Issues to Avoid: Significant generalized lack of sharpness diminishes the fine detail across the entire patterned surface, particularly noticeable on the edges of the large green and teal shapes in the mid-ground. Minor digital artifacts compromise the integrity of the sharpness along the boundaries between the orange and adjacent colored sections. Decision & Rationale: Given the overall clarity and vibrant presentation, the primary issue is generalized softness which is uniformly distributed and clearly visible without magnification. Therefore, a final decision can be made without needing a crop. I assign a score reflective of its strong composition being slightly held back by minor focus issues.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 4.18, "tool": "final"}

zoomiqa/iqa4.JPG

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The primary technical flaw is severe motion blur or camera shake, rendering all plant leaves soft and lacking definition. Resolution is very low throughout the frame. Additionally, the bright white background areas are slightly overexposed. The overall quality is poor due to the extreme lack of sharpness. Directions for Improvement: Achieving sharp focus would render the veins and textures on the large foreground leaves highly detailed and clear. Optimal exposure settings would ensure the bright white backdrop maintains subtle tonal gradations without losing highlight detail. The small yellow object on the left side would possess crisp, defined edges. Issues to Avoid: Extreme motion blur obscures the fine details across all the green plant leaves, especially those in the center and foreground. Blown highlights completely wash out any texture or information in the bright white background areas. A general lack of sharpness pervades the entire composition. Decision & Rationale: Given the extreme and uniform nature of the motion blur across the entire image, there is no ambiguity regarding the image's core flaw that requires magnification via cropping. Therefore, a final decision can be made based on the full frame. The rating reflects the severity of the motion blur combined with the overexposure.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 2.29, "tool": "final"}

zoomiqa/iqa5.jpg

Qualitative Results on Reasoning-guided Image Restoration

Zoom-IQA demonstrates superior downstream capabilities in guiding image restoration across various real-world scenarios. Explore our results below.

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The primary subject is well-lit and sharp, particularly around the face and eyes, creating strong visual focus. However, the entire image suffers from excessive digital grain or texture applied uniformly across all tonal areas, which severely degrades overall clarity and smoothness. This heavy processing masks subtle details and contributes to a generally rough appearance, despite the good composition. The overall quality leans toward good due to the effective primary focus. Directions for Improvement: Achieving superior smoothness and refined texture definition within the dark suit jacket and the background would greatly enhance realism and depth perception. Optimal clarity should be present in the facial hair and skin texture, allowing the subtle wrinkles on the forehead and brow to appear perfectly sharp and natural. Furthermore, ensuring the illuminated areas of the face maintain smooth gradients would maximize the dramatic effect of the lighting. Issues to Avoid: Excessive digital grain or artificial texture is highly distracting and negatively impacts the smooth surfaces of the face, especially around the forehead and cheekbones. This heavy processing obscures fine details and contributes to a gritty, unnatural appearance throughout the dark suit jacket. Additionally, the extreme darkness in the background areas results in a significant loss of shadow detail. Decision & Rationale: I will issue a final decision because the main technical flaw (the heavy, uniform grain) is clearly evident throughout the entire composition, including the face and suit. A localized crop is unnecessary as there remains no ambiguity regarding the nature or extent of this processing artifact. Given the effective composition and good focus on the primary subject's eyes, despite the overall graininess, the score is set above average. I assign a score reflecting the balance between good composition and poor technical execution.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 3.46, "tool": "final"}

zoomiqa/restoration1_output.png

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The primary issue with this photograph is extreme global defocus, rendering the entire scene blurry and lacking sharpness. While the composition and lighting are simple, the lack of a clear focal point obscures all fine detail in the subject's fur and face. There is also some visible digital noise or grain throughout the image, further reducing clarity. Overall, the technical execution results in a very low-quality image. Directions for Improvement: Achieving sharp focus on the dog's eyes would enhance the portrait's engagement and expressiveness. Ensuring crisp detail across the varied textures of the dog's coat and the blue collar would greatly improve the overall clarity of the main subject. A well-defined nose would provide a strong anchor point for the viewer's gaze. Issues to Avoid: Extreme softness and blur completely obscure any fine detail or texture on the dog's face and ears. Significant digital noise or grain degrades the smooth appearance of the background and the subject's fur. The lack of sharpness makes the identification of individual hairs or facial features impossible. Decision & Rationale: Given the severity and uniformity of the blurring, there is no ambiguity regarding the technical flaws that requires localized investigation via cropping. Therefore, a final decision can be made. The rating is extremely low because the fundamental photographic goal (sharp depiction of the subject) has been entirely missed.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 1.92, "tool": "final"}

zoomiqa/restoration2_output.png

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The image suffers from severe global defocus blur, rendering all facial features (eyes, nose, mouth, and the kanji character on the headband) as soft and indistinct shapes. While the color palette is vibrant and the composition is dynamic, the fundamental lack of sharpness makes the overall presentation poor. Directions for Improvement: Optimal clarity would render the sharp, defined edges of the prominent black kanji character on the gray headband. Precise focus would ensure the subject's eyes and nose exhibit crisp detail, allowing the viewer to appreciate the subtle shading and expression. Achieving sharp rendering of the white hair framing the face would complete the detailed portrait. Issues to Avoid: Extreme defocus blurring obscures all fine detail across the face, particularly degrading the definition of the eyes and mouth. The heavy blur also renders the kanji character on the headband indistinct and lacking sharpness. This pervasive lack of focus compromises the entire composition, including the white hair and the red headband strap. Decision & Rationale: Given the extreme and uniform nature of the blur across the entire frame, there remains no area of ambiguity or fine detail that requires localized magnification via cropping to accurately assess the quality. Therefore, a final decision can be made based on the full image.

Answer content: {"bbox_2d": [0, 0, 0, 0], "rating": 2.59, "tool": "final"}

zoomiqa/restoration3_output.png

What's your opinion on the quality of this picture? Please provide your assessment and reasoning.

Image Quality Summary: The primary focus is extremely shallow, rendering the snake's head sharp while blurring the body significantly. While the composition is striking, the extensive motion blur across the main subject compromises the overall detail and texture clarity. Exposure is well-managed between the bright orange scales and the deep black background, though some specular highlights are present. Overall quality is fair due to the extensive blur. Directions for Improvement: Achieving crisp definition along the entire length of the snake's body would enhance the visual appeal, particularly bringing out the intricate scale patterns on the curled segments. Optimal clarity around the snake's head and eye would allow the viewer to appreciate the fine detail of its features. Maintaining bright, vibrant color saturation on the orange scales is desirable. Issues to Avoid: Significant motion blur obscures all fine detail and texture across the majority of the snake's body, creating a smeared appearance from the head back toward the tail. The extremely shallow depth of field results in excessive softness along the main body structure. Specular highlights on the scales near the head are slightly distracting. Decision & Rationale: Given the extreme lack of sharpness across the main subject due to motion blur, a crop is necessary to confirm the exact plane of focus and assess the severity of the blur on the detailed head area versus the blurred body. This will help finalize the rating based on the degree of motion degradation. I assign an initial score reflective of the major flaw.

Answer content: {"bbox_2d": [0.0, 0.37, 1.0, 0.96], "rating": 2.76, "tool": "crop"}

zoomiqa/restoration4_output.png

Zoom-IQA

Region-Aware Reasoning. Zoom-IQA reasons about image quality at the regional level, assessing quality across different regions of the image. This is most visible in the Directions for Improvement and Issues to Avoid parts of its rationale.

Iterative Refinement. Zoom-IQA can iteratively refine the restoration process by using its textual rationale as guidance, following the same steps an expert would: hypothesizing flaws, grounding them with crops, and verifying degradations.

Zero-Shot Application for Downstream Tasks. Zoom-IQA's reasoning capability allows zero-shot application to downstream tasks such as image restoration. The Directions for Improvement and Issues to Avoid parts of the textual rationale can be used directly as the positive prompt and negative prompt, respectively, to guide the restoration process.
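As a rough illustration of how a rationale could be turned into restoration prompts, the sketch below splits a rationale on the section labels that appear in the example dialogues on this page. The function names `split_rationale` and `to_restoration_prompts` are hypothetical, and the sketch assumes the labels always appear as literal "Label:" markers; it is not the authors' implementation.

```python
import re

# Section labels as they appear in the example rationales on this page.
SECTION_LABELS = [
    "Image Quality Summary",
    "Crop Inspection Summary",
    "Directions for Improvement",
    "Issues to Avoid",
    "Decision & Rationale",
    "Final Decision & Rationale",
]


def split_rationale(rationale: str) -> dict:
    """Split a rationale into {label: text} using the 'Label:' markers (hypothetical helper)."""
    # Try longer labels first so "Final Decision & Rationale" is matched as a whole.
    labels = sorted(SECTION_LABELS, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(label) for label in labels) + "):")
    # With one capturing group, re.split yields [preamble, label, text, label, text, ...].
    parts = pattern.split(rationale)
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def to_restoration_prompts(rationale: str) -> tuple:
    """Return (positive_prompt, negative_prompt) taken from the rationale sections."""
    sections = split_rationale(rationale)
    positive = sections.get("Directions for Improvement", "")
    negative = sections.get("Issues to Avoid", "")
    return positive, negative


example = (
    "Image Quality Summary: Extreme global defocus. "
    "Directions for Improvement: Sharp focus on the dog's eyes. "
    "Issues to Avoid: Extreme softness and blur. "
    "Decision & Rationale: A final decision can be made."
)
positive_prompt, negative_prompt = to_restoration_prompts(example)
```

The two strings could then be passed to whichever restoration model accepts positive and negative text prompts; that model and its interface are outside the scope of this sketch.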

Key Highlights

Grounded Quality Rationale Learning. We first leverage our GR-IQA dataset to teach the VLM the foundational "how-to" skills: grounding textual rationales in visual regions and executing the "zoom" action.

Self-Guided Exploration. To enable iterative refinement without exhaustive supervision, we employ reinforcement learning to derive a dynamic policy that optimizes the deployment of these skills. Specifically, we adopt Group Relative Policy Optimization (GRPO) as our optimization framework, facilitating stable and efficient policy exploration.
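For readers unfamiliar with GRPO, its core idea is to score each sampled response relative to the other responses drawn for the same input, rather than against a learned value function. The sketch below shows only that group-relative advantage step. The reward values are made up for illustration, the page does not specify how Zoom-IQA's rewards are defined, and the choice of population standard deviation is an assumption (implementations differ on this detail).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same input.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same image.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no gradient signal.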

Dataset Curation Pipeline

Our pipeline is designed to avoid visual hallucination by producing rationales, generated with Gemini-2.5-pro, that are verifiably grounded in visual regions.

Data Generation. We prompt the closed-source VLM, Gemini-2.5-pro, on IQA dataset images using a structured prompt. This compels the VLM to generate a two-part response: a textual rationale and a JSON action.
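The JSON action in the example dialogues on this page carries three fields: `bbox_2d`, `rating`, and `tool`. A minimal sketch of reading such an action and converting the box to pixel coordinates might look like the following. It assumes the box is `[x1, y1, x2, y2]` normalized to the range 0 to 1, which matches the example values but is not stated on the page; the helper names `parse_action` and `bbox_to_pixels` are hypothetical.

```python
import json


def parse_action(answer: str) -> dict:
    """Parse a JSON action string and check the fields seen in the page's examples."""
    action = json.loads(answer)
    if action["tool"] not in ("crop", "final"):
        raise ValueError(f"unexpected tool: {action['tool']!r}")
    if len(action["bbox_2d"]) != 4:
        raise ValueError("bbox_2d must have four values")
    return action


def bbox_to_pixels(bbox, width, height):
    """Convert a normalized box to integer pixel coordinates.

    Assumption: bbox is [x1, y1, x2, y2] with each value in [0, 1].
    """
    x1, y1, x2, y2 = bbox
    return (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))


action = parse_action('{"bbox_2d": [0.29, 0.24, 0.88, 0.73], "rating": 3.56, "tool": "crop"}')
if action["tool"] == "crop":
    # The image size here is arbitrary, chosen only to make the arithmetic easy to follow.
    pixel_box = bbox_to_pixels(action["bbox_2d"], width=1000, height=500)
```

In the examples, a `final` action carries the placeholder box `[0, 0, 0, 0]`, so a caller would skip the crop step in that case.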

Visual Reliance Filtering (VRF). VRF enforces grounding by measuring the shift in the generative output with versus without the image.
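The page does not say how the output shift is computed. One plausible way to quantify it, shown here purely as an assumption, is the average per-token log-probability gain a rationale receives when the model is conditioned on the image. The function names, the log-probability inputs, and the threshold value below are all hypothetical and are not taken from the authors' method.

```python
def visual_reliance_score(logp_with_image, logp_without_image):
    """Mean per-token log-probability gain from conditioning on the image.

    Both inputs are per-token log-probabilities of the same rationale tokens,
    scored once with the image and once without it (hypothetical inputs).
    """
    if not logp_with_image or len(logp_with_image) != len(logp_without_image):
        raise ValueError("expected two non-empty sequences of equal length")
    gains = [w - wo for w, wo in zip(logp_with_image, logp_without_image)]
    return sum(gains) / len(gains)


def passes_vrf(logp_with_image, logp_without_image, threshold=0.5):
    """Keep a rationale only if the image makes it noticeably more likely.

    The threshold is an arbitrary illustrative value, not one from the source.
    """
    return visual_reliance_score(logp_with_image, logp_without_image) >= threshold


# A rationale that becomes much more likely once the image is visible.
grounded = passes_vrf([-0.2, -0.1, -0.3], [-1.2, -1.1, -1.3])
# A rationale the model would produce equally well without looking at the image.
ungrounded = passes_vrf([-0.5, -0.4], [-0.5, -0.4])
```

Under this reading, a rationale whose likelihood barely changes without the image is treated as weakly grounded and filtered out.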

Hint-Augmented Consistency Filtering (HACF). HACF filters hallucination-like descriptions at the sentence level to maintain fluency.