This contains the codebase for our submission to the Adobe Mid Prep PS Competition. The project focuses on image classification and artifact detection, leveraging advanced techniques in super-resolution, classification, and interpretability.
We use Real-ESRGAN for image super-resolution to enhance the quality of input images. The enhanced images are then classified with DenseNet to determine whether they are real or fake.
- Image Super-Resolution: Apply Real-ESRGAN to upscale and enhance input images.
- Image Classification: Use DenseNet for classifying enhanced images.
This task employs Llama 3.2 Vision Instruct 11B in combination with Grad-CAM bounding boxes (derived from the last layer of DenseNet). The pipeline identifies potential artifacts in images, narrows down the likely issues, and generates explanations for fixing them.
- Artifact Detection: Use Grad-CAM to localize areas of interest in the classified image.
- Fix Explanation: Narrow down the candidate artifacts and generate explanations of how to fix them.
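The localization step above can be sketched as turning a Grad-CAM heatmap into a bounding box. This is a minimal illustration; the function name and the 0.5 threshold are assumptions, not the repository's actual code.

```python
import numpy as np

def heatmap_to_bbox(heatmap: np.ndarray, thresh_frac: float = 0.5):
    """Bounding box (x_min, y_min, x_max, y_max) of the region where a
    Grad-CAM heatmap exceeds thresh_frac of its peak activation."""
    mask = heatmap >= thresh_frac * heatmap.max()
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy heatmap with a hot 3x3 patch:
cam = np.zeros((8, 8))
cam[2:5, 3:6] = 1.0
print(heatmap_to_bbox(cam))  # (3, 2, 5, 4)
```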
- Install the required dependencies using:

  pip install -r requirements.txt

- Set up a Hugging Face access token (this step can be skipped, as the token setup is pre-configured in VLM.ipynb):
  - Log in or create an account at Hugging Face.
  - Generate a new token from your account settings.
  - Save the token for configuring the pipeline.
Additionally:
- Visit Meta to request access to the Llama 3.2 Vision models.
- After access is granted, you can use Llama 3.2 Vision Instruct.
Configure Token in VLM.ipynb
- Paste the Hugging Face token into the appropriate section in VLM.ipynb.
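If you prefer to set the token programmatically rather than pasting it into a cell, a minimal configuration snippet (assuming the `huggingface_hub` package from requirements.txt) is:

```python
from huggingface_hub import login

# Paste your Hugging Face access token here (placeholder value shown).
login(token="hf_your_token_here")
```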
Here’s an overview of the repository:
- main.py: The primary script for running the entire pipeline.
- data/: Contains input images for processing.
- models/: Stores downloaded model weights and the Real-ESRGAN model.
- output/: Contains outputs, including results in 84_task1.json and 84_task2.json.
- densenet.py: Script for classifying images using DenseNet.
- super-resolution.py: Performs super-resolution using Real-ESRGAN.
- VLM.ipynb: Interactive notebook for Task 2.
- requirements.txt: Lists all required dependencies.
Install required dependencies by running:

  pip install -r requirements.txt

Add your input images to the data/ folder.

Run the main pipeline for Task 1:

  python main.py

- The first run may take longer due to model weight downloads.
Run the VLM.ipynb notebook to perform artifact detection and fix explanation.
Results will be stored in the output/ folder:
- 84_task1.json: Results of image classification (real or fake).
- 84_task2.json: Results of artifact detection and explanations.
Additional outputs include:
- Super-resolved images.
- Grad-CAM visualizations.
After executing the pipeline:
- Task 1 Output: 84_task1.json contains the classification results for the input images.
- Task 2 Output: 84_task2.json contains the artifact detection results and fix explanations.
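Once the pipeline has run, the result files can be loaded with the standard library. This is a minimal sketch; the exact JSON schema inside each file is defined by the pipeline, not shown here.

```python
import json
from pathlib import Path

def load_results(output_dir: str = "output"):
    """Load the Task 1 and Task 2 result files from the output folder."""
    out = Path(output_dir)
    task1 = json.loads((out / "84_task1.json").read_text())
    task2 = json.loads((out / "84_task2.json").read_text())
    return task1, task2
```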