⛓💥 Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
Weidi Luo*, Tianyu Lu*, Qiming Zhang*, Xiaogeng Liu, Bin Hu
Yue Zhao, Jieyu Zhao, Song Gao, Patrick McDaniel, Zhen Xiang, Chaowei Xiao
Date | Event |
---|---|
2025/08/07 | 🎁 We have released our complete dataset on HuggingFace! |
2025/06/09 | 🛠️ We have posted our v3 paper on arXiv, with our source code and partial dataset (L1 and mirror) released on GitHub and HuggingFace. |
2025/04/29 | 💡 We have posted our v2 paper on arXiv. |
Recent advances in multi-modal large reasoning models (MLRMs) have shown significant ability to interpret complex visual content. While these models enable impressive reasoning capabilities, they also introduce novel and underexplored privacy risks. In this paper, we identify a novel category of privacy leakage in MLRMs: Adversaries can infer sensitive geolocation information, such as a user's home address or neighborhood, from user-generated images, including selfies captured in private settings. To formalize and evaluate these risks, we propose a three-level visual privacy risk framework that categorizes image content based on contextual sensitivity and potential for location inference. We further introduce DoxBench, a curated dataset of 500 real-world images reflecting diverse privacy scenarios. Our evaluation across 11 advanced MLRMs and MLLMs demonstrates that these models consistently outperform non-expert humans in geolocation inference and can effectively leak location-related private information. This significantly lowers the barrier for adversaries to obtain users' sensitive geolocation information. We further analyze and identify two primary factors contributing to this vulnerability: (1) MLRMs exhibit strong reasoning capabilities by leveraging visual clues in combination with their internal world knowledge; and (2) MLRMs frequently rely on privacy-related visual clues for inference without any built-in mechanisms to suppress or avoid such usage. To better understand and demonstrate real-world attack feasibility, we propose GeoMiner, a collaborative attack framework that decomposes the prediction process into two stages: clue extraction and reasoning to improve geolocation performance while introducing a novel attack perspective. Our findings highlight the urgent need to reassess inference-time privacy risks in MLRMs to better protect users' sensitive information.
- Comprehensive MLRM Evaluation: Support for 11+ multi-modal models
- Three-Level Privacy Framework: Systematic categorization of visual privacy risks
- GeoMiner Attack Framework: Novel collaborative attack methodology for enhanced geolocation inference
- Real-World Dataset: 500 curated images reflecting diverse privacy scenarios
- Distance-Based Accuracy Metrics: Precise evaluation using geospatial distance calculations
- Clue Mining Analysis: Automated extraction and analysis of privacy-revealing visual elements
- Parallel Processing: Multi-threaded evaluation for large-scale experiments
- Comprehensive Output: Detailed results with reasoning traces and statistical analysis
For compliance purposes, only a partial dataset (L1 and mirror) is available on GitHub. The full dataset is available on HuggingFace.
Contains comprehensive EXIF metadata for all images including:
- `SourceFile`: Original filename for cross-referencing
- `GPSLatitude`, `GPSLongitude`: GPS coordinates for evaluation
- `GPSPosition`: Combined GPS position information
- `Make`, `Model`: Camera device information (iPhone devices)
- `CreateDate`, `DateTimeOriginal`: Temporal metadata
- `ImageHeight`, `ImageWidth`: Image dimensions
- `ISO`, `Aperture`, `ExposureTime`: Camera settings
- 200+ additional EXIF fields for comprehensive metadata analysis
Ground truth geolocation data:
- `filename`: Image filename for cross-referencing
- `address`: Human-readable address (via Google Geocoding API)
- `latitude`: GPS latitude coordinate
- `longitude`: GPS longitude coordinate
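As a rough sketch of how the ground-truth file might be consumed, the snippet below parses rows with the four columns listed above into typed records. The `GroundTruth` dataclass and the `load_ground_truth` helper are hypothetical names for illustration, not part of the DoxBench code, and the sketch assumes a standard CSV with a header row. The sample address and coordinates are made up.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GroundTruth:
    """One ground-truth row (hypothetical container, not from the DoxBench code)."""
    filename: str
    address: str
    latitude: float
    longitude: float


def load_ground_truth(lines: Iterable[str]) -> List[GroundTruth]:
    """Parse CSV lines with filename, address, latitude, longitude columns."""
    records = []
    for row in csv.DictReader(lines):
        records.append(
            GroundTruth(
                filename=row["filename"],
                address=row["address"],
                latitude=float(row["latitude"]),
                longitude=float(row["longitude"]),
            )
        )
    return records


# Illustrative data only; this address and these coordinates are invented.
sample = io.StringIO(
    "filename,address,latitude,longitude\n"
    'IMG_0001.jpg,"1 Example St, Springfield",40.0,-75.5\n'
)
records = load_ground_truth(sample)
```

To read a real file, pass an open file handle instead of the `StringIO` object, for example `open(path, newline="", encoding="utf-8")`.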
Currently available categories:
- Level 1 (L1): in `benign_people` folder, ~128 images (25.2% of total dataset) - Personal imagery in public space
- Level 2 (L2): in `privacy/privacy_no_people` folder, ~187 images (37.2% of total dataset) - Private space without person
- Level 3 (L3): in `privacy/privacy_people` folder, ~140 images (28.4% of total dataset) - Personal imagery in private space
- Mirror: in `mirror` folder, ~45 images (9.2% of total dataset) - Location inference through reflective surfaces
git clone https://github.com/lutianyu2001/DoxBench.git
cd ./DoxBench/code/experiment
conda env create -f environment.yml
conda activate gps-address
Create a `.env` file in the experiment directory:
# Required API Keys
OPENAI_API_KEY=your_openai_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
DASHSCOPE_API_KEY=your_dashscope_api_key_here
GOOGLE_MAPS_API_KEY=your_google_maps_api_key_here
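Before launching a long run, it can help to confirm that the keys are actually visible to the process. The check below is a minimal sketch: `missing_keys` is a hypothetical helper, not part of the repository, and it assumes the `.env` values have already been loaded into the environment. The README lists all five keys as required; whether every key is needed for every model is not stated, so treat the result as a hint rather than a hard failure.

```python
import os
from typing import List, Mapping

# Key names copied from the .env template above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "ANTHROPIC_API_KEY",
    "DASHSCOPE_API_KEY",
    "GOOGLE_MAPS_API_KEY",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required key names that are unset or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
```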
- OpenAI Models: `o3`, `o4mini`, `gpt4o`, `gpt4.1`
- Anthropic Models: `sonnet4`, `opus4`
- Google Models: `Gemini-2.5Pro`
- Meta Models: `llama4-maverick`, `llama4-scout`
- Qwen Models: `qvq-max`
python experiment.py INPUT_CSV [OPTIONS]
- `INPUT_CSV`: Path to input dataset CSV file (e.g., `dataset/result.csv`)
- `--model MODEL`: Select AI model
  - Available: `o3`, `o4mini`, `gpt4o`, `gpt4.1`, `gpt4.1-mini`, `sonnet4`, `opus4`, `gemini`, `qwen2.5vl`, `llama4-maverick`, `llama4-scout`, `qvq-max`, `llama-guard4`
- `--geominer_detector_model MODEL`: GeoMiner Detector model (OpenAI-compatible models only, default: `gpt4o`)
- `-o, --output OUTPUT_DIR`: Output directory (default: `results/`)
- `--top1`: Request Top-1 address candidate (default)
- `--top3`: Request Top-3 address candidates
- `--cot MODE`: CoT mode
  - `on`: Standard chain of thought (default)
  - `off`: No chain of thought
  - `workflow`: GeoMiner workflow mode
- `-p, --parallel NUM`: Number of parallel threads (default: 1)
- `-m, --max-tasks NUM`: Maximum number of tasks to process
- `-r, --random-sample NUM`: Random sample size
- `-s, --random-seed SEED`: Random seed for reproducible results
- `--breakpoint ID`: Resume from specific output CSV ID
- `--reasoning_summary MODE`: Reasoning process recording
  - `off`: Disabled (default)
  - `plain`: Basic reasoning capture
  - `with_llm_judge`: LLM-based reasoning analysis
  - `with_llm_judge-MODEL`: Use specific model as judge
- `--prompt-based-defense MODE`: Privacy defense mechanism (`on`/`off`)
- `--noise STD`: Apply Gaussian noise preprocessing (0.1-1.0)
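To give a feel for what the `--noise STD` option means, here is a small standard-library sketch of Gaussian noise applied to pixel intensities. It is an illustration under stated assumptions, not the code in `experiment.py`: `add_gaussian_noise` is a hypothetical helper, and it assumes intensities are normalized to [0, 1] so that a standard deviation between 0.1 and 1.0 is on the same scale.

```python
import random
from typing import List, Optional


def add_gaussian_noise(
    pixels: List[float], std: float, seed: Optional[int] = None
) -> List[float]:
    """Add zero-mean Gaussian noise to intensities in [0, 1], then clip.

    Assumes normalized intensities; the project's real preprocessing may
    operate on a different scale or on full image arrays.
    """
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, std))) for p in pixels]


# A tiny flattened "image" for illustration.
noisy = add_gaussian_noise([0.0, 0.5, 1.0], std=0.3, seed=0)
```

Clipping keeps the result in the valid intensity range; a larger `std` obscures more visual detail, which is the trade-off such a defense explores.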
# Basic evaluation with GPT-4o
python experiment.py dataset/result.csv --model gpt4o
# Test with Claude Sonnet 4, Top-3 addresses, CoT enabled
python experiment.py dataset/result.csv \
--model sonnet4 \
--top3 \
--cot on \
--parallel 4
# With prompt-based defense
python experiment.py dataset/result.csv \
--model o3 \
--top1 \
--cot off \
--prompt-based-defense on \
--parallel 4
# With Gaussian noise preprocessing
python experiment.py dataset/result.csv \
--model gpt4o \
--top1 \
--cot off \
--noise 0.3 \
--parallel 4
# Resume from specific ID in output CSV
python experiment.py dataset/result.csv \
--model qvq-max \
--top3 \
--cot on \
--breakpoint 286
# Random sampling for quick testing
python experiment.py dataset/result.csv \
--model gpt4o \
--top3 \
--cot on \
--parallel 4 \
--max-tasks 8 \
--random-sample 100
Results are automatically saved to timestamped files in the `results/` directory:
- Format: `test492-cot_{MODE}-top{N}-{MODEL}-{TIMESTAMP}.csv`
- Includes: geolocation predictions, accuracy metrics, reasoning traces, token usage
- Basic Info: `id`, `image_id`, `classification`, `people`, `selfie`
- Ground Truth: `address`, `geoid`, `latitude`, `longitude`, `country`, `region`, `metropolitan`
- Predictions: `guessed_address`, `guessed_geoid`, `guessed_lat`, `guessed_lon`, `guessed_country`, `guessed_region`, `guessed_metropolitan`
- Accuracy: `country_correct`, `region_correct`, `metropolitan_correct`, `tract_correct`, `block_correct`, `error_distance_km`
- Process Data: `api_call_time`, `clue_list`, `address_list`, `answer`, `prompt`
- Advanced: `reasoning_process`, `extract_and_analyze` (if reasoning enabled)
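The columns above lend themselves to a quick summary after a run. The sketch below computes country-level accuracy and the median distance error from a results CSV. It rests on two assumptions that the README does not confirm: that the `*_correct` columns are stored as the strings `True`/`False`, and that a blank `error_distance_km` means no usable prediction. `summarize` is a hypothetical helper, and the sample rows are invented.

```python
import csv
import io
import statistics
from typing import Dict, Iterable


def summarize(lines: Iterable[str]) -> Dict[str, float]:
    """Summarize country accuracy and median distance error for a results CSV."""
    rows = list(csv.DictReader(lines))
    # Skip rows without a distance (assumed to mean "no usable prediction").
    distances = [
        float(r["error_distance_km"])
        for r in rows
        if r["error_distance_km"].strip()
    ]
    # Assumes booleans are serialized as "True"/"False".
    country_hits = sum(
        r["country_correct"].strip().lower() == "true" for r in rows
    )
    return {
        "n": len(rows),
        "country_accuracy": country_hits / len(rows) if rows else 0.0,
        "median_error_km": statistics.median(distances) if distances else float("nan"),
    }


# Invented rows for illustration only.
sample = io.StringIO(
    "id,country_correct,error_distance_km\n"
    "1,True,2.0\n"
    "2,False,\n"
    "3,True,10.0\n"
)
summary = summarize(sample)
```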
Use `--cot workflow` to invoke GeoMiner:
python experiment.py dataset/result.csv \
--geominer_detector_model llama4-scout \
--model llama4-scout \
--top1 \
--cot workflow \
--parallel 4
The `clueminer.py` tool provides an advanced LLM-powered framework for building privacy clue taxonomies.
python clueminer.py [OPTIONS]
- `--input-csv PATH`: Input CSV file with clue data
- `--model MODEL`: OpenAI model for analysis (default: `o4-mini-2025-04-16`)
- `--output-dir DIR`: Output directory (default: `phase1_output`)
- `--breakpoint PATH`: Resume from specific JSON file
# Basic category mining
python clueminer.py \
--input-csv results/your_results.csv \
--model o4-mini-2025-04-16
# Resume from breakpoint
python clueminer.py \
--input-csv results/your_results.csv \
--breakpoint output/phase1_categories_iteration_5.json
output/
├── {input_filename}_{timestamp}/
│   ├── final_categories.json    # Final category taxonomy
│   ├── output/                  # Individual round results
│   │   ├── phase1-output-round1-id_1-imageid_243.json
│   │   └── ...
│   └── input/                   # Prompt/response logs
│       ├── phase1-input-round1-id_1-imageid_243.json
│       └── ...
The evaluation metrics are implemented in `code/data_analysis.ipynb`. Select the metrics you need from the notebook and integrate them into your own code.
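The notebook remains the authoritative source for the metrics. As a minimal, self-contained sketch of a distance-based error such as `error_distance_km`, the function below computes the great-circle distance between a ground-truth point and a predicted point with the haversine formula. This is an assumption for illustration: the README does not state which distance formula the project uses, and `haversine_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude along the equator is roughly 111.2 km.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
```

The haversine formula treats the Earth as a sphere, which is usually adequate for kilometer-scale error reporting; an ellipsoidal geodesic would be more precise if sub-kilometer accuracy matters.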
- Code: This project's code is licensed under the Apache License 2.0 - see the LICENSE file
- Dataset: The datasets are licensed under Creative Commons Attribution NonCommercial NoDerivatives 4.0 International - see LICENSE-DATA
If you use DoxBench in your research, please cite our paper:
@misc{luo2025doxinglensrevealinglocationrelated,
title={Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models},
author={Weidi Luo and Tianyu Lu and Qiming Zhang and Xiaogeng Liu and Bin Hu and Yue Zhao and Jieyu Zhao and Song Gao and Patrick McDaniel and Zhen Xiang and Chaowei Xiao},
year={2025},
eprint={2504.19373},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2504.19373},
}