Official repository for "Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search".
This repository provides the agent framework for the Risk Mitigation part of our paper. The XGBoost-detector and the PhishLLM-detector are included for comparison. The code for the PhishLLM-detector can be found at https://github.com/code-philia/PhishLLM.
```
agent_defense/
├── src/
│   ├── agent.py                      # build_agent
│   ├── llm.py                        # a discarded trial of using some special API calls
│   ├── prompt.py                     # prompts
│   ├── tools.py                      # tool calling (change the tools by modifying the return_tools function; the HtmlLLM-detector's prompt is in the is_malicious function)
│   ├── utils.py                      # XGBoost-detector method
│   ├── selenium_fetcher.py           # HtmlLLM-detector method for fetching HTML content (optional)
│   ├── template.csv                  # template for the basic test
│   └── XGBoostClassifier.pickle.dat  # XGBoost-detector model weights
├── template.json                     # template for the basic test
├── prompt_defense.py                 # prompt-based defense code
└── main.py                           # runs the defense (uses the HtmlLLM-detector (ours) by default)
```
- Install all required packages according to your environment (`pip install -r requirement.txt`).
- Enter the `openai_api_key` and `openai_base_url` parameters within the `main.py` file.
- Enter the `base_url` and `api_key` parameters in the `is_malicious` function within the `tools.py` file.
- Enter the `base_url` and `api_key` parameters in the `prompt_defense.py` file (a sketch of how these parameters are typically used follows this list).
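As a rough illustration of where these credentials go (a sketch only; the exact variable names are defined inside `main.py`, `tools.py`, and `prompt_defense.py` and may differ), an OpenAI-compatible client is typically constructed like this:

```python
# Illustrative sketch only: wiring OpenAI-compatible credentials.
# The real variable names live in main.py, tools.py, and prompt_defense.py.
from openai import OpenAI

openai_api_key = "sk-..."                      # your API key
openai_base_url = "https://api.openai.com/v1"  # or any OpenAI-compatible endpoint

client = OpenAI(api_key=openai_api_key, base_url=openai_base_url)
```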
- Prepare `batch_result.csv` in the format below (use the `is_malicious` function to obtain the results and write them to this CSV file for batch comparison; a generation sketch follows this list). `phish_prediction` is the result of the PhishLLM-detector, while `malicious` is the result of our method, the HtmlLLM-detector.

  ```csv
  url,phish_prediction,malicious
  https://example0.com,benign,False
  https://example1.com,benign,True
  ```
- Prepare `input.json`:

  ```json
  [
    {
      "LLM": "The platform name",
      "Query": "The Query",
      "Risk": "main",
      "content": {
        "output": "The output of AIPSE",
        "resource": [
          "https://example0.com",
          "https://example1.com"
        ]
      }
    }
  ]
  ```
- Basic Test Run

  We provide all template files. After entering the parameters in the `main.py`, `tools.py`, and `prompt_defense.py` files, you can run a basic test with:

  ```bash
  python main.py
  python prompt_defense.py
  ```

  You can switch to a different detector by changing the `current_url_detector_function` parameter in the `return_tools` function in the `tools.py` file. After the basic test finishes, a `template_output.json` file is generated automatically for verification.
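Below is a minimal sketch of how `batch_result.csv` could be produced. It assumes `is_malicious` takes a URL and returns `True`/`False`, and that the PhishLLM-detector predictions have been obtained separately (e.g., from the PhishLLM repository); adapt the import path and signature to the actual code in `src/tools.py`.

```python
# Sketch: build batch_result.csv for batch comparison.
# Assumptions: is_malicious(url) -> bool (HtmlLLM-detector), and the PhishLLM
# predictions ("benign"/"phish") are already available per URL.
import csv

from src.tools import is_malicious  # adjust the import path to your layout

urls = ["https://example0.com", "https://example1.com"]
phishllm_predictions = {url: "benign" for url in urls}  # placeholder PhishLLM results

with open("batch_result.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "phish_prediction", "malicious"])
    for url in urls:
        writer.writerow([url, phishllm_predictions[url], is_malicious(url)])
```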
This is not included in our paper, but we have implemented this feature. You can test it directly by changing the `return_tools` function in `tools.py`.
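For illustration, such a switch could look like the sketch below. This is an assumption about the shape of the code, not the repository's actual implementation: the real `return_tools` and detector signatures are defined in `src/tools.py` and `src/utils.py`.

```python
# Sketch: selecting which URL detector the agent's safety tool wraps.
# Names other than is_malicious and current_url_detector_function are
# illustrative assumptions, not the repository's exact API.
from src.tools import is_malicious  # HtmlLLM-detector (ours)

def xgboost_detector(url: str) -> bool:
    """Placeholder standing in for the XGBoost-detector entry point in src/utils.py."""
    raise NotImplementedError("wire this to the XGBoost-detector in utils.py")

# return_tools builds the tool around whichever detector is selected here.
current_url_detector_function = is_malicious       # default: HtmlLLM-detector
# current_url_detector_function = xgboost_detector # switch to the XGBoost-detector

def url_safety_tool(url: str) -> bool:
    """Flag a URL as malicious using the currently selected detector."""
    return bool(current_url_detector_function(url))
```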
If you find our work useful, please cite:

```bibtex
@inproceedings{UnsafeSearch2025,
  title     = {Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search},
  author    = {Zeren Luo and Zifan Peng and Yule Liu and Zhen Sun and Mingchen Li and Jingyi Zheng and Xinlei He},
  booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
  publisher = {USENIX},
  year      = {2025}
}
```