TrustAIRLab/Unsafe-LLM-Based-Search

Official repository for "Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search".

License: Apache 2.0 | Python 3.10+ | arXiv

[Teaser preview figure]

Introduction

This repository provides the agent framework for the Risk Mitigation part of our paper. The XGBoost-detector and PhishLLM-detector are included for comparison. The code for the PhishLLM-detector can be found at: https://github.com/code-philia/PhishLLM

Project Structure

agent_defense/
├── src/
│   ├── agent.py                     # build_agent
│   ├── llm.py                       # a discarded trial of using a special API call
│   ├── prompt.py                    # prompts
│   ├── tools.py                     # tool calling (change the tools by modifying the `return_tools` function; the HtmlLLM-detector's prompt is in the `is_malicious` function)
│   ├── utils.py                     # XGBoost-detector method
│   ├── selenium_fetcher.py          # HtmlLLM-detector method for fetching HTML content (optional)
│   ├── template.csv                 # template for the basic test
│   └── XGBoostClassifier.pickle.dat # XGBoost-detector model weights
├── template.json                    # template for the basic test
├── prompt_defense.py                # prompt-based defense code
└── main.py                          # runs the defense (uses our HtmlLLM-detector by default)

How to Run

Setup

  1. Install the required packages for your environment (pip install -r requirement.txt).
  2. Set the openai_api_key and openai_base_url parameters in main.py (see the sketch after this list).
  3. Set the base_url and api_key parameters in the is_malicious function in tools.py.
  4. Set the base_url and api_key parameters in prompt_defense.py.
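
The sketch below illustrates what filling in these parameters might look like. The parameter names are taken from this README, but the exact placement inside main.py, tools.py, and prompt_defense.py is an assumption and may differ from the actual code.

    # Hypothetical illustration of the configuration values; the exact layout of
    # main.py, tools.py, and prompt_defense.py in this repository may differ.

    # main.py
    openai_api_key = "sk-..."                        # your OpenAI-compatible API key
    openai_base_url = "https://api.openai.com/v1"    # or any OpenAI-compatible endpoint

    # tools.py (inside is_malicious) and prompt_defense.py
    api_key = "sk-..."
    base_url = "https://api.openai.com/v1"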

For Batch Comparison (Shown in Our Paper)

  1. Prepare batch_result.csv in the format below. For batch comparison, you need to run the is_malicious function on each URL and write the results to this CSV file (a sketch is given after this list):

    phish_prediction is the result of the PhishLLM-detector, while malicious is the result of our method, the HtmlLLM-detector.

    url,phish_prediction,malicious
    https://example0.com,benign,False
    https://example1.com,benign,True
    
  2. Prepare the input.json file in the format below:

    [
        {
            "LLM": "The platform name",
            "Query": "The Query",
            "Risk": "main",
            "content": {
                "output": "The output of AIPSE",
                "resource": [
                    "https://example0.com",
                    "https://example1.com"
                ]
            }
        }
    ]
  3. Basic Test Run

    We provide all template files. After entering the parameters in main.py, tools.py, and prompt_defense.py, you can run a basic test with:

    python main.py
    python prompt_defense.py

    You can use a different detector by changing the current_url_detector_function parameter of the return_tools function in tools.py. Running the basic test automatically generates a template_output.json file for verification.
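
As referenced in step 1, the following is a minimal sketch of how batch_result.csv could be assembled from the detector outputs. It assumes is_malicious(url) in src/tools.py returns a boolean verdict and that the PhishLLM-detector predictions are obtained separately; the real signatures and import paths in this repository may differ.

    # Hypothetical sketch: assemble batch_result.csv from detector verdicts.
    # Assumes is_malicious(url) returns True/False and that the PhishLLM-detector
    # predictions are already available; adjust to the actual signatures in tools.py.
    import csv

    from src.tools import is_malicious  # assumes this is run from the agent_defense/ directory

    urls = ["https://example0.com", "https://example1.com"]
    phish_predictions = {u: "benign" for u in urls}  # placeholder PhishLLM-detector outputs

    with open("batch_result.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "phish_prediction", "malicious"])
        for url in urls:
            writer.writerow([url, phish_predictions[url], is_malicious(url)])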

For Single Query

This feature is not included in our paper, but we have implemented it. You can test it directly by changing the return_tools function in tools.py; a sketch of what such a change might look like follows.
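
The sketch below is only a hypothetical illustration of how return_tools could expose a configurable URL detector; the real implementation in src/tools.py differs, and xgboost_detect is an invented placeholder name for the XGBoost-detector entry point in utils.py.

    # Hypothetical sketch of src/tools.py; the real implementation differs.
    # It only illustrates swapping current_url_detector_function between detectors.
    from src.utils import xgboost_detect    # placeholder name for the XGBoost-detector

    def is_malicious(url: str) -> bool:
        """HtmlLLM-detector (default); the real prompt-based check lives in tools.py."""
        ...

    def return_tools(current_url_detector_function=is_malicious):
        def check_url(url: str) -> bool:
            """Return True if the chosen detector judges the URL malicious."""
            return bool(current_url_detector_function(url))
        return [check_url]

    # To use the XGBoost-detector instead of the default HtmlLLM-detector:
    tools = return_tools(current_url_detector_function=xgboost_detect)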

Citation

@inproceedings{UnsafeSearch2025,
  title     = {Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search},
  author    = {Zeren Luo and Zifan Peng and Yule Liu and Zhen Sun and Mingchen Li and Jingyi Zheng and Xinlei He},
  booktitle = {{34th USENIX Security Symposium (USENIX Security 25)}},
  publisher = {USENIX},
  year      = {2025}
}
