
If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

RepreGuard

This repository provides the official implementation of RepreGuard, a robust and efficient framework for detecting LLM-generated text (LGT) by leveraging hidden representation patterns. RepreGuard achieves state-of-the-art performance in both in-domain and out-of-domain settings.


📣 News

  • [2025.08.01] 🎉 Our paper has been accepted by Transactions of the Association for Computational Linguistics (TACL 2025)!

🧐 Overview

RepreGuard is based on the hypothesis that the internal representation patterns of LLMs, when processing human-written texts (HWT) and LLM-generated texts (LGT), are distinct and can be systematically modeled. By employing a surrogate model as an observer, we extract neural activation patterns and identify discriminative features. The resulting RepreScore enables robust classification between HWT and LGT with minimal training data.

  • Zero-shot detection: Only a small sample of LGT/HWT pairs is needed for threshold calibration.
  • Strong OOD robustness: Outperforms all previous methods across different models, domains, text sizes, and attacks.
  • Resource-efficient: Competitive performance even with smaller surrogate models.
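
The sketch below illustrates the core idea in Python: use a surrogate model to extract hidden-state activations for paired HWT/LGT samples, fit a discriminative direction, and score new texts by projecting their activations onto it. This is a minimal conceptual sketch, not the official implementation; the function names, mean-pooling, and single-component PCA step are illustrative assumptions (see repreGuard_evaluation.py for the actual pipeline).

# Conceptual sketch of representation-based scoring (illustrative, not the official code).
import numpy as np
import torch
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # surrogate "observer" model, as in the example below
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto", output_hidden_states=True
)

@torch.no_grad()
def mean_hidden_state(text: str, layer: int = -1) -> np.ndarray:
    """Mean-pooled hidden state of one text at a chosen layer."""
    ids = tok(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    hidden = model(**ids).hidden_states[layer]  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).float().cpu().numpy()

def fit_direction(hwt_texts, lgt_texts) -> np.ndarray:
    """Fit a discriminative direction from paired HWT/LGT activations (illustrative)."""
    diffs = [mean_hidden_state(l) - mean_hidden_state(h) for h, l in zip(hwt_texts, lgt_texts)]
    return PCA(n_components=1).fit(np.stack(diffs)).components_[0]

def repre_score(text: str, direction: np.ndarray) -> float:
    """Project a text's activation onto the direction; higher means more LGT-like here."""
    return float(mean_hidden_state(text) @ direction)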

⚙️ Datasets, Environment and Experimental Reproduction

Datasets

We use two datasets. The DetectRL dataset (XSum, Writing Prompts, Yelp Review, and arXiv abstracts) covers diverse domains and contains HWT/LGT pairs generated by ChatGPT, Claude, Google-PaLM, and Llama-2-70b. The RAID dataset (llama-chat, mistral-chat, mpt-chat, mistral, mpt, and gpt2) covers both greedy and sampling decoding strategies, with and without repetition penalties. The data is available at https://drive.google.com/file/d/1mdTBSRLTo7ob-ofithewt8eDTEzZpb7V/view

wget "https://drive.google.com/uc?id=1mdTBSRLTo7ob-ofithewt8eDTEzZpb7V" -O datasets.zip
unzip datasets.zip
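
After unzipping, a quick sanity check in Python (assuming the files are plain JSON, as the extension suggests) confirms the data is in place:

# Quick sanity check on one of the dataset files (path taken from the command below).
import json

path = "datasets/detectrl_dataset/main_dataset/detectrl_train_dataset_llm_type_ChatGPT.json"
with open(path) as f:
    data = json.load(f)
print(f"Loaded {len(data)} records from {path}")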

Environment

conda create -n repre_guard python==3.10
conda activate repre_guard
pip install -r requirements.txt
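
Optionally, verify the key dependencies after installation. This assumes requirements.txt pins torch and transformers, which loading the surrogate models requires:

# Optional environment check (assumes torch and transformers are in requirements.txt).
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())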

Running RepreGuard

python3 repreGuard_evaluation.py \
    --model_name_or_path meta-llama/Llama-3.1-8B \
    --train_data_path datasets/detectrl_dataset/main_dataset/detectrl_train_dataset_llm_type_ChatGPT.json \
    --test_data_paths datasets/detectrl_dataset/main_dataset/detectrl_test_dataset_llm_type_ChatGPT.json,datasets/detectrl_dataset/main_dataset/detectrl_test_dataset_llm_type_Google-PaLM.json,datasets/detectrl_dataset/main_dataset/detectrl_test_dataset_llm_type_Claude-instant.json,datasets/detectrl_dataset/main_dataset/detectrl_test_dataset_llm_type_Llama-2-70b.json \
    --ntrain 512 \
    --batch_size 8 \
    --rep_token 0.1 \
    --bootstrap_iter -1

Surrogate Model Selection

You can specify the surrogate LLM (e.g., Llama-3-8B, Phi-2, Gemma-2B-Instruct) via the --model_name_or_path argument, as in the example above.
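
To compare surrogates, one option is a small driver script that re-runs the documented command for each candidate, as in the sketch below. The Hugging Face model identifiers are assumptions (substitute the IDs you actually use); the flags mirror the example above.

# Illustrative driver: re-run the evaluation with different surrogate models.
import subprocess

SURROGATES = [
    "meta-llama/Llama-3.1-8B",  # default from the example above
    "microsoft/phi-2",          # assumed Hugging Face ID for Phi-2
    "google/gemma-2b-it",       # assumed Hugging Face ID for Gemma-2B-Instruct
]

for surrogate in SURROGATES:
    subprocess.run([
        "python3", "repreGuard_evaluation.py",
        "--model_name_or_path", surrogate,
        "--train_data_path", "datasets/detectrl_dataset/main_dataset/detectrl_train_dataset_llm_type_ChatGPT.json",
        "--test_data_paths", "datasets/detectrl_dataset/main_dataset/detectrl_test_dataset_llm_type_ChatGPT.json",
        "--ntrain", "512",
        "--batch_size", "8",
        "--rep_token", "0.1",
        "--bootstrap_iter", "-1",
    ], check=True)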


🧪 Overall Results

Detector     ID      OOD     16-shots   Text w/ Attack   Text w/ Various Size   Text w/ Various Sampling Methods
RoBERTa      90.85   84.26   83.60      72.97            46.81                  82.17
Binoculars   88.18   86.19   88.07      78.15            81.70                  85.63
RepreGuard   96.34   93.49   90.21      92.61            84.61                  92.05
  • Attack: paraphrase and perturbation attacks.

  • Various Size: text lengths from 64 to 256 tokens.

  • Various Sampling Methods: greedy and sampling decoding strategies, with and without repetition penalties.

Please see the paper for more detailed results.

✏️ Citation

If you find our paper/code useful, please cite us and give us a ⭐!

@article{chen2025repreguard,
  author       = {Xin Chen and Junchao Wu and Shu Yang and Runzhe Zhan and Zeyu Wu and Ziyang Luo and Di Wang and Min Yang and Lidia S. Chao and Derek F. Wong},
  title        = {RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns},
  journal      = {Transactions of the Association for Computational Linguistics},
  year         = {2025},
  url          = {https://github.com/NLP2CT/RepreGuard},
  note         = {Accepted at TACL 2025}
}
