LLM - Detect AI Generated Text (Visual Analytics - UPF)

To replicate the development environment, run the following commands (you can replace the environment name vis_analytics with any name you like):

conda env create --name vis_analytics --file environment.yml
conda activate vis_analytics

If your system supports CUDA, we recommend uncommenting both the pytorch-cuda requirement and the nvidia channel in the .yml file.
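After creating the environment, a quick check like the following can confirm whether the installed PyTorch build actually sees a CUDA device. It is a minimal sketch that assumes PyTorch is the library using the GPU. With a CPU-only PyTorch build it always reports False, and it also returns False when PyTorch is not installed at all.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    # Avoid an ImportError when PyTorch is missing from the environment.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA available:", cuda_available())
```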

Alternatively, we also provide a pip requirements.txt file. Note that the project was developed with Python 3.11; we have not tested whether the code works with other versions of Python.

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
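Since only Python 3.11 has been tested, a small guard like the hypothetical `is_tested_python` helper below can warn early when the active interpreter differs. It is a sketch, not part of the repository.

```python
import sys

# The project has only been developed and tested with Python 3.11.
TESTED_VERSION = (3, 11)


def is_tested_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor version matches the tested one."""
    return tuple(version_info[:2]) == TESTED_VERSION


if __name__ == "__main__":
    if not is_tested_python():
        print(
            f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
            "the project has only been tested with 3.11.",
            file=sys.stderr,
        )
```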

Then you can run the Streamlit app as follows:

streamlit run app/Welcome.py

The first execution of the app will download our pre-trained model, the AI-generated text detection model from SimpleAI, and some of the datasets we use. Depending on your internet connection, this process can take several minutes, so please be patient.

These files are too large to be uploaded to GitHub.
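The download-on-first-run behaviour boils down to a download-once cache. The standard-library sketch below illustrates the idea; `fetch_once` is a hypothetical helper rather than the app's actual code, and the real URLs and cache locations are not specified here.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_once(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless `dest` already exists, then return `dest`."""
    dest = Path(dest)
    if dest.exists():
        return dest  # Cached by a previous run: skip the download.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary ".part" file first so that an interrupted download
    # is never mistaken for a complete one on the next run.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # Move into place only after the download has finished.
    return dest
```

A second call with the same `dest` returns immediately without touching the network, which is why only the first launch is slow.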

Project description and expected benefits

The goal of the project is to build a classification model that can accurately detect whether a text was written by an LLM or by a student. The model is intended to improve plagiarism detection tools in the new learning context shaped by AI. The project was motivated by the LLM - Detect AI Generated Text competition hosted on Kaggle.

Since we do not expect to achieve high accuracy (especially in such a short time), we will put a strong emphasis on explainable AI. The idea is to build a tool that flags potential plagiarism candidates and leaves the final call to a human (normally a professor). To support this decision, the professor will be shown why the model predicted plagiarism (e.g. using SHAP).
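SHAP approximates Shapley values. For a short text they can be computed exactly by enumerating word subsets, which makes the idea concrete. The sketch below is illustrative only: `toy_ai_score` is a hypothetical stand-in for the detector's output, and "removing" a word simply drops it from the text, which is one of several possible masking choices.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    words: Sequence[str], score: Callable[[list[str]], float]
) -> list[float]:
    """Exact Shapley value of each word with respect to `score`.

    The value of a coalition S is the score of the text that keeps only the
    words in S, in their original order. Enumerating every coalition costs
    O(2^n) score calls, so this only works for a handful of words.
    """
    n = len(words)

    def value(indices: tuple[int, ...]) -> float:
        return score([words[k] for k in sorted(indices)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def toy_ai_score(tokens: list[str]) -> float:
    """Hypothetical stand-in for a detector: adds a fixed weight per marker word."""
    weights = {"delve": 0.5, "furthermore": 0.3}
    return sum(weights.get(t.lower(), 0.0) for t in tokens)


if __name__ == "__main__":
    words = "Furthermore we delve into essays".split()
    print([round(v, 3) for v in shapley_values(words, toy_ai_score)])
    # → [0.3, 0.0, 0.5, 0.0, 0.0]
```

Because the toy score is additive, each word's Shapley value equals its own weight and the values sum to the full-text score. A real detector is not additive, which is exactly where Shapley attributions become informative.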

Required data sources

Jules King, Perpetual Baffour, Scott Crossley, Ryan Holbrook, Maggie Demkin. (2023). LLM - Detect AI Generated Text. Kaggle. https://kaggle.com/competitions/llm-detect-ai-generated-text

Augmented data for LLM - Detect AI Generated Text (Kaggle dataset): https://www.kaggle.com/datasets/jdragonxherrera/augmented-data-for-llm-detect-ai-generated-text/data
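A natural first exploratory step is to check how many essays of each class the training data contains. The sketch below assumes the training file is a CSV with a `generated` column (1 = AI-generated, 0 = student-written); the column name, the label encoding, and the helper names are assumptions to verify against the downloaded data.

```python
import csv
from collections import Counter
from pathlib import Path


def label_counts(csv_path: Path, label_column: str = "generated") -> Counter:
    """Count rows per label value in a CSV file with a header row.

    Assumes (unverified) that `label_column` holds 1 for AI-generated essays
    and 0 for student-written ones.
    """
    # newline="" lets the csv module handle essays that contain line breaks.
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row[label_column] for row in csv.DictReader(f))


def format_balance(counts: Counter) -> list[str]:
    """Describe each label's row count and its share of all rows."""
    total = sum(counts.values())
    return [
        f"label={label}: n={n} ({n / total:.1%})"
        for label, n in sorted(counts.items())
    ]
```

Usage would look like `format_balance(label_counts(Path("data/train_essays.csv")))`, where the file path is hypothetical. A heavily skewed split would argue for class weighting, resampling, or additional generated essays.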

Expected results/delivery/output

The delivery of the project will include:

- Python scripts with:
  - Exploratory Data Analysis and feature engineering
  - Training a Machine Learning model to classify texts as AI-generated or human-written
  - Model performance evaluation
- A Streamlit webapp to present the results and use Explainable AI plots to interpret predictions.
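As a point of reference for the training and evaluation scripts, here is a deliberately simple baseline: a multinomial Naive Bayes classifier written from scratch and scored with ROC AUC. It is a hypothetical sketch, not the project's pre-trained model; the class and function names are ours, and it assumes binary labels with 1 = AI-generated and both classes present in the training data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; deliberately simple for a baseline."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesDetector:
    """Multinomial Naive Bayes with Laplace smoothing.

    Labels are 1 for AI-generated and 0 for human-written text; both classes
    must appear in the training data.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Additive (Laplace) smoothing strength.

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesDetector":
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in (0, 1)}
        return self

    def _log_joint(self, text: str, c: int) -> float:
        """log P(c) plus the sum of log P(word | c) over known words in `text`."""
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.word_counts[c][w] + self.alpha) / denom)
            for w in tokenize(text)
            if w in self.vocab  # Words never seen in training are skipped.
        )

    def predict_proba(self, text: str) -> float:
        """Posterior probability that `text` is AI-generated (label 1)."""
        diff = self._log_joint(text, 0) - self._log_joint(text, 1)
        # Numerically stable form of 1 / (1 + exp(diff)).
        if diff > 0:
            z = math.exp(-diff)
            return z / (1.0 + z)
        return 1.0 / (1.0 + math.exp(diff))


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC: chance a random AI text outscores a random human text (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

ROC AUC is threshold-free, which suits a tool meant to rank candidates for a professor to review. In practice, a library implementation such as scikit-learn's `roc_auc_score` would replace the quadratic loop above.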

Visualization method

We will build a Streamlit webapp to present the methodologies used, the data analysis, the Machine Learning model and an interpretation of the results (Explainable AI). The webapp will be designed around storytelling techniques.
