- Official Website: https://dacon.io/competitions/official/236473/overview/description
With the recent advancement of Generative AI, particularly Large Language Models (LLMs), it has become increasingly difficult to distinguish between AI-generated and human-written text. To address societal issues like the spread of misinformation and public opinion manipulation, this project aims to develop an AI model that predicts the probability of a given text being generated by AI.
The goal is to develop reliable AI-generated content detection technology, contributing to the responsible use of AI and restoring trust in digital information.
- Objective: Develop an AI model to predict the probability (from 0 to 1) that a given paragraph of text was written by a generative AI.
- Unique Labeling Scheme:
  - Training Data: Labeled at the full-text level. If even a single paragraph in a document is AI-generated, the entire document is labeled as 'AI-written (1)'. Paragraph-level labels are not provided.
  - Evaluation Data: Provided at the paragraph level. The model must submit a probability for each individual paragraph.
- Core Challenge: Paragraph-level predictions must be learned from document-level weak labels.
- Additional Rule: Using context from other paragraphs within the same document (grouped by `title`) is permitted and encouraged for inference.
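
The weak-label setup described above resembles multiple-instance learning: a document is positive if at least one of its paragraphs is AI-generated. One common way to connect paragraph-level probabilities to a document-level label is noisy-OR pooling. The following is a minimal sketch under that assumption, not necessarily the method used in this project; the function names (`noisy_or`, `document_bce`, `group_by_title`) and the sample rows are hypothetical.

```python
import math
from collections import defaultdict


def noisy_or(paragraph_probs):
    """Probability that at least one paragraph is AI-generated,
    assuming paragraphs are independent (an illustrative assumption)."""
    p_all_human = 1.0
    for p in paragraph_probs:
        p_all_human *= (1.0 - p)
    return 1.0 - p_all_human


def document_bce(paragraph_probs, doc_label, eps=1e-7):
    """Binary cross-entropy between the pooled document probability
    and the document-level weak label (0 = human, 1 = AI-written)."""
    p_doc = min(max(noisy_or(paragraph_probs), eps), 1.0 - eps)
    return -(doc_label * math.log(p_doc)
             + (1 - doc_label) * math.log(1.0 - p_doc))


def group_by_title(rows):
    """Group (title, paragraph_probability) pairs by document title,
    so paragraphs of the same document can be processed together."""
    groups = defaultdict(list)
    for title, prob in rows:
        groups[title].append(prob)
    return dict(groups)


# Hypothetical paragraph-level predictions for two documents.
rows = [("Doc A", 0.9), ("Doc B", 0.1), ("Doc A", 0.2)]
for title, probs in group_by_title(rows).items():
    print(title, round(noisy_or(probs), 3))
```

With this pooling, a single confident paragraph is enough to push the document probability toward 1, which matches the labeling rule that one AI-generated paragraph marks the whole document as AI-written.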
1 | 2 | 3 | 4 |
---|---|---|---|
Tae-Min, Kim | Jun-Hyuk, Seo (BuAs) | Jae-Hyun, Jo | Geon-Woo, Yoo |
🏆 Award: Grand Prize (IITP President's Award)
The model achieved the following scores on the competition's official leaderboard, securing the second-place position.
Metric | Public Score | Private Score (Final) |
---|---|---|
ROC AUC | 0.9381 | 0.9323 |
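
For readers unfamiliar with the metric, ROC AUC can be read as the probability that a randomly chosen AI-written paragraph receives a higher score than a randomly chosen human-written one. The sketch below computes it directly from that pairwise definition; it is an illustration of the standard metric, not the competition's official scoring script, and the function name `roc_auc` and the sample values are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC from its pairwise definition: the fraction of
    (positive, negative) pairs ranked correctly, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = AI-written) and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ranked correctly
```

Because the metric depends only on the ranking of scores, the submitted probabilities do not need to be calibrated to score well; only their relative order matters. The quadratic loop is fine for illustration but would be slow on a full test set.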