Welcome to my repository for the OpenAI gpt-oss-20b Red-Teaming Challenge!
This repo documents my participation in the Kaggle red-teaming competition focused on probing OpenAI's newly released gpt-oss-20b model for previously undiscovered vulnerabilities and harmful behaviors. The goal is to identify, document, and report up to five distinct issues, contributing to the safety and alignment of open-source AI models.
Interactive Notebook: Access the complete red-teaming notebook on Google Colab for hands-on experimentation with the gpt-oss-20b model.
Goals:
- Find previously unreported flaws and vulnerabilities in gpt-oss-20b
- Document exploits with reproducible reports and code
- Share insights to improve AI safety and alignment
Topics of interest:
- Reward hacking
- Deception & deceptive alignment
- Sabotage
- Inappropriate tool use
- Data exfiltration
- Sandbagging
- Evaluation awareness
- Chain of Thought issues
Deliverables:
- Kaggle Writeup (project summary, strategy, findings)
- Up to 5 findings files (JSON)
- (Optional) Reproduction notebook
- (Optional) Open-source tooling
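Each finding is submitted as a standalone JSON file. The snippet below sketches what one might contain; the field names here are hypothetical placeholders, not the official competition schema, which the Kaggle rules define.

```python
import json

# Hypothetical findings layout; field names are placeholders,
# not the official Kaggle schema.
finding = {
    "issue_title": "Example: model reveals hidden chain of thought",
    "topic_area": "Chain of Thought issues",
    "overview": "Short summary of the flaw and why it matters.",
    "steps_to_reproduce": [
        "Load gpt-oss-20b.",
        "Send the prompt recorded in the notebook.",
        "Observe the leaked reasoning in the output.",
    ],
}

with open("finding_1.json", "w") as f:
    json.dump(finding, f, indent=2)
```

Keeping one file per finding (up to five) makes it easy to attach each to the writeup independently.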
Timeline:
- Start: August 5, 2025
- End: August 26, 2025
Repository contents:
- Challange.txt: Full competition details and rules
- README.md: This file
- gpt_oss_20b_colab_final.ipynb: Main notebook with model setup and red-teaming experiments
- (To be added) Findings, additional notebooks, and tooling
I have just joined the challenge and will be updating this repository with:
- My discovery process and methodology
- Vulnerability findings and reports
- Reproducible code and notebooks
Stay tuned for updates as I progress through the competition!
Citation: D. Sculley, Samuel Marks, and Addison Howard. Red-Teaming Challenge - OpenAI gpt-oss-20b. Kaggle Competition, 2025.