A hands-on AI red teaming project exploring prompt injection, encoding bypass, jailbreaks, and embedded attacks across modern LLMs like DeepSeek AI, Grok AI and CoPilot
This repository presents a hands-on overview of adversarial testing techniques used against large language models (LLMs) and enterprise AI tools. It covers real-world vulnerabilities including prompt injection, encoding bypass, jailbreaks, and embedded attacks—along with actionable mitigations.
Explore how attacks were crafted and mitigated:
Test Type | Description |
---|---|
🧠 Prompt Injection | Override system instructions and manipulate AI behavior. |
🔐 Encoding Bypass | Evade filters using Base64 and obfuscation. |
🧩 Crescendo Jailbreak | Bypass safety by extracting info step-by-step. |
📄 Embedded Prompt Injection | Hide malicious instructions in documents or emails. |
- Copilot for Microsoft 365
- DeepSeek AI
- Grok AI
- Red Team AI systems regularly to uncover emerging risks.
- Apply behavioral and context-aware filtering beyond simple prompt matching.
- Sanitize user input and file content before LLM processing.
- Establish security governance frameworks for internal AI tools.
- Fairness
- Reliability & Safety
- Privacy & Security
- Inclusiveness
- Transparency
- Accountability