Skip to content

A hands-on AI red teaming project exploring prompt injection, encoding bypass, jailbreaks, and embedded attacks across modern LLMs like DeepSeek AI, Grok AI and CoPilot

Notifications You must be signed in to change notification settings

ReshmiMehta14/Adversarial-AI-Red-teaming

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 

Repository files navigation

Adversarial-AI-Red-teaming

A hands-on AI red teaming project exploring prompt injection, encoding bypass, jailbreaks, and embedded attacks across modern LLMs like DeepSeek AI, Grok AI and CoPilot

🛡️ AI Security: Adversarial Attacks, Risks & Mitigations

This repository presents a hands-on overview of adversarial testing techniques used against large language models (LLMs) and enterprise AI tools. It covers real-world vulnerabilities including prompt injection, encoding bypass, jailbreaks, and embedded attacks—along with actionable mitigations.


📌 Use Cases

Explore how attacks were crafted and mitigated:

Test Type Description
🧠 Prompt Injection Override system instructions and manipulate AI behavior.
🔐 Encoding Bypass Evade filters using Base64 and obfuscation.
🧩 Crescendo Jailbreak Bypass safety by extracting info step-by-step.
📄 Embedded Prompt Injection Hide malicious instructions in documents or emails.

🧪 LLMS Tested

  • Copilot for Microsoft 365
  • DeepSeek AI
  • Grok AI

🛠️ Recommendations for AI Security

  • Red Team AI systems regularly to uncover emerging risks.
  • Apply behavioral and context-aware filtering beyond simple prompt matching.
  • Sanitize user input and file content before LLM processing.
  • Establish security governance frameworks for internal AI tools.

🔒 AI Security Pillars

  • Fairness
  • Reliability & Safety
  • Privacy & Security
  • Inclusiveness
  • Transparency
  • Accountability

Connect with me on [linkedin] (https://www.linkedin.com/in/reshmimehta/)

About

A hands-on AI red teaming project exploring prompt injection, encoding bypass, jailbreaks, and embedded attacks across modern LLMs like DeepSeek AI, Grok AI and CoPilot

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published