zeinhasan/AEGIS-Shield

AEGIS Shield: Autonomous Evaluation and Guardian Intelligence System

AEGIS Shield is a modular, high-performance validation service engineered with LangGraph to perform parallel input validation using a suite of AI-powered evaluators. It acts as an intelligent guardian for larger AI systems, designed to preemptively identify and neutralize potentially harmful or non-compliant content, including spam, toxic language, harassment, and unsolicited financial advice.

The system's core strength lies in its parallel architecture, which ensures that all validation checks are executed simultaneously. This approach dramatically reduces latency compared to traditional sequential methods and guarantees a comprehensive evaluation of user input against all defined safety rails before it reaches the main application logic.

The AEGIS Shield system utilizes the google/gemma-3-4b-it model as its core Large Language Model (LLM) for all validation and response generation tasks. This specific model is configured and loaded in the config.py file.


Model Configuration

  • Model ID: The system is explicitly configured to use MODEL_ID = "google/gemma-3-4b-it".
  • Loading Process: The model is loaded with 4-bit quantization (load_in_4bit=True) onto a GPU (device_map="auto") to optimize performance and reduce memory usage. It uses a float16 data type for both storage and computation (torch_dtype=torch.float16, bnb_4bit_compute_dtype=torch.float16).
  • Custom Invoker: A custom class, ManualGemmaInvoker, is implemented to handle inference. This class manages tokenization, templating, generation, and response cleaning specifically for the Gemma model's chat format.
  • Shared Instance: A single instance of the loaded model serves as both the validator_llm for content checks and the response_llm for generating rejection messages.
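Taken together, the settings above suggest a loading sketch along these lines. This is an illustrative reconstruction, not the repository's actual config.py: the `BitsAndBytesConfig` wiring and the `cache_dir="model_cache"` argument are assumptions; only `MODEL_ID` and the flags listed in the bullets come from this document.

```python
# Configuration sketch mirroring the bullets above (assumed wiring, not repo code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "google/gemma-3-4b-it"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights to cut GPU memory
    bnb_4bit_compute_dtype=torch.float16,  # float16 for the quantized matmuls
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, cache_dir="model_cache")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    torch_dtype=torch.float16,  # float16 storage dtype
    device_map="auto",          # place layers on available GPU(s)
    cache_dir="model_cache",
)
```

The same `model`/`tokenizer` pair would then back both the validator and response roles, matching the shared-instance design described above.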

Architecture πŸ›οΈ

The service is constructed with a modular LangGraph architecture that facilitates the parallel execution of multiple validation checks. This design champions high performance, scalability, and maintainability.

graph TD
    A[Client App / Main AI] -- HTTP POST --> B{FastAPI Endpoint};
    B -- user_id, question --> C[LangGraph State Manager];
    C -- Parallel Execution --> D[validate_spam];
    C -- Parallel Execution --> E[validate_toxic];
    C -- Parallel Execution --> F[validate_harassment];
    C -- Parallel Execution --> G[validate_financial];
    
    D --> H[Results Aggregator];
    E --> H;
    F --> H;
    G --> H;
    
    H --> I{All Validations Passed?};
    I -- Yes --> J[Success Response];
    I -- No --> K[AI Response Generator];
    K --> L[Rejection Response];
    
    J --> M[JSON Response];
    L --> M;
    M --> A;

    subgraph "AEGIS Shield Service"
        B
        C
        D
        E
        F
        G
        H
        I
        J
        K
        L
    end

File Structure πŸ“

The project is organized with a clear separation of concerns to enhance maintainability and scalability.

aegis-shield/
β”œβ”€β”€ main.py                 # FastAPI application server and API endpoints
β”œβ”€β”€ requirements.txt        # Python dependencies
β”œβ”€β”€ config.py               # Model configuration, LLM invoker, and validation check definitions
β”œβ”€β”€ nodes.py                # LangGraph node logic for validation, aggregation, and response generation
β”œβ”€β”€ graph.py                # LangGraph state management and workflow definition
└── promp_guard/            # Directory for validation prompt templates
    β”œβ”€β”€ spam.txt            # Prompts for spam detection
    β”œβ”€β”€ toxic.txt           # Prompts for toxic content detection
    β”œβ”€β”€ harassment.txt      # Prompts for harassment detection
    └── financial.txt       # Prompts for financial advice detection

LangGraph Flow Architecture πŸ”„

Parallel Validation Flow

The core of AEGIS Shield is the parallel execution of all validation checks, which significantly boosts performance over sequential validation.

graph LR
    START([START]) --> INIT[start_node]
    INIT --> PARALLEL{Parallel Execution}
    
    PARALLEL --> SPAM[validate_spam]
    PARALLEL --> TOXIC[validate_toxic] 
    PARALLEL --> HARASS[validate_harassment]
    PARALLEL --> FINANCIAL[validate_financial]
    
    SPAM --> AGG[aggregator_node]
    TOXIC --> AGG
    HARASS --> AGG
    FINANCIAL --> AGG
    
    AGG --> DECISION{Any Violations?}
    DECISION -->|No| SUCCESS[success_response_node]
    DECISION -->|Yes| FAILED[response_generator_node]
    
    SUCCESS --> END([END])
    FAILED --> END
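The fan-out/fan-in pattern in the diagram can be sketched with plain asyncio. This is a minimal stand-in, not the repository's LangGraph code: the function names and the simulated delay are illustrative, but it shows why end-to-end latency tracks the slowest check rather than the sum of all checks.

```python
# Minimal sketch of the parallel fan-out/fan-in above (illustrative names).
import asyncio

async def validate(check: str, question: str) -> tuple[str, bool]:
    # Stand-in for one LLM-backed validator call.
    await asyncio.sleep(0.01)
    return check, True

async def run_guards(question: str) -> dict[str, bool]:
    checks = ["spam", "toxic", "harassment", "financial_advice"]
    # All four checks start at once; total latency tracks the slowest one.
    results = await asyncio.gather(*(validate(c, question) for c in checks))
    return dict(results)

results = asyncio.run(run_guards("How do I top up my balance?"))
```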

State Management

Each node in the graph processes and updates a shared GraphState, a TypedDict that ensures data consistency throughout the workflow.

class GraphState(TypedDict):
    user_id: str
    question: str
    validation_results: Annotated[Dict[str, bool], combine_validation_results]
    rejection_reason: str
    final_status: str
    ai_response: str

  • user_id: Unique identifier for the user.
  • question: The user-provided text to be validated.
  • validation_results: A dictionary that accumulates the boolean results from each validator node.
  • rejection_reason: A comma-separated string of failed validation keys.
  • final_status: The overall outcome, either "passed" or "failed".
  • ai_response: The AI-generated message for the end-user, populated only if validation fails.
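The `Annotated` reducer on validation_results is what lets parallel branches write to the same field safely. A plausible implementation of `combine_validation_results` is a non-mutating dict merge (an assumption about the repo's code, but the standard pattern for this kind of LangGraph reducer):

```python
# Plausible reducer for the Annotated validation_results field; LangGraph
# would call it to merge partial results arriving from parallel branches.
from typing import Dict

def combine_validation_results(
    existing: Dict[str, bool], new: Dict[str, bool]
) -> Dict[str, bool]:
    # Merge without mutating either branch's partial state.
    return {**(existing or {}), **(new or {})}

merged = combine_validation_results({"spam": True}, {"toxic": False})
```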

Setup & Installation πŸš€

Follow these steps to get the AEGIS Shield API running.

1. Prerequisites

  • Python 3.10+
  • NVIDIA GPU with CUDA support for local model hosting

2. Install Dependencies

Install the required Python libraries.

pip install -r requirements.txt

3. Model Cache

Upon first run, the script will download the google/gemma-3-4b-it model into a model_cache directory. This may take some time depending on your internet connection.

4. Run the Service

Start the FastAPI server using Uvicorn.

python main.py

The API will be live and accessible at http://localhost:8000.


API Reference πŸ“–

Endpoint: POST /validate

Executes the complete parallel validation workflow and returns a detailed report.

Request Body (application/json)

| Field    | Type   | Description                       | Required? |
|----------|--------|-----------------------------------|-----------|
| user_id  | string | A unique identifier for the user. | Yes       |
| question | string | The text content to be validated. | Yes       |

Response Body

For a FAILED validation (the Indonesian ai_response translates to: "Sorry, I cannot help with requests related to spam because they do not comply with our policy."):

{
  "user_id": "user-123",
  "question": "HAIIII BODOH",
  "case": "spam, toxic, harassment",
  "execution_per_step": [
    { "step": "spam", "status": "failed" },
    { "step": "toxic", "status": "failed" },
    { "step": "harassment", "status": "failed" },
    { "step": "financial_advice", "status": "passed" }
  ],
  "status_guard": "failed",
  "ai_response": "Maaf, saya tidak dapat membantu permintaan terkait spam karena tidak sesuai dengan kebijakan kami."
}

For PASSED validation:

{
  "user_id": "user-456",
  "question": "Bagaimana cara top up saldo?",
  "case": "aman",
  "execution_per_step": [
    { "step": "spam", "status": "passed" },
    { "step": "toxic", "status": "passed" },
    { "step": "harassment", "status": "passed" },
    { "step": "financial_advice", "status": "passed" }
  ],
  "status_guard": "passed",
  "ai_response": ""
}
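The shape of this report follows directly from the per-check booleans accumulated in validation_results. The aggregation step could be sketched as below; the helper name is hypothetical, the field names mirror the API, and the LLM-generated ai_response is filled in by the separate response-generator node, so it is omitted here.

```python
# Sketch of turning per-check booleans into the report shape above
# (hypothetical helper; field names mirror the documented API).
from typing import Dict, List

def build_report(user_id: str, question: str, results: Dict[str, bool]) -> dict:
    failed: List[str] = [k for k, ok in results.items() if not ok]
    return {
        "user_id": user_id,
        "question": question,
        # "aman" ("safe") when every check passed, else the failed check keys
        "case": "aman" if not failed else ", ".join(failed),
        "execution_per_step": [
            {"step": k, "status": "passed" if ok else "failed"}
            for k, ok in results.items()
        ],
        "status_guard": "passed" if not failed else "failed",
        # "ai_response" is added afterwards by the response-generator node.
    }
```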

Endpoint: POST /chat

A simplified endpoint that provides a direct, user-facing AI response. It is ideal for integrations where only the final outcome is needed.

Request Body (application/json)

{
  "user_id": "user-123",
  "question": "Your question here"
}

Response Body

For a FAILED validation (the Indonesian response translates to: "Sorry, I cannot help with impolite language..."):

{
  "response": "Mohon maaf, saya tidak dapat membantu dengan bahasa yang tidak sopan...",
  "status": "failed",
  "can_proceed": false
}

For PASSED validation:

{
  "response": "",
  "status": "passed",
  "can_proceed": true
}

Endpoint: GET /health

A standard health check endpoint for service monitoring.

Response Body

{
  "status": "healthy",
  "message": "Guardian Validation API is running"
}

Configuration & Customization βš™οΈ

Adding New Validators

Adding a new validation check is straightforward:

  1. Create a prompt template: Add a new text file to the promp_guard/ directory.

    echo "Your new validation prompt here" > promp_guard/new_validator.txt
  2. Update configuration: Add the new check to the VALIDATION_CHECKS list in config.py.

    # In config.py
    VALIDATION_CHECKS = [
        # ... existing checks
        {"key": "new_validator", "path": str(CURRENT_DIR / "promp_guard/new_validator.txt")},
    ]

The graph in graph.py will automatically create and integrate the new validation node.
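The reason no graph changes are needed is that nodes can be derived from the configuration at build time. The following is an illustrative sketch of that registration pattern, not the repository's graph.py; the node bodies here just record which check ran, where the real nodes render the prompt file and call the LLM.

```python
# Illustrative sketch of config-driven node registration (not repo code).
VALIDATION_CHECKS = [
    {"key": "spam", "path": "promp_guard/spam.txt"},
    {"key": "new_validator", "path": "promp_guard/new_validator.txt"},
]

def make_validator(key: str, path: str):
    def node(state: dict) -> dict:
        # A real node would render the prompt file at `path` and call the
        # validator LLM; here we just emit a passing result for the check.
        return {"validation_results": {key: True}}
    node.__name__ = f"validate_{key}"
    return node

# One node per configured check; adding an entry to VALIDATION_CHECKS
# automatically yields a new parallel branch in the graph.
nodes = {
    f"validate_{c['key']}": make_validator(c["key"], c["path"])
    for c in VALIDATION_CHECKS
}
```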

Customizing the LLM

The model can be changed by updating the MODEL_ID in config.py. The ManualGemmaInvoker class is specifically designed for Gemma-based models but could be adapted for other Hugging Face transformers.


Key Features & Benefits πŸš€

  • πŸ”„ Parallel Processing: All validation checks run simultaneously, offering significant performance gains over sequential processing.
  • πŸ›‘οΈ Comprehensive Protection: Employs multiple specialized guards against spam, toxicity, harassment, and unsolicited financial advice.
  • πŸ€– AI-Powered Responses: Generates natural, contextual rejection messages that politely enforce content policies.
  • πŸ“Š Detailed Reporting: The /validate endpoint provides a complete breakdown of which checks passed or failed for full visibility.
  • πŸ”§ Developer-Friendly: Features a clean RESTful API, easy customization, and clear separation of concerns in the codebase.
  • Scalable: Because checks run in parallel, end-to-end latency stays roughly constant as validators are added (bounded by the slowest single check), although total compute still grows with each new guard.
