AEGIS Shield is a modular, high-performance validation service engineered with LangGraph to perform parallel input validation using a suite of AI-powered evaluators. It acts as an intelligent guardian for larger AI systems, designed to preemptively identify and neutralize potentially harmful or non-compliant content, including spam, toxic language, harassment, and unsolicited financial advice.
The system's core strength lies in its parallel architecture, which ensures that all validation checks are executed simultaneously. This approach dramatically reduces latency compared to traditional sequential methods and guarantees a comprehensive evaluation of user input against all defined safety rails before it reaches the main application logic.
The AEGIS Shield system uses the `google/gemma-3-4b-it` model as its core Large Language Model (LLM) for all validation and response-generation tasks. The model is configured and loaded in the `config.py` file.

- Model ID: The system is explicitly configured to use `MODEL_ID = "google/gemma-3-4b-it"`.
- Loading Process: The model is loaded with 4-bit quantization (`load_in_4bit=True`) onto a GPU (`device_map="auto"`) to optimize performance and reduce memory usage. It uses a `float16` data type for both storage and computation (`torch_dtype=torch.float16`, `bnb_4bit_compute_dtype=torch.float16`).
- Custom Invoker: A custom class, `ManualGemmaInvoker`, handles inference. It manages tokenization, templating, generation, and response cleaning specifically for the Gemma model's chat format.
- Shared Instance: A single instance of the loaded model serves as both the `validator_llm` for content checks and the `response_llm` for generating rejection messages.
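For orientation, here is a minimal sketch of what that loading block in `config.py` might look like, assuming the standard Hugging Face `transformers` and `bitsandbytes` APIs; the quantization and dtype settings mirror the values above, while the loader class and variable names are illustrative assumptions:

```python
# Illustrative sketch of the model-loading portion of config.py; the actual
# code may differ. AutoModelForCausalLM is used here for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "google/gemma-3-4b-it"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute in float16
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, cache_dir="model_cache")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    torch_dtype=torch.float16,             # float16 storage
    device_map="auto",                     # place weights on the available GPU
    cache_dir="model_cache",
)

# A single ManualGemmaInvoker-style wrapper around this one model instance
# then serves as both validator_llm and response_llm.
```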
The service is constructed with a modular LangGraph architecture that facilitates the parallel execution of multiple validation checks. This design champions high performance, scalability, and maintainability.
```mermaid
graph TD
A[Client App / Main AI] -- HTTP POST --> B{FastAPI Endpoint};
B -- user_id, question --> C[LangGraph State Manager];
C -- Parallel Execution --> D[validate_spam];
C -- Parallel Execution --> E[validate_toxic];
C -- Parallel Execution --> F[validate_harassment];
C -- Parallel Execution --> G[validate_financial];
D --> H[Results Aggregator];
E --> H;
F --> H;
G --> H;
H --> I{All Validations Passed?};
I -- Yes --> J[Success Response];
I -- No --> K[AI Response Generator];
K --> L[Rejection Response];
J --> M[JSON Response];
L --> M;
M --> A;
subgraph "AEGIS Shield Service"
B
C
D
E
F
G
H
I
J
K
L
end
```
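The entry point of this flow is the FastAPI endpoint at the top of the diagram. A minimal sketch of what that handler in `main.py` could look like is shown below; only the `/validate` route, port 8000, and the `user_id`/`question` payload come from this document, while the exported graph object and handler details are assumptions:

```python
# Hypothetical skeleton of main.py: accept user_id/question, hand the request
# to the compiled LangGraph workflow, and return its aggregated result.
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

from graph import guardian_graph  # assumed export name for the compiled graph

app = FastAPI(title="AEGIS Shield")

class ValidationRequest(BaseModel):
    user_id: str
    question: str

@app.post("/validate")
async def validate(request: ValidationRequest) -> dict:
    # All validator nodes run in parallel inside the graph before the
    # aggregated report comes back.
    return guardian_graph.invoke(
        {"user_id": request.user_id, "question": request.question}
    )

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```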
The project is organized with a clear separation of concerns to enhance maintainability and scalability.
```
aegis-shield/
├── main.py              # FastAPI application server and API endpoints
├── requirements.txt     # Python dependencies
├── config.py            # Model configuration, LLM invoker, and validation check definitions
├── nodes.py             # LangGraph node logic for validation, aggregation, and response generation
├── graph.py             # LangGraph state management and workflow definition
└── promp_guard/         # Directory for validation prompt templates
    ├── spam.txt         # Prompts for spam detection
    ├── toxic.txt        # Prompts for toxic content detection
    ├── harassment.txt   # Prompts for harassment detection
    └── financial.txt    # Prompts for financial advice detection
```
The core of AEGIS Shield is the parallel execution of all validation checks, which significantly boosts performance over sequential validation.
```mermaid
graph LR
START([START]) --> INIT[start_node]
INIT --> PARALLEL{Parallel Execution}
PARALLEL --> SPAM[validate_spam]
PARALLEL --> TOXIC[validate_toxic]
PARALLEL --> HARASS[validate_harassment]
PARALLEL --> FINANCIAL[validate_financial]
SPAM --> AGG[aggregator_node]
TOXIC --> AGG
HARASS --> AGG
FINANCIAL --> AGG
AGG --> DECISION{Any Violations?}
DECISION -->|No| SUCCESS[success_response_node]
DECISION -->|Yes| FAILED[response_generator_node]
SUCCESS --> END([END])
FAILED --> END
```
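A hedged sketch of how `graph.py` might wire this workflow with LangGraph's `StateGraph` follows; node names come from the diagram, while the imports from `nodes.py`, the routing helper, and the `guardian_graph` export name are assumptions:

```python
# Illustrative wiring of the parallel workflow in graph.py.
# GraphState (defined earlier in graph.py) is the TypedDict shown in the
# State section below.
from langgraph.graph import StateGraph, START, END
from nodes import (
    start_node, validate_spam, validate_toxic, validate_harassment,
    validate_financial, aggregator_node, success_response_node,
    response_generator_node,
)

workflow = StateGraph(GraphState)

for name, fn in [
    ("start_node", start_node),
    ("validate_spam", validate_spam),
    ("validate_toxic", validate_toxic),
    ("validate_harassment", validate_harassment),
    ("validate_financial", validate_financial),
    ("aggregator_node", aggregator_node),
    ("success_response_node", success_response_node),
    ("response_generator_node", response_generator_node),
]:
    workflow.add_node(name, fn)

workflow.add_edge(START, "start_node")

# Fan out: giving start_node four outgoing edges makes LangGraph run all
# validators in the same superstep, i.e. in parallel.
for validator in ("validate_spam", "validate_toxic",
                  "validate_harassment", "validate_financial"):
    workflow.add_edge("start_node", validator)
    workflow.add_edge(validator, "aggregator_node")

def route_after_aggregation(state):
    # Any violation routes the flow to the rejection-message generator.
    return ("success_response_node"
            if all(state["validation_results"].values())
            else "response_generator_node")

workflow.add_conditional_edges("aggregator_node", route_after_aggregation)
workflow.add_edge("success_response_node", END)
workflow.add_edge("response_generator_node", END)

guardian_graph = workflow.compile()
```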
Each node in the graph processes and updates a shared `GraphState`, a TypedDict that ensures data consistency throughout the workflow.
```python
from typing import Annotated, Dict, TypedDict

class GraphState(TypedDict):
    user_id: str
    question: str
    validation_results: Annotated[Dict[str, bool], combine_validation_results]
    rejection_reason: str
    final_status: str
    ai_response: str
```
- `user_id`: Unique identifier for the user.
- `question`: The user-provided text to be validated.
- `validation_results`: A dictionary that accumulates the boolean results from each validator node.
- `rejection_reason`: A comma-separated string of failed validation keys.
- `final_status`: The overall outcome, either "passed" or "failed".
- `ai_response`: The AI-generated message for the end user, populated only if validation fails.
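The `combine_validation_results` reducer attached to `validation_results` is what lets the four parallel branches write to the same key without overwriting each other: each validator returns only its own `{check_key: passed}` entry and LangGraph merges them. A minimal sketch of such a reducer (the actual implementation may differ):

```python
from typing import Dict

def combine_validation_results(existing: Dict[str, bool],
                               new: Dict[str, bool]) -> Dict[str, bool]:
    # Called by LangGraph whenever a node returns a value for
    # validation_results; merging preserves each parallel validator's entry.
    return {**(existing or {}), **(new or {})}
```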
Follow these steps to get the AEGIS Shield API running.
- Python 3.10+
- NVIDIA GPU with CUDA support for local model hosting
Install the required Python libraries.
```bash
pip install -r requirements.txt
```
Upon first run, the script will download the `google/gemma-3-4b-it` model into a `model_cache` directory. This may take some time depending on your internet connection.
Start the FastAPI server using Uvicorn.
```bash
python main.py
```
The API will be live and accessible at `http://localhost:8000`.
The `/validate` endpoint executes the complete parallel validation workflow and returns a detailed report.
| Field | Type | Description | Required? |
|---|---|---|---|
| `user_id` | string | A unique identifier for the user. | Yes |
| `question` | string | The text content to be validated. | Yes |
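As a quick example, the endpoint can be exercised with the standard `requests` library; this is a sketch that assumes the server is running locally on port 8000, and the payload mirrors the FAILED example that follows:

```python
import requests

resp = requests.post(
    "http://localhost:8000/validate",
    json={"user_id": "user-123", "question": "HAIIII BODOH"},
    timeout=120,  # the first call may be slow while the model warms up
)
print(resp.json()["status_guard"])  # "failed" for this input
```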
For FAILED validation:
```json
{
  "user_id": "user-123",
  "question": "HAIIII BODOH",
  "case": "spam, toxic, harassment",
  "execution_per_step": [
    { "step": "spam", "status": "failed" },
    { "step": "toxic", "status": "failed" },
    { "step": "harassment", "status": "failed" },
    { "step": "financial_advice", "status": "passed" }
  ],
  "status_guard": "failed",
  "ai_response": "Maaf, saya tidak dapat membantu permintaan terkait spam karena tidak sesuai dengan kebijakan kami."
}
```
For PASSED validation:
```json
{
  "user_id": "user-456",
  "question": "Bagaimana cara top up saldo?",
  "case": "aman",
  "execution_per_step": [
    { "step": "spam", "status": "passed" },
    { "step": "toxic", "status": "passed" },
    { "step": "harassment", "status": "passed" },
    { "step": "financial_advice", "status": "passed" }
  ],
  "status_guard": "passed",
  "ai_response": ""
}
```
A simplified endpoint that provides a direct, user-facing AI response. It is ideal for integrations where only the final outcome is needed.
```json
{
  "user_id": "user-123",
  "question": "Your question here"
}
```
For FAILED validation:
```json
{
  "response": "Mohon maaf, saya tidak dapat membantu dengan bahasa yang tidak sopan...",
  "status": "failed",
  "can_proceed": false
}
```
For PASSED validation:
```json
{
  "response": "",
  "status": "passed",
  "can_proceed": true
}
```
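An upstream application only needs to check `can_proceed`. A hedged sketch of such a gate follows; the endpoint URL is a placeholder, since the simplified route's path is not shown in this document:

```python
import requests

# Placeholder: substitute the actual path of the simplified endpoint.
SIMPLE_ENDPOINT_URL = "http://localhost:8000/<simplified-endpoint>"

def guard_input(user_id: str, question: str) -> str | None:
    """Return None if the input may proceed, otherwise the rejection message."""
    payload = requests.post(
        SIMPLE_ENDPOINT_URL,
        json={"user_id": user_id, "question": question},
        timeout=120,
    ).json()
    return None if payload["can_proceed"] else payload["response"]
```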
A standard health check endpoint for service monitoring.
```json
{
  "status": "healthy",
  "message": "Guardian Validation API is running"
}
```
Adding a new validation check is straightforward:

- Create a prompt template: Add a new text file to the `promp_guard/` directory.

  ```bash
  echo "Your new validation prompt here" > promp_guard/new_validator.txt
  ```

- Update configuration: Add the new check to the `VALIDATION_CHECKS` list in `config.py`.

  ```python
  # In config.py
  VALIDATION_CHECKS = [
      # ... existing checks
      {"key": "new_validator", "path": str(CURRENT_DIR / "promp_guard/new_validator.txt")},
  ]
  ```

The graph in `graph.py` will automatically create and integrate the new validation node.
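For reference, every validator node follows the same basic shape: load its prompt template, ask the validator LLM about the question, and return only its own entry in `validation_results`. A hedged sketch of what such a node factory in `nodes.py` could look like; the `invoke` call, output convention, and import location are assumptions:

```python
from config import validator_llm  # shared ManualGemmaInvoker instance (assumed import)

def make_validator_node(check_key: str, prompt_path: str):
    """Build a LangGraph node function for one validation check (hypothetical)."""
    with open(prompt_path, encoding="utf-8") as f:
        prompt_template = f.read()

    def node(state):
        prompt = prompt_template.format(question=state["question"])
        verdict = validator_llm.invoke(prompt)        # assumed invoker API
        passed = "violation" not in verdict.lower()   # assumed output convention
        # Returning only this key lets combine_validation_results merge it
        # with the results of the other parallel validators.
        return {"validation_results": {check_key: passed}}

    return node
```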
The model can be changed by updating `MODEL_ID` in `config.py`. The `ManualGemmaInvoker` class is specifically designed for Gemma-based models but could be adapted for other Hugging Face transformers.
- Parallel Processing: All validation checks run simultaneously, offering significant performance gains over sequential processing.
- Comprehensive Protection: Employs multiple specialized guards against spam, toxicity, harassment, and unsolicited financial advice.
- AI-Powered Responses: Generates natural, contextual rejection messages that politely enforce content policies.
- Detailed Reporting: The `/validate` endpoint provides a complete breakdown of which checks passed or failed, for full visibility.
- Developer-Friendly: Features a clean RESTful API, easy customization, and clear separation of concerns in the codebase.
- Scalable: Because validators run concurrently, end-to-end latency is bounded by the slowest single check rather than the sum of all checks, so new validators can be added without slowing the pipeline down (given sufficient compute).