This policy uses an AI-powered text classification model to evaluate user prompts for potentially inappropriate or malicious content. It can detect a wide range of violations, such as profanity, sexually explicit language, harmful intent, and jailbreak prompt injections, which are adversarial inputs crafted to bypass AI safety mechanisms.
Depending on configuration, when a prompt is flagged:
- Blocked and flagged – the request is denied at the gateway
- Allowed but flagged – the request proceeds but is logged for monitoring
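These behaviors correspond to the Request Policy property (requestPolicy) described in the configuration options below: BLOCK_REQUEST denies the flagged request at the gateway, while LOG_REQUEST (the default) lets the request through and logs the detection. A minimal illustrative fragment of the policy configuration, with all other properties omitted:
{
  "requestPolicy": "BLOCK_REQUEST"
}
Complete API definitions for both behaviors are shown in the examples later in this page.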
NOTE: You may encounter an error when using this policy with Gravitee's default Docker image. The default images are based on Alpine Linux, which does not support the ONNX Runtime. To resolve this issue, use Gravitee's Debian-based Docker image, available as graviteeio/apim-gateway:4.8.0-debian.
The Content Checks property specifies the classification labels used to evaluate prompts. Choose labels that align with the selected model's capabilities and your filtering goals; for example, you might filter for profanity while omitting toxicity checks.
Supported labels are documented in the model’s card or configuration file.
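For instance, the examples later in this page use the MINILMV2_TOXIC_JIGSAW_MODEL with the labels identity_hate, insult, obscene, severe_toxic, threat, and toxic. A hedged sketch of a profanity-only check, assuming that model is the configured classification resource, would list a single label:
{
  "contentChecks": "obscene"
}
Listing only a subset of the model's labels restricts the evaluation to those checks, as described above.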
The policy requires an AI Model Text Classification Resource to be defined at the API level. This resource serves as the classification engine for evaluating prompt content during policy execution.
For more information about creating and managing resources, see Resources.
After the resource is created, the policy must be configured with the corresponding name using the AI Model Resource Name property.
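As an illustration, the trimmed sketch below pairs a text-classification resource with the policy's Resource Name property; the resource name and model type are simply those used in the full examples later in this page, and everything unrelated to the pairing is omitted:
{
  "resources": [
    {
      "name": "ai-model-text-classification-resource",
      "type": "ai-model-text-classification",
      "configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
      "enabled": true
    }
  ],
  "flows": [
    {
      "request": [
        {
          "policy": "ai-prompt-guard-rails",
          "configuration": {
            "resourceName": "ai-model-text-classification-resource"
          }
        }
      ]
    }
  ]
}
If the two names do not match, the policy will not be able to locate the classification resource at execution time.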
NOTE: The policy will load the model while handling the first request made to the API. Therefore, this first call will take longer than usual, as it includes the model loading time. Subsequent requests will be processed faster.
This plugin supports models based on Meta Llama 4:
Llama 4 is licensed under the Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
The ai-prompt-guard-rails policy can be applied to the following API types and flow phases.
PROXY
- Request
Strikethrough text indicates that a version is deprecated.
| Plugin version | APIM | Java version |
|---|---|---|
| 1.0.0 and after | 4.8.x and after | 21 |
| Name (json name) | Type (constraint) | Mandatory | Default | Description |
|---|---|---|---|---|
| Content Checks (contentChecks) | string |  |  | Comma-separated list of model labels (e.g., TOXIC,OBSCENE) |
| Prompt Location (promptLocation) | string |  |  | Prompt Location |
| Request Policy (requestPolicy) | enum (string) |  | LOG_REQUEST | Values: BLOCK_REQUEST, LOG_REQUEST |
| Resource Name (resourceName) | string |  |  | The resource name loading the Text Classification model |
| Sensitivity threshold (sensitivityThreshold) | number |  | 0.5 |  |
Only log the request when an inappropriate prompt is detected
{
"api": {
"definitionVersion": "V4",
"type": "PROXY",
"name": "AI - Prompt Guard Rails example API",
"resources": [
{
"name": "ai-model-text-classification-resource",
"type": "ai-model-text-classification",
"configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
"enabled": true
}
],
"flows": [
{
"name": "Common Flow",
"enabled": true,
"selectors": [
{
"type": "HTTP",
"path": "/",
"pathOperator": "STARTS_WITH"
}
],
"request": [
{
"name": "AI - Prompt Guard Rails",
"enabled": true,
"policy": "ai-prompt-guard-rails",
"configuration":
{
"resourceName": "ai-model-text-classification-resource",
"promptLocation": "{#request.jsonContent.prompt}",
"contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
"requestPolicy": "LOG_REQUEST"
}
}
]
}
]
}
}
Block the request when an inappropriate prompt is detected
{
"api": {
"definitionVersion": "V4",
"type": "PROXY",
"name": "AI - Prompt Guard Rails example API",
"resources": [
{
"name": "ai-model-text-classification-resource",
"type": "ai-model-text-classification",
"configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
"enabled": true
}
],
"flows": [
{
"name": "Common Flow",
"enabled": true,
"selectors": [
{
"type": "HTTP",
"path": "/",
"pathOperator": "STARTS_WITH"
}
],
"request": [
{
"name": "AI - Prompt Guard Rails",
"enabled": true,
"policy": "ai-prompt-guard-rails",
"configuration":
{
"resourceName": "ai-model-text-classification-resource",
"promptLocation": "{#request.jsonContent.prompt}",
"contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
"requestPolicy": "BLOCK_REQUEST"
}
}
]
}
]
}
}
Provide a custom sensitivity threshold for inappropriate prompts
{
"api": {
"definitionVersion": "V4",
"type": "PROXY",
"name": "AI - Prompt Guard Rails example API",
"resources": [
{
"name": "ai-model-text-classification-resource",
"type": "ai-model-text-classification",
"configuration": "{\"model\":{\"type\":\"MINILMV2_TOXIC_JIGSAW_MODEL\"}}",
"enabled": true
}
],
"flows": [
{
"name": "Common Flow",
"enabled": true,
"selectors": [
{
"type": "HTTP",
"path": "/",
"pathOperator": "STARTS_WITH"
}
],
"request": [
{
"name": "AI - Prompt Guard Rails",
"enabled": true,
"policy": "ai-prompt-guard-rails",
"configuration":
{
"resourceName": "ai-model-text-classification-resource",
"promptLocation": "{#request.jsonContent.prompt}",
"sensitivityThreshold": 0.1,
"contentChecks": "identity_hate,insult,obscene,severe_toxic,threat,toxic",
"requestPolicy": "BLOCK_REQUEST"
}
}
]
}
]
}
}
2.0.0 (2025-07-17)
- bump ai-resource dependencies (19690e0)
- catch inference error replies and map them to execution failure (3e2eda0)
- requires gravitee-resource-ai-model-api@2.1
- implementation of AI prompt guard rails policy (9f91cdd)