Description
Did you check docs and existing issues?
- I have read all the NeMo-Guardrails docs
- I have updated the package to the latest version before submitting this issue
- (optional) I have used the develop branch
- I have searched the existing issues of NeMo-Guardrails
Python version (python --version)
Python 3.12
Operating system/version
RHEL9 - Kubernetes Cluster
NeMo-Guardrails version (if you must use a specific version and not the latest)
0.14.1
Describe the bug
I am pretty sure this is user error, but I was not sure where else to ask/post. I am sorry if this is not the correct avenue.
In an attempt to learn how to use the tool, I am trying some of the simpler examples. In general things seem to work, but I cannot for the life of me get the functionality to work where adding a comment above a bot statement provides additional instructions to the LLM ( https://docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/bot-message-instructions.html ).
So I have a config.yml
models:
  - type: main
    engine: vllm_openai
    reasoning_config:
      remove_reasoning_traces: True
      start_token: "<think>"
      end_token: "</think>"
    parameters:
      openai_api_base: "https://litellm.company.co/v1"
      model_name: "Qwen3-235B-A22B-FP8-dynamic"
      openai_api_key: "XXXXXXXXXXXXXXXXXX"
  - type: content_safety
    engine: vllm_openai
    parameters:
      openai_api_base: "https://litellm.company.co/v1"
      model_name: "meta-llama/Llama-4-Scout-17B-16E"
      openai_api_key: "XXXXXXXXXXXXXXXXXXXXXXXXXXX"

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
And a prompt.yml I stole from one of the examples:
# These are the default prompts released by Meta, except for policy O7, which was added to address direct insults.
prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>
      user: {{ user_input }}
      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.

  - task: content_safety_check_output $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>
      user: {{ user_input }}
      response: agent: {{ bot_response }}
      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.

      Output JSON:
And lastly my rails.co file:
define user express greeting
  "Hello"
  "Hi"

define bot express greeting
  "Hello world! How are you?"

define flow
  user express greeting
  # Respond in a very formal way and introduce yourself.
  bot express greeting
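In case it helps with triage, this is roughly how I have been checking what actually gets sent to the main model when running the same config outside the container. The ./config path is just where the three files above live, and the rails.explain() usage follows the getting-started docs, so treat this as a minimal sketch rather than my exact deployment code:

from nemoguardrails import LLMRails, RailsConfig

# Load the directory containing config.yml, prompt.yml and rails.co.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Same greeting as in the deployed chat UI.
response = rails.generate(messages=[{"role": "user", "content": "Hi"}])
print(response["content"])

# Inspect the LLM calls from the last generation to see whether the
# "# Respond in a very formal way..." comment made it into any prompt.
info = rails.explain()
info.print_llm_calls_summary()
for llm_call in info.llm_calls:
    print(llm_call.prompt)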
Steps To Reproduce
- Build the NeMo Guardrails container as the Docker instructions specify.
- Deploy it to a Docker/Podman/Kubernetes cluster behind a proxy/ingress.
- Add the above config files.
- Go to the Guardrails chat instance and say "Hi" (or call the server API directly, as sketched below).
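If it is easier to reproduce without the chat UI, the request can presumably also be made against the deployed server directly. A minimal sketch, assuming the standard /v1/chat/completions server endpoint, a placeholder ingress URL, and that the config directory is served under the ID "config":

import requests

# Placeholder URL for the guardrails server behind our ingress.
BASE_URL = "https://guardrails.company.co"

payload = {
    # Assumption: the config ID matches the config directory name served by the container.
    "config_id": "config",
    "messages": [{"role": "user", "content": "Hi"}],
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
print(resp.json())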
Expected Behavior
"Hello world! How are you?" but in a more formal way and have the bot introduce itself.
Actual Behavior
I only get back:
"Hello world! How are you?"