
bug: Comments do not act as bot instructions or alter response in any way #1322

@icsy7867

Description


Did you check docs and existing issues?

  • I have read all the NeMo-Guardrails docs
  • I have updated the package to the latest version before submitting this issue
  • (optional) I have used the develop branch
  • I have searched the existing issues of NeMo-Guardrails

Python version (python --version)

Python 3.12

Operating system/version

RHEL9 - Kubernetes Cluster

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.14.1

Describe the bug

I am pretty sure this is user error, but I was not sure where else to ask/post. I am sorry if this is not the correct avenue.

In an attempt to learn how to use the tool, I am trying some of the simpler examples. In general things seem to work, but I cannot for the life of me get the functionality to work where adding a comment above a bot statement adds extra instructions to the LLM (https://docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/bot-message-instructions.html).

So I have a config.yml:

models:
  - type: main
    engine: vllm_openai
    reasoning_config:
      remove_reasoning_traces: True
      start_token: "<think>"
      end_token: "</think>"
    parameters:
      openai_api_base: "https://litellm.company.co/v1"
      model_name: "Qwen3-235B-A22B-FP8-dynamic"
      openai_api_key: "XXXXXXXXXXXXXXXXXX"
  - type: content_safety
    engine: vllm_openai
    parameters:
      openai_api_base: "https://litellm.company.co/v1"
      model_name: "meta-llama/Llama-4-Scout-17B-16E"
      openai_api_key: "XXXXXXXXXXXXXXXXXXXXXXXXXXX"
rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
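
For reference, a minimal local test of the same config via the Python API looks roughly like this (a sketch, assuming the three files sit together in a ./config directory):

from nemoguardrails import LLMRails, RailsConfig

# Load config.yml, prompt.yml and rails.co from the ./config directory.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Send a single "Hi" turn through the rails.
response = rails.generate(messages=[{"role": "user", "content": "Hi"}])
print(response["content"])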

And a prompt.yml I stole from one of the examples:

# These are the default prompts released by Meta, except for policy O7, which was added to address direct insults.
prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.

  - task: content_safety_check_output $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      response: agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:

And lastly my rails.co file:

define user express greeting
  "Hello"
  "Hi"

define bot express greeting
  "Hello world! How are you?"
  
define flow
  user express greeting
  # Respond in a very formal way and introduce yourself.
  bot express greeting
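
To check whether the "# Respond in a very formal way..." comment ever makes it into the prompt sent to the main model, something like the sketch below should help (assuming the explain() API exposes the prompts of the last generate() call, as described in the docs):

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)
rails.generate(messages=[{"role": "user", "content": "Hi"}])

# Inspect the LLM calls made for the last turn.
info = rails.explain()
info.print_llm_calls_summary()
for call in info.llm_calls:
    # Look for the comment text inside the generated prompts.
    print(call.prompt)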

Steps To Reproduce

  1. Build the NeMo Guardrails container as the Docker instructions specify.
  2. Deploy it to a docker/podman/Kubernetes cluster with a proxy/ingress.
  3. Add the above files (config.yml, prompt.yml, rails.co).
  4. Go to the guardrails chat instance and say "Hi" (an equivalent API call is sketched after this list).
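
For completeness, an equivalent request against the deployed server would look roughly like the sketch below. The endpoint path and config_id are assumptions (the default guardrails server chat-completions route and a config directory named "my_config"); the ingress URL is a placeholder.

import requests

# Send the same "Hi" turn to the deployed guardrails server.
# The URL and config_id are placeholders, not values from the actual deployment.
resp = requests.post(
    "https://guardrails.company.co/v1/chat/completions",
    json={
        "config_id": "my_config",
        "messages": [{"role": "user", "content": "Hi"}],
    },
    timeout=60,
)
print(resp.json())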

Expected Behavior

"Hello world! How are you?" but in a more formal way and have the bot introduce itself.

Actual Behavior

I only get back:
"Hello world! How are you?"
