@iraszl (Contributor) commented Sep 2, 2025

What this does

This PR adds content moderation functionality to RubyLLM, allowing developers to identify potentially harmful content before sending it to LLM providers. This helps prevent API key bans and ensures safer user interactions.

New Features

  • Content Moderation API: New RubyLLM.moderate() method for screening text content
  • Safety Categories: Detects sexual, hate, harassment, violence, self-harm, and other harmful content types
  • Convenience Methods: Easy-to-use helpers like flagged?, flagged_categories, and category_scores
  • Provider Integration: Currently supports OpenAI's moderation API with extensible architecture for future providers
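
A minimal sketch of how the convenience helpers listed above might be combined. `FakeModerationResult` is a hypothetical stand-in (not part of RubyLLM) that mimics only the three helper names from this description, so the example runs without an API key. The `categories_above` helper, the sample scores, and the 0.5 threshold are illustrative assumptions as well.

```ruby
# Hypothetical stand-in for the object returned by RubyLLM.moderate.
# It mirrors only the helper names listed in this PR description.
FakeModerationResult = Struct.new(:category_scores, :flagged_categories) do
  def flagged?
    !flagged_categories.empty?
  end
end

# Illustrative helper (an assumption, not part of the PR): return the
# categories whose score meets a caller-chosen threshold, highest first.
def categories_above(result, threshold)
  result.category_scores
        .select { |_category, score| score >= threshold }
        .sort_by { |_category, score| -score }
        .map { |category, _score| category }
end

# Made-up scores for demonstration only.
result = FakeModerationResult.new(
  { "harassment" => 0.91, "hate" => 0.62, "violence" => 0.03 },
  %w[harassment hate]
)

puts result.flagged?                        # prints true
puts categories_above(result, 0.5).inspect  # prints ["harassment", "hate"]
```

Applying a custom threshold to `category_scores` may be useful when the provider's own flagging is stricter or looser than an application needs.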

Usage Examples

# Basic usage
result = RubyLLM.moderate("User input text")
puts result.flagged?  # => true/false

# Get flagged categories
puts result.flagged_categories  # => ["harassment", "hate"]

# Integration pattern - screen before chat
def safe_chat(user_input)
  moderation = RubyLLM.moderate(user_input)
  return "Content not allowed" if moderation.flagged?
  
  RubyLLM.chat.ask(user_input)
end
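
The screening pattern above can be exercised without network access by injecting the two calls. In this sketch, `SafeChat`, `StubResult`, and both lambdas are hypothetical names introduced for illustration; the callables stand in for `RubyLLM.moderate` and `RubyLLM.chat.ask`.

```ruby
# Hypothetical wrapper: screens input with `moderate` before calling `chat`.
class SafeChat
  REFUSAL = "Content not allowed"

  # moderate: callable returning an object that responds to flagged?
  # chat:     callable that forwards the text to the LLM
  def initialize(moderate:, chat:)
    @moderate = moderate
    @chat = chat
  end

  def ask(user_input)
    return REFUSAL if @moderate.call(user_input).flagged?

    @chat.call(user_input)
  end
end

# Stubs used only for this demonstration.
StubResult = Struct.new(:flagged) do
  def flagged?
    flagged
  end
end

stub_moderate = ->(text) { StubResult.new(text.include?("attack")) }
stub_chat = ->(text) { "echo: #{text}" }

safe = SafeChat.new(moderate: stub_moderate, chat: stub_chat)
puts safe.ask("hello")        # prints echo: hello
puts safe.ask("attack them")  # prints Content not allowed
```

In an application the callables would presumably be `RubyLLM.method(:moderate)` and `->(text) { RubyLLM.chat.ask(text) }`; injecting them keeps the gate testable without recorded API responses.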

Changes Made

Core Implementation

  • New Class: RubyLLM::Moderate - Main moderation interface following existing patterns
  • Provider Method: Added moderate() to base Provider class
  • OpenAI Integration: OpenAI::Moderation module with API implementation
  • Main Module: Added RubyLLM.moderate() method for global access

Configuration

  • Default Model: Added default_moderation_model configuration option (defaults to omni-moderation-latest)
  • API Requirements: Requires OpenAI API key (follows existing provider pattern)

Documentation

  • Complete Guide: New moderation.md with examples
  • Integration Patterns: Real-world usage examples including Rails integration
  • Best Practices: Performance considerations and user experience guidelines

Testing

  • Test Suite: moderation_spec.rb with 4 test cases
  • VCR Cassettes: Mock API responses for testing
  • Tests Passing: No regressions in existing functionality

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Related issues

N/A

@iraszl changed the title from "Moderate" to "Add Content Moderation Feature" on Sep 2, 2025

@crmne (Owner) left a comment

A pretty great PR! Love the attention to detail and the fact that you (and/or your AI assistant) have replicated the existing patterns in RubyLLM.

That said, I left you some comments, and I believe we should also implement multi-modal moderation.

- Rename RubyLLM::Moderate class to RubyLLM::Moderation
- Rename .ask() method to .moderate() for better semantic clarity
- Update all references in lib/, spec/, and docs/
- Rename corresponding test files and VCR cassettes
- Maintain backward compatibility through global RubyLLM.moderate method
- Add demo script showing new API usage

BREAKING CHANGE: RubyLLM::Moderate.ask() is now RubyLLM::Moderation.moderate()
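
A rough sketch of why a module-level method can shield callers from the class rename described in this commit. `DemoLLM` is a hypothetical namespace written for illustration; it is not the RubyLLM source, and the real classes may be structured differently.

```ruby
# DemoLLM is a hypothetical namespace; it is not the RubyLLM source.
module DemoLLM
  class Moderation
    attr_reader :input

    def initialize(input)
      @input = input
    end

    # Class-level entry point, named .moderate after the rename (was .ask).
    def self.moderate(input)
      new(input)
    end
  end

  # Module-level facade: callers using this never reference the class name,
  # so renaming the class or its entry point does not affect them.
  def self.moderate(input)
    Moderation.moderate(input)
  end
end

result = DemoLLM.moderate("User input text")
puts result.class  # prints DemoLLM::Moderation
puts result.input  # prints User input text
```

Only code that referenced the old class and method directly would need to change under this arrangement.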

@iraszl (Contributor, Author) commented Sep 7, 2025

@crmne Kindly check the fixes and let me know!

codecov bot commented Sep 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.49%. Comparing base (32b3648) to head (07ae814).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #383      +/-   ##
==========================================
+ Coverage   84.29%   84.49%   +0.20%     
==========================================
  Files          36       37       +1     
  Lines        1897     1922      +25     
  Branches      493      497       +4     
==========================================
+ Hits         1599     1624      +25     
  Misses        298      298              

@crmne merged commit 497e3d8 into crmne:main on Sep 14, 2025
14 checks passed

@crmne (Owner) commented Sep 14, 2025

It's great! Thank you, merged.
