Add Content Moderation Feature #383

iraszl · 2025-09-02T02:46:23Z

What this does

This PR adds content moderation functionality to RubyLLM, allowing developers to identify potentially harmful content before sending it to LLM providers. This helps prevent API key bans and ensures safer user interactions.

New Features

Content Moderation API: New RubyLLM.moderate() method for screening text content
Safety Categories: Detects sexual, hate, harassment, violence, self-harm, and other harmful content types
Convenience Methods: Easy-to-use helpers like flagged?, flagged_categories, and category_scores
Provider Integration: Currently supports OpenAI's moderation API with extensible architecture for future providers

Usage Examples

# Basic usage
result = RubyLLM.moderate("User input text")
puts result.flagged?  # => true/false

# Get flagged categories
puts result.flagged_categories  # => ["harassment", "hate"]

# Integration pattern - screen before chat
def safe_chat(user_input)
  moderation = RubyLLM.moderate(user_input)
  return "Content not allowed" if moderation.flagged?
  
  RubyLLM.chat.ask(user_input)
end
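
The helpers above can also drive a stricter, score-based policy. The following is a minimal sketch, not the gem's implementation: `FakeModerationResult` and `categories_over` are hypothetical names, and the shape of `category_scores` (a hash of category name to a 0..1 score) is an assumption based on OpenAI's moderation response. A stand-in object is used so the snippet runs without an API key.

```ruby
# Stand-in for the object returned by RubyLLM.moderate (hypothetical).
# Only the helper names flagged?, flagged_categories and category_scores
# come from the PR description; the data shape is assumed.
FakeModerationResult = Struct.new(:flagged, :category_scores, :flagged_categories) do
  def flagged?
    flagged
  end
end

# Hypothetical helper: categories whose score meets a caller-chosen
# threshold, highest score first.
def categories_over(result, threshold)
  result.category_scores
        .select { |_category, score| score >= threshold }
        .sort_by { |_category, score| -score }
        .map(&:first)
end

result = FakeModerationResult.new(
  true,
  { "harassment" => 0.91, "hate" => 0.42, "violence" => 0.03 },
  ["harassment"]
)

puts categories_over(result, 0.4).inspect
```

In an application, the real moderation result would take the place of the stand-in, and the threshold would be tuned per category rather than shared.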

Changes Made

Core Implementation

New Class: RubyLLM::Moderate - Main moderation interface following existing patterns
Provider Method: Added moderate() to base Provider class
OpenAI Integration: OpenAI::Moderation module with API implementation
Main Module: Added RubyLLM.moderate() method for global access

Configuration

Default Model: Added default_moderation_model configuration option (defaults to omni-moderation-latest)
API Requirements: Requires OpenAI API key (follows existing provider pattern)
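
A configuration along these lines is presumably what the options above translate to. This is a sketch under assumptions: it uses the gem's `RubyLLM.configure` block and `openai_api_key` setting, and takes the `default_moderation_model` name from this PR's description. The environment variable name is only a convention.

```ruby
require "ruby_llm"

RubyLLM.configure do |config|
  # Moderation goes through OpenAI, so an OpenAI key is required.
  config.openai_api_key = ENV.fetch("OPENAI_API_KEY", nil)

  # Option described in this PR; this value is also the stated default,
  # so setting it explicitly is optional.
  config.default_moderation_model = "omni-moderation-latest"
end
```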

Documentation

Complete Guide: New moderation.md with examples
Integration Patterns: Real-world usage examples including Rails integration
Best Practices: Performance considerations and user experience guidelines

Testing

Test Suite: moderation_spec.rb with 4 test cases
VCR Cassettes: Mock API responses for testing
Tests Passing: No regressions in existing functionality

Type of change

Scope check

I read the Contributing Guide
This aligns with RubyLLM's focus on LLM communication
This isn't application-specific logic that belongs in user code
This benefits most users, not just my specific use case

Quality check

I ran overcommit --install and all hooks pass
I tested my changes thoroughly
I updated documentation if needed
I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

Breaking change
New public methods/classes
Changed method signatures
No API changes

Related issues

N/A

crmne

A pretty great PR! Love the attention to detail and the fact that you (and/or your AI assistant) have replicated the existing patterns in RubyLLM.

That said, I left you some comments, and I believe we should also implement multi-modal moderation.

docs/_core_features/moderation.md

lib/ruby_llm/moderate.rb

- Rename RubyLLM::Moderate class to RubyLLM::Moderation
- Rename .ask() method to .moderate() for better semantic clarity
- Update all references in lib/, spec/, and docs/
- Rename corresponding test files and VCR cassettes
- Maintain backward compatibility through global RubyLLM.moderate method
- Add demo script showing new API usage

BREAKING CHANGE: RubyLLM::Moderate.ask() is now RubyLLM::Moderation.moderate()
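
The backward-compatibility point in this commit message rests on a common pattern: callers use a module-level entry point, so the class behind it can be renamed without touching their code. The toy below illustrates that pattern only. `DemoLLM` is a made-up namespace, not the gem, and the real classes carry far more behavior.

```ruby
# Toy namespace (hypothetical), showing a module-level method that
# delegates to a class method.
module DemoLLM
  class Moderation
    attr_reader :input

    def initialize(input)
      @input = input
    end

    # Class-level entry point, mirroring the renamed .moderate() method.
    def self.moderate(input)
      new(input)
    end
  end

  # Stable global entry point: callers never name the class directly,
  # so renaming the class does not break them.
  def self.moderate(input)
    Moderation.moderate(input)
  end
end

result = DemoLLM.moderate("hello")
puts result.class
puts result.input
```

Code that called the class directly (the old `Moderate.ask` form) is what the BREAKING CHANGE note applies to; code that only used the global method should be unaffected.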

iraszl · 2025-09-07T12:33:53Z

@crmne Kindly check the fixes and let me know!

codecov · 2025-09-14T08:46:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.49%. Comparing base (32b3648) to head (07ae814).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #383      +/-   ##
==========================================
+ Coverage   84.29%   84.49%   +0.20%     
==========================================
  Files          36       37       +1     
  Lines        1897     1922      +25     
  Branches      493      497       +4     
==========================================
+ Hits         1599     1624      +25     
  Misses        298      298


crmne · 2025-09-14T08:49:16Z

It's great! Thank you, merged.

iraszl added 3 commits September 2, 2025 09:43

add moderation API

47db2a4

update docs, fix rubocop violations, update gemfile locks

319fcc9

update docs

7878f89

iraszl changed the title ~~Moderate~~ Add Content Moderation Feature Sep 2, 2025

crmne requested changes Sep 3, 2025

iraszl added 3 commits September 7, 2025 19:08

clean up documentation and remove demo file

4f92605

Merge upstream main and resolve Gemfile.lock conflicts

29805e0

Merge branch 'main' into moderate

07ae814

crmne merged commit 497e3d8 into crmne:main Sep 14, 2025
14 checks passed
