Profanity detection library

This project is inspired by github.com/TwiN/go-away and github.com/finnbear/moderation. It also uses language data from those libraries (with modifications).

This project is still in development, and more tests are needed to verify its accuracy. However, it can already produce good results, so you may use it in your work.

Highlights

Fully supports Unicode
Uses a radix tree to improve performance
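
To illustrate why a tree-based dictionary helps, here is a minimal sketch of prefix matching over Unicode text. It is not this library's implementation: it uses a plain rune-keyed trie (the uncompressed cousin of a radix tree), and every name in it (trieNode, insert, longestMatch, findFirst) is hypothetical. The point is that each input position is checked against the whole dictionary in a single walk, instead of comparing against every word in turn.

```go
package main

import "fmt"

// trieNode is one node of a rune-keyed trie (hypothetical type, not the library's).
// terminal marks the end of a stored word.
type trieNode struct {
	children map[rune]*trieNode
	terminal bool
}

func newTrieNode() *trieNode {
	return &trieNode{children: make(map[rune]*trieNode)}
}

// insert adds word rune by rune, so multi-byte characters are handled correctly.
func (n *trieNode) insert(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newTrieNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.terminal = true
}

// longestMatch returns the length in runes of the longest stored word
// that is a prefix of input, or 0 if there is none.
func (n *trieNode) longestMatch(input []rune) int {
	cur, best := n, 0
	for i, r := range input {
		next, ok := cur.children[r]
		if !ok {
			break
		}
		cur = next
		if cur.terminal {
			best = i + 1
		}
	}
	return best
}

// findFirst scans text left to right and returns the rune offsets
// [start, end) of the first stored word it finds.
func findFirst(root *trieNode, text string) (start, end int, found bool) {
	runes := []rune(text)
	for i := range runes {
		if l := root.longestMatch(runes[i:]); l > 0 {
			return i, i + l, true
		}
	}
	return 0, 0, false
}

func main() {
	root := newTrieNode()
	for _, w := range []string{"darn", "heck", "café"} {
		root.insert(w)
	}
	start, end, found := findFirst(root, "oh heck no")
	fmt.Println(start, end, found) // prints "3 7 true"
}
```

A radix tree stores the same information but merges chains of single-child nodes into one edge, which reduces memory use and pointer chasing.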

Installation

go get github.com/tiendc/go-profanity-out

How to use

import (
    profanityout "github.com/tiendc/go-profanity-out"
    profanityDataEN "github.com/tiendc/go-profanity-out/data/en"
)

detector := profanityout.NewProfanityDetector().
    WithProfaneWords(profanityDataEN.DefaultProfanities).          // required
    WithFalsePositiveWords(profanityDataEN.DefaultFalsePositives). // required
    WithSuspectWords(profanityDataEN.DefaultSuspects).             // required
    WithLeetSpeakCharacters(profanityDataEN.LeetSpeakCharacters).  // required
    WithSpecialCharacters(profanityDataEN.SpecialCharacters).      // required
    WithWildcardCharacters(profanityDataEN.WildcardCharacters).    // required
    WithSanitizeLeetSpeak(true).                                   // default: true
    WithSanitizeSpecialCharacters(true).                           // default: true
    WithSanitizeSpaces(true).                                      // default: true
    WithSanitizeRepeatedCharacters(true).                          // default: true
    WithSanitizeWildcardCharacters(true).                          // default: true
    WithSanitizeAccents(true).                                     // default: true
    WithProcessInputAsHTML(false).                                 // default: false
    WithConfidenceCalculator(calculator).                          // default: built-in
    WithCensorCharacter('*')                                       // default: *

// Scan for at most one profanity (the result may also contain suspect words and/or false positives that were found)
matches := detector.ScanProfanity("fuck this $h!!t") // profane: true

// Scan for all profanities
matches = detector.ScanAllProfanities("fuck this $h!!t") // profane: true

// Censor the profanities
res, matches := detector.Censor("fuck this $h!!t") // res == "**** this *****"

// WithSanitizeLeetSpeak: true
ScanProfanity("$h!t") // profane: true
// WithSanitizeLeetSpeak: false
ScanProfanity("$h!t") // profane: false

// WithSanitizeSpecialCharacters: true
ScanProfanity("sh_it") // profane: true
// WithSanitizeSpecialCharacters: false
ScanProfanity("sh_it") // profane: false

// WithSanitizeSpaces: true
ScanProfanity("f u c k") // profane: true
// WithSanitizeSpaces: false
ScanProfanity("f u c k") // profane: false

// WithSanitizeRepeatedCharacters: true
ScanProfanity("fuuuuck") // profane: true
// WithSanitizeRepeatedCharacters: false
ScanProfanity("fuuuuck") // profane: false

// WithSanitizeWildcardCharacters: true
// NOTE: wildcard characters can appear in the input, in the profanity dictionary, or in both
ScanProfanity("f**k") // profane: true
WithProfaneWords([]string{"f*ck"}).ScanProfanity("fxck")     // profane: true
WithProfaneWords([]string{"*fuck*"}).ScanProfanity("xfuckx") // profane: true
// WithSanitizeWildcardCharacters: false
ScanProfanity("f**k")   // profane: false
ScanProfanity("fxck")   // profane: false
ScanProfanity("xfuckx") // profane: false

// WithSanitizeAccents: true
ScanProfanity("fúck") // profane: true
// WithSanitizeAccents: false
ScanProfanity("fúck") // profane: false

// WithProcessInputAsHTML: true
ScanProfanity("&lt;ock") // profane: true
// WithProcessInputAsHTML: false
ScanProfanity("&lt;ock") // profane: false
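
The sanitize options above can be pictured as a normalization pass that runs before dictionary lookup. The following is a rough sketch of that idea only, not how this library actually works: normalize, leetMap and specialChars are hypothetical names, and the character tables are made up for the example rather than taken from data/en.

```go
package main

import (
	"fmt"
	"strings"
)

// leetMap is a hypothetical leetspeak substitution table (not the library's data).
var leetMap = map[rune]rune{'$': 's', '!': 'i', '1': 'i', '0': 'o', '3': 'e', '@': 'a'}

// specialChars is a hypothetical set of separator characters to drop, including space.
const specialChars = "_-. "

// normalize lowercases input, maps leetspeak characters to letters,
// drops separator characters, and collapses runs of the same rune.
func normalize(input string) string {
	var b strings.Builder
	var prev rune
	for _, r := range strings.ToLower(input) {
		if mapped, ok := leetMap[r]; ok {
			r = mapped
		}
		if strings.ContainsRune(specialChars, r) {
			continue // skip separators such as '_' and ' '
		}
		if r == prev {
			continue // collapse repeated characters
		}
		b.WriteRune(r)
		prev = r
	}
	return b.String()
}

func main() {
	// Inputs taken from the examples above; each normalizes to a plain dictionary word.
	for _, in := range []string{"$h!t", "sh_it", "f u c k", "fuuuuck"} {
		fmt.Printf("%q -> %q\n", in, normalize(in))
	}
}
```

One consequence of collapsing repeated characters this way is that dictionary words containing legitimate double letters would have to be normalized with the same function before comparison.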

Benchmarks

Benchmark code

tiendc/go-profanity-out
tiendc/go-profanity-out-10      9024        129919 ns/op       44038 B/op        306 allocs/op 
TwiN/go-away
TwiN/go-away-10                 2745        415685 ns/op      444899 B/op        498 allocs/op
finnbear/moderation
finnbear/moderation-10         15432         77601 ns/op        2496 B/op         22 allocs/op

Help wanted

Pull requests for new features and bug fixes are welcome.
Additional input data for English and other languages would be especially appreciated.

License

MIT License
