sensitive

Version: v0.0.0-...-6e91ce5
Published: Dec 1, 2025 | License: MIT | Imports: 12 | Imported by: 0

README

Sensitive


High-performance sensitive word detection library for Go, built on an Aho-Corasick automaton.

Features

  • High Performance - Double Array Trie with AC automaton; matching is O(n) in the length of the input text
  • High Concurrency - sync.RWMutex + sync.Pool; about 6x faster than importcjj/sensitive in the 12-core parallel benchmark
  • Zero Allocation - Hot path (Contains, FindFirst) with 0 allocs
  • Multi-Language - Full Unicode support (CJK, Cyrillic, Arabic, etc.)
  • Thread-Safe - Concurrent reads after Build()
  • Fluent API - Clean builder pattern
  • Built-in Dictionaries - 64K+ words included
  • Flexible Filtering - Mask, replace, or remove matches
  • Chinese Support - Traditional/Simplified conversion, Full-width/Half-width
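
The Aho-Corasick technique named above can be sketched in a few dozen lines. The version below is a minimal illustration that uses a map-based trie rather than a double array; the names node, automaton, build, and findAll are hypothetical and are not part of this package's API.

```go
package main

import "fmt"

// node is one automaton state. A map-based trie keeps the sketch short;
// it is not the double-array layout the README describes.
type node struct {
	next map[rune]int // goto transitions
	fail int          // state for the longest proper suffix that is also a trie prefix
	out  []string     // words ending here, including those reached via fail links
}

type automaton struct{ nodes []node } // state 0 is the root

// has reports whether state s has a goto transition on rune r.
func (a *automaton) has(s int, r rune) bool {
	_, ok := a.nodes[s].next[r]
	return ok
}

// build inserts all words into a trie, then fills failure links breadth-first.
func build(words []string) *automaton {
	a := &automaton{nodes: []node{{next: map[rune]int{}}}}
	for _, w := range words {
		if w == "" {
			continue
		}
		s := 0
		for _, r := range w {
			t, ok := a.nodes[s].next[r]
			if !ok {
				t = len(a.nodes)
				a.nodes = append(a.nodes, node{next: map[rune]int{}})
				a.nodes[s].next[r] = t
			}
			s = t
		}
		a.nodes[s].out = append(a.nodes[s].out, w)
	}
	var queue []int
	for _, t := range a.nodes[0].next {
		queue = append(queue, t) // depth-1 states fall back to the root
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		for r, t := range a.nodes[s].next {
			f := a.nodes[s].fail
			for f != 0 && !a.has(f, r) {
				f = a.nodes[f].fail
			}
			if a.has(f, r) {
				f = a.nodes[f].next[r]
			}
			a.nodes[t].fail = f
			// Inherit matches that end at the suffix state.
			a.nodes[t].out = append(a.nodes[t].out, a.nodes[f].out...)
			queue = append(queue, t)
		}
	}
	return a
}

// findAll scans text once and reports every dictionary word it contains.
func (a *automaton) findAll(text string) []string {
	var found []string
	s := 0
	for _, r := range text {
		for s != 0 && !a.has(s, r) {
			s = a.nodes[s].fail
		}
		if a.has(s, r) {
			s = a.nodes[s].next[r]
		}
		found = append(found, a.nodes[s].out...)
	}
	return found
}

func main() {
	a := build([]string{"he", "she", "his", "hers"})
	fmt.Println(a.findAll("ushers")) // prints [she he hers]
}
```

Each input rune advances the state at most once and failure links only move toward the root, which is why the scan stays linear in the text length regardless of dictionary size.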

Installation

go get github.com/Done-0/sensitive

Quick Start

⚠️ Important: This library does NOT load any dictionaries by default. You must explicitly load dictionaries.

Option 1: Use Built-in Dictionaries

detector := sensitive.NewBuilder().
    LoadAllEmbedded().
    MustBuild()

Option 2: Use Your Own Dictionary Files

detector := sensitive.NewBuilder().
    LoadDict("path/to/your/dict.txt").
    LoadDict("path/to/another/dict.txt").
    MustBuild()

Option 3: Add Words Manually

detector := sensitive.NewBuilder().
    AddWord("badword", sensitive.LevelHigh).
    AddWord("spam", sensitive.LevelLow).
    MustBuild()

Option 4: Combine All

detector := sensitive.NewBuilder().
    LoadAllEmbedded().                // Built-in dictionaries
    LoadDict("custom/my_words.txt").  // Your dictionary files
    AddWord("special", sensitive.LevelHigh). // Manual words
    MustBuild()

Built-in Dictionaries

The library embeds 6 dictionaries:

| Constant | File | Level | Words | Description |
|---|---|---|---|---|
| DictHighPolitics | high_politics.txt | High | ~325 | Political content |
| DictHighPornography | high_pornography.txt | High | ~303 | Pornographic content |
| DictHighViolence | high_violence.txt | High | ~436 | Violence/weapons/explosives |
| DictMediumGeneral | medium_general.txt | Medium | ~48K | General sensitive words |
| DictLowAd | low_ad.txt | Low | ~122 | Advertising |
| DictLowURL | low_url.txt | Low | ~14K | URL blacklist |

Usage

1. Create Detector
detector := sensitive.NewBuilder().
    WithFilterStrategy(sensitive.StrategyMask).
    WithReplaceChar('*').
    WithCaseSensitive(false).
    LoadAllEmbedded().
    MustBuild()
2. Add Words
// Single word
detector.AddWord("badword", sensitive.LevelHigh)

// Multiple words
words := map[string]sensitive.Level{
    "illegal":  sensitive.LevelHigh,
    "violence": sensitive.LevelHigh,
    "abuse":    sensitive.LevelMedium,
    "spam":     sensitive.LevelLow,
}
detector.AddWords(words)
3. Load Dictionary

Built-in dictionaries:

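// On an existing *Detector these are package-level functions (see the index below);
// the method forms exist only on Builder.
sensitive.LoadAllEmbedded(detector)  // All 6 dictionaries
sensitive.LoadEmbeddedDict(detector, sensitive.DictHighPolitics, sensitive.LevelHigh)  // Specific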

Custom dictionaries:

detector.LoadDict("custom/my_words.txt")  // Auto-detect level from filename
detector.LoadDictWithLevel("any_name.txt", sensitive.LevelHigh)  // Explicit level

From URL:

detector.LoadDictFromURL("https://example.com/dict.txt")

File naming (auto-level detection):

  • high_*.txt → LevelHigh
  • medium_*.txt → LevelMedium
  • low_*.txt → LevelLow
  • Other → LevelMedium (default)
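
The naming rules above can be expressed as a small lookup. This is a sketch of the documented behavior, not the library's own code: levelFromFilename is a hypothetical helper, the Level type is redeclared locally so the example runs on its own, and matching only the base name case-sensitively is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Level mirrors the documented constants (LevelLow=1, LevelMedium=2, LevelHigh=3).
type Level int

const (
	LevelLow    Level = 1
	LevelMedium Level = 2
	LevelHigh   Level = 3
)

// levelFromFilename (hypothetical) applies the glob rules listed in the README
// to the base name of path and falls back to LevelMedium.
func levelFromFilename(path string) Level {
	rules := []struct {
		pattern string
		level   Level
	}{
		{"high_*.txt", LevelHigh},
		{"medium_*.txt", LevelMedium},
		{"low_*.txt", LevelLow},
	}
	name := filepath.Base(path)
	for _, r := range rules {
		// The patterns are constants and well-formed, so the error is ignored.
		if ok, _ := filepath.Match(r.pattern, name); ok {
			return r.level
		}
	}
	return LevelMedium // documented default for any other name
}

func main() {
	fmt.Println(levelFromFilename("dict/high_banned.txt"), levelFromFilename("configs/custom.txt")) // prints 3 2
}
```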
4. Configure Options
// Options are Builder methods (or Option values passed to NewBuilder/New);
// set them before calling Build().
builder := sensitive.NewBuilder()

// Filter strategy
builder.WithFilterStrategy(sensitive.StrategyMask)     // "bad" → "***"
builder.WithFilterStrategy(sensitive.StrategyReplace).WithReplaceChar('█')  // "bad" → "███"
builder.WithFilterStrategy(sensitive.StrategyRemove)    // "bad" → ""

// Case sensitivity
builder.WithCaseSensitive(false)  // "TEST", "test", "Test" all match (default)
builder.WithCaseSensitive(true)   // Only exact case

// Skip whitespace
builder.WithSkipWhitespace(true)  // "b a d" matches "bad"

// Traditional/Simplified Chinese
builder.WithVariant(true).LoadVariantMap("variant_map.txt")
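
To make the case, whitespace, and full-width options concrete, here is a rough sketch of the kind of text normalization they imply. The normalize function and its parameters are hypothetical; the library's internal normalizer is not shown in this document and may work differently (for example, it would also need to map match offsets back to the original text).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize (hypothetical) folds text before matching:
//   - full-width ASCII variants (U+FF01..U+FF5E) become half-width,
//   - the ideographic space (U+3000) becomes a regular space,
//   - whitespace is dropped when skipWhitespace is true,
//   - letters are lower-cased unless caseSensitive is true.
func normalize(text string, caseSensitive, skipWhitespace bool) string {
	var b strings.Builder
	b.Grow(len(text))
	for _, r := range text {
		switch {
		case r == 0x3000:
			r = ' '
		case r >= 0xFF01 && r <= 0xFF5E:
			r -= 0xFEE0 // full-width forms sit at a fixed offset from ASCII
		}
		if skipWhitespace && unicode.IsSpace(r) {
			continue
		}
		if !caseSensitive {
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(normalize("Ｂ a D", false, true)) // prints bad
}
```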
5. Detect Content
// Simple validation
if detector.Validate(text) {
    return errors.New("content rejected")
}

// Get details
result := detector.Detect(text)
if result.HasSensitive {
    for _, match := range result.Matches {
        fmt.Printf("Word: %s, Level: %s, Position: %d-%d\n",
            match.Word, match.Level, match.Start, match.End)
    }
    fmt.Println("Filtered:", result.FilteredText)
}

// Filter only
filtered := detector.Filter(text)
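
The masking and removal behavior behind Filter can be illustrated with a small stand-alone function. This is a sketch under stated assumptions: span and applyFilter are hypothetical names, and offsets are treated as half-open rune ranges, which the source does not confirm for Match.Start and Match.End.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a half-open rune range [start, end) covering one match (assumed semantics).
type span struct{ start, end int }

// applyFilter (hypothetical) masks matched runes with maskChar,
// or drops them entirely when remove is true.
func applyFilter(text string, spans []span, remove bool, maskChar rune) string {
	runes := []rune(text)
	hit := make([]bool, len(runes)) // marks matched runes; overlapping spans merge naturally
	for _, s := range spans {
		for i := s.start; i < s.end && i < len(runes); i++ {
			if i >= 0 {
				hit[i] = true
			}
		}
	}
	var b strings.Builder
	for i, r := range runes {
		switch {
		case !hit[i]:
			b.WriteRune(r)
		case !remove:
			b.WriteRune(maskChar)
		}
	}
	return b.String()
}

func main() {
	matches := []span{{2, 5}} // "bad" in "a bad day"
	fmt.Println(applyFilter("a bad day", matches, false, '*')) // prints a *** day
	fmt.Printf("%q\n", applyFilter("a bad day", matches, true, '*')) // prints "a  day"
}
```

Working on runes rather than bytes keeps one mask character per visible character for CJK text.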
6. Error Handling
// Build() returns error
detector, err := sensitive.NewBuilder().LoadAllEmbedded().Build()
if err != nil {
    log.Fatal(err)
}

// MustBuild() panics on error (use in init())
detector := sensitive.NewBuilder().LoadAllEmbedded().MustBuild()
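
Because the chained Builder methods return *Builder rather than an error, a failure in an earlier step presumably surfaces only at Build() or MustBuild(). One common way to get that behavior is a builder that remembers its first error. The sketch below uses hypothetical lower-case names and a plain word set, and is not taken from the library's source.

```go
package main

import (
	"errors"
	"fmt"
)

type wordSet map[string]struct{}

// builder (hypothetical) records the first error from any chained step.
type builder struct {
	words wordSet
	err   error
}

func newBuilder() *builder { return &builder{words: wordSet{}} }

// addWord keeps the chain going even after a failure; later steps become no-ops.
func (b *builder) addWord(w string) *builder {
	if b.err != nil {
		return b
	}
	if w == "" {
		b.err = errors.New("empty word")
		return b
	}
	b.words[w] = struct{}{}
	return b
}

// build reports the deferred error, if any.
func (b *builder) build() (wordSet, error) {
	if b.err != nil {
		return nil, b.err
	}
	return b.words, nil
}

// mustBuild panics instead of returning the error; suited to program start-up.
func (b *builder) mustBuild() wordSet {
	ws, err := b.build()
	if err != nil {
		panic(err)
	}
	return ws
}

func main() {
	_, err := newBuilder().addWord("spam").addWord("").build()
	fmt.Println(err) // prints empty word
	ws := newBuilder().addWord("spam").mustBuild()
	fmt.Println(len(ws)) // prints 1
}
```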
7. Concurrent Usage
var detector *sensitive.Detector

func init() {
    detector = sensitive.NewBuilder().LoadAllEmbedded().MustBuild()
}

// Thread-safe after Build()
func handler(text string) error {
    if detector.Validate(text) {
        return errors.New("sensitive content")
    }
    return nil
}

⚠️ Not safe: adding words after Build() while other goroutines are using the detector.
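
If the word list has to change while requests are being served, one option is to build a fresh detector off to the side and swap it in atomically, so readers never see a half-updated structure. The sketch below shows that pattern with a hypothetical matcher type standing in for a built detector; it requires Go 1.19 or later for atomic.Pointer and is an application-level suggestion, not something this library provides.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// matcher is a hypothetical stand-in for an immutable, already-built detector.
type matcher struct{ words []string }

func (m *matcher) contains(text string) bool {
	for _, w := range m.words {
		if strings.Contains(text, w) {
			return true
		}
	}
	return false
}

// current holds the matcher readers use; it is replaced, never mutated.
var current atomic.Pointer[matcher]

// reload builds a new matcher from a private copy of words and publishes it.
func reload(words []string) {
	current.Store(&matcher{words: append([]string(nil), words...)})
}

// check reads whichever matcher is published at the moment of the call.
// reload must have been called at least once before check.
func check(text string) bool { return current.Load().contains(text) }

func main() {
	reload([]string{"spam"})
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = check("some spam here")
		}()
	}
	reload([]string{"spam", "scam"}) // safe while readers are running
	wg.Wait()
	fmt.Println(check("a scam"), check("hello")) // prints true false
}
```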

8. Performance

Benchmark Environment: Apple M2 Max, Go 1.25, 1,000-word dictionary, mixed Chinese/English text

Comparison with popular Go libraries:

Each cell lists time, memory, and allocations per operation.

| Benchmark | Done-0/sensitive | importcjj/sensitive | anknown/ahocorasick |
|---|---|---|---|
| Contains | 36.6 μs, 0B, 0 allocs | 89.4 μs, 42KB, 15 allocs | 24.1 μs, 0B, 0 allocs |
| FindAll | 37.0 μs, 752B, 2 allocs | 21.5 μs, 13KB, 1 alloc | 23.5 μs, 0B, 0 allocs |
| Filter | 36.8 μs, 752B, 2 allocs | 37.1 μs, 19KB, 2 allocs | N/A |
| Parallel (12-core) | 4.3 μs, ~0B, 0 allocs | 27.0 μs, 46KB, 15 allocs | 2.7 μs, 0B, 0 allocs |
| Short Text (100 chars) | 678 ns, 0B, 0 allocs | 1.59 μs, 461B, 6 allocs | 398 ns, 0B, 0 allocs |
| Long Text (10K chars) | 367 μs, ~0B, 0 allocs | 1.35 ms, 393KB, 22 allocs | 239 μs, 0B, 0 allocs |

Key Advantages:

  • Zero allocation in hot path (Contains, FindFirst)
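  • High concurrency: 4.3 μs in the 12-core parallel benchmark, about 6x faster than importcjj/sensitive (27.0 μs)
  • About 26x less memory per Filter call than importcjj/sensitive (752 B vs 19 KB)
  • About 3.7x faster than importcjj/sensitive on long text (367 μs vs 1.35 ms)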
  • Full-featured: Filter, levels, variant support (vs ahocorasick's search-only)
  • Thread-safe: sync.RWMutex + sync.Pool optimization

Custom Dictionaries

Place your dictionary files anywhere in your project:

detector := sensitive.NewBuilder().
    LoadAllEmbedded().                   // Optional: built-in dictionaries
    LoadDict("dict/high_banned.txt").    // Your dictionary in project
    LoadDict("configs/custom.txt").      // Another location
    MustBuild()

Git exclusion: Files named custom_*.txt, local_*.txt, user_*.txt in configs/dict/ are auto-excluded.

Examples

See examples/ for production-ready code:

| Example | Description |
|---|---|
| fluent_api | Fluent API chain calls |
| quickstart | Simplest usage |
| web_api | HTTP REST API service |
| comment_filter | Content moderation system |
| dependency_injection | DI pattern |
| high_concurrency | Concurrent processing |

Run example:

cd examples/fluent_api
go run main.go

License

MIT License

Documentation

Overview

Package sensitive provides high-performance sensitive word detection using an AC automaton.

Creator: Done-0
Created: 2025-01-15


Index

Constants

const (
	DictHighPolitics    = "high_politics.txt"
	DictHighPornography = "high_pornography.txt"
	DictHighViolence    = "high_violence.txt"
	DictMediumGeneral   = "medium_general.txt"
	DictLowAd           = "low_ad.txt"
	DictLowURL          = "low_url.txt"
)

Variables

This section is empty.

Functions

func LoadAllEmbedded

func LoadAllEmbedded(detector *Detector) error

func LoadDictDir

func LoadDictDir(dir string) (map[string]Level, error)

func LoadEmbeddedDict

func LoadEmbeddedDict(detector *Detector, name string, level Level) error

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

func NewBuilder

func NewBuilder(opts ...Option) *Builder

func (*Builder) AddWord

func (b *Builder) AddWord(word string, level Level) *Builder

func (*Builder) AddWords

func (b *Builder) AddWords(words map[string]Level) *Builder

func (*Builder) Build

func (b *Builder) Build() (*Detector, error)

func (*Builder) LoadAllEmbedded

func (b *Builder) LoadAllEmbedded() *Builder

func (*Builder) LoadDict

func (b *Builder) LoadDict(path string) *Builder

func (*Builder) LoadDictFromURL

func (b *Builder) LoadDictFromURL(url string) *Builder

func (*Builder) LoadDictFromURLWithLevel

func (b *Builder) LoadDictFromURLWithLevel(url string, level Level) *Builder

func (*Builder) LoadDictFromURLs

func (b *Builder) LoadDictFromURLs(urls []string) *Builder

func (*Builder) LoadDictWithLevel

func (b *Builder) LoadDictWithLevel(path string, level Level) *Builder

func (*Builder) LoadEmbeddedDict

func (b *Builder) LoadEmbeddedDict(name string, level Level) *Builder

func (*Builder) LoadVariantMap

func (b *Builder) LoadVariantMap(path string) *Builder

func (*Builder) MustBuild

func (b *Builder) MustBuild() *Detector

func (*Builder) WithCaseSensitive

func (b *Builder) WithCaseSensitive(sensitive bool) *Builder

func (*Builder) WithFilterStrategy

func (b *Builder) WithFilterStrategy(strategy FilterStrategy) *Builder

func (*Builder) WithReplaceChar

func (b *Builder) WithReplaceChar(char rune) *Builder

func (*Builder) WithSkipWhitespace

func (b *Builder) WithSkipWhitespace(skip bool) *Builder

func (*Builder) WithVariant

func (b *Builder) WithVariant(enable bool) *Builder

type Detector

type Detector struct {
	// contains filtered or unexported fields
}

func New

func New(opts ...Option) *Detector

func (*Detector) AddWord

func (d *Detector) AddWord(word string, level Level) error

func (*Detector) AddWords

func (d *Detector) AddWords(words map[string]Level) error

func (*Detector) Build

func (d *Detector) Build() error

func (*Detector) Contains

func (d *Detector) Contains(text string) bool

func (*Detector) Detect

func (d *Detector) Detect(text string) *Result

func (*Detector) Filter

func (d *Detector) Filter(text string) string

func (*Detector) FindAll

func (d *Detector) FindAll(text string) []string

func (*Detector) FindFirst

func (d *Detector) FindFirst(text string) *Match

func (*Detector) IsVariantEnabled

func (d *Detector) IsVariantEnabled() bool

func (*Detector) LoadDict

func (d *Detector) LoadDict(path string) error

func (*Detector) LoadDictFromURL

func (d *Detector) LoadDictFromURL(url string) error

func (*Detector) LoadDictFromURLWithLevel

func (d *Detector) LoadDictFromURLWithLevel(url string, level Level) error

func (*Detector) LoadDictFromURLs

func (d *Detector) LoadDictFromURLs(urls []string) error

func (*Detector) LoadDictWithLevel

func (d *Detector) LoadDictWithLevel(path string, level Level) error

func (*Detector) LoadVariantMap

func (d *Detector) LoadVariantMap(path string) error

func (*Detector) Stats

func (d *Detector) Stats() *Stats

func (*Detector) Validate

func (d *Detector) Validate(text string) bool

type FilterStrategy

type FilterStrategy int
const (
	StrategyMask FilterStrategy = iota
	StrategyRemove
	StrategyReplace
)

type Level

type Level int
const (
	LevelLow    Level = 1
	LevelMedium Level = 2
	LevelHigh   Level = 3
)

func (Level) IsValid

func (l Level) IsValid() bool

func (Level) String

func (l Level) String() string

type Match

type Match struct {
	Word  string
	Start int
	End   int
	Level Level
}

type Option

type Option func(*Options)

func WithCaseSensitive

func WithCaseSensitive(sensitive bool) Option

func WithFilterStrategy

func WithFilterStrategy(s FilterStrategy) Option

func WithReplaceChar

func WithReplaceChar(c rune) Option

func WithSkipWhitespace

func WithSkipWhitespace(skip bool) Option

func WithVariant

func WithVariant(enable bool) Option

type Options

type Options struct {
	FilterStrategy FilterStrategy
	ReplaceChar    rune
	SkipWhitespace bool
	EnableVariant  bool
	CaseSensitive  bool
}

type Result

type Result struct {
	HasSensitive bool
	Matches      []Match
	FilteredText string
}

type Stats

type Stats struct {
	TotalWords int
	TreeDepth  int
	MemorySize int64
}

Directories

| Path | Synopsis |
|---|---|
| examples/comment_filter (command) | Package main demonstrates user-generated content filtering system |
| examples/dependency_injection (command) | Package main demonstrates dependency injection pattern for production applications |
| examples/fluent_api (command) | Package main demonstrates Fluent API pattern for elegant chain calls |
| examples/high_concurrency (command) | Package main demonstrates high concurrency usage for production environments |
| examples/quickstart (command) | Package main demonstrates the simplest production usage |
| examples/web_api (command) | Package main demonstrates content moderation API for production web services |
| internal/normalizer | Package normalizer provides text normalization for sensitive word detection |
| internal/pool | Package pool provides memory pool optimization |
| internal/trie | Package trie implements Double Array Trie and AC automaton for high-performance sensitive word detection |
