Documentation
¶
Overview ¶
CLAUDE:SUMMARY Détection et décodage de segments Base64 embarqués dans du texte brut (token smuggling). CLAUDE:EXPORTS DecodeBase64Segments
CLAUDE:SUMMARY Fuzzy string matching par Levenshtein pour résistance à la typoglycémie. CLAUDE:EXPORTS FuzzyContains
CLAUDE:SUMMARY Normalisation Unicode multi-couche pour détection d'injection — NFKD, confusables, leet, invisible strip, markup strip. CLAUDE:DEPENDS golang.org/x/text/unicode/norm CLAUDE:EXPORTS Normalize, StripInvisible, StripMarkup, FoldConfusables, FoldLeet
CLAUDE:SUMMARY Scan d'injection 3 couches : exact, fuzzy, base64 — zero regex, zero ReDoS. CLAUDE:DEPENDS injection/normalize.go, injection/fuzzy.go, injection/base64.go CLAUDE:EXPORTS Scan, Intent, Result, Match, LoadIntents, DefaultIntents
Index ¶
- func DecodeBase64Segments(s string) string
- func DecodeEscapes(s string) string
- func DecodeROT13(s string) string
- func FoldConfusables(s string) string
- func FoldLeet(s string) string
- func FuzzyContains(text string, phrase string, maxEditPerWord int) bool
- func HasHomoglyphMixing(text string) bool
- func Normalize(s string) string
- func ReorderMatch(text string, phrase string) bool
- func StripInvisible(s string) string
- func StripMarkup(s string) string
- type Intent
- type Match
- type Result
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DecodeBase64Segments ¶
DecodeBase64Segments scans text for base64-encoded tokens and decodes them in-place. Only tokens >= 16 characters that decode to valid, mostly-printable UTF-8 are replaced.
func DecodeEscapes ¶
DecodeEscapes decodes common encoding escapes in text:
- \xHH (C-style hex escape)
- %HH (URL percent-encoding)
- &#DDD; (HTML decimal entity)
- &#xHH; (HTML hex entity)
func DecodeROT13 ¶
DecodeROT13 applies ROT13 rotation to all ASCII letters.
func FoldConfusables ¶
FoldConfusables maps homoglyph characters (Cyrillic/Greek/IPA) to their ASCII equivalents.
func FuzzyContains ¶
FuzzyContains checks if text contains a fuzzy match for phrase using Levenshtein distance per word with a sliding window approach. Returns true only if every word in phrase matches within maxEditPerWord edits AND the total distance is > 0 (exact matches are handled by strings.Contains).
func HasHomoglyphMixing ¶
HasHomoglyphMixing detects mixed Latin/Cyrillic or Latin/Greek in single words (visual obfuscation).
func Normalize ¶
Normalize applies the full normalization pipeline to text: strip invisible → strip markup → NFKD → strip combining marks → fold confusables → fold leet → lower → collapse whitespace.
func ReorderMatch ¶
ReorderMatch checks if text contains a window of words that, when sorted alphabetically, match the sorted words of phrase. Catches word-reordered injections like "instructions previous ignore all". Only returns true when words are actually reordered (not in original order).
func StripInvisible ¶
StripInvisible removes all Unicode format (Cf) and control (Cc) characters except newline, tab, and carriage return.
func StripMarkup ¶
StripMarkup removes HTML/XML tags, Markdown formatting, and LaTeX commands, preserving the text content.
Types ¶
type Intent ¶
type Intent struct {
ID string `json:"id"`
Canonical string `json:"canonical"` // already normalized (lowercase, no accents, no punctuation)
Category string `json:"category"`
Lang string `json:"lang"`
Severity string `json:"severity"` // "high", "medium", "low"
}
Intent represents a canonical prompt injection pattern.
func DefaultIntents ¶
func DefaultIntents() []Intent
DefaultIntents returns the embedded intent list, loaded once.
func LoadIntents ¶
LoadIntents parses a JSON intent list from external data (for reload/feed).
type Match ¶
type Match struct {
IntentID string `json:"intent_id"`
Category string `json:"category"`
Severity string `json:"severity"`
Method string `json:"method"` // "exact", "fuzzy", "base64", "structural"
}
Match describes a single detected injection pattern.
type Result ¶
type Result struct {
Risk string `json:"risk"` // "none", "medium", "high"
Matches []Match `json:"matches,omitempty"`
}
Result holds the outcome of an injection scan.
func Scan ¶
Scan runs the full injection detection pipeline on text: 1. Structural detection (zero-width clusters, homoglyph mixing) on original text 2. Normalize text 3. Exact matching (strings.Contains) against all intents 4. Fuzzy matching (Levenshtein) for unmatched intents 5. Base64 decoding and re-scan of decoded segments
Scan is designed to be called on both inputs AND outputs of LLM agents.