injection

package
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 1, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

CLAUDE:SUMMARY Détection et décodage de segments Base64 embarqués dans du texte brut (token smuggling). CLAUDE:EXPORTS DecodeBase64Segments

CLAUDE:SUMMARY Fuzzy string matching par Levenshtein pour résistance à la typoglycémie. CLAUDE:EXPORTS FuzzyContains

CLAUDE:SUMMARY Normalisation Unicode multi-couche pour détection d'injection — NFKD, confusables, leet, invisible strip, markup strip. CLAUDE:DEPENDS golang.org/x/text/unicode/norm CLAUDE:EXPORTS Normalize, StripInvisible, StripMarkup, FoldConfusables, FoldLeet

CLAUDE:SUMMARY Scan d'injection 3 couches : exact, fuzzy, base64 — zero regex, zero ReDoS. CLAUDE:DEPENDS injection/normalize.go, injection/fuzzy.go, injection/base64.go CLAUDE:EXPORTS Scan, Intent, Result, Match, LoadIntents, DefaultIntents

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DecodeBase64Segments

func DecodeBase64Segments(s string) string

DecodeBase64Segments scans text for base64-encoded tokens and decodes them in-place. Only tokens >= 16 characters that decode to valid, mostly-printable UTF-8 are replaced.

func DecodeEscapes

func DecodeEscapes(s string) string

DecodeEscapes decodes common encoding escapes in text:

  • \xHH (C-style hex escape)
  • %HH (URL percent-encoding)
  • &#DDD; (HTML decimal entity)
  • &#xHH; (HTML hex entity)

func DecodeROT13

func DecodeROT13(s string) string

DecodeROT13 applies ROT13 rotation to all ASCII letters.

func FoldConfusables

func FoldConfusables(s string) string

FoldConfusables maps homoglyph characters (Cyrillic/Greek/IPA) to their ASCII equivalents.

func FoldLeet

func FoldLeet(s string) string

FoldLeet maps leet speak characters to their ASCII letter equivalents.

func FuzzyContains

func FuzzyContains(text string, phrase string, maxEditPerWord int) bool

FuzzyContains checks if text contains a fuzzy match for phrase using Levenshtein distance per word with a sliding window approach. Returns true only if every word in phrase matches within maxEditPerWord edits AND the total distance is > 0 (exact matches are handled by strings.Contains).

func HasHomoglyphMixing

func HasHomoglyphMixing(text string) bool

HasHomoglyphMixing detects mixed Latin/Cyrillic or Latin/Greek in single words (visual obfuscation).

func Normalize

func Normalize(s string) string

Normalize applies the full normalization pipeline to text: strip invisible → strip markup → NFKD → strip combining marks → fold confusables → fold leet → lower → collapse whitespace.

func ReorderMatch

func ReorderMatch(text string, phrase string) bool

ReorderMatch checks if text contains a window of words that, when sorted alphabetically, match the sorted words of phrase. Catches word-reordered injections like "instructions previous ignore all". Only returns true when words are actually reordered (not in original order).

func StripInvisible

func StripInvisible(s string) string

StripInvisible removes all Unicode format (Cf) and control (Cc) characters except newline, tab, and carriage return.

func StripMarkup

func StripMarkup(s string) string

StripMarkup removes HTML/XML tags, Markdown formatting, and LaTeX commands, preserving the text content.

Types

type Intent

type Intent struct {
	ID        string `json:"id"`
	Canonical string `json:"canonical"` // already normalized (lowercase, no accents, no punctuation)
	Category  string `json:"category"`
	Lang      string `json:"lang"`
	Severity  string `json:"severity"` // "high", "medium", "low"
}

Intent represents a canonical prompt injection pattern.

func DefaultIntents

func DefaultIntents() []Intent

DefaultIntents returns the embedded intent list, loaded once.

func LoadIntents

func LoadIntents(data []byte) ([]Intent, error)

LoadIntents parses a JSON intent list from external data (for reload/feed).

type Match

type Match struct {
	IntentID string `json:"intent_id"`
	Category string `json:"category"`
	Severity string `json:"severity"`
	Method   string `json:"method"` // "exact", "fuzzy", "base64", "structural"
}

Match describes a single detected injection pattern.

type Result

type Result struct {
	Risk    string  `json:"risk"` // "none", "medium", "high"
	Matches []Match `json:"matches,omitempty"`
}

Result holds the outcome of an injection scan.

func Scan

func Scan(text string, intents []Intent) *Result

Scan runs the full injection detection pipeline on text: 1. Structural detection (zero-width clusters, homoglyph mixing) on original text 2. Normalize text 3. Exact matching (strings.Contains) against all intents 4. Fuzzy matching (Levenshtein) for unmatched intents 5. Base64 decoding and re-scan of decoded segments

Scan is designed to be called on both inputs AND outputs of LLM agents.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL