package chunk
v1.9.29 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 28, 2025 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CollectFiles added in v1.9.28

func CollectFiles(paths []string) ([]string, error)

CollectFiles recursively collects all files under the given paths. Paths that don't exist are skipped instead of causing an error.

func FileHash

func FileHash(path string) (string, error)

FileHash calculates the SHA-256 hash of a file.

func Matches added in v1.9.28

func Matches(path string, patterns []string) (bool, error)

Matches reports whether the given path matches any of the configured document paths or glob patterns. It is intended for file watchers that need to decide whether a new or changed file is covered by the patterns.
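As a rough illustration of this kind of check, the sketch below tests a path against a list of literal paths and glob patterns. It assumes `filepath.Match` semantics, where `*` does not cross path separators; the real function may use different glob rules. `matches` is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matches is a hypothetical stand-in for Matches. A path matches when it
// equals a pattern literally or satisfies it as a filepath.Match glob.
// Assumption: filepath.Match rules apply; the real package may differ.
func matches(path string, patterns []string) (bool, error) {
	for _, pat := range patterns {
		if pat == path {
			return true, nil
		}
		ok, err := filepath.Match(pat, path)
		if err != nil {
			return false, err // malformed pattern
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	patterns := []string{"README.md", "docs/*.md"}
	for _, p := range []string{"README.md", "docs/intro.md", "docs/api/ref.md", "main.go"} {
		ok, err := matches(p, patterns)
		fmt.Println(p, ok, err)
	}
}
```

In a file watcher, a check like this would run on each create or write event so that only files covered by the configured patterns are re-processed.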

Types

type Chunk

type Chunk struct {
	Index    int
	Content  string
	Metadata map[string]string
}

Chunk represents a piece of text from a document.

func ProcessFile

func ProcessFile(dp DocumentProcessor, path string) ([]Chunk, error)

ProcessFile reads the file at path and processes its content with the given document processor.

type DocumentProcessor

type DocumentProcessor interface {
	Process(path string, content []byte) ([]Chunk, error)
}

DocumentProcessor takes file content and returns chunks. Configuration (chunk size, overlap, etc.) is set at construction time rather than per call.
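Because the interface has a single method, a custom chunking strategy only needs to implement `Process`. The sketch below uses local copies of the documented types and a hypothetical `paragraphProcessor` that splits on blank lines; the `"path"` metadata key is an assumption made for illustration, not something the package defines.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the documented types, so the sketch is self-contained.
type Chunk struct {
	Index    int
	Content  string
	Metadata map[string]string
}

type DocumentProcessor interface {
	Process(path string, content []byte) ([]Chunk, error)
}

// paragraphProcessor is a hypothetical processor that emits one chunk per
// blank-line-separated paragraph. It has no configuration; a configurable
// processor would store its settings in fields set by a constructor.
type paragraphProcessor struct{}

func (paragraphProcessor) Process(path string, content []byte) ([]Chunk, error) {
	var chunks []Chunk
	for _, para := range strings.Split(string(content), "\n\n") {
		para = strings.TrimSpace(para)
		if para == "" {
			continue // skip empty segments left by extra blank lines
		}
		chunks = append(chunks, Chunk{
			Index:    len(chunks),
			Content:  para,
			Metadata: map[string]string{"path": path}, // assumed key
		})
	}
	return chunks, nil
}

func main() {
	var dp DocumentProcessor = paragraphProcessor{}
	chunks, err := dp.Process("notes.txt", []byte("first\n\nsecond\n\n\nthird"))
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%d: %q\n", c.Index, c.Content)
	}
}
```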

type TextDocumentProcessor

type TextDocumentProcessor struct {
	// contains filtered or unexported fields
}

TextDocumentProcessor is the default text-based chunker.

func NewTextDocumentProcessor

func NewTextDocumentProcessor(size, overlap int, respectWordBoundaries bool) *TextDocumentProcessor

NewTextDocumentProcessor creates a text-based document processor with the given chunk size, overlap between consecutive chunks, and word-boundary handling.

func (*TextDocumentProcessor) Process

func (t *TextDocumentProcessor) Process(_ string, content []byte) ([]Chunk, error)

Process implements DocumentProcessor for text-based chunking. The path argument is ignored.
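The documentation does not say how size, overlap, and respectWordBoundaries interact, so the following is only one plausible reading: a sliding window measured in runes, where each chunk starts `overlap` runes before the previous one ended, and a chunk is shortened to end after the last whitespace in its window when word boundaries are respected. `chunkText` is a hypothetical helper, not the package's algorithm.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// chunkText is a hypothetical sliding-window chunker. Assumptions: size and
// overlap count runes, and respecting word boundaries means ending a chunk
// just after the last whitespace found inside the window.
func chunkText(text string, size, overlap int, respectWordBoundaries bool) ([]string, error) {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil, errors.New("need size > 0 and 0 <= overlap < size")
	}
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); {
		end := start + size
		if end >= len(runes) {
			end = len(runes)
		} else if respectWordBoundaries {
			// Back up to the last whitespace, but never so far that the
			// next window would fail to advance.
			for i := end; i > start+overlap; i-- {
				if unicode.IsSpace(runes[i-1]) {
					end = i
					break
				}
			}
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
		start = end - overlap
	}
	return chunks, nil
}

func main() {
	chunks, err := chunkText("abcdefghij", 4, 1, false)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", chunks) // prints ["abcd" "defg" "ghij"]

	chunks, err = chunkText("hello world foo", 8, 2, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", chunks)
}
```

Requiring overlap to be smaller than size guarantees that every window advances by at least one rune, so the loop always terminates.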
