Documentation
Index ¶
- func DetectFormat(filename string) (format string, warn bool)
- func ExtractDateFromHeader(header string) string
- func FindTyposInMessages(messages []db.TextMessage) map[string]string
- func NormalizeText(text string) string
- func RunIngestCLI(args []string, kisekiDB, ollamaHost, embedModel string)
- func UpdateTyposFile(newTypos map[string]string) error
- type ChunkData
- type IngestResult
- type Section
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DetectFormat ¶
func DetectFormat(filename string) (format string, warn bool)
DetectFormat returns the ingestion format based on the file extension: "markdown" for .md files and "text" for .txt files. For unknown extensions it returns "text" with warn set to true.
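The extension rules above can be illustrated with a minimal sketch. The lowercase helper detectFormat is a hypothetical stand-in, not the package's implementation, and the case-insensitive comparison is an assumption that the documentation does not confirm.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectFormat is a hypothetical stand-in that follows the documented rules
// for DetectFormat. Lowercasing the extension is an assumption; the real
// function may treat ".MD" differently.
func detectFormat(filename string) (format string, warn bool) {
	switch strings.ToLower(filepath.Ext(filename)) {
	case ".md":
		return "markdown", false
	case ".txt":
		return "text", false
	default:
		// Unknown extension: fall back to plain text and raise the warning flag.
		return "text", true
	}
}

func main() {
	for _, name := range []string{"notes.md", "log.txt", "data.csv"} {
		format, warn := detectFormat(name)
		fmt.Printf("%s -> %s (warn=%t)\n", name, format, warn)
	}
}
```

A caller would typically log a message when warn is true and then continue with the returned format.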
func ExtractDateFromHeader ¶
func ExtractDateFromHeader(header string) string
func FindTyposInMessages ¶
func FindTyposInMessages(messages []db.TextMessage) map[string]string
func NormalizeText ¶
func NormalizeText(text string) string
func RunIngestCLI ¶
func RunIngestCLI(args []string, kisekiDB, ollamaHost, embedModel string)
RunIngestCLI handles the "kiseki ingest" CLI command.
func UpdateTyposFile ¶
func UpdateTyposFile(newTypos map[string]string) error
Types ¶
type ChunkData ¶
type ChunkData struct {
	Text            string
	SourceFile      string
	SectionTitle    string
	HeaderLevel     int
	ParentTitle     string
	SectionSequence int
	ChunkSequence   int
	ChunkTotal      int
	ValidAt         string
}
func ChunkSection ¶
type IngestResult ¶
type IngestResult struct {
	SectionsFound    int
	ChunksCreated    int
	SubChunksCreated int
	DeletedChunks    int64
}
func IngestContent ¶
func IngestContent(db *sql.DB, embedder ollama.Embedder, content string, sourceName string, validAt string, format string) (IngestResult, error)
IngestContent parses content in the given format and ingests it into the database. sourceName is stored as the source_file for each chunk (typically a file path or label).
type Section ¶
type Section struct {
	Title       string
	HeaderLevel int
	ParentTitle string
	Content     string
	Sequence    int
	ValidAt     string
}
func ParseContent ¶
ParseContent dispatches to the appropriate parser based on format. Valid formats are "markdown" and "text"; any other value falls back to "markdown".
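The dispatch rule can be sketched as a switch with a default branch. Everything in this example is hypothetical: the parser type, the two stub parsers, and selectParser are illustrative names, since the real signatures of ParseContent, ParseMarkdown, and ParsePlainText are not shown on this page.

```go
package main

import "fmt"

// parser is a hypothetical function type used only for this sketch.
type parser func(content, sourceName string) string

// The stubs below only report which parser was selected; they do no parsing.
func parseMarkdownStub(content, sourceName string) string  { return "markdown parser" }
func parsePlainTextStub(content, sourceName string) string { return "plain-text parser" }

// selectParser follows the documented rule: "text" selects the plain-text
// parser, while "markdown" and any unknown format select the markdown parser.
func selectParser(format string) parser {
	switch format {
	case "text":
		return parsePlainTextStub
	default:
		return parseMarkdownStub
	}
}

func main() {
	for _, format := range []string{"markdown", "text", "html"} {
		fmt.Printf("%q -> %s\n", format, selectParser(format)("", "example"))
	}
}
```

Putting the fallback in the default branch keeps the switch exhaustive without listing "markdown" explicitly.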
func ParseMarkdown ¶
func ParsePlainText ¶
ParsePlainText treats the entire content as a single section. The sourceName (typically the filename) is used as the section title.
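As a rough illustration of the behaviour described above, the sketch below builds a single Section from plain text. The lowercase parsePlainText and its parameters are assumptions, and fields other than Title and Content are left at their zero values because this page does not say how they are set.

```go
package main

import "fmt"

// Section copies the field layout of the documented Section type.
type Section struct {
	Title       string
	HeaderLevel int
	ParentTitle string
	Content     string
	Sequence    int
	ValidAt     string
}

// parsePlainText is a hypothetical stand-in: the whole content becomes one
// section whose title is sourceName. The real signature is not documented here.
func parsePlainText(content, sourceName string) []Section {
	return []Section{{Title: sourceName, Content: content}}
}

func main() {
	sections := parsePlainText("first line\nsecond line", "notes.txt")
	fmt.Println(len(sections), sections[0].Title) // prints: 1 notes.txt
}
```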