comb

package module
v0.0.0-...-77cd799 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 19, 2025 License: MIT Imports: 11 Imported by: 0

README

A Parser Combinator Library For Go

comb logo

Comb is a library that simplifies building parsers in Go.

For me, it has got the optimal feature set:

  1. Simple maintainability of a normal library thanks to being a parser combinator library.
  2. Report errors with the line and column position and code snippet.
  3. Reporting of multiple errors.
  4. UNICODE support (working on UTF-8 encoded UNICODE code points instead of bytes if requested).
  5. Support for binary input (including byte position and hex dump for errors).
  6. Type safety (including filling arbitrary typed data) using generics.
  7. Idiomatic Go code (no generated code, ...).
  8. Good performance (for the feature set).

It's based on Gomme that showed how to get the general developer experience and type safety right.

Table of content

Getting started

Here's how to quickly parse hexadecimal color codes using Gomme:

// RGBColor stores the three bytes describing a color in the RGB space.
type RGBColor struct {
    red   uint8
    green uint8
    blue  uint8
}

// ParseRGBColor creates a new RGBColor from a hexadecimal color string.
// The string must be a six-digit hexadecimal number, prefixed with a "#".
func ParseRGBColor(input string) (RGBColor, error) {
    parse := cmb.Map4(
        SaveSpot(C('#')),
        HexColorComponent("red hex color"),
        HexColorComponent("green hex color"),
        HexColorComponent("blue hex color"),
        func(_ rune, r, g, b string) (RGBColor, error) {
            return RGBColor{fromHex(r), fromHex(g), fromHex(b)}, nil
        },
    )

    return comb.RunOnString(input, parse)
}

// HexColorComponent produces a parser that parses a single hex color component,
// which is a two-digit hexadecimal number.
func HexColorComponent() comb.Parser[string] {
    return SaveSpot(cmb.SatisfyMN(expected, 2, 2, cmb.IsHexDigit))
}

// fromHex converts a two digits hexadecimal number to its decimal value.
func fromHex(input string) uint8 {
    res, _ := strconv.ParseUint(input, 16, 8) // errors have been caught by the parser
    return uint8(res)
}

It's as simple as that! Feel free to explore more in the examples directory.

Examples

See Comb in action with these handy examples:

Documentation

For more detailed information, refer to the official documentation.

Installation

Like any other library:

go get github.com/flowdev/comb

Guide

In this guide, we provide a detailed overview of the various combinators available in Comb. Combinators are fundamental building blocks in parser construction, each designed for a specific task. By combining them, you can create complex parsers suited to your specific needs. For each combinator, we've provided a brief description and a usage example. Let's explore!

List of combinators
Base combinators
Combinator Description Example
Map Applies a function to the result of the provided parser, allowing you to transform the parser's result. Map(Digit1(), func(s string)int { return 123 })
Optional Makes a parser optional. If unsuccessful, the parser returns a nil Result.Output.Output`. Optional(CRLF())
Peek Applies the provided parser without consuming the input.
Recognize Returns the consumed input as the produced value when the provided parser is successful. Recognize(SeparatedPair(Token("key"), Char(':'), Token("value"))
Assign Returns the assigned value when the provided parser is successful. Assign(true, Token("true"))
Bytes combinators
Combinator Description Example
Take Parses the first N elements of the input. Take(5)
TakeUntil Parses the input until the provided parser argument succeeds. TakeUntil(CRLF()))
TakeWhileMN Parses the longest input slice fitting the length expectation (m <= input length <= n) and matching the predicate. The parser argument is a function taking a rune as input and returning a bool. TakeWhileMN(2, 6, gomme.isHexDigit)
Token Recognizes a specific pattern. Compares the input with the token's argument and returns the matching part. Token("tolkien")
Character combinators
Combinator Description Example
Char Parses a single instance of a provided character. Char('$')
AnyChar Parses a single instance of any character. AnyChar()
Alpha0 Parses zero or more alphabetical ASCII characters (case insensitive). Alpha0()
Alpha1 Parses one or more alphabetical ASCII characters (case insensitive). Alpha1()
Alphanumeric0 Parses zero or more alphabetical and numerical ASCII characters (case insensitive). Alphanumeric0()
Alphanumeric1 Parses one or more alphabetical and numerical ASCII characters (case insensitive). Alphanumeric1()
Digit0 Parses zero or more numerical ASCII characters: 0-9. Digit0()
Digit1 Parses one or more numerical ASCII characters: 0-9. Digit1()
HexDigit0 Parses zero or more hexadecimal ASCII characters (case insensitive). HexDigit0()
HexDigit1 Parses one or more hexadecimal ASCII characters (case insensitive). HexDigit1()
Whitespace0 Parses zero or more whitespace ASCII characters: space, tab, carriage return, line feed. Whitespace0()
Whitespace1 Parses one or more whitespace ASCII characters: space, tab, carriage return, line feed. Whitespace1()
LF Parses a single new line character '\n'. LF()
CRLF Parses a '\r\n' string. CRLF()
OneOf Parses one of the provided characters. Equivalent to using Alternative over a series of Char parsers. OneOf('a', 'b' , 'c')
Satisfy Parses a single character, asserting that it matches the provided predicate. The predicate function takes a rune as input and returns a bool. Satisfy is useful for building custom character matchers. `Satisfy(func(c rune)bool { return c == '{'
Space Parses a single space character ' '. Space()
Tab Parses a single tab character '\t'. Tab()
Int64 Parses an int64 from its textual representation. Int64()
Int8 Parses an int8 from its textual representation. Int8()
UInt8 Parses a uint8 from its textual representation. UInt8()
Combinators for Sequences
Combinator Description Example
Preceded Applies the prefix parser and discards its result. It then applies the main parser and returns its result. It discards the prefix value. It proves useful when looking for data prefixed with a pattern. For instance, when parsing a value, prefixed with its name. Preceded(Token("name:"), Alpha1())
Terminated Applies the main parser, followed by the suffix parser whom it discards the result of, and returns the result of the main parser. Note that if the suffix parser fails, the whole operation fails, regardless of the result of the main parser. It proves useful when looking for suffixed data while not interested in retaining the suffix value itself. For instance, when parsing a value followed by a control character. Terminated(Digit1(), LF())
Delimited Applies the prefix parser, the main parser, followed by the suffix parser, discards the result of both the prefix and suffix parsers, and returns the result of the main parser. Note that if any of the prefix or suffix parsers fail, the whole operation fails, regardless of the result of the main parser. It proves useful when looking for data surrounded by patterns helping them identify it without retaining its value. For instance, when parsing a value, prefixed by its name and followed by a control character. Delimited(Tag("name:"), Digit1(), LF())
Pair Applies two parsers in a row and returns a pair container holding both their result values. Pair(Alpha1(), Tag("cm"))
SeparatedPair Applies a left parser, a separator parser, and a right parser discards the result of the separator parser, and returns the result of the left and right parsers as a pair container holding the result values. SeparatedPair(Alpha1(), Tag(":"), Alpha1())
Sequence Applies a sequence of parsers sharing the same signature. If any of the provided parsers fail, the whole operation fails. Sequence(SeparatedPair(Tag("name"), Char(':'), Alpha1()), SeparatedPair(Tag("height"), Char(':'), Digit1()))
Combinators for Applying Parsers Many Times
Combinator Description Example
Count Applies the provided parser count times. If the parser fails before it can be applied count times, the operation fails. It proves useful whenever one needs to parse the same pattern many times in a row. Count(3, OneOf('a', 'b', 'c'))
Many0 Keeps applying the provided parser until it fails and returns a slice of all the results. Specifically, if the parser fails to match, Many0 still succeeds, returning an empty slice of results. It proves useful when trying to consume a repeated pattern, regardless of whether there's any match, like when trying to parse any number of whitespaces in a row. Many0(Char(' '))
Many1 Keeps applying the provided parser until it fails and returns a slice of all the results. If the parser fails to match at least once, Many1 fails. It proves useful when trying to consume a repeated pattern, like any number of whitespaces in a row, ensuring that it appears at least once. Many1(LF())
SeparatedList0
SeparatedList1
Combinators for Choices
Combinator Description Example
Alternative Tests a list of parsers, one by one, until one succeeds. Note that all parsers must share the same signature (Parser[I, O]). Alternative(Token("abc"), Token("123"))

Frequently asked questions

Q: What's the name?

A: Comb first of all got its name from being a parser COMBinatior library. But it is also very good at "combing" through input, finding errors. Since the error handling system is by far the hardest part of the project, the name feels right.

Q: What are parser combinators?

A: Parser combinators offer a new way of building parsers. Instead of writing a complex parser that analyzes an entire format, you create small, simple parsers that handle the smallest units of the format. These small parsers can then be combined to build more complex parsers. It's a bit like using building blocks to construct whatever structure you want.

Q: Why would I use parser combinators instead of a specific parser?

A: Parser combinators are incredibly flexible and intuitive. Once you're familiar with them, they enable you to quickly create, maintain, and modify parsers. They offer you a high degree of freedom in designing your parser and how it's used.

Q: Where can I learn more about parser combinators?

A: Here are some resources we recommend:

Acknowledgements

We've stood on the shoulders of giants to create Comb. The library draws heavily on the extensive theoretical work done in the parser combinators space, and we owe a huge thanks to Gomme that solved two important problems for us and gave hints for others.

Getting the error recovery mechanism right took at least 90% of the time put into the project until now.

Authors

  • @ole108 (main developer behind the flowdev organization)
  • @oleiade (for Gomme)

Documentation

Overview

Package comb implements a parser combinator library. It provides a toolkit for developers to build reliable, fast, flexible, and easy-to-develop and maintain parsers for both textual and binary formats. It extensively uses the recent introduction of Generics in the Go programming language to offer flexibility in how combinators can be mixed and matched to produce the desired output while providing as much compile-time type safety as possible.

Index

Constants

View Source
const (
	ParentUndefined = math.MinInt32 + iota // used for calling the root parser
	ParentUnknown                          // used for bottom-up parsing
)
View Source
const DefaultMaxErrors = 10 // the maximum number of errors to recover from (same as for the Go compiler)
View Source
const RecoverNever = -3
View Source
const RecoverWasteTooMuch = -1 // has to be -1 because of Go Index... functions
View Source
const RecoverWasteUnknown = -2 // default value; 0 can't be used because it's a valid normal value
View Source
const SyntaxErrorStart = "expected "

Variables

This section is empty.

Functions

func Debugf

func Debugf(msg string, args ...interface{})

Debugf logs the given message using `log.Printf` if the debug level is enabled.

func RunOnBytes

func RunOnBytes[Output any](input []byte, parse Parser[Output]) (Output, error)

RunOnBytes runs a parser on binary input and returns the output and error(s). This is useful for binary or mixed binary/text parsers.

func RunOnState

func RunOnState[Output any](state State, parser *PreparedParser[Output]) (Output, error)

RunOnState runs a parser on a given state and returns the output and error(s). RunOnString and RunOnBytes are just convenience wrappers around RunOnState. RunOnState is the only one that is concurrent-safe because preparing the parser is NOT.

func RunOnString

func RunOnString[Output any](input string, parse Parser[Output]) (Output, error)

RunOnString runs a parser on text input and returns the output and error(s).

func SetDebug

func SetDebug(enable bool)

SetDebug sets the log level to debug if enabled or info otherwise.

func UnwrapErrors

func UnwrapErrors(err error) []error

func ZeroOf

func ZeroOf[T any]() T

ZeroOf returns the zero value of some type.

Types

type AnyParser

type AnyParser interface {
	ID() int32
	ParseAny(parentID int32, state State) (State, interface{}, *ParserError) // top -> down

	IsSafeSpot() bool
	Recover(State, interface{}) (int, interface{})
	IsStepRecoverer() bool
	// contains filtered or unexported methods
}

AnyParser is an internal interface used by PreparedParser. It intentionally avoids generics for easy storage of parsers in collections (slices, maps, ...).

type BranchParser

type BranchParser interface {
	// contains filtered or unexported methods
}

BranchParser is a more internal interface used by orchestrators. It intentionally avoids generics for easy storage of parsers in collections (slices, maps, ...). BranchParser just adds 2 methods to the Parser and AnyParser interfaces.

type ConstState

type ConstState struct {
	// contains filtered or unexported fields
}

ConstState is the constant data for all the parsers. E.g., the input and data derived from it. The input can be either UTF-8 encoded text (a.k.a. string) or raw bytes. The parsers store and advance the position within the data but never change the data itself. This allows good error reporting, including the full line of text containing the error.

type Parser

type Parser[Output any] interface {
	ID() int32
	Expected() string
	Parse(state State) (State, Output, *ParserError)                         // used by compiler (for type inference) and tests
	ParseAny(parentID int32, state State) (State, interface{}, *ParserError) // used by PreparedParser (top -> down)

	IsSafeSpot() bool

	Recover(State, interface{}) (int, interface{})
	IsStepRecoverer() bool
	SwapRecoverer(Recoverer) // called during the construction phase
	// contains filtered or unexported methods
}

Parser defines the type of a generic Parser. A few rules should be followed to prevent unexpected behaviour:

  • A parser that errors must return the error
  • A parser that errors should not change the position of the states input
  • A parser that consumes some input must advance with state.MoveBy()

func LazyBranchParser

func LazyBranchParser[Output any](makeParser func() Parser[Output]) Parser[Output]

LazyBranchParser just stores a function that creates the parser and evaluates the function later. This allows deferring the call to NewParser() and thus to define recursive grammars. Only branch parsers need this ability. A leaf parser can't be recursive by definition.

func NewBranchParser

func NewBranchParser[Output any](
	expected string,
	children func() []AnyParser,
	parseAfterChild func(childID int32, childStartState, childState State, childOut interface{}, childErr *ParserError, data interface{},
	) (State, Output, *ParserError, interface{}),
) Parser[Output]

NewBranchParser is THE way to create branch parsers. parseAfterChild is called with a `childID < 0` during normal (top -> down) parsing. It will be called with a `childID >= 0` during error recovery (bottom -> up).

func NewParser

func NewParser[Output any](
	expected string,
	parse func(State) (State, Output, *ParserError),
	recover Recoverer,
) Parser[Output]

NewParser is THE way to create simple leaf parsers. recover can be nil to signal that there is no optimized recoverer available. In case of an error, the parser will be called again and again moving forward one byte/rune at a time instead.

func NewParserWithData

func NewParserWithData[Output any](
	expected string,
	parse func(State, interface{}) (State, Output, *ParserError, interface{}),
	recover Recoverer,
) Parser[Output]

NewParserWithData is the way to create leaf parsers that have partial results they want to save in case of an error. recover can be nil to signal that there is no optimized recoverer available. In case of an error, the parser will be called again and again moving forward one byte/rune at a time instead.

func SafeSpot

func SafeSpot[Output any](p Parser[Output]) Parser[Output]

SafeSpot applies a sub-parser and marks the new state as a point of no return if successful. It really serves 3 slightly different purposes:

  1. Prevent a `FirstSuccessful` parser from trying later sub-parsers even in case of an error.
  2. Prevent other unnecessary backtracking in case of an error.
  3. Mark a parser as a potential safe place to recover to when recovering from an error.

So you don't need this parser at all if your input is always correct. SafeSpot is THE cornerstone of good and performant parsing otherwise.

NOTE:

  • Parsers that accept the empty input or only perform look ahead are NOT allowed as sub-parsers. SafeSpot tests the optional recoverer of the parser during the construction phase to do a timely panic. This way we won't have to panic at the runtime of the parser.
  • Only leaf parsers MUST be given to SafeSpot as sub-parsers. SafeSpot will treat the sub-parser as a leaf parser. Any error will look as if coming from SafeSpot itself.

type ParserError

type ParserError struct {
	// contains filtered or unexported fields
}

ParserError is an error message from the parser. It consists of the text itself and the position in the input where it happened.

func ClaimError

func ClaimError(err *ParserError) *ParserError

ClaimError takes over an error from a sub-parser. This is used for sub-parsers that aren't reported as children.

func (*ParserError) Error

func (e *ParserError) Error() string

func (*ParserError) ParserData

func (e *ParserError) ParserData(parserID int32) interface{}

func (*ParserError) PatchMessage

func (e *ParserError) PatchMessage(subMsg string)

func (*ParserError) StoreParserData

func (e *ParserError) StoreParserData(parserID int32, data interface{})

type ParserIDs

type ParserIDs struct {
	// contains filtered or unexported fields
}

ParserIDs is the base of every comb parser. It enables registering of all parsers and error recovery.

func (*ParserIDs) ID

func (pids *ParserIDs) ID() int32

type PreparedParser

type PreparedParser[Output any] struct {
	// contains filtered or unexported fields
}

func NewPreparedParser

func NewPreparedParser[Output any](p Parser[Output]) *PreparedParser[Output]

NewPreparedParser prepares a parser for error recovery. Call this directly if you have a parser that you want to run on many inputs. You can use this together with RunOnState.

type Recoverer

type Recoverer func(state State, data interface{}) (int, interface{})

Recoverer is a simplified parser that returns the number of bytes to reach a SafeSpot. If it can't recover from the given state, it should return RecoverWasteTooMuch. If it can't recover AT ALL, it should return RecoverNever. The data given to it and returned from it is arbitrary internal data of the parser. Usually it is a partial result of the parser.

If no special recoverer is given, we will try the parser until it succeeds moving forward 1 rune/byte at a time. :(

type Separator

type Separator interface {
	~rune | ~byte | ~string | ~[]byte
}

Separator is a generic type for separators (byte, rune, []byte or string)

type State

type State struct {
	// contains filtered or unexported fields
}

State represents the current state of a parser.

func BetterOf

func BetterOf(state, other State) (State, bool)

BetterOf returns the more advanced (in the input) state of the two and true iff it is the other. This should be used for parsers that are alternatives. So the best error is handled.

func NewFromBytes

func NewFromBytes(input []byte, maxErrors int) State

NewFromBytes creates a new parser state from the input data.

func NewFromString

func NewFromString(input string, maxErrors int) State

NewFromString creates a new parser state from the input data.

func (State) AtEnd

func (st State) AtEnd() bool

func (State) ByteCount

func (st State) ByteCount(remaining State) int

func (State) BytesRemaining

func (st State) BytesRemaining() int

func (State) BytesTo

func (st State) BytesTo(remaining State) []byte

func (State) CurrentBytes

func (st State) CurrentBytes() []byte

func (State) CurrentPos

func (st State) CurrentPos() int

func (State) CurrentSourceLine

func (st State) CurrentSourceLine() string

CurrentSourceLine returns the source line corresponding to the current position including [line:column] at the start and a marker at the exact error position. This should be used for reporting errors that are detected later. The binary case is handled accordingly.

func (State) CurrentString

func (st State) CurrentString() string

func (State) Delete1

func (st State) Delete1() State

Delete1 moves forward in the input, thus simulating deletion of input. For binary input it moves forward by a byte otherwise by a UNICODE rune.

func (State) Errors

func (st State) Errors() error

Errors returns all error messages accumulated by the state as a Go error. Multiple errors have been joined (by errors.Join()).

func (State) GetFromCache

func (st State) GetFromCache(pID int32) interface{}

func (State) HasError

func (st State) HasError() bool

HasError returns true if any errors are registered. (Errors that would be returned by State.Errors())

func (State) MoveBackTo

func (st State) MoveBackTo(pos int) State

func (State) MoveBy

func (st State) MoveBy(countBytes int) State

func (State) MoveSafeSpot

func (st State) MoveSafeSpot() State

MoveSafeSpot returns the state with the safe spot moved to the current position.

func (State) Moved

func (st State) Moved(other State) bool

func (State) NewSemanticError

func (st State) NewSemanticError(msg string, args ...interface{}) *ParserError

NewSemanticError creates a semantic error with the message and arguments at the current state position. The usual position and source line including marker are appended to the message.

func (State) NewSyntaxError

func (st State) NewSyntaxError(msg string, args ...interface{}) *ParserError

NewSyntaxError creates a syntax error with the message and arguments at the current state position. For syntax errors `expected ` is prepended to the message, and the usual position and source line including marker are appended.

func (State) PutIntoCache

func (st State) PutIntoCache(pID int32, value interface{})

func (State) SafeSpotMoved

func (st State) SafeSpotMoved(other State) bool

SafeSpotMoved is true iff the safe spot is different between the 2 states.

func (State) SaveError

func (st State) SaveError(err *ParserError) State

SaveError saves an error and returns the new state.

func (State) StringTo

func (st State) StringTo(remaining State) string

Directories

Path Synopsis
Package cmb contains all the standard parsers and all recoverers.
Package cmb contains all the standard parsers and all recoverers.
examples
csv
Package csv implements a parser for CSV files.
Package csv implements a parser for CSV files.
hexcolor
Package hexcolor implements a parser for hexadecimal color strings.
Package hexcolor implements a parser for hexadecimal color strings.
redis
Package redis demonstrates the usage of the comb package to parse Redis' [RESP protocol] messages.
Package redis demonstrates the usage of the comb package to parse Redis' [RESP protocol] messages.
experiments
x
omap
Package omap implements a very simple ordered map with just the absolute minimum features for our purpose.
Package omap implements a very simple ordered map with just the absolute minimum features for our purpose.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL