Documentation ¶
Overview ¶
Package comb implements a parser combinator library. It provides a toolkit for building parsers for both textual and binary formats that are reliable, fast, flexible, and easy to develop and maintain. It relies heavily on Go generics so that combinators can be mixed and matched to produce the desired output with as much compile-time type safety as possible.
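To illustrate what compile-time type safety means for combinators, here is a minimal stdlib-only sketch. The names `parser`, `digits`, `mapOut` and `number` are hypothetical and are not part of comb's API; the sketch only shows how generics let the compiler check that one parser's output type matches the conversion applied to it.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parser is a hypothetical, minimal generic parser type (not comb's Parser interface).
type parser[T any] func(input string) (rest string, out T, err error)

// digits consumes leading ASCII digits and returns them as a string.
func digits(input string) (string, string, error) {
	n := 0
	for n < len(input) && input[n] >= '0' && input[n] <= '9' {
		n++
	}
	if n == 0 {
		return input, "", errors.New("expected digits")
	}
	return input[n:], input[:n], nil
}

// mapOut converts the output of p with fn. A mismatch between the output
// type of p and the argument type of fn is rejected by the compiler.
func mapOut[A, B any](p parser[A], fn func(A) (B, error)) parser[B] {
	return func(input string) (string, B, error) {
		var zero B
		rest, a, err := p(input)
		if err != nil {
			return input, zero, err
		}
		b, err := fn(a)
		if err != nil {
			return input, zero, err
		}
		return rest, b, nil
	}
}

// number is a parser[int] built from a parser[string] and strconv.Atoi.
var number = mapOut[string, int](digits, strconv.Atoi)

func main() {
	rest, n, err := number("42 apples")
	fmt.Printf("rest=%q n=%d err=%v\n", rest, n, err)
}
```

Swapping `strconv.Atoi` for a function that takes, say, a `[]byte` would fail to compile, which is the kind of early feedback the overview refers to.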
Index ¶
- Constants
- func Debugf(msg string, args ...interface{})
- func RunOnBytes[Output any](input []byte, parse Parser[Output]) (Output, error)
- func RunOnState[Output any](state State, parser *PreparedParser[Output]) (Output, error)
- func RunOnString[Output any](input string, parse Parser[Output]) (Output, error)
- func SetDebug(enable bool)
- func UnwrapErrors(err error) []error
- func ZeroOf[T any]() T
- type AnyParser
- type BranchParser
- type ConstState
- type Parser
- func LazyBranchParser[Output any](makeParser func() Parser[Output]) Parser[Output]
- func NewBranchParser[Output any](expected string, children func() []AnyParser, ...) Parser[Output]
- func NewParser[Output any](expected string, parse func(State) (State, Output, *ParserError), ...) Parser[Output]
- func NewParserWithData[Output any](expected string, ...) Parser[Output]
- func SafeSpot[Output any](p Parser[Output]) Parser[Output]
- type ParserError
- type ParserIDs
- type PreparedParser
- type Recoverer
- type Separator
- type State
- func (st State) AtEnd() bool
- func (st State) ByteCount(remaining State) int
- func (st State) BytesRemaining() int
- func (st State) BytesTo(remaining State) []byte
- func (st State) CurrentBytes() []byte
- func (st State) CurrentPos() int
- func (st State) CurrentSourceLine() string
- func (st State) CurrentString() string
- func (st State) Delete1() State
- func (st State) Errors() error
- func (st State) GetFromCache(pID int32) interface{}
- func (st State) HasError() bool
- func (st State) MoveBackTo(pos int) State
- func (st State) MoveBy(countBytes int) State
- func (st State) MoveSafeSpot() State
- func (st State) Moved(other State) bool
- func (st State) NewSemanticError(msg string, args ...interface{}) *ParserError
- func (st State) NewSyntaxError(msg string, args ...interface{}) *ParserError
- func (st State) PutIntoCache(pID int32, value interface{})
- func (st State) SafeSpotMoved(other State) bool
- func (st State) SaveError(err *ParserError) State
- func (st State) StringTo(remaining State) string
Constants ¶
const (
	ParentUndefined = math.MinInt32 + iota // used for calling the root parser
	ParentUnknown                          // used for bottom-up parsing
)
const DefaultMaxErrors = 10 // the maximum number of errors to recover from (same as for the Go compiler)
const RecoverNever = -3
const RecoverWasteTooMuch = -1 // has to be -1 because of Go Index... functions
const RecoverWasteUnknown = -2 // default value; 0 can't be used because it's a valid normal value
const SyntaxErrorStart = "expected "
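The comment on RecoverWasteTooMuch mentions Go's Index... functions. As a hedged illustration of why -1 is convenient, the sketch below uses a hypothetical helper `wasteUntil` and a local stand-in constant (neither is part of comb). `strings.Index` already returns -1 when nothing is found, so a search result can be returned unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// recoverWasteTooMuch is a local stand-in for the package constant of the same meaning.
const recoverWasteTooMuch = -1

// wasteUntil is a hypothetical helper: it reports how many bytes of input
// would have to be skipped before token starts. strings.Index returns -1
// when token is absent, which coincides with recoverWasteTooMuch, while 0
// stays available as a valid result ("nothing to skip").
func wasteUntil(input, token string) int {
	return strings.Index(input, token)
}

func main() {
	fmt.Println(wasteUntil("x = ;", ";"))
	fmt.Println(wasteUntil(";", ";"))
	fmt.Println(wasteUntil("no terminator", ";") == recoverWasteTooMuch)
}
```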
Variables ¶
This section is empty.
Functions ¶
func Debugf ¶
func Debugf(msg string, args ...interface{})
Debugf logs the given message using `log.Printf` if the debug level is enabled.
func RunOnBytes ¶
RunOnBytes runs a parser on binary input and returns the output and error(s). This is useful for binary or mixed binary/text parsers.
func RunOnState ¶
func RunOnState[Output any](state State, parser *PreparedParser[Output]) (Output, error)
RunOnState runs a parser on a given state and returns the output and error(s). RunOnString and RunOnBytes are just convenience wrappers around RunOnState. Only RunOnState is safe for concurrent use, because preparing the parser (see NewPreparedParser) is not.
func RunOnString ¶
RunOnString runs a parser on text input and returns the output and error(s).
func SetDebug ¶
func SetDebug(enable bool)
SetDebug sets the log level to debug if enabled or info otherwise.
func UnwrapErrors ¶
Types ¶
type AnyParser ¶
type AnyParser interface {
ID() int32
ParseAny(parentID int32, state State) (State, interface{}, *ParserError) // top -> down
IsSafeSpot() bool
Recover(State, interface{}) (int, interface{})
IsStepRecoverer() bool
// contains filtered or unexported methods
}
AnyParser is an internal interface used by PreparedParser. It intentionally avoids generics for easy storage of parsers in collections (slices, maps, ...).
type BranchParser ¶
type BranchParser interface {
// contains filtered or unexported methods
}
BranchParser is a more internal interface used by orchestrators. It intentionally avoids generics for easy storage of parsers in collections (slices, maps, ...). BranchParser just adds 2 methods to the Parser and AnyParser interfaces.
type ConstState ¶
type ConstState struct {
// contains filtered or unexported fields
}
ConstState is the constant data for all the parsers. E.g., the input and data derived from it. The input can be either UTF-8 encoded text (a.k.a. string) or raw bytes. The parsers store and advance the position within the data but never change the data itself. This allows good error reporting, including the full line of text containing the error.
type Parser ¶
type Parser[Output any] interface {
	ID() int32
	Expected() string
	Parse(state State) (State, Output, *ParserError) // used by compiler (for type inference) and tests
	ParseAny(parentID int32, state State) (State, interface{}, *ParserError) // used by PreparedParser (top -> down)
	IsSafeSpot() bool
	Recover(State, interface{}) (int, interface{})
	IsStepRecoverer() bool
	SwapRecoverer(Recoverer) // called during the construction phase
	// contains filtered or unexported methods
}
Parser defines the type of a generic Parser. A few rules should be followed to prevent unexpected behaviour:
- A parser that errors must return the error.
- A parser that errors should not change the position of the state's input.
- A parser that consumes some input must advance with state.MoveBy().
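The three rules can be sketched with a simplified model. The `state` type, its `moveBy` method and the `tag` constructor below are hypothetical stand-ins written against the standard library only; they are not comb's State or NewParser, and real comb parsers return a *ParserError rather than a plain error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// state is a simplified, hypothetical stand-in for comb's State:
// an input that is never modified plus a byte position.
type state struct {
	input string
	pos   int
}

// moveBy returns a new state advanced by n bytes (the idea behind State.MoveBy).
func (s state) moveBy(n int) state { return state{s.input, s.pos + n} }

// tag builds a parser that matches the literal text lit.
func tag(lit string) func(state) (state, string, error) {
	return func(s state) (state, string, error) {
		if !strings.HasPrefix(s.input[s.pos:], lit) {
			// Rules 1 and 2: return the error and hand back the state unchanged.
			return s, "", errors.New("expected " + lit)
		}
		// Rule 3: consumed input is reflected by advancing the position.
		return s.moveBy(len(lit)), lit, nil
	}
}

func main() {
	start := state{input: "let x"}

	next, out, err := tag("let")(start)
	fmt.Printf("pos=%d out=%q err=%v\n", next.pos, out, err)

	next, out, err = tag("var")(start)
	fmt.Printf("pos=%d out=%q err=%v\n", next.pos, out, err)
}
```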
func LazyBranchParser ¶
LazyBranchParser just stores a function that creates the parser and evaluates that function later. This defers the call to NewParser() and thereby makes it possible to define recursive grammars. Only branch parsers need this ability; a leaf parser can't be recursive by definition.
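Why deferred construction enables recursion can be shown with a small stdlib-only sketch. The `parser` type and the `lazy` and `nested` functions are hypothetical and much simpler than comb's own types; `lazy` only mirrors the idea of storing a constructor and calling it on first use (and, unlike a production version, makes no attempt to be safe for concurrent use).

```go
package main

import "fmt"

// parser is a hypothetical, simplified parser type: it reads input from pos and
// returns the new position, the nesting depth it saw, and whether it matched.
type parser func(input string, pos int) (newPos, depth int, ok bool)

// lazy stores build and only calls it when the parser is first used.
func lazy(build func() parser) parser {
	var p parser
	return func(input string, pos int) (int, int, bool) {
		if p == nil {
			p = build()
		}
		return p(input, pos)
	}
}

// nested implements the recursive grammar: nested = "(" nested ")" | "".
func nested() parser {
	// Calling nested() directly here would recurse forever during
	// construction; lazy(nested) postpones the call until parse time.
	inner := lazy(nested)
	return func(input string, pos int) (int, int, bool) {
		if pos >= len(input) || input[pos] != '(' {
			return pos, 0, true // the empty alternative
		}
		end, depth, ok := inner(input, pos+1)
		if !ok || end >= len(input) || input[end] != ')' {
			return pos, 0, false // error: position stays unchanged
		}
		return end + 1, depth + 1, true
	}
}

func main() {
	fmt.Println(nested()("((()))", 0))
	fmt.Println(nested()("(()", 0))
}
```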
func NewBranchParser ¶
func NewBranchParser[Output any](
	expected string,
	children func() []AnyParser,
	parseAfterChild func(
		childID int32,
		childStartState, childState State,
		childOut interface{},
		childErr *ParserError,
		data interface{},
	) (State, Output, *ParserError, interface{}),
) Parser[Output]
NewBranchParser is THE way to create branch parsers. parseAfterChild is called with a `childID < 0` during normal (top -> down) parsing. It will be called with a `childID >= 0` during error recovery (bottom -> up).
func NewParser ¶
func NewParser[Output any](
	expected string,
	parse func(State) (State, Output, *ParserError),
	recover Recoverer,
) Parser[Output]
NewParser is THE way to create simple leaf parsers. recover can be nil to signal that there is no optimized recoverer available. In that case, error recovery calls the parser again and again, moving forward one byte/rune at a time.
func NewParserWithData ¶
func NewParserWithData[Output any](
	expected string,
	parse func(State, interface{}) (State, Output, *ParserError, interface{}),
	recover Recoverer,
) Parser[Output]
NewParserWithData is the way to create leaf parsers that have partial results they want to save in case of an error. recover can be nil to signal that there is no optimized recoverer available. In that case, error recovery calls the parser again and again, moving forward one byte/rune at a time.
func SafeSpot ¶
SafeSpot applies a sub-parser and marks the new state as a point of no return if successful. It really serves 3 slightly different purposes:
- Prevent a `FirstSuccessful` parser from trying later sub-parsers even in case of an error.
- Prevent other unnecessary backtracking in case of an error.
- Mark a parser as a potential safe place to recover to when recovering from an error.
So you don't need this parser at all if your input is always correct. Otherwise, SafeSpot is THE cornerstone of good and performant parsing.
NOTE:
- Parsers that accept empty input or only perform look-ahead are NOT allowed as sub-parsers. SafeSpot tests the optional recoverer of the parser during the construction phase so that it can panic early. This way we won't have to panic at the runtime of the parser.
- Only leaf parsers may be given to SafeSpot as sub-parsers. SafeSpot treats the sub-parser as a leaf parser, so any error will look as if it came from SafeSpot itself.
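The first purpose, stopping an alternatives parser from trying later branches, can be sketched as follows. Everything here (`state`, `tag`, `safeSpot`, `sequence`, `firstSuccessful`) is a hypothetical stdlib-only model, and the way the safe spot is tracked is an assumption made for illustration, not a description of comb's internals.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type state struct {
	input    string
	pos      int
	safeSpot int // last position marked as a point of no return; -1 if none
}

type parser func(state) (state, error)

func tag(lit string) parser {
	return func(s state) (state, error) {
		if !strings.HasPrefix(s.input[s.pos:], lit) {
			return s, errors.New("expected " + lit)
		}
		s.pos += len(lit)
		return s, nil
	}
}

// safeSpot runs p and, on success, records the new position as a point of no return.
func safeSpot(p parser) parser {
	return func(s state) (state, error) {
		next, err := p(s)
		if err == nil {
			next.safeSpot = next.pos
		}
		return next, err
	}
}

// sequence runs parsers in order. On failure it restores the position but keeps
// any safe spot that was passed, so callers can see that backtracking is pointless.
func sequence(ps ...parser) parser {
	return func(s state) (state, error) {
		cur := s
		for _, p := range ps {
			next, err := p(cur)
			if err != nil {
				failed := s
				failed.safeSpot = cur.safeSpot
				return failed, err
			}
			cur = next
		}
		return cur, nil
	}
}

// firstSuccessful tries alternatives in order but gives up as soon as a
// failing alternative has moved the safe spot.
func firstSuccessful(ps ...parser) parser {
	return func(s state) (state, error) {
		lastErr := errors.New("no alternative matched")
		for _, p := range ps {
			next, err := p(s)
			if err == nil {
				return next, nil
			}
			lastErr = err
			if next.safeSpot != s.safeSpot {
				break // committed: later alternatives are not tried
			}
		}
		return s, lastErr
	}
}

func main() {
	start := state{input: "letter", safeSpot: -1}
	plain := firstSuccessful(sequence(tag("let"), tag(" x")), tag("letter"))
	committed := firstSuccessful(sequence(safeSpot(tag("let")), tag(" x")), tag("letter"))

	_, err := plain(start)
	fmt.Println("without safe spot:", err)
	_, err = committed(start)
	fmt.Println("with safe spot:", err)
}
```

In the committed variant the second alternative would have matched, but the error from the first branch is reported instead, which is usually the more helpful message once a keyword has been recognised.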
type ParserError ¶
type ParserError struct {
// contains filtered or unexported fields
}
ParserError is an error message from the parser. It consists of the text itself and the position in the input where it happened.
func ClaimError ¶
func ClaimError(err *ParserError) *ParserError
ClaimError takes over an error from a sub-parser. This is used for sub-parsers that aren't reported as children.
func (*ParserError) Error ¶
func (e *ParserError) Error() string
func (*ParserError) ParserData ¶
func (e *ParserError) ParserData(parserID int32) interface{}
func (*ParserError) PatchMessage ¶
func (e *ParserError) PatchMessage(subMsg string)
func (*ParserError) StoreParserData ¶
func (e *ParserError) StoreParserData(parserID int32, data interface{})
type ParserIDs ¶
type ParserIDs struct {
// contains filtered or unexported fields
}
ParserIDs is the base of every comb parser. It enables registering of all parsers and error recovery.
type PreparedParser ¶
type PreparedParser[Output any] struct {
	// contains filtered or unexported fields
}
func NewPreparedParser ¶
func NewPreparedParser[Output any](p Parser[Output]) *PreparedParser[Output]
NewPreparedParser prepares a parser for error recovery. Call this directly if you have a parser that you want to run on many inputs. You can use this together with RunOnState.
type Recoverer ¶
Recoverer is a simplified parser that returns the number of bytes to reach a SafeSpot. If it can't recover from the given state, it should return RecoverWasteTooMuch. If it can't recover AT ALL, it should return RecoverNever. The data given to it and returned from it is arbitrary internal data of the parser. Usually it is a partial result of the parser.
If no special recoverer is given, we will try the parser until it succeeds moving forward 1 rune/byte at a time. :(
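The difference between the fallback and an optimized recoverer can be sketched with the standard library alone. `stepRecover` and `atSemicolon` are hypothetical names, and the function signatures are simplified compared to comb's Recoverer; the sketch only assumes that recovery means "find the number of bytes to skip".

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// atSemicolon stands in for "the parser succeeds at this position".
func atSemicolon(rest string) bool { return strings.HasPrefix(rest, ";") }

// stepRecover is a hypothetical sketch of the fallback: it retries try at
// successive rune boundaries and returns the number of bytes skipped,
// or -1 if try never succeeds.
func stepRecover(input string, try func(rest string) bool) int {
	waste := 0
	for waste < len(input) {
		if try(input[waste:]) {
			return waste
		}
		// Advance by one rune; invalid UTF-8 still advances by one byte.
		_, size := utf8.DecodeRuneInString(input[waste:])
		waste += size
	}
	return -1
}

func main() {
	input := "äö;rest"

	// Fallback: one parser attempt per rune.
	fmt.Println(stepRecover(input, atSemicolon))

	// Optimized recoverer for the same token: a single search.
	fmt.Println(strings.Index(input, ";"))
}
```

Both calls report the same number of wasted bytes; the optimized version simply gets there without repeated parser attempts.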
type State ¶
type State struct {
// contains filtered or unexported fields
}
State represents the current state of a parser.
func BetterOf ¶
BetterOf returns whichever of the two states is further advanced in the input, and true iff that is the other state. It should be used by parsers that try alternatives, so that the best error is the one that gets handled.
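As a rough sketch of the idea, the hypothetical `betterOf` below compares two simplified states by position. The `state` type is a stand-in, and keeping the first state on a tie is an assumption; the source does not say how comb resolves ties.

```go
package main

import "fmt"

// state is a hypothetical stand-in that only tracks a byte position.
type state struct{ pos int }

// betterOf returns the state that is further along in the input and
// true if that state is other. Ties keep st (an assumption).
func betterOf(st, other state) (state, bool) {
	if other.pos > st.pos {
		return other, true
	}
	return st, false
}

func main() {
	// Two alternatives failed at different positions; the error of the
	// one that got further is usually the more useful one to report.
	best, isOther := betterOf(state{pos: 3}, state{pos: 7})
	fmt.Println(best.pos, isOther)
}
```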
func NewFromBytes ¶
NewFromBytes creates a new parser state from the input data.
func NewFromString ¶
NewFromString creates a new parser state from the input data.
func (State) BytesRemaining ¶
func (State) CurrentBytes ¶
func (State) CurrentPos ¶
func (State) CurrentSourceLine ¶
CurrentSourceLine returns the source line corresponding to the current position including [line:column] at the start and a marker at the exact error position. This should be used for reporting errors that are detected later. The binary case is handled accordingly.
func (State) CurrentString ¶
func (State) Delete1 ¶
Delete1 moves forward in the input, thus simulating deletion of input. For binary input it moves forward by one byte, otherwise by one Unicode rune.
func (State) Errors ¶
Errors returns all error messages accumulated by the state as a Go error. Multiple errors are joined with errors.Join().
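Because the errors are combined with errors.Join(), the result exposes an `Unwrap() []error` method (standard library behaviour since Go 1.20). The sketch below shows how such an error can be taken apart again; `splitJoined` and `demoErrors` are hypothetical helpers, not the implementation of UnwrapErrors, and the messages are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// demoErrors builds a joined error similar in shape to what a state might return.
func demoErrors() error {
	return errors.Join(
		errors.New("expected digit"),
		errors.New("expected ')'"),
	)
}

// splitJoined is a hypothetical helper: it returns the individual errors inside
// an error created by errors.Join, or a one-element slice for any other error.
func splitJoined(err error) []error {
	if err == nil {
		return nil
	}
	if joined, ok := err.(interface{ Unwrap() []error }); ok {
		return joined.Unwrap()
	}
	return []error{err}
}

func main() {
	for i, e := range splitJoined(demoErrors()) {
		fmt.Printf("%d: %v\n", i+1, e)
	}
}
```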
func (State) GetFromCache ¶
func (State) HasError ¶
HasError returns true if any errors are registered. (Errors that would be returned by State.Errors())
func (State) MoveBackTo ¶
func (State) MoveSafeSpot ¶
MoveSafeSpot returns the state with the safe spot moved to the current position.
func (State) NewSemanticError ¶
func (st State) NewSemanticError(msg string, args ...interface{}) *ParserError
NewSemanticError creates a semantic error with the message and arguments at the current state position. The usual position and source line including marker are appended to the message.
func (State) NewSyntaxError ¶
func (st State) NewSyntaxError(msg string, args ...interface{}) *ParserError
NewSyntaxError creates a syntax error with the message and arguments at the current state position. For syntax errors `expected ` is prepended to the message, and the usual position and source line including marker are appended.
func (State) PutIntoCache ¶
func (State) SafeSpotMoved ¶
SafeSpotMoved is true iff the safe spot is different between the 2 states.
func (State) SaveError ¶
func (st State) SaveError(err *ParserError) State
SaveError saves an error and returns the new state.
Directories ¶
| Path | Synopsis |
|---|---|
| | Package cmb contains all the standard parsers and all recoverers. |
| examples | |
| csv | Package csv implements a parser for CSV files. |
| hexcolor | Package hexcolor implements a parser for hexadecimal color strings. |
| redis | Package redis demonstrates the usage of the comb package to parse Redis' RESP protocol messages. |
| experiments | |
| x | |
| omap | Package omap implements a very simple ordered map with just the absolute minimum features for our purpose. |
