xorfilter

package module
v0.4.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 14, 2026 License: Apache-2.0 Imports: 6 Imported by: 12

README

xorfilter: Go library implementing xor and binary fuse filters

GoDoc Test

Bloom filters are used to quickly check whether an element is part of a set. Xor and binary fuse filters are a faster and more concise alternative to Bloom filters. Furthermore, unlike Bloom filters, xor and binary fuse filters are naturally compressible using standard techniques (gzip, zstd, etc.). They are also smaller than cuckoo filters. They are used in production systems.

This Go library is used by

We are assuming that your set is made of 64-bit integers. If you have strings or other data structures, you need to hash them first to a 64-bit integer. It is not important to have a good hash function, but collision should be unlikely (~1/2^64). A few collisions are acceptable, but we expect that your initial set should have no duplicated entry.

The current implementation has a false positive rate of about 0.4% and a memory usage of less than 9 bits per entry for sizeable sets.

You construct the filter as follows starting from a slice of 64-bit integers:

filter,_ := xorfilter.PopulateBinaryFuse8(keys) // keys is of type []uint64

It returns an object of type BinaryFuse8. The 64-bit integers would typically be hash values of your objects.

You can then query it as follows:

filter.Contains(v) // v is of type uint64

It will always return true if v was part of the initial construction (Populate) and almost always return false otherwise.

An xor filter is immutable, it is concurrent. The expectation is that you build it once and use it many times.

Though the filter itself does not use much memory, the construction of the filter needs many bytes of memory per set entry.

For persistence, you can use Save and LoadBinaryFuse8. It is uses a portable format over different systems (little/big endian).

errsave := filter.Save(...)
//...
filter, errload := LoadBinaryFuse8(&buf)

Note that it is a direct binary save/restore. There is not data integrity check: loading from corrupted sources might result in runtime errors. We recommend that you use hash codes for integrity checks.

When constructing the filter, you should ensure that there are not too many duplicate keys for best results.

Generic (8-bit, 16-bit, 32-bit)

By default, we use 8-bit fingerprints which provide a 0.4% false positive rate. Some user might want to reduce this false positive rate at the expense of more memory usage. For this purpose, we provide a generic type (NewBinaryFuse[T]).

filter8, _ := xorfilter.NewBinaryFuse[uint8](keys) // 0.39% false positive rate, uses about 9 bits per key
filter16, _ := xorfilter.NewBinaryFuse[uint16](keys) // 0.0015% false positive rate, uses about 18 bits per key
filter32, _ := xorfilter.NewBinaryFuse[uint32](keys) // 2e-08% false positive rate, uses about 36 bits per key

You can similarly save or load the data with Save and LoadBinaryFuse[uint16](...).

The 32-bit fingerprints are provided but not recommended. Most users will want to use either the 8-bit or 16-bit fingerprints.

The Binary Fuse filters have memory usages of about 9 bits per key in the 8-bit case, 18 bits per key in the 16-bit case, for sufficiently large sets (hundreds of thousands of keys). There is more per-key memory usage when the set is smaller.

Memory reuse for repeated builds

When building many filters, memory can be reused (reducing allocation and GC overhead) with a BinaryFuseBuilder:

var builder xorfilter.BinaryFuseBuilder
for {
    filter8, _ := BuildBinaryFuse[uint8](&builder, keys)
    filter16, _ := BuildBinaryFuse[uint16](&builder, keys)
    ...
}

Implementations of xor filters in other programming languages

Further reading

Mastering Programming: From Testing to Performance in Go

Documentation

Index

Constants

This section is empty.

Variables

View Source
var MaxIterations = 1024

MaxIterations is the maximum number of iterations allowed before the populate function returns an error.

Functions

This section is empty.

Types

type BinaryFuse added in v0.2.0

type BinaryFuse[T Unsigned] struct {
	Seed               uint64
	SegmentLength      uint32
	SegmentLengthMask  uint32
	SegmentCount       uint32
	SegmentCountLength uint32

	Fingerprints []T
}

func BuildBinaryFuse added in v0.3.0

func BuildBinaryFuse[T Unsigned](b *BinaryFuseBuilder, keys []uint64) (BinaryFuse[T], error)

BuildBinaryFuse creates a binary fuse filter with provided keys, reusing buffers from the BinaryFuseBuilder if possible. For best results, the caller should avoid having too many duplicated keys.

The function can mutate the given keys slice to remove duplicates.

The function may return an error if the set is empty.

func LoadBinaryFuse added in v0.4.0

func LoadBinaryFuse[T Unsigned](r io.Reader) (*BinaryFuse[T], error)

LoadBinaryFuse reads the filter from the reader assuming little endian system, using direct byte copy for performance.

func NewBinaryFuse added in v0.2.0

func NewBinaryFuse[T Unsigned](keys []uint64) (*BinaryFuse[T], error)

NewBinaryFuse creates a binary fuse filter with provided keys. For best results, the caller should avoid having too many duplicated keys.

The function can mutate the given keys slice to remove duplicates.

The function may return an error if the set is empty.

func (*BinaryFuse[T]) Contains added in v0.2.0

func (filter *BinaryFuse[T]) Contains(key uint64) bool

Contains returns `true` if key is part of the set with a false positive probability.

func (*BinaryFuse[T]) Save added in v0.4.0

func (f *BinaryFuse[T]) Save(w io.Writer) error

Save writes the filter to the writer assuming little endian system, using direct byte copy for performance.

type BinaryFuse8

type BinaryFuse8 BinaryFuse[uint8]

func LoadBinaryFuse8 added in v0.4.0

func LoadBinaryFuse8(r io.Reader) (*BinaryFuse8, error)

LoadBinaryFuse8 reads the filter from the reader in little endian format.

func PopulateBinaryFuse8

func PopulateBinaryFuse8(keys []uint64) (*BinaryFuse8, error)

PopulateBinaryFuse8 fills the filter with provided keys. For best results, the caller should avoid having too many duplicated keys. The function may return an error if the set is empty.

func (*BinaryFuse8) Contains

func (filter *BinaryFuse8) Contains(key uint64) bool

Contains returns `true` if key is part of the set with a false positive probability of <0.4%.

func (*BinaryFuse8) Save added in v0.4.0

func (f *BinaryFuse8) Save(w io.Writer) error

Save writes the filter to the writer in little endian format.

type BinaryFuseBuilder added in v0.3.0

type BinaryFuseBuilder struct {
	// contains filtered or unexported fields
}

BinaryFuseBuilder can be used to reuse memory allocations across multiple BinaryFuse builds.

type Unsigned added in v0.2.0

type Unsigned interface {
	~uint8 | ~uint16 | ~uint32
}

type Xor8

type Xor8 struct {
	Seed         uint64
	BlockLength  uint32
	Fingerprints []uint8
}

Xor8 offers a 0.3% false-positive probability

func Populate

func Populate(keys []uint64) (*Xor8, error)

Populate fills the filter with provided keys. For best results, the caller should avoid having too many duplicated keys. The function may return an error if the set is empty.

func (*Xor8) Contains

func (filter *Xor8) Contains(key uint64) bool

Contains tell you whether the key is likely part of the set

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL