cdb

v0.2.3
Published: Oct 19, 2025 License: MIT Imports: 11 Imported by: 1

README

CDB - 64-bit Constant Database

A native Go implementation of CDB (Constant Database) with 64-bit support and memory-mapped reading. It originates from github.com/colinmarc/cdb. Little of the original code is left, but the algorithm is the same.

Originally, CDB was described as:

CDB is a fast, reliable, simple package for creating and reading constant databases. Its database structure provides several features:

  • Fast lookups: A successful lookup in a large database normally takes just two disk accesses. An unsuccessful lookup takes only one.
  • Low overhead: A database uses 4096 bytes for the index, plus 16 bytes per hash table entry, plus the space for keys and data.
  • Large file support: This 64-bit implementation can handle databases up to 8 exabytes (2^63 bytes). There are no other restrictions; records don't even have to fit into memory.
  • Machine-independent format: Databases are stored in a consistent binary format across platforms.
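
The "two disk accesses" figure follows from how a CDB lookup is structured: hash the key, pick one of 256 tables through the index, then probe slots in that table until the record is found or an empty slot ends the search. Below is a minimal sketch of the hash and slot selection, using the classic 32-bit function from D. J. Bernstein's specification. Whether this package uses the same function or width in its 64-bit format is an assumption, and `cdbHash` and `slotFor` are illustrative names, not part of the package API.

```go
package main

import "fmt"

// cdbHash is the hash from the classic CDB specification:
// start at 5381, then h = (h * 33) XOR c for every byte c.
func cdbHash(key []byte) uint32 {
	h := uint32(5381)
	for _, c := range key {
		h = ((h << 5) + h) ^ uint32(c)
	}
	return h
}

// slotFor returns which of the 256 tables a hash belongs to and the
// slot where probing starts in a table with nslots slots.
func slotFor(h uint32, nslots uint32) (table, start uint32) {
	table = h % 256
	if nslots == 0 {
		return table, 0 // empty table: the key cannot be present
	}
	return table, (h >> 8) % nslots
}

func main() {
	h := cdbHash([]byte("Alice"))
	table, start := slotFor(h, 8)
	fmt.Printf("hash=%d table=%d start=%d\n", h, table, start)
}
```

The low 8 bits choose the table and the remaining bits choose the starting slot, so one index read plus one table probe is normally enough to locate a record.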

Memory-mapped reads improve on this further. Take care when using this with many large databases, as the memory pressure will differ from what you might be used to.

Features

  • 64-bit only: Simplified implementation supporting only 64-bit databases. These are only marginally larger than the 32-bit equivalent and have no size restrictions.
  • Memory-mapped reads: Zero-copy access using mmap for optimal read performance. Reduces allocations by 90%.
  • In-memory support: Read CDB data from byte slices without file I/O or mmap.
  • Native Go iterators: Support for Go 1.23+ range syntax over keys, values, and key-value pairs.
  • Buffered writes: 64KB write buffer for efficient database creation.
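
To put "marginally larger" into numbers, here is a rough size estimate. The 32-bit figures (2048-byte index, 24 bytes per record) come from the original CDB description. The 64-bit figures combine the 4096-byte index, 16-byte record header, and 16-byte hash slot stated in this README with an assumption carried over from classic CDB: that each record gets two hash slots. `estimate32` and `estimate64` are hypothetical helpers.

```go
package main

import "fmt"

// estimate32 follows the classic CDB description: 2048 bytes of index
// plus 24 bytes per record plus the key and value bytes.
func estimate32(records, payload uint64) uint64 {
	return 2048 + 24*records + payload
}

// estimate64 assumes a 4096-byte index, a 16-byte header per record and
// two 16-byte hash slots per record (the slot count is an assumption).
func estimate64(records, payload uint64) uint64 {
	return 4096 + (16+2*16)*records + payload
}

func main() {
	const records = 1_000_000
	const payload = records * 1024 // 1 KiB of key plus value per record

	a, b := estimate32(records, payload), estimate64(records, payload)
	fmt.Printf("32-bit: %d bytes\n64-bit: %d bytes\nextra: %.2f%%\n",
		a, b, 100*float64(b-a)/float64(a))
}
```

Under these assumptions the gap is small for kilobyte-sized records and grows as records get smaller, because the fixed per-record overhead then dominates.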

Quick Start

package main

import (
	"log"

	"github.com/perbu/cdb"
)

func main() {
	// Create a new database
	writer, err := cdb.Create("/tmp/example.cdb")
	if err != nil {
		log.Fatal(err)
	}

	// Write some key/value pairs
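	if err := writer.Put([]byte("Alice"), []byte("Practice")); err != nil {
		log.Fatal(err)
	}
	if err := writer.Put([]byte("Bob"), []byte("Hope")); err != nil {
		log.Fatal(err)
	}
	if err := writer.Put([]byte("Charlie"), []byte("Horse")); err != nil {
		log.Fatal(err)
	}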

	// Freeze the database and open for reads
	db, err := writer.Freeze()
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Read a value
	value, err := db.Get([]byte("Alice"))
	if err != nil {
		log.Fatal(err)
	}
	log.Println(string(value)) // Output: Practice

	// Iterate over all key-value pairs (Go 1.23+)
	for key, value := range db.All() {
		log.Printf("%s: %s", key, value)
	}

	// Iterate over just keys
	for key := range db.Keys() {
		log.Printf("Key: %s", key)
	}

	// Iterate over just values
	for value := range db.Values() {
		log.Printf("Value: %s", value)
	}
}

File Format

This implementation uses a 64-bit CDB format:

  • Index: 4096 bytes at file start (256 entries × 16 bytes, one entry per hash table)
  • Data section: Key-value pairs with 64-bit length prefixes (16 bytes per record header)
  • Hash tables: Linear probing collision resolution with 64-bit offsets
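
The layout above can be made concrete with a small encoding sketch. It assumes little-endian integers and an (offset, slot count) ordering in each index entry, as in classic CDB. The README states neither, and `indexEntry`, `readIndexEntry`, and `appendRecord` are illustrative names rather than package API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	numTables  = 256
	entrySize  = 16                    // two uint64 fields per index entry
	indexSize  = numTables * entrySize // 4096 bytes
	headerSize = 16                    // key length plus value length
)

// indexEntry describes one of the 256 hash tables.
type indexEntry struct {
	Offset uint64 // file offset of the table (assumed field order)
	Slots  uint64 // number of slots in the table
}

// readIndexEntry decodes entry i from the first 4096 bytes of a file.
func readIndexEntry(index []byte, i int) indexEntry {
	b := index[i*entrySize : (i+1)*entrySize]
	return indexEntry{
		Offset: binary.LittleEndian.Uint64(b[0:8]),
		Slots:  binary.LittleEndian.Uint64(b[8:16]),
	}
}

// appendRecord appends one record: a 16-byte header, then key, then value.
func appendRecord(dst, key, value []byte) []byte {
	dst = binary.LittleEndian.AppendUint64(dst, uint64(len(key)))
	dst = binary.LittleEndian.AppendUint64(dst, uint64(len(value)))
	dst = append(dst, key...)
	return append(dst, value...)
}

func main() {
	index := make([]byte, indexSize)
	binary.LittleEndian.PutUint64(index[16:], 4096) // table 1: offset
	binary.LittleEndian.PutUint64(index[24:], 2)    // table 1: slot count
	fmt.Printf("%+v\n", readIndexEntry(index, 1))

	rec := appendRecord(nil, []byte("Alice"), []byte("Practice"))
	fmt.Println(len(rec) == headerSize+len("Alice")+len("Practice"))
}
```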

Performance

The performance goal was to get rid of the context switching and allocations that came with the original CDB implementation. A read through a memory map will have no slowdown if the content is in the page cache already, avoiding both the seek and read syscalls.
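
To illustrate why a mapped read avoids the seek and read syscalls, here is a minimal Unix-only sketch built on the standard library's `syscall.Mmap`. It is not this package's implementation. `mapFile` and `writeTemp` are hypothetical helpers, and the mapping is read-only.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

// mapFile maps a whole file read-only and returns the mapping.
// The caller must release it with syscall.Munmap.
func mapFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // the mapping stays valid after the descriptor is closed

	st, err := f.Stat()
	if err != nil {
		return nil, err
	}
	if st.Size() == 0 {
		return nil, fmt.Errorf("%s is empty, nothing to map", path)
	}
	return syscall.Mmap(int(f.Fd()), 0, int(st.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
}

// writeTemp creates a temporary file holding s and returns its path.
func writeTemp(s string) (string, error) {
	f, err := os.CreateTemp("", "mmap-demo-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(s)
	return f.Name(), err
}

func main() {
	path, err := writeTemp("hello, mapped world")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(path)

	data, err := mapFile(path)
	if err != nil {
		log.Fatal(err)
	}
	defer syscall.Munmap(data)

	// Slicing the mapping is ordinary memory access: no read syscall, no copy.
	fmt.Printf("%s\n", data[7:13])
}
```

Once the pages are resident, every later access to `data` is a plain memory load, which is the effect the benchmarks below measure.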

The most important metric for me is the time to iterate over the database. At under 2 ns per entry on the Apple M4 below, it is hard to see how it could be faster.

goos: darwin
goarch: arm64
pkg: github.com/perbu/cdb
cpu: Apple M4
BenchmarkGet-10                         54848398                22.04 ns/op            0 B/op          0 allocs/op
BenchmarkMmapIteratorAll-10             628005786                1.896 ns/op           0 B/op          0 allocs/op
BenchmarkMmapIteratorKeys-10            708844383                1.678 ns/op           0 B/op          0 allocs/op
BenchmarkMmapIteratorValues-10          708912603                1.677 ns/op           0 B/op          0 allocs/op
BenchmarkPut-10                          1731052               715.6 ns/op           730 B/op          8 allocs/op

Performance on a 64-bit Linux machine is similar:

goos: linux
goarch: amd64
pkg: github.com/perbu/cdb
cpu: AMD Ryzen 7 9800X3D 8-Core Processor
BenchmarkGet-16                         45158083                26.57 ns/op            0 B/op          0 allocs/op
BenchmarkMmapIteratorAll-16             445780636                2.675 ns/op           0 B/op          0 allocs/op
BenchmarkMmapIteratorKeys-16            452996073                2.677 ns/op           0 B/op          0 allocs/op
BenchmarkMmapIteratorValues-16          451605426                2.650 ns/op           0 B/op          0 allocs/op
BenchmarkPut-16                          1572086               745.9 ns/op           734 B/op          8 allocs/op

API Reference

Writing

package main

import (
	"log"
	"os"

	"github.com/perbu/cdb"
)

func main() {
	path := "/tmp/example.cdb"

	// Create new database file
	writer, err := cdb.Create(path)
	if err != nil {
		log.Fatal(err)
	}

	// Add key-value pair
	key := []byte("example")
	value := []byte("data")
	err = writer.Put(key, value)
	if err != nil {
		log.Fatal(err)
	}

	// Finalize and return reader
	db, err := writer.Freeze()
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Alternative: use custom WriteSeeker
	file, err := os.Create("/tmp/custom.cdb")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	writer2, err := cdb.NewWriter(file)
	if err != nil {
		log.Fatal(err)
	}

	err = writer2.Put([]byte("test"), []byte("value"))
	if err != nil {
		log.Fatal(err)
	}

	// Finalize the database; Close also closes the underlying file
	err = writer2.Close()
	if err != nil {
		log.Fatal(err)
	}
}

Reading

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/perbu/cdb"
)

func main() {
	path := "/tmp/example.cdb"

	// Open with memory mapping
	db, err := cdb.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Lookup value
	key := []byte("example")
	value, err := db.Get(key)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Value: %s\n", value)

	// Get file size
	size := db.Size()
	fmt.Printf("Database size: %d bytes\n", size)

	// Alternative: Create from open file
	file, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	db2, err := cdb.Mmap(file)
	if err != nil {
		log.Fatal(err)
	}
	defer db2.Close()
}

In-Memory Reading

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/perbu/cdb"
)

func main() {
	// Read CDB file into memory
	data, err := os.ReadFile("/tmp/example.cdb")
	if err != nil {
		log.Fatal(err)
	}

	// Create in-memory CDB from byte slice
	db, err := cdb.NewInMemory(data)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Use same API as memory-mapped version
	value, err := db.Get([]byte("example"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Value: %s\n", value)

	// Iterate over all entries
	for key, value := range db.All() {
		fmt.Printf("%s: %s\n", key, value)
	}
}

Note: The byte slice must remain valid for the lifetime of the InMemoryCDB. Close() is a no-op for InMemoryCDB, so the data is still accessible after it is called.

Iteration (Go 1.23+)

package main

import (
	"fmt"
	"log"

	"github.com/perbu/cdb"
)

func main() {
	// Create a sample database first
	writer, err := cdb.Create("/tmp/iter_example.cdb")
	if err != nil {
		log.Fatal(err)
	}

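	if err := writer.Put([]byte("key1"), []byte("value1")); err != nil {
		log.Fatal(err)
	}
	if err := writer.Put([]byte("key2"), []byte("value2")); err != nil {
		log.Fatal(err)
	}
	if err := writer.Put([]byte("key3"), []byte("value3")); err != nil {
		log.Fatal(err)
	}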

	db, err := writer.Freeze()
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Iterate key-value pairs
	fmt.Println("Key-Value pairs:")
	for key, value := range db.All() {
		fmt.Printf("  %s: %s\n", key, value)
	}

	// Iterate keys only
	fmt.Println("Keys:")
	for key := range db.Keys() {
		fmt.Printf("  %s\n", key)
	}

	// Iterate values only
	fmt.Println("Values:")
	for value := range db.Values() {
		fmt.Printf("  %s\n", value)
	}
}

Based on the original CDB specification by D. J. Bernstein.

Documentation

Constants

This section is empty.

Variables

var ErrTooMuchData = errors.New("CDB files are limited to 8EB of data")

Functions

This section is empty.

Types

type InMemoryCDB added in v0.2.0

type InMemoryCDB struct {
	// contains filtered or unexported fields
}

InMemoryCDB represents an in-memory 64-bit CDB database. The data slice must remain valid for the lifetime of the InMemoryCDB. The returned key and value slices from its methods point directly to the underlying data and are valid as long as the data slice remains valid. Do not modify the contents of the returned slices.

func NewInMemory added in v0.2.0

func NewInMemory(data []byte) (*InMemoryCDB, error)

NewInMemory creates an in-memory 64-bit CDB from a byte slice containing a complete CDB database. The caller must ensure the data slice remains valid for the lifetime of the InMemoryCDB and is not modified.

func (*InMemoryCDB) All added in v0.2.0

func (cdb *InMemoryCDB) All() iter.Seq2[[]byte, []byte]

All returns an iterator over all key-value pairs in the database.

func (*InMemoryCDB) Close added in v0.2.0

func (cdb *InMemoryCDB) Close() error

Close is a no-op for InMemoryCDB since there are no resources to release. The caller is responsible for managing the lifetime of the underlying data slice.

func (*InMemoryCDB) Get added in v0.2.0

func (cdb *InMemoryCDB) Get(key []byte) ([]byte, error)

Get returns the value for a given key from the in-memory CDB.

func (*InMemoryCDB) Keys added in v0.2.0

func (cdb *InMemoryCDB) Keys() iter.Seq[[]byte]

Keys returns an iterator over all keys in the database.

func (*InMemoryCDB) Size added in v0.2.0

func (cdb *InMemoryCDB) Size() int

Size returns the size of the in-memory data.

func (*InMemoryCDB) Values added in v0.2.0

func (cdb *InMemoryCDB) Values() iter.Seq[[]byte]

Values returns an iterator over all values in the database.

type MmapCDB

type MmapCDB struct {
	// contains filtered or unexported fields
}

MmapCDB represents a memory-mapped 64-bit CDB database. The returned key and value slices from its methods point directly to the memory-mapped file data and are valid only until the database is closed. Do not modify the contents of the returned slices.
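
Because returned slices point into the mapping, a value that is needed after Close has to be copied first. The sketch below imitates that aliasing with a plain sub-slice instead of the package itself. `lookup` is a stand-in for a zero-copy Get and `retain` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// lookup stands in for a zero-copy Get: it returns a view into data.
// The capacity is clipped so an append cannot overwrite neighbouring bytes.
func lookup(data []byte, off, n int) []byte {
	return data[off : off+n : off+n]
}

// retain returns an independent copy that no longer aliases the source.
func retain(b []byte) []byte {
	return bytes.Clone(b)
}

func main() {
	data := []byte("AlicePractice")
	view := lookup(data, 5, 8) // aliases data
	owned := retain(view)      // independent copy

	data[5] = 'X' // simulate the backing storage changing underneath the view
	fmt.Printf("view=%s owned=%s\n", view, owned)
}
```

With a real mapping the failure mode is harsher than a changed byte: touching a slice after the file has been unmapped is likely to crash the process, so copying before Close is the safe default.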

func Mmap

func Mmap(file *os.File) (*MmapCDB, error)

Mmap creates a memory-mapped 64-bit CDB from an open file.

func Open

func Open(path string) (*MmapCDB, error)

Open opens a 64-bit CDB file at the given path using memory mapping for reads.

func (*MmapCDB) All

func (cdb *MmapCDB) All() iter.Seq2[[]byte, []byte]

All returns an iterator over all key-value pairs in the database.

func (*MmapCDB) Close

func (cdb *MmapCDB) Close() error

Close unmaps the file and closes the file descriptor.

func (*MmapCDB) Get

func (cdb *MmapCDB) Get(key []byte) ([]byte, error)

Get returns the value for a given key using memory-mapped access.

func (*MmapCDB) Keys

func (cdb *MmapCDB) Keys() iter.Seq[[]byte]

Keys returns an iterator over all keys in the database.

func (*MmapCDB) Size

func (cdb *MmapCDB) Size() int

Size returns the size of the memory-mapped data.

func (*MmapCDB) Values

func (cdb *MmapCDB) Values() iter.Seq[[]byte]

Values returns an iterator over all values in the database.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer provides an API for creating a 64-bit CDB database record by record.

Close or Freeze must be called to finalize the database, or the resulting file will be invalid.

func Create

func Create(path string) (*Writer, error)

Create opens a 64-bit CDB database at the given path. If the file exists, it will be overwritten. The returned database is not safe for concurrent writes.

func NewWriter

func NewWriter(writer io.WriteSeeker) (*Writer, error)

NewWriter opens a 64-bit CDB database for the given io.WriteSeeker.
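
Since NewWriter accepts any io.WriteSeeker, the destination does not have to be a file. The standard library has no in-memory io.WriteSeeker, so the sketch below defines one. `memFile` is a hypothetical type, and handing it to `cdb.NewWriter` is an untested assumption about what the writer needs from its destination.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// memFile is a growable in-memory buffer implementing io.WriteSeeker.
type memFile struct {
	buf []byte
	pos int64
}

// Write copies p at the current position, growing the buffer if needed.
func (m *memFile) Write(p []byte) (int, error) {
	end := m.pos + int64(len(p))
	if end > int64(len(m.buf)) {
		m.buf = append(m.buf, make([]byte, end-int64(len(m.buf)))...)
	}
	copy(m.buf[m.pos:end], p)
	m.pos = end
	return len(p), nil
}

// Seek moves the write position relative to the start, the current
// position, or the end of the buffer.
func (m *memFile) Seek(offset int64, whence int) (int64, error) {
	var base int64
	switch whence {
	case io.SeekStart:
		base = 0
	case io.SeekCurrent:
		base = m.pos
	case io.SeekEnd:
		base = int64(len(m.buf))
	default:
		return 0, errors.New("memFile: invalid whence")
	}
	if base+offset < 0 {
		return 0, errors.New("memFile: negative position")
	}
	m.pos = base + offset
	return m.pos, nil
}

var _ io.WriteSeeker = (*memFile)(nil) // compile-time interface check

func main() {
	m := &memFile{}
	m.Write(make([]byte, 4096)) // reserve space at the start, as for an index
	m.Write([]byte("payload"))
	m.Seek(0, io.SeekStart)
	m.Write([]byte("index")) // go back and fill in the reserved space
	fmt.Println(len(m.buf), string(m.buf[:5]))
}
```

A writer that reserves the index first and fills it in at the end needs exactly this seek-back behaviour, which is presumably why the interface is io.WriteSeeker and not io.Writer.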

func (*Writer) Close

func (cdb *Writer) Close() error

Close finalizes the database and closes the underlying io.WriteSeeker.

func (*Writer) Freeze

func (cdb *Writer) Freeze() (*MmapCDB, error)

Freeze finalizes the database and returns an MmapCDB instance for reading.

func (*Writer) Put

func (cdb *Writer) Put(key, value []byte) error

Put adds a key/value pair to the database. If the amount of data written would exceed the limit, Put returns ErrTooMuchData.
