CDB - 64-bit Constant Database
A native Go implementation of CDB (Constant Database) with 64-bit support and memory-mapped reading. It originates
from github.com/colinmarc/cdb. Little of the original code remains, but the algorithm is the same.
Originally, CDB was described as:
CDB is a fast, reliable, simple package for creating and reading constant databases. Its database structure provides
several features:
- Fast lookups: A successful lookup in a large database normally takes just two disk accesses. An unsuccessful
lookup takes only one.
- Low overhead: A database uses 4096 bytes for the index, plus 16 bytes per hash table entry, plus the space for
keys and data.
- Large file support: This 64-bit implementation can handle databases up to 8 exabytes (2^63 bytes). There are no
other restrictions; records don't even have to fit into memory.
- Machine-independent format: Databases are stored in a consistent binary format across platforms.
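The lookup path behind the "two disk accesses" claim can be sketched in memory. This is a toy model, not this package's code. It follows the classic scheme from D. J. Bernstein's specification: the low 8 bits of the hash select one of 256 tables, the remaining bits pick a start slot, and linear probing continues until a match or an empty slot. The 32-bit hash shown is the classic one; whether the 64-bit format here uses the same hash is an assumption. The names toyDB, build and get are hypothetical.

```go
package main

import "fmt"

// cdbHash is the classic 32-bit CDB hash: start at 5381, then h = (h*33) ^ c.
func cdbHash(key string) uint32 {
	h := uint32(5381)
	for i := 0; i < len(key); i++ {
		h = ((h << 5) + h) ^ uint32(key[i])
	}
	return h
}

type record struct{ key, value string }

// slot stores the full hash and a record reference; rec == 0 marks an empty slot.
type slot struct {
	hash uint32
	rec  int // index into records, plus one
}

// toyDB is a hypothetical in-memory stand-in for the on-disk layout.
type toyDB struct {
	records []record
	tables  [256][]slot
}

func build(pairs []record) *toyDB {
	db := &toyDB{records: pairs}
	var counts [256]int
	for _, r := range pairs {
		counts[cdbHash(r.key)&255]++
	}
	// Classic CDB sizes each table at twice its entry count, so probes stay short.
	for i, n := range counts {
		db.tables[i] = make([]slot, 2*n)
	}
	for i, r := range pairs {
		h := cdbHash(r.key)
		t := db.tables[h&255]
		j := int(h>>8) % len(t)
		for t[j].rec != 0 { // linear probing: step to the next free slot
			j = (j + 1) % len(t)
		}
		t[j] = slot{hash: h, rec: i + 1}
	}
	return db
}

func (db *toyDB) get(key string) (string, bool) {
	h := cdbHash(key)
	t := db.tables[h&255]
	if len(t) == 0 {
		return "", false // empty table: a miss without touching any record
	}
	j := int(h>>8) % len(t)
	for n := 0; n < len(t); n++ {
		s := t[j]
		if s.rec == 0 {
			return "", false // an empty slot ends the probe sequence
		}
		if s.hash == h && db.records[s.rec-1].key == key {
			return db.records[s.rec-1].value, true
		}
		j = (j + 1) % len(t)
	}
	return "", false
}

func main() {
	db := build([]record{{"Alice", "Practice"}, {"Bob", "Hope"}, {"Charlie", "Horse"}})
	for _, k := range []string{"Alice", "Zed"} {
		v, ok := db.get(k)
		fmt.Printf("%s -> %q (found=%v)\n", k, v, ok)
	}
}
```

In the on-disk format, the slot probe and the record read correspond to the two accesses of a successful lookup; a miss usually ends at the slot probe.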
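Memory-mapped reads improve on this further by removing the read syscalls. Take care when mapping many large
databases: mapped pages are served from the page cache and count towards the process's address space, so memory
pressure will behave differently from what you might be used to with ordinary file reads.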
Features
- 64-bit only: Simplified implementation supporting only 64-bit databases. These are only marginally larger than the
32-bit equivalent and have no size restrictions.
- Memory-mapped reads: Zero-copy access using mmap for optimal read performance. Reduces allocations by 90%.
- In-memory support: Read CDB data from byte slices without file I/O or mmap.
- Native Go iterators: Support for Go 1.23+ range syntax over keys, values, and key-value pairs
- Buffered writes: 64KB write buffer for efficient database creation
Quick Start
package main

import (
	"log"

	"github.com/perbu/cdb"
)

func main() {
	// Create a new database
	writer, err := cdb.Create("/tmp/example.cdb")
	if err != nil {
		log.Fatal(err)
	}

	// Write some key/value pairs, checking each Put for errors
	pairs := [][2]string{{"Alice", "Practice"}, {"Bob", "Hope"}, {"Charlie", "Horse"}}
	for _, p := range pairs {
		if err := writer.Put([]byte(p[0]), []byte(p[1])); err != nil {
			log.Fatal(err)
		}
	}

	// Freeze the database and open for reads
	db, err := writer.Freeze()
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Read a value
	value, err := db.Get([]byte("Alice"))
	if err != nil {
		log.Fatal(err)
	}
	log.Println(string(value)) // logs: Practice

	// Iterate over all key-value pairs (Go 1.23+)
	for key, value := range db.All() {
		log.Printf("%s: %s", key, value)
	}

	// Iterate over just keys
	for key := range db.Keys() {
		log.Printf("Key: %s", key)
	}

	// Iterate over just values
	for value := range db.Values() {
		log.Printf("Value: %s", value)
	}
}
This implementation uses a 64-bit CDB format:
- Index: 4096 bytes at file start (256 tables × 16 bytes each)
- Data section: Key-value pairs with 64-bit length prefixes (16 bytes per record header)
- Hash tables: Linear probing collision resolution with 64-bit offsets
The performance goal was to get rid of the context switching and allocations that came with the original
CDB implementation. A read through a memory map will have no slowdown if the content is in the page cache already,
avoiding both the seek and read syscalls.
The most important metric for me is the time to iterate over the database. At under 2 ns per operation on an
Apple M4, it is hard to see how it could be faster.
goos: darwin
goarch: arm64
pkg: github.com/perbu/cdb
cpu: Apple M4
BenchmarkGet-10 54848398 22.04 ns/op 0 B/op 0 allocs/op
BenchmarkMmapIteratorAll-10 628005786 1.896 ns/op 0 B/op 0 allocs/op
BenchmarkMmapIteratorKeys-10 708844383 1.678 ns/op 0 B/op 0 allocs/op
BenchmarkMmapIteratorValues-10 708912603 1.677 ns/op 0 B/op 0 allocs/op
BenchmarkPut-10 1731052 715.6 ns/op 730 B/op 8 allocs/op
Performance on a 64-bit Linux machine is similar:
goos: linux
goarch: amd64
pkg: github.com/perbu/cdb
cpu: AMD Ryzen 7 9800X3D 8-Core Processor
BenchmarkGet-16 45158083 26.57 ns/op 0 B/op 0 allocs/op
BenchmarkMmapIteratorAll-16 445780636 2.675 ns/op 0 B/op 0 allocs/op
BenchmarkMmapIteratorKeys-16 452996073 2.677 ns/op 0 B/op 0 allocs/op
BenchmarkMmapIteratorValues-16 451605426 2.650 ns/op 0 B/op 0 allocs/op
BenchmarkPut-16 1572086 745.9 ns/op 734 B/op 8 allocs/op
API Reference
Writing
package main

import (
	"log"
	"os"

	"github.com/perbu/cdb"
)

func main() {
	path := "/tmp/example.cdb"

	// Create new database file
	writer, err := cdb.Create(path)
	if err != nil {
		log.Fatal(err)
	}

	// Add key-value pair
	key := []byte("example")
	value := []byte("data")
	err = writer.Put(key, value)
	if err != nil {
		log.Fatal(err)
	}

	// Finalize and return reader
	db, err := writer.Freeze()
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Alternative: use custom WriteSeeker
	file, err := os.Create("/tmp/custom.cdb")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	writer2, err := cdb.NewWriter(file)
	if err != nil {
		log.Fatal(err)
	}
	err = writer2.Put([]byte("test"), []byte("value"))
	if err != nil {
		log.Fatal(err)
	}
	err = writer2.Close()
	if err != nil {
		log.Fatal(err)
	}
}
Reading
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/perbu/cdb"
)

func main() {
	path := "/tmp/example.cdb"

	// Open with memory mapping
	db, err := cdb.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Lookup value
	key := []byte("example")
	value, err := db.Get(key)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Value: %s\n", value)

	// Get file size
	size := db.Size()
	fmt.Printf("Database size: %d bytes\n", size)

	// Alternative: Create from open file
	file, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	db2, err := cdb.Mmap(file)
	if err != nil {
		log.Fatal(err)
	}
	defer db2.Close()
}
In-Memory Reading
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/perbu/cdb"
)

func main() {
	// Read CDB file into memory
	data, err := os.ReadFile("/tmp/example.cdb")
	if err != nil {
		log.Fatal(err)
	}

	// Create in-memory CDB from byte slice
	db, err := cdb.NewInMemory(data)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Use same API as memory-mapped version
	value, err := db.Get([]byte("example"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Value: %s\n", value)

	// Iterate over all entries
	for key, value := range db.All() {
		fmt.Printf("%s: %s\n", key, value)
	}
}
Note: The byte slice must remain valid for the lifetime of the InMemoryCDB. Data is still accessible after
Close().
Iteration (Go 1.23+)
package main

import (
	"fmt"
	"log"

	"github.com/perbu/cdb"
)

func main() {
	// Create a sample database first
	writer, err := cdb.Create("/tmp/iter_example.cdb")
	if err != nil {
		log.Fatal(err)
	}

	// Add three entries, checking each Put for errors
	pairs := [][2]string{{"key1", "value1"}, {"key2", "value2"}, {"key3", "value3"}}
	for _, p := range pairs {
		if err := writer.Put([]byte(p[0]), []byte(p[1])); err != nil {
			log.Fatal(err)
		}
	}

	db, err := writer.Freeze()
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Iterate key-value pairs
	fmt.Println("Key-Value pairs:")
	for key, value := range db.All() {
		fmt.Printf(" %s: %s\n", key, value)
	}

	// Iterate keys only
	fmt.Println("Keys:")
	for key := range db.Keys() {
		fmt.Printf(" %s\n", key)
	}

	// Iterate values only
	fmt.Println("Values:")
	for value := range db.Values() {
		fmt.Printf(" %s\n", value)
	}
}
Based on the original CDB specification by D. J. Bernstein.