celt

package
v0.0.0-...-cb01d58 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 1, 2026 License: BSD-3-Clause Imports: 10 Imported by: 0

Documentation

Overview

Package celt implements the CELT decoder per RFC 6716 Section 4.3.

Package celt implements the CELT (Constrained-Energy Lapped Transform) layer of the Opus codec as specified in RFC 6716 Section 4.3.

CELT (Constrained Energy Lapped Transform) is the transform-based layer of Opus for music and general audio.

Constants

const (
	SpreadNone       = spreadNone
	SpreadLight      = spreadLight
	SpreadNormal     = spreadNormal
	SpreadAggressive = spreadAggressive
)

Exported spread constants for callers outside the celt package.

const (
	// MaxPVQK is the maximum number of pulses we support in PVQ coding.
	MaxPVQK = 128
	// MaxPVQN is the maximum number of dimensions we support.
	MaxPVQN = 256
)

Constants for CWRS

const BetaIntra = 4915.0 / 32768.0 // 0.15

BetaIntra is the inter-band prediction coefficient for INTRA-frame mode. No inter-frame prediction, only inter-band. Source: libopus celt/quant_bands.c

const BitRes = bitRes

BitRes is the Q3 fixed-point resolution used throughout CELT allocation math. This mirrors libopus BITRES.

const CELTSigScale = 32768.0

CELTSigScale is the internal signal scale used by CELT. Input samples in float range [-1.0, 1.0] are scaled up by this factor for internal processing, matching libopus CELT_SIG_SCALE.

const DB6 = 1.0

DB6 is the value corresponding to a 6 dB step in CELT's log2 energy units. In libopus, energies are stored in log2 units, so 6 dB = 1.0.

const DCRejectCutoffHz = 3

DCRejectCutoffHz is the cutoff frequency for the DC rejection high-pass filter. libopus uses 3 Hz at the Opus encoder level. Reference: libopus src/opus_encoder.c line 2008

const DecodeBufferSize = 2048

DecodeBufferSize matches libopus DEC_PITCH_BUF_SIZE (decode_mem length without overlap). This buffer is larger than any single frame to preserve history for short blocks.

const DelayCompensation = 192

DelayCompensation is the number of samples of lookahead for CELT. libopus uses Fs/250 = 192 samples at 48kHz (4ms). This provides a lookahead that allows for better transient handling. Reference: libopus src/opus_encoder.c delay_compensation

const HybridCELTStartBand = 17

HybridCELTStartBand is the first CELT band decoded in hybrid mode. Bands 0-16 are covered by SILK; CELT only decodes bands 17-20.

const IntensityDecay = 16384

IntensityDecay is the decay parameter for intensity stereo Laplace encoding. Matches the decoder's expectation for stereo param decoding. Reference: libopus celt/celt_decoder.c, stereo parameter decoding

const MaxBands = 21

MaxBands is the maximum number of frequency bands in CELT. These are Bark-scale bands covering 0-20kHz at 48kHz sample rate.

const Overlap = 120

Overlap is the number of overlap samples at 48kHz (2.5ms window overlap). This is fixed for all CELT frame sizes.

const PreemphCoef = 0.85000610

PreemphCoef is the de-emphasis filter coefficient. The encoder applies pre-emphasis; the decoder applies the inverse de-emphasis:

y[n] = x[n] + PreemphCoef * y[n-1]

Value matches libopus static_modes_float.h: 0.85000610f
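The de-emphasis recurrence can be sketched as a one-pole IIR filter. This is a minimal illustration, not the package's implementation: the helper name deemphasis and the way state is carried between frames are assumptions.

```go
package main

import "fmt"

// preemphCoef mirrors the PreemphCoef constant documented above.
const preemphCoef = 0.85000610

// deemphasis applies y[n] = x[n] + preemphCoef*y[n-1] in place.
// state holds y[n-1] from the previous frame (hypothetical signature).
func deemphasis(x []float64, state *float64) {
	prev := *state
	for i, v := range x {
		prev = v + preemphCoef*prev
		x[i] = prev
	}
	*state = prev
}

func main() {
	frame := []float64{1, 0, 0, 0}
	var state float64
	deemphasis(frame, &state)
	// The impulse response decays by a factor of preemphCoef per sample.
	fmt.Println(frame)
}
```

Carrying state across calls keeps the filter continuous at frame boundaries.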

const SilkCELTDelay = 60

SilkCELTDelay is the delay compensation in samples at 48kHz for hybrid mode. SILK needs to be delayed relative to CELT for proper time alignment.

const TransientMinEnergy = 1e-10

TransientMinEnergy is the minimum energy to consider for transient detection. Very quiet frames are not considered transient.

const TransientThreshold = 4.0

TransientThreshold is the energy ratio threshold for transient detection. A ratio > 4.0 (6dB difference) between adjacent sub-blocks triggers short blocks. Reference: libopus celt/celt_encoder.c transient_analysis()

Variables

var (
	// ErrInvalidFrame indicates the frame data is invalid or corrupted.
	ErrInvalidFrame = errors.New("celt: invalid frame data")

	// ErrInvalidFrameSize indicates an unsupported frame size.
	ErrInvalidFrameSize = errors.New("celt: invalid frame size")

	// ErrNilDecoder indicates a nil range decoder was passed.
	ErrNilDecoder = errors.New("celt: nil range decoder")
)

Decoding errors

var (
	// ErrInvalidInputLength indicates the PCM input length doesn't match frame size.
	ErrInvalidInputLength = errors.New("celt: invalid input length")

	// ErrEncodingFailed indicates a general encoding failure.
	ErrEncodingFailed = errors.New("celt: encoding failed")
)

Encoding errors

var AlphaCoef = [4]float64{
	29440.0 / 32768.0,
	26112.0 / 32768.0,
	21248.0 / 32768.0,
	16384.0 / 32768.0,
}

AlphaCoef contains inter-frame energy prediction coefficients by LM (log mode). Used for coarse energy decoding in inter-frame mode. Index corresponds to LM: 0=2.5ms, 1=5ms, 2=10ms, 3=20ms

Source: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c

var BandAlloc = [11][21]int{

	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},

	{90, 80, 75, 69, 63, 56, 49, 40, 34, 29, 20, 18, 10, 0, 0, 0, 0, 0, 0, 0, 0},

	{110, 100, 90, 84, 78, 71, 65, 58, 51, 45, 39, 32, 26, 20, 12, 0, 0, 0, 0, 0, 0},

	{118, 110, 103, 93, 86, 80, 75, 70, 65, 59, 53, 47, 40, 31, 23, 15, 4, 0, 0, 0, 0},

	{126, 119, 112, 104, 95, 89, 83, 78, 72, 66, 60, 54, 47, 39, 32, 25, 17, 12, 1, 0, 0},

	{134, 127, 120, 114, 103, 97, 91, 85, 78, 72, 66, 60, 54, 47, 41, 35, 29, 23, 16, 10, 1},

	{144, 137, 130, 124, 113, 107, 101, 95, 88, 82, 76, 70, 64, 57, 51, 45, 39, 33, 26, 15, 1},

	{152, 145, 138, 132, 123, 117, 111, 105, 98, 92, 86, 80, 74, 67, 61, 55, 49, 43, 36, 20, 1},

	{162, 155, 148, 142, 133, 127, 121, 115, 108, 102, 96, 90, 84, 77, 71, 65, 59, 53, 46, 30, 1},

	{172, 165, 158, 152, 143, 137, 131, 125, 118, 112, 106, 100, 94, 87, 81, 75, 69, 63, 56, 45, 20},

	{200, 200, 200, 200, 200, 200, 200, 200, 198, 193, 188, 183, 178, 173, 168, 163, 158, 153, 148, 129, 104},
}

BandAlloc contains base bits per band at each quality level. Index: BandAlloc[quality][band] where quality is 0-10 (11 levels). Values represent bits in 1/32 bit/sample (as in libopus band_allocation). Source: libopus celt/modes.c band_allocation table.

var BetaCoefInter = [4]float64{
	30147.0 / 32768.0,
	22282.0 / 32768.0,
	12124.0 / 32768.0,
	6554.0 / 32768.0,
}

BetaCoefInter contains inter-band energy prediction coefficients for INTER-frame mode. Values vary by LM (log mode / frame size). Source: libopus celt/quant_bands.c

var EBands = [22]int{
	0, 1, 2, 3, 4, 5, 6, 7, 8, 10,
	12, 14, 16, 20, 24, 28, 34, 40, 48, 60,
	78, 100,
}

EBands contains the MDCT bin indices for band edges at 48kHz with 5ms base frame. These 22 values define 21 bands. Each band spans from EBands[i] to EBands[i+1]. For other frame sizes, these indices are scaled appropriately.

Frequency boundaries (approximate): 0Hz, 200Hz, 400Hz, 600Hz, 800Hz, 1000Hz, 1200Hz, 1400Hz, 1600Hz, 2000Hz, 2400Hz, 2800Hz, 3200Hz, 4000Hz, 4800Hz, 5600Hz, 6800Hz, 8000Hz, 9600Hz, 12000Hz, 15600Hz, 20000Hz

Source: libopus celt/modes.c (eBand5ms table)

var EMeans = [25]float64{
	6.437500, 6.250000, 5.750000, 5.312500, 5.062500,
	4.812500, 4.500000, 4.375000, 4.875000, 4.687500,
	4.562500, 4.437500, 4.875000, 4.625000, 4.312500,
	4.500000, 4.375000, 4.625000, 4.750000, 4.437500,
	3.750000, 3.750000, 3.750000, 3.750000, 3.750000,
}

EMeans contains the mean log-energy per band in float64 format. These values are in log2 units (1.0 = 6 dB) and represent typical energy distribution across frequency bands. Source: libopus celt/quant_bands.c (float eMeans table, lines 56-62)

var LogN = [21]int{
	0, 0, 0, 0, 0, 0, 0, 0,
	8, 8, 8, 8,
	16, 16, 16,
	21, 21,
	24,
	29,
	34,
	36,
}

LogN contains log2 of band widths (in Q3 fixed-point) for bit allocation. This is used in the bit allocation algorithm to weight bands. For band i, width = EBands[i+1] - EBands[i], and LogN[i] is approximately log2(width) * 8 (Q3, so width 2 gives 8 and width 4 gives 16).

Source: libopus celt/modes.c (logN400 table)
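One way to sanity-check the table is to recompute it from EBands. The sketch below assumes Q3 means a scale factor of 8 (1 << 3), which is what the listed values suggest (width 2 gives 8, width 4 gives 16). logNQ3 is a hypothetical helper, not part of the package. Plain rounding reproduces most entries but yields 33 rather than the tabulated 34 for band 19, so the libopus table remains the authority.

```go
package main

import (
	"fmt"
	"math"
)

// eBands copies the EBands table documented above.
var eBands = [22]int{
	0, 1, 2, 3, 4, 5, 6, 7, 8, 10,
	12, 14, 16, 20, 24, 28, 34, 40, 48, 60,
	78, 100,
}

// logNQ3 returns round(log2(width) * 8) for the given band, i.e. the
// base-2 log of the band width in Q3 units (hypothetical helper).
func logNQ3(band int) int {
	width := eBands[band+1] - eBands[band]
	return int(math.Round(math.Log2(float64(width)) * 8))
}

func main() {
	for band := 0; band < 21; band++ {
		fmt.Printf("band %2d: width %2d, logN ~ %d\n",
			band, eBands[band+1]-eBands[band], logNQ3(band))
	}
}
```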

var SmallDiv = [129]uint16{}/* 129 elements not displayed */

SmallDiv contains precomputed values for efficient small division in Laplace decoding:

SmallDiv[i] = (1 << 16) / (i + 1) for i in 0..128

Used for energy decoding to avoid expensive division operations.

Source: libopus celt/laplace.c (ec_laplace_decode)

var SpreadICDF = spreadICDF

SpreadICDF exposes the spread entropy table for callers that need it.

var TrimICDF = trimICDF

TrimICDF exposes the allocation trim entropy table for callers that need it.

Functions

func AllocTrimAnalysis

func AllocTrimAnalysis(
	normCoeffs []float64,
	bandLogE []float64,
	nbBands int,
	lm int,
	channels int,
	normCoeffsRight []float64,
	intensity int,
	tfEstimate float64,
	equivRate int,
	surroundTrim float64,
	tonalitySlope float64,
) int

AllocTrimAnalysis computes the optimal allocation trim value for a CELT frame. The trim value biases bit allocation between lower and higher frequency bands. A higher trim value allocates more bits to lower frequencies.

The algorithm considers:

  • Equivalent bitrate (lower bitrates favor lower trim)
  • Spectral tilt (energy distribution across bands)
  • TF estimate (transient characteristic)
  • Stereo correlation (for stereo signals)
  • Tonality slope (optional, from analysis)

Parameters:

  • normCoeffs: normalized MDCT coefficients (left channel for stereo, or mono)
  • bandLogE: band log-energies [nbBands * channels]
  • nbBands: number of frequency bands
  • lm: log mode (frame size index)
  • channels: 1 for mono, 2 for stereo
  • normCoeffsRight: normalized right channel coefficients (nil for mono)
  • intensity: intensity stereo band threshold (nbBands for no intensity stereo)
  • tfEstimate: TF estimate from transient analysis (0.0-1.0)
  • equivRate: equivalent bitrate in bits per second
  • surroundTrim: surround mix trim adjustment (0 for non-surround)
  • tonalitySlope: tonality slope from analysis (-1 to 1, 0 if not available)

Returns: trim index in range [0, 10], where 5 is the neutral default

Reference: libopus celt/celt_encoder.c alloc_trim_analysis()

func Amp2Log2

func Amp2Log2(bandE []float64, effEnd, end, channels int) []float64

Amp2Log2 converts amplitude (sqrt energy) to log2 format for quantization. This matches libopus amp2Log2() in quant_bands.c.

The conversion:

bandLogE[i] = celt_log2_db(bandE[i]) - eMeans[i]

Parameters:

  • bandE: band amplitudes (sqrt energy)
  • effEnd: effective end band
  • channels: number of channels

Returns: bandLogE values suitable for quantization

Reference: libopus celt/quant_bands.c amp2Log2()

func ApplyAntiCollapse

func ApplyAntiCollapse(shape []float64, energy, prevEnergy1, prevEnergy2, gain float64, seed *uint32) []float64

ApplyAntiCollapse applies anti-collapse noise injection to a band.

Parameters:

  • shape: the band vector (folded or decoded)
  • energy: the band energy level
  • prevEnergy1: energy from the previous frame
  • prevEnergy2: energy from two frames ago
  • gain: anti-collapse gain factor
  • seed: RNG state

Returns: modified shape vector with noise injected

Anti-collapse prevents artifacts when a band that had energy in previous frames suddenly receives no pulses (collapses). A small amount of shaped noise is added to mask the sudden silence.

Reference: RFC 6716 Section 4.3.5, libopus celt/bands.c anti_collapse()

func ApplyIntensityStereo

func ApplyIntensityStereo(mono []float64, inversionFlag int) (left, right []float64)

ApplyIntensityStereo applies intensity stereo to a band. This is a convenience function that decodes the inversion flag and applies it.

func ApplyMidSideRotation

func ApplyMidSideRotation(mid, side []float64, midGain, sideGain float64) (left, right []float64)

ApplyMidSideRotation rotates mid-side vectors to left-right.

Parameters:

  • mid: mid channel coefficients
  • side: side channel coefficients
  • midGain, sideGain: rotation gains from theta

Returns: left and right channel coefficients

func ApplyWindow

func ApplyWindow(samples []float64, overlap int)

ApplyWindow applies the Vorbis window to IMDCT output. The window is applied to both the beginning and end overlap regions.

Parameters:

  • samples: IMDCT output (length 2*N where N is MDCT size)
  • overlap: overlap size (typically 120 for CELT)

The windowing is in-place to avoid allocation.
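For reference, the Vorbis power-complementary window has the closed form w[i] = sin(pi/2 * sin^2(pi*(i+0.5)/(2*overlap))). The sketch below computes it and checks the property that makes overlap-add reconstruct the signal: w[i]^2 + w[overlap-1-i]^2 = 1. vorbisWindow is a hypothetical helper; how ApplyWindow lays the window over the IMDCT output is not shown here.

```go
package main

import (
	"fmt"
	"math"
)

// vorbisWindow returns the rising half of the Vorbis power-complementary
// window for the given overlap length (hypothetical helper).
func vorbisWindow(overlap int) []float64 {
	w := make([]float64, overlap)
	for i := range w {
		s := math.Sin(0.5 * math.Pi * (float64(i) + 0.5) / float64(overlap))
		w[i] = math.Sin(0.5 * math.Pi * s * s)
	}
	return w
}

func main() {
	w := vorbisWindow(120)
	// Mirrored squared samples sum to 1 (up to rounding).
	worst := 0.0
	for i := range w {
		e := math.Abs(w[i]*w[i] + w[len(w)-1-i]*w[len(w)-1-i] - 1)
		worst = math.Max(worst, e)
	}
	fmt.Println("largest deviation from 1:", worst)
}
```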

func ApplyWindowSymmetric

func ApplyWindowSymmetric(samples []float64, overlap int)

ApplyWindowSymmetric applies window assuming symmetric IMDCT output. This is optimized for the CELT case where the IMDCT output has known symmetry.

func BandWidth

func BandWidth(band int) int

BandWidth returns the number of MDCT bins in the given band at base frame size. For band i, this is EBands[i+1] - EBands[i].

func BitsToPulsesExport

func BitsToPulsesExport(band, lm, bitsQ3 int) int

BitsToPulsesExport exposes bitsToPulses for testing.

func ClearCache

func ClearCache()

ClearCache is a no-op for compatibility. The new implementation uses a static table and doesn't need cache clearing.

func ClearCollapseMask

func ClearCollapseMask(mask *uint32)

ClearCollapseMask resets the collapse mask to zero.

func ComputeBandEnergy

func ComputeBandEnergy(coeffs []float64) float64

ComputeBandEnergy computes the per-band log2 amplitude.

Parameters:

  • coeffs: MDCT coefficients for the band

Returns: log2(sqrt(sum(x^2))) with libopus epsilon

func ComputeEquivRate

func ComputeEquivRate(nbCompressedBytes, channels, lm, targetBitrate int) int

ComputeEquivRate computes the equivalent bitrate for allocation trim analysis. This matches libopus computation in celt_encoder.c line 1925.

Parameters:

  • nbCompressedBytes: target compressed packet size in bytes
  • channels: number of audio channels (1 or 2)
  • lm: log mode (frame size index: 0=2.5ms, 1=5ms, 2=10ms, 3=20ms)
  • targetBitrate: target bitrate in bps (0 if using fixed packet size)

Returns: equivalent bitrate in bits per second

Reference: libopus celt/celt_encoder.c line 1925:

equiv_rate = ((opus_int32)nbCompressedBytes*8*50 << (3-LM)) - (40*C+20)*((400>>LM) - 50);
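The libopus expression above translates to Go almost verbatim. This sketch covers only the quoted formula; it does not model how the package uses targetBitrate, and the name equivRate is hypothetical.

```go
package main

import "fmt"

// equivRate mirrors the quoted libopus expression:
//   (nbCompressedBytes*8*50 << (3-LM)) - (40*C+20)*((400>>LM) - 50)
// lm must be in the range 0..3.
func equivRate(nbCompressedBytes, channels, lm int) int {
	return (nbCompressedBytes*8*50)<<(3-lm) - (40*channels+20)*((400>>lm)-50)
}

func main() {
	// 160 bytes per 20 ms frame is 64 kbps; the overhead term is zero at LM=3.
	fmt.Println(equivRate(160, 1, 3))
	// 20 bytes per 2.5 ms frame is also 64 kbps, but the shorter frame
	// pays a per-frame overhead penalty.
	fmt.Println(equivRate(20, 1, 0))
}
```

The second term is zero for 20 ms frames and grows as frames get shorter, reflecting the fixed per-frame overhead.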

func ComputeGains

func ComputeGains(itheta, qn int) (midGain, sideGain float64)

ComputeGains converts itheta to mid and side gains. This is equivalent to cos(theta) and sin(theta).

Parameters:

  • itheta: quantized angle (0 to qn)
  • qn: number of quantization steps

Returns: mid gain (cos), side gain (sin)

func ComputeImportance

func ComputeImportance(bandLogE, oldBandE []float64, nbBands, channels, lm, lsbDepth, effectiveBytes int) []int

ComputeImportance computes per-band importance weights for TF analysis. Importance weights affect how much each band's TF decision matters in the Viterbi search.

The importance is derived from:

  • Band log energies relative to a spectral follower curve
  • Noise floor based on band width and quantization depth
  • Masking effects from neighboring bands

Higher importance means the band's TF decision has more perceptual impact. Default value is 13 (when no analysis is performed).

Parameters:

  • bandLogE: log-domain band energies (mean-relative, from ComputeBandEnergies)
  • oldBandE: previous frame band energies (for temporal smoothing)
  • nbBands: number of bands
  • channels: number of audio channels
  • lm: log mode (frame size index)
  • lsbDepth: bit depth of input signal (typically 16 or 24)
  • effectiveBytes: available bytes for encoding

Returns: per-band importance weights (13 = neutral, higher = more important)

Reference: libopus celt/celt_encoder.c dynalloc_analysis() importance calculation

func ComputeLinearBandAmplitudes

func ComputeLinearBandAmplitudes(mdctCoeffs []float64, nbBands, frameSize int) []float64

ComputeLinearBandAmplitudes computes linear band amplitudes directly from MDCT coefficients. This matches libopus compute_band_energies() which returns sqrt(sum of squares) per band. The result is in LINEAR scale (not log), ready to use as normalization divisor.

CRITICAL: This function computes the ORIGINAL linear amplitude from MDCT coefficients, which must be used for normalization. Do NOT use log-domain energies converted back to linear, as that introduces quantization/roundtrip errors.

Reference: libopus celt/bands.c compute_band_energies() (float path, lines 154-170)
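A minimal sketch of the per-band computation follows. It assumes band edges are the EBands entries shifted left by LM and that a small epsilon (1e-27, as in the libopus float path) is added before the square root; both are assumptions about this package, and bandAmplitudes is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// eBands copies the EBands table documented above.
var eBands = [22]int{
	0, 1, 2, 3, 4, 5, 6, 7, 8, 10,
	12, 14, 16, 20, 24, 28, 34, 40, 48, 60,
	78, 100,
}

// bandAmplitudes returns sqrt(eps + sum of squares) for each band.
// mdct must hold at least eBands[nbBands]<<lm coefficients.
func bandAmplitudes(mdct []float64, nbBands, lm int) []float64 {
	out := make([]float64, nbBands)
	for b := 0; b < nbBands; b++ {
		lo, hi := eBands[b]<<lm, eBands[b+1]<<lm
		sum := 1e-27 // assumed epsilon; keeps silent bands strictly positive
		for _, v := range mdct[lo:hi] {
			sum += v * v
		}
		out[b] = math.Sqrt(sum)
	}
	return out
}

func main() {
	mdct := make([]float64, 120)
	mdct[8], mdct[9] = 3, 4 // band 8 spans bins 8 and 9 at LM=0
	amps := bandAmplitudes(mdct, 21, 0)
	fmt.Println(amps[8])
}
```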

func ComputeLinearBandAmplitudesInto

func ComputeLinearBandAmplitudesInto(mdctCoeffs []float64, nbBands, frameSize int, bandE []float64)

ComputeLinearBandAmplitudesInto computes linear band amplitudes into the provided buffer. This is the zero-allocation version of ComputeLinearBandAmplitudes.

func ComputeMDCTWithHistory

func ComputeMDCTWithHistory(samples, history []float64, shortBlocks int) []float64

ComputeMDCTWithHistory computes MDCT using a history buffer for overlap.

Parameters:

  • samples: current frame samples
  • history: buffer containing the previous frame's tail (will be updated with the current frame's tail)
  • shortBlocks: number of short blocks for transient mode

func ComputeMDCTWithHistoryInto

func ComputeMDCTWithHistoryInto(scratch, samples, history []float64, shortBlocks int) []float64

ComputeMDCTWithHistoryInto computes MDCT using a history buffer for overlap, assembling the input into the caller-provided scratch buffer. scratch must have capacity >= len(samples)+Overlap. history is updated in-place with the current frame's tail.

func ComputeSpectralFlux

func ComputeSpectralFlux(currentEnergies, previousEnergies []float64, nbBands int) float64

ComputeSpectralFlux computes the frame-to-frame spectral change. This measures how much the spectrum has changed between consecutive frames. Low flux indicates a stationary tone, high flux indicates transients or noise.

Parameters:

  • currentEnergies: current frame band energies (log-domain)
  • previousEnergies: previous frame band energies (log-domain)
  • nbBands: number of bands to compare

Returns: normalized spectral flux in range [0, 1]

Reference: libopus uses similar metrics for transient detection

func ComputeSpreadWeights

func ComputeSpreadWeights(bandLogE []float64, nbBands, channels, lsbDepth int) []int

ComputeSpreadWeights computes per-band weights for the spread decision. Higher weights for perceptually important bands based on masking analysis.

This implements the libopus masking model from dynalloc_analysis():

  1. Compute noise floor per band (based on logN, lsb_depth, eMeans, preemphasis)
  2. Compute signal as max(bandLogE) - noise floor
  3. Apply forward/backward masking propagation
  4. Compute SMR (signal-to-mask ratio)
  5. Convert SMR to spread weight: 32 >> clamp(-round(smr), 0, 5)
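The final step, mapping SMR to a weight, can be sketched in isolation. The helper name spreadWeight is hypothetical, and rounding via floor(smr + 0.5) is an assumption about how round is meant here.

```go
package main

import (
	"fmt"
	"math"
)

// spreadWeight maps a signal-to-mask ratio to 32 >> clamp(-round(smr), 0, 5).
// Rounding uses floor(smr + 0.5), which is an assumption.
func spreadWeight(smr float64) int {
	shift := -int(math.Floor(smr + 0.5))
	if shift < 0 {
		shift = 0
	}
	if shift > 5 {
		shift = 5
	}
	return 32 >> shift
}

func main() {
	for _, smr := range []float64{2.3, 0, -1, -3, -10} {
		fmt.Printf("smr %5.1f -> weight %d\n", smr, spreadWeight(smr))
	}
}
```

Bands at or above the mask keep the full weight of 32; each unit of SMR below it halves the weight, down to a floor of 1.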

Parameters:

  • bandLogE: log-domain band energies (may contain multiple channels: C*nbBands)
  • nbBands: number of bands per channel
  • channels: number of audio channels (1 or 2)
  • lsbDepth: bit depth of input (typically 16 or 24)

Returns: weights per band (higher = more perceptually important)

Reference: libopus celt/celt_encoder.c dynalloc_analysis()

func ComputeSpreadWeightsSimple

func ComputeSpreadWeightsSimple(bandLogE []float64, nbBands int) []int

ComputeSpreadWeightsSimple computes spread weights with default parameters. This is a convenience wrapper for the common case of mono audio with 16-bit depth.

Parameters:

  • bandLogE: log-domain band energies
  • nbBands: number of bands

Returns: weights per band (higher = more important)

func ComputeStereoAngle

func ComputeStereoAngle(energyL, energyR float64) float64

ComputeStereoAngle computes the stereo angle from L/R energies. Returns theta in radians [0, pi/2] representing the stereo image width:

  • theta = 0: mono (all energy in mid)
  • theta = pi/4: balanced stereo
  • theta = pi/2: pure side (opposite channels)

func ComputeTheta

func ComputeTheta(itheta, qn int) float64

ComputeTheta converts quantized itheta to angle in radians. itheta is quantized to qn steps over [0, pi/2].

Parameters:

  • itheta: quantized angle (0 to qn)
  • qn: number of quantization steps

Returns: theta in radians [0, pi/2]
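The relationship between ComputeTheta and ComputeGains can be sketched in floating point as below. The names theta and gains are hypothetical, qn is assumed to be positive, and any fixed-point approximations used by libopus are not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// theta maps itheta in [0, qn] linearly onto [0, pi/2]. qn must be > 0.
func theta(itheta, qn int) float64 {
	return float64(itheta) / float64(qn) * math.Pi / 2
}

// gains returns cos(theta) and sin(theta) as the mid and side gains.
func gains(itheta, qn int) (mid, side float64) {
	t := theta(itheta, qn)
	return math.Cos(t), math.Sin(t)
}

func main() {
	for _, it := range []int{0, 4, 8} {
		m, s := gains(it, 8)
		fmt.Printf("itheta=%d theta=%.4f mid=%.4f side=%.4f\n", it, theta(it, 8), m, s)
	}
}
```

Because the gains are a cosine/sine pair, mid^2 + side^2 is always 1, so the split preserves energy.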

func ConvertMidSideToLR

func ConvertMidSideToLR(mid, side []float64) (left, right []float64)

ConvertMidSideToLR converts mid/side to L/R representation. This is the inverse of ConvertToMidSide.

The conversion is:

left[i] = (mid[i] + side[i]) / sqrt(2)
right[i] = (mid[i] - side[i]) / sqrt(2)

Combined with ConvertToMidSide, this forms an identity transform: L,R -> M,S -> L,R (with floating point precision)

func ConvertToMidSide

func ConvertToMidSide(left, right []float64) (mid, side []float64)

ConvertToMidSide converts L/R stereo to mid/side representation. This is the inverse of ConvertMidSideToLR.

The conversion is:

mid[i] = (left[i] + right[i]) / sqrt(2)
side[i] = (left[i] - right[i]) / sqrt(2)

The sqrt(2) normalization preserves energy: |L|^2 + |R|^2 = |M|^2 + |S|^2

Parameters:

  • left: left channel samples
  • right: right channel samples

Returns: mid and side channel arrays

Reference: RFC 6716 Section 4.3.4
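The two conversions can be written directly from the formulas given for ConvertToMidSide and ConvertMidSideToLR. The sketch below uses hypothetical names (toMidSide, toLeftRight), assumes equal-length inputs, and demonstrates the round trip and the energy-preservation property.

```go
package main

import (
	"fmt"
	"math"
)

// toMidSide computes mid = (L+R)/sqrt(2) and side = (L-R)/sqrt(2).
func toMidSide(left, right []float64) (mid, side []float64) {
	mid = make([]float64, len(left))
	side = make([]float64, len(left))
	for i := range left {
		mid[i] = (left[i] + right[i]) / math.Sqrt2
		side[i] = (left[i] - right[i]) / math.Sqrt2
	}
	return mid, side
}

// toLeftRight computes left = (M+S)/sqrt(2) and right = (M-S)/sqrt(2).
func toLeftRight(mid, side []float64) (left, right []float64) {
	left = make([]float64, len(mid))
	right = make([]float64, len(mid))
	for i := range mid {
		left[i] = (mid[i] + side[i]) / math.Sqrt2
		right[i] = (mid[i] - side[i]) / math.Sqrt2
	}
	return left, right
}

func main() {
	l := []float64{0.5, -0.25, 1}
	r := []float64{0.1, 0.75, -1}
	m, s := toMidSide(l, r)
	l2, r2 := toLeftRight(m, s)
	fmt.Println(l2, r2) // matches l and r up to rounding
}
```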

func ConvertToMidSideInPlace

func ConvertToMidSideInPlace(left, right []float64)

ConvertToMidSideInPlace converts L/R to M/S in-place. The left array becomes mid, right array becomes side. More efficient when copies are not needed.

func DecodePulses

func DecodePulses(index uint32, n, k int) []int

DecodePulses converts a CWRS index to a pulse vector.

Parameters:

  • index: the combinatorial index (0 to V(n,k)-1)
  • n: number of dimensions (band width)
  • k: total number of pulses (sum of absolute values)

Returns: pulse vector of length n, where sum(|v[i]|) == k

The algorithm walks through positions, determining how many pulses go at each position by counting codewords in the combinatorial structure.

This implementation uses the precomputed table for O(1) lookups when possible.

Reference: libopus celt/cwrs.c decode_pulses() / cwrsi()
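The index range depends on V(n,k), the number of length-n integer vectors whose absolute values sum to k. A minimal sketch of that count, using the recurrence V(n,k) = V(n-1,k) + V(n,k-1) + V(n-1,k-1) with V(n,0) = 1 and V(0,k) = 0 for k > 0, is shown below. pulseCount is a hypothetical helper and does not reflect the package's table layout; it can overflow uint64 for large n and k.

```go
package main

import "fmt"

// pulseCount returns V(n, k) by filling one row of the recurrence at a time.
func pulseCount(n, k int) uint64 {
	prev := make([]uint64, k+1) // row for n = 0: V(0,0) = 1, V(0,j>0) = 0
	prev[0] = 1
	for i := 1; i <= n; i++ {
		cur := make([]uint64, k+1)
		cur[0] = 1 // the all-zero vector
		for j := 1; j <= k; j++ {
			cur[j] = prev[j] + cur[j-1] + prev[j-1]
		}
		prev = cur
	}
	return prev[k]
}

func main() {
	// For n=2, k=2 the codewords are (±2,0), (0,±2) and (±1,±1).
	fmt.Println(pulseCount(2, 2))
	fmt.Println(pulseCount(3, 2))
}
```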

func DeinterleaveStereo

func DeinterleaveStereo(interleaved []float64) (left, right []float64)

DeinterleaveStereo separates interleaved stereo samples into L and R arrays.

Input: [L0, R0, L1, R1, ...]

Output: [L0, L1, ...], [R0, R1, ...]

func DeinterleaveStereoInto

func DeinterleaveStereoInto(interleaved, left, right []float64)

DeinterleaveStereoInto separates interleaved stereo samples into pre-allocated L and R slices. left and right must each have capacity >= len(interleaved)/2.

func DenormalizeBand

func DenormalizeBand(shape []float64, energy float64) []float64

DenormalizeBand scales a normalized band vector by its energy.

Parameters:

  • shape: normalized vector (unit L2 norm)
  • energy: band energy in log2 units (1 = 6 dB)

Returns: denormalized MDCT coefficients

This matches libopus celt/bands.c denormalise_bands().

func DualStereoSplit

func DualStereoSplit(coeffsL, coeffsR []float64) (left, right []float64)

DualStereoSplit handles dual stereo mode where channels are independent. Simply returns copies of the input slices for consistent interface.

Parameters:

  • coeffsL, coeffsR: independently decoded left and right coefficients

Returns: left and right arrays (copies)

func DuplicateMonoToStereo

func DuplicateMonoToStereo(mono []float64) (left, right []float64)

DuplicateMonoToStereo creates stereo by duplicating mono to both channels.

func DurationFromFrameSize

func DurationFromFrameSize(frameSize int) float64

DurationFromFrameSize returns the frame duration in milliseconds.

func EffectiveBandsForFrameSize

func EffectiveBandsForFrameSize(bw CELTBandwidth, frameSize int) int

EffectiveBandsForFrameSize returns the effective band count considering both bandwidth and frame size constraints.

func Encode

func Encode(pcm []float64, frameSize int) ([]byte, error)

Encode encodes mono PCM samples to a CELT packet.

Parameters:

  • pcm: float64 samples at 48kHz
  • frameSize: 120, 240, 480, or 960 samples

Returns: encoded Opus CELT packet bytes

This is the simple public API for mono encoding. For more control, use NewEncoder() and call EncodeFrame() directly.

Reference: RFC 6716 Section 4.3

func EncodeFrames

func EncodeFrames(pcmFrames [][]float64, frameSize int) ([][]byte, error)

EncodeFrames encodes multiple consecutive frames. Useful for encoding a stream of audio data.

Parameters:

  • pcmFrames: slice of PCM frames, each with frameSize samples
  • frameSize: samples per frame (must be the same for all frames)

Returns: slice of encoded packets

func EncodePulses

func EncodePulses(y []int, n, k int) uint32

EncodePulses converts a pulse vector to a CWRS index. This is the inverse of DecodePulses, useful for testing round-trip.

Parameters:

  • y: pulse vector of length n
  • n: number of dimensions
  • k: total number of pulses (should equal sum(|y[i]|))

Returns: the combinatorial index (0 to V(n,k)-1)

func EncodePulsesScratch

func EncodePulsesScratch(y []int, n, k int, uBuf *[]uint32) uint32

EncodePulsesScratch is the scratch-aware version of EncodePulses. It uses a pre-allocated u buffer to avoid allocations in the hot path.

func EncodeSilence

func EncodeSilence(frameSize int, channels int) ([]byte, error)

EncodeSilence encodes a silent frame of the given size. Useful for generating comfort noise or filler packets.

func EncodeStereo

func EncodeStereo(pcm []float64, frameSize int) ([]byte, error)

EncodeStereo encodes stereo PCM samples to a CELT packet.

Parameters:

  • pcm: interleaved L/R float64 samples at 48kHz
  • frameSize: 120, 240, 480, or 960 samples per channel

Returns: encoded Opus CELT packet bytes

The input should be interleaved: [L0, R0, L1, R1, ...] Total length should be frameSize * 2.

This uses mid-side stereo encoding (dual_stereo=0, intensity disabled).

Reference: RFC 6716 Section 4.3

func EncodeStereoFrames

func EncodeStereoFrames(pcmFrames [][]float64, frameSize int) ([][]byte, error)

EncodeStereoFrames encodes multiple consecutive stereo frames.

Parameters:

  • pcmFrames: slice of interleaved stereo PCM frames
  • frameSize: samples per frame per channel

Returns: slice of encoded packets

func EncodeStereoWithEncoder

func EncodeStereoWithEncoder(enc *Encoder, pcm []float64, frameSize int) ([]byte, error)

EncodeStereoWithEncoder encodes stereo PCM using the provided encoder. Allows stateful encoding with custom encoder instances.

func EncodeWithEncoder

func EncodeWithEncoder(enc *Encoder, pcm []float64, frameSize int) ([]byte, error)

EncodeWithEncoder encodes mono PCM using the provided encoder. Allows stateful encoding with custom encoder instances.

func EstimateStereoAngle

func EstimateStereoAngle(energyMid, energySide float64) float64

EstimateStereoAngle estimates the stereo angle from mid and side energies. Used for encoder decisions and analysis.

Parameters:

  • energyMid: energy of mid channel
  • energySide: energy of side channel

Returns: estimated theta in radians

func ExpRotationExport

func ExpRotationExport(x []float64, length, dir, stride, k, spread int)

ExpRotationExport exposes expRotation for testing.

func FindFoldSource

func FindFoldSource(targetBand int, codedMask uint32, bandWidths []int) int

FindFoldSource finds the band to fold from for an uncoded band.

Parameters:

  • targetBand: the band index that needs folding
  • codedMask: bitmask of bands that have been coded (received pulses)
  • bandWidths: array of band widths (from eBands table)

Returns: index of source band to fold from, or -1 if none available

The algorithm searches backwards from targetBand to find the most recent coded band that can serve as a reasonable source.

func FindFoldSourceWithOffset

func FindFoldSourceWithOffset(targetBand int, targetOffset int, codedBands [][]float64) (srcBand, offset int, found bool)

FindFoldSourceWithOffset finds a source band and offset for folding. This variant returns additional offset information for more precise folding.

Parameters:

  • targetBand: the band index that needs folding
  • targetOffset: starting MDCT bin of target band
  • codedBands: slice of decoded band vectors (indexed by band number)

Returns: source band index, offset within source, and whether a source was found

Reference: libopus celt/bands.c compute_band_fold()

func FoldBand

func FoldBand(lowband []float64, n int, seed *uint32) []float64

FoldBand generates a normalized vector by folding from a lower band.

Parameters:

  • lowband: the source band vector (already decoded and normalized)
  • n: width of target band (number of MDCT bins)
  • seed: RNG state for sign variation (modified in place)

Returns: normalized vector of length n with unit L2 norm

If lowband is empty or nil, generates pseudo-random noise instead.

func FoldBandFromMultiple

func FoldBandFromMultiple(sources [][]float64, n int, seed *uint32) []float64

FoldBandFromMultiple generates a folded vector using multiple source bands. This provides better spectral diversity for uncoded high-frequency bands.

Parameters:

  • sources: slice of source band vectors to fold from
  • n: width of target band
  • seed: RNG state

Returns: normalized vector of length n

Reference: libopus uses this approach for better quality folding.

func FrameSizeFromDuration

func FrameSizeFromDuration(durationMs float64) (int, error)

FrameSizeFromDuration returns the frame size in samples for a given duration in milliseconds. Valid durations: 2.5, 5, 10, 20ms.

func FrameSizeToLM

func FrameSizeToLM(frameSize int) int

FrameSizeToLM converts frame size to LM (log mode) index.
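The mapping between frame size, LM and duration follows from the values listed elsewhere on this page (120, 240, 480 and 960 samples at 48kHz for LM 0 to 3). The sketch below uses hypothetical lower-case names; returning -1 for an unsupported size is an assumption, not documented behavior of FrameSizeToLM.

```go
package main

import "fmt"

// frameSizeToLM maps a 48kHz frame size to its LM index.
// Returning -1 for other sizes is an assumption of this sketch.
func frameSizeToLM(frameSize int) int {
	switch frameSize {
	case 120:
		return 0
	case 240:
		return 1
	case 480:
		return 2
	case 960:
		return 3
	}
	return -1
}

// durationMs converts a 48kHz sample count to milliseconds.
func durationMs(frameSize int) float64 {
	return float64(frameSize) / 48.0
}

func main() {
	for _, fs := range []int{120, 240, 480, 960} {
		fmt.Printf("%d samples: LM=%d, %.1f ms\n", fs, frameSizeToLM(fs), durationMs(fs))
	}
}
```

Each LM step doubles the frame size, so frameSize = 120 << LM for the supported values.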

func GetCacheCaps

func GetCacheCaps() [168]uint8

GetCacheCaps returns the cache caps table for testing.

func GetCodedBandCount

func GetCodedBandCount(mask uint32) int

GetCodedBandCount returns the number of bands with pulses.

Parameters:

  • mask: collapse mask

Returns: count of coded bands

func GetEBands

func GetEBands(lm int) []int

GetEBands returns the scaled eBands boundaries for a given LM. LM: 0=2.5ms, 1=5ms, 2=10ms, 3=20ms
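A plausible reading of "scaled" is that each base edge is shifted left by LM, as libopus does. The sketch below illustrates that assumption with a hypothetical scaledEBands helper; it is not the package's implementation.

```go
package main

import "fmt"

// eBands copies the EBands table documented above.
var eBands = [22]int{
	0, 1, 2, 3, 4, 5, 6, 7, 8, 10,
	12, 14, 16, 20, 24, 28, 34, 40, 48, 60,
	78, 100,
}

// scaledEBands returns the band edges multiplied by 2^lm (assumed scaling).
func scaledEBands(lm int) []int {
	out := make([]int, len(eBands))
	for i, e := range eBands {
		out[i] = e << lm
	}
	return out
}

func main() {
	edges := scaledEBands(3) // LM=3 corresponds to 20ms frames
	fmt.Println(edges[17], edges[21])
}
```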

func GetEMeans

func GetEMeans() [25]float64

GetEMeans returns the eMeans array for testing.

func GetEMeansBand

func GetEMeansBand(band int) float64

GetEMeansBand returns the mean log-energy for a specific band (log2 units). Returns 0 for out-of-range bands. This is a safe accessor for the eMeans table.

func GetEProbModel

func GetEProbModel() [4][2][42]uint8

GetEProbModel returns the probability model table for testing.

func GetPulsesExport

func GetPulsesExport(q int) int

GetPulsesExport exposes getPulses for testing.

func GetShortBlockCount

func GetShortBlockCount(frameSize int) int

GetShortBlockCount returns the number of short blocks for a given frame size. This is the ShortBlocks value from ModeConfig when transient is detected.

func GetWindow

func GetWindow() []float64

GetWindow returns the standard CELT overlap window (120 samples). This is used for gain fading in hybrid mode to ensure smooth transitions. Returns nil if the window is not available.

func GetWindowBuffer

func GetWindowBuffer(overlap int) []float64

GetWindowBuffer returns the precomputed window buffer for the given overlap size. For the standard CELT overlap of 120 samples, returns windowBuffer120. Returns nil if no precomputed buffer exists for the size.

func GetWindowBufferF32

func GetWindowBufferF32(overlap int) []float32

GetWindowBufferF32 returns the precomputed float32 window buffer for the given overlap size. Returns a freshly computed float32 buffer for non-standard sizes.

func GetWindowSquareBuffer

func GetWindowSquareBuffer(overlap int) []float64

GetWindowSquareBuffer returns precomputed w[i]^2 values for the overlap window. This avoids recomputing window[i]*window[i] inside hot comb-filter loops.

func IMDCT

func IMDCT(spectrum []float64) []float64

IMDCT computes the inverse MDCT of frequency coefficients.

Input: n frequency bins (spectrum)

Output: 2*n time samples

For power-of-2 sizes, uses FFT-based approach for O(n log n) complexity. For other sizes (like CELT's 120, 240, 480, 960), uses direct computation which is O(n^2) but handles any size correctly.

Reference: RFC 6716 Section 4.3.7 (Inverse MDCT), libopus celt/mdct.c

func IMDCTDirect

func IMDCTDirect(spectrum []float64) []float64

IMDCTDirect computes the IMDCT per RFC 6716 Section 4.3.7 (Inverse MDCT).

Formula: y[n] = sum_{k=0}^{N-1} X[k] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))

  • Input: N frequency coefficients
  • Output: 2*N time samples
  • Normalization: matches libopus test_unit_mdct.c inverse (no extra scaling)

This is O(n^2) but mathematically exact and handles non-power-of-2 sizes (like CELT's 120, 240, 480, 960) that the FFT-based approach cannot.
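
As an illustration of the formula above, here is a minimal sketch that evaluates it term by term. The name imdctDirectSketch is hypothetical; this is a plain transcription of the formula under the stated "no extra scaling" convention, not the package's code.

```go
package main

import (
	"fmt"
	"math"
)

// imdctDirectSketch (hypothetical name) evaluates
// y[n] = sum_k X[k] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5)) for n = 0..2N-1.
// It runs in O(N^2) and accepts any N, including non-power-of-2 sizes.
func imdctDirectSketch(spectrum []float64) []float64 {
	n := len(spectrum)
	out := make([]float64, 2*n)
	for t := range out {
		var sum float64
		for k, x := range spectrum {
			phase := math.Pi / float64(n) * (float64(t) + 0.5 + float64(n)/2) * (float64(k) + 0.5)
			sum += x * math.Cos(phase)
		}
		out[t] = sum
	}
	return out
}

func main() {
	y := imdctDirectSketch([]float64{1, 0, 0, 0})
	fmt.Println(len(y), y)
}
```

The quadratic cost is why a direct evaluation like this is usually kept as a reference to check faster transforms against.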

func IMDCTOverlap

func IMDCTOverlap(spectrum []float64, overlap int) []float64

IMDCTOverlap computes the CELT IMDCT with short overlap using a zero previous overlap buffer. Output length is N + overlap.

func IMDCTOverlapWithPrev

func IMDCTOverlapWithPrev(spectrum, prevOverlap []float64, overlap int) []float64

IMDCTOverlapWithPrev computes CELT IMDCT using the provided overlap history. The returned slice includes frameSize+overlap samples.

func IMDCTShort

func IMDCTShort(coeffs []float64, shortBlocks int) []float64

IMDCTShort computes the IMDCT for transient frames with multiple short blocks.

Parameters:

  • coeffs: interleaved coefficients for shortBlocks MDCTs
  • shortBlocks: number of short MDCTs (2, 4, or 8)

Returns: interleaved time samples with proper overlap handling.

In transient mode, CELT uses multiple shorter MDCTs instead of one long MDCT. This provides better time resolution for transients (like drum hits) at the cost of reduced frequency resolution.

Reference: libopus celt/celt_decoder.c, transient mode handling

func ImdctInPlaceExported

func ImdctInPlaceExported(spectrum []float64, out []float64, blockStart, overlap int)

ImdctInPlaceExported exposes imdctInPlace for testing.

func InitCaps

func InitCaps(nbBands, lm, channels int) []int

InitCaps initializes band caps for allocation. Exported for testing.

func InitCapsForHybrid

func InitCapsForHybrid(nbBands, lm, channels, startBand int) []int

InitCapsForHybrid initializes band caps for hybrid mode. In hybrid mode, bands before startBand get zero cap (no bits allocated).

func InitCapsInto

func InitCapsInto(caps []int, nbBands, lm, channels int)

InitCapsInto initializes band caps into the provided slice. This is an exported wrapper around initCapsInto for callers outside celt.

func IntensityStereo

func IntensityStereo(mono []float64, invert bool) (left, right []float64)

IntensityStereo creates stereo from mono with optional inversion. In intensity stereo mode, both channels share the same spectral shape but may have opposite signs. This is efficient for high-frequency content where the ear is less sensitive to phase.

Parameters:

  • mono: the mid channel coefficients
  • invert: if true, right channel is inverted (sign flipped)

Returns: left and right coefficient arrays
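
The behavior described above can be sketched as follows. The name intensityStereoSketch is hypothetical, and the code is only an illustration of "same shape, optional sign flip", not the package's implementation.

```go
package main

import "fmt"

// intensityStereoSketch (hypothetical name) copies the mono shape into both
// channels and flips the sign of the right channel when invert is true.
func intensityStereoSketch(mono []float64, invert bool) (left, right []float64) {
	left = make([]float64, len(mono))
	right = make([]float64, len(mono))
	sign := 1.0
	if invert {
		sign = -1.0
	}
	for i, m := range mono {
		left[i] = m
		right[i] = sign * m
	}
	return left, right
}

func main() {
	l, r := intensityStereoSketch([]float64{0.5, -0.25}, true)
	fmt.Println(l, r)
}
```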

func InterleaveBands

func InterleaveBands(bands [][]float64, shortBlocks int) []float64

InterleaveBands interleaves band coefficients for transient frames.

Parameters:

  • bands: slice of band vectors
  • shortBlocks: number of short MDCT blocks

Returns: interleaved coefficient array.

In transient mode, CELT uses multiple short MDCTs instead of one long MDCT. The coefficients are interleaved so that each short block can be processed.

Reference: libopus celt/celt_decoder.c, transient mode

func InterleaveStereo

func InterleaveStereo(left, right []float64) []float64

InterleaveStereo combines separate L and R arrays into interleaved format.

  • Input: [L0, L1, ...], [R0, R1, ...]
  • Output: [L0, R0, L1, R1, ...]
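
A minimal sketch of this layout, using the hypothetical name interleaveStereoSketch. Truncating to the shorter input is an assumption made here for safety; the package may handle mismatched lengths differently.

```go
package main

import "fmt"

// interleaveStereoSketch (hypothetical name) produces [L0, R0, L1, R1, ...].
// Assumption: if the inputs differ in length, the shorter one wins.
func interleaveStereoSketch(left, right []float64) []float64 {
	n := len(left)
	if len(right) < n {
		n = len(right)
	}
	out := make([]float64, 2*n)
	for i := 0; i < n; i++ {
		out[2*i] = left[i]
		out[2*i+1] = right[i]
	}
	return out
}

func main() {
	fmt.Println(interleaveStereoSketch([]float64{1, 2}, []float64{-1, -2}))
}
```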

func IsBandCoded

func IsBandCoded(mask uint32, band int) bool

IsBandCoded checks if a band was coded (received pulses).

Parameters:

  • mask: collapse mask
  • band: band index

Returns: true if the band received non-zero bit allocation.

func KissFFT32To

func KissFFT32To(out []complex64, x []complex64)

KissFFT32To performs a forward complex FFT into out using the libopus-style Kiss FFT implementation. out must have length >= len(x).

func KissFFT32ToWithScratch

func KissFFT32ToWithScratch(out []complex64, x []complex64, scratch []KissCpx)

KissFFT32ToWithScratch performs a forward complex FFT into out using caller- provided scratch. scratch should have length >= len(x) to avoid allocations.

func LMToFrameSize

func LMToFrameSize(lm int) int

LMToFrameSize converts LM (log mode) index to frame size in samples.
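
For orientation, CELT frame sizes at 48 kHz are 120, 240, 480, and 960 samples, which corresponds to 120 shifted left by LM for LM in 0..3. The sketch below uses the hypothetical name lmToFrameSizeSketch; returning 0 for out-of-range values is an assumption, not documented behavior of LMToFrameSize.

```go
package main

import "fmt"

// lmToFrameSizeSketch (hypothetical name) maps LM 0..3 to 120, 240, 480, 960.
// Assumption: out-of-range LM values yield 0.
func lmToFrameSizeSketch(lm int) int {
	if lm < 0 || lm > 3 {
		return 0
	}
	return 120 << uint(lm)
}

func main() {
	for lm := 0; lm <= 3; lm++ {
		fmt.Println(lm, lmToFrameSizeSketch(lm))
	}
}
```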

func LRToMidSide

func LRToMidSide(left, right []float64) (mid, side []float64)

LRToMidSide converts left-right stereo to mid-side. This is the inverse of MidSideToLR with theta=pi/4.

Parameters:

  • left, right: left and right channel coefficients

Returns: mid and side coefficient arrays

func LibopusIMDCTF32

func LibopusIMDCTF32(spectrum []float32, prevOverlap []float32, overlap int) []float32

LibopusIMDCTF32 implements IMDCT following libopus clt_mdct_backward_c structure. This is the float32 version for exact matching with libopus's floating-point behavior.

Parameters:

  • spectrum: input spectrum of length N2 (e.g., 960)
  • prevOverlap: previous frame's overlap buffer (length = overlap)
  • overlap: overlap size (e.g., 120)

Returns: N2 + overlap samples (windowed IMDCT output)

func MDCT

func MDCT(samples []float64) []float64

MDCT computes the forward Modified Discrete Cosine Transform. For CELT-typical inputs (frameSize+Overlap), this uses the short-overlap algorithm from libopus. For legacy 2*N inputs, it falls back to the direct MDCT formula.

func MDCTForwardWithOverlap

func MDCTForwardWithOverlap(samples []float64, overlap int) []float64

MDCTForwardWithOverlap is the exported version of mdctForwardOverlap for testing.

  • Input: samples with length frameSize+overlap
  • Returns: MDCT coefficients of length frameSize

func MDCTShort

func MDCTShort(samples []float64, shortBlocks int) []float64

MDCTShort computes the forward MDCT for transient frames with multiple short blocks. This processes multiple short MDCTs and interleaves the coefficients in the same format expected by IMDCTShort.

Parameters:

  • samples: interleaved time samples for shortBlocks MDCTs
  • shortBlocks: number of short MDCTs (2, 4, or 8)

Returns: interleaved frequency coefficients

In transient mode, CELT uses multiple shorter MDCTs instead of one long MDCT. This provides better time resolution for transients at the cost of reduced frequency resolution.

Reference: libopus celt/celt_encoder.c, transient mode handling

func MidSideToLR

func MidSideToLR(mid, side []float64, theta float64) (left, right []float64)

MidSideToLR converts mid-side stereo to left-right. The conversion uses a rotation matrix controlled by theta:

L = cos(theta) * M + sin(theta) * S
R = cos(theta) * M - sin(theta) * S

Parameters:

  • mid: mid channel coefficients (M = (L+R)/2)
  • side: side channel coefficients (S = (L-R)/2)
  • theta: stereo angle in radians (0 = mono, pi/2 = full stereo)

Returns: left and right channel coefficient arrays
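
The rotation above can be sketched directly from the two equations. The name midSideToLRSketch is hypothetical, and truncating to the shorter input is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// midSideToLRSketch (hypothetical name) applies
// L = cos(theta)*M + sin(theta)*S and R = cos(theta)*M - sin(theta)*S.
// Assumption: mismatched input lengths are truncated to the shorter one.
func midSideToLRSketch(mid, side []float64, theta float64) (left, right []float64) {
	n := len(mid)
	if len(side) < n {
		n = len(side)
	}
	c, s := math.Cos(theta), math.Sin(theta)
	left = make([]float64, n)
	right = make([]float64, n)
	for i := 0; i < n; i++ {
		left[i] = c*mid[i] + s*side[i]
		right[i] = c*mid[i] - s*side[i]
	}
	return left, right
}

func main() {
	l, r := midSideToLRSketch([]float64{1, 0.5}, []float64{0.2, -0.1}, math.Pi/4)
	fmt.Println(l, r)
}
```

With theta = 0 the side term vanishes and both outputs equal the mid channel, which matches the "0 = mono" note in the parameter list.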

func MidSideToLRGains

func MidSideToLRGains(mid, side []float64, midGain, sideGain float64) (left, right []float64)

MidSideToLRGains converts mid-side to left-right using precomputed gains. This is more efficient when gains are already computed from theta.

Parameters:

  • mid, side: frequency-domain coefficients
  • midGain, sideGain: rotation gains (cos(theta), sin(theta))

Returns: left and right coefficient arrays

func MixStereoToMono

func MixStereoToMono(left, right []float64) []float64

MixStereoToMono mixes stereo down to mono. Useful for decoder fallback or testing.

func NeedsAntiCollapse

func NeedsAntiCollapse(mask uint32, band int) bool

NeedsAntiCollapse checks if a band collapsed (no pulses in transient frame).

Parameters:

  • mask: collapse mask from current frame
  • band: band index to check

Returns: true if the band collapsed and needs anti-collapse noise injection.

Reference: RFC 6716 Section 4.3.5

func NormalizeBandsToArrayInto

func NormalizeBandsToArrayInto(mdctCoeffs []float64, nbBands, frameSize int, norm, bandE []float64)

NormalizeBandsToArrayInto normalizes bands into a provided contiguous array. This is the zero-allocation version - caller provides norm and bandE buffers.

Parameters:

  • mdctCoeffs: MDCT coefficients to normalize
  • nbBands: number of bands
  • frameSize: frame size in samples
  • norm: output buffer (length >= frameSize)
  • bandE: scratch buffer for band amplitudes (length >= nbBands)

Reference: libopus celt/bands.c normalise_bands() (float path, lines 172-187)

func NormalizeResidualExport

func NormalizeResidualExport(pulses []int, gain float64, yy float64) []float64

NormalizeResidualExport exposes normalizeResidual for testing.

func NormalizeResidualIntoExport

func NormalizeResidualIntoExport(out []float64, pulses []int, gain float64, yy float64)

NormalizeResidualIntoExport exposes normalizeResidualInto for testing.

func NormalizeVector

func NormalizeVector(v []float64) []float64

NormalizeVector scales v to unit L2 norm. If the input vector has zero energy, it is returned unchanged.
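
A minimal sketch of unit-norm scaling, under the hypothetical name normalizeVectorSketch. Whether the package allocates a new slice or scales in place is not stated; this sketch allocates.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeVectorSketch (hypothetical name) divides every element by the
// vector's L2 norm. A zero-energy input is returned unchanged.
func normalizeVectorSketch(v []float64) []float64 {
	var energy float64
	for _, x := range v {
		energy += x * x
	}
	if energy == 0 {
		return v
	}
	scale := 1 / math.Sqrt(energy)
	out := make([]float64, len(v))
	for i, x := range v {
		out[i] = x * scale
	}
	return out
}

func main() {
	fmt.Println(normalizeVectorSketch([]float64{3, 4}))
}
```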

func OpPVQSearchExport

func OpPVQSearchExport(x []float64, k int) ([]int, float64)

OpPVQSearchExport exposes opPVQSearch for testing.

func OverlapAdd

func OverlapAdd(current, prevOverlap []float64, overlap int) (output, newOverlap []float64)

OverlapAdd combines the current frame with the previous overlap. This is the core operation for continuous audio reconstruction in CELT.

Parameters:

  • current: windowed IMDCT output for current frame (2*frameSize samples)
  • prevOverlap: tail samples from previous frame (overlap region)
  • overlap: number of overlap samples (typically 120 for CELT)

Returns:

  • output: reconstructed samples (frameSize = len(current)/2)
  • newOverlap: tail to save for next frame's overlap-add

The MDCT/IMDCT overlap-add operation per RFC 6716:

  • IMDCT of N coefficients produces 2N windowed samples
  • Output per frame is N samples (frameSize)
  • First 'overlap' samples: sum current[0:overlap] + prevOverlap
  • Middle samples: copy from current[overlap:frameSize]
  • Save current[frameSize:frameSize+overlap] for next frame
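
The steps listed above can be sketched as follows. The name overlapAddSketch is hypothetical, and clamping overlap to frameSize and to len(prevOverlap) is an assumption added so the sketch never indexes out of range.

```go
package main

import "fmt"

// overlapAddSketch (hypothetical name) follows the listed steps:
// frameSize = len(current)/2, the first `overlap` outputs add prevOverlap,
// the rest are copied, and current[frameSize:frameSize+overlap] is saved.
func overlapAddSketch(current, prevOverlap []float64, overlap int) (output, newOverlap []float64) {
	frameSize := len(current) / 2
	if overlap > frameSize { // assumption: clamp instead of failing
		overlap = frameSize
	}
	if overlap < 0 {
		overlap = 0
	}
	output = make([]float64, frameSize)
	copy(output, current[:frameSize])
	for i := 0; i < overlap && i < len(prevOverlap); i++ {
		output[i] += prevOverlap[i]
	}
	newOverlap = make([]float64, overlap)
	copy(newOverlap, current[frameSize:frameSize+overlap])
	return output, newOverlap
}

func main() {
	out, next := overlapAddSketch([]float64{1, 2, 3, 4, 5, 6, 7, 8}, []float64{10, 20}, 2)
	fmt.Println(out, next)
}
```

Feeding newOverlap back in as prevOverlap on the next call is what keeps consecutive frames continuous.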

func OverlapAddInPlace

func OverlapAddInPlace(current []float64, prevOverlap []float64, overlap int) []float64

OverlapAddInPlace performs overlap-add modifying prevOverlap in place. This variant avoids allocation for the overlap buffer.

Returns: output samples only (prevOverlap is modified to contain new overlap)

func OverlapAddShortOverlap

func OverlapAddShortOverlap(current, prevOverlap []float64, frameSize, overlap int) (output, newOverlap []float64)

OverlapAddShortOverlap combines overlap for CELT short-overlap IMDCT output. current length is frameSize + overlap, output length is frameSize.

func PVQ_U

func PVQ_U(n, k int) uint32

PVQ_U computes U(N,K), the number of codewords where the first position has no pulse.

  • U(N,K) = V(N-1, K) for N > 1
  • U(1,K) = K (special handling in libopus)

Note: This returns the "counting" U function where U(N,K) = V(N-1,K), not the internal table U used in the recurrence relation.

func PVQ_V

func PVQ_V(n, k int) uint32

PVQ_V computes V(N,K), the total number of PVQ codewords with N dimensions and K pulses (where the sum of absolute values equals K).

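V(N,K) = U(N,K) + U(N,K+1), where U here is the internal table U from the libopus recurrence, not the counting function returned by PVQ_U.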

This uses O(1) table lookup when possible, falling back to computation only for values outside the table range.

Reference: RFC 6716 Section 4.3.4.2 (PVQ Decoding), libopus celt/cwrs.c
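
For checking small values by hand, V(N,K) can also be computed from the recurrence given in RFC 6716: V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0 for K != 0. The sketch below uses the hypothetical name pvqVSketch and a single rolling row; it is a slow reference, not the table lookup described above, and it does not guard against uint64 overflow for large N and K.

```go
package main

import "fmt"

// pvqVSketch (hypothetical name) counts PVQ codewords with n dimensions and
// k pulses using V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1).
// Assumption: callers keep n and k small enough to avoid uint64 overflow.
func pvqVSketch(n, k int) uint64 {
	if n < 0 || k < 0 {
		return 0
	}
	row := make([]uint64, k+1) // row[j] holds V(i, j) for the current i
	row[0] = 1                 // V(0,0) = 1, V(0,j) = 0 for j > 0
	for i := 1; i <= n; i++ {
		prevDiag := row[0] // V(i-1, j-1), starting at j = 1
		for j := 1; j <= k; j++ {
			prev := row[j] // V(i-1, j)
			row[j] = prev + row[j-1] + prevDiag
			prevDiag = prev
		}
	}
	return row[k]
}

func main() {
	fmt.Println(pvqVSketch(2, 3), pvqVSketch(3, 2))
}
```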

func PatchTransientDecision

func PatchTransientDecision(newE, oldE []float64, nbEBands, start, end, channels int) bool

PatchTransientDecision looks for sudden increases of energy to decide whether we need to patch the transient decision. This is a "second chance" to detect transients that the time-domain transient_analysis() may have missed.

This is particularly important for the first frame where the time-domain analysis may fail due to zero-padded buffers, but the frequency-domain energy increase from silence to signal is obvious.

Parameters:

  • newE: current frame's band log-energies (log2 domain)
  • oldE: previous frame's band log-energies (log2 domain)
  • nbEBands: number of effective bands
  • start: first band to consider (usually 0)
  • end: last band + 1 to consider (usually nbEBands)
  • channels: number of channels (1 or 2)

Returns: true if the mean energy increase exceeds 1.0 in the log2 units of newE and oldE (about 6 dB, see DB6), in which case the transient should be forced

Reference: libopus celt/celt_encoder.c patch_transient_decision()
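
The following is a rough sketch modeled on the referenced libopus function, under the hypothetical name patchTransientSketch. The spreading slope of 1.0 per band, the skipping of the lowest two bands and the last band, and the 1.0 threshold (in the same log2 units as the inputs) are assumptions taken from that reference and may differ from this package.

```go
package main

import (
	"fmt"
	"math"
)

// patchTransientSketch (hypothetical name) spreads the old energies across
// neighboring bands, then averages the positive increase of the new energies.
func patchTransientSketch(newE, oldE []float64, nbEBands, start, end, channels int) bool {
	if channels < 1 || channels > 2 || start < 0 || end > nbEBands || end <= start ||
		len(newE) < channels*nbEBands || len(oldE) < channels*nbEBands {
		return false
	}
	// Combine channels and apply a forward then backward spreading pass.
	spread := make([]float64, end)
	for i := start; i < end; i++ {
		e := oldE[i]
		if channels == 2 {
			e = math.Max(e, oldE[i+nbEBands])
		}
		if i > start {
			e = math.Max(spread[i-1]-1.0, e)
		}
		spread[i] = e
	}
	for i := end - 2; i >= start; i-- {
		spread[i] = math.Max(spread[i], spread[i+1]-1.0)
	}
	lo := start
	if lo < 2 {
		lo = 2
	}
	count := channels * (end - 1 - lo)
	if count <= 0 {
		return false
	}
	var meanDiff float64
	for c := 0; c < channels; c++ {
		for i := lo; i < end-1; i++ {
			x1 := math.Max(0, newE[i+c*nbEBands])
			x2 := math.Max(0, spread[i])
			meanDiff += math.Max(0, x1-x2)
		}
	}
	return meanDiff/float64(count) > 1.0
}

func main() {
	oldE := make([]float64, 21)
	newE := make([]float64, 21)
	for i := range oldE {
		oldE[i], newE[i] = -28, 5 // silence followed by signal
	}
	fmt.Println(patchTransientSketch(newE, oldE, 21, 0, 21, 1))
}
```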

func PulsesToBitsExport

func PulsesToBitsExport(band, lm, pulses int) int

PulsesToBitsExport exposes pulsesToBits for testing.

func QuantEnergyFinalise

func QuantEnergyFinalise(
	re *rangecoding.Encoder,
	start, end int,
	oldEBands, errorVal []float64,
	fineQuant, finePriority []int,
	bitsLeft, channels int,
)

QuantEnergyFinalise uses remaining bits for additional energy refinement. This matches libopus quant_energy_finalise() in quant_bands.c.

Parameters:

  • re: range encoder
  • start, end: band range to encode
  • oldEBands: quantized energies (will be updated)
  • errorVal: quantization error (will be updated)
  • fineQuant: fine bits already used per band
  • finePriority: priority for finalization (0 or 1)
  • bitsLeft: remaining bits to use
  • channels: number of audio channels

Reference: libopus celt/quant_bands.c quant_energy_finalise()

func QuantFineEnergy

func QuantFineEnergy(
	re *rangecoding.Encoder,
	start, end int,
	oldEBands, errorVal []float64,
	prevQuant, extraQuant []int,
	channels int,
)

QuantFineEnergy encodes fine energy refinement bits. This matches libopus quant_fine_energy() in quant_bands.c.

Parameters:

  • re: range encoder
  • start, end: band range to encode
  • oldEBands: quantized energies (will be updated with fine refinement)
  • errorVal: quantization error from coarse encoding (will be updated)
  • prevQuant: previous quantization bits per band (can be nil)
  • extraQuant: fine bits per band (0 = no refinement)
  • channels: number of audio channels

Reference: libopus celt/quant_bands.c quant_fine_energy()

func QuantizeTheta

func QuantizeTheta(theta float64, qn int) int

QuantizeTheta quantizes an angle to the given number of steps. Used in encoder; provided here for completeness and testing.

Parameters:

  • theta: angle in radians [0, pi/2]
  • qn: number of quantization steps

Returns: quantized itheta [0, qn]

func ScaledBandEnd

func ScaledBandEnd(band, frameSize int) int

ScaledBandEnd returns the scaled MDCT bin index for the end of a band.

func ScaledBandStart

func ScaledBandStart(band, frameSize int) int

ScaledBandStart returns the scaled MDCT bin index for the start of a band. For frame sizes larger than 2.5ms, indices are scaled by (frameSize/120).

func ScaledBandWidth

func ScaledBandWidth(band, frameSize int) int

ScaledBandWidth returns the number of MDCT bins in a band for a given frame size.
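
To make the scaling concrete, the sketch below uses the 2.5 ms band-edge table from libopus (eband5ms) and multiplies by frameSize/120. The names eBandsSketch and scaledBand*Sketch are hypothetical, and it is an assumption that this package uses the same table and returns 0 for out-of-range bands.

```go
package main

import "fmt"

// eBandsSketch holds band edges in MDCT bins for a 120-sample frame,
// copied from the libopus eband5ms table (assumed to match this package).
var eBandsSketch = [...]int{0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40, 48, 60, 78, 100}

// validBandSketch reports whether band indexes one of the 21 bands.
func validBandSketch(band int) bool {
	return band >= 0 && band < len(eBandsSketch)-1
}

// scaledBandStartSketch scales the band's first bin by frameSize/120.
func scaledBandStartSketch(band, frameSize int) int {
	if !validBandSketch(band) {
		return 0
	}
	return eBandsSketch[band] * frameSize / 120
}

// scaledBandEndSketch scales the band's end bin (exclusive) by frameSize/120.
func scaledBandEndSketch(band, frameSize int) int {
	if !validBandSketch(band) {
		return 0
	}
	return eBandsSketch[band+1] * frameSize / 120
}

// scaledBandWidthSketch is the number of bins between start and end.
func scaledBandWidthSketch(band, frameSize int) int {
	return scaledBandEndSketch(band, frameSize) - scaledBandStartSketch(band, frameSize)
}

func main() {
	fmt.Println(scaledBandStartSketch(17, 960), scaledBandWidthSketch(20, 960))
}
```

Under these assumptions, band 17 starts at bin 40 of a 120-bin frame, which at 200 Hz per bin is 8 kHz.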

func ShouldUseShortBlocks

func ShouldUseShortBlocks(transientResult TransientAnalysisResult, percussiveDetected bool, lm int, totalBits int) bool

ShouldUseShortBlocks combines multiple transient indicators to decide whether to use short blocks. This is a high-level decision function that considers:

  • Standard transient analysis (mask_metric)
  • Percussive attack detection
  • Attack duration (hysteresis)
  • Frame budget constraints

Parameters:

  • transientResult: result from TransientAnalysis or TransientAnalysisWithState
  • percussiveDetected: result from DetectPercussiveAttack
  • lm: log2 of frame size multiplier (0-3)
  • totalBits: available bits for the frame

Returns: true if short blocks should be used

func SpreadDecisionForShortBlocks

func SpreadDecisionForShortBlocks() int

SpreadDecisionForShortBlocks returns spread decision for transient frames. For short blocks, spreading is typically disabled or minimal.

Returns: SpreadNone for transient frames (libopus SPREAD_NONE behavior)

func StereoWidth

func StereoWidth(mid, side []float64) float64

StereoWidth computes the perceived stereo width from mid and side. Returns a value in [0, 1] where 0 = mono, 1 = full stereo.

func SynthesizeWithConfig

func SynthesizeWithConfig(coeffs []float64, overlap int, transient bool, shortBlocks int, prevOverlap []float64) (output, newOverlap []float64)

SynthesizeWithConfig performs synthesis with explicit configuration. Useful for testing or non-standard configurations.

func TFAnalysis

func TFAnalysis(X []float64, N0, nbEBands int, isTransient bool, lm int, tfEstimate float64, effectiveBytes int, importance []int) (tfRes []int, tfSelect int)

TFAnalysis performs time-frequency analysis to determine optimal TF resolution per band. It uses a Viterbi algorithm to find the best TF resolution settings.

Parameters:

  • X: normalized MDCT coefficients
  • N0: total number of coefficients (per channel)
  • nbEBands: number of bands to analyze
  • isTransient: whether frame uses short blocks
  • lm: log mode (frame size index)
  • tfEstimate: estimate of temporal vs frequency content (0.0 = time, 1.0 = freq)
  • effectiveBytes: available bytes for encoding (affects lambda)
  • importance: per-band importance weights (nil for uniform)

Returns:

  • tfRes: per-band TF resolution flags (0 or 1)
  • tfSelect: TF select flag for bitstream

Reference: libopus celt/celt_encoder.c tf_analysis()

func TFAnalysisWithScratch

func TFAnalysisWithScratch(X []float64, N0, nbEBands int, isTransient bool, lm int, tfEstimate float64, effectiveBytes int, importance []int, scratch *TFAnalysisScratch) (tfRes []int, tfSelect int)

TFAnalysisWithScratch is the zero-allocation version of TFAnalysis.

func TFDecodeForTest

func TFDecodeForTest(start, end int, isTransient bool, tfRes []int, lm int, rd *rangecoding.Decoder)

TFDecodeForTest exposes tfDecode for cross-package tests (e.g., CGO comparisons). It should not be used in production code.

func TFEncode

func TFEncode(re *rangecoding.Encoder, start, end int, isTransient bool, tfRes []int, lm int)

TFEncode is a convenience wrapper around tfEncode for callers outside the celt package. It encodes TF resolution flags without running TF analysis.

func TFEncodeWithSelect

func TFEncodeWithSelect(re *rangecoding.Encoder, start, end int, isTransient bool, tfRes []int, lm int, tfSelect int)

TFEncodeWithSelect encodes TF resolution with a specific tf_select value. This is called after TFAnalysis to encode the computed TF decisions.

Parameters:

  • re: range encoder
  • start, end: band range
  • isTransient: whether frame uses short blocks
  • tfRes: per-band TF resolution flags (0 or 1)
  • lm: log mode (frame size index)
  • tfSelect: TF select flag from TFAnalysis

Reference: libopus celt/celt_encoder.c tf_encode()

func ThetaToGains

func ThetaToGains(itheta, qn int) (mid, side float64)

ThetaToGains converts itheta to mid and side gains.

Parameters:

  • itheta: quantized angle (0 to qn)
  • qn: number of quantization steps

Returns: mid gain, side gain (both in [0, 1])

Reference: libopus celt/bands.c
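
As an idealized illustration of the itheta-to-gain mapping, the sketch below treats itheta/qn as a fraction of pi/2 and uses floating-point cos and sin. The names quantizeThetaSketch and thetaToGainsSketch are hypothetical; libopus uses bit-exact integer approximations, so the package's values may differ slightly from these.

```go
package main

import (
	"fmt"
	"math"
)

// quantizeThetaSketch (hypothetical name) rounds theta in [0, pi/2] to the
// nearest of qn+1 uniformly spaced steps and clamps the result to [0, qn].
func quantizeThetaSketch(theta float64, qn int) int {
	if qn <= 0 {
		return 0
	}
	it := int(math.Floor(theta/(math.Pi/2)*float64(qn) + 0.5))
	if it < 0 {
		it = 0
	}
	if it > qn {
		it = qn
	}
	return it
}

// thetaToGainsSketch (hypothetical name) maps itheta in [0, qn] to
// mid = cos(theta) and side = sin(theta) with theta = itheta/qn * pi/2.
// Assumption: qn <= 0 is treated as "all mid".
func thetaToGainsSketch(itheta, qn int) (mid, side float64) {
	if qn <= 0 {
		return 1, 0
	}
	theta := float64(itheta) / float64(qn) * math.Pi / 2
	return math.Cos(theta), math.Sin(theta)
}

func main() {
	it := quantizeThetaSketch(math.Pi/4, 8)
	mid, side := thetaToGainsSketch(it, 8)
	fmt.Println(it, mid, side)
}
```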

func UpdateCollapseMask

func UpdateCollapseMask(mask *uint32, band int)

UpdateCollapseMask marks a band as having received pulses.

Parameters:

  • mask: pointer to collapse mask (modified in place)
  • band: band index to mark as coded

The collapse mask tracks which bands received non-zero bit allocation. This is used for anti-collapse processing in transient frames.
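
One plausible reading of the uint32 mask is a bitfield with one bit per band. The sketch below assumes exactly that (bit `band` set means coded); the bit layout and the names updateCollapseMaskSketch and isBandCodedSketch are hypothetical and not confirmed by this documentation.

```go
package main

import "fmt"

// updateCollapseMaskSketch (hypothetical name) sets bit `band` in the mask.
// Assumption: one bit per band, so only bands 0..31 are representable.
func updateCollapseMaskSketch(mask *uint32, band int) {
	if mask == nil || band < 0 || band >= 32 {
		return
	}
	*mask |= uint32(1) << uint(band)
}

// isBandCodedSketch (hypothetical name) reports whether bit `band` is set.
func isBandCodedSketch(mask uint32, band int) bool {
	if band < 0 || band >= 32 {
		return false
	}
	return mask&(uint32(1)<<uint(band)) != 0
}

func main() {
	var mask uint32
	updateCollapseMaskSketch(&mask, 3)
	fmt.Println(mask, isBandCodedSketch(mask, 3), isBandCodedSketch(mask, 2))
}
```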

func UpdateStereoSaving

func UpdateStereoSaving(prev float64, normL, normR []float64, nbBands, lm, intensity int) float64

UpdateStereoSaving updates the running stereo_saving estimate used by libopus compute_vbr(). The state is updated once per frame after alloc-trim analysis.

func ValidFrameSize

func ValidFrameSize(frameSize int) bool

ValidFrameSize returns true if the frame size is valid for CELT.

func VorbisWindow

func VorbisWindow(i, overlap int) float64

VorbisWindow computes the CELT Vorbis window value at position i for the given overlap length. This matches libopus's window generation.

This window is:

  • Power-complementary: w[i]^2 + w[overlap-1-i]^2 = 1
  • Smooth: continuous first derivative
  • Good spectral properties: low sidelobe levels

Parameters:

  • i: position in window (0 to overlap-1)
  • overlap: window length (overlap region)

Returns: window value in [0, 1]
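
The power-complementary property can be checked numerically with the usual Vorbis power-complementary formula, w(i) = sin(pi/2 * sin^2(pi*(i+0.5)/(2*overlap))). The sketch below uses the hypothetical name vorbisWindowSketch; it is assumed, not confirmed, that VorbisWindow evaluates the same expression, and returning 0 for out-of-range i is a choice made here.

```go
package main

import (
	"fmt"
	"math"
)

// vorbisWindowSketch (hypothetical name) evaluates
// w(i) = sin(pi/2 * sin^2(pi*(i+0.5)/(2*overlap))).
// Assumption: out-of-range positions return 0.
func vorbisWindowSketch(i, overlap int) float64 {
	if overlap <= 0 || i < 0 || i >= overlap {
		return 0
	}
	s := math.Sin(0.5 * math.Pi * (float64(i) + 0.5) / float64(overlap))
	return math.Sin(0.5 * math.Pi * s * s)
}

func main() {
	const overlap = 120
	a := vorbisWindowSketch(10, overlap)
	b := vorbisWindowSketch(overlap-1-10, overlap)
	fmt.Println(a, b, a*a+b*b) // the last value should be very close to 1
}
```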

func WindowEnergy

func WindowEnergy(overlap int) float64

WindowEnergy computes the total energy of a windowed segment. Used for level normalization.

Types

type AllocationResult

type AllocationResult struct {
	BandBits     []int // PVQ bit budget per band in Q3 (a.k.a. pulses[] in libopus)
	FineBits     []int // Fine energy bits per band
	FinePriority []int // Fine energy priority flags per band
	Caps         []int // PVQ caps per band in Q3
	Balance      int   // Bit balance carried into quant_all_bands (Q3)
	CodedBands   int   // Number of coded bands
	Intensity    int   // Intensity stereo start band (0 when disabled)
	DualStereo   bool  // Dual stereo flag
}

AllocationResult holds the output of bit allocation computation.

func ComputeAllocation

func ComputeAllocation(totalBits, nbBands, channels int, cap, offsets []int, trim int, intensity int, dualStereo bool, lm int) AllocationResult

ComputeAllocation computes bit allocation without consuming a range coder. This mirrors libopus clt_compute_allocation() math but skips entropy reads.

func ComputeAllocationHybrid

func ComputeAllocationHybrid(re *rangecoding.Encoder, totalBitsQ3, nbBands, channels int, cap, offsets []int, trim int, intensity int, dualStereo bool, lm int, prev int, signalBandwidth int) AllocationResult

ComputeAllocationHybrid computes bit allocation for hybrid mode CELT encoding. In hybrid mode, CELT only encodes bands 17-21 (start=HybridCELTStartBand). This properly sets bits for bands 0-16 to 0.

func ComputeAllocationWithDecoder

func ComputeAllocationWithDecoder(rd *rangecoding.Decoder, totalBits, nbBands, channels int, cap, offsets []int, trim int, intensity int, dualStereo bool, lm int) AllocationResult

ComputeAllocationWithDecoder computes bit allocation and consumes the range decoder for skip/intensity/dual-stereo decisions.

func ComputeAllocationWithEncoder

func ComputeAllocationWithEncoder(re *rangecoding.Encoder, totalBitsQ3, nbBands, channels int, cap, offsets []int, trim int, intensity int, dualStereo bool, lm int, prev int, signalBandwidth int) AllocationResult

ComputeAllocationWithEncoder computes bit allocation in Q3 and encodes the stereo params to the range encoder. This is the encoding counterpart to ComputeAllocationWithDecoder. prev is the last coded band count used for skip hysteresis (0 = no history). signalBandwidth is the highest band index considered to carry signal (>= start).

type CELTBandwidth

type CELTBandwidth int

CELTBandwidth represents the audio bandwidth for CELT coding.

const (
	// CELTNarrowband represents 4kHz audio bandwidth (narrowband).
	CELTNarrowband CELTBandwidth = iota
	// CELTMediumband represents 6kHz audio bandwidth (mediumband).
	CELTMediumband
	// CELTWideband represents 8kHz audio bandwidth (wideband).
	CELTWideband
	// CELTSuperwideband represents 12kHz audio bandwidth (super-wideband).
	CELTSuperwideband
	// CELTFullband represents 20kHz audio bandwidth (fullband).
	CELTFullband
)

func BandwidthFromOpusConfig

func BandwidthFromOpusConfig(opusBandwidth int) CELTBandwidth

BandwidthFromOpusConfig returns the CELT bandwidth from an Opus TOC bandwidth field. Opus TOC bandwidth values: 0=NB, 1=MB, 2=WB, 3=SWB, 4=FB

func (CELTBandwidth) EffectiveBands

func (bw CELTBandwidth) EffectiveBands() int

EffectiveBands returns the number of coded bands for the given bandwidth. This is the maximum number of bands; actual coded bands may be fewer depending on frame size and bit allocation.

func (CELTBandwidth) MaxFrequency

func (bw CELTBandwidth) MaxFrequency() int

MaxFrequency returns the maximum audio frequency in Hz for this bandwidth.

func (CELTBandwidth) String

func (bw CELTBandwidth) String() string

String returns the string representation of the bandwidth.

type CeltTargetStats

type CeltTargetStats struct {
	FrameSize     int
	BaseBits      int
	TargetBits    int
	Tonality      float64
	DynallocBoost int
	TFBoost       int
	PitchChange   bool
	FloorLimited  bool
	MaxDepth      float64
}

CeltTargetStats captures per-frame VBR target diagnostics for CELT.

type CoarseDecisionStats

type CoarseDecisionStats struct {
	Frame     int
	Band      int
	Channel   int
	Intra     bool
	LM        int
	ProbFS0   int
	ProbDecay int
	X         float64
	Pred      float64
	Residual  float64
	QIInitial int
	QIFinal   int
	Tell      int
	BitsLeft  int
}

CoarseDecisionStats captures per-band coarse energy quantization decisions. This is intended for diagnostics and is only emitted when a hook is installed.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder decodes CELT frames from an Opus packet. It maintains state across frames for proper audio continuity via overlap-add synthesis and energy prediction.

CELT is the transform-based layer of Opus, using the Modified Discrete Cosine Transform (MDCT) for music and general audio. The decoder reconstructs audio by:

  1. Decoding energy envelope (coarse + fine quantization)
  2. Decoding normalized band shapes via PVQ
  3. Applying denormalization (scaling by energy)
  4. Performing IMDCT synthesis with overlap-add
  5. Applying de-emphasis filter

Reference: RFC 6716 Section 4.3

func NewDecoder

func NewDecoder(channels int) *Decoder

NewDecoder creates a new CELT decoder with the given number of channels. Valid channel counts are 1 (mono) or 2 (stereo). The decoder is ready to process CELT frames after creation.

func (*Decoder) Bandwidth

func (d *Decoder) Bandwidth() CELTBandwidth

Bandwidth returns the current CELT bandwidth setting.

func (*Decoder) Channels

func (d *Decoder) Channels() int

Channels returns the number of audio channels (1 or 2).

func (*Decoder) DecodeBands

func (d *Decoder) DecodeBands(
	energies []float64,
	bandBits []int,
	nbBands int,
	stereo bool,
	frameSize int,
) []float64

DecodeBands decodes all frequency bands from the bitstream.

Parameters:

  • energies: per-band energy values (from coarse + fine energy decoding)
  • bandBits: bits allocated per band (from bit allocation)
  • nbBands: number of bands to decode
  • stereo: true if stereo mode
  • frameSize: frame size in samples (120, 240, 480, 960)

Returns: MDCT coefficients (denormalized) of length frameSize.

The output is zero-padded to frameSize. Band coefficients fill bins 0 to totalBins-1, where totalBins = sum(ScaledBandWidth(i, frameSize) for i in 0..nbBands-1). Upper bins (totalBins to frameSize-1) remain zero, representing highest frequencies. This ensures IMDCT receives exactly frameSize coefficients, producing correct sample count.

func (*Decoder) DecodeBandsStereo

func (d *Decoder) DecodeBandsStereo(
	energiesL, energiesR []float64,
	bandBits []int,
	nbBands int,
	frameSize int,
	intensity int,
) (left, right []float64)

DecodeBandsStereo decodes all frequency bands in stereo mode.

Parameters:

  • energiesL, energiesR: per-band energies for left/right channels
  • bandBits: bits allocated per band
  • nbBands: number of bands
  • frameSize: frame size in samples
  • intensity: intensity stereo start band (-1 if not used)

Returns: left and right channel MDCT coefficients, each of length frameSize.

In stereo mode, bands can use:

  1. Dual stereo: separate PVQ vectors for L and R
  2. Mid-side: decode mid/side and rotate to L/R
  3. Intensity: copy mono to both with sign flag

Output is zero-padded to frameSize per channel for correct IMDCT operation.

Reference: libopus celt/bands.c quant_all_bands() stereo path

func (*Decoder) DecodeCoarseEnergy

func (d *Decoder) DecodeCoarseEnergy(nbBands int, intra bool, lm int) []float64

DecodeCoarseEnergy decodes coarse band energies in log2 units (1 = 6 dB).

  • intra=true: no inter-frame prediction (first frame or after loss)
  • intra=false: uses alpha prediction from previous frame

Reference: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c unquant_coarse_energy()

func (*Decoder) DecodeCoarseEnergyWithDecoder

func (d *Decoder) DecodeCoarseEnergyWithDecoder(rd *rangecoding.Decoder, nbBands int, intra bool, lm int) []float64

DecodeCoarseEnergyWithDecoder decodes coarse energies using an explicit range decoder. This variant allows passing a range decoder directly rather than using d.rangeDecoder.

func (*Decoder) DecodeEnergyFinalise

func (d *Decoder) DecodeEnergyFinalise(energies []float64, nbBands int, fineQuant []int, finePriority []int, bitsLeft int)

DecodeEnergyFinalise consumes leftover bits for additional energy refinement. This mirrors libopus unquant_energy_finalise(). It takes no start parameter and is intended for non-hybrid mode (start=0); DecodeEnergyFinaliseRange accepts an explicit start band.

func (*Decoder) DecodeEnergyFinaliseRange

func (d *Decoder) DecodeEnergyFinaliseRange(start, end int, energies []float64, fineQuant []int, finePriority []int, bitsLeft int)

DecodeEnergyFinaliseRange consumes leftover bits for energy refinement in range [start, end). This mirrors libopus unquant_energy_finalise() which takes both start and end parameters. For hybrid mode, start should be HybridCELTStartBand (17).

func (*Decoder) DecodeEnergyRemainder

func (d *Decoder) DecodeEnergyRemainder(energies []float64, nbBands int, remainderBits []int)

DecodeEnergyRemainder uses remaining bits for additional energy precision. It is called after all PVQ bands have been decoded and uses the bits left over from bit allocation.

Reference: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c unquant_energy_finalise()

func (*Decoder) DecodeEnergyRemainderWithDecoder

func (d *Decoder) DecodeEnergyRemainderWithDecoder(rd *rangecoding.Decoder, energies []float64, nbBands int, remainderBits []int)

DecodeEnergyRemainderWithDecoder uses remainder bits with an explicit range decoder.

func (*Decoder) DecodeFineEnergy

func (d *Decoder) DecodeFineEnergy(energies []float64, nbBands int, fineBits []int)

DecodeFineEnergy adds fine energy precision to coarse values. fineBits[band] specifies bits allocated for refinement (0 = no refinement).

Reference: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c unquant_fine_energy()

func (*Decoder) DecodeFineEnergyWithDecoder

func (d *Decoder) DecodeFineEnergyWithDecoder(rd *rangecoding.Decoder, energies []float64, nbBands int, fineBits []int)

DecodeFineEnergyWithDecoder adds fine energy precision using an explicit range decoder.

func (*Decoder) DecodeFrame

func (d *Decoder) DecodeFrame(data []byte, frameSize int) ([]float64, error)

DecodeFrame decodes a complete CELT frame from raw bytes. If data is nil or empty, it performs Packet Loss Concealment (PLC) instead of decoding.

Parameters:

  • data: raw CELT frame bytes (without Opus framing), or nil/empty for PLC
  • frameSize: expected output samples (120, 240, 480, or 960)

Returns: PCM samples as float64 slice, interleaved if stereo

The decoding pipeline:

  1. Initialize range decoder
  2. Decode frame header flags (silence, transient, intra)
  3. Decode energy envelope (coarse + fine)
  4. Compute bit allocation
  5. Decode bands via PVQ
  6. Synthesis: IMDCT + windowing + overlap-add
  7. Apply de-emphasis filter

Reference: RFC 6716 Section 4.3, libopus celt/celt_decoder.c celt_decode_with_ec()

func (*Decoder) DecodeFrameHybrid

func (d *Decoder) DecodeFrameHybrid(rd *rangecoding.Decoder, frameSize int) ([]float64, error)

DecodeFrameHybrid decodes a CELT frame for hybrid mode. In hybrid mode, CELT only decodes bands 17-21 (frequencies above ~8kHz). The range decoder should already have been partially consumed by SILK.

Parameters:

  • rd: Range decoder (SILK has already consumed its portion)
  • frameSize: Expected output samples (480 or 960 for hybrid 10ms/20ms)

Returns: PCM samples as float64 slice at 48kHz

Implementation approach:

  • Decode all bands as usual but zero out bands 0-16 before synthesis
  • This ensures correct operation with the existing synthesis pipeline
  • Only bands 17-21 contribute to the output (high frequencies for hybrid)

Reference: RFC 6716 Section 3.2 (Hybrid mode), libopus celt/celt_decoder.c

func (*Decoder) DecodeFrameHybridWithPacketStereo

func (d *Decoder) DecodeFrameHybridWithPacketStereo(rd *rangecoding.Decoder, frameSize int, packetStereo bool) ([]float64, error)

DecodeFrameHybridWithPacketStereo decodes a hybrid CELT frame while honoring the packet stereo flag. This mirrors DecodeFrameWithPacketStereo for CELT-only but uses the hybrid start band.

func (*Decoder) DecodeFrameWithDecoder

func (d *Decoder) DecodeFrameWithDecoder(rd *rangecoding.Decoder, frameSize int) ([]float64, error)

DecodeFrameWithDecoder decodes a frame using a pre-initialized range decoder. This is useful when the range decoder is shared with other layers (e.g., SILK in hybrid mode).

func (*Decoder) DecodeFrameWithPacketStereo

func (d *Decoder) DecodeFrameWithPacketStereo(data []byte, frameSize int, packetStereo bool) ([]float64, error)

DecodeFrameWithPacketStereo decodes a CELT frame with explicit packet stereo flag. This handles the case where the packet's stereo flag differs from the decoder's configured channels. For example, a stereo decoder (channels=2) receiving a mono packet (packetStereo=false).

packetStereo: true if the packet contains stereo data, false for mono

When packetStereo doesn't match decoder channels:

  • Mono packet + stereo decoder: decode mono, duplicate to stereo output
  • Stereo packet + mono decoder: decode stereo, mix to mono output
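The two channel-mismatch cases can be illustrated with a minimal standalone sketch. The helper names monoToStereo and stereoToMono are hypothetical and not part of this package, and the equal-weight average used for the downmix is an assumption; the package may mix channels differently.

```go
package main

import "fmt"

// monoToStereo duplicates each mono sample into an interleaved L/R pair.
func monoToStereo(mono []float64) []float64 {
	out := make([]float64, 2*len(mono))
	for i, s := range mono {
		out[2*i] = s
		out[2*i+1] = s
	}
	return out
}

// stereoToMono averages each interleaved L/R pair into one sample.
// Averaging is an assumption made for this sketch.
func stereoToMono(stereo []float64) []float64 {
	out := make([]float64, len(stereo)/2)
	for i := range out {
		out[i] = 0.5 * (stereo[2*i] + stereo[2*i+1])
	}
	return out
}

func main() {
	fmt.Println(monoToStereo([]float64{0.25, -0.5}))
	fmt.Println(stereoToMono([]float64{1, 0, -1, -1}))
}
```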

func (*Decoder) DecodeHybridFECPLC

func (d *Decoder) DecodeHybridFECPLC(frameSize int) ([]float64, error)

DecodeHybridFECPLC generates CELT concealment for hybrid decode_fec cadence. This mirrors the decode_fec behavior where CELT PLC is accumulated on top of SILK LBRR, with decoder-side postfilter/de-emphasis ordering.

func (*Decoder) DecodeIntensityStereo

func (d *Decoder) DecodeIntensityStereo(mid []float64) (left, right []float64)

DecodeIntensityStereo decodes intensity stereo for a band.

Parameters:

  • mid: the mid channel coefficients

Returns: left and right with optional sign inversion on right.

In intensity stereo, both channels share the same shape but may have opposite signs (determined by a single bit).

Reference: RFC 6716 Section 4.3.4.3
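As a rough illustration of the shared-shape idea, the sketch below copies the mid shape into both channels and flips the sign of the right channel when an inversion flag is set. The function intensityStereo and its invert parameter are hypothetical; in the real decoder the inversion bit is read from the range coder rather than passed in.

```go
package main

import "fmt"

// intensityStereo copies mid into left and right.
// When invert is true, the right channel gets the opposite sign.
func intensityStereo(mid []float64, invert bool) (left, right []float64) {
	left = make([]float64, len(mid))
	right = make([]float64, len(mid))
	sign := 1.0
	if invert {
		sign = -1.0
	}
	for i, m := range mid {
		left[i] = m
		right[i] = sign * m
	}
	return left, right
}

func main() {
	l, r := intensityStereo([]float64{0.6, -0.8}, true)
	fmt.Println(l, r)
}
```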

func (*Decoder) DecodeLaplaceTest

func (d *Decoder) DecodeLaplaceTest(fs int, decay int) int

DecodeLaplaceTest is an exported wrapper for testing. It decodes a Laplace-distributed integer using the range coder.

func (*Decoder) DecodePVQ

func (d *Decoder) DecodePVQ(n, k int) []float64

DecodePVQ decodes a PVQ codeword from the range decoder.

Parameters:

  • n: band width (number of MDCT bins)
  • k: number of pulses (from bit allocation)

Returns: normalized float64 vector of length n with unit L2 norm.

If k == 0, returns a zero vector (caller should fold from another band).

func (*Decoder) DecodePVQWithTrace

func (d *Decoder) DecodePVQWithTrace(band, n, k int) []float64

DecodePVQWithTrace decodes a PVQ codeword and traces the result.

Parameters:

  • band: the band index (for tracing purposes)
  • n: band width (number of MDCT bins)
  • k: number of pulses (from bit allocation)

Returns: normalized float64 vector of length n with unit L2 norm.

This is identical to DecodePVQ but also calls DefaultTracer.TracePVQ.

func (*Decoder) DecodeStereoParams

func (d *Decoder) DecodeStereoParams(nbBands int) (intensity, dualStereo int)

DecodeStereoParams decodes stereo parameters (intensity and dual stereo).

Reference: RFC 6716 Section 4.3.4, libopus celt/celt_decoder.c

func (*Decoder) DecodeStereoTheta

func (d *Decoder) DecodeStereoTheta(qn int) int

DecodeStereoTheta decodes theta with sign for stereo balance.

Parameters:

  • qn: number of quantization steps (determines precision)

Returns: itheta value (0 = pure mid, qn = pure side)

Reference: libopus celt/bands.c compute_theta()

func (*Decoder) DecodeTheta

func (d *Decoder) DecodeTheta(n int) float64

DecodeTheta decodes the stereo angle for mid-side mixing.

Parameters:

  • n: number of points in the itheta quantization (depends on bit allocation)

Returns: theta in the range [0, pi/2] for mid-side rotation.

The angle theta controls the balance between mid and side channels:

  • theta = 0: mono (all energy in mid)
  • theta = pi/4: equal mid and side
  • theta = pi/2: all energy in side

Reference: libopus celt/bands.c quant_band_stereo()

func (*Decoder) FinalRange

func (d *Decoder) FinalRange() uint32

FinalRange returns the final range coder state after decoding. This matches libopus OPUS_GET_FINAL_RANGE and is used for bitstream verification. Must be called after decoding a frame to get a meaningful value.

func (*Decoder) GetEnergy

func (d *Decoder) GetEnergy(band, channel int) float64

GetEnergy returns the energy for a specific band and channel from prevEnergy.

func (*Decoder) NextRNG

func (d *Decoder) NextRNG() uint32

NextRNG advances the RNG and returns the new value. Uses the same LCG as libopus for deterministic behavior.
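For reference, the libopus generator (celt_lcg_rand) is a 32-bit linear congruential generator with multiplier 1664525 and increment 1013904223. The sketch below reproduces that recurrence in a hypothetical standalone function, nextLCG; it is assumed, not confirmed, that NextRNG is implemented the same way.

```go
package main

import "fmt"

// nextLCG advances a 32-bit LCG state using the libopus constants.
// uint32 arithmetic in Go wraps modulo 2^32, as the recurrence requires.
func nextLCG(seed uint32) uint32 {
	return 1664525*seed + 1013904223
}

func main() {
	s := uint32(0)
	for i := 0; i < 3; i++ {
		s = nextLCG(s)
		fmt.Println(s)
	}
}
```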

func (*Decoder) OverlapBuffer

func (d *Decoder) OverlapBuffer() []float64

OverlapBuffer returns the overlap buffer for CELT overlap. Size is Overlap * channels samples.

func (*Decoder) PostfilterGain

func (d *Decoder) PostfilterGain() float64

PostfilterGain returns the comb filter gain.

func (*Decoder) PostfilterPeriod

func (d *Decoder) PostfilterPeriod() int

PostfilterPeriod returns the pitch period for the postfilter.

func (*Decoder) PostfilterTapset

func (d *Decoder) PostfilterTapset() int

PostfilterTapset returns the filter tap configuration.

func (*Decoder) PreemphState

func (d *Decoder) PreemphState() []float64

PreemphState returns the de-emphasis filter state. One value per channel.

func (*Decoder) PrevEnergy

func (d *Decoder) PrevEnergy() []float64

PrevEnergy returns the previous frame's band energies. Used for inter-frame energy prediction in coarse energy decoding. Layout: [band0_ch0, band1_ch0, ..., band20_ch0, band0_ch1, ..., band20_ch1]

func (*Decoder) PrevEnergy2

func (d *Decoder) PrevEnergy2() []float64

PrevEnergy2 returns the band energies from two frames ago. Used for anti-collapse detection.

func (*Decoder) RNG

func (d *Decoder) RNG() uint32

RNG returns the current RNG state.

func (*Decoder) RangeDecoder

func (d *Decoder) RangeDecoder() *rangecoding.Decoder

RangeDecoder returns the current range decoder.

func (*Decoder) Reset

func (d *Decoder) Reset()

Reset clears decoder state for a new stream. Call this when starting to decode a new audio stream.

func (*Decoder) SampleRate

func (d *Decoder) SampleRate() int

SampleRate returns the output sample rate (always 48000 for CELT).

func (*Decoder) SetBandwidth

func (d *Decoder) SetBandwidth(bw CELTBandwidth)

SetBandwidth sets the CELT bandwidth derived from the Opus TOC.

func (*Decoder) SetEnergy

func (d *Decoder) SetEnergy(band, channel int, energy float64)

SetEnergy sets the energy for a specific band and channel.

func (*Decoder) SetOverlapBuffer

func (d *Decoder) SetOverlapBuffer(samples []float64)

SetOverlapBuffer copies the given samples to the overlap buffer.

func (*Decoder) SetPostfilter

func (d *Decoder) SetPostfilter(period int, gain float64, tapset int)

SetPostfilter sets the postfilter parameters.

func (*Decoder) SetPrevEnergy

func (d *Decoder) SetPrevEnergy(energies []float64)

SetPrevEnergy copies the given energies to the previous energy buffer. Also shifts current prev to prev2.

func (*Decoder) SetPrevEnergyWithPrev

func (d *Decoder) SetPrevEnergyWithPrev(prev, energies []float64)

SetPrevEnergyWithPrev updates prevEnergy using the provided previous state. This avoids losing the prior frame when prevEnergy is updated during decoding. The energies array uses compact layout [L0..L(n-1), R0..R(n-1)] where n = nbBands. The prevEnergy array uses full layout [L0..L20, R0..R20] where 21 = MaxBands.
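The difference between the two layouts can be sketched with a hypothetical helper that copies a compact [channels * nbBands] array into a full [channels * 21] array. The name expandEnergies and the choice to leave untouched bands as they are in the destination are assumptions made for this illustration.

```go
package main

import "fmt"

const maxBands = 21 // full layout stride per channel

// expandEnergies copies compact energies (stride nbBands per channel)
// into the full layout (stride maxBands per channel).
// Entries for bands >= nbBands in full are left unchanged.
func expandEnergies(full, compact []float64, nbBands, channels int) {
	for c := 0; c < channels; c++ {
		for b := 0; b < nbBands; b++ {
			full[c*maxBands+b] = compact[c*nbBands+b]
		}
	}
}

func main() {
	full := make([]float64, 2*maxBands)
	expandEnergies(full, []float64{1, 2, 3, 4}, 2, 2)
	fmt.Println(full[0], full[1], full[maxBands], full[maxBands+1])
}
```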

func (*Decoder) SetRNG

func (d *Decoder) SetRNG(seed uint32)

SetRNG sets the RNG state.

func (*Decoder) SetRangeDecoder

func (d *Decoder) SetRangeDecoder(rd *rangecoding.Decoder)

SetRangeDecoder sets the range decoder for the current frame. This must be called before decoding each frame.

func (*Decoder) Synthesize

func (d *Decoder) Synthesize(coeffs []float64, transient bool, shortBlocks int) []float64

Synthesize performs full IMDCT + windowing + overlap-add for decoded coefficients. This is the main synthesis function called by the decoder.

Parameters:

  • coeffs: MDCT coefficients from DecodeBands
  • transient: true if frame uses short blocks (for transients)
  • shortBlocks: number of short MDCTs if transient (1, 2, 4, or 8)

Returns: PCM samples for this frame

func (*Decoder) SynthesizeStereo

func (d *Decoder) SynthesizeStereo(coeffsL, coeffsR []float64, transient bool, shortBlocks int) []float64

SynthesizeStereo performs synthesis for stereo frames. Handles both channels with proper interleaving.

Parameters:

  • coeffsL, coeffsR: MDCT coefficients for left and right channels
  • transient: true if using short blocks
  • shortBlocks: number of short MDCTs

Returns: interleaved stereo samples [L0, R0, L1, R1, ...]
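The interleaved output order can be shown with a short sketch. interleave is a hypothetical helper that assumes both channels have the same length; it only illustrates the [L0, R0, L1, R1, ...] layout and performs no synthesis.

```go
package main

import "fmt"

// interleave merges equal-length left and right channels
// into a single [L0, R0, L1, R1, ...] slice.
func interleave(left, right []float64) []float64 {
	out := make([]float64, 2*len(left))
	for i := range left {
		out[2*i] = left[i]
		out[2*i+1] = right[i]
	}
	return out
}

func main() {
	fmt.Println(interleave([]float64{1, 2}, []float64{-1, -2}))
}
```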

func (*Decoder) WindowAndOverlap

func (d *Decoder) WindowAndOverlap(imdctOut []float64) []float64

WindowAndOverlap applies Vorbis window and performs overlap-add. This is a combined operation for efficiency.

Parameters:

  • imdctOut: raw IMDCT output (will be windowed in place)

Returns: reconstructed samples after overlap-add

type DynallocResult

type DynallocResult struct {
	// MaxDepth is the maximum signal level relative to noise floor (in dB).
	// Used for floor_depth calculation in VBR.
	// Reference: libopus celt_encoder.c lines 1682-1693
	MaxDepth float64

	// Offsets contains per-band allocation offsets for dynamic bit allocation.
	// Bands with high energy variance get extra bits.
	Offsets []int

	// SpreadWeight contains per-band masking weights for spread decision.
	// Higher values indicate more perceptually important bands.
	SpreadWeight []int

	// Importance contains per-band importance values (0-13 typically).
	// Used for bit allocation prioritization.
	Importance []int

	// TotBoost is the total boost in bits (Q3 format).
	// Represents extra bits allocated beyond base target.
	TotBoost int
}

DynallocResult contains the output of dynalloc_analysis. These values are used for VBR target computation and bit allocation.

func DynallocAnalysis

func DynallocAnalysis(
	bandLogE, bandLogE2, oldBandE []float64,
	nbBands, start, end, channels, lsbDepth, lm int,
	logN []int16,
	effectiveBytes int,
	isTransient, vbr, constrainedVBR, lfe bool,
	toneFreq, toneishness float64,
	surroundDynalloc []float64,
	analysisValid bool,
	analysisLeakBoost []uint8,
) DynallocResult

DynallocAnalysis performs dynamic allocation analysis to compute:

  1. maxDepth: signal depth relative to noise floor (for VBR floor_depth)
  2. offsets: per-band bit allocation offsets
  3. spread_weight: per-band masking weights for spread decision
  4. importance: per-band importance for allocation prioritization
  5. tot_boost: total boost bits for VBR target

Parameters:

  • bandLogE: current frame band energies (log2 domain), [channels * nbBands]
  • bandLogE2: secondary band energies (from second MDCT for transients), [channels * nbBands]
  • oldBandE: previous frame band energies, [channels * nbBands]
  • nbBands: number of frequency bands
  • start: starting band (usually 0)
  • end: ending band (usually nbBands or less)
  • channels: number of audio channels (1 or 2)
  • lsbDepth: bit depth of input (16-24)
  • lm: log2 of frame size multiplier (0=2.5ms, 1=5ms, 2=10ms, 3=20ms)
  • logN: per-band log2 of width in Q8 format
  • effectiveBytes: total available bytes for encoding
  • isTransient: true if frame is transient
  • vbr: true if using variable bitrate
  • constrainedVBR: true if using constrained VBR
  • toneFreq: detected tone frequency in radians/sample (-1 if none)
  • toneishness: tone purity metric (0.0-1.0)

Reference: libopus celt/celt_encoder.c lines 1049-1273
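The lm parameter maps to frame sizes at 48 kHz as a power-of-two multiple of the 2.5 ms (120-sample) base frame. A minimal sketch of that mapping, using hypothetical helper names:

```go
package main

import "fmt"

// frameSizeForLM returns the frame size in samples at 48 kHz
// for lm in 0..3 (120, 240, 480, 960).
func frameSizeForLM(lm int) int {
	return 120 << uint(lm)
}

// frameDurationMs returns the frame duration in milliseconds
// for lm in 0..3 (2.5, 5, 10, 20).
func frameDurationMs(lm int) float64 {
	return 2.5 * float64(int(1)<<uint(lm))
}

func main() {
	for lm := 0; lm <= 3; lm++ {
		fmt.Println(lm, frameSizeForLM(lm), frameDurationMs(lm))
	}
}
```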

func DynallocAnalysisSimple

func DynallocAnalysisSimple(bandLogE []float64, nbBands, lm, effectiveBytes int) DynallocResult

DynallocAnalysisSimple is a convenience wrapper for common mono encoding scenarios. It uses default parameters appropriate for typical audio encoding.

Parameters:

  • bandLogE: current frame band energies (log2 domain)
  • nbBands: number of frequency bands
  • lm: log2 of frame size multiplier
  • effectiveBytes: total available bytes

Returns: DynallocResult with maxDepth suitable for VBR floor_depth calculation

func DynallocAnalysisWithScratch

func DynallocAnalysisWithScratch(
	bandLogE, bandLogE2, oldBandE []float64,
	nbBands, start, end, channels, lsbDepth, lm int,
	logN []int16,
	effectiveBytes int,
	isTransient, vbr, constrainedVBR, lfe bool,
	toneFreq, toneishness float64,
	surroundDynalloc []float64,
	analysisValid bool,
	analysisLeakBoost []uint8,
	scratch *DynallocScratch,
) DynallocResult

DynallocAnalysisWithScratch is the zero-allocation version of DynallocAnalysis.

type DynallocScratch

type DynallocScratch struct {
	// Result arrays (caller provides these in result struct)
	Offsets      []int
	SpreadWeight []int
	Importance   []int

	// Conversion buffers (float32 for precision matching libopus)
	BandLogE32   []float32
	BandLogE2_32 []float32
	OldBandE32   []float32
	NoiseFloor   []float32

	// Masking model buffers
	Mask      []float32
	Sig       []float32
	Follower  []float32
	BandLogE3 []float32
	LeakBand  []float32
	LeakFrom  []float32
	LeakTo    []float32
}

DynallocScratch holds pre-allocated buffers for DynallocAnalysis.

func (*DynallocScratch) EnsureDynallocScratch

func (s *DynallocScratch) EnsureDynallocScratch(nbBands, channels int)

EnsureDynallocScratch ensures scratch buffers are large enough.

type EncodeOptions

type EncodeOptions struct {
	ForceIntra     bool // Force intra mode (no inter-frame prediction)
	ForceTransient bool // Force transient mode (short blocks)
	Bitrate        int  // Target bitrate in bits per second (0 = default)
}

EncodeOptions provides encoding control options.

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder encodes audio frames using CELT transform coding. It maintains state across frames for proper audio continuity via energy prediction and overlap-add analysis.

The encoder state mirrors the decoder state to ensure synchronized prediction. This includes:

  • Energy arrays for inter-frame prediction
  • Overlap buffer for MDCT overlap-add
  • Pre-emphasis filter state
  • RNG state for deterministic folding decisions

Reference: RFC 6716 Section 4.3

func NewEncoder

func NewEncoder(channels int) *Encoder

NewEncoder creates a new CELT encoder with the given number of channels. Valid channel counts are 1 (mono) or 2 (stereo). The encoder is ready to process CELT frames after creation.

The initialization mirrors libopus encoder reset state:

  • prevEnergy starts at 0.0 (oldBandE cleared)
  • RNG seed 0 (matches libopus initialization)

func (*Encoder) AnalysisBandwidth

func (e *Encoder) AnalysisBandwidth() int

AnalysisBandwidth returns the current analysis-derived bandwidth index.

func (*Encoder) ApplyDCReject

func (e *Encoder) ApplyDCReject(pcm []float64) []float64

ApplyDCReject applies a DC rejection (high-pass) filter to remove DC offset. This matches libopus dc_reject() which is applied before CELT encoding.

The filter is a simple first-order high-pass:

coef = 6.3 * cutoffHz / sampleRate
out[i] = x[i] - m
m = coef*x[i] + (1-coef)*m

Reference: libopus src/opus_encoder.c dc_reject()
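The three filter equations above translate directly into a loop. The sketch below is a mono-only illustration with the filter memory passed in explicitly; dcReject and its parameters are hypothetical and do not reflect how the encoder stores its state or handles interleaved stereo.

```go
package main

import "fmt"

// dcReject applies the first-order high-pass described above to mono input.
// m holds the filter memory and is updated so that state carries across calls.
func dcReject(pcm []float64, cutoffHz, sampleRate float64, m *float64) []float64 {
	coef := 6.3 * cutoffHz / sampleRate
	out := make([]float64, len(pcm))
	for i, x := range pcm {
		out[i] = x - *m
		*m = coef*x + (1-coef)*(*m)
	}
	return out
}

func main() {
	// A constant (pure DC) input should decay toward zero at the output.
	pcm := make([]float64, 48000)
	for i := range pcm {
		pcm[i] = 1.0
	}
	var mem float64
	out := dcReject(pcm, 3, 48000, &mem)
	fmt.Println(out[0], out[len(out)-1])
}
```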

func (*Encoder) ApplyDCRejectScratchHybrid

func (e *Encoder) ApplyDCRejectScratchHybrid(pcm []float64) []float64

ApplyDCRejectScratchHybrid applies DC rejection using the encoder scratch buffers.

func (*Encoder) ApplyDelayCompensationScratchHybrid

func (e *Encoder) ApplyDelayCompensationScratchHybrid(pcm []float64, frameSize int) []float64

ApplyDelayCompensationScratchHybrid applies CELT delay compensation using encoder state. It prepends the delay buffer and returns a frame-sized slice of samples.

func (*Encoder) ApplyPreemphasis

func (e *Encoder) ApplyPreemphasis(pcm []float64) []float64

ApplyPreemphasis applies the pre-emphasis filter to PCM input samples. Pre-emphasis boosts high frequencies to improve coding efficiency.

The filter equation is:

y[n] = x[n] - PreemphCoef * x[n-1]

This is the inverse of the decoder's de-emphasis filter:

y[n] = x[n] + PreemphCoef * y[n-1]

The filter state is maintained in e.preemphState for frame continuity.

Parameters:

  • pcm: input PCM samples (interleaved if stereo)

Returns: pre-emphasized samples

Uses PreemphCoef = 0.85 (D03-05-03).

Reference: RFC 6716 Section 4.3.5, libopus celt/celt_encoder.c

func (*Encoder) ApplyPreemphasisInPlace

func (e *Encoder) ApplyPreemphasisInPlace(pcm []float64)

ApplyPreemphasisInPlace applies pre-emphasis in-place to the input samples. This is more efficient when a copy is not needed.

func (*Encoder) ApplyPreemphasisWithScaling

func (e *Encoder) ApplyPreemphasisWithScaling(pcm []float64) []float64

ApplyPreemphasisWithScaling applies pre-emphasis with signal scaling. Input samples are first scaled from float range [-1.0, 1.0] to signal scale (multiplied by CELTSigScale = 32768), then the pre-emphasis filter is applied.

This matches libopus celt_preemphasis() behavior where samples are scaled and filtered together. The decoder's scaleSamples(1/32768) reverses the scaling.

func (*Encoder) ApplyPreemphasisWithScalingScratch

func (e *Encoder) ApplyPreemphasisWithScalingScratch(pcm []float64) []float64

ApplyPreemphasisWithScalingScratch applies pre-emphasis with scaling using pre-allocated scratch buffers. This is the zero-allocation version of ApplyPreemphasisWithScaling, suitable for use from the hybrid encoding path.

func (*Encoder) Bandwidth

func (e *Encoder) Bandwidth() CELTBandwidth

Bandwidth returns the active CELT bandwidth cap.

func (*Encoder) Bitrate

func (e *Encoder) Bitrate() int

Bitrate returns the current target bitrate in bits per second.

func (*Encoder) BitrateToBits

func (e *Encoder) BitrateToBits(frameSize int) int

BitrateToBits exposes bitrate_to_bits for hybrid callers.

func (*Encoder) CBRPayloadBytes

func (e *Encoder) CBRPayloadBytes(frameSize int) int

CBRPayloadBytes exposes cbrPayloadBytes for hybrid callers.

func (*Encoder) CapsScratch

func (e *Encoder) CapsScratch(nbBands int) []int

CapsScratch returns a scratch caps slice sized for nbBands.

func (*Encoder) Channels

func (e *Encoder) Channels() int

Channels returns the number of audio channels (1 or 2).

func (*Encoder) Complexity

func (e *Encoder) Complexity() int

Complexity returns the current complexity setting.

func (*Encoder) ComputeAllocationHybridScratch

func (e *Encoder) ComputeAllocationHybridScratch(re *rangecoding.Encoder, totalBitsQ3, nbBands int, cap, offsets []int, trim int, intensity int, dualStereo bool, lm int, prev int, signalBandwidth int) *AllocationResult

ComputeAllocationHybridScratch computes hybrid bit allocation using encoder scratch. This mirrors ComputeAllocationHybrid but avoids per-call allocations.

func (*Encoder) ComputeBandEnergies

func (e *Encoder) ComputeBandEnergies(mdctCoeffs []float64, nbBands, frameSize int) []float64

ComputeBandEnergies computes energy for each frequency band from MDCT coefficients. It returns energies in log2 scale, relative to the mean (same as libopus):

energies[c*nbBands + band] = log2(amplitude) - eMeans[band]

The energy computation extracts loudness per frequency band:

  1. For each band, sum squares of MDCT coefficients
  2. Divide by band width to get average power
  3. Convert to log2 scale: energy = 0.5 * log2(sumSq)
  4. Subtract eMeans to make values mean-relative (like libopus amp2Log2)

The decoder adds eMeans back during denormalization, recovering the original. This ensures encoder and decoder use matching gain values.

Reference: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c amp2Log2()

func (*Encoder) ComputeBandEnergiesInto

func (e *Encoder) ComputeBandEnergiesInto(mdctCoeffs []float64, nbBands, frameSize int, dst []float64)

ComputeBandEnergiesInto computes band energies into the provided destination buffer. Use this instead of ComputeBandEnergies when you need to avoid buffer aliasing.

func (*Encoder) ComputeBandEnergiesRaw

func (e *Encoder) ComputeBandEnergiesRaw(mdctCoeffs []float64, nbBands, frameSize int) []float64

ComputeBandEnergiesRaw computes energy for each frequency band WITHOUT eMeans subtraction. Returns raw energies in log2 scale (log2 of amplitude). Used for testing/debugging to compare with libopus intermediate values.

func (*Encoder) ComputeMDCTWithHistoryScratch

func (e *Encoder) ComputeMDCTWithHistoryScratch(inputScratch, samples, history []float64, shortBlocks int) []float64

ComputeMDCTWithHistoryScratch computes MDCT with history using scratch buffers. inputScratch is used to assemble [history|samples] before the transform. history is updated in-place with the current frame's tail. EnsureScratch must have been called first.

func (*Encoder) ComputeMDCTWithHistoryScratchStereoL

func (e *Encoder) ComputeMDCTWithHistoryScratchStereoL(samples, history []float64, shortBlocks int) []float64

ComputeMDCTWithHistoryScratchStereoL computes MDCT for the left channel using separate scratch output buffers. The result is written to scratch.mdctLeft so it survives a subsequent right-channel call.

func (*Encoder) ComputeMDCTWithHistoryScratchStereoR

func (e *Encoder) ComputeMDCTWithHistoryScratchStereoR(samples, history []float64, shortBlocks int) []float64

ComputeMDCTWithHistoryScratchStereoR computes MDCT for the right channel using separate scratch output buffers. The result is written to scratch.mdctRight.

func (*Encoder) ComputeSubBlockEnergies

func (e *Encoder) ComputeSubBlockEnergies(pcm []float64, frameSize int) []float64

ComputeSubBlockEnergies computes energy per sub-block for analysis. Returns energy values for each of the 8 sub-blocks. Useful for debugging or adaptive thresholding.

func (*Encoder) ConsecTransient

func (e *Encoder) ConsecTransient() int

ConsecTransient returns the number of consecutive transient frames.

func (*Encoder) ConstrainedVBR

func (e *Encoder) ConstrainedVBR() bool

ConstrainedVBR reports whether constrained VBR mode is enabled.

func (*Encoder) DCRejectEnabled

func (e *Encoder) DCRejectEnabled() bool

DCRejectEnabled reports whether dc_reject is applied in EncodeFrame.

func (*Encoder) DecideIntraMode

func (e *Encoder) DecideIntraMode(energies []float64, startBand, nbBands int, lm int) bool

DecideIntraMode runs libopus-style two-pass intra/inter selection for coarse energy. It performs trial encodes on a saved range coder state and restores state before returning. startBand specifies the first band to encode (0 for CELT-only, 17 for hybrid mode). This matches libopus quant_coarse_energy which iterates from start to end.

func (*Encoder) DelayCompensationEnabled

func (e *Encoder) DelayCompensationEnabled() bool

DelayCompensationEnabled reports whether EncodeFrame applies lookahead delay compensation.

func (*Encoder) DetectPercussiveAttack

func (e *Encoder) DetectPercussiveAttack(pcm []float64, frameSize int) (bool, int, float64)

DetectPercussiveAttack performs specialized detection for sharp percussive attacks. This is optimized for drum hits, hand claps, and other impulsive sounds that require very fine time resolution.

Unlike the standard transient detector which uses forward/backward masking, this function looks for:

  • Very rapid energy rise (attack time < 5ms)
  • High peak-to-average ratio (crest factor)
  • Broadband energy distribution (not tonal)

Parameters:

  • pcm: input PCM samples (mono or interleaved stereo)
  • frameSize: frame size in samples

Returns: (isPercussive, attackPosition, attackStrength)

  • isPercussive: true if a sharp percussive attack is detected
  • attackPosition: sample index where attack begins (0 to frameSize-1)
  • attackStrength: measure of attack sharpness (0.0 to 1.0)

This can be used to:

  1. Force transient mode even when standard detection misses it
  2. Adjust TF resolution for optimal attack preservation
  3. Guide pre-echo reduction in VBR mode

func (*Encoder) DetectTransient

func (e *Encoder) DetectTransient(pcm []float64, frameSize int) bool

DetectTransient analyzes PCM for sudden energy changes. Returns true if the frame should use short MDCT blocks.

Transient detection identifies frames with:

  • Sharp attacks (drum hits, plucks)
  • Sudden silence
  • Energy jumps > 6dB between adjacent sub-blocks

When transient is detected, the encoder uses multiple short MDCTs instead of one long MDCT for better time resolution at the cost of frequency resolution.

Parameters:

  • pcm: input PCM samples (mono or interleaved stereo)
  • frameSize: frame size in samples (120, 240, 480, or 960)

Returns: true if transient detected and short blocks should be used

Reference: RFC 6716 Section 4.3.1, libopus celt/celt_encoder.c
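The sub-block energy criterion can be sketched in isolation as follows. This is a deliberately simplified mono-only illustration, not the libopus transient analysis: it splits the frame into 8 equal sub-blocks, measures each block's energy in dB, and flags a transient when adjacent blocks differ by more than a threshold. The function name, the epsilon, and the symmetric treatment of rises and drops are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// detectTransientSimple reports whether any two adjacent sub-blocks
// differ in energy by more than thresholdDB (in either direction).
func detectTransientSimple(pcm []float64, thresholdDB float64) bool {
	const blocks = 8
	n := len(pcm) / blocks
	if n == 0 {
		return false
	}
	prev := 0.0
	for b := 0; b < blocks; b++ {
		e := 1e-12 // avoids log10(0) for silent blocks
		for _, s := range pcm[b*n : (b+1)*n] {
			e += s * s
		}
		db := 10 * math.Log10(e)
		if b > 0 && math.Abs(db-prev) > thresholdDB {
			return true
		}
		prev = db
	}
	return false
}

func main() {
	attack := make([]float64, 960)
	for i := 480; i < 960; i++ {
		attack[i] = 0.5 // sudden onset halfway through the frame
	}
	fmt.Println(detectTransientSimple(attack, 6))
	fmt.Println(detectTransientSimple(make([]float64, 960), 6))
}
```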

func (*Encoder) DetectTransientWithCustomThreshold

func (e *Encoder) DetectTransientWithCustomThreshold(pcm []float64, frameSize int, threshold float64) bool

DetectTransientWithCustomThreshold detects transients with a custom threshold. This allows tuning for different audio content types.

func (*Encoder) DynallocAnalysisHybridScratch

func (e *Encoder) DynallocAnalysisHybridScratch(bandLogE, bandLogE2, oldBandE []float64, nbBands, start, end, lsbDepth, lm int, effectiveBytes int, isTransient, vbr, constrainedVBR bool, toneFreq, toneishness float64) DynallocResult

DynallocAnalysisHybridScratch runs dynalloc analysis using encoder scratch buffers.

func (*Encoder) EncodeBandPVQ

func (e *Encoder) EncodeBandPVQ(shape []float64, n, k int)

EncodeBandPVQ encodes a normalized band shape using PVQ. k is the number of pulses (determined by bit allocation via bitsToKEncode).

Parameters:

  • shape: normalized band shape (unit L2 norm)
  • n: band width (number of MDCT bins)
  • k: number of pulses

The encoded data consists of a single PVQ index encoded uniformly with V(n,k) possible values.

Reference: libopus celt/bands.c quant_band()
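The codebook size V(n,k) mentioned above follows the recurrence given in RFC 6716 Section 4.3.4.2: V(N,K) = V(N-1,K) + V(N,K-1) + V(N-1,K-1), with V(N,0) = 1 and V(0,K) = 0 for K != 0. The sketch below evaluates it row by row; pvqCodebookSize is a hypothetical helper, and it uses uint64 without overflow checks, so it is only meaningful for small n and k.

```go
package main

import "fmt"

// pvqCodebookSize returns V(n, k), the number of integer vectors of
// dimension n whose absolute values sum to k.
func pvqCodebookSize(n, k int) uint64 {
	// prev[j] holds V(i-1, j); cur[j] holds V(i, j).
	prev := make([]uint64, k+1)
	prev[0] = 1 // V(0, 0) = 1; V(0, j) = 0 for j > 0
	for i := 1; i <= n; i++ {
		cur := make([]uint64, k+1)
		cur[0] = 1 // V(i, 0) = 1
		for j := 1; j <= k; j++ {
			cur[j] = prev[j] + cur[j-1] + prev[j-1]
		}
		prev = cur
	}
	return prev[k]
}

func main() {
	fmt.Println(pvqCodebookSize(2, 1), pvqCodebookSize(2, 2), pvqCodebookSize(3, 2))
}
```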

func (*Encoder) EncodeBands

func (e *Encoder) EncodeBands(shapesL, shapesR [][]float64, bandBits []int, nbBands, frameSize int)

EncodeBands encodes all bands using PVQ.

Parameters:

  • shapesL, shapesR: normalized band shapes for Left/Right (R is nil for mono)
  • bandBits: bit allocation per band from ComputeAllocation
  • nbBands: number of bands
  • frameSize: frame size in samples (120, 240, 480, 960)

For each band:

  • If bits <= 0: skip (band will be folded by decoder)
  • Otherwise: compute k from bits and encode via EncodeBandPVQ
  • For stereo, bits are split between L and R (Dual Stereo)

Reference: libopus celt/bands.c quant_all_bands()

func (*Encoder) EncodeBandsHybrid

func (e *Encoder) EncodeBandsHybrid(shapesL, shapesR [][]float64, bandBits []int, nbBands, frameSize, startBand int)

EncodeBandsHybrid encodes bands for hybrid mode (starting from startBand). In hybrid mode, bands 0 to startBand-1 are handled by SILK. Only bands from startBand onwards are PVQ encoded.

Reference: RFC 6716 Section 3.2 - Hybrid mode uses start_band=17 for CELT

func (*Encoder) EncodeCoarseEnergy

func (e *Encoder) EncodeCoarseEnergy(energies []float64, nbBands int, intra bool, lm int) []float64

EncodeCoarseEnergy encodes coarse (6dB step) band energies. This mirrors decoder's DecodeCoarseEnergy exactly (in reverse).

  • intra=true: no inter-frame prediction (first frame or after loss)
  • intra=false: uses alpha prediction from previous frame

Returns the quantized energies (after encoding) for use by fine energy encoding.

Reference: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c quant_coarse_energy()

func (*Encoder) EncodeCoarseEnergyHybrid

func (e *Encoder) EncodeCoarseEnergyHybrid(energies []float64, nbBands int, intra bool, lm int, startBand int) []float64

EncodeCoarseEnergyHybrid encodes coarse energies for hybrid mode. Only encodes bands from startBand onwards (typically band 17).

func (*Encoder) EncodeCoarseEnergyRange

func (e *Encoder) EncodeCoarseEnergyRange(energies []float64, start, end int, intra bool, lm int) []float64

EncodeCoarseEnergyRange encodes coarse energies for bands in [start, end). This mirrors EncodeCoarseEnergy but only processes the specified band range. Bands outside the range keep their previous energy values.

func (*Encoder) EncodeCoarseEnergyWithEncoder

func (e *Encoder) EncodeCoarseEnergyWithEncoder(re *rangecoding.Encoder, energies []float64, nbBands int, intra bool, lm int) []float64

EncodeCoarseEnergyWithEncoder encodes coarse energies using an explicit range encoder. This variant allows passing a range encoder directly rather than using e.rangeEncoder.

func (*Encoder) EncodeEnergyFinalise

func (e *Encoder) EncodeEnergyFinalise(energies []float64, quantizedEnergies []float64, nbBands int, fineQuant []int, finePriority []int, bitsLeft int)

EncodeEnergyFinalise consumes leftover bits for additional energy refinement. This mirrors decoder's DecodeEnergyFinalise (libopus quant_energy_finalise).

Parameters:

  • energies: original target energies
  • quantizedEnergies: current quantized energies (coarse + fine)
  • fineQuant/finePriority: allocation outputs
  • bitsLeft: remaining whole bits available in the packet

func (*Encoder) EncodeEnergyFinaliseRange

func (e *Encoder) EncodeEnergyFinaliseRange(energies []float64, quantizedEnergies []float64, start, end int, fineQuant []int, finePriority []int, bitsLeft int)

EncodeEnergyFinaliseRange consumes leftover bits for energy refinement in [start, end).

func (*Encoder) EncodeEnergyFinaliseRangeFromError

func (e *Encoder) EncodeEnergyFinaliseRangeFromError(quantizedEnergies []float64, start, end int, fineQuant []int, finePriority []int, bitsLeft int)

EncodeEnergyFinaliseRangeFromError mirrors libopus quant_energy_finalise() for hybrid range coding, consuming the in-place residual state from coarse/fine.

func (*Encoder) EncodeEnergyRemainder

func (e *Encoder) EncodeEnergyRemainder(energies []float64, quantizedEnergies []float64, nbBands int, remainderBits []int)

EncodeEnergyRemainder encodes any leftover precision bits. It is called after the PVQ bands are encoded and uses leftover bits from bit allocation. This mirrors decoder's DecodeEnergyRemainder exactly (in reverse).

Reference: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c quant_energy_finalise()

func (*Encoder) EncodeEnergyRemainderWithEncoder

func (e *Encoder) EncodeEnergyRemainderWithEncoder(re *rangecoding.Encoder, energies []float64, quantizedEnergies []float64, nbBands int, remainderBits []int)

EncodeEnergyRemainderWithEncoder encodes remainder bits using an explicit range encoder.

func (*Encoder) EncodeFineEnergy

func (e *Encoder) EncodeFineEnergy(energies []float64, quantizedCoarse []float64, nbBands int, fineBits []int)

EncodeFineEnergy encodes fine energy refinement bits. This adds fractional precision to coarse energy values. fineBits[band] specifies bits allocated for refinement (0 = no refinement).

This mirrors decoder's DecodeFineEnergy exactly (in reverse).

Reference: RFC 6716 Section 4.3.2, libopus celt/quant_bands.c quant_fine_energy()

func (*Encoder) EncodeFineEnergyHybrid

func (e *Encoder) EncodeFineEnergyHybrid(energies []float64, quantizedCoarse []float64, nbBands int, fineBits []int, startBand int)

EncodeFineEnergyHybrid encodes fine energies for hybrid mode. Only encodes bands from startBand onwards.

func (*Encoder) EncodeFineEnergyRange

func (e *Encoder) EncodeFineEnergyRange(energies []float64, quantizedCoarse []float64, start, end int, fineBits []int)

EncodeFineEnergyRange encodes fine energies for bands in [start, end).

func (*Encoder) EncodeFineEnergyRangeFromError

func (e *Encoder) EncodeFineEnergyRangeFromError(quantizedEnergies []float64, start, end int, fineBits []int)

EncodeFineEnergyRangeFromError mirrors libopus quant_fine_energy() for hybrid range coding, consuming the in-place coarse error residual state.

func (*Encoder) EncodeFineEnergyWithEncoder

func (e *Encoder) EncodeFineEnergyWithEncoder(re *rangecoding.Encoder, energies []float64, quantizedCoarse []float64, nbBands int, fineBits []int)

EncodeFineEnergyWithEncoder encodes fine energies using an explicit range encoder.

func (*Encoder) EncodeFrame

func (e *Encoder) EncodeFrame(pcm []float64, frameSize int) ([]byte, error)

EncodeFrame encodes a complete CELT frame from PCM samples.

Parameters:

  • pcm: input samples (interleaved if stereo), length = frameSize * channels
  • frameSize: 120, 240, 480, or 960 samples

Returns: encoded bytes

The encoding pipeline (mirrors decoder's DecodeFrame): 1. Validate inputs 2. Get mode configuration 3. Detect transient 4. Apply pre-emphasis 5. Compute MDCT 6. Compute band energies 7. Normalize bands 8. Initialize range encoder 9. Encode frame flags (silence, transient, intra) 10. For stereo: encode stereo params 11. Encode coarse energy 12. Compute bit allocation 13. Encode fine energy 14. Encode bands (PVQ) 15. Finalize and return bytes

Reference: RFC 6716 Section 4.3, libopus celt/celt_encoder.c
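The input constraints stated for EncodeFrame can be checked before calling it. The sketch below is an assumption about what such a check might look like; validateFrameInput is a hypothetical helper, and how the encoder itself reacts to invalid input is not stated in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// validateFrameInput is a hypothetical helper, not part of package celt.
// It checks the frame size and buffer length documented for EncodeFrame.
func validateFrameInput(pcm []float64, frameSize, channels int) error {
	switch frameSize {
	case 120, 240, 480, 960:
		// valid CELT frame sizes at 48 kHz
	default:
		return errors.New("frameSize must be 120, 240, 480, or 960")
	}
	if channels != 1 && channels != 2 {
		return errors.New("channels must be 1 or 2")
	}
	if len(pcm) != frameSize*channels {
		return fmt.Errorf("pcm length %d, want %d", len(pcm), frameSize*channels)
	}
	return nil
}

func main() {
	pcm := make([]float64, 960*2) // 20 ms of interleaved stereo
	fmt.Println(validateFrameInput(pcm, 960, 2) == nil)
}
```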

func (*Encoder) EncodeFrameWithOptions

func (e *Encoder) EncodeFrameWithOptions(pcm []float64, frameSize int, opts EncodeOptions) ([]byte, error)

EncodeFrameWithOptions encodes a frame with additional control options.

func (*Encoder) EncodeStereoFrame

func (e *Encoder) EncodeStereoFrame(left, right []float64, frameSize int) ([]byte, error)

EncodeStereoFrame encodes a stereo frame from separate L/R channels.

  • left: left channel samples
  • right: right channel samples
  • frameSize: 120, 240, 480, or 960 samples per channel
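For comparison with EncodeFrame, which expects interleaved stereo input, the following sketch shows how separate channels map to an interleaved buffer. The interleave helper is hypothetical; whether EncodeStereoFrame interleaves internally in this way is an assumption.

```go
package main

import "fmt"

// interleave is a hypothetical helper, not part of package celt.
// It produces L0, R0, L1, R1, ... from two per-channel slices,
// using the shorter length if the inputs differ.
func interleave(left, right []float64) []float64 {
	n := len(left)
	if len(right) < n {
		n = len(right)
	}
	out := make([]float64, 2*n)
	for i := 0; i < n; i++ {
		out[2*i] = left[i]
		out[2*i+1] = right[i]
	}
	return out
}

func main() {
	fmt.Println(interleave([]float64{1, 2}, []float64{10, 20}))
}
```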

func (*Encoder) EncodeStereoParams

func (e *Encoder) EncodeStereoParams(nbBands int) int

EncodeStereoParams encodes stereo mode parameters to the bitstream. For the initial implementation, this encodes mid-side stereo only:

  • intensity = nbBands (meaning no intensity stereo, all bands use mid-side)
  • dual_stereo = 0 (meaning mid-side mode, not dual stereo)

Returns the intensity band (-1 since intensity stereo is disabled in this mode).

The decoder reads stereo params in decodeStereoParams(), which expects:

  1. intensity band index encoded with Laplace model
  2. dual_stereo flag encoded as single bit

Reference: RFC 6716 Section 4.3.4, libopus celt/celt_decoder.c

func (*Encoder) EncodeStereoParamsWithIntensity

func (e *Encoder) EncodeStereoParamsWithIntensity(nbBands, intensityBand int, dualStereo bool) int

EncodeStereoParamsWithIntensity encodes stereo params with optional intensity stereo.

  • intensityBand: band where intensity stereo starts (-1 to disable)
  • dualStereo: true for dual stereo mode

For future use when intensity stereo is implemented.

func (*Encoder) EnergyMask

func (e *Encoder) EnergyMask() []float64

EnergyMask returns the current per-band surround mask. The returned slice aliases encoder state and must not be modified by callers.

func (*Encoder) EnsureScratch

func (e *Encoder) EnsureScratch(frameSize int)

EnsureScratch ensures all scratch buffers are properly sized for the given frame size. Call this before using the encoder's scratch-aware methods from an external path (e.g., hybrid encoding) that does not go through EncodeFrame.

func (*Encoder) FinalRange

func (e *Encoder) FinalRange() uint32

FinalRange returns the final range coder state after encoding. This matches libopus OPUS_GET_FINAL_RANGE and is used for bitstream verification. Must be called after EncodeFrame() to get a meaningful value.

func (*Encoder) FrameCount

func (e *Encoder) FrameCount() int

FrameCount returns the number of frames encoded.

func (*Encoder) GetAttackDuration

func (e *Encoder) GetAttackDuration() int

GetAttackDuration returns the number of consecutive transient frames. This is useful for adapting encoding parameters during percussive passages. A value > 1 indicates sustained percussive activity (e.g., drum roll).

func (*Encoder) GetEnergy

func (e *Encoder) GetEnergy(band, channel int) float64

GetEnergy returns the energy for a specific band and channel from prevEnergy.

func (*Encoder) GetLastBandLogE

func (e *Encoder) GetLastBandLogE() []float64

GetLastBandLogE returns the last frame's primary band log-energies. These are the bandLogE values passed to DynallocAnalysis.

func (*Encoder) GetLastBandLogE2

func (e *Encoder) GetLastBandLogE2() []float64

GetLastBandLogE2 returns the last frame's secondary band log-energies. For transients, this is from the long MDCT; otherwise same as bandLogE.

func (*Encoder) GetLastDynalloc

func (e *Encoder) GetLastDynalloc() DynallocResult

GetLastDynalloc returns the last computed dynalloc result. This is computed during encoding and stored for the next frame's VBR decisions.

func (*Encoder) HFAverage

func (e *Encoder) HFAverage() int

HFAverage returns the high-frequency average used for tapset decision. This is updated during SpreadingDecision when updateHF=true.

func (*Encoder) IncrementFrameCount

func (e *Encoder) IncrementFrameCount()

IncrementFrameCount increments the frame counter. Call this after successfully encoding a frame.

func (*Encoder) IsHybrid

func (e *Encoder) IsHybrid() bool

IsHybrid returns true if the encoder is in hybrid mode.

func (*Encoder) IsIntraFrame

func (e *Encoder) IsIntraFrame() bool

IsIntraFrame returns true if this frame should use intra mode.

This matches libopus two-pass behavior for complexity >= 4:

  • libopus uses force_intra=0 by default
  • With two_pass=1 (complexity >= 4), intra starts as force_intra (=0)
  • Then two-pass encoding compares intra vs inter and picks the better one

For simplicity, we match the libopus default: always return false (inter mode) even for frame 0, because libopus's two-pass typically chooses inter mode for the first frame when encoding simple signals (like sine waves).

Reference: libopus celt/quant_bands.c line 279:

intra = force_intra || (!two_pass && *delayedIntra>2*C*(end-start) && ...)

With two_pass=1 and force_intra=0, this evaluates to intra=0.

func (*Encoder) LFE

func (e *Encoder) LFE() bool

LFE reports whether LFE mode constraints are enabled.

func (*Encoder) LSBDepth

func (e *Encoder) LSBDepth() int

LSBDepth returns the current input signal LSB depth.

func (*Encoder) LastCodedBands

func (e *Encoder) LastCodedBands() int

LastCodedBands returns the last coded band count used for allocation skip decisions.

func (*Encoder) LastTonality

func (e *Encoder) LastTonality() float64

LastTonality returns the most recently computed tonality estimate. The value ranges from 0 (noise-like spectrum) to 1 (pure tone). This is used by computeVBRTarget for bit allocation decisions.

func (*Encoder) MDCTScratch

func (e *Encoder) MDCTScratch(samples []float64) []float64

MDCTScratch computes the MDCT using the encoder's pre-allocated scratch buffers. This is the zero-allocation equivalent of the public MDCT function. EnsureScratch must have been called with an appropriate frameSize first.

func (*Encoder) MDCTShortScratch

func (e *Encoder) MDCTShortScratch(samples []float64, shortBlocks int) []float64

MDCTShortScratch computes the short-block MDCT using scratch buffers. This is the zero-allocation equivalent of MDCTShort. EnsureScratch must have been called with an appropriate frameSize first.

func (*Encoder) NextRNG

func (e *Encoder) NextRNG() uint32

NextRNG advances the RNG and returns the new value. Uses the same LCG as libopus for deterministic behavior (D03-04-03).
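The linear congruential generator used by libopus (celt_lcg_rand) can be sketched as follows. The constants are those of libopus; the function name lcgRand is illustrative, and the assumption is that NextRNG applies the same update to the encoder's stored state.

```go
package main

import "fmt"

// lcgRand is an illustrative standalone function, not part of package celt.
// It applies the libopus LCG update; uint32 arithmetic wraps modulo 2^32.
func lcgRand(seed uint32) uint32 {
	return 1664525*seed + 1013904223
}

func main() {
	s := uint32(0)
	for i := 0; i < 3; i++ {
		s = lcgRand(s)
		fmt.Println(s)
	}
}
```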

func (*Encoder) NormalizeBands

func (e *Encoder) NormalizeBands(mdctCoeffs []float64, energies []float64, nbBands, frameSize int) [][]float64

NormalizeBands divides each band's MDCT coefficients by its energy, producing unit-norm shapes ready for PVQ quantization. Returns shapes[band] = normalized coefficients for that band.

The decoder does: output = shape * gain (denormalization)

So the encoder does: shape = input / gain (normalization)

Parameters:

  • mdctCoeffs: MDCT coefficients for all bands concatenated
  • energies: per-band energy values (log2 scale from coarse + fine energy)
  • nbBands: number of bands to process
  • frameSize: frame size in samples (120, 240, 480, 960)

Returns: shapes[band] = normalized float64 vector with unit L2 norm

Reference: RFC 6716 Section 4.3.4.1
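The shape/gain split for a single band can be sketched as below. This is a simplified illustration: normalizeBand is a hypothetical helper that derives the linear gain directly from the coefficients, whereas NormalizeBands takes per-band energies as input, and libopus adds a small epsilon instead of special-casing a zero gain.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeBand is a hypothetical helper, not part of package celt.
// It returns a unit-L2-norm shape and the linear gain such that
// coeffs[i] == shape[i] * gain.
func normalizeBand(coeffs []float64) (shape []float64, gain float64) {
	var sum float64
	for _, c := range coeffs {
		sum += c * c
	}
	gain = math.Sqrt(sum)
	shape = make([]float64, len(coeffs))
	if gain == 0 {
		return shape, 0 // silent band: leave the shape at zero
	}
	for i, c := range coeffs {
		shape[i] = c / gain
	}
	return shape, gain
}

func main() {
	shape, gain := normalizeBand([]float64{3, 4})
	fmt.Println(shape, gain)
}
```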

func (*Encoder) NormalizeBandsToArray

func (e *Encoder) NormalizeBandsToArray(mdctCoeffs []float64, energies []float64, nbBands, frameSize int) []float64

NormalizeBandsToArray normalizes bands into a single contiguous array (length = frameSize). This mirrors libopus normalise_bands(): divide by the per-band LINEAR amplitude.

CRITICAL FIX: This function now uses LINEAR band amplitudes computed directly from MDCT coefficients, NOT log-domain energies converted back to linear. The log-domain roundtrip was introducing quantization errors that corrupted PVQ encoding.

The energies parameter is now IGNORED - we compute linear amplitudes directly from mdctCoeffs. This matches libopus which calls compute_band_energies() to get linear bandE, then uses that directly in normalise_bands().

Reference: libopus celt/bands.c normalise_bands() (float path, lines 172-187)

func (*Encoder) NormalizeBandsToArrayMonoWithBandE

func (e *Encoder) NormalizeBandsToArrayMonoWithBandE(mdctCoeffs []float64, nbBands, frameSize int) (norm []float64, bandE []float64)

NormalizeBandsToArrayMonoWithBandE normalizes MDCT coefficients for mono and returns the normalized coefficients and linear band amplitudes.

func (*Encoder) NormalizeBandsToArrayStereoWithBandE

func (e *Encoder) NormalizeBandsToArrayStereoWithBandE(mdctLeft, mdctRight []float64, nbBands, frameSize int) (normL, normR, bandE []float64)

NormalizeBandsToArrayStereoWithBandE normalizes MDCT coefficients for stereo and returns normalized L/R coefficients plus combined linear band amplitudes. The bandE layout is [L bands][R bands].

func (*Encoder) OffsetsScratch

func (e *Encoder) OffsetsScratch(nbBands int) []int

OffsetsScratch returns a scratch offsets slice sized for nbBands.

func (*Encoder) OverlapBuffer

func (e *Encoder) OverlapBuffer() []float64

OverlapBuffer returns the overlap buffer for MDCT analysis. Size is Overlap * channels samples.

func (*Encoder) PacketLoss

func (e *Encoder) PacketLoss() int

PacketLoss returns the expected packet loss percentage.

func (*Encoder) PhaseInversionDisabled

func (e *Encoder) PhaseInversionDisabled() bool

PhaseInversionDisabled returns whether stereo phase inversion is disabled.

func (*Encoder) Prediction

func (e *Encoder) Prediction() int

Prediction returns the active CELT prediction mode (0, 1, or 2).

func (*Encoder) PreemphState

func (e *Encoder) PreemphState() []float64

PreemphState returns the pre-emphasis filter state. One value per channel.

func (*Encoder) PreparePVQDebugFrame

func (e *Encoder) PreparePVQDebugFrame(frame int)

PreparePVQDebugFrame resets per-call PVQ debug sequencing. This is used by temporary parity probes.

func (*Encoder) PrevBandLogEnergy

func (e *Encoder) PrevBandLogEnergy() []float64

PrevBandLogEnergy returns the previous frame's band log-energies. Used for spectral flux computation in tonality analysis.

func (*Encoder) PrevEnergy

func (e *Encoder) PrevEnergy() []float64

PrevEnergy returns the previous frame's band energies. Used for inter-frame energy prediction in coarse energy encoding. Layout: [band0_ch0, band1_ch0, ..., band20_ch0, band0_ch1, ..., band20_ch1]
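Given the channel-major layout above, the flat index of a band can be computed as in this sketch. The constant maxBands and the helper energyIndex are illustrative names; the value 21 is taken from the band0..band20 layout shown in the description.

```go
package main

import "fmt"

// maxBands is the number of bands stored per channel (band0..band20).
const maxBands = 21

// energyIndex is a hypothetical helper, not part of package celt.
// It maps (band, channel) to a position in the flat PrevEnergy slice.
func energyIndex(band, channel int) int {
	return channel*maxBands + band
}

func main() {
	fmt.Println(energyIndex(0, 0), energyIndex(20, 0), energyIndex(0, 1))
}
```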

func (*Encoder) PrevEnergy2

func (e *Encoder) PrevEnergy2() []float64

PrevEnergy2 returns the band energies from two frames ago. Used for anti-collapse detection.

func (*Encoder) QuantAllBandsEncodeScratch

func (e *Encoder) QuantAllBandsEncodeScratch(re *rangecoding.Encoder, channels, frameSize, lm int, start, end int,
	normL, normR []float64, pulses []int, shortBlocks int, spread int, tapset int, dualStereo int, intensity int,
	tfRes []int, totalBitsQ3 int, balance int, codedBands int, seed *uint32, complexity int, bandE []float64)

QuantAllBandsEncodeScratch encodes PVQ bands using the encoder's scratch buffers.

func (*Encoder) RNG

func (e *Encoder) RNG() uint32

RNG returns the current RNG state. After encoding, this contains the final range coder state for verification.

func (*Encoder) RangeEncoder

func (e *Encoder) RangeEncoder() *rangecoding.Encoder

RangeEncoder returns the current range encoder.

func (*Encoder) Reset

func (e *Encoder) Reset()

Reset clears encoder state for a new stream. Call this when starting to encode a new audio stream.

func (*Encoder) ResetTransientState

func (e *Encoder) ResetTransientState()

ResetTransientState clears the transient detection state. Call this when starting a new audio segment or after a discontinuity.

func (*Encoder) RoundFloat64ToFloat32

func (e *Encoder) RoundFloat64ToFloat32(x []float64)

RoundFloat64ToFloat32 rounds each element to float32 precision and back.
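The rounding itself amounts to a float32 round trip, as in the sketch below. The standalone function roundToFloat32 is illustrative; it assumes the method performs the same in-place conversion, which helps when matching the single-precision float path of libopus.

```go
package main

import "fmt"

// roundToFloat32 is an illustrative standalone function, not part of
// package celt. It rounds each element to float32 precision in place.
func roundToFloat32(x []float64) {
	for i, v := range x {
		x[i] = float64(float32(v))
	}
}

func main() {
	x := []float64{0.1, 0.5}
	roundToFloat32(x)
	fmt.Println(x[0] == 0.1, x[1] == 0.5)
}
```

Values that are exactly representable in float32, such as 0.5, are unchanged; others, such as 0.1, move to the nearest float32 value.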

func (*Encoder) SampleRate

func (e *Encoder) SampleRate() int

SampleRate returns the operating sample rate (always 48000 for CELT).

func (*Encoder) SetAnalysisBandwidth

func (e *Encoder) SetAnalysisBandwidth(bandwidth int, valid bool)

SetAnalysisBandwidth provides the analysis-derived bandwidth index (1..20) used by allocation gating in clt_compute_allocation().

func (*Encoder) SetAnalysisInfo

func (e *Encoder) SetAnalysisInfo(bandwidth int, leakBoost [leakBands]uint8, activity, tonalitySlope float64, maxPitchRatio float64, valid bool)

SetAnalysisInfo provides analysis-derived state from the top-level Opus analysis pipeline. This mirrors libopus use of AnalysisInfo in CELT dynalloc.

func (*Encoder) SetBandwidth

func (e *Encoder) SetBandwidth(bw CELTBandwidth)

SetBandwidth sets the CELT bandwidth cap used for band allocation.

func (*Encoder) SetBitrate

func (e *Encoder) SetBitrate(bps int)

SetBitrate sets the target bitrate in bits per second. This affects bit allocation for frame encoding.

func (*Encoder) SetCoarseDecisionHook

func (e *Encoder) SetCoarseDecisionHook(fn func(CoarseDecisionStats))

SetCoarseDecisionHook installs a callback that receives per-band coarse quantization decisions during EncodeCoarseEnergy.

func (*Encoder) SetCoarseEnergyAvailableBytes

func (e *Encoder) SetCoarseEnergyAvailableBytes(bytes int)

SetCoarseEnergyAvailableBytes overrides nbAvailableBytes used by coarse energy intra/decay logic. Use 0 to clear the override.

func (*Encoder) SetComplexity

func (e *Encoder) SetComplexity(complexity int)

SetComplexity sets encoder complexity (0-10). Higher values use more CPU for better quality.

func (*Encoder) SetConstrainedVBR

func (e *Encoder) SetConstrainedVBR(enabled bool)

SetConstrainedVBR enables or disables constrained VBR mode.

func (*Encoder) SetConstrainedVBRBoundScale

func (e *Encoder) SetConstrainedVBRBoundScale(scale float64)

SetConstrainedVBRBoundScale sets a scale for constrained-VBR vbr_bound. Valid range is [0, 1], where 1 matches libopus single-stream behavior.

func (*Encoder) SetDCRejectEnabled

func (e *Encoder) SetDCRejectEnabled(enabled bool)

SetDCRejectEnabled controls whether EncodeFrame applies dc_reject(). For Opus-level encoding, this should be false because dc_reject is already applied.

func (*Encoder) SetDelayCompensationEnabled

func (e *Encoder) SetDelayCompensationEnabled(enabled bool)

SetDelayCompensationEnabled controls whether EncodeFrame prepends Fs/250 lookahead history before CELT analysis/quantization.

func (*Encoder) SetEnergy

func (e *Encoder) SetEnergy(band, channel int, energy float64)

SetEnergy sets the energy for a specific band and channel.

func (*Encoder) SetEnergyMask

func (e *Encoder) SetEnergyMask(mask []float64)

SetEnergyMask sets per-band surround masking for CELT surround control. Expected sizes: 21 values for mono, 42 values for stereo. Invalid sizes clear the mask.

func (*Encoder) SetForceTransient

func (e *Encoder) SetForceTransient(force bool)

SetForceTransient forces short blocks for testing/debugging. When true, the encoder uses short blocks (transient mode) for the next frame regardless of transient analysis result.

func (*Encoder) SetFrameBitsForTest

func (e *Encoder) SetFrameBitsForTest(bits int)

SetFrameBitsForTest exposes frameBits for testing.

func (*Encoder) SetHybrid

func (e *Encoder) SetHybrid(hybrid bool)

SetHybrid sets the hybrid mode flag. When true, postfilter flag encoding is skipped per RFC 6716 Section 3.2.

Reference: libopus celt_encoder.c lines 2047-2048:

if(!hybrid && tell+16<=total_bits) ec_enc_bit_logp(enc, 0, 1);

func (*Encoder) SetLFE

func (e *Encoder) SetLFE(enabled bool)

SetLFE enables or disables LFE mode constraints.

func (*Encoder) SetLSBDepth

func (e *Encoder) SetLSBDepth(depth int)

SetLSBDepth sets the input signal LSB depth (8-24 bits). This affects masking/spread decisions at low bitrates.

func (*Encoder) SetLastCodedBands

func (e *Encoder) SetLastCodedBands(val int)

SetLastCodedBands updates the last coded band count.

func (*Encoder) SetLastTonality

func (e *Encoder) SetLastTonality(tonality float64)

SetLastTonality sets the tonality estimate (for testing or manual override). Valid range is [0, 1] where 0 = noise and 1 = pure tone.

func (*Encoder) SetMaxPayloadBytes

func (e *Encoder) SetMaxPayloadBytes(maxPayloadBytes int)

SetMaxPayloadBytes sets an optional payload cap for the next CELT frame. The value excludes the Opus TOC byte. A value <= 0 disables the cap.

func (*Encoder) SetOverlapBuffer

func (e *Encoder) SetOverlapBuffer(samples []float64)

SetOverlapBuffer copies the given samples to the overlap buffer.

func (*Encoder) SetPacketLoss

func (e *Encoder) SetPacketLoss(lossPercent int)

SetPacketLoss sets the expected packet loss percentage (0-100). This affects the prefilter gain for improved loss resilience.

func (*Encoder) SetPhaseInversionDisabled

func (e *Encoder) SetPhaseInversionDisabled(disabled bool)

SetPhaseInversionDisabled disables stereo phase inversion. When true, the encoder will not use phase inversion for stereo decorrelation. This can improve compatibility with some audio processing chains.

func (*Encoder) SetPrediction

func (e *Encoder) SetPrediction(mode int)

SetPrediction controls CELT inter-frame prediction behavior. Valid modes mirror libopus CELT_SET_PREDICTION:

  • 0: disable prediction and force intra (disable_pf=1, force_intra=1)
  • 1: disable prefilter only (disable_pf=1, force_intra=0)
  • 2: normal prediction (disable_pf=0, force_intra=0)
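The mode-to-flag mapping described for SetPrediction can be written out as a small table lookup. predictionFlags is a hypothetical helper used only to illustrate the mapping; how the encoder stores these flags internally is not shown in this documentation.

```go
package main

import "fmt"

// predictionFlags is a hypothetical helper, not part of package celt.
// It maps a CELT_SET_PREDICTION mode to the (disable_pf, force_intra)
// pair listed in the SetPrediction description. ok is false for
// modes outside 0..2.
func predictionFlags(mode int) (disablePF, forceIntra, ok bool) {
	switch mode {
	case 0:
		return true, true, true
	case 1:
		return true, false, true
	case 2:
		return false, false, true
	}
	return false, false, false
}

func main() {
	for mode := 0; mode <= 2; mode++ {
		pf, intra, _ := predictionFlags(mode)
		fmt.Println(mode, pf, intra)
	}
}
```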

func (*Encoder) SetPrefilterDebugHook

func (e *Encoder) SetPrefilterDebugHook(fn func(PrefilterDebugStats))

SetPrefilterDebugHook installs a callback that receives per-frame prefilter stats.

func (*Encoder) SetPrevEnergy

func (e *Encoder) SetPrevEnergy(energies []float64)

SetPrevEnergy shifts current prev to prev2 and sets new prev energies. This should be called after encoding a frame with the actual energies used.

func (*Encoder) SetPrevEnergyWithPrev

func (e *Encoder) SetPrevEnergyWithPrev(prev, energies []float64)

SetPrevEnergyWithPrev updates prevEnergy using the provided previous state. This avoids losing the prior frame when prevEnergy is updated during encoding.

func (*Encoder) SetRNG

func (e *Encoder) SetRNG(seed uint32)

SetRNG sets the RNG state.

func (*Encoder) SetRangeEncoder

func (e *Encoder) SetRangeEncoder(re *rangecoding.Encoder)

SetRangeEncoder sets the range encoder for the current frame. This must be called before encoding each frame.

func (*Encoder) SetSurroundTrim

func (e *Encoder) SetSurroundTrim(trim float64)

SetSurroundTrim sets the surround trim adjustment used by alloc_trim analysis. Positive values reduce alloc_trim (favoring higher bands), matching libopus.

func (*Encoder) SetTapsetDecision

func (e *Encoder) SetTapsetDecision(tapset int)

SetTapsetDecision sets the tapset decision value. Valid values are 0, 1, or 2.

func (*Encoder) SetTargetStatsHook

func (e *Encoder) SetTargetStatsHook(fn func(CeltTargetStats))

SetTargetStatsHook installs a callback that receives per-frame CELT VBR targets.

func (*Encoder) SetVBR

func (e *Encoder) SetVBR(enabled bool)

SetVBR enables or disables variable bitrate mode.

func (*Encoder) SignalBandwidthForAllocation

func (e *Encoder) SignalBandwidthForAllocation(nbBands, equivRate int) int

SignalBandwidthForAllocation mirrors libopus signal-bandwidth gating used by clt_compute_allocation(). It combines analysis bandwidth with the equivalent bitrate-derived minimum bandwidth floor.

func (*Encoder) SpreadingDecision

func (e *Encoder) SpreadingDecision(normX []float64, nbBands, channels, frameSize int, updateHF bool) int

SpreadingDecision analyzes the normalized MDCT coefficients to decide the optimal spread parameter for PVQ quantization.

The spread parameter controls how pulses are distributed across the band:

  • SPREAD_AGGRESSIVE (3): More spreading, better for tonal signals
  • SPREAD_NORMAL (2): Default spreading
  • SPREAD_LIGHT (1): Less spreading
  • SPREAD_NONE (0): No spreading, for very noisy signals

The algorithm counts how many coefficients fall below certain thresholds relative to the band energy. Tonal signals have energy concentrated in few bins (low counts), while noisy signals have energy spread across many bins (high counts).

Parameters:

  • normX: normalized MDCT coefficients (unit-norm per band)
  • nbBands: number of bands to analyze
  • channels: number of audio channels (1 or 2)
  • frameSize: frame size in samples (determines M scaling)
  • updateHF: whether to update high-frequency average for tapset decision

Returns: spread decision (0=SPREAD_NONE, 1=SPREAD_LIGHT, 2=SPREAD_NORMAL, 3=SPREAD_AGGRESSIVE)

Reference: libopus celt/bands.c spreading_decision()

func (*Encoder) SpreadingDecisionWithWeights

func (e *Encoder) SpreadingDecisionWithWeights(normX []float64, nbBands, channels, frameSize int, updateHF bool, spreadWeight []int) int

SpreadingDecisionWithWeights analyzes the normalized MDCT coefficients to decide the optimal spread parameter for PVQ quantization, using precomputed spread weights.

The spread parameter controls how pulses are distributed across the band:

  • SPREAD_AGGRESSIVE (3): More spreading, better for tonal signals
  • SPREAD_NORMAL (2): Default spreading
  • SPREAD_LIGHT (1): Less spreading
  • SPREAD_NONE (0): No spreading, for very noisy signals

Parameters:

  • normX: normalized MDCT coefficients (unit-norm per band)
  • nbBands: number of bands to analyze
  • channels: number of audio channels (1 or 2)
  • frameSize: frame size in samples (determines M scaling)
  • updateHF: whether to update high-frequency average for tapset decision
  • spreadWeight: per-band weights from ComputeSpreadWeights

Returns: spread decision (0=SPREAD_NONE, 1=SPREAD_LIGHT, 2=SPREAD_NORMAL, 3=SPREAD_AGGRESSIVE)

Reference: libopus celt/bands.c spreading_decision()

func (*Encoder) StabilizeEnergiesBeforeCoarseHybrid

func (e *Encoder) StabilizeEnergiesBeforeCoarseHybrid(energies []float64, start, end, nbBands int)

StabilizeEnergiesBeforeCoarseHybrid mirrors libopus pre-coarse stabilization: if abs(bandLogE-oldBandE) < 2, bias current energy toward previous quant error.

func (*Encoder) SurroundTrim

func (e *Encoder) SurroundTrim() float64

SurroundTrim returns the current surround trim adjustment.

func (*Encoder) TFAnalysisHybridScratch

func (e *Encoder) TFAnalysisHybridScratch(norm []float64, nbBands int, transient bool, lm int, tfEstimate float64, effectiveBytes int, importance []int) ([]int, int)

TFAnalysisHybridScratch runs TF analysis using the encoder's scratch buffers.

func (*Encoder) TFResScratch

func (e *Encoder) TFResScratch(nbBands int) []int

TFResScratch returns a scratch TF resolution slice sized for nbBands.

func (*Encoder) TapsetDecision

func (e *Encoder) TapsetDecision() int

TapsetDecision returns the current tapset decision (0, 1, or 2). The tapset controls the window taper used in the prefilter/postfilter comb filter:

  • 0: Narrow taper (concentrated energy)
  • 1: Medium taper (balanced)
  • 2: Wide taper (spread energy)

This is computed during SpreadingDecision when updateHF=true.

Reference: libopus celt/bands.c spreading_decision() and celt/celt.c comb_filter()

func (*Encoder) TestEncodeLaplace

func (e *Encoder) TestEncodeLaplace(val, fs, decay int) int

TestEncodeLaplace exposes encodeLaplace for testing.

func (*Encoder) TransientAnalysis

func (e *Encoder) TransientAnalysis(pcm []float64, frameSize int, allowWeakTransients bool) TransientAnalysisResult

TransientAnalysis performs full transient analysis matching libopus. This computes:

  • Whether the frame is transient (should use short blocks)
  • tf_estimate: bias for TF resolution analysis (0 = time, 1 = freq)
  • tf_chan: which channel has the strongest transient

The algorithm uses a high-pass filter followed by forward/backward masking to detect temporal energy variations. The mask_metric measures how much the signal energy varies over time relative to a masked threshold.

Parameters:

  • pcm: input PCM samples (mono or interleaved stereo)
  • frameSize: frame size in samples (120, 240, 480, or 960)
  • allowWeakTransients: for hybrid mode at low bitrate

Returns: TransientAnalysisResult with all metrics

Reference: libopus celt/celt_encoder.c transient_analysis()

func (*Encoder) TransientAnalysisHybrid

func (e *Encoder) TransientAnalysisHybrid(preemph []float64, frameSize, nbBands, lm int, allowWeakTransients bool) (transient bool, weakTransient bool, tfEstimate, toneFreq, toneishness float64, shortBlocks int, bandLogE2 []float64)

TransientAnalysisHybrid performs transient analysis and updates preemph overlap state. Returns transient flags, tf/tone metrics, shortBlocks choice, and optional bandLogE2.

func (*Encoder) TransientAnalysisWithState

func (e *Encoder) TransientAnalysisWithState(pcm []float64, frameSize int, allowWeakTransients bool) TransientAnalysisResult

TransientAnalysisWithState performs enhanced transient analysis using persistent state. This improves detection of percussive sounds by:

  1. Using persistent HP filter state across frames for better attack detection
  2. Tracking attack duration for multi-frame transient handling
  3. Applying hysteresis to prevent rapid toggling
  4. Adaptive thresholding based on signal level

This function updates the encoder's transient state and should be used when encoding sequences of frames for optimal percussive sound quality.

Parameters:

  • pcm: input PCM samples (mono or interleaved stereo, pre-emphasized)
  • frameSize: frame size in samples (total including overlap)
  • allowWeakTransients: for hybrid mode at low bitrate

Returns: TransientAnalysisResult with all metrics

Reference: libopus celt/celt_encoder.c transient_analysis() with state persistence

func (*Encoder) UpdateConsecTransient

func (e *Encoder) UpdateConsecTransient(transient bool)

UpdateConsecTransient updates the consecutive transient counter.

func (*Encoder) UpdateConsecTransientWithDisabled

func (e *Encoder) UpdateConsecTransientWithDisabled(transient bool, transientGotDisabled bool)

UpdateConsecTransientWithDisabled mirrors libopus consec_transient state cadence when transients are disabled by bit budget.

func (*Encoder) UpdateEnergyErrorHybrid

func (e *Encoder) UpdateEnergyErrorHybrid(energies, quantizedEnergies []float64, start, end, nbBands int)

UpdateEnergyErrorHybrid mirrors libopus energyError cadence in hybrid mode: clear all bands, then store clipped post-finalise residuals for coded bands.

func (*Encoder) UpdateEnergyErrorHybridFromError

func (e *Encoder) UpdateEnergyErrorHybridFromError(start, end, nbBands int)

UpdateEnergyErrorHybridFromError mirrors libopus hybrid cadence exactly: clear all bands, then store clipped post-finalise residual error[] values for coded bands. Residuals come from scratch.coarseError updated by coarse/fine/final.

func (*Encoder) UpdateHybridPrefilterHistory

func (e *Encoder) UpdateHybridPrefilterHistory(preemph []float64, frameSize int)

UpdateHybridPrefilterHistory mirrors the run_prefilter() state updates used by libopus hybrid mode when prefilter signaling is disabled.

func (*Encoder) UpdateTonalityAnalysisHybrid

func (e *Encoder) UpdateTonalityAnalysisHybrid(normCoeffs, energies []float64, nbBands, frameSize int)

UpdateTonalityAnalysisHybrid updates tonality metrics for VBR decisions.

func (*Encoder) VBR

func (e *Encoder) VBR() bool

VBR reports whether variable bitrate mode is enabled.

type KissCpx

type KissCpx = kissCpx

KissCpx is an exported alias of the internal Kiss FFT complex scratch type. It allows callers to provide reusable scratch buffers and avoid per-call allocations in hot paths.

type KissFFT64State

type KissFFT64State struct {
	// contains filtered or unexported fields
}

KissFFT64State holds the precomputed state for mixed-radix FFT (float64). Supports sizes that factor into 2, 3, 4, and 5. This is based on libopus kiss_fft implementation optimized for CELT.
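A size factors into 2, 3, 4, and 5 exactly when its only prime factors are 2, 3, and 5. The sketch below checks that property; supportedFFTSize is a hypothetical helper, and the real state may impose further limits (for example on the number of factors) that are not documented here.

```go
package main

import "fmt"

// supportedFFTSize is a hypothetical helper, not part of package celt.
// It reports whether n can be written as a product of radices
// 2, 3, 4, and 5, i.e. whether n has no prime factor other than 2, 3, 5.
func supportedFFTSize(n int) bool {
	if n < 1 {
		return false
	}
	for _, p := range []int{2, 3, 5} {
		for n%p == 0 {
			n /= p
		}
	}
	return n == 1
}

func main() {
	for _, n := range []int{60, 120, 240, 480, 7} {
		fmt.Println(n, supportedFFTSize(n))
	}
}
```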

func GetKissFFT64State

func GetKissFFT64State(nfft int) *KissFFT64State

GetKissFFT64State returns a cached or newly created FFT state for the given size.

func (*KissFFT64State) KissFFT

func (s *KissFFT64State) KissFFT(fin, fout []complex128)

KissFFT performs the forward FFT.

func (*KissFFT64State) KissIFFT

func (s *KissFFT64State) KissIFFT(fin, fout []complex128)

KissIFFT performs the inverse FFT.

type ModeConfig

type ModeConfig struct {
	FrameSize   int // Samples at 48kHz: 120, 240, 480, 960
	ShortBlocks int // Number of short MDCTs if transient: 1, 2, 4, 8
	LM          int // Log mode index: 0, 1, 2, 3
	EffBands    int // Effective number of bands for this frame size
	MDCTSize    int // MDCT window size for long blocks
}

ModeConfig contains frame-size-dependent configuration for CELT decoding. Parameters vary based on frame duration (2.5ms to 20ms).

func GetModeConfig

func GetModeConfig(frameSize int) ModeConfig

GetModeConfig returns the mode configuration for the given frame size. Valid frame sizes are 120, 240, 480, and 960 samples at 48kHz.
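The relationship between the frame size, LM, and ShortBlocks values listed in the ModeConfig field comments can be sketched as follows. lmForFrameSize is a hypothetical helper; it assumes frameSize = 120 << LM and ShortBlocks = 1 << LM, which is consistent with the listed values but is not confirmed as the package's implementation.

```go
package main

import "fmt"

// lmForFrameSize is a hypothetical helper, not part of package celt.
// It derives LM and the short-block count from a 48 kHz frame size.
// ok is false for sizes other than 120, 240, 480, and 960.
func lmForFrameSize(frameSize int) (lm, shortBlocks int, ok bool) {
	for lm = 0; lm <= 3; lm++ {
		if 120<<uint(lm) == frameSize {
			return lm, 1 << uint(lm), true
		}
	}
	return 0, 0, false
}

func main() {
	for _, fs := range []int{120, 240, 480, 960} {
		lm, sb, _ := lmForFrameSize(fs)
		fmt.Println(fs, lm, sb)
	}
}
```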

type PrefilterDebugStats

type PrefilterDebugStats struct {
	Frame          int
	Enabled        bool
	UsedTonePath   bool
	UsedPitchPath  bool
	TFEstimate     float64
	NBBytes        int
	ToneFreq       float64
	Toneishness    float64
	MaxPitchRatio  float64
	PitchSearchOut int
	PitchBeforeRD  int
	PitchAfterRD   int
	PFOn           bool
	QG             int
	Gain           float64
}

PrefilterDebugStats captures per-frame prefilter diagnostics.

type QuantCoarseEnergyParams

type QuantCoarseEnergyParams struct {
	// Start is the first band to encode (typically 0, or 17 for hybrid).
	Start int
	// End is the last band to encode (exclusive).
	End int
	// EffEnd is the effective end for distortion computation.
	EffEnd int
	// LM is the log mode (0=2.5ms, 1=5ms, 2=10ms, 3=20ms).
	LM int
	// Channels is the number of audio channels (1 or 2).
	Channels int
	// Budget is the total bit budget for encoding.
	Budget int
	// NBAvailableBytes is the number of bytes available for encoding.
	NBAvailableBytes int
	// ForceIntra forces intra mode regardless of analysis.
	ForceIntra bool
	// TwoPass enables two-pass encoding comparing intra vs inter.
	TwoPass bool
	// LossRate is the packet loss rate (0-100).
	LossRate int
	// LFE indicates low-frequency effects mode.
	LFE bool
}

QuantCoarseEnergyParams holds parameters for coarse energy quantization.

type QuantCoarseEnergyResult

type QuantCoarseEnergyResult struct {
	// QuantizedEnergy is the quantized energy per band per channel.
	// Layout: [ch0_band0, ch0_band1, ..., ch1_band0, ch1_band1, ...]
	QuantizedEnergy []float64

	// Error is the quantization error per band per channel (for fine energy).
	// error[i] = original - quantized (in DB6 units)
	Error []float64

	// Intra indicates whether intra mode was used.
	Intra bool
}

QuantCoarseEnergyResult holds the result of coarse energy quantization.

func QuantCoarseEnergy

func QuantCoarseEnergy(
	re *rangecoding.Encoder,
	eBands []float64,
	oldEBands []float64,
	params QuantCoarseEnergyParams,
	delayedIntra *float32,
) QuantCoarseEnergyResult

QuantCoarseEnergy quantizes coarse band energies with two-pass comparison. This is the main entry point matching libopus quant_coarse_energy().

The algorithm:

  1. Optionally encode with intra mode (no inter-frame prediction)
  2. Encode with inter mode (using inter-frame prediction)
  3. Compare results and pick the better one (when two_pass is enabled)

Reference: libopus celt/quant_bands.c quant_coarse_energy()

type StereoMode

type StereoMode int

StereoMode specifies the stereo coding mode for a band.

const (
	// StereoMidSide uses mid-side encoding with theta rotation.
	// Good for correlated stereo content (most music).
	StereoMidSide StereoMode = iota

	// StereoIntensity uses mono with optional sign inversion.
	// Used for high frequency bands to save bits.
	StereoIntensity

	// StereoDual encodes left and right independently.
	// Used when channels are uncorrelated.
	StereoDual
)

func GetStereoMode

func GetStereoMode(band, intensityBand int, dualStereo bool) StereoMode

GetStereoMode determines the stereo mode for a band. The mode depends on:

  • band index relative to intensity stereo start band
  • whether dual stereo mode is enabled
  • bit allocation for the band

Parameters:

  • band: band index (0 to nbBands-1)
  • intensityBand: band where intensity stereo starts (-1 if not used)
  • dualStereo: true if dual stereo mode is enabled

Returns: the stereo mode to use for this band
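
A minimal sketch of how such a selection could look, using local lowercase names so as not to suggest this is the package's code. The precedence shown (intensity stereo from intensityBand upward, then dual stereo, then mid-side) is an assumption, and the sketch ignores bit allocation entirely.

```go
package main

import "fmt"

// stereoMode mirrors the three documented modes for illustration only.
type stereoMode int

const (
	stereoMidSide stereoMode = iota
	stereoIntensity
	stereoDual
)

// getStereoMode is a hypothetical selection rule.
// Assumption: bands at or above intensityBand use intensity stereo,
// remaining bands use dual stereo when enabled, otherwise mid-side.
func getStereoMode(band, intensityBand int, dualStereo bool) stereoMode {
	if intensityBand >= 0 && band >= intensityBand {
		return stereoIntensity
	}
	if dualStereo {
		return stereoDual
	}
	return stereoMidSide
}

func main() {
	for band := 0; band < 4; band++ {
		fmt.Printf("band=%d mode=%d\n", band, getStereoMode(band, 2, false))
	}
}
```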

func (StereoMode) String

func (sm StereoMode) String() string

String returns the string representation of the stereo mode.

type TFAnalysisScratch

type TFAnalysisScratch struct {
	Metric []int     // Per-band metric (size: nbEBands)
	Tmp    []float64 // Band coefficients working buffer
	Tmp1   []float64 // Copy for transient analysis
	Path0  []int     // Viterbi path state 0
	Path1  []int     // Viterbi path state 1
	TfRes  []int     // Output buffer
}

TFAnalysisScratch holds pre-allocated buffers for TF analysis.

func (*TFAnalysisScratch) EnsureTFAnalysisScratch

func (s *TFAnalysisScratch) EnsureTFAnalysisScratch(nbEBands, maxBandWidth int)

EnsureTFAnalysisScratch ensures scratch buffers are large enough.

type TonalityAnalysisResult

type TonalityAnalysisResult struct {
	Tonality     float64   // Overall tonality (0=noise, 1=pure tone)
	SFM          float64   // Spectral Flatness Measure (0=tonal, 1=flat/noise)
	BandTonality []float64 // Per-band tonality estimates
	SpectralFlux float64   // Frame-to-frame spectral change (0=stationary, higher=transient)
}

TonalityAnalysisResult holds the results of tonality analysis. Tonality measures how "tonal" (pitched/harmonic) vs "noisy" (aperiodic) a signal is. This information is used by the VBR algorithm to allocate more bits to tonal signals which benefit more from accurate spectral representation.

func ComputeTonality

func ComputeTonality(mdctCoeffs []float64, args ...interface{}) TonalityAnalysisResult

ComputeTonality analyzes MDCT coefficients to estimate signal tonality. Uses the Spectral Flatness Measure (SFM) which compares geometric mean to arithmetic mean. A flat spectrum (noise) has SFM close to 1, while a peaked spectrum (tone) has SFM close to 0. Tonality is computed as 1 - SFM, so tones have high tonality.

This is a variadic function that supports two call patterns:

  • ComputeTonality(mdctCoeffs, nbBands, frameSize): explicit band configuration
  • ComputeTonality(mdctCoeffs, prevCoeffs): with previous frame for flux (legacy)

Parameters:

  • mdctCoeffs: MDCT coefficients for one channel
  • args: either (nbBands int, frameSize int) or (prevCoeffs []float64)

Returns: TonalityAnalysisResult with overall and per-band tonality

Reference: ITU-R BS.1387 (PEAQ) for SFM definition
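
The SFM relation described above (geometric mean over arithmetic mean of the power spectrum, with tonality = 1 - SFM) can be sketched as follows. spectralFlatness is a hypothetical helper, and the small floor added to each power value is an assumption to keep the logarithm finite; the package may handle silence and banding differently.

```go
package main

import (
	"fmt"
	"math"
)

// spectralFlatness returns geometricMean(power) / arithmeticMean(power)
// for the squared coefficients. Hypothetical helper for illustration.
func spectralFlatness(coeffs []float64) float64 {
	const eps = 1e-12 // assumed floor so that log(0) never occurs
	if len(coeffs) == 0 {
		return 0 // convention chosen for this sketch only
	}
	var sumLog, sum float64
	for _, c := range coeffs {
		p := c*c + eps
		sumLog += math.Log(p)
		sum += p
	}
	n := float64(len(coeffs))
	return math.Exp(sumLog/n) / (sum / n)
}

func main() {
	flat := []float64{1, 1, 1, 1}   // noise-like: equal power in every bin
	peaked := []float64{1, 0, 0, 0} // tone-like: all power in one bin

	for _, spec := range [][]float64{flat, peaked} {
		sfm := spectralFlatness(spec)
		fmt.Printf("sfm=%.6f tonality=%.6f\n", sfm, 1-sfm)
	}
}
```

A flat spectrum yields an SFM near 1 (tonality near 0), while a single dominant bin yields an SFM near 0 (tonality near 1).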

func ComputeTonalityFromNormalized

func ComputeTonalityFromNormalized(normCoeffs []float64, nbBands, frameSize int) TonalityAnalysisResult

ComputeTonalityFromNormalized computes tonality from pre-normalized MDCT coefficients. This is useful when normalization has already been done (as in encode_frame.go).

Parameters:

  • normCoeffs: normalized MDCT coefficients (unit energy per band)
  • nbBands: number of frequency bands
  • frameSize: frame size for scaling band boundaries

Returns: TonalityAnalysisResult

func ComputeTonalityWithBands

func ComputeTonalityWithBands(mdctCoeffs []float64, nbBands, frameSize int) TonalityAnalysisResult

ComputeTonalityWithBands analyzes MDCT coefficients with explicit band count. This is the more precise version that takes explicit nbBands and frameSize.

Parameters:

  • mdctCoeffs: MDCT coefficients for one channel
  • nbBands: number of frequency bands to analyze
  • frameSize: frame size in samples (used to scale band boundaries)

Returns: TonalityAnalysisResult with overall and per-band tonality

func ComputeTonalityWithBandsScratch

func ComputeTonalityWithBandsScratch(mdctCoeffs []float64, nbBands, frameSize int, scratch *TonalityScratch) TonalityAnalysisResult

ComputeTonalityWithBandsScratch analyzes MDCT coefficients with explicit band count using pre-allocated scratch buffers. This is the zero-allocation version.

type TonalityScratch

type TonalityScratch struct {
	Powers       []float64 // Power spectrum buffer (size: frameSize)
	BandTonality []float64 // Per-band tonality output (size: nbBands)
	BandPowers   []float64 // Temporary buffer for per-band power (size: max band width ~176)
}

TonalityScratch holds pre-allocated buffers for tonality analysis. This eliminates allocations in the hot path.

func (*TonalityScratch) EnsureTonalityScratch

func (s *TonalityScratch) EnsureTonalityScratch(frameSize, nbBands int)

EnsureTonalityScratch ensures the scratch buffers are large enough.
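
The grow-on-demand pattern behind such scratch buffers can be sketched as follows. The type and helper names are hypothetical, and the sketch covers only two of the documented buffers; the real method may size or clear its buffers differently.

```go
package main

import "fmt"

// tonalityScratch is a local stand-in with two of the documented buffers.
type tonalityScratch struct {
	powers       []float64
	bandTonality []float64
}

// ensureLen returns a slice of length n, reusing buf's backing array when
// its capacity allows. Reused memory is not cleared, so callers must
// overwrite every element before reading it.
func ensureLen(buf []float64, n int) []float64 {
	if cap(buf) < n {
		return make([]float64, n)
	}
	return buf[:n]
}

// ensure sizes the buffers for one frame (hypothetical method).
func (s *tonalityScratch) ensure(frameSize, nbBands int) {
	s.powers = ensureLen(s.powers, frameSize)
	s.bandTonality = ensureLen(s.bandTonality, nbBands)
}

func main() {
	var s tonalityScratch
	s.ensure(960, 21) // first call allocates
	s.ensure(480, 21) // smaller frame reuses the same backing arrays
	fmt.Println(len(s.powers), cap(s.powers) >= 960, len(s.bandTonality))
}
```

Keeping one scratch value per encoder and calling the ensure step before each frame means allocation happens only when a larger frame size is first seen.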

type TransientAnalysisResult

type TransientAnalysisResult struct {
	IsTransient   bool    // Whether a transient was detected
	TfEstimate    float64 // Time-frequency estimate (0.0 = time, 1.0 = freq) for TF analysis bias
	TfChannel     int     // Which channel had the strongest transient (0 or 1)
	MaskMetric    float64 // The raw mask metric value (for debugging)
	WeakTransient bool    // Whether this is a "weak" transient (for hybrid mode)
	ToneFreq      float64 // Detected tone frequency in radians/sample (-1 if no tone)
	Toneishness   float64 // How "pure" the tone is (0.0-1.0, higher = purer)
}

TransientAnalysisResult holds the results of transient analysis. This provides both the transient decision and the tf_estimate metric.
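
Because ToneFreq is expressed in radians per sample, converting it to Hz needs the sample rate: f = ω · fs / (2π). The helper below is a hypothetical illustration that also passes the documented "no tone" sentinel (-1) through unchanged; the 48 kHz rate is an assumption of the example.

```go
package main

import (
	"fmt"
	"math"
)

// toneFreqHz converts an angular frequency in radians/sample to Hz.
// A negative input is treated as the "no tone" sentinel and returned as -1.
func toneFreqHz(radPerSample, sampleRate float64) float64 {
	if radPerSample < 0 {
		return -1
	}
	return radPerSample * sampleRate / (2 * math.Pi)
}

func main() {
	// pi/2 radians per sample is a quarter of the sample rate.
	fmt.Printf("%.1f Hz\n", toneFreqHz(math.Pi/2, 48000))
	fmt.Println(toneFreqHz(-1, 48000))
}
```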
