Waveform Generator — `wfmgen`¶

doppler ships a C-first waveform generator: one declarative synth engine (every algorithm in C, exactly once), exposed two ways —

wfmgen — the one command-line tool. A single waveform or a multi-segment scene, the raw / CSV / BLUE / SigMF containers, and streaming to ZMQ. (A one-segment run is the simple single-waveform case.)
doppler.wfm — the same engine as a Python API, one import path: from doppler.wfm import ….

Reach for --from-file (or the Python Composer) when you need multiple segments, mixing, BLUE/SigMF, or a ZMQ stream — otherwise the flags below generate a single waveform.

The 30-second version

wfmgen --type qpsk --snr 12 --count 100000 -o capture.cf32   # 100k QPSK samples @ 12 dB Es/No
wfmgen --type tone --freq 1e5 --count 4096                    # a tone → stdout (cf32)
wfmgen --type pn --pn_length 9 --file_type csv -o pn.csv      # length-9 MLS as text

Installation¶

pip install doppler-dsp        # → the `wfmgen` command + the doppler.wfm API

The wheel ships the self-contained wfmgen binary as package data and a wfmgen console-script — a thin shim that execs it — alongside the doppler.wfm Python module. To build from source instead:

git clone https://github.com/doppler-dsp/doppler && cd doppler
cmake -B build -DBUILD_PYTHON=ON && cmake --build build --target wfmgen_cli
# binary: build/native/src/wfmcompose/wfmgen

Waveform types¶

--type selects the waveform; every type shares the same parameter set.

`--type`	What it is	Key parameters
`tone`	a complex sinusoid at `--freq`	`--freq`
`noise`	complex AWGN (unit power)	`--snr` (ignored — it is noise)
`pn`	a maximum-length sequence (±1 chips), `--sps` samples/chip	`--pn_length`, `--pn_poly`, `--sps`
`bpsk`	BPSK symbols (PN-sourced data), `--sps` samples/symbol	`--sps`, `--snr`
`qpsk`	Gray-coded QPSK symbols (PN-sourced data)	`--sps`, `--snr`
`chirp`	linear-FM sweep `--freq` → `--f_end` over `--count`	`--freq`, `--f_end`
`bits`	a user bit pattern, oversampled `--sps` and cycled	`--bits`, `--modulation`, `--sps`

The data bits for bpsk/qpsk come from a deterministic PN sequence (seeded by --seed), so output is reproducible and receiver-correlatable.

A chirp sweeps its instantaneous frequency linearly from --freq (the start) to --f_end over the --count samples, then holds at --f_end; --f_end < --freq is a down-chirp. The phase is continuous across segments, so concatenated chirps join seamlessly (radar pulse compression, SAR, sonar).

A bits waveform instead plays back your sequence (a preamble, sync word, or test vector), given as a 0/1 string (--bits 10110101), a hex string (--bits-hex AA55, MSB first), or a file (--bits-file pattern.txt); --modulation (none/bpsk/qpsk) maps the bits to symbols.

The PSK carriers (pn/bpsk/qpsk) default to rectangular sample-and-hold chips (a wide sinc² spectrum); add --pulse rrc for root-raised-cosine shaping to get a band-limited carrier (e.g. a WCDMA QPSK downlink at roll-off 0.22) straight from the generator.

wfmgen --type chirp --freq 100e3 --f_end 300e3 --fs 1e6 --count 10000 -o chirp.cf32
wfmgen --type bits --bits 10110101 --modulation bpsk --sps 8 --count 64 -o sync.cf32
wfmgen --type bits --bits-hex AA55 --modulation none --sps 4 -o preamble.cf32
wfmgen --type qpsk --sps 8 --pulse rrc --rrc-beta 0.22 --count 100000 -o wcdma.cf32
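To make the linear-FM law above concrete, here is a minimal plain-Python sketch that accumulates phase from an instantaneous frequency ramping from `freq` to `f_end`. It is an illustrative model, not the engine's C code: `chirp_samples` is a hypothetical helper, and the choice to reach `f_end` exactly at the last sample is an assumption.

```python
import cmath
import math

def chirp_samples(freq, f_end, fs, count):
    """Unit-magnitude linear-FM sweep (hypothetical helper, not doppler API)."""
    samples = []
    phase = 0.0
    for n in range(count):
        samples.append(cmath.exp(1j * phase))
        # Instantaneous frequency ramps linearly across the sweep (assumed endpoint rule).
        f_inst = freq + (f_end - freq) * n / max(count - 1, 1)
        # Accumulating phase keeps it continuous from sample to sample.
        phase += 2.0 * math.pi * f_inst / fs
    return samples

fs = 1e6
z = chirp_samples(100e3, 300e3, fs, 10000)

# The phase step between neighbours recovers the instantaneous frequency in Hz.
step_hz = [cmath.phase(b * a.conjugate()) * fs / (2.0 * math.pi)
           for a, b in zip(z, z[1:])]
```

Because the phase is accumulated rather than recomputed per sample, a second sweep started from the final phase would join without a discontinuity, which is the property the text relies on for concatenated chirps.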

Parameter reference¶

Engine¶

Flag	Type	Default	Meaning
`--type`	`tone noise pn bpsk qpsk chirp bits`	`tone`	waveform
`--fs`	float (Hz)	`1e6`	sample rate
`--freq`	float (Hz)	`0`	frequency offset from baseband (mixed by the LO); chirp start
`--f_end`	float (Hz)	`0`	chirp end frequency (`--type chirp` only)
`--snr`	float (dB)	`100`	SNR; metric chosen by `--snr_mode` (`≥ 100` is clean: no AWGN is added)
`--snr_mode`	`auto fs ebno esno`	`auto`	how `--snr` is interpreted (see below)
`--seed`	uint32	`1`	PRNG / LFSR seed (deterministic)
`--sps`	int	`8`	samples per symbol (`*psk`/`bits`) / per chip (`pn`)
`--pn_length`	int (2..64)	`7`	LFSR register length → period `2ⁿ−1`
`--pn_poly`	uint64	`0`	LFSR polynomial; `0` ⇒ auto-pick the MLS polynomial
`--lfsr`	`galois fibonacci`	`galois`	LFSR realization (same polynomial/period, different sequence)
`--bits`	0/1 string	—	`bits`: pattern, e.g. `10110101` (or `--bits-hex`/`--bits-file`)
`--modulation`	`none bpsk qpsk`	`bpsk`	`bits`: how the pattern maps to symbols
`--pulse`	`rect rrc`	`rect`	pn/bpsk/qpsk pulse shape; `rrc` = band-limited RRC shaping
`--rrc-beta`	float	`0.35`	RRC roll-off (`--pulse rrc`)
`--rrc-span`	int	`8`	RRC filter support in symbols (`--pulse rrc`)
`--count`	int	`1024`	number of complex samples to generate

Output¶

Flag	Values	Default	Meaning
`--sample_type`	`cf32 cf64 ci32 ci16 ci8`	`cf32`	wire type; integers are full-scale ±1.0
`--file_type`	`raw csv blue sigmf`	`raw`	container (see Containers)
`--endian`	`le be`	`le`	byte order (raw/BLUE only; csv is text)
`--output` / `-o`	path (or `zmq://…`)	stdout	sink
`--record`	path	—	write a JSON record of the resolved run

Composer (multi-segment, `--from-file`)¶

Flag	Meaning
`--from-file SPEC.json`	run a multi-segment spec (see Multi-segment)
`json-template [FILE]`	subcommand: dump an editable example spec (to `FILE`, else stdout) — see Multi-segment
`--level DB`	source level in dBFS (≤0); scales the segment by `10^(DB/20)` (SNR-invariant; default 0)
`--headroom DB`	back the output off to `−DB` dBFS so peaks fit (SNR-invariant; default 0)
`--clip-report`	print the clipped fraction + peak; `--clip-error` exits non-zero on a clip
`--fc HZ`	capture center frequency, written into BLUE/SigMF metadata
`--off N`	trailing off-time (zeros) after the segment
`--repeat`	loop the whole sequence
`--continuous`	never stop (implies repeat) — for streaming
`--detached`	BLUE only: write `<out>.hdr` (HCB) + `<out>.det` (data)
`--realtime`	pace the output to `fs`, mimicking a sample clock (see Real-time pacing)
`--realtime-resync`	like `--realtime`, but re-anchor to "now" on each underrun

Amplitude & full-scale¶

The amplitude invariant is unit average power: every waveform is normalised so its mean power is 1.0. That — not a constant envelope — is what the rest of the system is built on. It is the reference the SNR math uses (signal power ≡ 1, so the noise σ falls straight out of the target SNR; see SNR & noise), and the level you control is the SNR, not a signal gain. The I/Q full-scale is ±1.0 per axis (→ the largest integer code).

With the default rectangular pulse, today's deterministic waveforms are all constant-envelope, so for them the peak equals the average and they sit exactly at ±1.0 (noise, the last row, is the exception). That is a property of the current set, not a design assumption:

`--type`	Sample values	Envelope	Avg. power
`tone`	`exp(j·2πft)`	constant, mag 1	`1.0`
`bpsk` / `pn`	`±1` (real axis)	constant, mag 1	`1.0`
`qpsk`	`(±1/√2, ±1/√2)`	constant, mag 1	`1.0`
`chirp`	`exp(j·φ(t))`, φ′ ramps `freq→f_end`	constant, mag 1	`1.0`
`noise`	complex Gaussian, `σ = 1/√2` per axis	Gaussian, PAPR > 0 dB	`1.0`

Don't rely on |z| = 1. A pulse-shaped (RRC), QAM, or OFDM waveform has a peak-to-average power ratio (PAPR) above 0 dB: at unit average power its peaks run well past ±1.0. noise, and any signal-plus-noise sum, already do.
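The difference between a constant-envelope signal and noise can be checked numerically. The sketch below computes PAPR in dB for a synthetic QPSK constellation and for unit-power complex Gaussian noise; it uses only the standard library, and `papr_db` is a hypothetical helper rather than anything exported by doppler.

```python
import math
import random

def papr_db(samples):
    """Peak-to-average power ratio in dB (hypothetical helper)."""
    powers = [abs(s) ** 2 for s in samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))

rng = random.Random(1)

# Constant-envelope QPSK: every point has magnitude 1.
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
        for _ in range(4096)]

# Unit-power complex noise: sigma = 1/sqrt(2) per axis.
sigma = 1.0 / math.sqrt(2)
noise = [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
         for _ in range(4096)]

papr_qpsk = papr_db(qpsk)    # essentially 0 dB: peak equals average
papr_noise = papr_db(noise)  # several dB: peaks run past full-scale
```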

Scaling to the wire, and headroom¶

cf32 / cf64 carry samples verbatim and never clip — peaks past ±1.0 are preserved. The integer types map ±1.0 → ±max-code by saturating each axis to ±1.0, then truncating toward zero (a plain cast, not round-to-nearest):

`--sample_type`	Map	Full-scale code
`ci32`	`clip(v, ±1)·(2³¹−1)`	`±2 147 483 647`
`ci16`	`clip(v, ±1)·32767`	`±32 767`
`ci8`	`clip(v, ±1)·127`	`±127`
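The clip-then-truncate rule in the table can be sketched in a few lines. This is a model of the mapping as described, not the library's code; `to_int_code` is a hypothetical name, and Python's `int()` is used because it truncates toward zero like a plain C cast.

```python
def to_int_code(v, full_scale):
    """Saturate one axis to ±1.0, scale, then truncate toward zero."""
    clipped = max(-1.0, min(1.0, v))
    return int(clipped * full_scale)  # int() truncates toward zero, not round-to-nearest

CI16 = 32767

# full-scale, over-range (clips), a negative half-scale value, just under full-scale
codes = [to_int_code(v, CI16) for v in (1.0, 1.7, -0.5, 0.99999)]
```

Here `-0.5` lands on `-16383` rather than `-16384`, which is the truncation toward zero the text calls out.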

So clipping is governed by PAPR, not by something being "signal" vs "noise":

A constant-envelope, clean signal (today's tone/PSK/PN at --snr 100) fills the integer range exactly, with no clipping.
Any PAPR > 0 dB content clips at the rails: added noise (at --snr 0, noise power = signal power, and ~⅓ of integer I/Q components already saturate), RRC-shaped output (--pulse rrc), and any future QAM / OFDM mode. Such a signal needs headroom, which wfmgen provides: --headroom <dB> (and Writer(headroom=…) in Python) scales the whole output down to −H dBFS so the peaks fit. It is a single common gain, so it is SNR-invariant (it moves only the absolute level, not any power ratio), and 0 dB (the default) is a bit-exact no-op. An integer capture that clips reports the exact backoff to use (remedy: --headroom N). You can also just carry envelope-varying signals as a float type (cf32 / cf64), which never clips.

Reader inverts the same map, so a float round-trip is exact and an integer round-trip is exact only where it neither clipped nor truncated.

>>> import numpy as np
>>> from doppler.wfm import Synth
>>> # the invariant is unit *average* power (here a clean, constant-envelope QPSK)
>>> x = Synth(type="qpsk", sps=1, snr=100.0).steps(4096)
>>> bool(np.allclose(np.mean(np.abs(x) ** 2), 1.0))
True
>>> # add noise (or RRC pulse-shaping) and peaks exceed full-scale:
>>> y = Synth(type="qpsk", sps=1, snr=0.0).steps(100000)
>>> float(np.mean(np.abs(y.real) > 1.0)) > 0.1   # many samples clip in ci*
True
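As a rough model of the headroom arithmetic described above: a backoff of H dB is the single gain 10^(−H/20), and the backoff that just fits a capture is 20·log10 of its largest per-axis magnitude. The helper names are hypothetical, and the tiny signal and noise lists are made-up values chosen only to keep the numbers easy to follow.

```python
import math

def headroom_gain(headroom_db):
    """Common linear gain for a backoff of headroom_db (hypothetical helper)."""
    return 10.0 ** (-headroom_db / 20.0)

def required_headroom_db(samples):
    """Smallest backoff that brings every I and Q component inside ±1.0."""
    peak = max(max(abs(s.real), abs(s.imag)) for s in samples)
    return 20.0 * math.log10(peak) if peak > 1.0 else 0.0

def power(samples):
    return sum(abs(s) ** 2 for s in samples) / len(samples)

signal = [complex(0.7, -0.7), complex(-0.7, 0.7)]
noise = [complex(0.9, 0.2), complex(-0.1, -0.6)]
mixed = [s + n for s, n in zip(signal, noise)]   # peak axis value is 1.6

need_db = required_headroom_db(mixed)            # about 4.08 dB
g = headroom_gain(need_db)
scaled = [g * m for m in mixed]

# One common gain scales signal and noise alike, so their power ratio is unchanged.
ratio_before = power(signal) / power(noise)
ratio_after = power([g * s for s in signal]) / power([g * n for n in noise])
```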

SNR & noise¶

--snr is applied as AWGN; --snr_mode chooses the reference:

Mode	`--snr` means	Use for
`fs`	SNR over the full sample rate (in-band power / noise power)	tones, wideband
`esno`	Es/No — energy per symbol over noise PSD	modulated (`*psk`)
`ebno`	Eb/No — energy per bit over noise PSD	link-budget work
`auto`	`fs` for `tone`/`noise`/`pn`, `esno` for `bpsk`/`qpsk`	the sensible default

--snr 100 (the default) is clean: at snr ≥ 100 dB the engine generates no AWGN at all, so a clean waveform pays no noise cost. (--type noise always generates AWGN.) Likewise --freq 0 skips the LO, since the carrier is a constant 1, so a clean baseband waveform is pure signal generation.

Lower --snr to add noise. The signal stays at unit average power (Amplitude & full-scale), so the per-axis noise σ is

σ = sqrt(1 / (2·10^(snr_fs/10)))

where Es/No and Eb/No are first converted to an over-fs SNR using 10·log10(sps) (and, for Eb/No, the bits/symbol: 1 for BPSK/PN, 2 for QPSK).

Same QPSK at three references

wfmgen --type qpsk --snr 10 --snr_mode esno     # 10 dB Es/No (the auto default)
wfmgen --type qpsk --snr 7  --snr_mode ebno     # 7 dB Eb/No  (= 10 dB Es/No)
wfmgen --type qpsk --snr 1  --snr_mode fs        # 1 dB over fs (per-sample)
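The three commands above describe nearly the same noise level, which can be checked with the conversions from this section. The sketch assumes the usual sign conventions (Es/No = Eb/No + 10·log10(bits/symbol), over-fs SNR = Es/No − 10·log10(sps)) and the default `--sps 8`. The function names are hypothetical and are not the `wfm_*` helpers exported by `doppler.wfm`.

```python
import math

def esno_to_snr_fs(esno_db, sps):
    """Es/No → SNR over the full sample rate."""
    return esno_db - 10.0 * math.log10(sps)

def ebno_to_snr_fs(ebno_db, bits_per_symbol, sps):
    """Eb/No → Es/No → SNR over the full sample rate."""
    esno_db = ebno_db + 10.0 * math.log10(bits_per_symbol)
    return esno_to_snr_fs(esno_db, sps)

def noise_sigma(snr_fs_db):
    """Per-axis noise sigma for a unit-power signal."""
    return math.sqrt(1.0 / (2.0 * 10.0 ** (snr_fs_db / 10.0)))

from_esno = esno_to_snr_fs(10.0, sps=8)                     # about 0.97 dB
from_ebno = ebno_to_snr_fs(7.0, bits_per_symbol=2, sps=8)   # about 0.98 dB
sigma = noise_sigma(from_esno)                              # about 0.632 per axis
```

Both results sit within a few hundredths of a dB of the `--snr 1 --snr_mode fs` case, which is why the three invocations are presented as equivalent.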

PN sequences & MLS¶

--type pn emits a maximum-length sequence; --pn_length n sets the LFSR register length (2 to 64, period 2ⁿ−1). The register, polynomial, and --pn_poly are full 64-bit. Leave --pn_poly 0 and the engine selects a primitive polynomial that yields a true MLS for that length (a built-in table of verified primitive polynomials for every length 2..64) — verified by period, balance, and the thumbtack autocorrelation. Supply --pn_poly only to force a specific tap set.

wfmgen --type pn --pn_length 7   --sps 1 --count 127   # one full period (2⁷−1)
wfmgen --type pn --pn_length 11  --sps 4               # length-11 MLS, 4× oversampled
wfmgen --type pn --pn_length 7   --lfsr fibonacci      # Fibonacci realization

--lfsr selects the LFSR realization: galois (default, internal XOR feedback) or fibonacci (external XOR of the tapped bits). Both use the same primitive polynomial and have the same period 2ⁿ−1; they differ only in the chip sequence/phase. The Fibonacci taps are derived from the same polynomial, so --pn_poly 0 still auto-selects the MLS for either mode.
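The two realizations can be compared in a small pure-Python model. It uses the degree-7 primitive polynomial x⁷ + x⁶ + 1 as an example; whether the engine's built-in table picks this polynomial for length 7, and its exact bit ordering, are assumptions. The step functions are hypothetical helpers, not the `PN` class.

```python
def galois_step(state, taps):
    """Galois form: shift right, XOR the tap mask in when the output bit is 1."""
    mask = 0
    for t in taps:
        mask |= 1 << (t - 1)
    out = state & 1
    state >>= 1
    if out:
        state ^= mask
    return state, out

def fibonacci_step(state, taps, n):
    """Fibonacci form: XOR the tapped bits and feed the result in at the top."""
    fb = 0
    for t in taps:
        fb ^= (state >> (n - t)) & 1
    out = state & 1
    state = (state >> 1) | (fb << (n - 1))
    return state, out

def one_period(step, seed):
    """Collect output bits until the register returns to its seed."""
    bits, state = [], seed
    while True:
        state, out = step(state)
        bits.append(out)
        if state == seed:
            return bits

def circular_autocorr(bits, lag):
    """Autocorrelation of the ±1 chip sequence at a given circular lag."""
    chips = [1 - 2 * b for b in bits]
    return sum(chips[i] * chips[(i + lag) % len(chips)] for i in range(len(chips)))

N, TAPS = 7, (7, 6)   # x^7 + x^6 + 1
gal = one_period(lambda s: galois_step(s, TAPS), seed=1)
fib = one_period(lambda s: fibonacci_step(s, TAPS, N), seed=1)
```

Both sequences run for 2⁷ − 1 = 127 chips with 64 ones and 63 zeros, and the off-peak autocorrelation of either is −1, which is the thumbtack shape the text mentions.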

Containers¶

--sample_type (the datatype) is orthogonal to --file_type (the container) and --endian (byte order).

`--file_type`	Output	Notes
`raw`	interleaved I/Q in the chosen `--sample_type`	the SDR default; honors `--endian`
`csv`	one `I,Q` line per sample	`%0.9f` cf32, `%0.17g` cf64, `%d` integer; text, no endian
`blue`	X-Midas / REDHAWK BLUE type-1000	(`wfmgen` only) self-describing 512-byte header
`sigmf`	`<base>.sigmf-data` + `<base>.sigmf-meta`	(`wfmgen` only) one annotation per segment

BLUE type-1000 writes a complete 512-byte X-Midas/REDHAWK Header Control Block so one file is fully self-describing: data_rep←--endian, format (CB/CI/CL/CF/CD)←--sample_type, and xdelta = 1/fs. Add --detached to split it into a header + data pair — <out>.hdr (the HCB, with detached=1 and data_start=0) and <out>.det (the raw samples). Detached output requires --output and a finite (non---continuous) run; attached mode keeps whatever extension you give -o (.blue/.prm/.tmp).

SigMF writes the samples as raw into <base>.sigmf-data and a JSON sidecar <base>.sigmf-meta with core:datatype/core:sample_rate, a capture at --fc, and one annotation per composer segment (frequency edges, label, wfmgen:* params).

# 16-bit big-endian into a self-describing BLUE file
wfmgen --type qpsk --count 200000 --sample_type ci16 --endian be \
       --file_type blue -o capture.blue

# a SigMF pair (capture.sigmf-data + capture.sigmf-meta)
wfmgen --from-file scenario.json --sample_type ci16 --file_type sigmf -o capture
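For orientation, this sketch parses a hand-written SigMF metadata document and turns its annotations into time spans. The `core:*` keys are standard SigMF names, but the values are illustrative, and the exact set of fields wfmgen writes (including any `wfmgen:*` parameters) is not reproduced here. `annotation_spans` is a hypothetical helper.

```python
import json

# Illustrative metadata only; not captured from a real wfmgen run.
META_TEXT = """
{
  "global": {"core:datatype": "ci16_le", "core:sample_rate": 1000000.0},
  "captures": [{"core:sample_start": 0, "core:frequency": 0.0}],
  "annotations": [
    {"core:sample_start": 0, "core:sample_count": 10000, "core:label": "tone"},
    {"core:sample_start": 15000, "core:sample_count": 40000, "core:label": "qpsk"}
  ]
}
"""

def annotation_spans(meta):
    """Return (label, start_seconds, duration_seconds) for each annotation."""
    fs = meta["global"]["core:sample_rate"]
    return [
        (a["core:label"], a["core:sample_start"] / fs, a["core:sample_count"] / fs)
        for a in meta["annotations"]
    ]

spans = annotation_spans(json.loads(META_TEXT))
```

With a real capture, the same function would be fed `json.load` on the `.sigmf-meta` file written next to the `.sigmf-data` samples.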

Sinks¶

`--output`	Result
(omitted)	binary stream to stdout (pipe it)
`file.iq`	write to a file
`zmq://tcp://*:5555`	(`wfmgen` only) publish to a ZMQ PUB endpoint (SIGS wire format)

wfmgen --type tone --count 1000000 | other-tool          # pipe via stdout
wfmgen --type tone --continuous --output zmq://tcp://*:5555    # stream forever to ZMQ

A dp_sub_* subscriber (e.g. examples/c/spectrum_analyzer) reads the ZMQ stream.

Real-time pacing¶

By default wfmgen emits as fast as the CPU allows — fs is only metadata (the BLUE xdelta, the ZMQ header). Add --realtime to throttle the output to fs, so blocks leave on an epoch + n/fs schedule — mimicking a hardware sample clock feeding the sink. This is what you want when a downstream consumer expects samples to arrive at the real rate (a live spectrum display, an SDR playback emulation):

# Stream QPSK to a live receiver at the true 1 MS/s, not as fast as possible
wfmgen --type qpsk --fs 1e6 --sps 8 --continuous --realtime \
       --output zmq://tcp://*:5555

The schedule is drift-free: each deadline is recomputed from the cumulative sample count against a fixed epoch, so sleep jitter never accumulates — the long-run rate is exactly fs. Pacing does not alter the samples; a file written with and without --realtime is byte-identical.

If the producer can't keep up (a block takes longer than its N/fs period — an underrun), wfmgen keeps the absolute timeline and prints a summary to stderr at exit (wfmgen: 3 underrun(s) — worst 1.2 ms behind real time). Use --realtime-resync instead to re-anchor the clock to "now" on each underrun, staying near real time going forward at the cost of an inserted gap.
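The scheduling idea can be modelled in a few lines: every deadline is computed from the cumulative sample count against a fixed epoch, and an underrun either keeps that timeline or re-anchors it. This is a sketch under stated assumptions, not doppler's `SampleClock`; the `Pacer` and `FakeClock` classes are hypothetical, and the fake clock exists only to make the demo deterministic.

```python
import time

class Pacer:
    """Drift-free block pacing (hypothetical model, not doppler's SampleClock)."""

    def __init__(self, fs, resync=False, clock=time.monotonic, sleep=time.sleep):
        self.fs = fs
        self.resync = resync
        self.clock = clock
        self.sleep = sleep
        self.epoch = clock()   # fixed reference for every deadline
        self.sent = 0          # cumulative samples released since the epoch
        self.underruns = 0

    def pace(self, n):
        """Block until the n samples just produced are due on the fs clock."""
        self.sent += n
        # The deadline comes from the running total, so sleep jitter cannot accumulate.
        deadline = self.epoch + self.sent / self.fs
        now = self.clock()
        if now <= deadline:
            self.sleep(deadline - now)
            return
        self.underruns += 1
        if self.resync:
            # Re-anchor to "now": accept a gap instead of chasing the old timeline.
            self.epoch, self.sent = now, 0


class FakeClock:
    """Deterministic stand-in for the wall clock, used only for the demo."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, dt):
        self.t += dt


clk = FakeClock()
pacer = Pacer(fs=1e6, clock=clk.now, sleep=clk.sleep)
for _ in range(10):
    clk.t += 0.0003       # pretend generating the block took 0.3 ms
    pacer.pace(1000)      # 1000 samples at 1 MS/s are due every 1 ms
elapsed = clk.t           # 10 blocks land on the 10 ms mark
```

Passing `time.monotonic` and `time.sleep` (the defaults) gives the real-time behaviour; the injected clock is there so the schedule can be tested without waiting.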

Software pacing is average-rate, not sample-accurate

On a non-realtime OS you get a drift-free average rate with bounded per-block jitter, never true sample-clock fidelity. Keep blocks large enough that the period N/fs comfortably exceeds scheduler jitter, and let the consumer's buffer absorb the rest. The same engine is in Python as SampleClock.

Multi-segment specs¶

wfmgen --from-file SPEC.json sequences segments — each a waveform plus an optional trailing off-time — and can repeat or run forever. The schema is wfmgen-1:

{
  "version": "wfmgen-1",
  "repeat": false,
  "continuous": false,
  "segments": [
    { "type": "tone", "fs": 1e6, "freq": 1e5, "snr": 100.0,
      "num_samples": 10000, "off_samples": 5000 },
    { "type": "qpsk", "fs": 1e6, "snr": 9.0, "snr_mode": "esno",
      "sps": 8, "num_samples": 40000, "off_samples": 0 }
  ]
}

type and snr_mode are strings; every other field is numeric and falls back to the engine default if omitted. num_samples is the on-time; off_samples is a trailing gap of zeros. repeat loops the sequence; continuous never finishes (for streaming).

wfmgen --from-file scenario.json -o scenario.cf32
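When specs are produced by a script rather than by hand, the same document can be built as a plain dictionary and serialized. The sketch below uses only the fields shown in the example above and also works out how long one pass of the sequence is; nothing in it calls doppler.

```python
import json

spec = {
    "version": "wfmgen-1",
    "repeat": False,
    "continuous": False,
    "segments": [
        {"type": "tone", "fs": 1e6, "freq": 1e5, "snr": 100.0,
         "num_samples": 10000, "off_samples": 5000},
        {"type": "qpsk", "fs": 1e6, "snr": 9.0, "snr_mode": "esno",
         "sps": 8, "num_samples": 40000, "off_samples": 0},
    ],
}

# One pass is every segment's on-time plus its trailing gap.
total_samples = sum(s["num_samples"] + s["off_samples"] for s in spec["segments"])
duration_s = total_samples / spec["segments"][0]["fs"]

spec_text = json.dumps(spec, indent=2)   # write this to scenario.json
```

For this spec one pass is 55 000 samples, or 55 ms at 1 MS/s, which is a quick way to sanity-check the size of the output file before generating it.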

Rather than write the schema from memory, dump a ready-to-edit example with wfmgen json-template and edit it down:

wfmgen json-template > scenario.json   # or: wfmgen json-template scenario.json
# …edit scenario.json…
wfmgen --from-file scenario.json -o scenario.cf32

The template is a representative spec — an inline tone, an RRC-shaped QPSK-from-bits burst with a trailing gap, and a two-source sum mix — that is valid by construction: it round-trips through --from-file unchanged, so it doubles as a working starting point, not just documentation. With no path (or -) it prints to stdout; pass a path to write the file directly.

Mixing sources (`sum`) and sequencing them (`add`)¶

A segment can hold several sources mixed at the same time — a signal of interest plus interferers plus a noise floor — instead of just one. The two composition verbs are orthogonal:

.sum() mixes sources over the same span (one receiver → one sample rate, one shared noise floor). SNR lives on a source; the floor is resolved once, in C, so the Python, JSON, and wfmgen --from-file faces are byte-identical.
.add() sequences segments in time, back-to-back — the multi-segment timeline above, built fluently.

flowchart LR
    subgraph SEG["Segment — .sum() mixes at the SAME time, one noise floor"]
        direction TB
        y1["Synth qpsk · level −12 dBFS"]
        y2["Synth tone · interferer"]
        y3["Synth noise · the floor"]
    end
    subgraph TL["Timeline — .add() sequences in TIME ▶"]
        direction LR
        sA["Segment A"] --> sB["Segment B<br/>(+ trailing gap)"] --> sC["…"]
    end
    SEG -- ".add(B, …)" --> sA
    TL --> COMP["Composer(…).compose()"] --> IQ[("complex64 I/Q")]

    classDef syn fill:#ede7f6,stroke:#5e35b1,color:#000;
    classDef seg fill:#e3f2fd,stroke:#1565c0,color:#000;
    class y1,y2,y3 syn;
    class sA,sB,sC seg;

from doppler.wfm import Composer, Segment, qpsk, tone, noise

# A scene: a −12 dBFS QPSK SoI at 15 dB Es/No, plus a CW interferer at +50 kHz.
scene = Segment.sum(
    qpsk(snr=15, snr_mode="esno", level=-12),  # the anchor sets the floor
    tone(freq=5e4),                             # an interferer (level 0 dBFS)
    num_samples=65536,
)

# Sequence a clean preamble tone, then the scene:
timeline = Segment("tone", freq=1e5, num_samples=2000, off_samples=500).add(scene)
iq = Composer(timeline).compose()

Rules of the floor (resolved per segment): an explicit noise(level=N) source fixes it at N dBFS; otherwise the first source carrying snr is the anchor and the floor is level(anchor) − SNR_fs(anchor). Other sources place themselves with level (a plain dBFS offset); giving a non-anchor both snr and level is a spec error. A single-source segment keeps its bundled AWGN untouched, so it is byte-identical to the pre-composition path.
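Applied to the scene above, the anchor rule works out as follows. The sketch assumes the QPSK anchor runs at the default `sps` of 8 and uses the Es/No to over-fs conversion from SNR & noise; `floor_dbfs` is a hypothetical helper, not part of the API.

```python
import math

def floor_dbfs(anchor_level_db, anchor_snr_fs_db):
    """Noise floor implied by the anchor: level(anchor) − SNR_fs(anchor)."""
    return anchor_level_db - anchor_snr_fs_db

# Anchor: qpsk(snr=15, snr_mode="esno", level=-12), assumed sps = 8.
snr_fs = 15.0 - 10.0 * math.log10(8)   # about 5.97 dB over fs
floor = floor_dbfs(-12.0, snr_fs)      # about −17.97 dBFS

# The tone sits at level 0 dBFS, so it stands this far above the shared floor:
tone_over_floor = 0.0 - floor          # about 17.97 dB
```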

In the wfmgen-1 JSON schema, a mixed segment replaces the inline source fields with a sum array (each entry is a source; fs/num_samples/off_samples stay on the segment):

{ "fs": 1e6, "num_samples": 65536, "off_samples": 0,
  "sum": [
    { "type": "qpsk", "snr": 15.0, "snr_mode": "esno", "sps": 8, "level": -12.0 },
    { "type": "tone", "freq": 5e4 }
  ] }

--record writes the resolved sum (the cleaned anchor plus an explicit noise source at the floor), so a recorded scene reproduces exactly.

Reproducible runs (`--record`)¶

--record run.json writes the fully-resolved spec — every value after defaulting (the auto-selected MLS polynomial, the resolved SNR mode, a summed segment's cleaned anchor + explicit noise floor) and the --headroom. Feed it straight back with --from-file and you get a byte-identical stream:

wfmgen --type bpsk --count 50000 --sps 4 --headroom 6 --record run.json -o a.iq
wfmgen --from-file run.json -o b.iq      # a.iq and b.iq are identical

The recorded --headroom is reapplied on replay; an explicit --headroom on the --from-file run overrides it. Use --record to document a capture next to its data, or to pin an exact scenario in a test.
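In a test, the byte-identical claim is easy to check by hashing both files. The sketch below is one way to do that with the standard library; `same_stream` is a hypothetical helper and the paths are whatever the two runs wrote.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    """Hash a file in chunks so large captures need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def same_stream(path_a, path_b):
    """True when two captures are byte-for-byte identical."""
    if Path(path_a).stat().st_size != Path(path_b).stat().st_size:
        return False
    return sha256_of(path_a) == sha256_of(path_b)
```

After the two commands above, `same_stream("a.iq", "b.iq")` is the assertion to pin in a regression test.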

The three faces of `wfmgen`¶

wfmgen is generated by just-makeit in three faces that accept the same flags and produce byte-identical output:

wfmgen --type qpsk --count 4096 -o out.iq            # 1. installed C-backed console script
python -m doppler.wfm.cli --type qpsk --count 4096   # 1b. same, as a module
python wfmgen.py --type qpsk --count 4096            # 2. PEP 723 script (uv run wfmgen.py)
./build/wfmgen --type qpsk --count 4096              # 3. standalone C binary

Python API¶

The engine is also a Python class — ideal for notebooks and pipelines.

import numpy as np
from doppler.wfm import Synth

# Every flag is a keyword argument; the same defaults apply.
synth = Synth(type="qpsk", fs=1e6, snr=12.0, snr_mode="esno", sps=8, seed=1)

block = np.asarray(synth.steps(4096))   # → complex64 ndarray
one   = synth.step()                     # → a single complex64 sample
synth.reset()                            # restart the sequence (keeps config)

Also exported from doppler.wfm:

Symbol	Use
`Synth(type=…, …)`	the waveform engine (above)
`PN(poly, seed, length)`	a raw LFSR / PN sequence object
`bpsk_map(bits)` / `qpsk_map(syms)`	map bits/symbol-indices → cf32 constellation points
`wfm_awgn_amplitude(snr_db, signal_power)`	AWGN amplitude for a target SNR over fs
`wfm_ebno_to_snr_db(ebno_db, bits_per_symbol, samples_per_symbol)`	Eb/No → over-fs SNR

# A matched-filter QPSK constellation (the receiver view):
sym = np.asarray(Synth(type="qpsk", snr=20, snr_mode="esno", sps=8).steps(8*600))
pts = sym.reshape(-1, 8).mean(axis=1)    # boxcar matched filter per symbol

Reading a capture back¶

The raw container is interleaved I/Q in the chosen --sample_type, so a naive np.fromfile gets the layout (and, for integers, the scale) wrong. read_iq does the right thing — a zero-copy complex view for the float types, a SIMD rescale to ±1.0 for the integer types — or pass raw=True for the raw (N, 2) on-disk view:

from doppler.wfm.readback import read_iq

iq = read_iq("capture.iq", sample_type="ci16")   # → complex64, ±1.0
iq = read_iq("capture.iq", sample_type="cf32")   # → complex64, zero-copy

generate → read_iq is bit-faithful. See Type System → Reading interleaved I/Q.
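To show what decoding involves for an integer capture, here is a minimal NumPy sketch that de-interleaves `ci16` samples and rescales them by the full-scale code from the mapping table (32767). It illustrates the layout only and is not how `read_iq` is implemented; `decode_ci16` is a hypothetical helper.

```python
import numpy as np

def decode_ci16(buf, byteorder="<"):
    """Interleaved int16 I/Q bytes → complex64 scaled to ±1.0 (illustrative)."""
    ints = np.frombuffer(buf, dtype=f"{byteorder}i2")   # I0, Q0, I1, Q1, ...
    pairs = ints.astype(np.float32) / np.float32(32767)
    return pairs.view(np.complex64)                     # each (I, Q) pair → one complex

# Three samples as they would sit on disk: 1+0j, 0-1j, and roughly 0.5-0.5j.
raw = np.array([32767, 0, 0, -32767, 16383, -16383], dtype="<i2").tobytes()
iq = decode_ci16(raw)
```

Reading the same bytes as `np.fromfile(..., dtype=np.complex64)` would misinterpret both the layout and the scale, which is the pitfall the text warns about.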

Recipes¶

# A clean tone at +100 kHz, 1 Msample, 16-bit I/Q to a file
wfmgen --type tone --freq 1e5 --count 1000000 --sample_type ci16 -o tone.ci16

# Noisy BPSK at 6 dB Eb/No, as CSV for quick inspection
wfmgen --type bpsk --snr 6 --snr_mode ebno --count 2000 --file_type csv -o bpsk.csv

# A scenario: tone burst, gap, then QPSK — recorded for reproducibility
wfmgen --from-file scenario.json --record run.json --file_type blue -o scene.blue

# Stream continuous QPSK to ZMQ for a live receiver
wfmgen --type qpsk --snr 10 --continuous --output zmq://tcp://*:5555
