
Waveform Generator — wfmgen

doppler ships a C-first waveform generator: one declarative synth engine (every algorithm in C, exactly once), exposed two ways —

  • wfmgen — the one command-line tool. A single waveform or a multi-segment scene, the raw / CSV / BLUE / SigMF containers, and streaming to ZMQ. (A one-segment run is the simple single-waveform case.)
  • doppler.wfm — the same engine as a Python API, one import path: from doppler.wfm import ….

[Figure: wfmgen engine]

Reach for --from-file (or the Python Composer) when you need multiple segments, mixing, BLUE/SigMF, or a ZMQ stream — otherwise the flags below generate a single waveform.

The 30-second version

wfmgen --type qpsk --snr 12 --count 100000 -o capture.cf32   # 100k QPSK samples @ 12 dB Es/No
wfmgen --type tone --freq 1e5 --count 4096                    # a tone → stdout (cf32)
wfmgen --type pn --pn_length 9 --file_type csv -o pn.csv      # length-9 MLS as text

Installation

pip install doppler-dsp        # → the `wfmgen` command + the doppler.wfm API

The wheel ships the self-contained wfmgen binary as package data and a wfmgen console-script — a thin shim that execs it — alongside the doppler.wfm Python module. To build from source instead:

git clone https://github.com/doppler-dsp/doppler && cd doppler
cmake -B build -DBUILD_PYTHON=ON && cmake --build build --target wfmgen_cli
# binary: build/native/src/wfmcompose/wfmgen

Waveform types

--type selects the waveform; every type shares the same parameter set.

| --type | What it is | Key parameters |
|---|---|---|
| tone | a complex sinusoid at --freq | --freq |
| noise | complex AWGN (unit power) | --snr (ignored; it is noise) |
| pn | a maximum-length sequence (±1 chips), --sps samples/chip | --pn_length, --pn_poly, --sps |
| bpsk | BPSK symbols (PN-sourced data), --sps samples/symbol | --sps, --snr |
| qpsk | Gray-coded QPSK symbols (PN-sourced data) | --sps, --snr |
| chirp | linear-FM sweep --freq → --f_end over --count | --freq, --f_end |
| bits | a user bit pattern, oversampled by --sps and cycled | --bits, --modulation, --sps |

The data bits for bpsk/qpsk come from a deterministic PN sequence (seeded by --seed), so output is reproducible and receiver-correlatable.

A chirp sweeps its instantaneous frequency linearly from --freq (the start) to --f_end over the --count samples, then holds at --f_end; --f_end < --freq is a down-chirp. The phase is continuous across segments, so concatenated chirps join seamlessly (radar pulse compression, SAR, sonar).

A bits waveform instead plays back your sequence (a preamble, sync word, or test vector), given as a 0/1 string (--bits 10110101), a hex string (--bits-hex AA55, MSB first), or a file (--bits-file pattern.txt); --modulation (none/bpsk/qpsk) maps the bits to symbols.

The PSK carriers (pn/bpsk/qpsk) default to rectangular sample-and-hold chips (a wide sinc² spectrum); add --pulse rrc for root-raised-cosine shaping to get a band-limited carrier (e.g. a WCDMA QPSK downlink at roll-off 0.22) straight from the generator.

wfmgen --type chirp --freq 100e3 --f_end 300e3 --fs 1e6 --count 10000 -o chirp.cf32
wfmgen --type bits --bits 10110101 --modulation bpsk --sps 8 --count 64 -o sync.cf32
wfmgen --type bits --bits-hex AA55 --modulation none --sps 4 -o preamble.cf32
wfmgen --type qpsk --sps 8 --pulse rrc --rrc-beta 0.22 --count 100000 -o wcdma.cf32
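
The linear sweep described above can be sketched in NumPy. This is an illustration of the idea only, not the engine's C code: the function name linear_chirp and the choice to integrate the phase with a running sum are assumptions made for the example.

```python
import numpy as np

def linear_chirp(f_start, f_end, fs, count):
    """Hypothetical helper: unit-magnitude linear-FM sweep (not the wfmgen API)."""
    n = np.arange(count)
    # Instantaneous frequency ramps linearly from f_start towards f_end.
    f_inst = f_start + (f_end - f_start) * n / count
    # Phase is the running sum of the per-sample frequency, starting at 0.
    phase = 2.0 * np.pi * np.concatenate(([0.0], np.cumsum(f_inst[:-1]))) / fs
    return np.exp(1j * phase).astype(np.complex64)

x = linear_chirp(100e3, 300e3, 1e6, 10000)

# Recover the instantaneous frequency from sample-to-sample phase steps.
f_est = np.angle(x[1:] * np.conj(x[:-1])) * 1e6 / (2.0 * np.pi)
print(round(float(f_est[0])), round(float(f_est[-1])))
```

Because the phase is accumulated rather than recomputed per block, carrying the last phase value into the next call is what would keep concatenated chirps continuous.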

Parameter reference

Engine


| Flag | Type | Default | Meaning |
|---|---|---|---|
| --type | tone, noise, pn, bpsk, qpsk, chirp, bits | tone | waveform |
| --fs | float (Hz) | 1e6 | sample rate |
| --freq | float (Hz) | 0 | frequency offset from baseband (mixed by the LO); chirp start |
| --f_end | float (Hz) | 0 | chirp end frequency (--type chirp only) |
| --snr | float (dB) | 100 | SNR; metric chosen by --snr_mode (≈clean at 100) |
| --snr_mode | auto, fs, ebno, esno | auto | how --snr is interpreted (see below) |
| --seed | uint32 | 1 | PRNG / LFSR seed (deterministic) |
| --sps | int | 8 | samples per symbol (*psk/bits) / per chip (pn) |
| --pn_length | int (2..64) | 7 | LFSR register length → period 2ⁿ−1 |
| --pn_poly | uint64 | 0 | LFSR polynomial; 0 ⇒ auto-pick the MLS polynomial |
| --lfsr | galois, fibonacci | galois | LFSR realization (same polynomial/period, different sequence) |
| --bits | 0/1 string | | bits: pattern, e.g. 10110101 (or --bits-hex/--bits-file) |
| --modulation | none, bpsk, qpsk | bpsk | bits: how the pattern maps to symbols |
| --pulse | rect, rrc | rect | pn/bpsk/qpsk pulse shape; rrc = band-limited RRC shaping |
| --rrc-beta | float | 0.35 | RRC roll-off (--pulse rrc) |
| --rrc-span | int | 8 | RRC filter support in symbols (--pulse rrc) |
| --count | int | 1024 | number of complex samples to generate |

Output

| Flag | Values | Default | Meaning |
|---|---|---|---|
| --sample_type | cf32, cf64, ci32, ci16, ci8 | cf32 | wire type; integers are full-scale ±1.0 |
| --file_type | raw, csv, blue, sigmf | raw | container (see Containers) |
| --endian | le, be | le | byte order (raw/BLUE only; csv is text) |
| --output / -o | path (or zmq://…) | stdout | sink |
| --record | path | | write a JSON record of the resolved run |

Composer (multi-segment, --from-file)

| Flag | Meaning |
|---|---|
| --from-file SPEC.json | run a multi-segment spec (see Multi-segment) |
| json-template [FILE] | subcommand: dump an editable example spec (to FILE, else stdout); see Multi-segment |
| --level DB | source level in dBFS (≤0); scales the segment by 10^(DB/20) (SNR-invariant; default 0) |
| --headroom DB | back the output off to −DB dBFS so peaks fit (SNR-invariant; default 0) |
| --clip-report | print the clipped fraction + peak; --clip-error exits non-zero on a clip |
| --fc HZ | capture center frequency, written into BLUE/SigMF metadata |
| --off N | trailing off-time (zeros) after the segment |
| --repeat | loop the whole sequence |
| --continuous | never stop (implies repeat); for streaming |
| --detached | BLUE only: write <out>.hdr (HCB) + <out>.det (data) |
| --realtime | pace the output to fs, mimicking a sample clock (see Real-time pacing) |
| --realtime-resync | like --realtime, but re-anchor to "now" on each underrun |

Amplitude & full-scale

The amplitude invariant is unit average power: every waveform is normalised so its mean power is 1.0. That — not a constant envelope — is what the rest of the system is built on. It is the reference the SNR math uses (signal power ≡ 1, so the noise σ falls straight out of the target SNR; see SNR & noise), and the level you control is the SNR, not a signal gain. The I/Q full-scale is ±1.0 per axis (→ the largest integer code).

Apart from noise and RRC-shaped carriers (--pulse rrc), today's waveforms all happen to be constant-envelope, so for them the peak equals the average and they sit exactly at magnitude 1.0. That is a property of the current set, not a design assumption:

| --type | Sample values | Envelope | Avg. power |
|---|---|---|---|
| tone | exp(j·2πft) | constant, mag 1 | 1.0 |
| bpsk / pn | ±1 (real axis) | constant, mag 1 | 1.0 |
| qpsk | (±1/√2, ±1/√2) | constant, mag 1 | 1.0 |
| chirp | exp(j·φ(t)), φ′ ramps freq→f_end | constant, mag 1 | 1.0 |
| noise | complex Gaussian, σ = 1/√2 per axis | Gaussian, PAPR > 0 dB | 1.0 |

Don't rely on |z| = 1. A pulse-shaped (RRC), QAM, or OFDM waveform has a peak-to-average power ratio (PAPR) above 0 dB: at unit average power its peaks run well past ±1.0. noise, and any signal-plus-noise sum, already do.

Scaling to the wire, and headroom

cf32 / cf64 carry samples verbatim and never clip: peaks past ±1.0 are preserved. The integer types map ±1.0 → ±max-code by saturating each axis to ±1.0, scaling by the full-scale code, then truncating toward zero (a plain cast, not round-to-nearest):

| --sample_type | Map | Full-scale code |
|---|---|---|
| ci32 | clip(v, ±1)·(2³¹−1) | ±2 147 483 647 |
| ci16 | clip(v, ±1)·32767 | ±32 767 |
| ci8 | clip(v, ±1)·127 | ±127 |

So clipping is governed by PAPR, not by something being "signal" vs "noise":

  • A constant-envelope, clean signal (today's tone/PSK/PN at --snr 100) fills the integer range exactly, with no clipping.
  • Any PAPR > 0 dB content clips at the rails: added noise (at --snr 0, noise power = signal power, and ~⅓ of integer I/Q components already saturate), RRC-shaped carriers (--pulse rrc), and any future QAM / OFDM mode. Such a signal needs headroom, which wfmgen provides: --headroom <dB> (and Writer(headroom=…) in Python) scales the whole output down to −H dBFS so the peaks fit. It is a single common gain, so it is SNR-invariant (it moves only the absolute level, not any power ratio), and 0 dB (the default) is a bit-exact no-op. An integer capture that clips reports the exact backoff to use (remedy: --headroom N). You can also just carry envelope-varying signals as a float type (cf32 / cf64), which never clips.

Reader inverts the same map, so a float round-trip is exact and an integer round-trip is exact only where it neither clipped nor truncated.
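
The saturate, scale, truncate map from the table can be sketched in NumPy as follows. It is a sketch of the documented arithmetic, not the engine's writer: to_integer_iq and its return shape are hypothetical names chosen for the example.

```python
import numpy as np

# Full-scale codes and storage types taken from the table above.
FULL_SCALE = {"ci8": 127, "ci16": 32767, "ci32": 2**31 - 1}
DTYPES = {"ci8": np.int8, "ci16": np.int16, "ci32": np.int32}

def to_integer_iq(x, sample_type):
    """Hypothetical helper: complex samples -> (N, 2) integer I/Q codes."""
    iq = np.stack([x.real, x.imag], axis=-1).astype(np.float64)
    clipped = np.clip(iq, -1.0, 1.0)                  # saturate each axis
    scaled = clipped * FULL_SCALE[sample_type]        # scale to full-scale code
    codes = np.trunc(scaled).astype(DTYPES[sample_type])  # truncate toward zero
    clip_fraction = float(np.mean(np.abs(iq) > 1.0))  # share of saturated axes
    return codes, clip_fraction

x = np.array([1.0 + 0.0j, 0.5 - 0.5j, 1.5 - 2.0j])
codes, clip_fraction = to_integer_iq(x, "ci16")
print(codes.tolist(), clip_fraction)
```

The third sample shows why PAPR matters: both of its components exceed ±1.0, so they pin to the rails and the original amplitude cannot be recovered on read-back.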

>>> import numpy as np
>>> from doppler.wfm import Synth
>>> # the invariant is unit *average* power (here a clean, constant-envelope QPSK)
>>> x = Synth(type="qpsk", sps=1, snr=100.0).steps(4096)
>>> bool(np.allclose(np.mean(np.abs(x) ** 2), 1.0))
True
>>> # add noise (or, later, pulse-shaping / QAM) and peaks exceed full-scale:
>>> y = Synth(type="qpsk", sps=1, snr=0.0).steps(100000)
>>> float(np.mean(np.abs(y.real) > 1.0)) > 0.1   # many samples clip in ci*
True

SNR & noise

--snr is applied as AWGN; --snr_mode chooses the reference:

| Mode | --snr means | Use for |
|---|---|---|
| fs | SNR over the full sample rate (in-band power / noise power) | tones, wideband |
| esno | Es/No: energy per symbol over noise PSD | modulated (*psk) |
| ebno | Eb/No: energy per bit over noise PSD | link-budget work |
| auto | fs for tone/noise/pn, esno for bpsk/qpsk | the sensible default |

--snr 100 (the default) is clean: snr ≥ 100 dB generates no AWGN at all, so a clean waveform pays no noise cost. Lower --snr to add noise; the signal stays at unit average power (Amplitude & full-scale), so the per-axis noise σ is σ = sqrt(1 / (2·10^(snr_fs/10))), where Es/No and Eb/No are first converted to an over-fs SNR using 10·log10(sps) (and, for Eb/No, the bits/symbol: 1 for BPSK/PN, 2 for QPSK). (--type noise always generates AWGN.) Likewise --freq 0 skips the LO (the carrier is a constant 1), so a clean baseband waveform is pure signal generation.

Same QPSK at three references

wfmgen --type qpsk --snr 10 --snr_mode esno     # 10 dB Es/No (the auto default)
wfmgen --type qpsk --snr 7  --snr_mode ebno     # 7 dB Eb/No  (= 10 dB Es/No)
wfmgen --type qpsk --snr 1  --snr_mode fs        # 1 dB over fs (per-sample)
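
The three commands above land on the same noise level, which a few lines of arithmetic can confirm. The helper names below are hypothetical (they are not the doppler.wfm functions), and the sign of each conversion is inferred from the example itself: subtract 10·log10(sps) to go from Es/No to over-fs SNR, and add 10·log10(bits/symbol) to go from Eb/No to Es/No.

```python
import math

def esno_to_snr_fs(esno_db, sps):
    """Hypothetical helper: Es/No (dB) -> SNR over the full sample rate (dB)."""
    return esno_db - 10.0 * math.log10(sps)

def ebno_to_snr_fs(ebno_db, bits_per_symbol, sps):
    """Hypothetical helper: Eb/No (dB) -> SNR over the full sample rate (dB)."""
    esno_db = ebno_db + 10.0 * math.log10(bits_per_symbol)
    return esno_to_snr_fs(esno_db, sps)

def noise_sigma(snr_fs_db):
    """Per-axis noise sigma for a unit-average-power signal."""
    return math.sqrt(1.0 / (2.0 * 10.0 ** (snr_fs_db / 10.0)))

# QPSK at the default sps=8, matching the three commands above.
from_esno = esno_to_snr_fs(10.0, sps=8)
from_ebno = ebno_to_snr_fs(7.0, bits_per_symbol=2, sps=8)
print(round(from_esno, 2), round(from_ebno, 2), round(noise_sigma(from_esno), 3))
```

Both conversions come out within a few hundredths of a dB of the 1 dB over-fs figure, which is why the three invocations describe the same QPSK.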

PN sequences & MLS

--type pn emits a maximum-length sequence; --pn_length n sets the LFSR register length (2 to 64, period 2ⁿ−1). The register, polynomial, and --pn_poly are full 64-bit. Leave --pn_poly 0 and the engine selects a primitive polynomial that yields a true MLS for that length (a built-in table of verified primitive polynomials for every length 2..64) — verified by period, balance, and the thumbtack autocorrelation. Supply --pn_poly only to force a specific tap set.

wfmgen --type pn --pn_length 7   --sps 1 --count 127   # one full period (2⁷−1)
wfmgen --type pn --pn_length 11  --sps 4               # length-11 MLS, 4× oversampled
wfmgen --type pn --pn_length 7   --lfsr fibonacci      # Fibonacci realization

--lfsr selects the LFSR realization: galois (default, internal XOR feedback) or fibonacci (external XOR of the tapped bits). Both use the same primitive polynomial and have the same period 2ⁿ−1; they differ only in the chip sequence/phase. The Fibonacci taps are derived from the same polynomial, so --pn_poly 0 still auto-selects the MLS for either mode.
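
The three MLS checks named above (period, balance, thumbtack autocorrelation) are easy to reproduce on a small Galois LFSR. The sketch below uses the textbook 7-bit feedback mask 0x60 (x⁷ + x⁶ + 1) in a right-shifting register; that polynomial, the bit ordering, and the function name are assumptions for illustration and may differ from the table wfmgen ships.

```python
import numpy as np

def galois_mls(n_bits, mask, seed=1):
    """Hypothetical helper: one period of a right-shifting Galois LFSR."""
    state = seed
    bits = []
    for _ in range(2**n_bits - 1):
        lsb = state & 1          # output bit
        bits.append(lsb)
        state >>= 1
        if lsb:
            state ^= mask        # internal XOR feedback
    return np.array(bits), state

bits, final_state = galois_mls(7, 0x60)
chips = 1 - 2 * bits             # 0/1 bits -> +1/-1 chips

# Circular autocorrelation at every lag of the 127-chip period.
autocorr = np.array([int(np.dot(chips, np.roll(chips, k)))
                     for k in range(chips.size)])
print(final_state, int(bits.sum()), autocorr[0], set(autocorr[1:].tolist()))
```

A register that returns to its seed after exactly 2ⁿ−1 steps, with one more 1 than 0 and a single autocorrelation peak over a flat −1 floor, is the signature of a maximum-length sequence.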


Containers

--sample_type (the datatype) is orthogonal to --file_type (the container) and --endian (byte order).

| --file_type | Output | Notes |
|---|---|---|
| raw | interleaved I/Q in the chosen --sample_type | the SDR default; honors --endian |
| csv | one I,Q line per sample | %0.9f cf32, %0.17g cf64, %d integer; text, no endian |
| blue | X-Midas / REDHAWK BLUE type-1000 (wfmgen only) | self-describing 512-byte header |
| sigmf | <base>.sigmf-data + <base>.sigmf-meta (wfmgen only) | one annotation per segment |

BLUE type-1000 writes a complete 512-byte X-Midas/REDHAWK Header Control Block so one file is fully self-describing: data_rep←--endian, format (CB/CI/CL/CF/CD)←--sample_type, and xdelta = 1/fs. Add --detached to split it into a header + data pair: <out>.hdr (the HCB, with detached=1 and data_start=0) and <out>.det (the raw samples). Detached output requires --output and a finite run (no --continuous); attached mode keeps whatever extension you give -o (.blue/.prm/.tmp).

SigMF writes the samples as raw into <base>.sigmf-data and a JSON sidecar <base>.sigmf-meta with core:datatype/core:sample_rate, a capture at --fc, and one annotation per composer segment (frequency edges, label, wfmgen:* params).

# 16-bit big-endian into a self-describing BLUE file
wfmgen --type qpsk --count 200000 --sample_type ci16 --endian be \
       --file_type blue -o capture.blue

# a SigMF pair (capture.sigmf-data + capture.sigmf-meta)
wfmgen --from-file scenario.json --sample_type ci16 --file_type sigmf -o capture

Sinks

| --output | Result |
|---|---|
| (omitted) | binary stream to stdout (pipe it) |
| file.iq | write to a file |
| zmq://tcp://*:5555 | (wfmgen only) publish to a ZMQ PUB endpoint (SIGS wire format) |

wfmgen --type tone --count 1000000 | other-tool                # pipe via stdout
wfmgen --type tone --continuous --output zmq://tcp://*:5555    # stream forever to ZMQ

A dp_sub_* subscriber (e.g. examples/c/spectrum_analyzer) reads the ZMQ stream.


Real-time pacing

By default wfmgen emits as fast as the CPU allows — fs is only metadata (the BLUE xdelta, the ZMQ header). Add --realtime to throttle the output to fs, so blocks leave on an epoch + n/fs schedule — mimicking a hardware sample clock feeding the sink. This is what you want when a downstream consumer expects samples to arrive at the real rate (a live spectrum display, an SDR playback emulation):

# Stream QPSK to a live receiver at the true 1 MS/s, not as fast as possible
wfmgen --type qpsk --fs 1e6 --sps 8 --continuous --realtime \
       --output zmq://tcp://*:5555

The schedule is drift-free: each deadline is recomputed from the cumulative sample count against a fixed epoch, so sleep jitter never accumulates — the long-run rate is exactly fs. Pacing does not alter the samples; a file written with and without --realtime is byte-identical.

If the producer can't keep up (a block takes longer than its N/fs period — an underrun), wfmgen keeps the absolute timeline and prints a summary to stderr at exit (wfmgen: 3 underrun(s) — worst 1.2 ms behind real time). Use --realtime-resync instead to re-anchor the clock to "now" on each underrun, staying near real time going forward at the cost of an inserted gap.
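
The drift-free schedule and the two underrun policies can be sketched in a few lines of Python. This is a sketch of the idea under stated assumptions, not wfmgen's pacing code or the SampleClock API: pace_blocks is a hypothetical name, and releasing a block when its last sample is due is one possible convention.

```python
import time

def pace_blocks(block_sizes, fs, now=time.monotonic, sleep=time.sleep,
                resync=False):
    """Hypothetical helper: pace blocks to fs; returns the underrun count."""
    epoch = now()
    sent = 0
    underruns = 0
    for n in block_sizes:
        sent += n
        # Deadline comes from the cumulative count, so jitter never accumulates.
        deadline = epoch + sent / fs
        late = now() - deadline
        if late > 0.0:
            underruns += 1
            if resync:
                epoch += late     # re-anchor the timeline to "now"
        else:
            sleep(-late)          # wait out the remainder of the period
    return underruns

if __name__ == "__main__":
    # Five 1000-sample blocks at 1 MS/s take about 5 ms of wall time.
    print(pace_blocks([1000] * 5, 1e6))
```

Injecting now and sleep keeps the schedule testable with a fake clock; with resync=False the absolute timeline is kept, so a late producer can catch up by skipping sleeps.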

Software pacing is average-rate, not sample-accurate

On a non-realtime OS you get a drift-free average rate with bounded per-block jitter, never true sample-clock fidelity. Keep blocks large enough that the period N/fs comfortably exceeds scheduler jitter, and let the consumer's buffer absorb the rest. The same engine is in Python as SampleClock.


Multi-segment specs

wfmgen --from-file SPEC.json sequences segments — each a waveform plus an optional trailing off-time — and can repeat or run forever. The schema is wfmgen-1:

{
  "version": "wfmgen-1",
  "repeat": false,
  "continuous": false,
  "segments": [
    { "type": "tone", "fs": 1e6, "freq": 1e5, "snr": 100.0,
      "num_samples": 10000, "off_samples": 5000 },
    { "type": "qpsk", "fs": 1e6, "snr": 9.0, "snr_mode": "esno",
      "sps": 8, "num_samples": 40000, "off_samples": 0 }
  ]
}

type and snr_mode are strings; every other field is numeric and falls back to the engine default if omitted. num_samples is the on-time; off_samples is a trailing gap of zeros. repeat loops the sequence; continuous never finishes (for streaming).

wfmgen --from-file scenario.json -o scenario.cf32
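
A spec like the one above can also be built and sanity-checked from Python before handing it to wfmgen. The sketch uses only the standard library and the field names shown in the schema; the helper names are hypothetical, and falling back to fs = 1e6 when a segment omits fs is an assumption based on the engine default in the parameter table.

```python
import json

spec = {
    "version": "wfmgen-1",
    "repeat": False,
    "continuous": False,
    "segments": [
        {"type": "tone", "fs": 1e6, "freq": 1e5, "snr": 100.0,
         "num_samples": 10000, "off_samples": 5000},
        {"type": "qpsk", "fs": 1e6, "snr": 9.0, "snr_mode": "esno",
         "sps": 8, "num_samples": 40000, "off_samples": 0},
    ],
}

def total_samples(spec):
    """Hypothetical helper: on-time plus trailing gaps, over one pass."""
    return sum(s["num_samples"] + s.get("off_samples", 0)
               for s in spec["segments"])

def duration_seconds(spec, default_fs=1e6):
    """Hypothetical helper: wall-clock length of one pass at each segment's fs."""
    return sum((s["num_samples"] + s.get("off_samples", 0)) / s.get("fs", default_fs)
               for s in spec["segments"])

text = json.dumps(spec, indent=2)   # write this string to scenario.json
print(total_samples(spec), duration_seconds(spec))
```

Knowing the expected sample count up front makes it easy to check the size of the file that --from-file produces.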

Rather than write the schema from memory, dump a ready-to-edit example with wfmgen json-template and edit it down:

wfmgen json-template > scenario.json   # or: wfmgen json-template scenario.json
# …edit scenario.json…
wfmgen --from-file scenario.json -o scenario.cf32

The template is a representative spec — an inline tone, an RRC-shaped QPSK-from-bits burst with a trailing gap, and a two-source sum mix — that is valid by construction: it round-trips through --from-file unchanged, so it doubles as a working starting point, not just documentation. With no path (or -) it prints to stdout; pass a path to write the file directly.


Mixing sources (sum) and sequencing them (add)

A segment can hold several sources mixed at the same time — a signal of interest plus interferers plus a noise floor — instead of just one. The two composition verbs are orthogonal:

  • .sum() mixes sources over the same span (one receiver → one sample rate, one shared noise floor). SNR lives on a source; the floor is resolved once, in C, so the Python, JSON, and wfmgen --from-file faces are byte-identical.
  • .add() sequences segments in time, back-to-back: the multi-segment timeline above, built fluently.

flowchart LR
    subgraph SEG["Segment — .sum() mixes at the SAME time, one noise floor"]
        direction TB
        y1["Synth qpsk · level −12 dBFS"]
        y2["Synth tone · interferer"]
        y3["Synth noise · the floor"]
    end
    subgraph TL["Timeline — .add() sequences in TIME ▶"]
        direction LR
        sA["Segment A"] --> sB["Segment B<br/>(+ trailing gap)"] --> sC["…"]
    end
    SEG -- ".add(B, …)" --> sA
    TL --> COMP["Composer(…).compose()"] --> IQ[("complex64 I/Q")]

    classDef syn fill:#ede7f6,stroke:#5e35b1,color:#000;
    classDef seg fill:#e3f2fd,stroke:#1565c0,color:#000;
    class y1,y2,y3 syn;
    class sA,sB,sC seg;

from doppler.wfm import Composer, Segment, qpsk, tone, noise

# A scene: a −12 dBFS QPSK SoI under a CW interferer at +50 kHz, at 15 dB Es/No.
scene = Segment.sum(
    qpsk(snr=15, snr_mode="esno", level=-12),  # the anchor sets the floor
    tone(freq=5e4),                             # an interferer (level 0 dBFS)
    num_samples=65536,
)

# Sequence a clean preamble tone, then the scene:
timeline = Segment("tone", freq=1e5, num_samples=2000, off_samples=500).add(scene)
iq = Composer(timeline).compose()

Rules of the floor (resolved per segment): an explicit noise(level=N) source fixes it at N dBFS; otherwise the first source carrying snr is the anchor and the floor is level(anchor) − SNR_fs(anchor). Other sources place themselves with level (a plain dBFS offset); giving a non-anchor both snr and level is a spec error. A single-source segment keeps its bundled AWGN untouched, so it is byte-identical to the pre-composition path.
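
As a worked example of the anchor rule, the scene above puts the floor a little under −18 dBFS. The arithmetic below is a sketch that assumes the anchor's Es/No is converted to an over-fs SNR with 10·log10(sps), as in SNR & noise, and that the anchor runs at the default sps of 8; the function name is hypothetical.

```python
import math

def floor_dbfs(anchor_level_db, anchor_esno_db, sps):
    """Hypothetical helper: noise floor = level(anchor) - SNR_fs(anchor)."""
    snr_fs_db = anchor_esno_db - 10.0 * math.log10(sps)
    return anchor_level_db - snr_fs_db

# QPSK anchor: level -12 dBFS, 15 dB Es/No, sps 8.
floor = floor_dbfs(-12.0, 15.0, sps=8)
tone_over_floor = 0.0 - floor      # the interferer sits at 0 dBFS

print(round(floor, 2), round(tone_over_floor, 2))
```

So the QPSK sits about 6 dB above the floor over the full sample rate, while the 0 dBFS tone stands roughly 18 dB above it.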

In the wfmgen-1 JSON schema, a mixed segment replaces the inline source fields with a sum array (each entry is a source; fs/num_samples/off_samples stay on the segment):

{ "fs": 1e6, "num_samples": 65536, "off_samples": 0,
  "sum": [
    { "type": "qpsk", "snr": 15.0, "snr_mode": "esno", "sps": 8, "level": -12.0 },
    { "type": "tone", "freq": 5e4 }
  ] }

--record writes the resolved sum (the cleaned anchor plus an explicit noise source at the floor), so a recorded scene reproduces exactly.


Reproducible runs (--record)

--record run.json writes the fully-resolved spec — every value after defaulting (the auto-selected MLS polynomial, the resolved SNR mode, a summed segment's cleaned anchor + explicit noise floor) and the --headroom. Feed it straight back with --from-file and you get a byte-identical stream:

wfmgen --type bpsk --count 50000 --sps 4 --headroom 6 --record run.json -o a.iq
wfmgen --from-file run.json -o b.iq      # a.iq and b.iq are identical

The recorded --headroom is reapplied on replay; an explicit --headroom on the --from-file run overrides it. Use --record to document a capture next to its data, or to pin an exact scenario in a test.


The three faces of wfmgen

wfmgen is generated by just-makeit in three faces that accept the same flags and produce byte-identical output:

wfmgen --type qpsk --count 4096 -o out.iq            # 1. installed C-backed console script
python -m doppler.wfm.cli --type qpsk --count 4096   # 1b. same, as a module
python wfmgen.py --type qpsk --count 4096            # 2. PEP 723 script (uv run wfmgen.py)
./build/wfmgen --type qpsk --count 4096              # 3. standalone C binary

Python API

The engine is also a Python class — ideal for notebooks and pipelines.

import numpy as np
from doppler.wfm import Synth

# Every flag is a keyword argument; the same defaults apply.
synth = Synth(type="qpsk", fs=1e6, snr=12.0, snr_mode="esno", sps=8, seed=1)

block = np.asarray(synth.steps(4096))   # → complex64 ndarray
one   = synth.step()                     # → a single complex64 sample
synth.reset()                            # restart the sequence (keeps config)

Also exported from doppler.wfm:

| Symbol | Use |
|---|---|
| Synth(type=…, …) | the waveform engine (above) |
| PN(poly, seed, length) | a raw LFSR / PN sequence object |
| bpsk_map(bits) / qpsk_map(syms) | map bits/symbol-indices → cf32 constellation points |
| wfm_awgn_amplitude(snr_db, signal_power) | AWGN amplitude for a target SNR over fs |
| wfm_ebno_to_snr_db(ebno_db, bits_per_symbol, samples_per_symbol) | Eb/No → over-fs SNR |

# A matched-filter QPSK constellation (the receiver view):
sym = np.asarray(Synth(type="qpsk", snr=20, snr_mode="esno", sps=8).steps(8*600))
pts = sym.reshape(-1, 8).mean(axis=1)    # boxcar matched filter per symbol

Reading a capture back

The raw container is interleaved I/Q in the chosen --sample_type, so a naive np.fromfile gets the layout (and, for integers, the scale) wrong. read_iq does the right thing — a zero-copy complex view for the float types, a SIMD rescale to ±1.0 for the integer types — or pass raw=True for the raw (N, 2) on-disk view:

from doppler.wfm.readback import read_iq

iq = read_iq("capture.iq", sample_type="ci16")   # → complex64, ±1.0
iq = read_iq("capture.iq", sample_type="cf32")   # → complex64, zero-copy

generate → read_iq is bit-faithful. See Type System → Reading interleaved I/Q.
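
To make the layout and scale concrete, here is a plain-NumPy sketch of what decoding a raw ci16 capture involves. It is not read_iq's implementation (which the text describes as zero-copy and SIMD-accelerated); decode_ci16 is a hypothetical name, and the 32767 divisor comes from the full-scale table in Amplitude & full-scale.

```python
import numpy as np

def decode_ci16(raw_bytes, byteorder="<"):
    """Hypothetical helper: interleaved int16 I/Q bytes -> complex64 at ±1.0."""
    # Each sample is two int16 codes: I then Q. "<" is little-endian (--endian le).
    codes = np.frombuffer(raw_bytes, dtype=byteorder + "i2").reshape(-1, 2)
    scaled = codes.astype(np.float32) / 32767.0     # undo the full-scale map
    return (scaled[:, 0] + 1j * scaled[:, 1]).astype(np.complex64)

# Two samples as they would sit on disk: (32767, 0) and (-16383, 16383).
raw = np.array([32767, 0, -16383, 16383], dtype="<i2").tobytes()
iq = decode_ci16(raw)
print(iq)
```

A naive np.fromfile(path, dtype=np.int16) would instead return four unscaled real numbers, which is the mistake the paragraph above warns about.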


Recipes

# A clean tone at +100 kHz, 1 Msample, 16-bit I/Q to a file
wfmgen --type tone --freq 1e5 --count 1000000 --sample_type ci16 -o tone.ci16

# Noisy BPSK at 6 dB Eb/No, as CSV for quick inspection
wfmgen --type bpsk --snr 6 --snr_mode ebno --count 2000 --file_type csv -o bpsk.csv

# A scenario: tone burst, gap, then QPSK — recorded for reproducibility
wfmgen --from-file scenario.json --record run.json --file_type blue -o scene.blue

# Stream continuous QPSK to ZMQ for a live receiver
wfmgen --type qpsk --snr 10 --continuous --output zmq://tcp://*:5555
