Testing strategy
aozora targets 100% C1 (branch) coverage, but coverage is the floor, not the ceiling. Every invariant is asserted from multiple angles so that a single missed test path cannot silently hide a regression.
The five test layers
flowchart TD
A["1. Spec cases<br/>(spec/aozora/cases/*.json)"]
B["2. Property tests<br/>(crates/*/tests/property_*.rs)"]
C["3. Corpus sweep<br/>(every Aozora Bunko work)"]
D["4. Fuzz harness<br/>(cargo-fuzz)"]
E["5. Sanitizers<br/>(Miri / TSan / ASan)"]
A --> B --> C --> D --> E
Each layer catches a different kind of bug:
| Layer | Catches |
|---|---|
| Spec cases | Per-feature contract regressions (the (input, html, serialise) triple). |
| Property tests | Invariant violations in the space of inputs (round-trip, escape-safety, span well-formedness). |
| Corpus sweep | Real-world distribution effects the property generator missed. |
| Fuzz | Latent panics on adversarial inputs the corpus doesn’t contain. |
| Sanitizers | UB / data race / heap-corruption issues the language can’t catch. |
When you add a new invariant, land all five touchpoints in the same PR, or split them into a chain of PRs that explicitly references the invariant.
Layer 1: spec cases
spec/aozora/cases/
├── ruby-nested-gaiji.json
├── emphasis-bouten.json
├── emphasis-double-ruby.json
├── kunten-kaeriten.json
├── page-break.json
└── …
Each case pins a (input, html, serialise) triple:
{
"input": "|青梅《おうめ》",
"html": "<ruby>青梅<rt>おうめ</rt></ruby>",
"serialise": "|青梅《おうめ》"
}
The unit test runner (cargo nextest run -p aozora-render) loads
every case, parses, renders, serialises, and compares against the
pinned strings. The property harness also uses these cases as
seed inputs for shrinking.
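The load, parse, render, serialise, compare loop described above can be sketched as follows. This is a minimal illustration, not the project's runner: `SpecCase`, `Ruby`, `parse`, `to_html`, `serialise`, and `check` are hypothetical names, the toy parser understands only the single `|base《reading》` form from the example case, and the case is built inline rather than loaded from JSON.

```rust
/// One pinned spec case: the (input, html, serialise) triple.
/// Hypothetical type; the real runner loads these from JSON files.
struct SpecCase {
    input: &'static str,
    html: &'static str,
    serialise: &'static str,
}

/// Hypothetical stand-in for the parser's output: one explicit ruby span.
#[derive(Debug, PartialEq)]
struct Ruby {
    base: String,
    reading: String,
}

/// Toy parser: accepts exactly `|base《reading》` and nothing else.
fn parse(src: &str) -> Option<Ruby> {
    let rest = src.strip_prefix('|')?;
    let (base, tail) = rest.split_once('《')?;
    let reading = tail.strip_suffix('》')?;
    Some(Ruby {
        base: base.to_string(),
        reading: reading.to_string(),
    })
}

/// Renders the ruby span as HTML.
fn to_html(r: &Ruby) -> String {
    format!("<ruby>{}<rt>{}</rt></ruby>", r.base, r.reading)
}

/// Writes the ruby span back out in source notation.
fn serialise(r: &Ruby) -> String {
    format!("|{}《{}》", r.base, r.reading)
}

/// Runs one case and reports the first mismatch, if any.
fn check(case: &SpecCase) -> Result<(), String> {
    let doc = parse(case.input).ok_or_else(|| format!("parse failed: {}", case.input))?;
    let html = to_html(&doc);
    if html != case.html {
        return Err(format!("html mismatch: got {html}, want {}", case.html));
    }
    let ser = serialise(&doc);
    if ser != case.serialise {
        return Err(format!("serialise mismatch: got {ser}, want {}", case.serialise));
    }
    Ok(())
}
```

Returning the first mismatch as an `Err` string, rather than panicking, lets a runner collect failures across all cases before reporting.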
The flagship in-tree fixture lives at
spec/aozora/fixtures/56656/ — the Japanese translation of Crime
and Punishment (Aozora Bunko card 56656). It exercises 1000+ ruby
annotations, forward-reference bouten, JIS X 0213 gaiji, and
accent decomposition edge cases.
Layer 2: property tests
proptest generators in
crates/aozora-proptest drive parse / render / round-trip
invariants. The default is 128 cases per proptest! block (CI budget);
just prop-deep runs 4096 per block (release-cut budget).
just prop # 128 cases
just prop-deep # 4096 cases
AOZORA_PROPTEST_CASES=10000 cargo nextest run --workspace --test 'property_*'
Why proptest over quickcheck:
- Proptest’s shrinker is structural (reduces by the generator’s ops), so a counterexample collapses to a minimal reproduction that still fails. Quickcheck shrinks per-type, which produces noisier outputs.
- Proptest persists failure seeds to proptest-regressions/, so every reproduced failure becomes a permanent regression test. Quickcheck has no built-in equivalent.
Why a separate generator crate (aozora-proptest):
The generators are non-trivial (they have to produce valid 青空文庫 source — random byte streams would just stress the parser’s error path, which the fuzz harness already covers). Centralising them means every property test in every crate gets the same generator quality, and the generator itself can be unit-tested.
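To make the shape of an invariant check concrete, here is a minimal std-only sketch of an escape-safety and round-trip property. It deliberately does not use proptest: a hand-rolled xorshift generator stands in for a proptest strategy, so there is no shrinking and no seed persistence. `Rng`, `ALPHABET`, `gen_text`, `escape_html`, `unescape_html`, and `check_escape_properties` are all hypothetical names, not the project's API.

```rust
/// Tiny xorshift PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Characters the generator draws from: HTML-significant ones plus plain text.
const ALPHABET: [char; 8] = ['<', '>', '&', '"', 'a', '青', '梅', ' '];

/// Generates a string of 0..16 characters from `ALPHABET`.
fn gen_text(rng: &mut Rng) -> String {
    let len = (rng.next() % 16) as usize;
    (0..len)
        .map(|_| ALPHABET[(rng.next() % ALPHABET.len() as u64) as usize])
        .collect()
}

/// Escapes the four HTML-significant characters.
fn escape_html(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Inverse of `escape_html`. `&amp;` is decoded last so that an escaped
/// `&lt;` (which is `&amp;lt;`) comes back as `&lt;`, not `<`.
fn unescape_html(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&")
}

/// Checks two invariants over `cases` generated inputs and returns the
/// first counterexample (unshrunk) on failure.
fn check_escape_properties(seed: u64, cases: usize) -> Result<(), String> {
    let mut rng = Rng(seed);
    for _ in 0..cases {
        let text = gen_text(&mut rng);
        let escaped = escape_html(&text);
        // Escape-safety: no raw markup characters survive.
        if escaped.contains(|c: char| matches!(c, '<' | '>' | '"')) {
            return Err(text);
        }
        // Round-trip: unescaping restores the original text.
        if unescape_html(&escaped) != text {
            return Err(text);
        }
    }
    Ok(())
}
```

With proptest, the generator would be a strategy and the loop body would live inside a `proptest!` block; the counterexample would then be shrunk and its seed persisted rather than returned raw.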
Layer 3: corpus sweep
export AOZORA_CORPUS_ROOT=$HOME/aozora-corpus
just corpus-sweep
The sweep walks every .txt under $AOZORA_CORPUS_ROOT, parses it, and
verifies that the round-trip property holds and that nothing panics.
About 17 000 works are in active rotation; a sweep takes about 90 s
on a modern x86_64 desktop using the parallel loader.
The sweep catches what the property generator can’t — every weird real-world idiom the maintained corpus has accumulated over 25 years of volunteer encoding choices. It’s the parser’s truth-from-the-field.
See Performance → Corpus sweeps for the corpus structure, archive format, and parallel loader details.
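A sequential, std-only sketch of the walk-and-check loop, assuming a caller-supplied `check` closure in place of the real parse and round-trip verification. `collect_txt`, `SweepReport`, and `sweep` are hypothetical names; the actual sweep uses the parallel loader, which this sketch does not attempt to reproduce.

```rust
use std::fs;
use std::io;
use std::panic::{self, AssertUnwindSafe};
use std::path::{Path, PathBuf};

/// Recursively collects every `.txt` file under `root`.
fn collect_txt(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_txt(&path, out)?;
        } else if path.extension().map_or(false, |ext| ext == "txt") {
            out.push(path);
        }
    }
    Ok(())
}

/// Outcome of a sweep: how many files passed and which ones failed.
struct SweepReport {
    passed: usize,
    failed: Vec<PathBuf>,
}

/// Runs `check` on the raw bytes of every `.txt` file under `root`.
/// A `false` result or a panic inside `check` both count as a failure,
/// so one bad work does not abort the whole sweep.
fn sweep(root: &Path, check: impl Fn(&[u8]) -> bool) -> io::Result<SweepReport> {
    let mut files = Vec::new();
    collect_txt(root, &mut files)?;
    files.sort(); // deterministic order makes failures easier to compare

    let mut report = SweepReport { passed: 0, failed: Vec::new() };
    for path in files {
        let bytes = fs::read(&path)?;
        let ok = panic::catch_unwind(AssertUnwindSafe(|| check(&bytes))).unwrap_or(false);
        if ok {
            report.passed += 1;
        } else {
            report.failed.push(path);
        }
    }
    Ok(report)
}
```

The files are read as raw bytes rather than as UTF-8 strings so that decoding stays inside `check`, where a decoding failure is counted like any other.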
Layer 4: fuzz
just fuzz parse_render -- -runs=10000
Targets under crates/*/fuzz/fuzz_targets/:
- parse_render: feed arbitrary bytes through Document::new ∘ to_html.
- serialize_roundtrip: parse ∘ serialize ∘ parse stability.
- sjis_decode: aozora_encoding::sjis::decode_to_string on arbitrary byte streams.
Fuzz failures auto-shrink to a minimal byte sequence and land in
crates/<crate>/fuzz/artifacts/. Add the failing input to
spec/aozora/cases/ as a regression case after diagnosing.
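The body of a round-trip stability target might look like the sketch below. It is std-only and uses a toy lossless parser, not the real one: `Node`, `try_ruby`, `parse`, `serialise`, and `fuzz_one` are hypothetical names. Under cargo-fuzz, the body would be wrapped as `libfuzzer_sys::fuzz_target!(|data: &[u8]| { fuzz_one(data); });`.

```rust
/// Toy document model: plain text plus explicit ruby spans.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Ruby { base: String, reading: String },
}

/// Tries to read `|base《reading》` at the start of `s`.
/// Returns the node and the number of bytes consumed.
fn try_ruby(s: &str) -> Option<(Node, usize)> {
    let body = s.strip_prefix('|')?;
    let (base, tail) = body.split_once('《')?;
    let (reading, _) = tail.split_once('》')?;
    // Both parts must be non-empty and free of ruby delimiters.
    let clean = |t: &str| !t.is_empty() && !t.contains(|c: char| matches!(c, '|' | '《' | '》'));
    if !clean(base) || !clean(reading) {
        return None;
    }
    let consumed =
        '|'.len_utf8() + base.len() + '《'.len_utf8() + reading.len() + '》'.len_utf8();
    let node = Node::Ruby { base: base.to_string(), reading: reading.to_string() };
    Some((node, consumed))
}

/// Lossless toy parser: anything that is not a well-formed ruby span
/// is kept verbatim as text, so no input is rejected.
fn parse(src: &str) -> Vec<Node> {
    let mut nodes = Vec::new();
    let mut text = String::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        if c == '|' {
            if let Some((ruby, consumed)) = try_ruby(rest) {
                if !text.is_empty() {
                    nodes.push(Node::Text(std::mem::take(&mut text)));
                }
                nodes.push(ruby);
                rest = &rest[consumed..];
                continue;
            }
        }
        text.push(c);
        rest = &rest[c.len_utf8()..];
    }
    if !text.is_empty() {
        nodes.push(Node::Text(text));
    }
    nodes
}

/// Writes the nodes back out in source notation.
fn serialise(nodes: &[Node]) -> String {
    let mut out = String::new();
    for node in nodes {
        match node {
            Node::Text(t) => out.push_str(t),
            Node::Ruby { base, reading } => {
                out.push('|');
                out.push_str(base);
                out.push('《');
                out.push_str(reading);
                out.push('》');
            }
        }
    }
    out
}

/// Fuzz body: parse ∘ serialise ∘ parse must equal parse, and must not panic.
fn fuzz_one(data: &[u8]) {
    let text = String::from_utf8_lossy(data);
    let first = parse(&text);
    let second = parse(&serialise(&first));
    assert_eq!(first, second, "parse ∘ serialise ∘ parse is not stable");
}
```

Keeping the body in a plain function such as `fuzz_one` means a crashing input from the artifacts directory can be replayed from an ordinary unit test without the fuzzer.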
Why libFuzzer / cargo-fuzz:
Mainstream Rust fuzzing runs on libFuzzer via cargo-fuzz; it has
the broadest crate-ecosystem support (most upstream crates ship
fuzz targets), the corpus-management tooling is mature, and the
crash artefacts are diff-able with git diff.
Layer 5: sanitizers
bash scripts/sanitizers.sh miri # UB on FFI / scan intrinsics
bash scripts/sanitizers.sh tsan # data races (parallel corpus loader)
bash scripts/sanitizers.sh asan # heap correctness
Sanitizer runs are slow (roughly 10× under Miri), so they don’t run on every PR. They run nightly via the dev-image cron in CI and again at release cut. The slow path catches the slow class of bugs.
Why all three:
- Miri catches undefined behaviour the compiler couldn’t see (out-of-bounds slice access, dangling references, transmute mismatches). The FFI driver and the SIMD scanner have unsafe surfaces; Miri is the only fully-checked oracle for them.
- TSan catches race conditions in the parallel corpus loader. We use rayon correctly as far as we know, but TSan is the backstop.
- ASan catches the small set of heap-correctness bugs that get through Miri (typically C-side issues in the FFI smoke test).
Coverage measurement
just coverage # cargo llvm-cov region coverage; CI gate
just coverage-html # local HTML report at coverage/html/index.html
just coverage-branch # nightly toolchain, branch-coverage detail
cargo llvm-cov over tarpaulin: tarpaulin’s ptrace-based
instrumentation is x86_64-linux only and misses some
optimised-out branches. llvm-cov uses LLVM’s source-based
coverage instrumentation, which works on every target and gives
accurate branch numbers.
The CI gate is region coverage; branch coverage is informational (it requires the nightly compiler, which the workspace doesn’t pin on the hot path).
Test naming and structure
- Unit tests in mod tests {} at the bottom of each module.
- Integration tests in crates/<crate>/tests/. One file per area (e.g. tests/lexer_phase0.rs, tests/lexer_phase3.rs).
- Property tests prefixed property_ (the prop recipe globs on this).
- Doc tests inside ```rust blocks in rustdoc comments. CI runs just test-doc separately because nextest skips them.
Snapshot testing
Where the output is a multi-line string that’s tedious to inline
(rendered HTML, diagnostic-formatted text), we use
insta:
insta::assert_snapshot!(tree.to_html());
The first run writes tests/snapshots/<test>.snap; subsequent runs
compare against it. Updates happen via cargo insta review (the
interactive UI inside the dev container), never by manually editing
the .snap file.
See also
- Development loop: just test and friends.
- Performance → Corpus sweeps: how layer 3, the corpus sweep, works in practice.