Shared primitives for both crate::post_process (HTML splicer)
and crate::ir (IR builder).
Both downstream consumers walk the same sentinel-position stream
produced by aozora-pipeline. They differ only in their emit
target (string buffer vs. typed tree), not in how they sequence
the registry. This module owns the sequencing primitives so the
two walkers stay in lockstep automatically.
Design notes:
- The fast `is_sentinel_char` check is a single subtract-and-compare on the codepoint (`ch as u32 - 0xE001 < 4`). It is cheaper than the `matches!` chain it replaces, which matters because every paragraph-text walk evaluates this predicate once per char.
- `flatten_registry_in_source_order` materialises the registry into a `Vec<NodeRef>` ordered by source-order traversal, since both walkers consume entries linearly. The `EytzingerMap` upstream is binary-search-friendly, but we never look up by position at HTML rewrite time; the order alone is sufficient.
- `sole_block_sentinel` walks a `&str` of paragraph-inner text without allocating; `paragraph_sole_block_sentinel` walks a comrak paragraph node directly with the same semantics, also allocation-free.
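The subtract-and-compare predicate from the first design note can be sketched as below. This is a minimal illustration, not the crate's actual source: the use of `wrapping_sub` is an assumption (the note writes a plain `-`, which would panic in debug builds for codepoints below `0xE001`), and `is_sentinel_char_matches` is a hypothetical comparison helper standing in for the replaced `matches!` chain.

```rust
/// Range-check form: one wrapping subtract and one unsigned compare.
/// `wrapping_sub` is an assumption; codepoints below U+E001 wrap to a
/// very large value and therefore fail the `< 4` test.
#[inline]
pub fn is_sentinel_char(ch: char) -> bool {
    (ch as u32).wrapping_sub(0xE001) < 4
}

/// Hypothetical `matches!` form, shown only for comparison.
#[inline]
pub fn is_sentinel_char_matches(ch: char) -> bool {
    matches!(ch, '\u{E001}' | '\u{E002}' | '\u{E003}' | '\u{E004}')
}
```

Both forms accept exactly `U+E001..=U+E004`; the range-check form simply folds the lower and upper bound tests into a single unsigned comparison.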
Structs
- `SentinelCursor`: Cursor over a flat sentinel-ordered slice of `NodeRef`.
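To illustrate how a flattened registry and a cursor over it might fit together, here is a rough sketch under stated assumptions. The `NodeRef` newtype, the `HashMap<usize, NodeRef>` registry keyed by byte offset (a stand-in for the upstream `EytzingerMap`), and the `peek` / `advance` methods are all hypothetical; the real types and signatures may differ. The sketch also iterates with `char_indices` rather than scanning raw bytes, purely for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `NodeRef`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeRef(pub u32);

fn is_sentinel_char(ch: char) -> bool {
    (ch as u32).wrapping_sub(0xE001) < 4
}

/// Collect registry entries in the order their sentinels appear in the text.
/// The registry type is an assumption: byte offset of the sentinel -> NodeRef.
pub fn flatten_registry_in_source_order(
    normalized: &str,
    registry: &HashMap<usize, NodeRef>,
) -> Vec<NodeRef> {
    normalized
        .char_indices()
        .filter(|&(_, ch)| is_sentinel_char(ch))
        .filter_map(|(pos, _)| registry.get(&pos).copied())
        .collect()
}

/// Linear cursor over the flattened slice; never searches by position.
pub struct SentinelCursor<'a> {
    entries: &'a [NodeRef],
    next: usize,
}

impl<'a> SentinelCursor<'a> {
    pub fn new(entries: &'a [NodeRef]) -> Self {
        Self { entries, next: 0 }
    }

    /// Look at the next entry without consuming it.
    pub fn peek(&self) -> Option<NodeRef> {
        self.entries.get(self.next).copied()
    }

    /// Consume and return the next entry, if any.
    pub fn advance(&mut self) -> Option<NodeRef> {
        let entry = self.peek()?;
        self.next += 1;
        Some(entry)
    }
}
```

Because both walkers meet sentinels in the same order as the text, each one only needs to call `advance` once per sentinel it encounters; no positional lookup is required.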
Enums
- `BlockSentinelKind`: Which paired sentinel a block-sentinel paragraph carries.
- `ControlFlow`: Result of a Text-node walk over a comrak block.
Functions
- `flatten_registry_in_source_order`: Walk `lex_out.normalized` byte-by-byte; for every PUA sentinel, query the registry and append the resulting `NodeRef` to a freshly-allocated `Vec` in source order.
- `for_each_text_descendant`: Visit every `Text` descendant of `node` left-to-right, calling `visit` on each `&str` slice. Unlike `walk_text_only_descendants`, this descends into emphasis / strong / link / code subtrees and ignores their wrappers; we only care about the leaf text. Used to count sentinels and peek the registry for paragraph-level dispatch.
- `is_sentinel_char`: True iff `ch` is one of the four PUA sentinel codepoints `U+E001..=U+E004`.
- `paragraph_sole_block_sentinel`: Allocation-free analogue of `sole_block_sentinel` that walks a comrak paragraph node directly.
- `sole_block_sentinel`: If `inner` consists of exactly one block-sentinel character (optionally surrounded by ASCII whitespace), return its kind. Inline sentinels never qualify.
- `visit_text_inner`
- `walk_text_only_descendants`: Visit every `Text` descendant of `node` left-to-right. Returns `true` iff the entire subtree consists of `Text` leaves only (no Strong / Emph / Link / Code / etc. nodes), AND the visitor closure asked to continue at every step. Used to validate the "sole block sentinel paragraph" invariant cheaply.
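The `sole_block_sentinel` contract (exactly one block sentinel, optional ASCII whitespace around it, no allocation) might look roughly like the sketch below. The page does not say which of the four codepoints are block sentinels or what the `BlockSentinelKind` variants are called, so the `Open` / `Close` variants and the mapping of `U+E003` / `U+E004` to them are purely illustrative assumptions.

```rust
/// Hypothetical variants; the real enum may name these differently.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BlockSentinelKind {
    Open,
    Close,
}

/// ASSUMPTION: U+E003 / U+E004 are the block pair and U+E001 / U+E002
/// are inline sentinels. The actual assignment is not stated in the source.
fn block_kind(ch: char) -> Option<BlockSentinelKind> {
    match ch {
        '\u{E003}' => Some(BlockSentinelKind::Open),
        '\u{E004}' => Some(BlockSentinelKind::Close),
        _ => None,
    }
}

/// Returns the kind only when `inner` is a single block sentinel,
/// optionally surrounded by ASCII whitespace. Works on borrowed slices
/// only, so it does not allocate.
pub fn sole_block_sentinel(inner: &str) -> Option<BlockSentinelKind> {
    let trimmed = inner.trim_matches(|c: char| c.is_ascii_whitespace());
    let mut chars = trimmed.chars();
    let kind = block_kind(chars.next()?)?;
    if chars.next().is_some() {
        return None; // anything besides the one sentinel disqualifies
    }
    Some(kind)
}
```

Under these assumptions, a paragraph whose inner text is `" \u{E003}\n"` qualifies, while one containing an inline sentinel, extra text, or two sentinels does not.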