
Module sentinels


Shared primitives for both crate::post_process (HTML splicer) and crate::ir (IR builder).

Both downstream consumers walk the same sentinel-position stream produced by aozora-pipeline. They differ only in their emit target (string buffer vs. typed tree), not in how they sequence the registry. This module owns the sequencing primitives so the two walkers stay in lockstep automatically.

Design notes:

  • The fast is_sentinel_char check is a single subtract-and-compare on the codepoint ((ch as u32).wrapping_sub(0xE001) < 4). It replaces the earlier matches! chain because this predicate sits on a hot path: every paragraph-text walk evaluates it once per char.
  • flatten_registry_in_source_order materialises the registry into a Vec<NodeRef> keyed by source-order traversal, since both walkers consume entries linearly. The EytzingerMap upstream is binary-search-friendly, but we never look up by position at HTML rewrite time; the order alone is sufficient.
  • sole_block_sentinel walks a &str of paragraph-inner text without allocating; paragraph_sole_block_sentinel walks a comrak paragraph node directly with the same semantics, also allocation-free.
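The subtract-and-compare check from the first design note can be sketched as follows. This is a minimal illustration, not the crate's source: the function bodies are assumptions, and `is_sentinel_char_matches` is a hypothetical name for the `matches!` form being replaced. The sketch uses `wrapping_sub` so that codepoints below U+E001 wrap to a large value instead of overflowing.

```rust
/// Sketch of the range predicate: true iff `ch` is in U+E001..=U+E004.
#[inline]
fn is_sentinel_char(ch: char) -> bool {
    // Codepoints below 0xE001 wrap around to a very large u32, so one
    // unsigned comparison rejects values on both sides of the range.
    (ch as u32).wrapping_sub(0xE001) < 4
}

/// Hypothetical readable equivalent, used here only to cross-check.
fn is_sentinel_char_matches(ch: char) -> bool {
    matches!(ch, '\u{E001}' | '\u{E002}' | '\u{E003}' | '\u{E004}')
}

fn main() {
    // Both forms agree on boundary values and on ordinary text.
    for ch in ['a', '\u{E000}', '\u{E001}', '\u{E004}', '\u{E005}', '青'] {
        assert_eq!(is_sentinel_char(ch), is_sentinel_char_matches(ch));
    }

    // Counting sentinels in a paragraph touches the predicate once per char.
    let text = "前\u{E001}後\u{E003}";
    let count = text.chars().filter(|&c| is_sentinel_char(c)).count();
    assert_eq!(count, 2);
}
```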

Structs

SentinelCursor 🔒
Cursor over a flat sentinel-ordered slice of [NodeRef].
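A cursor of this kind can be as small as a borrowed slice plus an index. The sketch below is a guess at the shape, not the crate's definition: the field names, the `peek` / `advance` / `remaining` methods, and the `NodeRef(u32)` placeholder are all hypothetical.

```rust
/// Placeholder for the crate's node handle; the real `NodeRef` is not shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeRef(u32);

/// Hypothetical cursor: a borrowed slice plus the index of the next unread entry.
struct SentinelCursor<'a> {
    entries: &'a [NodeRef],
    next: usize,
}

impl<'a> SentinelCursor<'a> {
    fn new(entries: &'a [NodeRef]) -> Self {
        Self { entries, next: 0 }
    }

    /// Look at the entry for the next sentinel without consuming it.
    fn peek(&self) -> Option<NodeRef> {
        self.entries.get(self.next).copied()
    }

    /// Consume the entry that belongs to the sentinel just encountered.
    fn advance(&mut self) -> Option<NodeRef> {
        let entry = self.peek()?;
        self.next += 1;
        Some(entry)
    }

    /// Number of entries not yet consumed.
    fn remaining(&self) -> usize {
        self.entries.len() - self.next
    }
}

fn main() {
    let entries = [NodeRef(7), NodeRef(9)];
    let mut cursor = SentinelCursor::new(&entries);

    // A walker pulls one entry per sentinel character, in text order.
    let text = "見出し\u{E003}本文\u{E001}";
    let mut seen = Vec::new();
    for ch in text.chars() {
        if (ch as u32).wrapping_sub(0xE001) < 4 {
            seen.extend(cursor.advance());
        }
    }

    assert_eq!(seen, vec![NodeRef(7), NodeRef(9)]);
    assert_eq!(cursor.remaining(), 0);
}
```

Because both walkers would advance such a cursor once per sentinel they meet, they consume the registry in the same order regardless of what they emit.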

Enums

BlockSentinelKind 🔒
Which paired sentinel a block-sentinel paragraph carries.
ControlFlow 🔒
Result of a Text-node walk over a comrak block.

Functions

flatten_registry_in_source_order 🔒
Walk lex_out.normalized byte-by-byte; for every PUA sentinel, query the registry and append the resulting [NodeRef] to a freshly-allocated Vec in source order.
for_each_text_descendant 🔒
Visit every Text descendant of node left-to-right, calling visit on each &str slice. Unlike walk_text_only_descendants, this descends into emphasis / strong / link / code subtrees and ignores their wrappers, since only the leaf text matters. Used to count sentinels and peek the registry for paragraph-level dispatch.
is_sentinel_char 🔒
True iff ch is one of the four PUA sentinel codepoints U+E001..=U+E004.
paragraph_sole_block_sentinel 🔒
Allocation-free analogue of sole_block_sentinel that walks a comrak paragraph node directly.
sole_block_sentinel 🔒
If inner consists of exactly one block-sentinel character (optionally surrounded by ASCII whitespace), return its kind. Inline sentinels never qualify.
visit_text_inner 🔒
walk_text_only_descendants 🔒
Visit every Text descendant of node left-to-right. Returns true iff the entire subtree consists of Text leaves only (no Strong / Emph / Link / Code / etc. nodes) and the visitor closure asked to continue at every step. Used to validate the “sole block sentinel paragraph” invariant cheaply.
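The flattening step described for flatten_registry_in_source_order can be illustrated with a small sketch. Everything beyond the function name is an assumption: the registry is modelled as a `HashMap` from byte offset to a placeholder `NodeRef(u32)` rather than the upstream EytzingerMap, the function takes the normalized text directly instead of a `lex_out` value, and the walk iterates by `char` for simplicity where the description above says byte-by-byte.

```rust
use std::collections::HashMap;

/// Placeholder for the crate's node handle; the real `NodeRef` is not shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeRef(u32);

fn is_sentinel_char(ch: char) -> bool {
    (ch as u32).wrapping_sub(0xE001) < 4
}

/// Hypothetical signature: the registry is keyed by the byte offset of each
/// sentinel in the normalized text.
fn flatten_registry_in_source_order(
    normalized: &str,
    registry: &HashMap<usize, NodeRef>,
) -> Vec<NodeRef> {
    normalized
        .char_indices()
        // Only sentinel positions are looked up in the registry.
        .filter(|&(_, ch)| is_sentinel_char(ch))
        // The text walk, not the map, determines the output order.
        .filter_map(|(pos, _)| registry.get(&pos).copied())
        .collect()
}

fn main() {
    // 'a' is at byte 0, U+E001 at byte 1 (3 bytes long), 'b' at 4, U+E003 at 5.
    let normalized = "a\u{E001}b\u{E003}";

    let mut registry = HashMap::new();
    registry.insert(5, NodeRef(20)); // inserted out of order on purpose
    registry.insert(1, NodeRef(10));

    let flat = flatten_registry_in_source_order(normalized, &registry);
    assert_eq!(flat, vec![NodeRef(10), NodeRef(20)]);
}
```

Once the entries are in a flat Vec, the walkers can consume them linearly with no position lookups, which is the property the design notes rely on.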