HTML post-processing: splice Aozora sentinels into rendered comrak HTML.
The afm pipeline runs comrak verbatim against the lexer’s normalized
text. Comrak emits ordinary <p>...</p> paragraphs for the lines
the lexer planted with PUA sentinels (U+E001..U+E004 are not in
CommonMark’s HTML escape set, so they survive format_html verbatim).
This module rewrites that HTML so each sentinel becomes its real
Aozora HTML, while plain comrak output passes through unchanged.
§Sentinel taxonomy
| Sentinel | Source shape | comrak emits | We rewrite to |
|---|---|---|---|
| INLINE (U+E001) | inline \|...《》 | text inside a paragraph | aozora_render::render_node::render of the node |
| BLOCK_LEAF (U+E002) | leaf annotation | <p>U+E002</p> | render_node output (no surrounding <p>) |
| BLOCK_OPEN (U+E003) | container start | <p>U+E003</p> | render_node open-pass output |
| BLOCK_CLOSE (U+E004) | container end | <p>U+E004</p> | render_node close-pass output |
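As an illustration of the taxonomy, the following minimal sketch classifies a sentinel code point and recognises a paragraph that holds nothing but a block sentinel. The names `SentinelKind`, `classify`, and `block_sentinel` are hypothetical and are not taken from the module; only the four code points and their roles come from the table.

```rust
/// Hypothetical enum mirroring the four sentinel rows of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SentinelKind {
    Inline,
    BlockLeaf,
    BlockOpen,
    BlockClose,
}

/// Map a character to its sentinel kind, if it is one of U+E001..U+E004.
fn classify(c: char) -> Option<SentinelKind> {
    match c {
        '\u{E001}' => Some(SentinelKind::Inline),
        '\u{E002}' => Some(SentinelKind::BlockLeaf),
        '\u{E003}' => Some(SentinelKind::BlockOpen),
        '\u{E004}' => Some(SentinelKind::BlockClose),
        _ => None,
    }
}

/// Return the block sentinel when `paragraph_html` is exactly
/// `<p>{sentinel}</p>`; inline sentinels and mixed content yield `None`.
fn block_sentinel(paragraph_html: &str) -> Option<SentinelKind> {
    let inner = paragraph_html.strip_prefix("<p>")?.strip_suffix("</p>")?;
    let mut chars = inner.chars();
    let kind = classify(chars.next()?)?;
    if chars.next().is_some() || kind == SentinelKind::Inline {
        return None;
    }
    Some(kind)
}
```

This sketch assumes comrak emits the block sentinel paragraph with no attributes or whitespace inside `<p>...</p>`, as the table suggests.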
§Paragraph-aware splice
Two cases beyond the sentinel-substitution above are handled per paragraph:
- Heading promotion: a paragraph carrying a HeadingHint inline sentinel ([#「X」は大見出し]) becomes <h{level}>{target}</h{level}>. Other Aozora sentinels in the same paragraph are consumed for registry lockstep but their HTML is dropped, since the heading body is the hint’s target field.
- Stack-balanced container close: a BlockClose paragraph without a matching open is silently discarded so we don’t emit orphan </div> tags. This protects the Tier-D tag-balance invariant against pathological inputs.
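The stack-balanced close rule can be sketched with a simple depth counter. This is a simplified illustration, not the module's code: `BlockEvent` and `render_containers` are hypothetical names, and a bare `<div>` stands in for whatever render_node emits on its open and close passes.

```rust
/// Hypothetical event type: one entry per block-sentinel paragraph.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockEvent {
    Open,
    Close,
}

/// Emit `<div>` / `</div>` for balanced events and drop orphan closes.
fn render_containers(events: &[BlockEvent]) -> String {
    let mut out = String::new();
    // Number of containers currently open.
    let mut depth: usize = 0;
    for event in events {
        match event {
            BlockEvent::Open => {
                depth += 1;
                out.push_str("<div>");
            }
            BlockEvent::Close if depth > 0 => {
                depth -= 1;
                out.push_str("</div>");
            }
            // Close without a matching open: discard silently.
            BlockEvent::Close => {}
        }
    }
    out
}
```

For example, the sequence Close, Open, Close, Close yields a single balanced `<div></div>`. The sketch does not close containers left open at end of input; whether the module does so is not stated here.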
§Order-based dispatch
aozora_pipeline writes sentinels into normalized in source order,
and the registry tables are sorted by byte position by
construction. comrak preserves text order across <p>...</p>
boundaries, so the order we encounter sentinels in the rendered
HTML matches the order of the corresponding registry entries.
We therefore pre-flatten the registry into an ordered
Vec<NodeRef<'_>> keyed by source position and dispatch
sequentially. No byte-position lookup is needed at HTML-rewrite
time.
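A minimal sketch of order-based dispatch, under simplifying assumptions: the pre-flattened registry is represented as a slice of already-rendered HTML strings rather than a Vec<NodeRef<'_>>, and `splice_in_order` is a hypothetical name. The n-th sentinel found in the HTML takes the n-th entry, with no position lookup.

```rust
/// Replace each sentinel character in `html` with the next entry of
/// `ordered`, which stands in for the pre-flattened registry.
fn splice_in_order(html: &str, ordered: &[&str]) -> String {
    let mut next = ordered.iter();
    let mut out = String::with_capacity(html.len());
    for c in html.chars() {
        if ('\u{E001}'..='\u{E004}').contains(&c) {
            // Sequential dispatch: consume one registry entry per sentinel.
            // Assumption of this sketch: a sentinel with no remaining
            // entry is dropped rather than reported.
            if let Some(rendered) = next.next() {
                out.push_str(rendered);
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Because both the sentinels and the entries are in source order, a single forward iterator is enough to keep them paired.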
Structs§
- SpliceState 🔒
- StringSink 🔒: fmt::Write adapter over &mut String.
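For readers unfamiliar with the pattern, here is a hedged sketch of what an fmt::Write adapter over &mut String could look like. The tuple-struct layout and the `render_heading` helper are assumptions made for illustration; the real StringSink definition is not shown on this page.

```rust
use std::fmt::{self, Write};

/// Hypothetical reconstruction: a thin wrapper that appends to a
/// borrowed String through the `fmt::Write` trait.
struct StringSink<'a>(&'a mut String);

impl Write for StringSink<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0.push_str(s);
        Ok(())
    }
}

/// Illustrative helper (not from the module): format a heading
/// directly into an existing output buffer via the sink.
fn render_heading(out: &mut String, level: u8, target: &str) -> fmt::Result {
    let mut sink = StringSink(out);
    write!(sink, "<h{level}>{target}</h{level}>")
}
```

Such an adapter lets code that is generic over fmt::Write append to the output buffer in place, without allocating an intermediate String per node.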
Functions§
- balance_inline_tags_in_paragraphs 🔒: Per-paragraph inline-tag balancer.
- consume_inline_sentinels 🔒: Consume every inline-sentinel registry entry that the paragraph covers. Used after a heading-hint rewrite to keep the dispatcher in lockstep without emitting any of the in-paragraph nodes.
- heading_hint_in_paragraph 🔒: Peek the inline sentinels in this paragraph against the registry. If the first inline sentinel is a HeadingHint, return it.
- process_paragraph 🔒
- push_html_escaped 🔒
- rebrand_aozora_classes_to_afm 🔒: Rewrite every aozora-* class token in class="..." attribute values to afm-*. Touches only class attributes: the brand on data-* attributes, on link targets, on text bodies, etc. is preserved verbatim.
- render_node_into 🔒
- splice_aozora_html 🔒: Splice every Aozora sentinel in comrak_html into its real HTML rendering, using the registry inside lex_out.
- splice_inline_pass 🔒
- splice_into 🔒
- wrap_orphan_brackets_in_place 🔒: Find every [#…] in html that lives outside an HTML tag and outside an existing afm-annotation wrapper, and wrap it in a hidden <span class="afm-annotation" hidden>…</span>. The class name matches aozora-render’s annotation wrapper so test_support::strip_annotation_wrappers continues to recognise it, and the pass is idempotent: a second invocation finds the afm-annotation substring in the prefix and skips re-wrapping.
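The class rebranding described in the list above can be sketched as a small string scan. This is an illustrative approximation, not the module's implementation: `rebrand_classes` is a hypothetical name, and the sketch assumes double-quoted attribute values, space-separated tokens, and a whitespace character immediately before `class=`.

```rust
/// Rewrite `aozora-*` tokens to `afm-*`, but only inside `class="..."`
/// attribute values. Everything else is copied through unchanged.
fn rebrand_classes(html: &str) -> String {
    const ATTR: &str = "class=\"";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(ATTR) {
        let value_start = start + ATTR.len();
        // Require whitespace before `class=` so that attributes such as
        // `data-class="..."` are not mistaken for the class attribute.
        let is_attr = rest[..start].ends_with(|c: char| c.is_ascii_whitespace());
        out.push_str(&rest[..value_start]);
        rest = &rest[value_start..];
        if !is_attr {
            continue;
        }
        // Unterminated value: leave the remainder untouched.
        let end = match rest.find('"') {
            Some(end) => end,
            None => break,
        };
        for (i, token) in rest[..end].split(' ').enumerate() {
            if i > 0 {
                out.push(' ');
            }
            match token.strip_prefix("aozora-") {
                Some(tail) => {
                    out.push_str("afm-");
                    out.push_str(tail);
                }
                None => out.push_str(token),
            }
        }
        rest = &rest[end..];
    }
    out.push_str(rest);
    out
}
```

In this sketch, `aozora-` occurrences in data-* attribute values and in text bodies are left alone because the scan only rewrites the span between `class="` and its closing quote.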