Module post_process


HTML post-processing: splice Aozora sentinels into rendered comrak HTML.

The afm pipeline runs comrak verbatim against the lexer’s normalized text. Comrak emits ordinary <p>...</p> paragraphs for the lines the lexer planted with PUA sentinels (U+E001..U+E004 are not in CommonMark’s HTML escape set, so they survive format_html verbatim). This module rewrites that HTML so each sentinel becomes its real Aozora HTML, while plain comrak output passes through unchanged.
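The survival claim above can be illustrated with a minimal sketch. It assumes the escape set is the four characters `&`, `<`, `>` and `"`; `escape_html_text` and `escaped` are hypothetical stand-ins, not comrak's own functions.

```rust
/// Hypothetical stand-in for an HTML text escaper (not comrak's API).
/// Only the four characters below are rewritten; everything else,
/// including the private-use sentinels U+E001..U+E004, is copied as-is.
fn escape_html_text(input: &str, out: &mut String) {
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
}

/// Convenience wrapper that returns a fresh String.
fn escaped(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    escape_html_text(input, &mut out);
    out
}
```

Because a sentinel falls through to the last match arm, it reaches the output byte-for-byte, which is what lets a later pass find it again.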

§Sentinel taxonomy

Sentinel              Source shape      comrak emits             We rewrite to
INLINE (U+E001)       inline |...《》  text inside a paragraph  aozora_render::render_node::render of the node
BLOCK_LEAF (U+E002)   leaf annotation   <p>U+E002</p>            render_node output (no surrounding <p>)
BLOCK_OPEN (U+E003)   container start   <p>U+E003</p>            render_node open-pass output
BLOCK_CLOSE (U+E004)  container end     <p>U+E004</p>            render_node close-pass output
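The taxonomy can be sketched as a small classifier. The code points come from the table; the `SentinelKind` enum and both function names are hypothetical and chosen only for illustration.

```rust
/// Hypothetical enum mirroring the four sentinel kinds in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SentinelKind {
    Inline,
    BlockLeaf,
    BlockOpen,
    BlockClose,
}

/// Map a character to its sentinel kind, if it is one.
fn classify(ch: char) -> Option<SentinelKind> {
    match ch {
        '\u{E001}' => Some(SentinelKind::Inline),
        '\u{E002}' => Some(SentinelKind::BlockLeaf),
        '\u{E003}' => Some(SentinelKind::BlockOpen),
        '\u{E004}' => Some(SentinelKind::BlockClose),
        _ => None,
    }
}

/// Given the text between `<p>` and `</p>`, report a block sentinel only
/// when the paragraph body is exactly one block-sentinel character.
fn block_sentinel(paragraph_body: &str) -> Option<SentinelKind> {
    let mut chars = paragraph_body.chars();
    let kind = classify(chars.next()?)?;
    if chars.next().is_some() || kind == SentinelKind::Inline {
        return None;
    }
    Some(kind)
}
```

A paragraph whose body is anything other than a lone block sentinel would be handled as ordinary text, possibly containing inline sentinels.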

§Paragraph-aware splice

Beyond the sentinel substitution above, two further cases are handled per paragraph:

  • Heading promotion — a paragraph carrying a HeadingHint inline sentinel ([#「X」は大見出し]) becomes <h{level}>{target}</h{level}>. Other Aozora sentinels in the same paragraph are consumed for registry lockstep but their HTML is dropped, since the heading body is the hint’s target field.
  • Stack-balanced container close — a BlockClose paragraph without a matching open is silently discarded so we don’t emit orphan </div> tags. This protects the Tier-D tag-balance invariant against pathological inputs.
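The stack-balanced close rule can be sketched with a depth counter. This is a simplified illustration, not the module's code: `BlockEvent` and `emit_balanced` are hypothetical, and the `<div>` tags stand in for whatever the open and close passes actually render.

```rust
/// Hypothetical event stream: one entry per paragraph seen in the HTML.
#[derive(Debug, Clone, Copy)]
enum BlockEvent<'a> {
    Open,
    Close,
    Html(&'a str),
}

/// Emit HTML for the events, dropping any close that has no matching open.
fn emit_balanced(events: &[BlockEvent<'_>]) -> String {
    let mut out = String::new();
    let mut depth = 0usize; // number of containers currently open
    for ev in events {
        match *ev {
            BlockEvent::Open => {
                depth += 1;
                out.push_str("<div>");
            }
            BlockEvent::Close if depth > 0 => {
                depth -= 1;
                out.push_str("</div>");
            }
            // Orphan close: nothing is open, so emit nothing.
            BlockEvent::Close => {}
            BlockEvent::Html(s) => out.push_str(s),
        }
    }
    out
}
```

For the sequence close, open, paragraph, close, close, only the middle open/close pair produces tags; the leading and trailing closes are discarded. The sketch does not address containers left open at end of input.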

§Order-based dispatch

aozora_pipeline writes sentinels into the normalized text in source order, and the registry tables are sorted by byte position by construction. comrak preserves text order across <p>...</p> boundaries, so the order in which we encounter sentinels in the rendered HTML matches the order of the corresponding registry entries. We therefore pre-flatten the registry into a Vec<NodeRef<'_>> ordered by source position and dispatch sequentially. No byte-position lookup is needed at HTML-rewrite time.
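A minimal sketch of sequential dispatch, covering inline sentinels only. `Node` is a hypothetical stand-in for `NodeRef<'_>` that carries pre-rendered HTML, and `splice_in_order` is an illustrative name rather than one of the module's functions.

```rust
/// Hypothetical stand-in for a registry entry; the real type is `NodeRef<'_>`.
struct Node<'a> {
    html: &'a str,
}

/// Replace the n-th inline sentinel in `html` with the n-th node's HTML.
/// `nodes` is assumed to be already ordered by source position.
fn splice_in_order(html: &str, nodes: &[Node<'_>]) -> String {
    let mut next = nodes.iter();
    let mut out = String::with_capacity(html.len());
    for ch in html.chars() {
        if ch == '\u{E001}' {
            // Take the next entry in order; no position lookup is needed.
            // In this sketch a sentinel with no remaining entry is dropped.
            if let Some(node) = next.next() {
                out.push_str(node.html);
            }
        } else {
            out.push(ch);
        }
    }
    out
}
```

The iterator is the only dispatch state: each sentinel advances it by one, so the approach depends entirely on the ordering guarantee described above.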

Structs§

SpliceState 🔒
StringSink 🔒
fmt::Write adapter over &mut String.
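An adapter of this kind might look like the following sketch. The tuple-struct layout and the `render_heading` helper are assumptions for illustration; only the name StringSink and its role come from the listing above.

```rust
use std::fmt::{self, Write};

/// Assumed shape: a newtype around a mutable String borrow.
struct StringSink<'a>(&'a mut String);

impl Write for StringSink<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Appending to a String cannot fail.
        self.0.push_str(s);
        Ok(())
    }
}

/// Hypothetical helper showing the adapter used with `write!`.
fn render_heading(level: u8, body: &str, out: &mut String) -> fmt::Result {
    let mut sink = StringSink(out);
    write!(sink, "<h{level}>{body}</h{level}>")
}
```

Writing through the sink appends to the caller's buffer rather than allocating a new String, so existing content in `out` is preserved.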

Functions§

balance_inline_tags_in_paragraphs 🔒
Per-paragraph inline-tag balancer.
consume_inline_sentinels 🔒
Consume every inline-sentinel registry entry that the paragraph covers. Used after a heading-hint rewrite to keep the dispatcher in lockstep without emitting any of the in-paragraph nodes.
heading_hint_in_paragraph 🔒
Peek the inline sentinels in this paragraph against the registry. If the first inline sentinel is a HeadingHint, return it.
process_paragraph 🔒
push_html_escaped 🔒
rebrand_aozora_classes_to_afm 🔒
Rewrite every aozora-* class token in class="..." attribute values to afm-*. Touches only class attributes — the brand on data-* attributes, on link targets, on text bodies, etc. is preserved verbatim.
render_node_into 🔒
splice_aozora_html 🔒
Splice every Aozora sentinel in comrak_html into its real HTML rendering, using the registry inside lex_out.
splice_inline_pass 🔒
splice_into 🔒
wrap_orphan_brackets_in_place 🔒
Find every [#…] in html that lives outside an HTML tag and outside an existing afm-annotation wrapper, and wrap it in a hidden <span class="afm-annotation" hidden>…</span>. The class name matches aozora-render’s annotation wrapper so test_support::strip_annotation_wrappers continues to recognise it, and the pass is idempotent: a second invocation finds the afm-annotation substring in the prefix and skips re-wrapping.
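The class rebranding described for rebrand_aozora_classes_to_afm can be sketched as below. This is a simplified illustration under stated assumptions, not the module's implementation: `rebrand_classes` is a hypothetical name, attribute values are assumed to be double-quoted, and class tokens are assumed to be separated by single spaces.

```rust
/// Sketch: rewrite `aozora-*` class tokens to `afm-*`, touching only the
/// values of `class="..."` attributes. Assumes double-quoted values and
/// space-separated tokens.
fn rebrand_classes(html: &str) -> String {
    const ATTR: &str = "class=\"";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(ATTR) {
        let value_start = start + ATTR.len();
        // Copy everything up to and including `class="` unchanged.
        out.push_str(&rest[..value_start]);
        // A real class attribute is preceded by whitespace; this keeps
        // attributes such as `data-class="..."` out of scope.
        let is_class_attr = rest[..start].ends_with(|c: char| c.is_ascii_whitespace());
        rest = &rest[value_start..];
        if !is_class_attr {
            continue;
        }
        // Unterminated value: stop and copy the remainder verbatim.
        let Some(end) = rest.find('"') else { break };
        let mut first = true;
        for token in rest[..end].split(' ') {
            if !first {
                out.push(' ');
            }
            first = false;
            match token.strip_prefix("aozora-") {
                Some(tail) => {
                    out.push_str("afm-");
                    out.push_str(tail);
                }
                None => out.push_str(token),
            }
        }
        rest = &rest[end..];
    }
    out.push_str(rest);
    out
}
```

In this sketch, `aozora-` text inside a `data-class` value or in an element body is left alone because it is never parsed as a class token.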