Indent & align containers (字下げ)

Aozora Bunko uses paired [#ここから…] / [#ここで…終わり] brackets to delimit blocks of text with custom layout. The five families:

| Family | Opener | Closer | Effect |
| --- | --- | --- | --- |
| 字下げ (indent) | [#ここから2字下げ] | [#ここで字下げ終わり] | Indent every line by N full-width chars |
| 地付き (right-flush) | [#ここから地付き] | [#ここで地付き終わり] | Flush right (vertical: 地 = ground = bottom) |
| 地寄せ (right-align with margin) | [#ここから2字下げ、地寄せ] | [#ここで字下げ終わり] | Right-align with N-char inset |
| 字詰め (line-length) | [#ここから30字詰め] | [#ここで字詰め終わり] | Force a line length of N chars |
| 中央揃え | [#ここから中央揃え] | [#ここで中央揃え終わり] | Centre each line |

aozora parses every variant; the HTML renderer wraps each one in a <div> whose class names the layout, such as aozora-indent-N (N being the indent width) or aozora-align-end.
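As a rough illustration of that mapping, the sketch below derives the wrapper class from a layout family. The `Family` enum and the `wrapper_class` / `wrap` helpers are hypothetical names invented for this example, not part of the aozora API, and only the two class names mentioned above are covered.

```rust
/// Hypothetical stand-in for two of the layout families in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Family {
    /// 字下げ with its width in full-width characters.
    Indent(u8),
    /// 地付き.
    AlignEnd,
}

/// Returns the CSS class for the wrapper `<div>`.
/// Only the class names named in the text are produced here.
pub fn wrapper_class(family: Family) -> String {
    match family {
        Family::Indent(n) => format!("aozora-indent-{n}"),
        Family::AlignEnd => "aozora-align-end".to_string(),
    }
}

/// Wraps already-rendered inner HTML in the container `<div>`.
/// No HTML escaping is done; this is a sketch only.
pub fn wrap(family: Family, inner: &str) -> String {
    format!("<div class=\"{}\">{}</div>", wrapper_class(family), inner)
}
```

Calling `wrap(Family::AlignEnd, "平和への誓い")` yields a single `aozora-align-end` div, and `wrapper_class(Family::Indent(2))` yields `aozora-indent-2`.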

Single-line forms

Some directives apply only to the next single line and don’t need a closer:

 [#地付き]平和への誓い

Renders as:

<div class="aozora-align-end">平和への誓い</div>
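A minimal sketch of how a single-line directive could be recognised and rendered, assuming the directive sits at the start of the line after optional leading whitespace. `render_single_line` is a hypothetical helper that handles only `[#地付き]`; the real lexer is not shown in this chapter.

```rust
/// Hypothetical helper: recognises only the `[#地付き]` single-line directive.
/// Returns `None` when the line does not start with that directive.
pub fn render_single_line(line: &str) -> Option<String> {
    const DIRECTIVE: &str = "[#地付き]";
    // Leading whitespace before the directive is ignored (assumption).
    let body = line.trim_start().strip_prefix(DIRECTIVE)?;
    // The directive applies to the rest of this line only, so no closer is needed.
    Some(format!("<div class=\"aozora-align-end\">{body}</div>"))
}
```

A line without the directive returns `None`, leaving it to whatever handles ordinary text.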

AST shape

pub struct Container<'src> {
    pub kind:    ContainerKind,
    pub indent:  Option<u8>,      // 字 count for indent variants
    pub form:    ContainerForm,   // SingleLine | Block
    pub children: &'src [AozoraNode<'src>],
    pub span:    Span,
}

pub enum ContainerKind {
    Indent,
    AlignEnd,
    AlignEndWithIndent,
    LineLength,
    Centre,
    /// Composite: indent + align-end on a single block.
    Composite { indent: u8, align: ContainerAlign },
    /// Bouten / 縦中横 / 鎖線 / 罫囲み container forms.
    Emphasis(EmphasisKind),
    /// Spec-listed but not present in maintained corpus.
    Unknown,
}

Why a small flat enum?

ContainerKind is closed by spec. A flat enum (vs. a trait object or string tag) gives the parser O(1) variant dispatch in the lexer’s classify phase and the renderer’s HTML walk, and lets the compiler’s match exhaustiveness check enforce that every variant has a render path, provided the match has no wildcard arm.

The Composite variant is the one place we don’t extend the enum horizontally. Giving every indent+align combination its own variant would explode the enum to ~30 variants, most of which never appear in the real corpus. A struct-like variant holding a sub-enum keeps the variant count small while staying matchable.
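The sketch below shows what "matchable" means in practice: a match with no wildcard arm that also destructures Composite. The variants of `ContainerAlign` and `EmphasisKind` are assumptions made up for this example (the chapter does not list them), and `label` is a hypothetical function.

```rust
// Hypothetical sub-enums; their real variants are not shown in this chapter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerAlign { End, Centre }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmphasisKind { Bouten, TateChuYoko, Sasen, Keigakomi }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerKind {
    Indent,
    AlignEnd,
    AlignEndWithIndent,
    LineLength,
    Centre,
    Composite { indent: u8, align: ContainerAlign },
    Emphasis(EmphasisKind),
    Unknown,
}

/// No `_ =>` arm: adding a variant to `ContainerKind` makes this a compile error
/// until the new variant is handled.
pub fn label(kind: ContainerKind) -> String {
    match kind {
        ContainerKind::Indent => "indent".to_string(),
        ContainerKind::AlignEnd => "align-end".to_string(),
        ContainerKind::AlignEndWithIndent => "align-end-with-indent".to_string(),
        ContainerKind::LineLength => "line-length".to_string(),
        ContainerKind::Centre => "centre".to_string(),
        // The composite payload is destructured like any other pattern.
        ContainerKind::Composite { indent, align } => {
            format!("composite(indent={indent}, align={align:?})")
        }
        ContainerKind::Emphasis(e) => format!("emphasis({e:?})"),
        ContainerKind::Unknown => "unknown".to_string(),
    }
}
```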

large_enum_variant clippy lint: ContainerKind::Composite is the largest variant at 4 bytes; the others are ≤ 2 bytes. The variant data is small enough that boxing would add a pointer chase for no real layout win; see the [workspace.lints.clippy] large_enum_variant = "allow" carve-out in Cargo.toml.
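One way to sanity-check a size claim like this is `std::mem::size_of`. The sketch below uses stand-in sub-enums with invented variants, so the number it reports is for these stand-ins and may differ from the real type; the two helper functions are hypothetical.

```rust
use std::mem::size_of;

// Stand-in sub-enums with invented variants (assumption).
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub enum ContainerAlign { End, Centre }

#[allow(dead_code)]
#[derive(Clone, Copy)]
pub enum EmphasisKind { Bouten, TateChuYoko, Sasen, Keigakomi }

#[allow(dead_code)]
#[derive(Clone, Copy)]
pub enum ContainerKind {
    Indent,
    AlignEnd,
    AlignEndWithIndent,
    LineLength,
    Centre,
    Composite { indent: u8, align: ContainerAlign },
    Emphasis(EmphasisKind),
    Unknown,
}

/// Size in bytes of the whole enum with its payloads stored inline.
pub fn container_kind_size() -> usize {
    size_of::<ContainerKind>()
}

/// Size in bytes of a boxed composite payload: one pointer,
/// before counting the heap allocation it points to.
pub fn boxed_payload_size() -> usize {
    size_of::<Box<(u8, ContainerAlign)>>()
}
```

With payloads this small, the box pointer alone is no smaller than the data it would replace on common targets, which is the reasoning behind leaving the variant unboxed.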

Composition

Containers nest:

[#ここから2字下げ]
 通常の段落。
 [#ここから地付き]
  右寄せの行。
 [#ここで地付き終わり]
 通常に戻る。
[#ここで字下げ終わり]

Renders as nested divs:

<div class="aozora-indent-2">
通常の段落。
<div class="aozora-align-end">
右寄せの行。
</div>
通常に戻る。
</div>

Mismatched closers (e.g. [#ここから地付き][#ここで字下げ終わり]) fire diagnostic E0005 and the parser auto-closes the offending opener at the closer’s position.
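The auto-close behaviour can be sketched as a small tree builder over a token stream. All types and names here (`Token`, `Node`, `Diagnostic`, `build_tree`) are hypothetical and much simpler than the real parser. Two behaviours are assumptions, since the text does not specify them: a stray closer with nothing open is dropped, and openers still open at end of input are closed silently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind { Indent, AlignEnd }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token { Open(Kind), Close(Kind), Text(String) }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    Text(String),
    Container { kind: Kind, children: Vec<Node> },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic { pub code: &'static str, pub token_index: usize }

/// Appends `node` to the innermost open container, or to the root if none is open.
fn push_node(open: &mut Vec<(Kind, Vec<Node>)>, root: &mut Vec<Node>, node: Node) {
    match open.last_mut() {
        Some((_, children)) => children.push(node),
        None => root.push(node),
    }
}

pub fn build_tree(tokens: &[Token]) -> (Vec<Node>, Vec<Diagnostic>) {
    let mut root = Vec::new();
    let mut open: Vec<(Kind, Vec<Node>)> = Vec::new();
    let mut diags = Vec::new();

    for (i, tok) in tokens.iter().enumerate() {
        match tok {
            Token::Text(s) => push_node(&mut open, &mut root, Node::Text(s.clone())),
            Token::Open(k) => open.push((*k, Vec::new())),
            Token::Close(k) => {
                // Stray closer with nothing open: dropped (assumption).
                let Some((open_kind, children)) = open.pop() else { continue };
                if open_kind != *k {
                    // Mismatch: report it, then close the opener here anyway.
                    diags.push(Diagnostic { code: "E0005", token_index: i });
                }
                let node = Node::Container { kind: open_kind, children };
                push_node(&mut open, &mut root, node);
            }
        }
    }
    // Openers still open at end of input are closed silently (assumption).
    while let Some((kind, children)) = open.pop() {
        push_node(&mut open, &mut root, Node::Container { kind, children });
    }
    (root, diags)
}
```

Feeding it `Open(AlignEnd)`, a text token, then `Close(Indent)` produces one `AlignEnd` container holding the text and a single E0005 diagnostic pointing at the closer.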

Why containers, not stack-based push/pop tokens?

The spec describes these as opener / closer brackets, but the natural implementation in Rust is a recursive container node. That choice:

  • Lets the renderer walk the tree once with a single match on ContainerKind, instead of maintaining a render-time stack.
  • Surfaces shape errors (mismatched closers, dangling openers) at parse time — the lexer’s classify phase already has all the information to decide.
  • Makes the canonical-serialise pass trivial (each container prints its opener, walks its children, prints its closer).
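The first and third points can be sketched with one recursive walk each. The `Node` and `Kind` types below are simplified stand-ins (owned `Vec` children instead of arena slices, two container kinds only), and `render_html` / `serialise` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind { Indent(u8), AlignEnd }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    Text(String),
    Container { kind: Kind, children: Vec<Node> },
}

impl Kind {
    fn opener(self) -> String {
        match self {
            Kind::Indent(n) => format!("[#ここから{n}字下げ]"),
            Kind::AlignEnd => "[#ここから地付き]".to_string(),
        }
    }
    fn closer(self) -> &'static str {
        match self {
            Kind::Indent(_) => "[#ここで字下げ終わり]",
            Kind::AlignEnd => "[#ここで地付き終わり]",
        }
    }
    fn css_class(self) -> String {
        match self {
            Kind::Indent(n) => format!("aozora-indent-{n}"),
            Kind::AlignEnd => "aozora-align-end".to_string(),
        }
    }
}

/// HTML walk: recursion replaces a render-time stack.
pub fn render_html(nodes: &[Node], out: &mut String) {
    for node in nodes {
        match node {
            Node::Text(s) => { out.push_str(s); out.push('\n'); }
            Node::Container { kind, children } => {
                out.push_str(&format!("<div class=\"{}\">\n", kind.css_class()));
                render_html(children, out);
                out.push_str("</div>\n");
            }
        }
    }
}

/// Canonical serialise: print opener, walk children, print closer.
pub fn serialise(nodes: &[Node], out: &mut String) {
    for node in nodes {
        match node {
            Node::Text(s) => { out.push_str(s); out.push('\n'); }
            Node::Container { kind, children } => {
                out.push_str(&kind.opener());
                out.push('\n');
                serialise(children, out);
                out.push_str(kind.closer());
                out.push('\n');
            }
        }
    }
}
```

Both functions share the same shape, which is the point: once nesting lives in the tree, each pass is a plain recursive match with no bookkeeping of its own.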

The trade-off is one extra arena allocation per container: a single bumpalo slice for children. The arena is already hot, so the cost is negligible (a bumpalo allocation is an O(1) pointer bump that returns an aligned pointer).

See also