Pandoc integration

The aozora-pandoc crate (workspace-internal, available via the aozora CLI) projects a parsed Aozora document into the Pandoc AST. Once you have Pandoc JSON, every Pandoc output format (HTML, EPUB, LaTeX/PDF, DOCX, ODT, MediaWiki, …) is one shell pipe away.

This is the recommended path if you want to convert Aozora Bunko notation into any format beyond what the built-in HTML renderer produces. Adding a new output format means adding a Pandoc filter (or none, if the default Span/Div mapping is enough), not extending the parser crate.

Quickstart

# Pandoc JSON to stdout
aozora pandoc input.txt > out.json

# Or pipe through pandoc directly
aozora pandoc input.txt | pandoc -f json -t html
aozora pandoc input.txt | pandoc -f json -t epub3 -o out.epub

# `--format` is shorthand for the pipe (requires pandoc on PATH)
aozora pandoc input.txt --format html > out.html
aozora pandoc -E sjis legacy.txt -t epub > out.epub

Projection rules

Each AozoraNode variant lifts to a Pandoc construct carrying a stable CSS class so downstream filters or stylesheets can specialise the rendering:

| Aozora variant | Pandoc construct | Class on the construct |
|---|---|---|
| Ruby | Span | aozora-ruby |
| ↳ base text | nested Span | aozora-ruby-base |
| ↳ reading text | nested Span | aozora-ruby-reading |
| Bouten | Span over target text | aozora-bouten |
| TateChuYoko | Span | aozora-tate-chu-yoko |
| Gaiji | Span carrying mencode | aozora-gaiji |
| Indent, AlignEnd | empty Span (marker) | aozora-indent / align-end |
| Warichu | Span with two children | aozora-warichu |
| DoubleRuby | Span | aozora-double-ruby |
| Annotation, Kaeriten, HeadingHint | empty Span carrying raw | aozora-annotation / etc. |
| PageBreak | HorizontalRule block | (n/a; semantic block) |
| SectionBreak | empty Div | aozora-section-break |
| AozoraHeading | Header block | aozora-heading |
| Sashie | Para with Image | aozora-sashie |
| Container (字下げ等) | Div wrapping inner blocks | aozora-container-indent / etc. |
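As a rough illustration of how a consumer might walk the nested Ruby spans listed above, the sketch below splits an aozora-ruby span into its base and reading text. The Inline enum is a hypothetical, trimmed-down stand-in that keeps only class names and children; it is not the type exposed by aozora-pandoc or by any Pandoc binding. Only the three class names are taken from the table.

```rust
/// Hypothetical, trimmed-down stand-in for a Pandoc inline.
/// Only the class list of `Attr` is kept; this is not the real aozora-pandoc type.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
    Span(Vec<String>, Vec<Inline>),
}

/// Concatenate all text under an inline, ignoring classes.
fn plain_text(inline: &Inline) -> String {
    match inline {
        Inline::Str(s) => s.clone(),
        Inline::Span(_, children) => children.iter().map(plain_text).collect(),
    }
}

/// Text of the first direct child span that carries `class`, if any.
fn child_text(children: &[Inline], class: &str) -> Option<String> {
    children.iter().find_map(|child| match child {
        Inline::Span(classes, _) if classes.iter().any(|c| c == class) => {
            Some(plain_text(child))
        }
        _ => None,
    })
}

/// Split an `aozora-ruby` span into (base, reading).
/// Returns None for any other inline, or if either nested span is missing.
fn ruby_parts(inline: &Inline) -> Option<(String, String)> {
    match inline {
        Inline::Span(classes, children) if classes.iter().any(|c| c == "aozora-ruby") => {
            let base = child_text(children, "aozora-ruby-base")?;
            let reading = child_text(children, "aozora-ruby-reading")?;
            Some((base, reading))
        }
        _ => None,
    }
}
```

A format-specific filter could feed the resulting pair into whatever ruby construct its target format offers, and leave every other inline untouched.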

The key-value list kvs (the third element of Pandoc’s Attr triple, after the identifier and the class list) carries the non-textual metadata: bouten kind and position, gaiji description and mencode, indent amount, and container kind. Filters that want format-native rendering pattern-match on the class plus kvs.
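Matching on class plus kvs could look roughly like the following sketch. It models Attr as the (identifier, classes, key-value pairs) triple and renders a bouten span to a LaTeX-like string. Several names here are assumptions made for illustration: the Attr and Inline types are local stand-ins, the "kind" key is a guess at how the bouten kind might be stored, and \bouten is a hypothetical macro, not one provided by a known package.

```rust
/// Local stand-in for Pandoc's Attr triple: (identifier, classes, key-value pairs).
#[derive(Debug, Clone, PartialEq)]
struct Attr {
    id: String,
    classes: Vec<String>,
    kvs: Vec<(String, String)>,
}

impl Attr {
    /// True if the class list contains `class`.
    fn has_class(&self, class: &str) -> bool {
        self.classes.iter().any(|c| c == class)
    }

    /// Value of the first key-value pair whose key equals `key`.
    fn kv(&self, key: &str) -> Option<&str> {
        self.kvs
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v.as_str())
    }
}

/// Hypothetical minimal inline model; not the real aozora-pandoc type.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
    Span(Attr, Vec<Inline>),
}

/// Render an inline to a LaTeX-like string.
/// Bouten spans get the hypothetical `\bouten` macro; spans with any other
/// class fall back to their rendered children.
fn to_latex(inline: &Inline) -> String {
    match inline {
        Inline::Str(s) => s.clone(),
        Inline::Span(attr, children) => {
            let inner: String = children.iter().map(to_latex).collect();
            if attr.has_class("aozora-bouten") {
                // "kind" is an assumed key name for the bouten kind.
                match attr.kv("kind") {
                    Some(kind) => format!("\\bouten[{kind}]{{{inner}}}"),
                    None => format!("\\bouten{{{inner}}}"),
                }
            } else {
                inner
            }
        }
    }
}
```

The fallback branch is the important design point: a span whose class the filter does not recognise still contributes its text, so unhandled variants degrade to plain content instead of disappearing.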

Why a Pandoc projection at all

Aozora notation has rich semantic markup (ruby, bouten, tate-chu-yoko, gaiji, …) that no single native Pandoc construct captures. The naive shortcut of emitting RawInline("html", "<ruby>…</ruby>") would only work for HTML-based writers; writers for other formats such as LaTeX or DOCX drop raw HTML, and the meaning is lost with it.

By lifting each Aozora variant to a Span / Div with a stable class, the same JSON renders sensibly in every Pandoc format today (a writer that does not recognise the class still emits the Span’s contents) and stays open for richer format-native rendering tomorrow via filters. That’s the same pattern Pandoc itself uses for [content]{.smallcaps}: semantic in the AST, format-specific in the writer.

Architecture

The library entry point is aozora_pandoc::to_pandoc:

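use aozora::Document;
use aozora_pandoc::to_pandoc;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the Aozora source, then project it into the Pandoc AST.
    let doc = Document::new(std::fs::read_to_string("input.txt")?);
    let pandoc = to_pandoc(&doc.parse());
    // Serialise to Pandoc JSON and print it.
    println!("{}", serde_json::to_string(&pandoc)?);
    Ok(())
}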

aozora-cli wires that into aozora pandoc so binary consumers don’t need to write Rust.