EPUB via Pandoc

Problem. You want an EPUB (or LaTeX/PDF, DOCX, ODT, …) out of an Aozora Bunko source — anything the built-in HTML renderer does not produce.

Solution

The aozora binary projects a parsed document into the Pandoc AST as JSON. Pipe that JSON into pandoc, and every Pandoc output format is one writer away. For EPUB:

aozora pandoc input.txt | pandoc -f json -t epub3 -o out.epub

That is the whole recipe. The same pipe reaches the other formats by swapping the -t writer:

aozora pandoc input.txt | pandoc -f json -t latex  -o out.tex
aozora pandoc input.txt | pandoc -f json -t docx   -o out.docx
aozora pandoc input.txt | pandoc -f json -t html   > out.html
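When several formats are needed from one source, the projection can be run once and the JSON reused for each writer. The following is a minimal sketch, assuming both aozora and pandoc are on PATH; the function name aozora_convert_all and the output naming scheme are illustrative and not part of the tool.

```sh
# Hypothetical helper: project once, then feed the same JSON to several writers.
# Usage: aozora_convert_all input.txt epub3 docx latex
aozora_convert_all() {
    src=$1; shift
    base=${src%.*}                       # input.txt -> input
    json=$(mktemp) || return 1
    # Parse the Aozora source once; every writer below reuses this JSON.
    if ! aozora pandoc "$src" > "$json"; then
        rm -f "$json"
        return 1
    fi
    status=0
    for writer in "$@"; do
        case $writer in
            epub|epub3) ext=epub ;;
            latex)      ext=tex ;;
            *)          ext=$writer ;;   # docx, odt, html: writer name doubles as extension
        esac
        pandoc -f json -t "$writer" -o "$base.$ext" "$json" || status=1
    done
    rm -f "$json"
    return $status
}
```

Calling aozora_convert_all input.txt epub3 docx latex would write input.epub, input.docx and input.tex next to the source, returning non-zero if any writer failed.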

Shift_JIS source decodes with -E sjis, exactly as for render / check:

aozora pandoc -E sjis crime.txt | pandoc -f json -t epub3 -o crime.epub
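The same pipe can be applied to a whole directory. This sketch assumes every .txt file in the directory is Shift_JIS and that aozora and pandoc are on PATH; aozora_batch_epub is a hypothetical name used only for illustration.

```sh
# Hypothetical helper: convert every .txt file in a directory to EPUB.
# Assumes all files share one encoding (Shift_JIS here, via -E sjis).
aozora_batch_epub() {
    dir=${1:-.}
    failed=0
    for src in "$dir"/*.txt; do
        [ -e "$src" ] || continue        # the glob matched nothing
        out=${src%.txt}.epub
        # Plain sh has no pipefail, so the status tested here is pandoc's,
        # the last command in the pipe.
        if aozora pandoc -E sjis "$src" | pandoc -f json -t epub3 -o "$out"; then
            echo "ok:   $out"
        else
            echo "FAIL: $src" >&2
            failed=1
        fi
    done
    return $failed
}
```

Running aozora_batch_epub ./books would leave one .epub beside each .txt and report any file that pandoc could not convert.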

aozora pandoc also has a --format / -t shorthand that runs the pipe for you when pandoc is on PATH:

aozora pandoc input.txt -t epub > out.epub

Expected output

out.epub: a valid EPUB 3 container. Each Aozora construct lifts to a Pandoc Span / Div carrying a stable CSS class (aozora-ruby, aozora-bouten, …), so you can style or filter them per format. The projection-rules table lists every variant’s mapping.
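Because the classes are stable, styling the EPUB can be as simple as handing pandoc a stylesheet through its --css option. The sketch below is one possible approach, assuming aozora and pandoc are on PATH; the function name aozora_styled_epub, the file name aozora.css and the CSS rule itself are illustrative choices, not styles shipped with aozora.

```sh
# Hypothetical helper: write a small stylesheet and embed it in the EPUB.
# Usage: aozora_styled_epub input.txt [out.epub] [stylesheet.css]
aozora_styled_epub() {
    src=$1
    out=${2:-${src%.*}.epub}
    css=${3:-aozora.css}
    # aozora-bouten is a class named in this section; the rule is an
    # illustrative default. Other classes can be styled the same way.
    cat > "$css" <<'EOF'
.aozora-bouten {
  -webkit-text-emphasis: filled sesame;
  text-emphasis: filled sesame;
}
EOF
    # --css tells pandoc to include the stylesheet in the EPUB container.
    aozora pandoc "$src" | pandoc -f json -t epub3 --css "$css" -o "$out"
}
```

Reading systems differ in how much CSS they honor, so the result is worth checking in the target reader before relying on a particular rule.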

Why a Pandoc projection at all

Aozora notation has rich semantic markup (ruby, bouten, 縦中横, gaiji) that no single Pandoc native construct captures. Emitting raw HTML would only survive the HTML writer; every other format would strip it. Lifting each variant to a classed Span/Div instead means the same JSON renders sensibly across every Pandoc format today and stays open to richer format-native rendering via filters tomorrow. Adding a new output format is a Pandoc filter, never a parser change.

aozora — 青空文庫記法 Parser Handbook
