aozora — the public meta crate.
Single front door for parsing Aozora Bunko notation. Downstream
consumers should depend on this crate alone; everything they need
is re-exported through this surface or accessed via Document
and AozoraTree.

    use aozora::Document;
    let source = std::fs::read_to_string("crime_and_punishment.txt").unwrap();
    let doc = Document::new(source);
    let tree = doc.parse();
    let html = tree.to_html();
    println!("{html}");

Tunable parses go through the builder chain:

    use aozora::{Document, DiagnosticPolicy};
    let doc = Document::options()
        .arena_capacity(64 * 1024)
        .diagnostic_policy(DiagnosticPolicy::DropInternal)
        .build("|青梅《おうめ》");
    let tree = doc.parse();
    assert!(!tree.serialize().is_empty());

§Architecture
Document owns the source buffer plus a bumpalo-backed
arena. AozoraTree borrows from that arena via the &self
lifetime returned by Document::parse. Every per-node
allocation lives inside the arena, with the
Interner deduplicating
repeated string content; dropping the Document releases the
entire tree in a single Bump::reset step.
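
The ownership relationship described above can be illustrated with a std-only toy model. This is a sketch under stated assumptions, not the crate's implementation: `ToyDocument`, `ToyTree`, `ToyInterner`, and `demo` are hypothetical names, a plain `String` stands in for the bumpalo arena, and whitespace splitting stands in for real parsing. It shows only the two ideas from the paragraph: the tree borrows from `&self`, so it cannot outlive the owning handle, and repeated string content is stored once.

```rust
use std::collections::HashMap;

/// Hypothetical interner: stores each distinct slice once and hands out ids.
struct ToyInterner<'a> {
    ids: HashMap<&'a str, u32>,
    strings: Vec<&'a str>,
}

impl<'a> ToyInterner<'a> {
    fn new() -> Self {
        Self { ids: HashMap::new(), strings: Vec::new() }
    }

    /// Returns the existing id for `s`, or assigns the next free one.
    fn intern(&mut self, s: &'a str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.ids.insert(s, id);
        self.strings.push(s);
        id
    }
}

/// Hypothetical owning handle: holds the source buffer.
/// (A `String` stands in for the arena in this sketch.)
struct ToyDocument {
    source: String,
}

/// Hypothetical borrowed view: every slice points back into the document.
struct ToyTree<'a> {
    tokens: Vec<u32>,
    strings: Vec<&'a str>,
}

impl ToyDocument {
    fn new(source: impl Into<String>) -> Self {
        Self { source: source.into() }
    }

    /// The returned tree borrows from `&self`; the compiler rejects any
    /// use of the tree after the document has been dropped.
    fn parse(&self) -> ToyTree<'_> {
        let mut interner = ToyInterner::new();
        let tokens: Vec<u32> = self
            .source
            .split_whitespace()
            .map(|word| interner.intern(word))
            .collect();
        ToyTree { tokens, strings: interner.strings }
    }
}

impl<'a> ToyTree<'a> {
    /// Resolves the token at `index` back to its text.
    fn text(&self, index: usize) -> &'a str {
        self.strings[self.tokens[index] as usize]
    }
}

/// Usage: four tokens, but only two distinct strings are stored.
fn demo() -> (usize, usize) {
    let doc = ToyDocument::new("ruby bouten ruby ruby");
    let tree = doc.parse();
    (tree.tokens.len(), tree.strings.len())
}
```

Because `ToyTree<'_>` carries the lifetime of the `&self` borrow, moving or dropping the `ToyDocument` while a tree is still in use is a compile-time error, which is the property the borrowed-view design relies on.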
The internal building-block crates (aozora-spec, aozora-syntax,
aozora-pipeline, aozora-render, aozora-encoding) are marked
publish = false and are reachable only through this meta crate’s
pipeline / syntax / render / encoding / wire
modules. Depend on aozora alone; see the
Architecture chapter of the handbook
for the layered design.
Modules§
- codes: Stable identifier strings for known Diagnostic variants.
- document 🔒: Document, the single owning handle to a parsed Aozora source buffer, and AozoraTree<'a>, the borrowed view a caller walks for output rendering.
- encoding: Re-export of aozora_encoding (Shift_JIS decoding and gaiji resolution).
- html: Borrowed-AST HTML rendering.
- pipeline: Re-export of aozora_pipeline under a stable name.
- render: Re-export of aozora_render (HTML / serialize emitters and the visitor trait).
- serialize: Borrowed-AST Aozora-source serializer.
- syntax: Re-export of aozora_syntax (AST node types, arena, interner).
- wire: Driver-shared wire format for serialising aozora parser output.
Structs§
- AlignEnd: Borrowed-AST node type that editor surfaces match against (LSP inlay hints, hover, completion, code actions, semantic tokens). Re-exported so external consumers don’t have to depend on aozora-syntax directly; aozora is the single editor-facing front door. The same note applies to every entry below described as a borrowed-AST node type.
- Annotation: Borrowed-AST node type. Generic annotation.
- AozoraHeading: Borrowed-AST node type. Aozora heading (窓見出し / 副見出し).
- AozoraTree: Borrowed view into a parsed Aozora document.
- BorrowedLexOutput: Borrowed-AST output of the lex pipeline.
- Bouten: Borrowed-AST node type. Emphasis dots / sidelines.
- Document: Single owning handle to a parsed Aozora source.
- DoubleRuby: Borrowed-AST node type. Double angle-bracket payload.
- Gaiji: Borrowed-AST node type. Gaiji (out-of-character-range glyph).
- HeadingHint: Borrowed-AST node type. Forward-reference heading hint.
- Indent: Borrowed-AST node type.
- Kaeriten: Borrowed-AST node type. Chinese-reading-order mark (返り点).
- NormalizedOffset: Byte offset into the normalized text the lex pipeline emits for the downstream CommonMark parser.
- PairLink: Resolved open/close pair, as observed by Phase 2.
- ParseOptions: Builder for the Document::parse entry point.
- Ruby: Borrowed-AST node type. Ruby (furigana).
- Sashie: Borrowed-AST node type. Illustration metadata.
- SlugEntry: One row of the slug catalogue.
- SourceNode: Source-keyed registry entry that pairs a source-byte span with the classified node that landed there. Lives in the bumpalo arena.
- SourceOffset: Byte offset into the sanitized source text (Phase 0 output).
- Span: Byte-range span. Both endpoints are guaranteed to fall on UTF-8 character boundaries when produced by the parser; callers can safely slice the source with them.
- TateChuYoko: Borrowed-AST node type. Tate-chu-yoko (horizontal embedding).
- Warichu: Borrowed-AST node type. Warichu (split annotation).
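
The Span entry in the list above says that parser-produced endpoints fall on UTF-8 character boundaries, so slicing the source with them is safe. The std-only sketch below illustrates why that guarantee matters. `ByteSpan`, its `start` / `end` fields, `span_of`, `slice_checked`, and `demo_slice` are hypothetical names for illustration, not the crate's API; the real Span type may expose its endpoints differently.

```rust
/// Hypothetical stand-in for a byte-range span (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ByteSpan {
    start: usize,
    end: usize,
}

/// Builds a span for the first occurrence of `needle`. `str::find`
/// returns a byte offset that is always on a character boundary.
fn span_of(source: &str, needle: &str) -> Option<ByteSpan> {
    let start = source.find(needle)?;
    Some(ByteSpan { start, end: start + needle.len() })
}

/// Slices without panicking: `str::get` returns `None` when either
/// endpoint is out of range or falls inside a multi-byte character.
fn slice_checked(source: &str, span: ByteSpan) -> Option<&str> {
    source.get(span.start..span.end)
}

/// Usage: a boundary-aligned span slices cleanly, a misaligned one does not.
fn demo_slice() -> (Option<&'static str>, Option<&'static str>) {
    let source = "|青梅《おうめ》";
    let good = span_of(source, "青梅").and_then(|s| slice_checked(source, s));
    // Byte 2 is in the middle of the 3-byte encoding of '青'.
    let bad = slice_checked(source, ByteSpan { start: 2, end: 7 });
    (good, bad)
}
```

With spans that are known to be boundary-aligned, direct indexing such as `&source[start..end]` cannot panic on a split character; the checked form above is mainly useful for spans that come from somewhere other than the parser.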
Enums§
- AnnotationKind: Borrowed-AST node type.
- AozoraHeadingKind: Borrowed-AST node type.
- AozoraNode: Borrowed-AST node type. Every Aozora-specific AST node, in borrowed form.
- BoutenKind: Borrowed-AST node type.
- BoutenPosition: Borrowed-AST node type. Which side of the vertical-writing base text the bouten marks sit on.
- ContainerKind: Borrowed-AST node type. The kinds of Aozora container blocks the lexer classifies.
- Content: Borrowed-AST node type. Body content for nodes whose textual payload may carry nested Aozora constructs.
- Diagnostic: Observation emitted by any lexer phase.
- DiagnosticPolicy: Diagnostic policy applied at parse time.
- DiagnosticSource: Origin of a Diagnostic; distinguishes user-input issues from library-internal sanity-check failures.
- InternalCheckCode: Identifier of a specific pipeline-internal sanity check.
- NodeKind: Borrowed-AST node type. Cross-cutting tag for an AST node or NodeRef projection.
- NodeRef: Unified view over a registry hit, returned by Registry::node_at.
- PairKind: Pair kind. The variants enumerate every balanced delimiter Aozora notation recognises.
- SectionKind: Borrowed-AST node type.
- Segment: Borrowed-AST node type. One element of a Content::Segments run.
- Sentinel: Sentinel kind tag.
- Severity: Severity of a Diagnostic.
- SlugFamily: Family / coarse category a slug belongs to. Used by the LSP completion UI to group entries (CompletionItem::sort_text) and pick an appropriate CompletionItemKind icon.
- TriggerKind: Classification of a single trigger character (or merged double).
Constants§
- ALL_SENTINELS: All four sentinels in declaration order.
- BLOCK_CLOSE_SENTINEL: Paired-container close line (e.g. [#ここで字下げ終わり]).
- BLOCK_LEAF_SENTINEL: Block-leaf Aozora line (page break, section break, leaf indent, sashie).
- BLOCK_OPEN_SENTINEL: Paired-container open line (e.g. [#ここから字下げ]).
- INLINE_SENTINEL: Inline Aozora span (ruby / bouten / annotation / gaiji / TCY / kaeriten).
- SLUGS: Canonical slug catalogue. See module docs.
Functions§
- canonicalise_slug: Snap an input slug body (with the surrounding [# … ] already stripped) to the canonical form, if one is recognised.
- lex_into_arena: Run the lex pipeline and collect the result into arena.