Extract ruby pairs
Problem. You want every ruby annotation in a document as
(base, reading) string pairs — to build a furigana glossary, audit
readings, or feed a dictionary.
Solution
Walk source_nodes() (see Walk the AST), keep only the
Ruby nodes, and read each node’s base and reading. Both are
NonEmpty<Content>; call .get() to get the Content, then
.as_plain() for the common case where the text carries no nested
constructs.
use aozora::{Document, AozoraNode, NodeRef};

fn main() {
    let source = "|青梅《おうめ》街道を|逢《お》う";
    let doc = Document::new(source);
    let tree = doc.parse();

    for sn in tree.source_nodes() {
        // Ruby is always an inline construct.
        if let NodeRef::Inline(AozoraNode::Ruby(ruby)) = sn.node {
            // `base` / `reading` are NonEmpty<Content>; `.get()` is the
            // Content, `.as_plain()` its text when there are no nested nodes.
            let base = ruby.base.get().as_plain().unwrap_or("<mixed>");
            let reading = ruby.reading.get().as_plain().unwrap_or("<mixed>");
            println!("{base}\t{reading}");
        }
    }
}
Expected output
青梅 おうめ
逢 お
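Once the pairs are out, the glossary and audit use cases from the problem statement need very little extra code. The following is a minimal sketch that uses only the standard library and assumes you have collected the printed pairs as plain (base, reading) strings. The helper names `build_glossary` and `ambiguous_bases` are hypothetical, and the extra 逢 / あ pair is invented sample data, not output of the recipe.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group readings by base. `pairs` stands in for the (base, reading)
/// strings produced by the extraction loop; nothing here calls aozora.
fn build_glossary<'a>(
    pairs: impl IntoIterator<Item = (&'a str, &'a str)>,
) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut glossary: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (base, reading) in pairs {
        // A set drops repeated (base, reading) pairs automatically.
        glossary.entry(base).or_default().insert(reading);
    }
    glossary
}

/// Bases seen with more than one distinct reading: candidates for an audit.
fn ambiguous_bases<'a>(glossary: &BTreeMap<&'a str, BTreeSet<&'a str>>) -> Vec<&'a str> {
    glossary
        .iter()
        .filter(|(_, readings)| readings.len() > 1)
        .map(|(base, _)| *base)
        .collect()
}

fn main() {
    // Invented sample data: the last two pairs are not from the recipe's source.
    let pairs = [("青梅", "おうめ"), ("逢", "お"), ("逢", "あ"), ("青梅", "おうめ")];
    let glossary = build_glossary(pairs);
    for (base, readings) in &glossary {
        let joined = readings.iter().copied().collect::<Vec<_>>().join(", ");
        println!("{base}\t{joined}");
    }
    println!("ambiguous: {:?}", ambiguous_bases(&glossary));
}
```

A sorted map keeps the glossary order stable between runs, which makes diffs of the generated file easier to review. If you need the pairs in document order instead, a `Vec` plus a `HashSet` of seen pairs would be a reasonable alternative.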
Notes
- Why NonEmpty. The parser only emits a Ruby node once both base and
  reading have content, so the fields are NonEmpty<Content>: an empty
  side is unrepresentable, and you never have to guard against it.
  .get() unwraps to the inner Content.
- The <mixed> arm. Content::as_plain() returns None when the run carries
  nested constructs (a gaiji reference or annotation inside the base,
  for instance). That is rare for readings but does happen for bases. To
  flatten those too, iterate the segments instead of bailing (Segment
  lives under the syntax module since it is not in the top-level
  re-export set):

      use aozora::syntax::borrowed::Segment;

      fn text_of(content: aozora::Content<'_>) -> String {
          let mut out = String::new();
          for seg in content.iter() {
              if let Segment::Text(s) = seg {
                  out.push_str(s);
              }
              // Segment::Gaiji / Segment::Annotation carry non-plain payloads;
              // handle them here if your glossary needs them.
          }
          out
      }

  Content::iter() yields a Segment per logical run; the Plain case yields
  exactly one Text segment, so the loop is uniform.
- delim_explicit. ruby.delim_explicit records whether the source used the
  explicit | base delimiter. It does not affect the base/reading text;
  see the Ruby node chapter for why both source forms classify identically.
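The reasoning in the NonEmpty note (a check at construction time replaces checks at every use) can be illustrated with a small stand-alone sketch. Everything below is hypothetical: `NonEmptyText`, `RubyPair`, and `make_pair` are stand-ins written for this illustration and are not aozora's definitions. Only the `.get()` accessor name is borrowed from the recipe.

```rust
/// Hypothetical stand-in for a non-empty wrapper (NOT aozora's type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct NonEmptyText<'a>(&'a str);

impl<'a> NonEmptyText<'a> {
    /// The only constructor: it refuses empty input, so every value that
    /// exists is known to hold at least one character.
    fn new(text: &'a str) -> Option<Self> {
        if text.is_empty() {
            None
        } else {
            Some(Self(text))
        }
    }

    /// Infallible accessor; callers need no emptiness check.
    fn get(self) -> &'a str {
        self.0
    }
}

/// Hypothetical ruby record: it can only be built when both sides have content.
#[derive(Debug)]
struct RubyPair<'a> {
    base: NonEmptyText<'a>,
    reading: NonEmptyText<'a>,
}

/// Returns None if either side is empty, mirroring a parser that emits
/// no ruby node at all in that case.
fn make_pair<'a>(base: &'a str, reading: &'a str) -> Option<RubyPair<'a>> {
    Some(RubyPair {
        base: NonEmptyText::new(base)?,
        reading: NonEmptyText::new(reading)?,
    })
}

fn main() {
    if let Some(pair) = make_pair("青梅", "おうめ") {
        // No guard needed here: both sides are non-empty by construction.
        println!("{}\t{}", pair.base.get(), pair.reading.get());
    }
    // An empty reading never produces a pair.
    assert!(make_pair("青梅", "").is_none());
}
```

The design choice is that the one fallible step sits in the constructor, so consumer code such as the extraction loop can treat both fields as always present.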
See also
- Runnable example: just example walk_ast (crates/aozora/examples/walk_ast.rs) shows the full node walk this recipe narrows.
- Walk the AST: the general traversal.
- Ruby node reference: the Ruby struct, the two source forms, and the rendered HTML.
- Ruby notation: the |青梅《おうめ》 syntax itself.