
Crate aozora_encoding


Encoding utilities for Aozora Bunko source material.

The aozora parser itself is strictly UTF-8. Anything that decodes Shift_JIS or resolves gaiji (外字) mappings lives here, so the parser stays free of encoding concerns and the same logic is available to the CLI, editor integrations, and downstream tools.

Modules§

gaiji
Gaiji (外字) resolution: maps ※[#…、mencode] references to real Unicode characters.
jisx0213_table 🔒
PHF tables (single, combo, description) emitted by build.rs at compile time via phf_codegen. The generated file lives in OUT_DIR, so it is regenerated automatically whenever an input TSV changes; the committed source tree carries only the data, not the perfect-hash output. See build.rs for the generator.

Enums§

DecodeError
Errors surfaced by the decode pipeline.

Functions§

decode_sjis
Decode a Shift_JIS byte slice into UTF-8 (NFC normalisation is applied by the caller after decoding).
decode_sjis_into
Decode a Shift_JIS byte slice into the caller-owned dst buffer.
has_utf8_bom
Whether the byte slice carries a UTF-8 BOM (EF BB BF).
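
The BOM check described for has_utf8_bom can be sketched with the standard library alone. This is an illustration, not the crate's implementation: the names `UTF8_BOM`, `starts_with_utf8_bom`, and `strip_utf8_bom` are hypothetical, and the real function's signature is not shown on this page.

```rust
/// The three-byte UTF-8 byte order mark (EF BB BF).
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical stand-in for `has_utf8_bom`: true when `bytes` begins with EF BB BF.
fn starts_with_utf8_bom(bytes: &[u8]) -> bool {
    bytes.starts_with(&UTF8_BOM[..])
}

/// Hypothetical helper (not part of the crate as documented here):
/// returns the input with a leading BOM removed, or unchanged if there is none.
fn strip_utf8_bom(bytes: &[u8]) -> &[u8] {
    bytes.strip_prefix(&UTF8_BOM[..]).unwrap_or(bytes)
}

fn main() {
    let with_bom: &[u8] = b"\xEF\xBB\xBFabc";
    let without_bom: &[u8] = b"abc";

    assert!(starts_with_utf8_bom(with_bom));
    assert!(!starts_with_utf8_bom(without_bom));

    // Inputs shorter than the BOM are simply reported as having none.
    assert!(!starts_with_utf8_bom(b"\xEF\xBB"));

    // Stripping leaves BOM-free input untouched.
    assert_eq!(strip_utf8_bom(with_bom), &b"abc"[..]);
    assert_eq!(strip_utf8_bom(without_bom), &b"abc"[..]);
}
```

A caller might use a check like this to choose a path before decoding: input that carries the BOM is already UTF-8, so it would likely bypass decode_sjis rather than be run through it.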