Expand description
Encoding utilities for Aozora Bunko source material.
The aozora parser itself is strictly UTF-8. Anything that decodes Shift_JIS or
resolves gaiji (外字) mappings lives here, so the parser stays free of encoding
concerns and the same logic is available to CLI, editor integrations, or
downstream tools.
Modules§
- gaiji
- Gaiji (外字) resolution — mapping
※[#…、mencode]references to real Unicode characters. - jisx0213_
table 🔒 - PHF tables (single, combo, description) emitted by
build.rsat compile time viaphf_codegen. Lives inOUT_DIRso it’s regenerated automatically when any input TSV changes; the committed source tree carries only the data, not the perfect- hash output. Seebuild.rsfor the generator.
Enums§
- Decode
Error - Errors surfaced by the decode pipeline.
Functions§
- decode_
sjis - Decode a
Shift_JISbyte slice into UTF-8 (NFC normalisation is applied by the caller after decoding). - decode_
sjis_ into - Decode a
Shift_JISbyte slice into the caller-owneddstbuffer. - has_
utf8_ bom - Whether the byte slice carries a UTF-8 BOM (
EF BB BF).