Expand description
Re-export of [aozora_encoding] — Shift_JIS decoding and gaiji
resolution.
Phase 0 of the lex pipeline runs encoding detection first; callers that want to drive encoding without parsing can reach through this module.
Modules§
- gaiji
- Gaiji (外字) resolution — mapping
※[#…、mencode]references to real Unicode characters. - suijun
- Reverse JIS X 0208 / 0213 水準 (level) classification — issue #89.
Enums§
- Decode
Error - Errors surfaced by the decode pipeline.
- Suijun
- JIS X 0208 / 0213 水準 (level) of a character.
Functions§
- decode_
auto - Decode Aozora source bytes to UTF-8, detecting the encoding.
- decode_
auto_ into - Encoding-agnostic counterpart to
decode_sjis_into: append the decoded UTF-8 todst, detecting the source encoding. - decode_
sjis - Decode a
Shift_JISbyte slice into UTF-8 (NFC normalisation is applied by the caller after decoding). - decode_
sjis_ into - Decode a
Shift_JISbyte slice into the caller-owneddstbuffer. - has_
utf8_ bom - Whether the byte slice carries a UTF-8 BOM (
EF BB BF). - is_
platform_ dependent - Whether a character is a 機種依存文字: encodable in CP932 (Windows-31J / Shift_JIS) as a double-byte character but outside JIS X 0208.
- jis_
level - Classify a character’s JIS X 0208 / 0213 水準.
- level_
table_ sizes (第1+第2水準 cells, 第3水準 cells, 第4水準 cells)baked into the reverse table. Surfaced for the gatekeeper size pins (seetests/gatekeeper.rs); not part of the everyday classifier surface.