Module encoding

Re-export of aozora_encoding: Shift_JIS decoding and gaiji resolution.

Phase 0 of the lex pipeline runs encoding detection before anything else; callers that want to decode input without parsing it can call into this module directly.

Modules§

gaiji
Gaiji (外字) resolution — mapping ※[#…、mencode] references to real Unicode characters.
suijun
Reverse JIS X 0208 / 0213 水準 (level) classification — issue #89.

Enums§

DecodeError
Errors surfaced by the decode pipeline.
Suijun
JIS X 0208 / 0213 水準 (level) of a character.

Functions§

decode_auto
Decode Aozora source bytes to UTF-8, detecting the encoding.
decode_auto_into
Encoding-agnostic counterpart to decode_sjis_into: append the decoded UTF-8 to dst, detecting the source encoding.
decode_sjis
Decode a Shift_JIS byte slice into UTF-8 (NFC normalisation is applied by the caller after decoding).
decode_sjis_into
Decode a Shift_JIS byte slice into the caller-owned dst buffer.
has_utf8_bom
Whether the byte slice carries a UTF-8 BOM (EF BB BF).
is_platform_dependent
Whether a character is a 機種依存文字 (platform-dependent character): encodable in CP932 (Windows-31J / Shift_JIS) as a double-byte character but outside JIS X 0208.
jis_level
Classify a character’s JIS X 0208 / 0213 水準.
level_table_sizes
Cell counts baked into the reverse table, as (第1+第2水準 cells, 第3水準 cells, 第4水準 cells). Surfaced for the gatekeeper size pins (see tests/gatekeeper.rs); not part of the everyday classifier surface.
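
As a rough illustration of two ideas from the function list above, the BOM check described for has_utf8_bom (EF BB BF) and the append-into-dst pattern of the *_into functions, here is a std-only sketch. The names starts_with_utf8_bom and append_utf8_into are hypothetical stand-ins and not this module's API; the real signatures, error types, and Shift_JIS handling are not shown on this page and may differ.

```rust
/// UTF-8 byte order mark: EF BB BF.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical stand-in for a BOM check (not the crate's `has_utf8_bom`):
/// true when `bytes` begins with EF BB BF.
fn starts_with_utf8_bom(bytes: &[u8]) -> bool {
    bytes.starts_with(&UTF8_BOM)
}

/// Hypothetical stand-in for an `_into`-style decoder, handling UTF-8 input only.
/// Skips a leading BOM, validates the rest, and appends it to the caller-owned `dst`
/// so that an existing allocation can be reused across calls.
fn append_utf8_into(bytes: &[u8], dst: &mut String) -> Result<(), std::str::Utf8Error> {
    let body = if starts_with_utf8_bom(bytes) {
        &bytes[UTF8_BOM.len()..]
    } else {
        bytes
    };
    // On invalid UTF-8 the error is returned before anything is appended.
    dst.push_str(std::str::from_utf8(body)?);
    Ok(())
}
```

Appending to a caller-owned buffer, rather than returning a fresh String, lets a caller that decodes many files keep one allocation and clear it between inputs. Whether the module's own *_into functions clear or append on error is an assumption this sketch does not settle.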