pub fn decompose_fragment(fragment: &str) -> Cow<'_, str>
Decompose Aozora accent digraphs anywhere inside fragment.
Call this on the body of a 〔...〕 span only; the transform is
restricted to that convention so that English text ("isn't", "text,", "word's")
doesn't false-match legitimate spec entries (n' = ń, t, = ţ, and friends).
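As a rough illustration of that calling convention, the sketch below applies a transform only to the bodies of 〔...〕 spans and copies everything else verbatim. The names `map_accent_spans` and `toy_decompose` are hypothetical and not part of this crate; `toy_decompose` is a one-entry stand-in for `decompose_fragment`. Keeping the brackets in the output is also an assumption made for the example.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `decompose_fragment`: handles only n' -> ń (U+0144).
fn toy_decompose(body: &str) -> Cow<'_, str> {
    if body.contains("n'") {
        Cow::Owned(body.replace("n'", "\u{144}"))
    } else {
        Cow::Borrowed(body)
    }
}

/// Applies `transform` to the body of each 〔...〕 span and copies all other
/// text unchanged. The brackets are kept in the output (an assumption of this sketch).
fn map_accent_spans<F>(text: &str, transform: F) -> String
where
    F: for<'a> Fn(&'a str) -> Cow<'a, str>,
{
    const OPEN: char = '〔';
    const CLOSE: char = '〕';

    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let body_start = start + OPEN.len_utf8();
        // An unclosed bracket is left untouched along with everything after it.
        let Some(len) = rest[body_start..].find(CLOSE) else { break };
        let body_end = body_start + len;

        out.push_str(&rest[..body_start]); // text before the span, plus 〔
        out.push_str(&transform(&rest[body_start..body_end]));
        out.push(CLOSE);
        rest = &rest[body_end + CLOSE.len_utf8()..];
    }
    out.push_str(rest);
    out
}
```

With this wrapper, the "n'" in "isn't" outside a span is left alone, while "Gdan'sk" inside a span becomes "Gdańsk".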
Guarantees:
- Returns Cow::Borrowed(fragment) when no accent marker byte appears (zero alloc on the common Japanese-only case).
- Greedy longest-match: ligatures (3-byte, e.g. ae& = æ) beat the 2-byte digraphs that share a prefix (a& = å would otherwise apply).
- Byte length of the output can be up to 3 bytes per 2-byte digraph for the few entries that land in U+1Exx (m' = ḿ, e~ = ẽ). Most entries shrink (3-byte ligature → 2-byte UTF-8). The invariant we do hold: the result is always a valid UTF-8 string.
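The byte-length figures in the last guarantee can be checked with the standard `char::len_utf8`. The helper `byte_lengths` below is a hypothetical name used only for this check, and the code points are written as escapes so that the example does not depend on how the source file is normalized.

```rust
/// Returns (input bytes, output bytes) for a single table entry.
/// Hypothetical helper for illustration; not part of the crate.
fn byte_lengths(key: &str, decomposed: char) -> (usize, usize) {
    (key.len(), decomposed.len_utf8())
}

/// The entries quoted in the guarantees, as (key, result) pairs.
fn quoted_entries() -> [(&'static str, char); 4] {
    [
        ("m'", '\u{1E3F}'), // ḿ, lands in U+1Exx: 2 bytes in, 3 bytes out
        ("e~", '\u{1EBD}'), // ẽ, lands in U+1Exx: 2 bytes in, 3 bytes out
        ("ae&", '\u{E6}'),  // æ, ligature: 3 bytes in, 2 bytes out
        ("a&", '\u{E5}'),   // å, digraph: 2 bytes in, 2 bytes out
    ]
}
```

A caller sizing an output buffer therefore cannot assume the result is no longer than the input; for 2-byte digraphs the worst case is growth by a factor of 1.5.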
The implementation is linear in fragment.len(): we walk the byte stream
left-to-right, peek at most 3 bytes at a time, and commit the longest match
that’s in the table.
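The walk described above might look roughly like the following. This is a minimal sketch, not the crate's actual code: `decompose_sketch`, `TABLE`, and `is_marker` are hypothetical names, the table holds only the entries quoted in this description, and the marker set is assumed from those entries. It slices on `&str` rather than raw bytes to keep the example short.

```rust
use std::borrow::Cow;

// Hypothetical excerpt of the table: only the entries quoted in the docs.
const TABLE: &[(&str, char)] = &[
    ("ae&", '\u{E6}'),  // æ
    ("a&", '\u{E5}'),   // å
    ("n'", '\u{144}'),  // ń
    ("t,", '\u{163}'),  // ţ
    ("m'", '\u{1E3F}'), // ḿ
    ("e~", '\u{1EBD}'), // ẽ
];

// Assumed marker set, derived from the excerpt above.
fn is_marker(b: u8) -> bool {
    matches!(b, b'&' | b'\'' | b',' | b'~')
}

fn decompose_sketch(fragment: &str) -> Cow<'_, str> {
    // Fast path: no marker byte means nothing can match, so borrow.
    if !fragment.bytes().any(is_marker) {
        return Cow::Borrowed(fragment);
    }

    let mut out = String::with_capacity(fragment.len());
    let mut rest = fragment;
    while !rest.is_empty() {
        // Try the 3-byte window first, then the 2-byte one (longest match wins).
        // `get` returns None when the window is too long or splits a character.
        let hit = [3usize, 2].iter().find_map(|&n| {
            let key = rest.get(..n)?;
            TABLE.iter().find(|(k, _)| *k == key).map(|&(_, c)| (n, c))
        });
        match hit {
            Some((n, c)) => {
                out.push(c);
                rest = &rest[n..];
            }
            None => {
                // No entry starts here: copy one character and move on.
                let c = rest.chars().next().unwrap();
                out.push(c);
                rest = &rest[c.len_utf8()..];
            }
        }
    }
    Cow::Owned(out)
}
```

Each step does a bounded amount of work (two lookups in a fixed-size table) and advances by at least one byte, so the loop stays linear in the input length. Building the result in a `String` is what keeps the output valid UTF-8.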