1 unstable release

0.2.0 Jun 30, 2022

#35 in #charset


287 downloads per month
Used in encoding-next

MIT license

20KB
266 lines

Interface to the character encoding.

Raw incremental interface

Methods whose names start with raw_ constitute the raw incremental interface,
the lowest-level API available for encoders and decoders.
This interface divides the entire input into four parts:

  • Processed bytes do not affect the future result.
  • Unprocessed bytes may affect the future result and can become part of a problematic sequence, depending on future input.
  • The problematic byte is the first byte that causes an error condition.
  • Remaining bytes have been neither processed nor read, so the caller should feed them again.

The following figure illustrates an example of successive raw_feed calls:

    1st raw_feed   :2nd raw_feed   :3rd raw_feed
    ----------+----:---------------:--+--+---------
              |    :               :  |  |
    ----------+----:---------------:--+--+---------
    processed  unprocessed             |  remaining
                                  problematic

Since these parts can span multiple input sequences passed to raw_feed,
raw_feed returns two offsets (one of them optional)
with which the caller can track the problematic sequence.
The first offset (the first usize in the tuple) points to the first unprocessed byte,
or is zero when the unprocessed bytes started before the current call.
(The first unprocessed byte can also be at offset 0,
which makes no difference to the caller.)
The second offset (the upto field in the CodecError struct), if any,
points to the first remaining byte.
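As a rough illustration of how the two offsets partition a single chunk, the sketch below slices one input buffer by them. ChunkParts and split_chunk are hypothetical names invented for this example and are not part of the crate; the second offset is passed as a plain Option<usize> for simplicity, which is an assumption about its representation.

```rust
/// The parts of one input chunk after a single raw_feed-style call.
/// (Hypothetical helper type, not part of the crate.)
#[derive(Debug, PartialEq)]
struct ChunkParts<'a> {
    /// Bytes that can no longer affect any future result.
    processed: &'a [u8],
    /// Unprocessed bytes, plus the problematic byte when an error was reported.
    pending: &'a [u8],
    /// Bytes the decoder has not read; the caller must feed them again.
    remaining: &'a [u8],
}

/// Splits `input` by the first offset and the optional second offset (`upto`).
/// Assumes `first <= upto <= input.len()`; slicing panics otherwise.
fn split_chunk(input: &[u8], first: usize, upto: Option<usize>) -> ChunkParts<'_> {
    // Without an error, the whole chunk was read, so nothing remains.
    let upto = upto.unwrap_or(input.len());
    ChunkParts {
        processed: &input[..first],
        pending: &input[first..upto],
        remaining: &input[upto..],
    }
}

fn main() {
    // No error: the trailing byte is unprocessed and may be completed by the next call.
    let parts = split_chunk(b"ab\xC3", 2, None);
    assert_eq!(parts.processed, &b"ab"[..]);
    assert_eq!(parts.pending, &b"\xC3"[..]);
    assert!(parts.remaining.is_empty());

    // Error: the byte at index 1 is problematic and "bc" has to be fed again.
    let parts = split_chunk(b"a\xFFbc", 1, Some(2));
    assert_eq!(parts.pending, &b"\xFF"[..]);
    assert_eq!(parts.remaining, &b"bc"[..]);
}
```

Note that a first offset of zero leaves the processed part of the chunk empty, whether the unprocessed bytes began at offset 0 or in an earlier call.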

If the caller needs to recover from the error using the problematic sequence,
it starts saving unprocessed bytes when the first offset is less than the input length,
appends any new unprocessed bytes while the first offset is zero,
and discards the saved bytes when the first offset becomes non-zero,
saving new unprocessed bytes again if the first offset is less than the input length.
The caller then checks for the error condition
and can use the saved unprocessed bytes for error recovery.
Alternatively, if the caller only wants to replace the problematic sequence
with a fixed string (like U+FFFD),
it can simply discard the sequence starting at the first offset and emit the fixed string on an error.
It still has to feed the input bytes starting at the second offset again.
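The replacement strategy can be sketched with a small stand-in decoder. ToyDecoder, ToyError and decode_replacing are hypothetical and only modeled on the description above: the toy handles ASCII and two-byte UTF-8 sequences, its raw_feed returns the first offset plus an optional error carrying upto, and its raw_finish returns a plain bool. The crate's real traits and signatures may differ.

```rust
/// Error report modeled on the text: `upto` is the second offset.
struct ToyError {
    upto: usize,
}

/// Toy decoder for ASCII plus two-byte UTF-8 sequences (not the crate's decoder).
#[derive(Default)]
struct ToyDecoder {
    /// Lead byte still waiting for its trail byte, possibly from an earlier call.
    lead: Option<u8>,
}

impl ToyDecoder {
    fn raw_feed(&mut self, input: &[u8], out: &mut String) -> (usize, Option<ToyError>) {
        // Offset of the first unprocessed byte; stays 0 if a lead byte carried over.
        let mut processed = 0;
        for (i, &b) in input.iter().enumerate() {
            match (self.lead.take(), b) {
                // A pending lead byte is completed by a trail byte.
                (Some(lead), 0x80..=0xBF) => {
                    let cp = (u32::from(lead & 0x1F) << 6) | u32::from(b & 0x3F);
                    out.push(char::from_u32(cp).expect("U+0080..=U+07FF is valid"));
                    processed = i + 1;
                }
                // Lead byte without a trail byte: the current byte stays in "remaining".
                (Some(_), _) => return (processed, Some(ToyError { upto: i })),
                (None, 0x00..=0x7F) => {
                    out.push(char::from(b));
                    processed = i + 1;
                }
                (None, 0xC2..=0xDF) => self.lead = Some(b),
                // Stray byte: it is the problematic byte and is consumed.
                (None, _) => return (processed, Some(ToyError { upto: i + 1 })),
            }
        }
        (processed, None)
    }

    /// Returns true if the input ended in the middle of a sequence.
    fn raw_finish(&mut self) -> bool {
        self.lead.take().is_some()
    }
}

/// Decodes `chunks`, replacing each problematic sequence with U+FFFD.
fn decode_replacing(chunks: &[&[u8]]) -> String {
    let mut decoder = ToyDecoder::default();
    let mut out = String::new();
    for &chunk in chunks {
        let mut rest = chunk;
        // The first offset is ignored: nothing is saved for recovery.
        while let (_, Some(err)) = decoder.raw_feed(rest, &mut out) {
            out.push('\u{FFFD}');
            // Feed the input again, starting at the second offset.
            rest = &rest[err.upto..];
        }
    }
    if decoder.raw_finish() {
        out.push('\u{FFFD}');
    }
    out
}

fn main() {
    // A sequence split across two calls decodes normally.
    assert_eq!(decode_replacing(&[&b"a\xC3"[..], &b"\xA9b"[..]]), "a\u{E9}b");
    // A stray byte is replaced and decoding resumes after it.
    assert_eq!(decode_replacing(&[&b"a\xFFb"[..]]), "a\u{FFFD}b");
}
```

Because the caller only emits a fixed string, it never needs to buffer the unprocessed bytes; the first offset matters only for the recovery strategy that inspects the problematic sequence.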

No runtime deps