1 unstable release

0.2.0 Jun 30, 2022

Used in encoding-next

MIT license

20KB
266 lines

Interface to the character encoding.

Raw incremental interface

Methods whose names start with raw_ constitute the raw incremental interface, the lowest-level API available for encoders and decoders. This interface divides the entire input into four parts:

  • Processed bytes no longer affect the future result.
  • Unprocessed bytes may affect the future result and, depending on future input, can become part of a problematic sequence.
  • The problematic byte is the first byte that causes an error condition.
  • Remaining bytes have been neither processed nor read, so the caller should feed them again.
The following figure illustrates an example of successive raw_feed calls:

```text
1st raw_feed   :2nd raw_feed   :3rd raw_feed
----------+----:---------------:--+--+---------
          |    :               :  |  |
----------+----:---------------:--+--+---------
processed  unprocessed             |  remaining
                              problematic
```

These parts can span the inputs of several successive raw_feed calls. For that reason, raw_feed returns two offsets (one of them optional) that let the caller track the problematic sequence. The first offset (the first usize in the returned tuple) points to the first unprocessed byte. It is zero when the unprocessed bytes started before the current call. (The first unprocessed byte can also genuinely be at offset 0, which makes no difference to the caller.) The second offset (the upto field of the CodecError struct), if present, points to the first remaining byte.
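The following is a minimal sketch of how the two offsets behave across calls. It is not the crate's API. CodecError here is a simplified stand-in with a usize upto. ToyDecoder is a hypothetical decoder for ASCII plus UTF-8-style two-byte sequences (lead byte 0xC2..=0xDF, trail byte 0x80..=0xBF), and it writes into a plain String rather than whatever output type the real trait uses. Following the figure, upto points just past the problematic byte.

```rust
/// Simplified stand-in for the crate's CodecError (not its real definition).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct CodecError {
    /// Offset of the first remaining byte in the current input.
    upto: usize,
    cause: &'static str,
}

/// Hypothetical decoder: ASCII bytes, plus two-byte sequences made of a
/// lead byte 0xC2..=0xDF followed by a trail byte 0x80..=0xBF (as in UTF-8).
#[derive(Default)]
struct ToyDecoder {
    /// A lead byte that is still unprocessed, possibly left by an earlier call.
    pending_lead: Option<u8>,
}

impl ToyDecoder {
    /// Returns (offset of the first unprocessed byte, optional error).
    fn raw_feed(&mut self, input: &[u8], output: &mut String) -> (usize, Option<CodecError>) {
        // Stays 0 while the unprocessed bytes began before this call.
        let mut first = 0;
        for (i, &b) in input.iter().enumerate() {
            if let Some(lead) = self.pending_lead.take() {
                if (0x80..=0xBF).contains(&b) {
                    let cp = ((lead as u32 & 0x1F) << 6) | (b as u32 & 0x3F);
                    output.push(char::from_u32(cp).expect("0x80..=0x7FF is a valid scalar"));
                    first = i + 1;
                } else {
                    // `b` is the problematic byte; everything after it is remaining.
                    return (first, Some(CodecError { upto: i + 1, cause: "invalid trail byte" }));
                }
            } else if b < 0x80 {
                output.push(b as char);
                first = i + 1;
            } else if (0xC2..=0xDF).contains(&b) {
                // The lead byte stays unprocessed until its trail byte is seen.
                self.pending_lead = Some(b);
                first = i;
            } else {
                // A stray byte is problematic on its own; no unprocessed bytes precede it.
                return (i, Some(CodecError { upto: i + 1, cause: "invalid byte" }));
            }
        }
        (first, None)
    }
}
```

As an example, feed b"ab\xC3" and then b"\xA9!" to a fresh ToyDecoder. Both calls return (2, None) and the output becomes "abé!". After the first call the lead byte at offset 2 is unprocessed, and the second call completes it. If the second input were b"xyz" instead, the call would return (0, Some(CodecError { upto: 1, .. })). The unprocessed lead byte started before that call, x is the problematic byte, and yz is remaining.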
Suppose the caller needs to recover from the error using the problematic sequence. It then follows these rules:

  • Start saving the unprocessed bytes once the first offset is less than the input length.
  • While the first offset is zero, append any new unprocessed bytes.
  • When the first offset becomes non-zero, discard the saved bytes. Then save the new unprocessed bytes again if the first offset is less than the input length.

The caller then checks for the error condition and can use the saved unprocessed bytes for error recovery.
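The bookkeeping above can be sketched as a small helper. ProblemTracker and observe are hypothetical names. The snippet also carries a hypothetical toy decoder (ASCII plus UTF-8-style two-byte sequences) and a simplified CodecError stand-in so that it compiles on its own; neither is the crate's API. One interpretation is assumed here: when an error is reported, only bytes before upto are saved, because the bytes from upto on are remaining rather than part of the problematic sequence.

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct CodecError { upto: usize, cause: &'static str } // simplified stand-in

/// Hypothetical decoder: ASCII plus lead 0xC2..=0xDF / trail 0x80..=0xBF pairs.
#[derive(Default)]
struct ToyDecoder { pending_lead: Option<u8> }

impl ToyDecoder {
    fn raw_feed(&mut self, input: &[u8], output: &mut String) -> (usize, Option<CodecError>) {
        let mut first = 0; // 0 while unprocessed bytes began in an earlier call
        for (i, &b) in input.iter().enumerate() {
            if let Some(lead) = self.pending_lead.take() {
                if !(0x80..=0xBF).contains(&b) {
                    return (first, Some(CodecError { upto: i + 1, cause: "invalid trail byte" }));
                }
                let cp = ((lead as u32 & 0x1F) << 6) | (b as u32 & 0x3F);
                output.push(char::from_u32(cp).expect("valid scalar"));
                first = i + 1;
            } else if b < 0x80 {
                output.push(b as char);
                first = i + 1;
            } else if (0xC2..=0xDF).contains(&b) {
                self.pending_lead = Some(b);
                first = i;
            } else {
                return (i, Some(CodecError { upto: i + 1, cause: "invalid byte" }));
            }
        }
        (first, None)
    }
}

/// Hypothetical caller-side helper that tracks the problematic sequence.
#[derive(Default)]
struct ProblemTracker { saved: Vec<u8> }

impl ProblemTracker {
    /// Call after every raw_feed with the same input and the returned values.
    /// Returns the whole problematic sequence when an error was reported.
    fn observe(&mut self, input: &[u8], first: usize, err: Option<&CodecError>) -> Option<Vec<u8>> {
        // On error, bytes from `upto` on are remaining, so stop saving there.
        let end = err.map_or(input.len(), |e| e.upto);
        if first != 0 {
            // The unprocessed bytes now start inside this input: drop older ones.
            self.saved.clear();
        }
        if first < end {
            // With first == 0 this appends to bytes saved from earlier calls.
            self.saved.extend_from_slice(&input[first..end]);
        }
        // Hand the sequence to the caller and start fresh after an error.
        err.map(|_| std::mem::take(&mut self.saved))
    }
}
```

Once observe returns a sequence, the caller resumes by feeding the input from upto onward. For instance, b"ab\xC3" followed by b"xyz" yields the problematic sequence [0xC3, b'x'], and the caller then feeds b"yz" again.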
Alternatively, if the caller only wants to replace the problematic sequence with a fixed string (such as U+FFFD), it can skip this bookkeeping. It discards the unprocessed bytes and emits the fixed string on an error. It still has to feed the input bytes starting at the second offset again.
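A minimal sketch of the replacement strategy follows, with no saved bytes at all. The decoder, its finish method and decode_replace are hypothetical. The toy decoder handles ASCII plus UTF-8-style two-byte sequences, and finish stands in for whatever end-of-input check a real decoder provides, so that a dangling lead byte at the very end also becomes U+FFFD.

```rust
#[allow(dead_code)]
#[derive(Debug)]
struct CodecError { upto: usize, cause: &'static str } // simplified stand-in

/// Hypothetical decoder: ASCII plus lead 0xC2..=0xDF / trail 0x80..=0xBF pairs.
#[derive(Default)]
struct ToyDecoder { pending_lead: Option<u8> }

impl ToyDecoder {
    fn raw_feed(&mut self, input: &[u8], output: &mut String) -> (usize, Option<CodecError>) {
        let mut first = 0;
        for (i, &b) in input.iter().enumerate() {
            if let Some(lead) = self.pending_lead.take() {
                if !(0x80..=0xBF).contains(&b) {
                    return (first, Some(CodecError { upto: i + 1, cause: "invalid trail byte" }));
                }
                let cp = ((lead as u32 & 0x1F) << 6) | (b as u32 & 0x3F);
                output.push(char::from_u32(cp).expect("valid scalar"));
                first = i + 1;
            } else if b < 0x80 {
                output.push(b as char);
                first = i + 1;
            } else if (0xC2..=0xDF).contains(&b) {
                self.pending_lead = Some(b);
                first = i;
            } else {
                return (i, Some(CodecError { upto: i + 1, cause: "invalid byte" }));
            }
        }
        (first, None)
    }

    /// Hypothetical end-of-input check: true if an incomplete sequence was left.
    fn finish(&mut self) -> bool {
        self.pending_lead.take().is_some()
    }
}

/// Decodes a complete input, replacing each problematic sequence with U+FFFD.
fn decode_replace(input: &[u8]) -> String {
    let mut dec = ToyDecoder::default();
    let mut output = String::new();
    let mut rest = input;
    // The first offset is ignored: unprocessed bytes are simply discarded.
    while let (_, Some(err)) = dec.raw_feed(rest, &mut output) {
        output.push('\u{FFFD}');
        // Remaining bytes start at `upto` and must be fed again.
        rest = &rest[err.upto..];
    }
    if dec.finish() {
        output.push('\u{FFFD}');
    }
    output
}
```

The loop always terminates, because each error advances rest by at least one byte (upto is never 0 in this sketch). With this toy decoder, decode_replace(b"a\xFFb\xC3\xA9") produces "a\u{FFFD}bé".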

No runtime deps