8 releases
Uses old Rust 2015

| Version | Release date |
|---|---|
| 0.2.3 | Feb 21, 2015 |
| 0.2.2 | Feb 20, 2015 |
| 0.1.3 | Feb 7, 2015 |
| 0.1.2 | Jan 29, 2015 |
MaybeUtf8 0.2.3
A byte container that is optionally encoded as UTF-8. It is intended for byte sequences whose character encoding is uncertain, but which the caller may be able to determine.
For example, the ZIP file format originally didn't support UTF-8 file names; it assumed that an archive would only be extracted on a system with the same encoding as the one that created it. The newer ZIP standard does support explicitly UTF-8-encoded file names, so a ZIP library may want to return either a `String` or a `Vec<u8>` depending on the UTF-8 flag.
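A minimal sketch of that ZIP situation, using only the standard library. `ZipName`, `ZIP_UTF8_FLAG`, and `zip_file_name` are hypothetical illustrations, not part of this crate. We assume the flag is bit 11 of the ZIP general-purpose bit flags (the "language encoding flag"), and we choose to fall back to raw bytes when a flagged name turns out not to be valid UTF-8.

```rust
/// Hypothetical result type: a decoded UTF-8 name, or raw bytes in an unknown encoding.
#[derive(Debug, PartialEq)]
enum ZipName {
    Utf8(String),
    Bytes(Vec<u8>),
}

/// Assumption: bit 11 of the ZIP general-purpose flags marks UTF-8 file names.
const ZIP_UTF8_FLAG: u16 = 1 << 11;

/// Hypothetical helper: trust the UTF-8 flag, but keep the raw bytes
/// if a flagged name is not actually valid UTF-8.
fn zip_file_name(raw: Vec<u8>, general_purpose_flags: u16) -> ZipName {
    if (general_purpose_flags & ZIP_UTF8_FLAG) == 0 {
        // No flag: the name uses whatever encoding the archiving system used.
        return ZipName::Bytes(raw);
    }
    match String::from_utf8(raw) {
        Ok(name) => ZipName::Utf8(name),
        // `into_bytes` gives back the original, unmodified buffer.
        Err(err) => ZipName::Bytes(err.into_bytes()),
    }
}
```

Returning an enum keeps the undecoded bytes available, so the caller can still try a legacy code page later. That "maybe UTF-8, maybe not" value is the gap this crate's types are meant to fill.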
This crate provides two types, `MaybeUtf8Buf` (analogous to `String`) and `MaybeUtf8Slice` (analogous to `&str`). Both types support various conversion methods.
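The owned/borrowed split mirrors `String` and `&str`. The std-only sketch below shows one way such a pair could be modeled; `MaybeBuf` and `MaybeSlice` are hypothetical stand-ins, not this crate's actual definitions. The lossy `Display` behavior is an assumption modeled on the `caf\u{fffd}` output in the crate's own example.

```rust
use std::fmt;

/// Hypothetical owned container, playing the role of `MaybeUtf8Buf` (cf. `String`).
#[derive(Debug, Clone, PartialEq)]
enum MaybeBuf {
    Utf8(String),
    Bytes(Vec<u8>),
}

/// Hypothetical borrowed view, playing the role of `MaybeUtf8Slice` (cf. `&str`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum MaybeSlice<'a> {
    Utf8(&'a str),
    Bytes(&'a [u8]),
}

impl MaybeBuf {
    /// Borrow the contents without copying, like `String::as_str`.
    fn as_slice(&self) -> MaybeSlice<'_> {
        match self {
            MaybeBuf::Utf8(s) => MaybeSlice::Utf8(s.as_str()),
            MaybeBuf::Bytes(b) => MaybeSlice::Bytes(b.as_slice()),
        }
    }
}

impl<'a> MaybeSlice<'a> {
    /// The raw bytes, whatever the encoding.
    fn as_bytes(&self) -> &'a [u8] {
        match *self {
            MaybeSlice::Utf8(s) => s.as_bytes(),
            MaybeSlice::Bytes(b) => b,
        }
    }

    /// Copy into an owned container, like `str::to_owned`.
    fn to_buf(&self) -> MaybeBuf {
        match *self {
            MaybeSlice::Utf8(s) => MaybeBuf::Utf8(s.to_owned()),
            MaybeSlice::Bytes(b) => MaybeBuf::Bytes(b.to_vec()),
        }
    }
}

impl<'a> fmt::Display for MaybeSlice<'a> {
    /// Lossy display: invalid UTF-8 sequences are shown as U+FFFD.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            MaybeSlice::Utf8(s) => f.write_str(s),
            MaybeSlice::Bytes(b) => f.write_str(&String::from_utf8_lossy(b)),
        }
    }
}

impl fmt::Display for MaybeBuf {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.as_slice(), f)
    }
}
```

In this sketch the borrowed view is `Copy` and cheap to pass around, while going back to an owned value is an explicit copy, just as with `&str` and `String`.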
If you know that the bytes are encoded in ISO 8859-2, for instance, the `encoding` crate can be used with these conversion methods to decode them:
```rust
extern crate encoding;
extern crate maybe_utf8;

use std::borrow::IntoCow;
use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_2;
use maybe_utf8::{MaybeUtf8Buf, MaybeUtf8Slice};

fn main() {
    // b"caf\xe9" is "café" in ISO 8859-2, but it is not valid UTF-8
    let namebuf = MaybeUtf8Buf::from_bytes(vec![99, 97, 102, 233]);
    // formatting with `{}` shows the invalid byte as U+FFFD
    assert_eq!(format!("{}", namebuf), "caf\u{fffd}");

    // a borrowed slice works equally well; the extra scope ends the borrow
    // before `namebuf` is consumed below
    {
        let nameslice: MaybeUtf8Slice = namebuf.to_slice();
        assert_eq!(format!("{:?}", nameslice), r#"b"caf\xe9""#);
        assert_eq!(nameslice.map_as_cow(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
                   "caf\u{e9}");
    }

    // consuming an optionally-UTF-8-encoded buffer also works
    assert_eq!(namebuf.map_into_str(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
               "caf\u{e9}");
}
```
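Our reading of the example is that the closure acts as a decoder for bytes of unknown encoding; this is an interpretation, not something the crate's text states. The std-only sketch below shows that pattern under that assumption. `as_cow_or_else`, `into_string_or_else`, and `decode_latin1` are hypothetical helpers. Because the standard library has no ISO 8859-2 decoder, the sketch swaps in ISO 8859-1 (Latin-1), where every byte maps to the Unicode code point of the same value; the two encodings happen to agree on 0xE9 (é).

```rust
use std::borrow::Cow;

/// Hypothetical helper: borrow the bytes as `&str` when they are valid UTF-8,
/// otherwise decode them with a caller-supplied fallback.
fn as_cow_or_else<'a, F>(bytes: &'a [u8], decode: F) -> Cow<'a, str>
where
    F: FnOnce(&'a [u8]) -> String,
{
    match std::str::from_utf8(bytes) {
        Ok(s) => Cow::Borrowed(s),
        Err(_) => Cow::Owned(decode(bytes)),
    }
}

/// Hypothetical consuming variant: reuse the allocation when the bytes are
/// valid UTF-8, otherwise hand the original bytes to the fallback.
fn into_string_or_else<F>(bytes: Vec<u8>, decode: F) -> String
where
    F: FnOnce(Vec<u8>) -> String,
{
    match String::from_utf8(bytes) {
        Ok(s) => s,
        Err(err) => decode(err.into_bytes()),
    }
}

/// ISO 8859-1 (Latin-1): each byte is the Unicode code point of the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

The borrowed variant only allocates when decoding is actually needed, which is the usual reason to return a `Cow` here.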
The `IntoMaybeUtf8` trait can be used to accept either a string or bytes uniformly when constructing `MaybeUtf8*` values.
```rust
use maybe_utf8::IntoMaybeUtf8;

// "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9
assert_eq!("caf\u{e9}".into_maybe_utf8(), b"caf\xc3\xa9".into_maybe_utf8());
```
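How a conversion trait in the style of `IntoMaybeUtf8` can accept both kinds of input is sketched below with the standard library only. `MaybeUtf8Like`, `IntoMaybeUtf8Like`, and `make_name` are hypothetical names, not this crate's API. Comparing by underlying bytes is an assumption chosen so that a string equals its own UTF-8 encoding, matching the equality in the crate's example.

```rust
/// Hypothetical container: UTF-8 text or bytes of unknown encoding.
#[derive(Debug)]
enum MaybeUtf8Like {
    Utf8(String),
    Bytes(Vec<u8>),
}

impl MaybeUtf8Like {
    fn as_bytes(&self) -> &[u8] {
        match self {
            MaybeUtf8Like::Utf8(s) => s.as_bytes(),
            MaybeUtf8Like::Bytes(b) => b.as_slice(),
        }
    }
}

/// Assumption: equality compares the underlying bytes, so "café" equals
/// the bytes b"caf\xc3\xa9" regardless of which variant holds them.
impl PartialEq for MaybeUtf8Like {
    fn eq(&self, other: &Self) -> bool {
        self.as_bytes() == other.as_bytes()
    }
}

/// Hypothetical conversion trait for anything string-like or byte-like.
trait IntoMaybeUtf8Like {
    fn into_maybe_utf8_like(self) -> MaybeUtf8Like;
}

impl IntoMaybeUtf8Like for String {
    fn into_maybe_utf8_like(self) -> MaybeUtf8Like { MaybeUtf8Like::Utf8(self) }
}

impl<'a> IntoMaybeUtf8Like for &'a str {
    fn into_maybe_utf8_like(self) -> MaybeUtf8Like { MaybeUtf8Like::Utf8(self.to_owned()) }
}

impl IntoMaybeUtf8Like for Vec<u8> {
    fn into_maybe_utf8_like(self) -> MaybeUtf8Like { MaybeUtf8Like::Bytes(self) }
}

impl<'a> IntoMaybeUtf8Like for &'a [u8] {
    fn into_maybe_utf8_like(self) -> MaybeUtf8Like { MaybeUtf8Like::Bytes(self.to_vec()) }
}

/// A generic API can now accept text or bytes through a single parameter.
fn make_name<T: IntoMaybeUtf8Like>(name: T) -> MaybeUtf8Like {
    name.into_maybe_utf8_like()
}
```

A single generic bound keeps call sites simple: callers pass whatever they have, and only the implementation decides how to store it.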
Complete documentation is available.
MaybeUtf8 is written by Kang Seonghoon and licensed under the MIT/X11 license.