4 releases
Version | Date
---|---
0.1.3 | Dec 25, 2022
0.1.2 | Dec 23, 2022
0.1.1 | Dec 13, 2022
0.1.0 | Dec 12, 2022
66KB
822 lines
Package codes-iana-charset
This package contains an implementation of the IANA CHARSET registry.
These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. These names are expressed in ANSI_X3.4-1968, which is commonly called US-ASCII or simply ASCII. The character set most commonly used in the Internet, and especially in protocol standards, is US-ASCII; its use is strongly encouraged. The use of the name US-ASCII is also encouraged.
Character set names may be up to 40 characters long, taken from the printable characters of US-ASCII. No distinction is made between upper- and lower-case letters.
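The two naming rules above (at most 40 printable US-ASCII characters, compared without regard to case) can be sketched with the standard library alone. The helpers `is_valid_charset_name` and `names_match` are hypothetical and are not part of this crate's API. Treating "printable" as the visible range 0x21 to 0x7E, so that a space is rejected, is also an assumption of this sketch.

```rust
/// Hypothetical helper: checks the registry's naming rules for a
/// character set name. Assumes "printable" means visible ASCII
/// (0x21..=0x7E), so the space character is rejected.
fn is_valid_charset_name(name: &str) -> bool {
    // `len()` counts bytes; for pure ASCII input this equals the number
    // of characters, and any non-ASCII byte fails the graphic check.
    !name.is_empty()
        && name.len() <= 40
        && name.bytes().all(|b| b.is_ascii_graphic())
}

/// Hypothetical helper: compares two names without distinguishing
/// upper- and lower-case letters.
fn names_match(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

fn main() {
    assert!(is_valid_charset_name("ISO_8859-1:1987"));
    assert!(is_valid_charset_name(&"a".repeat(40)));
    assert!(!is_valid_charset_name(&"a".repeat(41)));
    assert!(!is_valid_charset_name("Latin 1")); // contains a space
    assert!(names_match("us-ascii", "US-ASCII"));
    assert!(!names_match("latin1", "latin2"));
}
```

Using `eq_ignore_ascii_case` avoids allocating lower-cased copies of both names, which suits names that are guaranteed to be ASCII.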
The MIBenum value is a unique value for use in MIBs to identify coded character sets.
The value space for MIBenum values has been divided into three regions. The first region (3-999) consists of coded character sets that have been standardized by some standard-setting organization. This region is intended for standards that do not have subset implementations. The second region (1000-1999) is for the Unicode and ISO/IEC 10646 coded character sets together with a specification of a (set of) sub-repertoires that may occur. The third region (>1999) is intended for vendor-specific coded character sets.
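The three regions map directly onto integer range patterns. This is a minimal sketch: the `MibEnumRegion` enum and `mib_enum_region` function are hypothetical and not part of this crate, and returning `None` for values below 3 is an assumption, since the text above does not assign them to any region.

```rust
/// Hypothetical classification of a MIBenum value into the three
/// regions of the IANA value space; not part of this crate's API.
#[derive(Debug, PartialEq, Eq)]
enum MibEnumRegion {
    /// 3-999: character sets standardized by a standards organization.
    Standard,
    /// 1000-1999: Unicode and ISO/IEC 10646 with sub-repertoires.
    Unicode,
    /// 2000 and above: vendor-specific character sets.
    Vendor,
}

fn mib_enum_region(value: u32) -> Option<MibEnumRegion> {
    match value {
        3..=999 => Some(MibEnumRegion::Standard),
        1000..=1999 => Some(MibEnumRegion::Unicode),
        2000..=u32::MAX => Some(MibEnumRegion::Vendor),
        // Values 0-2 fall outside the three regions described above.
        _ => None,
    }
}

fn main() {
    // MIBenum 4 is ISO_8859-1:1987, as in the example below.
    assert_eq!(mib_enum_region(4), Some(MibEnumRegion::Standard));
    assert_eq!(mib_enum_region(1999), Some(MibEnumRegion::Unicode));
    assert_eq!(mib_enum_region(2000), Some(MibEnumRegion::Vendor));
    assert_eq!(mib_enum_region(2), None);
}
```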
For notes on the design of the API, see the repository README.
Example
```rust
use codes_iana_charset as charset;

// Look up the registry entry with MIBenum 4.
let latin_1 = charset::CHARSET_4;

assert_eq!(latin_1.id(), 4);
assert_eq!(latin_1.name(), "ISO_8859-1:1987");
assert_eq!(
    latin_1.source(),
    "[ISO-IR: International Register of Escape Sequences] Note: The current registration authority is IPSJ/ITSCJ, Japan.",
);
// The preferred MIME name differs from the registered name.
assert_eq!(latin_1.preferred_alias(), Some("ISO-8859-1"));
assert_eq!(latin_1.aliases(), &[
    "iso-ir-100",
    "ISO_8859-1",
    "ISO-8859-1",
    "latin1",
    "l1",
    "IBM819",
    "CP819",
    "csISOLatin1"
]);
assert_eq!(latin_1.reference(), Some("[RFC1345][Keld_Simonsen]"));
```
Note that the implementation of `FromStr` takes into account all aliases.
```rust
use codes_iana_charset as charset;
use std::str::FromStr;

let latin_1 = charset::CHARSET_4;

// Parse from an alias rather than the registered name "ISO_8859-1:1987".
let iso_8859_1 = charset::CharacterSetCode::from_str("ISO_8859-1").unwrap();
assert_eq!(latin_1, iso_8859_1);

// Convert from the numeric MIBenum value.
let some_charset = charset::CharacterSetCode::try_from(4).unwrap();
assert_eq!(some_charset, iso_8859_1);
```
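Alias-aware parsing of this kind can be sketched with a small lookup table. Everything here (`Charset`, `TABLE`, `UnknownCharset`) is hypothetical and simplified; it is not how this crate is implemented. Matching case-insensitively is an assumption drawn from the registry rule that upper and lower case are not distinguished, and the alias lists in the table are abbreviated.

```rust
use std::str::FromStr;

/// Hypothetical, simplified registry entry (not this crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Charset {
    id: u32,
    name: &'static str,
    aliases: &'static [&'static str],
}

/// Illustrative two-entry table; alias lists are abbreviated.
const TABLE: &[Charset] = &[
    Charset { id: 3, name: "US-ASCII", aliases: &["ANSI_X3.4-1968", "csASCII"] },
    Charset { id: 4, name: "ISO_8859-1:1987", aliases: &["ISO-8859-1", "latin1", "l1", "CP819"] },
];

#[derive(Debug, PartialEq, Eq)]
struct UnknownCharset;

impl FromStr for Charset {
    type Err = UnknownCharset;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept the registered name or any alias, ignoring ASCII case.
        TABLE
            .iter()
            .find(|c| {
                c.name.eq_ignore_ascii_case(s)
                    || c.aliases.iter().any(|a| a.eq_ignore_ascii_case(s))
            })
            .copied()
            .ok_or(UnknownCharset)
    }
}

fn main() {
    let by_name = Charset::from_str("ISO_8859-1:1987").unwrap();
    let by_alias: Charset = "LATIN1".parse().unwrap();
    assert_eq!(by_name, by_alias);
    assert_eq!(by_alias.id, 4);
    assert_eq!("no-such-charset".parse::<Charset>(), Err(UnknownCharset));
}
```

A linear scan is enough for a sketch; a full registry of several hundred entries might instead precompute a map keyed by lower-cased names.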
Features
By default only the `serde` feature is enabled.
serde
- Enables serialization of the `CharacterSetCode` type.
Changes
Version 0.1.3
- Catering for new `build` module in codes-common
Version 0.1.2
- Implemented `Standardized` and `FixedLengthCode` traits.
Version 0.1.1
- Fixed example bug
Version 0.1.0
- Initial release
TODO
TBD
Dependencies
~0.3–2.6MB
~42K SLoC