19 releases
Version | Released |
---|---|
0.3.1 | Oct 15, 2024 |
0.2.1 | Feb 18, 2024 |
0.1.0 | Mar 29, 2023 |
0.0.15 | Oct 2, 2021 |
0.0.10 | Jul 27, 2020 |
Seshat
A Unicode Library for Rust.
Introduction
Seshat (pronounced Sehs-hat) is a Unicode library written in Rust. It provides Unicode character data and many of the standard Unicode algorithms. The goal of this project is to provide an ICU-like library in Rust.
Version
Seshat follows the latest version of Unicode. It currently uses version 16.0.0.
Usage
[dependencies]
seshat-unicode = "0.3.1"
use seshat::unicode::Ucd;

fn main() {
    // '🦀' is U+1F980 CRAB.
    println!("🦀 is {}!", '🦀'.na());
}
Check the Unicode Version
use seshat::unicode::UNICODE_VERSION;

fn main() {
    println!("{}", UNICODE_VERSION.to_string());
}
Features
Grapheme cluster break
use seshat::unicode::Segmentation;

fn main() {
    let s = "Hi, 👨🏾‍🤝‍👨🏿";
    for seg in s.break_graphemes() {
        println!("{}", seg);
    }
}
This prints each grapheme cluster on its own line:
$ cargo run
H
i
,
👨🏾‍🤝‍👨🏿
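To see why grapheme segmentation is needed, here is a minimal sketch that uses only the Rust standard library, not Seshat. It counts the UTF-8 bytes and code points of the same string. The helper `count_code_points` is a hypothetical name introduced for this illustration, and the emoji is written with escapes so that its parts are visible.

```rust
// Standard library only. `count_code_points` is a hypothetical helper for
// this sketch and is not part of Seshat.
fn count_code_points(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "Hi, " followed by one emoji ZWJ sequence written as escapes:
    // MAN, skin tone, ZWJ, HANDSHAKE, ZWJ, MAN, skin tone.
    let s = "Hi, \u{1F468}\u{1F3FE}\u{200D}\u{1F91D}\u{200D}\u{1F468}\u{1F3FF}";

    println!("UTF-8 bytes: {}", s.len());
    println!("code points: {}", count_code_points(s));

    // Iterating by `char` splits the emoji sequence into its seven parts.
    for c in s.chars().skip(4) {
        println!("U+{:04X}", c as u32);
    }
}
```

The string has 30 UTF-8 bytes and 11 code points, yet the last seven code points form the single grapheme cluster shown in the output above.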
Normalization
use seshat::unicode::Normalization;

fn main() {
    let s1 = "Å"; // U+00C5
    println!("{:?}", s1.to_nfd()); // Prints "A\u{30a}"
    let s2 = "㌀"; // U+3300 SQUARE APAATO
    println!("{}", s2.to_nfkd()); // Prints アパート
    let s3 = "e\u{0301}";
    println!("{}", s3.to_nfc()); // Prints é
    let s4 = "ｱｲｳｴｵ"; // Halfwidth katakana
    assert_eq!("アイウエオ", s4.to_nfkc());
}
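As a rough illustration of the problem normalization solves, the sketch below uses only the Rust standard library, not Seshat. It compares the precomposed and decomposed spellings of the same letter. The helper `same_code_points` is a hypothetical name introduced for this example.

```rust
// Standard library only. `same_code_points` is a hypothetical helper for
// this sketch and is not part of Seshat.
fn same_code_points(a: &str, b: &str) -> bool {
    a.chars().eq(b.chars())
}

fn main() {
    let composed = "\u{00E9}"; // LATIN SMALL LETTER E WITH ACUTE
    let decomposed = "e\u{0301}"; // 'e' followed by COMBINING ACUTE ACCENT

    println!("{} has {} code point(s)", composed, composed.chars().count());
    println!("{} has {} code point(s)", decomposed, decomposed.chars().count());

    // Without normalization the two spellings compare as different strings.
    println!("equal as written: {}", same_code_points(composed, decomposed));
}
```

Both strings render the same, but they differ code point by code point. Converting both to one form first, for example with `to_nfc()` as shown above, gives them a common representation to compare.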
Properties
use seshat::unicode::Ucd;

fn main() {
    let c = '\u{0374}'; // U+0374 GREEK NUMERAL SIGN
    assert_eq!(c.xids(), true); // XID_Start property of the character.
}
For an enumerated property:
use seshat::unicode::Ucd;
use seshat::unicode::props::Gc;

fn main() {
    assert_eq!('A'.gc(), Gc::Lu);
    assert_eq!('a'.gc(), Gc::Ll);
}
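For comparison, and only as a sketch: the Rust standard library exposes a few binary properties on `char`, such as Alphabetic, Uppercase, and Lowercase, but it has no General_Category lookup like `gc()` above. The helper `std_summary` is a hypothetical name introduced for this example and is not part of Seshat.

```rust
// Standard library only. `std_summary` is a hypothetical helper that returns
// (alphabetic, uppercase, lowercase) for a character.
fn std_summary(c: char) -> (bool, bool, bool) {
    (c.is_alphabetic(), c.is_uppercase(), c.is_lowercase())
}

fn main() {
    // 'A', 'a', and U+0374 GREEK NUMERAL SIGN from the examples above.
    for c in ['A', 'a', '\u{0374}'] {
        let (alpha, upper, lower) = std_summary(c);
        println!(
            "U+{:04X}: alphabetic={} uppercase={} lowercase={}",
            c as u32, alpha, upper, lower
        );
    }
}
```

These checks tell you that U+0374 is alphabetic, but not which general category it belongs to.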
Patches
0.2.1 - Exclude the tools/ directory, which should not be included when publishing.
Contribute
Add later.
License
All logo images are copyrighted by their creators and should not be used outside this project without permission.
The drawing part of the logo (the writing goddess) is by Frybits Inc.
Seshat is developed under the MIT License. For details, see the LICENSE file.