28 stable releases
| Version | Released |
|---|---|
| 3.2.8 (new) | Oct 29, 2024 |
| 3.2.3 | Jul 1, 2024 |
| 3.1.2 | Mar 30, 2024 |
| 2.0.2 | Feb 11, 2024 |
| 1.4.1 | Jul 16, 2022 |
#152 in Text processing
808 downloads per month
Used in 3 crates
90KB, 2.5K SLoC
decancer
A library that removes common Unicode confusables/homoglyphs from strings.
- Its core is written in Rust and uses a form of binary search to ensure speed!
- By default, it can filter 221,529 different Unicode code points (19.88% of the 1,114,112 possible code points), including:
  - All whitespace characters
  - All diacritics, which also eliminates all forms of Zalgo text
  - Most leetspeak characters
  - Most homoglyphs
  - Several emojis
- Unlike other packages, this package is Unicode bidi-aware: it interprets right-to-left characters the same way an application would render them!
- Its behavior is also highly customizable to your liking!
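To give a feel for the binary search idea mentioned above, here is a minimal, standard-library-only sketch. It is not decancer's actual implementation or data: the `Entry` struct, the three-range `TABLE`, and the `lookup` and `simplify` functions are all hypothetical names invented for this illustration. It only shows how a sorted table of code point ranges can be searched in logarithmic time.

```rust
use std::cmp::Ordering;

/// Hypothetical table entry: an inclusive code point range whose members
/// map onto consecutive ASCII letters starting at `base`.
struct Entry {
    start: u32,
    end: u32,
    base: char,
}

/// Tiny illustrative table (NOT decancer's data). It must stay sorted by
/// `start` with no overlapping ranges for the binary search to be valid.
const TABLE: &[Entry] = &[
    Entry { start: 0x24D0, end: 0x24E9, base: 'a' },   // circled small letters
    Entry { start: 0xFF41, end: 0xFF5A, base: 'a' },   // fullwidth small letters
    Entry { start: 0x1D41A, end: 0x1D433, base: 'a' }, // mathematical bold small letters
];

/// Returns the ASCII replacement for `c`, or `None` if no range contains it.
fn lookup(c: char) -> Option<char> {
    let code = c as u32;
    let index = TABLE
        .binary_search_by(|entry| {
            if entry.end < code {
                Ordering::Less // the whole range lies before the target
            } else if entry.start > code {
                Ordering::Greater // the whole range lies after the target
            } else {
                Ordering::Equal // the target falls inside this range
            }
        })
        .ok()?;
    let entry = &TABLE[index];
    char::from_u32(entry.base as u32 + (code - entry.start))
}

/// Replaces every character found in the table and keeps the rest as-is.
fn simplify(input: &str) -> String {
    input.chars().map(|c| lookup(c).unwrap_or(c)).collect()
}

fn main() {
    // Circled "very" followed by fullwidth "fun", written as escapes.
    let input = "\u{24E5}\u{24D4}\u{24E1}\u{24E8} \u{FF46}\u{FF55}\u{FF4E}";
    assert_eq!(simplify(input), "very fun");
}
```

Storing ranges instead of single code points keeps the table small, and each lookup costs O(log n) comparisons regardless of how many code points the ranges cover.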
Installation
In your Cargo.toml:
decancer = "3.2.8"
Examples
For more information, please read the documentation.
let mut cured = decancer::cure!(r"vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣 wWiIiIIttHh l133t5p3/-\|<").unwrap();
assert_eq!(cured, "very funny text with leetspeak");
// WARNING: it's NOT recommended to coerce this output to a Rust string
// and process it manually from there, as decancer has its own
// custom comparison measures, including leetspeak matching!
assert_ne!(cured.as_str(), "very funny text with leetspeak");
assert!(cured.contains("funny"));
cured.censor("funny", '*');
assert_eq!(cured, "very ***** text with leetspeak");
cured.censor_multiple(["very", "text"], '-');
assert_eq!(cured, "---- ***** ---- with leetspeak");
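The warning above is easier to appreciate with a toy model of why plain string equality and a tolerant comparison can disagree. The sketch below is not decancer's algorithm. The `unleet`, `normalize`, and `loosely_equal` functions and the tiny digit table are assumptions made up for illustration, using only the standard library. Both inputs are lowercased, a few leetspeak digits are mapped to letters, and runs of repeated characters are collapsed before comparing.

```rust
/// Maps a handful of leetspeak digits to letters (hypothetical, tiny table).
fn unleet(c: char) -> char {
    match c {
        '0' => 'o',
        '1' => 'l',
        '3' => 'e',
        '4' => 'a',
        '5' => 's',
        '7' => 't',
        other => other,
    }
}

/// Lowercases, un-leets, and collapses runs of the same character.
fn normalize(input: &str) -> String {
    let mut out = String::new();
    for c in input.chars().flat_map(char::to_lowercase).map(unleet) {
        // Skip the character if it repeats the one just written.
        if !out.ends_with(c) {
            out.push(c);
        }
    }
    out
}

/// Toy tolerant comparison: equal once both sides are normalized.
fn loosely_equal(a: &str, b: &str) -> bool {
    normalize(a) == normalize(b)
}

fn main() {
    let raw = "wWiIiIIttHh l133t5p34k";
    // Plain string equality fails...
    assert_ne!(raw, "with leetspeak");
    // ...while the tolerant comparison succeeds.
    assert!(loosely_equal(raw, "with leetspeak"));
}
```

This toy version is deliberately crude: collapsing repeats also makes "book" and "bok" compare equal, which is one reason to rely on the library's own comparison methods rather than hand-rolled string processing.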
Donations
If you want to support my eyes for manually looking at thousands of unicode characters, consider donating! ❤
Dependencies
~0–0.8MB
~14K SLoC