#posix #locale #character #iconv #transliteration #text #os

text-transliterate

Simple utility for transliterating text using the OS iconv from POSIX

7 releases (stable)

2.0.0 Feb 16, 2020
1.1.3 Feb 9, 2020
1.0.0 Feb 8, 2020
0.1.0 Oct 29, 2017

#827 in Text processing

Apache-2.0/MIT

44KB
1K SLoC

text-transliterate-rust

Proof of concept (i.e., not ready for production) for text transliteration. It uses the OS iconv with the //TRANSLIT//IGNORE suffixes to transform characters between locales.

The locale must be available in the OS.

Example (see the tests for more results):

		let tt = TextTransliterate::new();
		let result = tt.transliterate("ü  ä  ö  ß  Ü  Ä  Ö ç ñ 的 😒", "de_DE.UTF-8");
		if let Ok(result) = result {
			assert_eq!("ue  ae  oe  ss  UE  AE  OE c n ? ?", result);
		}
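
The iconv call described above can be sketched directly against the C API. This is a minimal sketch, not the crate's actual implementation: the helper name `to_ascii`, the hand-written `extern` declarations (normally supplied by the `libc` crate), and the buffer sizing are all assumptions made for illustration, and it presumes a GNU libc system. It does not call uselocale, so non-ASCII output depends on the thread's current locale and on the machine.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_void};

// Hand-written declarations of the POSIX iconv API (assumption: a real
// project would normally take these from the `libc` crate).
extern "C" {
    fn iconv_open(tocode: *const c_char, fromcode: *const c_char) -> *mut c_void;
    fn iconv(
        cd: *mut c_void,
        inbuf: *mut *mut c_char,
        inbytesleft: *mut usize,
        outbuf: *mut *mut c_char,
        outbytesleft: *mut usize,
    ) -> usize;
    fn iconv_close(cd: *mut c_void) -> i32;
}

/// Hypothetical helper: converts UTF-8 text to ASCII, asking iconv to
/// transliterate what it can and skip what it cannot.
fn to_ascii(input: &str) -> Option<String> {
    let to = CString::new("ASCII//TRANSLIT//IGNORE").ok()?;
    let from = CString::new("UTF-8").ok()?;
    // SAFETY: both arguments are valid NUL-terminated strings.
    let cd = unsafe { iconv_open(to.as_ptr(), from.as_ptr()) };
    if cd as isize == -1 {
        return None; // this conversion is not supported on this system
    }

    let mut src = input.as_bytes().to_vec();
    // Generous output buffer: one character may expand to several ASCII bytes.
    let mut dst = vec![0u8; src.len() * 4 + 16];
    let mut in_ptr = src.as_mut_ptr() as *mut c_char;
    let mut in_left = src.len();
    let mut out_ptr = dst.as_mut_ptr() as *mut c_char;
    let mut out_left = dst.len();

    // SAFETY: the pointers and lengths describe the two live buffers above.
    unsafe {
        // The return value is deliberately ignored in this sketch: with
        // //IGNORE, glibc may report an error even though output was written.
        iconv(cd, &mut in_ptr, &mut in_left, &mut out_ptr, &mut out_left);
        iconv_close(cd);
    }

    let written = dst.len() - out_left;
    dst.truncate(written);
    String::from_utf8(dst).ok()
}

fn main() {
    // Pure ASCII passes through unchanged.
    println!("{:?}", to_ascii("hello"));
    // The result for non-ASCII input depends on the locale and the machine.
    println!("{:?}", to_ascii("ü ß ñ"));
}
```

Ignoring the return value of `iconv` keeps the sketch short; production code would inspect `errno` to distinguish skipped characters from a full output buffer.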

To apply the correct locale (for example, to transliterate German letters correctly), the crate must call the C function uselocale, which changes the locale of the calling thread. To avoid that side effect, you can use the "off-thread" version, which creates a new thread to run uselocale and iconv:

        let mut tt = TextTransliterateOffThread::new();
        let result = tt.transliterate("ü  ä  ö  ß  Ü  Ä  Ö ç ñ 的 😒", "de_DE.UTF-8");
        if let Ok(result) = result {
            assert_eq!("ue  ae  oe  ss  UE  AE  OE c n ? ?", result);
        }
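
The off-thread idea can be illustrated with the standard library alone. The sketch below is an assumption about the general pattern, not the crate's code: `OffThread`, `run`, and `current_locale` are hypothetical names, a `thread_local!` string stands in for the per-thread C locale that uselocale would change, and the iconv call is replaced by a placeholder that returns the text unchanged. It only shows that state changed in the worker thread does not leak into the caller.

```rust
use std::cell::RefCell;
use std::sync::mpsc::{channel, Sender};
use std::thread;

thread_local! {
    // Stand-in for the per-thread C locale that `uselocale` would change.
    static THREAD_LOCALE: RefCell<String> = RefCell::new(String::from("C"));
}

/// Reads the stand-in locale of the calling thread.
fn current_locale() -> String {
    THREAD_LOCALE.with(|l| l.borrow().clone())
}

// A request: the text, the wanted locale, and a channel for the reply.
type Job = (String, String, Sender<(String, String)>);

/// Hypothetical handle that owns a dedicated worker thread.
struct OffThread {
    jobs: Option<Sender<Job>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl OffThread {
    fn new() -> Self {
        let (jobs, inbox) = channel::<Job>();
        let worker = thread::spawn(move || {
            for (text, locale, reply) in inbox {
                // Only the worker's thread-local state changes here.
                THREAD_LOCALE.with(|l| *l.borrow_mut() = locale);
                // Placeholder for the real iconv call: echo the text back
                // together with the locale the worker ran under.
                let _ = reply.send((text, current_locale()));
            }
        });
        OffThread { jobs: Some(jobs), worker: Some(worker) }
    }

    /// Sends one job to the worker and waits for its answer.
    fn run(&self, text: &str, locale: &str) -> Option<(String, String)> {
        let (reply, result) = channel();
        let job = (text.to_string(), locale.to_string(), reply);
        self.jobs.as_ref()?.send(job).ok()?;
        result.recv().ok()
    }
}

impl Drop for OffThread {
    fn drop(&mut self) {
        // Dropping the sender ends the worker loop; then wait for the thread.
        self.jobs.take();
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

fn main() {
    let off = OffThread::new();
    let (text, locale) = off.run("ü ä ö", "de_DE.UTF-8").unwrap();
    println!("worker processed {:?} under {}", text, locale);
    println!("caller still uses {}", current_locale());
}
```

Keeping one long-lived worker rather than spawning a thread per call is a design choice of this sketch; either approach confines the locale change to a thread the caller never runs on.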

Notes

  1. The test results can change between machines. Keep that in mind.
  2. The code depends on GNU libc.
  3. There is unsafe code that can cause problems:

License

Apache-2.0/MIT

Dependencies

~5–14MB
~173K SLoC