14 releases (6 stable)

Uses old Rust 2015

2.0.4 Dec 7, 2017
2.0.3 Nov 28, 2017
2.0.0 Mar 18, 2017
2.0.0-pre.1 Oct 30, 2016
0.0.6 Nov 21, 2014

#45 in #character-encoding

41 downloads per month

Unlicense

670KB
8K SLoC

C++ 7K SLoC // 0.3% comments Python 1K SLoC // 0.7% comments Rust 134 SLoC // 0.1% comments C 71 SLoC // 0.4% comments Shell 21 SLoC


Deprecated in favor of chardet, which is pure Rust. If you have a use case for this code, please feel free to open an issue. Simple PRs will still be read, and possibly accepted.

Attempts to detect the character encoding of raw text using the uchardet library.

Example:

// At the top of the file.
extern crate uchardet;
use uchardet::detect_encoding_name;

// Inside a function.
assert_eq!("UTF-8",
           detect_encoding_name("©français".as_bytes()).unwrap());
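The example above calls unwrap(), which panics if detection fails. A minimal sketch of a non-panicking alternative follows. It assumes only the Result<String, E> shape implied by that unwrap() call; the helper name encoding_or_default and the generic error type E are hypothetical and not part of the crate.

```rust
use std::fmt::Debug;

/// Hypothetical helper (not part of uchardet): turn a detection result into
/// an encoding name, falling back to `default` when detection fails.
///
/// `E` stands in for whatever error type the crate returns; this sketch only
/// assumes a `Result<String, E>` return value.
fn encoding_or_default<E: Debug>(result: Result<String, E>, default: &str) -> String {
    match result {
        Ok(name) => name,
        Err(err) => {
            // Report the failure instead of panicking, then use the fallback.
            eprintln!("encoding detection failed: {:?}", err);
            default.to_string()
        }
    }
}
```

With the crate linked, a call would look roughly like encoding_or_default(detect_encoding_name(bytes), "UTF-8"), where bytes is a &[u8] you supply.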

If you would also like to detect the language used in the decoded text, see rust-cld2.

Getting uchardet (usually optional)

If you wish, you may install uchardet using your system package manager. For example, under Ubuntu, you can run:

sudo apt-get install libuchardet-dev

If you skip this step, Cargo will attempt to compile uchardet from the bundled source code instead. This should work if you have an appropriate g++ (or MSVC) compiler installed, as well as cmake. We test this build on Linux, OS X and Windows (both MinGW and MSVC) using Travis CI.

Contributing

As always, pull requests are welcome! Please keep any patches as simple as possible and include unit tests; that makes it much easier for me to merge them.

If you want to get the C/C++ code building on another platform, please see uchardet-sys/build.rs and this build script guide. You'll probably need to adjust some compiler options. Please don't hesitate to ask questions; I'd love for this library to support more platforms.
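As a rough illustration of the kind of per-platform adjustment a Cargo build script makes, the sketch below picks which C++ standard library to link for a given target triple. It is not taken from this crate's build script: the function names cxx_stdlib_for and link_directive are hypothetical, and the platform mapping is a simplified assumption covering only the platforms mentioned above.

```rust
/// Hypothetical helper, not taken from the crate's build script: pick the
/// C++ standard library to link for a given Cargo target triple.
fn cxx_stdlib_for(target: &str) -> Option<&'static str> {
    if target.contains("msvc") {
        // MSVC links its C++ runtime automatically, so nothing is requested.
        None
    } else if target.contains("apple") {
        // OS X toolchains ship libc++.
        Some("c++")
    } else {
        // Assumed default for GNU-style toolchains (Linux, MinGW).
        Some("stdc++")
    }
}

/// Build the `cargo:rustc-link-lib=...` directive that a build script would
/// print to stdout, or `None` when no extra library is needed.
fn link_directive(target: &str) -> Option<String> {
    cxx_stdlib_for(target).map(|lib| format!("cargo:rustc-link-lib={}", lib))
}
```

In a real build script the target triple would come from the TARGET environment variable that Cargo sets, and each directive would be printed with println!. A new platform would typically mean adding another branch like the ones above.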

In your first commit message, please include the following statement:

I dedicate any and all copyright interest in my contributions to this project to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

This allows us to keep the library legally unencumbered, and free for everyone to use.

Contributors include:

  • Boris-Chengbiao Zhou. Support for newer upstream uchardet libraries and Microsoft Windows, plus the error-chain conversion.
  • Wesley Moore. Fixes for Rust beta 2 and OS X.

Thank you very much for your contributions!

License

New code in the rust-uchardet library is released into the public domain, as described in the UNLICENSE file. However, several pre-existing pieces have their own licenses:

  • The uchardet C++ library included in uchardet-sys/uchardet via a git submodule is distributed under the Mozilla Public License 1.1.

Dependencies

~2.5–3.5MB
~74K SLoC