#byte-array #bom #unicode #order #utf

unicode-bom

Unicode byte-order mark detection for files and byte arrays

13 releases (stable)

2.0.3 Nov 13, 2023
2.0.2 Feb 11, 2023
1.1.4 Feb 4, 2019
1.1.3 Oct 21, 2018
0.1.2 Oct 12, 2018

#113 in Encoding

Download history 151077/week @ 2024-12-02 151871/week @ 2024-12-09 144892/week @ 2024-12-16 97146/week @ 2024-12-23 105984/week @ 2024-12-30 154218/week @ 2025-01-06 154603/week @ 2025-01-13 151082/week @ 2025-01-20 149577/week @ 2025-01-27 166374/week @ 2025-02-03 183743/week @ 2025-02-10 183418/week @ 2025-02-17 192671/week @ 2025-02-24 197334/week @ 2025-03-03 228581/week @ 2025-03-10 190633/week @ 2025-03-17

823,470 downloads per month
Used in 324 crates (9 directly)

Apache-2.0

23KB
369 lines

unicode-bom

Build status Crate status Downloads License

Unicode byte-order mark detection for Rust projects.

What does it do?

unicode-bom will read the first few bytes from an array or a file on disk, then determine whether a byte-order mark is present.

What doesn't it do?

It won't check the rest of the data to determine whether it's actually valid according to the indicated encoding.

How do I install it?

Add it to your dependencies in Cargo.toml:

[dependencies]
unicode-bom = "2"

How do I use it?

For more detailed information see the API docs, but the general gist is as follows:

use unicode_bom::Bom;

// The BOM can be parsed from a file on disk via the `FromStr` trait...
let bom: Bom = "foo.txt".parse().unwrap();
match bom {
    Bom::Null => {
        // No BOM was detected
    }
    Bom::Bocu1 => {
        // BOCU-1 BOM was detected
    }
    Bom::Gb18030 => {
        // GB 18030 BOM was detected
    }
    Bom::Scsu => {
        // SCSU BOM was detected
    }
    Bom::UtfEbcdic => {
        // UTF-EBCDIC BOM was detected
    }
    Bom::Utf1 => {
        // UTF-1 BOM was detected
    }
    Bom::Utf7 => {
        // UTF-7 BOM was detected
    }
    Bom::Utf8 => {
        // UTF-8 BOM was detected
    }
    Bom::Utf16Be => {
        // UTF-16 (big-endian) BOM was detected
    }
    Bom::Utf16Le => {
        // UTF-16 (little-endian) BOM was detected
    }
    Bom::Utf32Be => {
        // UTF-32 (big-endian) BOM was detected
    }
    Bom::Utf32Le => {
        // UTF-32 (little-endian) BOM was detected
    }
}

// ...or you can detect the BOM in a byte array
let bytes = [0u8, 0u8, 0xfeu8, 0xffu8];
let bom = Bom::from(&bytes[0..]);
assert_eq!(bom, Bom::Utf32Be);
assert_eq(bom.len(), 4);

How do I set up the build environment?

If you don't already have Rust installed, get that first using rustup:

curl https://sh.rustup.rs -sSf | sh

Then you can build the project:

cargo b

And run the tests:

cargo t

Is there API documentation?

Yes.

Is there a change log?

Yes.

What license is it published under?

Apache-2.0.

No runtime deps