nt-string

Idiomatic Rust implementations for various Windows string types

2 releases

0.1.1 Jun 13, 2023
0.1.0 May 31, 2023

MIT/Apache

by Colin Finck <colin@reactos.org>

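Provides idiomatic Rust implementations for various Windows string types:

  • NtUnicodeString (with NtUnicodeStr and NtUnicodeStrMut): for interfacing with the Windows kernel string type known as UNICODE_STRING
  • U16StrLe: for working with byte slices of UTF-16 Little-Endian strings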

Other useful UTF-16 string types are already provided by the excellent widestring crate, and this crate aims to integrate with them as closely as possible.

In Action


Debugging a Rust application in WinDbg and using the dS command to display a UNICODE_STRING created in Rust. As you can see, this crate's NtUnicodeString and the original UNICODE_STRING are fully compatible.

Details

The UNICODE_STRING type was designed for the C programming language, which only knows about NUL-terminated buffers of characters. To determine the length of such a buffer, you need to iterate over all characters until you find the NUL. Bad enough? It gets worse: a classic buffer overflow occurs if the buffer contains no NUL, but an algorithm attempts to find it anyway.

To overcome these performance and security hazards, a UNICODE_STRING consists of a pointer to the buffer (Buffer), the buffer capacity (MaximumLength), and the length actually in use (Length). Determining length and capacity is now as simple as querying the corresponding fields. Both are 16-bit values expressed in bytes, which limits a UNICODE_STRING to 32,767 UTF-16 code units.
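For illustration, the layout described above can be mirrored in Rust as a `#[repr(C)]` struct. The name `RawUnicodeString` and the helper functions are hypothetical and not part of this crate; the sketch only shows that length and capacity come straight from the fields, whereas a C-style string needs a scan for its NUL terminator.

```rust
/// Hypothetical mirror of the C definition (not a type from this crate):
/// typedef struct _UNICODE_STRING { USHORT Length; USHORT MaximumLength; PWSTR Buffer; } UNICODE_STRING;
#[repr(C)]
struct RawUnicodeString {
    /// Bytes in use, not counting any terminating NUL.
    length: u16,
    /// Capacity of `buffer` in bytes.
    maximum_length: u16,
    /// Pointer to the UTF-16 code units.
    buffer: *mut u16,
}

impl RawUnicodeString {
    /// O(1): the length is read from a field, no scanning required.
    fn len_in_units(&self) -> usize {
        usize::from(self.length) / 2
    }

    /// O(1) as well: the capacity is stored in its own field.
    fn capacity_in_units(&self) -> usize {
        usize::from(self.maximum_length) / 2
    }
}

/// For contrast: a C-style length lookup must walk the buffer until it finds a NUL.
/// Bounding the search by the slice turns a missing NUL into `None` instead of an overflow.
fn nul_terminated_len(units: &[u16]) -> Option<usize> {
    units.iter().position(|&unit| unit == 0)
}
```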

UNICODE_STRING has been widely used by the Windows kernel team and also spilled over to some user-mode APIs. This crate makes UNICODE_STRING a first-class Rust citizen. Safety is achieved via the following measures:

  • UNICODE_STRING is split into 3 Rust types to handle references, mutable references, and owned strings separately. You should never need to call an unsafe method.
  • All methods are fallible (except for allocations and traits like Add, where Rust currently does not provide fallible alternatives).
  • The internal buffer is NUL-terminated whenever possible. While not required according to the specification, this defensive approach guards against external applications that never understood UNICODE_STRING and mistakenly treat its internal buffer as a NUL-terminated string.
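The fallibility and the defensive NUL terminator both follow from the 16-bit byte counts. The sketch below uses hypothetical names and is not this crate's API or implementation: it encodes a `&str` as UTF-16, rejects strings whose byte length does not fit into a u16, and appends a NUL terminator only when MaximumLength can still describe the larger buffer.

```rust
/// Hypothetical error type for this sketch (not the crate's error type).
#[derive(Debug, PartialEq)]
struct TooLong;

/// Encodes `s` as UTF-16 and returns `(buffer, length, maximum_length)`,
/// with both sizes in bytes as in UNICODE_STRING.
fn encode_for_unicode_string(s: &str) -> Result<(Vec<u16>, u16, u16), TooLong> {
    let mut buffer: Vec<u16> = s.encode_utf16().collect();

    // Fallible instead of truncating: the byte length must fit into a u16.
    let length = u16::try_from(buffer.len() * 2).map_err(|_| TooLong)?;

    // Defensive NUL terminator, but only if MaximumLength can still cover it.
    match u16::try_from((buffer.len() + 1) * 2) {
        Ok(maximum_length) => {
            buffer.push(0);
            Ok((buffer, length, maximum_length))
        }
        Err(_) => Ok((buffer, length, length)),
    }
}
```

At exactly 32,767 code units the string still fits, but there is no room left for the terminator, so this sketch returns it unterminated, in line with "whenever possible".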

Additionally, this crate provides the U16StrLe type. With UTF-16 being the ubiquitous character encoding in Windows, many on-disk strings are stored in UTF-16 Little-Endian. U16StrLe lets you perform basic operations on byte slices of such strings without first converting them to another string type. One user is the ntfs crate.
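To give a rough idea of such operations, the std-only sketch below reads UTF-16LE code units straight from a byte slice, compares them with a Rust string without allocating, and decodes them lossily. The helper names are hypothetical; U16StrLe's actual methods may differ.

```rust
/// Hypothetical helper: yields the UTF-16 code units stored little-endian in `bytes`,
/// or `None` if the slice has an odd length.
fn le_units(bytes: &[u8]) -> Option<impl Iterator<Item = u16> + '_> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]])),
    )
}

/// Compares the on-disk string with a Rust string without allocating.
fn utf16le_eq_str(bytes: &[u8], s: &str) -> bool {
    match le_units(bytes) {
        Some(units) => units.eq(s.encode_utf16()),
        None => false,
    }
}

/// Decodes the string, replacing unpaired surrogates with U+FFFD.
fn utf16le_to_string_lossy(bytes: &[u8]) -> Option<String> {
    let units = le_units(bytes)?;
    Some(
        char::decode_utf16(units)
            .map(|c| c.unwrap_or(char::REPLACEMENT_CHARACTER))
            .collect(),
    )
}
```

For example, the NTFS file name `$MFT` is stored as the bytes `[b'$', 0, b'M', 0, b'F', 0, b'T', 0]`, and `utf16le_eq_str` matches it against `"$MFT"` without building a `String`.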

Examples

You can work with these string types just like you work with other Rust string types:

```rust
use nt_string::unicode_string::NtUnicodeString;

let mut string = NtUnicodeString::try_from("Hello! ").unwrap();
string.try_push_str("Moin!").unwrap();
println!("{string}");
```

Conversions from raw u16 string buffers and from the U16CStr and U16Str types of the widestring crate are supported as well:

```rust
use nt_string::unicode_string::NtUnicodeString;
use widestring::{u16cstr, u16str};

// From raw u16 buffers
let abc = NtUnicodeString::try_from_u16(&[b'A' as u16, b'B' as u16, b'C' as u16]).unwrap();
let de = NtUnicodeString::try_from_u16_until_nul(&[b'D' as u16, b'E' as u16, 0]).unwrap();

// From widestring types
let fgh = NtUnicodeString::try_from(u16cstr!("FGH")).unwrap();
let ijk = NtUnicodeString::try_from(u16str!("IJK")).unwrap();
```

Just like a String automatically dereferences to a &str when you pass it to an appropriate function, you can do the same with an NtUnicodeString and it will dereference to an &NtUnicodeStr:

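```rust
use nt_string::unicode_string::{NtUnicodeStr, NtUnicodeString};

let string = NtUnicodeString::try_from("My String").unwrap();
// `&NtUnicodeString` is dereferenced to `&NtUnicodeStr` automatically.
subfunction(&string);

fn subfunction(str_ref: &NtUnicodeStr) {
    println!("Hello from subfunction with \"{str_ref}\".");
}
```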

Constant UNICODE_STRINGs can be created at compile time. This gives them a 'static lifetime and saves a UTF-16 conversion at runtime:

```rust
use nt_string::nt_unicode_str;
use nt_string::unicode_string::NtUnicodeStr;

const MY_CONSTANT_STRING: NtUnicodeStr<'static> = nt_unicode_str!("My Constant String");
```
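The following std-only sketch illustrates the general idea of performing the UTF-16 conversion at compile time. It is a deliberate simplification and not how nt_unicode_str! is implemented: `ascii_to_utf16` is a hypothetical `const fn` that only accepts ASCII input and requires the caller to state the length `N`.

```rust
/// Hypothetical ASCII-only UTF-16 encoder that runs at compile time when used in a `const`.
/// `N` must equal the string's length; otherwise compilation fails.
const fn ascii_to_utf16<const N: usize>(s: &str) -> [u16; N] {
    let bytes = s.as_bytes();
    assert!(bytes.len() == N, "N must equal the string length");

    let mut units = [0u16; N];
    let mut i = 0;
    while i < N {
        assert!(bytes[i].is_ascii(), "this sketch only handles ASCII");
        // For ASCII, the UTF-16 code unit equals the byte value.
        units[i] = bytes[i] as u16;
        i += 1;
    }
    units
}

/// Encoded once by the compiler; no conversion happens at runtime.
const GREETING: [u16; 5] = ascii_to_utf16("Moin!");

/// Byte length as a UNICODE_STRING would store it.
const GREETING_LENGTH: u16 = (GREETING.len() * 2) as u16;
```

Because the asserts run during constant evaluation, passing non-ASCII text or a wrong `N` to a `const` item is rejected at build time rather than at runtime.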

Finally, you most likely want to pass your NtUnicodeStr, NtUnicodeStrMut or NtUnicodeString to an FFI function that expects a pointer to a UNICODE_STRING. Use the as_ptr or as_mut_ptr methods to get an immutable or mutable pointer.
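To show what happens on the other side of such a pointer, the sketch below uses a hypothetical `#[repr(C)]` mirror of UNICODE_STRING and a Rust function with C ABI standing in for a real FFI function. Both names are made up for illustration. In real code, the pointer would come from as_ptr or as_mut_ptr, and the function would be declared in an `extern` block.

```rust
/// Hypothetical `#[repr(C)]` mirror of UNICODE_STRING (not a type from this crate).
#[repr(C)]
struct RawUnicodeString {
    length: u16,
    maximum_length: u16,
    buffer: *const u16,
}

/// Stand-in for a C function taking `const UNICODE_STRING*` (hypothetical).
///
/// # Safety
/// `s` must point to a valid RawUnicodeString whose buffer holds at least `length` bytes.
unsafe extern "C" fn count_uppercase_ascii(s: *const RawUnicodeString) -> u32 {
    // SAFETY: guaranteed by the caller, see above.
    let s = unsafe { &*s };
    // SAFETY: `buffer` holds `length / 2` initialized code units.
    let units = unsafe { std::slice::from_raw_parts(s.buffer, usize::from(s.length) / 2) };
    units
        .iter()
        .filter(|&&unit| (u16::from(b'A')..=u16::from(b'Z')).contains(&unit))
        .count() as u32
}

/// Builds the struct over borrowed UTF-16 data and passes a pointer to it,
/// much like handing the result of `as_ptr` to an FFI function.
fn count_via_ffi(text: &str) -> u32 {
    let units: Vec<u16> = text.encode_utf16().collect();
    let byte_len = u16::try_from(units.len() * 2).expect("string too long for UNICODE_STRING");
    let ustr = RawUnicodeString {
        length: byte_len,
        maximum_length: byte_len,
        buffer: units.as_ptr(),
    };
    // SAFETY: `ustr` and `units` outlive the call, and `length` matches the buffer.
    unsafe { count_uppercase_ascii(&ustr) }
}
```

The `&ustr` argument coerces to `*const RawUnicodeString`; the key requirement is that the pointed-to data stays alive for the duration of the call.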

no_std support

The crate is no_std-compatible and can therefore be used in any context, including environments without the Rust standard library.

However, the heap-allocating NtUnicodeString struct is only available via the alloc feature (enabled by default). If you want to use the crate in a pure no_std environment without heap allocations, include it with default-features = false to disable the default alloc feature.
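Without the alloc feature, string data has to live in memory you provide yourself, for example on the stack. As an illustration (a hypothetical helper, not this crate's API), the following function encodes a `&str` as UTF-16 into a caller-provided buffer and fails instead of allocating when the buffer is too small. It only relies on functionality from `core`, so it would compile unchanged in a `#![no_std]` crate.

```rust
/// Hypothetical helper that needs no heap: encodes `s` as UTF-16 into `buf`
/// and returns the used part, or `None` if `buf` is too small.
fn encode_into<'a>(s: &str, buf: &'a mut [u16]) -> Option<&'a [u16]> {
    let mut len = 0;
    for unit in s.encode_utf16() {
        // `get_mut` returns None once the buffer is full, which `?` propagates.
        *buf.get_mut(len)? = unit;
        len += 1;
    }
    Some(&buf[..len])
}
```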

License

This crate is licensed under either of

  • Apache License, Version 2.0
  • MIT License

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.