
ansi-width

Calculate the width of a string when printed to the terminal

1 unstable release

0.1.0 Feb 18, 2024


MIT license


ANSI width

Measure the width of a string when printed to the terminal

For printable ASCII text, the width is identical to the length of the string in bytes. However, there are two special cases:

  • Many Unicode characters (CJK, emoji, etc.) span multiple columns.
  • ANSI escape codes occupy no columns and should be ignored.

The first case is handled by the unicode-width crate. This crate builds on unicode-width by also ignoring ANSI escape codes.
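The following standard-library-only snippet shows why neither the byte length nor the character count gives the display width for the strings used in the examples below. The helper `byte_and_char_counts` is hypothetical and just reports both numbers; the display widths in the comments are the values from this crate's examples.

```rust
/// Hypothetical helper: returns (byte length, character count) for `s`.
/// Neither number is the terminal display width in general.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let samples = [
        "123456",             // display width 6
        "caf\u{e9}",          // "café" with precomposed é: display width 4
        "\u{1f980}\u{1f980}", // two crab emoji: display width 4
        "\u{4f60}\u{597d}",   // "你好": display width 4
        "\x1b[31mRed\x1b[0m", // red "Red": display width 3
    ];
    for s in samples {
        let (bytes, chars) = byte_and_char_counts(s);
        println!("{s:?}: {bytes} bytes, {chars} chars");
    }
}
```

The accented string is one byte too long, the emoji and CJK strings are too short when counted by characters, and the colored string is inflated by its escape codes either way.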

Limitations

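  • The width of a TAB character cannot be known in advance, because it depends on the tab stops of the terminal emulator. It is treated as zero width.
  • Backspace is also treated as zero width, even though it moves the cursor back one column.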
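A TAB moves the cursor to the next tab stop, so the number of columns it occupies depends on the column where it is printed. The sketch below assumes tab stops every 8 columns, a common default that terminals allow to be changed; `tab_advance` is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical helper: number of columns a TAB advances when the cursor is
/// at `column` (0-based), assuming tab stops every `tab_size` columns.
/// `tab_size` must be greater than zero.
fn tab_advance(column: usize, tab_size: usize) -> usize {
    tab_size - column % tab_size
}

fn main() {
    // The same character has a different width depending on where it is printed.
    for column in [0, 3, 7, 8] {
        println!("TAB at column {column} advances {} columns", tab_advance(column, 8));
    }
}
```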

A Primer on ANSI escape codes (and how this crate works)

ANSI escape codes are special character sequences embedded in a string. Each sequence starts with the ESC character ('\x1b'), followed by a second character that determines the type of the escape code and therefore where the sequence ends:

  • ESC [ (Control Sequence Introducer, CSI): continues until a character in the range '\x40'..='\x7E' is found.
  • ESC ] (Operating System Command, OSC): continues until an ST is found.

ST is the String Terminator, the two-character sequence ESC \ (in Rust syntax, the string "\x1b\x5c", also written "\x1b\\").
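To make the two sequence types concrete, the sketch below builds the strings used in this README's examples: a CSI sequence that sets a red foreground (ESC [ 31 m) and one that resets it (ESC [ 0 m), plus an OSC 8 hyperlink terminated by ST. The helpers `red` and `hyperlink` are hypothetical and not part of this crate.

```rust
/// Hypothetical helper: wraps `text` in CSI sequences. `ESC [ 31 m` sets a red
/// foreground; its final character `m` (0x6D) lies in '\x40'..='\x7E' and ends
/// the sequence. `ESC [ 0 m` resets all attributes.
fn red(text: &str) -> String {
    format!("\x1b[31m{text}\x1b[0m")
}

/// Hypothetical helper: wraps `text` in an OSC 8 hyperlink. Each OSC sequence
/// starts with `ESC ]` and ends with the String Terminator `ESC \`
/// (written "\x1b\\" in a Rust string literal).
fn hyperlink(url: &str, text: &str) -> String {
    format!("\x1b]8;;{url}\x1b\\{text}\x1b]8;;\x1b\\")
}

fn main() {
    println!("{}", red("Red"));
    println!("{}", hyperlink("http://example.com", "This is a link"));
}
```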

CSI and OSC sequences are the only ones this library supports, because they cover what most applications need, such as colors (CSI) and hyperlinks (OSC). If you have a use case for other codes, please open an issue on the GitHub repository.

To stay fast, ansi-width does not interpret the ANSI codes; it only skips over them.
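Skipping instead of parsing means a width function can walk the characters once without allocating. The sketch below applies the CSI and OSC termination rules described above while summing a per-character width. It is not this crate's source: the name `ansi_width_with` is hypothetical, and because the standard library has no Unicode width table, the character-width function is passed in as a parameter (in ansi-width that role is played by unicode-width). Handling unknown ESC sequences by measuring the following character is an assumption of this sketch.

```rust
const ESC: char = '\x1b';

/// Hypothetical helper: sums `char_width` over the visible characters of `s`,
/// skipping CSI (`ESC [` ... final char in '\x40'..='\x7E') and
/// OSC (`ESC ]` ... `ESC \`) sequences without building a new string.
fn ansi_width_with(s: &str, char_width: impl Fn(char) -> usize) -> usize {
    let mut width = 0;
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != ESC {
            width += char_width(c);
            continue;
        }
        match chars.next() {
            Some('[') => {
                // CSI: consume up to and including the final character.
                for c in chars.by_ref() {
                    if ('\x40'..='\x7e').contains(&c) {
                        break;
                    }
                }
            }
            Some(']') => {
                // OSC: consume up to and including the String Terminator ESC \.
                let mut prev = '\0';
                for c in chars.by_ref() {
                    if prev == ESC && c == '\\' {
                        break;
                    }
                    prev = c;
                }
            }
            // Unknown sequence: treat ESC as zero width, measure the next char.
            Some(other) => width += char_width(other),
            None => {}
        }
    }
    width
}

fn main() {
    // Toy width function for illustration only: ASCII control characters count
    // as 0 columns, everything else as 1. A real implementation needs a Unicode
    // width table to handle CJK and emoji.
    let toy_width = |c: char| if c.is_ascii_control() { 0 } else { 1 };
    println!("{}", ansi_width_with("\x1b[31mRed\x1b[0m", toy_width));
}
```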

Examples

use ansi_width::ansi_width;

// ASCII string
assert_eq!(ansi_width("123456"), 6);

// Accents
assert_eq!(ansi_width("café"), 4);

// Emoji (2 crab emoji)
assert_eq!(ansi_width("🦀🦀"), 4);

// CJK characters (“Nǐ hǎo” or “Hello” in Chinese)
assert_eq!(ansi_width("你好"), 4);

// ANSI colors
assert_eq!(ansi_width("\u{1b}[31mRed\u{1b}[0m"), 3);

// ANSI hyperlink
assert_eq!(
    ansi_width("\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\"),
    14
);
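One place where the display width matters is aligning colored text in columns. Rust's `format!` padding counts characters, so escape codes make a colored string look longer than it is and it receives too little padding. The sketch below pads manually using a width supplied by the caller; in practice that number would come from `ansi_width`. The helper `pad_right` is hypothetical, and the width 3 is hard-coded so the snippet runs with the standard library alone.

```rust
/// Hypothetical helper: appends spaces so that text with the given display
/// width fills `target` columns. Text already at least `target` wide is
/// returned unchanged.
fn pad_right(s: &str, display_width: usize, target: usize) -> String {
    let padding = target.saturating_sub(display_width);
    format!("{s}{}", " ".repeat(padding))
}

fn main() {
    let red = "\x1b[31mRed\x1b[0m";
    // `{:<6}` sees 12 characters (escape codes included), so it adds no padding.
    let naive = format!("{red:<6}|");
    // Display width of `red` is 3 (hard-coded; `ansi_width(red)` returns 3).
    let padded = format!("{}|", pad_right(red, 3, 6));
    println!("{naive}");
    println!("{padded}");
}
```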

Alternatives

  • str::len: Returns the length in bytes, so it only matches the display width for ASCII text without escape codes.
  • unicode-width: Does not take ANSI escape codes into account, by design. This might be what you want if you don't care about ANSI codes. This crate also uses unicode-width internally.
  • textwrap::core::display_width: Very similar to this crate, and supports hyperlinks since version 0.16.1. The advantage of ansi-width is that it does not require pulling in the rest of textwrap's functionality (excellent as that functionality is if you need it).
  • console::measure_text_width: Similar to textwrap and very well tested. However, it first builds a new string with the ANSI codes removed and then measures the width of that, which costs an allocation. In exchange, its parsing is more robust than this crate's.
