#escaping #ansi #unicode #grapheme #unicode-text #source-string

print-positions

A library providing string segmentation on grapheme clusters and ANSI escape sequences for accurate length arithmetic based on visible print positions

5 releases

0.6.1 Feb 20, 2023
0.6.0 Feb 19, 2023
0.5.2 Feb 18, 2023
0.5.1 Feb 18, 2023
0.5.0 Feb 17, 2023

#1533 in Text processing


8,484 downloads per month
Used in 15 crates (via nu-command)

MIT/Apache

25KB
347 lines


Crate print_positions

Iterators which return the slice of characters making up a "print position", rather than the individual characters of a source string.

The print_positions and print_position_indices functions provide iterators which return "print positions".

A print position is a generalization of a UAX#29 extended grapheme cluster. Like the grapheme, it occupies one "character" when rendered on the screen.
Unlike the grapheme, it may also contain ANSI escape codes, which affect color or intensity but occupy no space of their own.

Example:

use print_positions::print_positions;

// content is e with dieresis, displayed in black on a green background, with a reset at the end.
// Looks like 1 character on the screen.  See example "padding" to print one out.
let content = ["\u{1b}[30;42m", "\u{0065}", "\u{0308}", "\u{1b}[0m"].join("");

let positions: Vec<_> = print_positions(&content).collect();
assert_eq!(content.len(), 15);   // content is 15 bytes long (len() counts UTF-8 bytes)
assert_eq!(positions.len(), 1);  // but only 1 print position
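
The length of 15 in the example is a count of UTF-8 bytes, because `str::len` returns bytes rather than characters. The following std-only sketch rebuilds the same string and compares the two counts. The helper name `sample` is hypothetical and is not part of the crate.

```rust
/// Rebuilds the sample string from the example: SGR colors, 'e',
/// a combining dieresis (U+0308), then an SGR reset.
/// (Hypothetical helper, not part of the print_positions crate.)
fn sample() -> String {
    ["\u{1b}[30;42m", "\u{0065}", "\u{0308}", "\u{1b}[0m"].join("")
}

/// Returns (UTF-8 byte count, Unicode scalar value count) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

Calling `byte_and_char_counts(&sample())` gives 15 bytes and 14 scalar values, because U+0308 takes two bytes in UTF-8. Neither number matches the single position a terminal displays.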

Rationale:

When laying out a fixed-width screen application, it is useful to know how many visible columns a piece of content will consume. But the number of bytes or characters in the content is generally larger. It is inflated by UTF-8 encoding, by Unicode combining characters and zero-width joiners and, for ANSI-compatible devices and applications, by control codes and escape sequences which specify text color and emphasis.

The print_position iterators account for these factors and simplify the arithmetic: the number of columns the content will consume on the screen is the number of print position slices returned by the iterator.
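
To illustrate the arithmetic, here is a minimal std-only sketch of right-padding styled text to a fixed column count. It is not the crate's algorithm. `approx_visible_width` and `pad_to` are hypothetical names, and the estimate only skips CSI escape sequences and treats the U+0300..=U+036F combining marks as zero-width. With the crate, the count of items yielded by `print_positions(s)` would presumably take the place of this estimate.

```rust
/// Rough estimate of visible columns (an assumption-laden stand-in, NOT the
/// crate's implementation): skips CSI sequences (ESC '[' ... final byte) and
/// counts every other char except combining marks in U+0300..=U+036F.
fn approx_visible_width(s: &str) -> usize {
    let mut width = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Parameter bytes continue until a final byte in '@'..='~'.
            for p in chars.by_ref() {
                if ('@'..='~').contains(&p) {
                    break;
                }
            }
        } else if !('\u{0300}'..='\u{036f}').contains(&c) {
            width += 1;
        }
    }
    width
}

/// Right-pads `s` with spaces until it fills `columns` visible columns.
/// Content that is already wider is returned unchanged.
fn pad_to(s: &str, columns: usize) -> String {
    let fill = columns.saturating_sub(approx_visible_width(s));
    format!("{}{}", s, " ".repeat(fill))
}
```

Padding by `s.len()` instead would under-pad the styled sample from the example, since its 15 bytes far exceed its single visible column.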

Known Issues:

  • No accounting for cursor motion
    ANSI control characters and sequences are all assumed to consume no space on the screen.
    This is arguably a bug in the case of backspace, tab, newline, CUP, CUU, CUD and several more. PRs or simple suggestions for improvement are welcome!

Dependencies

~355KB