1 stable release

new 1.0.0 Nov 3, 2024

#391 in Text processing

97 downloads per month

MIT/Apache

26KB
989 lines

dbxcase

This is an implementation of text case-folding which matches how Dropbox handles file paths.

Dropbox was originally implemented in Python 2.5 (the current version at the time) and used its unicode.lower() function to compare paths case-insensitively. Python 2.5 is long gone, but the behavior of this function has been preserved to maintain backwards compatibility.

Python 2.5's case-folding is based on Unicode 4.1.0's character database, but does not implement the case-folding algorithm that Unicode recommends. Instead, it simply applies the "simple lowercase mapping", which is a 1:1 character mapping and does not take any context into account. And of course, it lacks the many characters added to Unicode since version 4.1.0 was released in 2005.

As a result, it differs in several ways from any modern to_lowercase() function like the one included in the Rust standard library. These differences are important if proper interoperation with the Dropbox API is desired.
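The differences described above can be seen from the standard library's side alone. The following is a minimal sketch that uses only `str::to_lowercase()` and does not call this crate; the helper name `std_lowercase_code_points` is made up for illustration. It shows three places where a modern lowercasing diverges from a context-free 1:1 mapping built on Unicode 4.1.

```rust
/// Hypothetical helper: the code points `str::to_lowercase` produces for `s`.
fn std_lowercase_code_points(s: &str) -> Vec<u32> {
    s.to_lowercase().chars().map(|c| c as u32).collect()
}

fn main() {
    // 1. Full lowercasing can expand one code point into two:
    //    U+0130 becomes U+0069 followed by U+0307.
    //    Its *simple* lowercase mapping is the single code point U+0069.
    assert_eq!(std_lowercase_code_points("\u{0130}"), vec![0x0069, 0x0307]);

    // 2. Full lowercasing is context-sensitive: a capital sigma at the end
    //    of a word becomes final sigma (U+03C2), not U+03C3.
    //    A 1:1 mapping always yields U+03C3.
    assert_eq!(
        "\u{039F}\u{0394}\u{039F}\u{03A3}".to_lowercase(),
        "\u{03BF}\u{03B4}\u{03BF}\u{03C2}"
    );

    // 3. Newer characters are lowercased: U+1E9E (capital sharp s) was added
    //    in Unicode 5.1, so a table built from Unicode 4.1 has no entry for it.
    assert_eq!(std_lowercase_code_points("\u{1E9E}"), vec![0x00DF]);
}
```

Any path containing such characters may therefore compare differently under `to_lowercase()` than under the rules this crate reproduces.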


lib.rs:

This crate implements the case-folding rules used by Dropbox for file paths.

It's a recreation of what unicode.lower() did in Python 2.5, which was the current version of Python at the time of Dropbox's founding.

Every character that has a "simple lowercase mapping" property in the Unicode 4.1 character database is replaced with the corresponding lowercase character.

This is different from a proper lowercasing, where at least one upper-case codepoint (U+0130, "Latin Capital Letter I with Dot Above") maps to two lower-case codepoints. The mapping is also based on a very old version of Unicode, which lacks the many characters added since Unicode 4.1 was released in 2005.
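A 1:1 mapping of this kind can be sketched as a sorted table with a binary search. The code below is an illustration only, not this crate's implementation or API: the names `SIMPLE_LOWER`, `simple_lower_char` and `simple_lower` are hypothetical, and the table is a four-entry excerpt, whereas a real table derived from Unicode 4.1 would have many more entries.

```rust
/// Illustrative excerpt of a simple-lowercase table, sorted by the upper-case key.
const SIMPLE_LOWER: &[(char, char)] = &[
    ('\u{0041}', '\u{0061}'), // LATIN CAPITAL LETTER A -> a
    ('\u{00C9}', '\u{00E9}'), // LATIN CAPITAL LETTER E WITH ACUTE -> e with acute
    ('\u{0130}', '\u{0069}'), // LATIN CAPITAL LETTER I WITH DOT ABOVE -> i (one code point)
    ('\u{03A3}', '\u{03C3}'), // GREEK CAPITAL LETTER SIGMA -> sigma (never final sigma)
];

/// Maps one character through the table; characters without an entry pass through.
fn simple_lower_char(c: char) -> char {
    match SIMPLE_LOWER.binary_search_by_key(&c, |&(upper, _)| upper) {
        Ok(index) => SIMPLE_LOWER[index].1,
        Err(_) => c,
    }
}

/// Applies the 1:1 mapping to every character, ignoring context.
fn simple_lower(s: &str) -> String {
    s.chars().map(simple_lower_char).collect()
}

fn main() {
    let input = "A\u{0130}\u{03A3}";
    let output = simple_lower(input);
    // The number of characters never changes under a 1:1 mapping.
    assert_eq!(input.chars().count(), output.chars().count());
    assert_eq!(output, "ai\u{03C3}");
}
```

Because each input character yields exactly one output character, the result always has the same number of code points as the input, which is not true of a full lowercasing.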

The mapping is hardcoded, but the code can be regenerated manually from the Unicode database using a program included in the codebase.
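How the included generator works is not described here, so the following is only a rough sketch of one way such a table could be extracted. It assumes the input is the Unicode Character Database's `UnicodeData.txt`, where each line has semicolon-separated fields and the simple lowercase mapping is field 13 (counting from 0). The function name `parse_simple_lowercase` and the output format are made up for illustration.

```rust
/// Parses one line of UnicodeData.txt and returns (code point, simple lowercase)
/// if the line has a non-empty simple lowercase mapping field.
fn parse_simple_lowercase(line: &str) -> Option<(char, char)> {
    let fields: Vec<&str> = line.split(';').collect();
    // Field 0 is the code point in hex; field 13 is the simple lowercase mapping.
    let lower_hex = *fields.get(13)?;
    if lower_hex.is_empty() {
        return None;
    }
    let from = u32::from_str_radix(fields[0], 16).ok().and_then(char::from_u32)?;
    let to = u32::from_str_radix(lower_hex, 16).ok().and_then(char::from_u32)?;
    Some((from, to))
}

fn main() {
    // Three sample lines in UnicodeData.txt format.
    let sample = "\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;";

    // Emit one Rust table entry per character that has a mapping.
    for line in sample.lines() {
        if let Some((from, to)) = parse_simple_lowercase(line) {
            println!("('\\u{{{:04X}}}', '\\u{{{:04X}}}'),", from as u32, to as u32);
        }
    }
}
```

Running a parser like this over the Unicode 4.1.0 version of the file, rather than a current one, is what would pin the table to the old character set.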

Dependencies

~0–610KB
~11K SLoC