#language #preprocessor #nlp

contractions

Contractions is a Rust library to expand contractions in English

3 releases

0.5.4 May 25, 2021
0.5.3 May 25, 2021
0.5.1 May 24, 2021

#1892 in Text processing


1,718 downloads per month
Used in yake-rust

MIT license

21KB
163 lines

Contractions

Notice: The Contractions API is not yet stable and is still a work in progress.


contractions is a Rust library for handling contractions in English.
So far, only data sets for expanding contractions are implemented.

For example, it expands "I’m" to "I am".
For each contraction, the default data set has replacements for the all-lowercase, all-uppercase, and first-letter-uppercase forms.
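As a rough illustration of what covering those three casings means, the sketch below derives the variants from one contraction/expansion pair. The helper names `case_variants` and `capitalize_first` are hypothetical and are not part of the crate; the crate ships its replacements as data, and how that data was produced is not stated here.

```rust
/// Builds the three case variants of one contraction/expansion pair:
/// all-lowercase, all-uppercase, and first letter uppercase.
/// Hypothetical helper for illustration; not part of the contractions crate.
fn case_variants(contraction: &str, expansion: &str) -> Vec<(String, String)> {
    vec![
        (contraction.to_lowercase(), expansion.to_lowercase()),
        (contraction.to_uppercase(), expansion.to_uppercase()),
        (capitalize_first(contraction), capitalize_first(expansion)),
    ]
}

/// Uppercases the first character and lowercases the rest.
fn capitalize_first(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + &chars.as_str().to_lowercase(),
        None => String::new(),
    }
}
```

Calling `case_variants("don\u{2019}t", "do not")` yields the pairs for "don’t", "DON’T" and "Don’t" (`\u{2019}` is the typographic apostrophe used throughout this README).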

Example

// Use the default data set.
let contractions = contractions::Contractions::default();
assert_eq!("I am sure you would have been fine.", contractions.apply("I’m sure you’d’ve been fine."));
assert_eq!("Are you sure?", contractions.apply("R u sure?"));

// Start empty and load only the single-contraction data set;
// the double contraction "couldn’t’ve" is left unchanged.
let mut contractions = contractions::Contractions::new();
contractions.add_from_json(contractions::SINGLE_CONTRACTIONS_JSON);
assert_eq!("I am sad you couldn’t’ve come.", contractions.apply("I’m sad you couldn’t’ve come."));

Problem cases (default data set):

  • Ain’t: "The word ’ain’t’ is a contraction for am not, is not, are not, has not, and have not in the common English language vernacular. In some dialects ain’t is also used as a contraction of do not, does not, and did not." - https://en.wikipedia.org/wiki/Ain’t
    • The default dataset does not replace "Ain’t"
  • Some contractions with "’s" can be "is" or a possessive
    • The default dataset replaces "Everyone’s" => "Everyone is"
    • The default dataset replaces "Somebody’s" => "Somebody is"
    • The default dataset replaces "Someone’s" => "Someone is"
    • The default dataset does not replace any other contractions with ’s, such as "Carl’s"
  • She’s / He’s / It’s
    • "He’s" can be "He is" or "He has".
    • The default dataset replaces "He’s" => "He is", "She’s" => "She is", "It’s" => "It is"

Dependencies

~2.9–5MB
~91K SLoC