2 releases

0.1.1 Jan 18, 2021
0.1.0 Feb 8, 2018

#1895 in Text processing

222 downloads per month
Used in lingo

MIT license

28KB
385 lines

stopwords-rs

Stopwords from popular text processing frameworks.

These are high-frequency grammatical words which are usually ignored in information retrieval applications.


lib.rs:

This library provides stopwords datasets from popular text processing engines.

This can help reproduce the results of text analysis pipelines written in different languages and with different tools.
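To illustrate why the choice of list matters for reproducibility, here is a minimal sketch that reports the words on which two stopword lists disagree. It uses only the standard library; the helper name `list_disagreements` is hypothetical and not part of this crate, and any lists passed to it in a quick experiment would be your own, not the datasets shipped here.

```rust
use std::collections::BTreeSet;

/// Returns the words that appear in exactly one of the two stopword lists,
/// in ascending order. (Hypothetical helper, not part of the stopwords crate.)
fn list_disagreements<'a>(a: &[&'a str], b: &[&'a str]) -> Vec<&'a str> {
    let a: BTreeSet<&str> = a.iter().copied().collect();
    let b: BTreeSet<&str> = b.iter().copied().collect();
    // A token in this result is filtered by one engine but kept by the other.
    a.symmetric_difference(&b).copied().collect()
}
```

Any token returned by this function would survive filtering in one pipeline and be dropped in the other, which is the kind of mismatch that using the same dataset on both sides avoids.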

Usage

[dependencies]
stopwords = "0.1.0"

extern crate stopwords;

use std::collections::HashSet;
use stopwords::{Spark, Language, Stopwords};

fn main() {
    let stops: HashSet<_> = Spark::stopwords(Language::English).unwrap().iter().collect();
    let mut tokens = vec!["broccoli", "is", "good", "to", "eat"];
    tokens.retain(|s| !stops.contains(s));
    assert_eq!(tokens, vec!["broccoli", "good", "eat"]);
}
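The example above matches tokens exactly, so a capitalized token such as "Is" would pass through a lowercase list. The following sketch shows one possible way to filter case-insensitively. It depends only on the standard library; `remove_stopwords` is a hypothetical helper, not part of this crate, and it assumes the stopword set contains lowercase entries.

```rust
use std::collections::HashSet;

/// Keeps only the tokens whose lowercase form is absent from `stopwords`.
/// Assumes every entry in `stopwords` is already lowercase.
/// (Hypothetical helper, not part of the stopwords crate.)
fn remove_stopwords<'a>(tokens: &[&'a str], stopwords: &HashSet<&str>) -> Vec<&'a str> {
    tokens
        .iter()
        .copied()
        // Lowercase only for the lookup; the original token is returned unchanged.
        .filter(|t| !stopwords.contains(t.to_lowercase().as_str()))
        .collect()
}
```

The set passed as `stopwords` could be built from any `&str` collection, for instance one collected from a stopword slice in the same way as in the example above.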

Dependencies

~260–720KB
~17K SLoC