1 stable release

1.0.0 Nov 18, 2024

#626 in Text processing


143 downloads per month
Used in 2 crates (via wiki_corpus_parser)

MIT license

65KB
1.5K SLoC


Extracts text from Wikipedia dump files (.bz2) and converts it to JSON Lines format.
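To illustrate the output side of that pipeline, here is a minimal, std-only sketch of writing extracted pages as JSON Lines (one JSON object per line). It is not this crate's API: the `Page` struct, its field names, and the helpers `escape_json`, `to_json_line`, and `write_jsonl` are all hypothetical, and the bzip2 decompression and wiki-markup parsing steps are skipped because they need code outside the standard library.

```rust
use std::io::{self, Write};

// Hypothetical record type; the crate's real output schema may differ.
struct Page {
    id: u64,
    title: String,
    text: String,
}

/// Escape a string so it can sit inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize one page as a single JSON object with no trailing newline.
fn to_json_line(page: &Page) -> String {
    format!(
        "{{\"id\":{},\"title\":\"{}\",\"text\":\"{}\"}}",
        page.id,
        escape_json(&page.title),
        escape_json(&page.text)
    )
}

/// Write every page as one line; newlines inside the text are escaped,
/// so each record stays on exactly one line.
fn write_jsonl<W: Write>(mut out: W, pages: &[Page]) -> io::Result<()> {
    for page in pages {
        writeln!(out, "{}", to_json_line(page))?;
    }
    Ok(())
}
```

Passing `std::io::stdout().lock()` or a `BufWriter` around a `File` as `out` would stream the records without holding the whole dump in memory. A real tool would more likely rely on a JSON serialization library than on hand-written escaping.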

Dependencies

~6–16MB
~197K SLoC