1 stable release

1.0.0 Nov 18, 2024

#1574 in Text processing


Used in 2 crates (via wiki_corpus_parser)

MIT license

65KB
1.5K SLoC


Extract text from Wikipedia dumps (.bz2) and convert it to JSON Lines format

Dependencies

~7–17MB
~197K SLoC