3 releases

| Version | Released |
|---|---|
| 0.1.2 | Aug 8, 2021 |
| 0.1.1 | May 18, 2021 |
| 0.1.0 | Apr 2, 2021 |
Used in 2 crates (via layered-part-of-speech)
2MB, 5.5K SLoC
wiktionary-part-of-speech-extract
./sample.xml is just the first part of the full Wikimedia enwiktionary-20210320-pages-articles.xml download (source).
The generator is designed to parse the entire dump file. To run it on the sample instead:
cargo run ./sample.xml
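For context on what parsing involves: each page in the dump carries wikitext in which an English entry starts with a `==English==` heading and lists its parts of speech as subheadings such as `===Noun===` or `====Verb====`. The following is a hypothetical, std-only sketch of that heading scan for one page's text. The names `heading`, `english_parts_of_speech` and `POS_NAMES` are assumptions, not the crate's code, and reading the XML itself (for example with a streaming XML reader) is left out.

```rust
/// Part-of-speech headings to keep (hypothetical, deliberately short list).
const POS_NAMES: &[&str] = &["Noun", "Verb", "Adjective", "Adverb", "Pronoun", "Preposition"];

/// If `line` is a wikitext heading such as `===Noun===`, return its level
/// (the number of `=` on each side) and its trimmed title.
fn heading(line: &str) -> Option<(usize, &str)> {
    let line = line.trim();
    let level = line.chars().take_while(|&c| c == '=').count();
    let marker = "=".repeat(level);
    if level < 2 || line.len() < 2 * level || !line.ends_with(marker.as_str()) {
        return None;
    }
    Some((level, line[level..line.len() - level].trim()))
}

/// Collect the distinct part-of-speech headings found inside the
/// `==English==` section of one page's wikitext, in order of appearance.
fn english_parts_of_speech(wikitext: &str) -> Vec<String> {
    let mut in_english = false;
    let mut found: Vec<String> = Vec::new();
    for line in wikitext.lines() {
        if let Some((level, title)) = heading(line) {
            if level == 2 {
                // Each level-2 heading starts a new language section.
                in_english = title == "English";
            } else if in_english && POS_NAMES.contains(&title) && !found.iter().any(|f| f == title) {
                found.push(title.to_string());
            }
        }
    }
    found
}

fn main() {
    // Made-up page text in the usual Wiktionary layout.
    let page = "==English==\n===Etymology 1===\n====Noun====\n# A sheltered port.\n====Verb====\n# To provide shelter.\n==French==\n===Noun===\n";
    println!("{:?}", english_parts_of_speech(page)); // ["Noun", "Verb"]
}
```

Tracking only the most recent level-2 heading is what keeps the French `Noun` out of the result; a full parser would also need to cope with templates and inconsistent headings found in real pages.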
To regenerate and publish the library (lib.rs):
cargo run regenerate --release enwiktionary-pages-*.xml # regenerate "words.fst" binary
cargo publish # publish lib including "words.fst" binary
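`words.fst` is presumably a finite-state-transducer map from words to values, as built by something like the `fst` crate, whose `MapBuilder` expects keys to be inserted in lexicographic byte order. Results gathered page by page therefore have to be merged and sorted before being written out. The std-only sketch below assumes each part of speech is one bit of a `u64` value; the constants and `merge_tags` are hypothetical and may not match how the crate actually encodes tags.

```rust
use std::collections::BTreeMap;

// Hypothetical bit assignments; the crate's real encoding may differ.
const NOUN: u64 = 1 << 0;
const VERB: u64 = 1 << 1;
const ADJECTIVE: u64 = 1 << 2;

/// Merge (word, tag-bit) pairs from many pages into one map, OR-ing together
/// the bits of repeated words. A BTreeMap<String, _> iterates its keys in
/// byte-wise lexicographic order, the order an FST builder expects.
fn merge_tags<'a>(pairs: impl IntoIterator<Item = (&'a str, u64)>) -> BTreeMap<String, u64> {
    let mut map = BTreeMap::new();
    for (word, bit) in pairs {
        *map.entry(word.to_string()).or_insert(0) |= bit;
    }
    map
}

fn main() {
    let pairs = [("harbor", NOUN), ("quick", ADJECTIVE), ("harbor", VERB), ("apple", NOUN)];
    for (word, bits) in merge_tags(pairs) {
        println!("{word} {bits:#b}");
    }
    // apple 0b1
    // harbor 0b11
    // quick 0b100
}
```

Each (word, bits) entry could then be handed, in iteration order, to an FST builder to produce the binary file.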
Usage
use wiktionary_part_of_speech_extract::{ENGLISH_TAG_LOOKUP, TagSet, Tag};
// Look up the parts of speech recorded for "harbor": both noun and verb.
assert_eq!(Some(TagSet::of(&[Tag::Noun, Tag::Verb])), ENGLISH_TAG_LOOKUP.get("harbor"));
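The lookup above appears to return an `Option<TagSet>`, which the assertion compares with a set built by `TagSet::of`. A common way to represent such a set is a bitset, which makes equality independent of the order in which tags are listed. The following is a hypothetical stand-in, not the crate's definitions: only the names `Tag`, `TagSet` and `TagSet::of` come from the example, while the variants shown, the `u32` layout and `contains` are assumptions.

```rust
/// Hypothetical stand-ins for the crate's `Tag` and `TagSet`; the real
/// definitions may differ. Each tag occupies one bit of a `u32`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tag {
    Noun,
    Verb,
    Adjective,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TagSet(u32);

impl TagSet {
    /// Build a set by setting the bit for every tag in the slice.
    fn of(tags: &[Tag]) -> TagSet {
        TagSet(tags.iter().fold(0u32, |bits, &t| bits | (1u32 << (t as u32))))
    }

    /// Check whether the bit for `tag` is set.
    fn contains(self, tag: Tag) -> bool {
        (self.0 & (1u32 << (tag as u32))) != 0
    }
}

fn main() {
    let harbor = TagSet::of(&[Tag::Noun, Tag::Verb]);
    // Tag order does not matter: both sets have the same bits.
    assert_eq!(harbor, TagSet::of(&[Tag::Verb, Tag::Noun]));
    assert!(harbor.contains(Tag::Verb));
    assert!(!harbor.contains(Tag::Adjective));
}
```

A bitset keeps each value small and `Copy`, which suits storing one integer per word in a compact map.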
Dependencies: ~1.6–2.9MB, ~34K SLoC