6 releases (3 breaking)

| Version | Released |
|---|---|
| 0.5.0 | May 15, 2022 |
| 0.4.0 | Jul 12, 2021 |
| 0.3.0 | Feb 4, 2021 |
| 0.2.2 | May 12, 2020 |
Oh No! More Lemmas
ohnomore consists of two tools for incorporating TüBa-D/Z-style lemmas
into language processing pipelines. The first tool, ohnomore-preproc,
takes TüBa-D/Z lemmas and transforms them into lemmas that are better
suited to machine learning pipelines. For example:
- Alternative lemmatizations are removed.
- Separable prefix markers are removed.
- Separable prefixes are removed when they are separated.
- The special reflexive lemma #refl is replaced by the lowercased form.
- Lemmas of truncations are replaced by their forms.
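The rules above can be sketched in Rust roughly as follows. This is a minimal illustration under stated assumptions, not the crate's actual API: the `Token` struct and `preprocess_lemma` function are hypothetical, truncations are detected via the STTS tag `TRUNC`, separable prefixes are assumed to be marked with `#` (e.g. `an#fangen`), and alternative lemmatizations are assumed to be separated by `|`, with the first alternative kept. Whether a verb's prefix is separated must be determined by the caller from the sentence.

```rust
/// Hypothetical token representation (an assumption, not the crate's type).
struct Token<'a> {
    form: &'a str,  // surface form as it occurs in the text
    tag: &'a str,   // STTS part-of-speech tag
    lemma: &'a str, // TüBa-D/Z-style lemma
}

/// Sketch of the preprocessing rules. `prefix_separated` tells whether the
/// verb's separable prefix occurs as a separate token in the sentence.
fn preprocess_lemma(token: &Token, prefix_separated: bool) -> String {
    // Truncations (assumed STTS tag TRUNC): use the form instead of the lemma.
    if token.tag == "TRUNC" {
        return token.form.to_string();
    }

    // Reflexive placeholder: replace #refl by the lowercased form.
    if token.lemma == "#refl" {
        return token.form.to_lowercase();
    }

    // Alternative lemmatizations (assumed to be separated by '|'): keep the first.
    let lemma = token.lemma.split('|').next().unwrap_or(token.lemma);

    // Separable prefix marker (assumed to be '#', as in "an#fangen").
    match lemma.split_once('#') {
        // Prefix is a separate token: drop it from the verb's lemma.
        Some((_prefix, stem)) if prefix_separated => stem.to_string(),
        // Prefix is attached: only remove the marker.
        Some((prefix, stem)) => format!("{}{}", prefix, stem),
        // No marker: leave the lemma unchanged.
        None => lemma.to_string(),
    }
}
```

For instance, under these assumptions the lemma `an#fangen` becomes `fangen` for a finite verb whose particle is separated, and `anfangen` for an infinitive where it is not.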
The second tool, ohnomore, performs the opposite transformation (as far
as is feasible).
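A sketch of the reverse direction, limited to two cases that can be reconstructed from the sentence alone. As before, the names are hypothetical and not the crate's API. It assumes STTS tags (`PRF` for reflexive pronouns, tags beginning with `V` for verbs), the `#` separable-prefix marker, and that the caller has already paired each verb with its separated particle (STTS `PTKVZ`), if there is one.

```rust
/// Hypothetical token representation (an assumption, not the crate's type).
struct Token<'a> {
    tag: &'a str,   // STTS part-of-speech tag
    lemma: &'a str, // preprocessed lemma
}

/// Sketch of restoring TüBa-D/Z-style lemmas. `particle` is the separated
/// verb particle (STTS PTKVZ) belonging to this verb, if any.
fn restore_lemma(token: &Token, particle: Option<&str>) -> String {
    // Reflexive pronouns (assumed tag PRF) get the #refl placeholder back.
    if token.tag == "PRF" {
        return "#refl".to_string();
    }

    // A verb whose particle was separated: re-attach it with the '#' marker.
    if token.tag.starts_with('V') {
        if let Some(p) = particle {
            return format!("{}#{}", p.to_lowercase(), token.lemma);
        }
    }

    // Everything else (dropped alternatives, truncations, unseparated
    // particle verbs) cannot be recovered without extra resources such as
    // a lexicon, so the lemma is returned unchanged.
    token.lemma.to_string()
}
```

This illustrates why the reverse transformation is only partial: information removed by the preprocessing step, such as discarded alternative lemmas, is not recoverable from the text.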