4 releases (2 breaking)
Version | Released |
---|---|
0.4.1 | Oct 12, 2024 |
0.4.0 | Jun 10, 2024 |
0.3.0 | Jun 4, 2024 |
0.2.1 | Jun 3, 2024 |
Lingua-cli
This is a small command-line tool for language detection. It is a simple wrapper around the lingua-rs library for Rust; see that project for extensive documentation. A distinguishing feature of lingua-rs is that it works better on short texts than many other libraries.
Installation
Ensure you have Rust's package manager cargo, then download, compile and install lingua-cli in one go as follows:
$ cargo install lingua-cli
Usage
Pass text as parameters:
$ lingua-cli bonjour à tous
Pass text via standard input:
$ echo "bonjour à tous" | lingua-cli
Constrain the languages you want to detect using -l with ISO 639-1 language codes. Constraining the list improves accuracy. Use -L to see a list of supported languages.
$ echo "bonjour à tous" | lingua-cli -l "fr,de,es,nl,en"
To classify input line-by-line, pass -n.
$ echo -e "bonjour à tous\nhola a todos\nhallo allemaal" | lingua-cli -n -l "fr,de,es,nl,en"
fr 0.9069164472389637 bonjour à tous
es 0.918273871035807 hola a todos
nl 0.988293648761749 hallo allemaal
Output is TSV and consists of an ISO 639-1 language code, a confidence score, and, in line-by-line mode, a copy of the line.
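Because the output is tab-separated, it can be post-processed with standard tools. The following is a minimal sketch, not part of lingua-cli: filter_confident is a hypothetical helper name, and the 0.91 threshold is an arbitrary value chosen for illustration. It keeps only the rows whose confidence score (second column) meets a minimum. The sample rows are copied from the line-by-line output above so that the snippet runs without lingua-cli installed.

```shell
# Hypothetical helper (not part of lingua-cli): keep only the rows whose
# confidence score in column 2 is at least the value given as $1.
# Reads tab-separated rows on standard input.
filter_confident() {
  awk -F '\t' -v min="$1" '$2 + 0 >= min + 0'
}

# Sample rows copied from the line-by-line output above, tab-separated.
printf 'fr\t0.9069164472389637\tbonjour à tous\nes\t0.918273871035807\thola a todos\nnl\t0.988293648761749\thallo allemaal\n' |
  filter_confident 0.91
```

With these sample rows, the fr row falls below the threshold and is dropped, while the es and nl rows pass through. In real use the printf would presumably be replaced by a lingua-cli -n invocation piped into the helper.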
You can also classify mixed text using the --multi option. This will then output UTF-8 byte offsets:
$ lingua-cli --multi -l fr,de,en < /tmp/test.txt
0 23 fr Parlez-vous français?
23 73 de Ich spreche ein bisschen spreche Französisch ja.
73 110 en A little bit is better than nothing.
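The offsets count bytes, not characters, so a span can be cut back out of the input with byte-oriented tools. The sketch below assumes the end offset is exclusive, which is consistent with the example (each span starts where the previous one ends). byte_slice is a hypothetical helper name, and the sample file is a reconstruction of what /tmp/test.txt might contain, since its contents are not shown here.

```shell
# Hypothetical helper (not part of lingua-cli): print bytes [start, end)
# of a file. Usage: byte_slice FILE START END
byte_slice() {
  # tail -c +N counts from 1, hence start + 1; head -c keeps end - start bytes.
  tail -c "+$(( $2 + 1 ))" "$1" | head -c "$(( $3 - $2 ))"
}

# Assumed input: a reconstruction of the example text (110 bytes in UTF-8).
sample=$(mktemp)
printf 'Parlez-vous français? Ich spreche ein bisschen spreche Französisch ja. A little bit is better than nothing.\n' > "$sample"

# Extract the span reported as "de" in the example above.
byte_slice "$sample" 23 73
echo
rm -f "$sample"
```

Note that ç and ö each occupy two bytes in UTF-8, which is why the first span ends at offset 23 rather than 22. Slicing by character index instead of byte index would drift after every non-ASCII character.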