#cc-cedict #dictionary #chinese #morphological

lindera-cc-cedict

A Chinese morphological dictionary for CC-CEDICT

51 releases (29 breaking)

0.41.0 Apr 13, 2025
0.40.1 Mar 27, 2025
0.38.1 Nov 30, 2024
0.32.2 Jun 29, 2024
0.12.2 Mar 23, 2022

23,174 downloads per month
Used in 12 crates (via lindera)

MIT license

140KB
3K SLoC

Lindera CC-CEDICT

Dictionary version

This repository contains CC-CEDICT-MeCab.

Dictionary format

Refer to the manual for details on the CC-CEDICT-MeCab dictionary format and part-of-speech tags.

Index | Name (Chinese) | Name (English) | Notes
----- | -------------- | -------------- | -----
0 | 表面形式 | Surface |
1 | 左语境ID | Left context ID |
2 | 右语境ID | Right context ID |
3 | 成本 | Cost |
4 | 词类 | Major POS classification |
5 | 词类1 | Middle POS classification |
6 | 词类2 | Small POS classification |
7 | 词类3 | Fine POS classification |
8 | 併音 | Pinyin |
9 | 繁体字 | Traditional form |
10 | 簡体字 | Simplified form |
11 | 定义 | Definition |
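To illustrate the 12-column layout above, the following std-only Rust sketch splits one CSV row and maps it onto a struct. It is not Lindera's own loader. The `DictEntry` type, the `split_csv_line` and `parse_entry` helpers, the choice of `u16` for context IDs and `i16` for cost, and the sample row (including its POS label and IDs) are all hypothetical. It also assumes that fields containing commas, such as definitions, are wrapped in double quotes, with `""` standing for a literal quote.

```rust
/// One row of the system dictionary, mirroring columns 0-11 of the table.
/// Assumption (hypothetical): context IDs fit in u16 and cost fits in i16.
#[derive(Debug, PartialEq)]
pub struct DictEntry {
    pub surface: String,
    pub left_id: u16,
    pub right_id: u16,
    pub cost: i16,
    /// Major, middle, small and fine POS classification (columns 4-7).
    pub pos: [String; 4],
    pub pinyin: String,
    pub traditional: String,
    pub simplified: String,
    pub definition: String,
}

/// Splits one CSV line, honouring double-quoted fields and `""` escapes.
pub fn split_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // `""` inside a quoted field is an escaped quote character.
            '"' if in_quotes && chars.peek() == Some(&'"') => {
                field.push('"');
                chars.next();
            }
            // A lone quote opens or closes a quoted field.
            '"' => in_quotes = !in_quotes,
            // An unquoted comma ends the current field.
            ',' if !in_quotes => fields.push(std::mem::take(&mut field)),
            _ => field.push(c),
        }
    }
    fields.push(field);
    fields
}

/// Parses column `i` as a number, reporting the column index on failure.
fn parse_num<T: std::str::FromStr>(fields: &[String], i: usize) -> Result<T, String> {
    fields[i]
        .trim()
        .parse::<T>()
        .map_err(|_| format!("column {} is not a valid number: {:?}", i, fields[i]))
}

/// Maps one CSV row onto `DictEntry`, rejecting rows with fewer than 12 columns.
pub fn parse_entry(line: &str) -> Result<DictEntry, String> {
    let f = split_csv_line(line);
    if f.len() < 12 {
        return Err(format!("expected 12 columns, found {}", f.len()));
    }
    Ok(DictEntry {
        surface: f[0].clone(),
        left_id: parse_num(&f, 1)?,
        right_id: parse_num(&f, 2)?,
        cost: parse_num(&f, 3)?,
        pos: [f[4].clone(), f[5].clone(), f[6].clone(), f[7].clone()],
        pinyin: f[8].clone(),
        traditional: f[9].clone(),
        simplified: f[10].clone(),
        definition: f[11].clone(),
    })
}

fn main() {
    // Made-up row for illustration; not copied from the actual dictionary.
    let row = "中国,0,0,100,名词,*,*,*,zhong1 guo2,中國,中国,\"China, the Middle Kingdom\"";
    match parse_entry(row) {
        Ok(entry) => println!("{:?}", entry),
        Err(e) => eprintln!("parse error: {}", e),
    }
}
```

A real loader would more likely delegate quoting rules to a CSV library; the hand-written splitter only keeps the sketch dependency-free.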

User dictionary format (CSV)

Simple version

Index | Name (Chinese) | Name (English) | Notes
----- | -------------- | -------------- | -----
0 | 表面形式 | Surface |
1 | 词类 | Major POS classification |
2 | 併音 | Pinyin |
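A minimal sketch of producing rows in the simple three-column format (surface, major POS, pinyin). It assumes the user dictionary loader accepts RFC 4180-style quoting for fields that contain commas, quotes or line breaks; `SimpleUserEntry`, `to_csv_row`, `escape_field` and the POS label used in the example are hypothetical, not part of Lindera's API.

```rust
/// One row of the simple user dictionary: surface, major POS, pinyin.
/// (Hypothetical helper type, not part of Lindera.)
pub struct SimpleUserEntry<'a> {
    pub surface: &'a str,
    pub pos: &'a str,
    pub pinyin: &'a str,
}

/// Quotes a field RFC 4180-style if it contains a comma, quote or newline,
/// doubling any embedded quotes. Assumes the loader understands this quoting.
pub fn escape_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

impl SimpleUserEntry<'_> {
    /// Renders the entry as one comma-separated line (no trailing newline).
    pub fn to_csv_row(&self) -> String {
        [self.surface, self.pos, self.pinyin]
            .iter()
            .map(|f| escape_field(f))
            .collect::<Vec<_>>()
            .join(",")
    }
}

fn main() {
    // Hypothetical entries; the POS label is illustrative only.
    let entries = [
        SimpleUserEntry { surface: "云计算", pos: "名词", pinyin: "yun2 ji4 suan4" },
        SimpleUserEntry { surface: "机器学习", pos: "名词", pinyin: "ji1 qi4 xue2 xi2" },
    ];
    for entry in &entries {
        println!("{}", entry.to_csv_row());
    }
}
```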

Detailed version

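Index | Name (Chinese) | Name (English) | Notes
----- | -------------- | -------------- | -----
0 | 表面形式 | Surface |
1 | 左语境ID | Left context ID |
2 | 右语境ID | Right context ID |
3 | 成本 | Cost |
4 | 词类 | Major POS classification |
5 | 词类1 | Middle POS classification |
6 | 词类2 | Small POS classification |
7 | 词类3 | Fine POS classification |
8 | 併音 | Pinyin |
9 | 繁体字 | Traditional form |
10 | 簡体字 | Simplified form |
11 | 定义 | Definition |
12 | - | - | Columns from index 12 onward can be freely extended.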
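Because columns from index 12 onward are free-form, a loader that preserves them could keep the first 12 fields fixed and collect the rest separately. The sketch below assumes the row has already been split into fields by a CSV reader; `DetailedUserEntry`, `from_fields`, the `u16`/`i16` numeric types and the sample extra columns are hypothetical, not Lindera's actual implementation.

```rust
/// A detailed user-dictionary row: the 12 documented columns (indices 0-11)
/// plus any user-defined columns from index 12 onward. (Hypothetical type.)
#[derive(Debug)]
pub struct DetailedUserEntry {
    pub fixed: [String; 12],
    pub extra: Vec<String>,
}

/// Builds an entry from already-split CSV fields, checking that the numeric
/// columns parse. Assumption: context IDs are u16 and cost is i16.
pub fn from_fields(mut fields: Vec<String>) -> Result<DetailedUserEntry, String> {
    if fields.len() < 12 {
        return Err(format!("expected at least 12 columns, found {}", fields.len()));
    }
    for (i, name) in [(1usize, "left context ID"), (2usize, "right context ID")] {
        fields[i]
            .trim()
            .parse::<u16>()
            .map_err(|_| format!("{} is not a valid u16: {:?}", name, fields[i]))?;
    }
    fields[3]
        .trim()
        .parse::<i16>()
        .map_err(|_| format!("cost is not a valid i16: {:?}", fields[3]))?;

    // Everything from index 12 onward is kept as user-defined data.
    let extra = fields.split_off(12);
    // Exactly 12 fields remain; move them into a fixed-size array.
    let fixed: [String; 12] = std::array::from_fn(|i| std::mem::take(&mut fields[i]));
    Ok(DetailedUserEntry { fixed, extra })
}

fn main() {
    // Hypothetical row with two user-defined extra columns.
    let row = "自定义词,0,0,100,名词,*,*,*,zi4 ding4 yi4 ci2,自定義詞,自定义词,custom word,my-tag,42";
    // A naive split is acceptable here only because no field contains a comma.
    let fields: Vec<String> = row.split(',').map(String::from).collect();
    match from_fields(fields) {
        Ok(entry) => println!("surface = {}, extra = {:?}", entry.fixed[0], entry.extra),
        Err(e) => eprintln!("invalid row: {}", e),
    }
}
```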

API reference

The API reference is available at the following URL:

Dependencies

~12–24MB
~387K SLoC