#japanese #japanese-morphological #dictionary #builder #ipadic #tokenize #neologd

lindera-ipadic-neologd-builder

A Japanese morphological dictionary builder for IPADIC NEologd

18 releases (9 breaking)

0.32.3 Mar 18, 2025
0.32.2 Jun 29, 2024
0.31.0 May 28, 2024
0.29.0 Mar 18, 2024
0.1.2 Feb 20, 2020

#956 in Text processing


12,561 downloads per month

MIT license

155KB
3K SLoC

Lindera IPADIC NEologd Builder

License: MIT. Join the chat at https://gitter.im/lindera-morphology/lindera

IPADIC NEologd dictionary builder for Lindera. This project is forked from kuromoji-rs.

Dictionary version

This repository contains mecab-ipadic-neologd.

Dictionary format

Refer to the manual for details on the IPADIC dictionary format and part-of-speech tags.

| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞 | Major POS classification | |
| 5 | 品詞細分類1 | Middle POS classification | |
| 6 | 品詞細分類2 | Small POS classification | |
| 7 | 品詞細分類3 | Fine POS classification | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 原形 | Base form | |
| 11 | 読み | Reading | |
| 12 | 発音 | Pronunciation | |
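
As a rough illustration of the column layout above, the following sketch parses one CSV line into a struct using only the Rust standard library. The `Entry` struct and `parse_entry` function are hypothetical names invented for this example and are not part of the Lindera API; the sample line uses illustrative context IDs and cost rather than values taken from the real dictionary. It assumes fields contain no quoted commas, which a real dictionary source may violate.

```rust
/// One dictionary entry, with fields in the order of the table above.
/// (Hypothetical type for illustration; not part of the Lindera API.)
#[derive(Debug, PartialEq)]
struct Entry {
    surface: String,
    left_context_id: u32,
    right_context_id: u32,
    cost: i32,
    /// POS classifications from index 4 (major) to index 7 (fine).
    pos: [String; 4],
    conjugation_type: String,
    conjugation_form: String,
    base_form: String,
    reading: String,
    pronunciation: String,
}

/// Parse one line of 13 comma-separated fields.
/// Assumption: no field contains a quoted comma.
fn parse_entry(line: &str) -> Result<Entry, String> {
    let f: Vec<&str> = line.trim_end().split(',').collect();
    if f.len() != 13 {
        return Err(format!("expected 13 fields, found {}", f.len()));
    }
    let left_context_id = f[1]
        .parse::<u32>()
        .map_err(|e| format!("left context ID: {}", e))?;
    let right_context_id = f[2]
        .parse::<u32>()
        .map_err(|e| format!("right context ID: {}", e))?;
    let cost = f[3].parse::<i32>().map_err(|e| format!("cost: {}", e))?;
    Ok(Entry {
        surface: f[0].to_string(),
        left_context_id,
        right_context_id,
        cost,
        pos: [
            f[4].to_string(),
            f[5].to_string(),
            f[6].to_string(),
            f[7].to_string(),
        ],
        conjugation_type: f[8].to_string(),
        conjugation_form: f[9].to_string(),
        base_form: f[10].to_string(),
        reading: f[11].to_string(),
        pronunciation: f[12].to_string(),
    })
}

fn main() {
    // Illustrative line; the numeric values are made up for this example.
    let line = "東京,1293,1293,3003,名詞,固有名詞,地域,一般,*,*,東京,トウキョウ,トーキョー";
    match parse_entry(line) {
        Ok(entry) => println!("{:#?}", entry),
        Err(e) => eprintln!("parse error: {}", e),
    }
}
```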

User dictionary format (CSV)

Simple version

| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 品詞 | Major POS classification | |
| 2 | 読み | Reading | |
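
To make the three-column layout concrete, here is a minimal sketch that formats simple user dictionary rows with the Rust standard library. The `simple_row` helper is a hypothetical name for this example, not a Lindera function, and the sample words and the POS label are illustrative. The sketch rejects values containing commas or newlines instead of quoting them, which is a simplifying assumption.

```rust
/// Format one simple user dictionary row: surface, POS, reading.
/// (Hypothetical helper for illustration; not part of the Lindera API.)
fn simple_row(surface: &str, pos: &str, reading: &str) -> Result<String, String> {
    for (name, value) in [("surface", surface), ("POS", pos), ("reading", reading)] {
        if value.is_empty() {
            return Err(format!("{} must not be empty", name));
        }
        // Simplifying assumption: reject separators rather than quote them.
        if value.contains(',') || value.contains('\n') {
            return Err(format!("{} must not contain a comma or newline", name));
        }
    }
    Ok(format!("{},{},{}", surface, pos, reading))
}

fn main() {
    // Illustrative entries only.
    let words = [
        ("東京スカイツリー", "カスタム名詞", "トウキョウスカイツリー"),
        ("とうきょうスカイツリー駅", "カスタム名詞", "トウキョウスカイツリーエキ"),
    ];
    for (surface, pos, reading) in words {
        match simple_row(surface, pos, reading) {
            Ok(line) => println!("{}", line),
            Err(e) => eprintln!("skipped: {}", e),
        }
    }
}
```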

Detailed version

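| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表層形 | Surface | |
| 1 | 左文脈ID | Left context ID | |
| 2 | 右文脈ID | Right context ID | |
| 3 | コスト | Cost | |
| 4 | 品詞 | POS | |
| 5 | 品詞細分類1 | POS subcategory 1 | |
| 6 | 品詞細分類2 | POS subcategory 2 | |
| 7 | 品詞細分類3 | POS subcategory 3 | |
| 8 | 活用型 | Conjugation type | |
| 9 | 活用形 | Conjugation form | |
| 10 | 原形 | Base form | |
| 11 | 読み | Reading | |
| 12 | 発音 | Pronunciation | |
| 13 | - | - | Fields from index 13 onward can be added freely. |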
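
One way to tell the simple and detailed versions apart is by field count: 3 fields for a simple row, 13 or more for a detailed row. The sketch below shows that check with the standard library only. `RowKind` and `classify_row` are hypothetical names, and the rule is an assumption drawn from the two tables above rather than a description of how Lindera itself validates user dictionaries. As before, quoted commas are not handled.

```rust
/// Which user dictionary layout a row appears to follow.
/// (Hypothetical type for illustration; not part of the Lindera API.)
#[derive(Debug, PartialEq)]
enum RowKind {
    /// Exactly 3 fields: surface, POS, reading.
    Simple,
    /// 13 required fields plus any number of free-form extra fields.
    Detailed { extra_fields: usize },
}

/// Classify a CSV row by its field count.
/// Assumption: no field contains a quoted comma.
fn classify_row(line: &str) -> Result<RowKind, String> {
    let n = line.trim_end().split(',').count();
    match n {
        3 => Ok(RowKind::Simple),
        n if n >= 13 => Ok(RowKind::Detailed { extra_fields: n - 13 }),
        n => Err(format!("expected 3 or at least 13 fields, found {}", n)),
    }
}

fn main() {
    // Illustrative rows; numeric values and the trailing "memo" field are made up.
    let rows = [
        "東京スカイツリー,カスタム名詞,トウキョウスカイツリー",
        "東京スカイツリー,1288,1288,4000,名詞,固有名詞,一般,*,*,*,東京スカイツリー,トウキョウスカイツリー,トーキョースカイツリー,memo",
        "東京スカイツリー,名詞,固有名詞,トウキョウスカイツリー",
    ];
    for row in rows {
        match classify_row(row) {
            Ok(kind) => println!("{:?}", kind),
            Err(e) => eprintln!("invalid row: {}", e),
        }
    }
}
```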

How to use IPADIC dictionary

For more details about the lindera command, please refer to the following URL:

API reference

The API reference is available at the following URL:

Dependencies

~9MB
~213K SLoC