
conllu-utils

Utilities for working with the CoNLL-U dependency format

9 releases

0.1.8 Dec 7, 2020
0.1.7 May 5, 2020
0.1.6 Apr 18, 2020
0.1.3 Mar 23, 2020


Apache-2.0

46KB
1K SLoC

CoNLL-U Utilities

Introduction

This is a set of utilities to process files in the CoNLL-U format. The conllu command provides the following subcommands:

  • accuracy: compute the accuracy of a system based on two treebanks
  • cleanup: normalize Unicode and replace Unicode punctuation
  • compare: compare two treebanks on one or more layers
  • from-text: convert tokenized text files to CoNLL-U
  • merge: merge CoNLL-U files
  • partition: partition a CoNLL-U file into N files
  • shuffle: shuffle the sentences in a CoNLL-U file
  • to-text: convert CoNLL-U to tokenized plain text
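The page does not say which layer the accuracy subcommand scores. As an illustration only, here is a minimal Rust sketch of token-level accuracy between a gold and a system treebank. It relies on the standard CoNLL-U layout (ten tab-separated columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC; `#` comment lines; blank lines between sentences). It assumes both files are tokenized identically. The names `token_rows`, `accuracy` and the column constants are hypothetical, not the crate's API.

```rust
// Hypothetical column indices into the ten CoNLL-U fields
// (ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC).
pub const UPOS: usize = 3;
pub const HEAD: usize = 6;
pub const DEPREL: usize = 7;

/// Collect the fields of every syntactic word, skipping comments, blank
/// sentence separators, multiword-token ranges ("1-2") and empty nodes ("3.1").
pub fn token_rows(conllu: &str) -> Vec<Vec<&str>> {
    conllu
        .lines()
        .filter(|l| !l.trim().is_empty() && !l.starts_with('#'))
        .map(|l| l.split('\t').collect::<Vec<&str>>())
        .filter(|f| f.len() == 10 && !f[0].contains('-') && !f[0].contains('.'))
        .collect()
}

/// Fraction of tokens for which all the given columns agree.
/// Assumption: gold and system share the same tokens in the same order;
/// returns None if the token counts differ or there are no tokens.
pub fn accuracy(gold: &str, system: &str, columns: &[usize]) -> Option<f64> {
    let g = token_rows(gold);
    let s = token_rows(system);
    if g.len() != s.len() || g.is_empty() {
        return None;
    }
    let correct = g
        .iter()
        .zip(&s)
        .filter(|(a, b)| columns.iter().all(|&c| a[c] == b[c]))
        .count();
    Some(correct as f64 / g.len() as f64)
}
```

Passing `&[UPOS]` scores part-of-speech tags. Passing `&[HEAD, DEPREL]` requires both the head and the dependency label to match, which is the usual labeled attachment score definition.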
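The exact input and output formats of from-text and to-text are not documented here. A minimal Rust sketch of the two conversions follows. It assumes "tokenized text" means one sentence per line with whitespace-separated tokens. Only the ID and FORM columns are filled in; the other eight get the CoNLL-U placeholder `_`. The function names `text_to_conllu` and `conllu_to_text` are hypothetical.

```rust
/// Tokenized text -> CoNLL-U. Assumption: one sentence per line,
/// tokens separated by whitespace.
pub fn text_to_conllu(text: &str) -> String {
    let mut out = String::new();
    for line in text.lines() {
        let tokens: Vec<&str> = line.split_whitespace().collect();
        if tokens.is_empty() {
            continue; // skip empty input lines
        }
        for (i, form) in tokens.iter().enumerate() {
            // 1-based ID, the FORM, then eight "_" placeholder columns.
            out.push_str(&format!("{}\t{}{}\n", i + 1, form, "\t_".repeat(8)));
        }
        out.push('\n'); // a blank line terminates each sentence
    }
    out
}

/// CoNLL-U -> tokenized text: the FORM column of each sentence on one line.
/// Comments, multiword-token ranges and empty nodes are skipped.
pub fn conllu_to_text(conllu: &str) -> String {
    let mut out = String::new();
    let mut sentence: Vec<&str> = Vec::new();
    for line in conllu.lines() {
        if line.trim().is_empty() {
            if !sentence.is_empty() {
                out.push_str(&sentence.join(" "));
                out.push('\n');
                sentence.clear();
            }
        } else if !line.starts_with('#') {
            let fields: Vec<&str> = line.split('\t').collect();
            if fields.len() == 10 && !fields[0].contains('-') && !fields[0].contains('.') {
                sentence.push(fields[1]);
            }
        }
    }
    // Flush a final sentence that is not followed by a blank line.
    if !sentence.is_empty() {
        out.push_str(&sentence.join(" "));
        out.push('\n');
    }
    out
}
```

Under these assumptions the two functions are inverses for whitespace-tokenized input. Real to-text output may differ, for example by using multiword tokens or `SpaceAfter=No` in MISC.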
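How the partition subcommand assigns sentences to the N output files is not stated. The minimal Rust sketch below assumes a round-robin distribution of whole sentences, where a sentence is a run of non-blank lines (comments included) ended by a blank line. Both the strategy and the function name `partition` are hypothetical.

```rust
/// Split CoNLL-U text into `n` parts, assigning whole sentences round-robin.
/// Assumption: sentences are separated by one or more blank lines.
pub fn partition(conllu: &str, n: usize) -> Vec<String> {
    assert!(n > 0, "number of partitions must be positive");
    let mut parts = vec![String::new(); n];
    let mut current = String::new();
    let mut index = 0;
    // The chained "" flushes a last sentence that lacks a trailing blank line.
    for line in conllu.lines().chain(std::iter::once("")) {
        if line.trim().is_empty() {
            if !current.is_empty() {
                // Emit the finished sentence followed by its blank-line terminator.
                parts[index % n].push_str(&current);
                parts[index % n].push('\n');
                current.clear();
                index += 1;
            }
        } else {
            current.push_str(line);
            current.push('\n');
        }
    }
    parts
}
```

Round-robin keeps the parts within one sentence of each other in size. A contiguous split would instead preserve document order inside each part.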

Usage

Run any subcommand with --help as an argument to print its usage information, for example: conllu accuracy --help

Dependencies

~7–15MB
~186K SLoC