#knowledge-graph #wikidata #wikipedia #graph-database #dbpedia

bin+lib kgdata

Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)

17 stable releases (3 major)

4.0.1 Mar 28, 2024
3.9.0 Nov 13, 2023
3.8.0 Oct 31, 2023
3.5.2 Sep 23, 2023
1.3.0 May 21, 2023

#1099 in Database interfaces

43 downloads per month

Custom license

195KB
5K SLoC

kgdata

KGData is a library for processing dumps of Wikipedia and Wikidata. It can:

  • Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).
  • Create embedded key-value databases to access entities from the dumps.
  • Extract Wikidata ontology.
  • Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
  • Create Pyserini indices to search Wikidata’s entities.
  • and more
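
To make the first item in the list concrete, here is a minimal sketch of what resolving redirects and removing dangling references means on a toy in-memory graph. It does not use kgdata's API: the function names `resolve_redirect` and `clean_entities`, the dictionary layout, and the sample IDs are all assumptions made for illustration only.

```python
from typing import Dict, List


def resolve_redirect(entity_id: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain until reaching an ID that is not redirected.

    Hypothetical helper for illustration; the `seen` set guards against cycles.
    """
    seen = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id


def clean_entities(
    entities: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Rewrite each reference to its redirect target, then drop references
    that point to entities missing from the dump (dangling references)."""
    cleaned = {}
    for entity_id, refs in entities.items():
        resolved = (resolve_redirect(ref, redirects) for ref in refs)
        cleaned[entity_id] = [ref for ref in resolved if ref in entities]
    return cleaned


# Toy data (made up): Q3 redirects to Q4, and Q99 does not exist in the dump.
entities = {"Q1": ["Q2", "Q3"], "Q2": ["Q1", "Q99"], "Q4": []}
redirects = {"Q3": "Q4"}

cleaned = clean_entities(entities, redirects)
print(cleaned)
```

In this toy run, the reference from Q1 to Q3 is rewritten to Q4, and the reference from Q2 to the missing Q99 is dropped. Real dumps are far too large for in-memory dictionaries, which is presumably why the library relies on Spark and embedded key-value databases.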

For full documentation, please see the website.

Installation

From PyPI (using pre-built binaries):

pip install kgdata[spark]   # omit [spark] to install Spark yourself if your cluster runs a different version
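
If Spark is installed separately, it can help to confirm which versions ended up in the environment before submitting jobs to a cluster. The following is a small sketch using only the Python standard library. It assumes the Spark dependency is distributed as the `pyspark` package, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "pyspark" is an assumption about what the [spark] extra installs.
for name in ("kgdata", "pyspark"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Compare the reported `pyspark` version with the one running on your cluster.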

Dependencies

~47–77MB
~1.5M SLoC