bin+lib kgdata_core

Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)

1 stable release

new 4.0.1 Apr 10, 2025

Custom license

195KB
5K SLoC

kgdata

KGData is a library for processing dumps of Wikipedia and Wikidata. It can:

  • Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).
  • Create embedded key-value databases to access entities from the dumps.
  • Extract Wikidata ontology.
  • Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
  • Create Pyserini indices to search Wikidata’s entities.
  • and more
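
To make the clean-up step concrete, here is a minimal sketch of what resolving redirects and removing dangling references can look like on toy data. It is not KGData's API: the function names `resolve_redirects` and `clean_entities`, and the dictionary-based data layout, are assumptions made for illustration only.

```python
from typing import Dict, List


def resolve_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Map each redirected id to its final target (hypothetical helper).

    Follows redirect chains and stops if it detects a cycle.
    """
    resolved: Dict[str, str] = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        resolved[source] = target
    return resolved


def clean_entities(
    entities: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Rewrite references through redirects, then drop dangling ones."""
    final = resolve_redirects(redirects)
    cleaned: Dict[str, List[str]] = {}
    for entity_id, refs in entities.items():
        kept: List[str] = []
        for ref in refs:
            ref = final.get(ref, ref)  # follow the redirect, if any
            # Keep only references to entities that exist in the dump.
            if ref in entities and ref not in kept:
                kept.append(ref)
        cleaned[entity_id] = kept
    return cleaned


# Toy data: Q3 was merged into Q4, and Q9 does not exist in the dump.
entities = {"Q1": ["Q2", "Q3", "Q9"], "Q2": [], "Q4": ["Q2"]}
redirects = {"Q3": "Q4", "Q5": "Q3"}
cleaned = clean_entities(entities, redirects)
print(cleaned)
```

In this toy run the reference to Q3 is rewritten to Q4 and the reference to the missing Q9 is dropped. A real dump is far too large for in-memory dictionaries, which is presumably why the library builds on Spark and embedded databases.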

For full documentation, please see the website.

Installation

From PyPI (using pre-built binaries):

pip install kgdata[spark]   # omit [spark] to install Spark yourself, e.g. if your cluster runs a different version
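
If you install Spark separately, it can help to check that the local `pyspark` package matches the cluster before running any jobs. The following is a rough sketch using only the Python standard library. `CLUSTER_SPARK_VERSION` is a hypothetical placeholder, and comparing only the major and minor version is an assumption about what counts as compatible.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def same_major_minor(a: str, b: str) -> bool:
    """Compare only the first two version components, e.g. 3.5.x vs 3.5.y."""
    return a.split(".")[:2] == b.split(".")[:2]


# Hypothetical value: replace with the Spark version your cluster runs.
CLUSTER_SPARK_VERSION = "3.5.1"

local = installed_version("pyspark")
if local is None:
    print("pyspark is not installed; install the version your cluster uses.")
elif not same_major_minor(local, CLUSTER_SPARK_VERSION):
    print(f"pyspark {local} differs from cluster Spark {CLUSTER_SPARK_VERSION}.")
else:
    print(f"pyspark {local} matches cluster Spark {CLUSTER_SPARK_VERSION}.")
```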

Dependencies

~47–76MB
~1.5M SLoC