5 releases

0.2.2 Feb 14, 2021
0.2.1 Feb 14, 2021
0.2.0 Feb 6, 2021
0.1.1 Aug 13, 2020
0.1.0 Aug 13, 2020

#1244 in Parser implementations

MIT license

50KB
1K SLoC

rust-gedcom

a gedcom parser written in rust 🦀

About this project

GEDCOM is a file format for sharing genealogical information like family trees! It's being made obsolete by GEDCOM-X but is still widely used in many genealogy programs.

I wanted experience playing with parsers and representing tree structures in Rust, and noticed a parser for Rust did not exist. And thus, this project was born! A fun experiment to practice my Rust abilities.

It hopes to be fully mostly compliant with the Gedcom 5.5.1 specification.

I have found this 5.5.2 specification useful in its assessment of which tags are worth supporting or not.

Usage

This crate comes in two parts. The first is a binary called parse_gedcom, mostly used for my testing & development. It prints the GedcomData object and some stats about the gedcom file passed into it:

parse_gedcom ./tests/fixtures/sample.ged

# outputs tree data here w/ stats
# ----------------------
# | Gedcom Data Stats: |
# ----------------------
#   submitters: 1
#   individuals: 3
#   families: 2
#   repositories: 1
#   sources: 1
#   multimedia: 0
# ----------------------

The second is a library containing the parser.

JSON Serializing/Deserializing with serde

This crate has an optional feature called json that implements Serialize & Deserialize for the gedcom data structure. This allows you to easily integrate with the web.

For more info about serde, check them out!

The feature is not enabled by default. There are zero dependencies if just using the gedcom parsing functionality.

Use the json feature with any version >=0.2.1 by adding the following to your Cargo.toml:

gedcom = { version = "<version>", features = ["json"] }

🚧 Progress 🚧

There are still parts of the specification not yet implemented and the project is subject to change. The way I have been developing is to take a gedcom file, attempt to parse it and act on whatever errors or omissions occur. In it's current state, it is capable of parsing the sample.ged in its entirety.

Here are some notes about parsed data & tags. Page references are to the Gedcom 5.5.1 specification.

Top-level tags

  • HEAD.SOUR - p.42 - The source in the header is currently skipped.
  • SUBMISSION_RECORD - p.28 - No attempt at handling this is made.
  • MULTIMEDIA_RECORD - p.26 - Multimedia (OBJE) is not currently parsed.
  • NOTE_RECORD - p.27 - Notes (NOTE) are also unhandled. (except in header)

Tags for families (FAM), individuals (IND), repositories (REPO), sources (SOUR), and submitters (SUBM) are handled. Many of the most common sub-tags for these are handled though some may not yet be parsed. Mileage may vary.

Notes to self

  • Consider creating some Traits to handle change dates, notes, source citations, and other recurring fields.

License

© 2021, Robert Pirtle. licensed under MIT.

Dependencies

~0–295KB