#async #tokio #bags #bag-it #rfc-8493

nightly async_bagit

Create and load BagIt containers

2 unstable releases

0.2.0 Jul 31, 2024
0.1.0 Jul 20, 2024

MIT/Apache

async_bagit

Rust library to create and read BagIt containers.

Requirements to use in your crate

  • Tokio runtime
  • Nightly Rust, until the features listed in the table below are stabilized. Sorry! Feel free to propose a patch that makes the crate work on stable in the meantime.

Nightly feature    Tracking issue
iter_next_chunk    #98326
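
For anyone considering the stable patch mentioned above, here is a rough sketch of how the behaviour of `Iterator::next_chunk` could be approximated on stable Rust. `next_chunk_stable` is a hypothetical helper written for this example and is not part of the crate. How the crate actually uses `next_chunk` is not covered here, so treat this as a starting point only. Unlike the nightly method, it allocates a temporary `Vec` and returns the leftover items as a `Vec` on failure.

```rust
/// Hypothetical stable stand-in for the nightly `Iterator::next_chunk`.
/// Pulls up to `N` items from `iter`. Returns them as an array if exactly
/// `N` were available, otherwise returns the items that were pulled.
pub fn next_chunk_stable<I: Iterator, const N: usize>(
    iter: &mut I,
) -> Result<[I::Item; N], Vec<I::Item>> {
    // `take(N)` stops after N items, so the rest of `iter` stays untouched.
    let items: Vec<I::Item> = iter.by_ref().take(N).collect();
    // `Vec<T>` converts into `[T; N]` only when its length is exactly N;
    // on mismatch the original Vec is handed back as the error.
    items.try_into()
}
```

For example, splitting `"a b c"` on spaces and asking for chunks of two yields `Ok(["a", "b"])` first, then `Err(vec!["c"])`.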

Notes when using the crate

Checksum algorithms

The BagIt spec requires checksums on the payload content inside bags. This crate handles checksums and can compute them on files, but it does not include any checksum algorithms. I do not want to force crate consumers to use checksum x or y, so you will have to provide the algorithm.

Any struct implementing the Digest trait from the digest crate is accepted for checksum computation. Most algorithms in the Rust ecosystem implement this trait already. I am not an expert, but as of this writing (July 2024) I would recommend using BLAKE3, BLAKE2 or SHA-512, in that order.
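
To illustrate the bring-your-own-algorithm design, here is a minimal sketch that uses only the standard library. `ToyDigest`, `Fnv1a64` and `checksum_hex` are hypothetical names invented for this example; they are not part of this crate or of the digest crate. `ToyDigest` only imitates the general create/update/finalize shape of a hashing trait, and FNV-1a is a non-cryptographic hash that must not be used for real bags.

```rust
/// Hypothetical, reduced stand-in for a hashing trait such as `Digest`.
pub trait ToyDigest {
    fn new() -> Self;
    fn update(&mut self, data: &[u8]);
    fn finalize(self) -> Vec<u8>;
}

/// Toy 64-bit FNV-1a hasher. NOT cryptographic, for illustration only.
pub struct Fnv1a64(u64);

impl ToyDigest for Fnv1a64 {
    fn new() -> Self {
        Fnv1a64(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }

    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finalize(self) -> Vec<u8> {
        self.0.to_be_bytes().to_vec()
    }
}

/// Library-side code stays generic: the caller picks the algorithm `A`.
/// Returns the checksum of all chunks as lowercase hexadecimal.
pub fn checksum_hex<A: ToyDigest>(chunks: &[&[u8]]) -> String {
    let mut hasher = A::new();
    for chunk in chunks {
        hasher.update(chunk);
    }
    hasher
        .finalize()
        .iter()
        .map(|byte| format!("{:02x}", byte))
        .collect()
}
```

A caller selects the algorithm with a type parameter, for example `checksum_hex::<Fnv1a64>(&["abc".as_bytes()])`. Swapping in another algorithm means changing only that type.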

No special treatment of files

When opening bags, the crate only reads directories on the file system that contain BagIt bags. Compression and encryption are out of scope; you will need to handle them yourself.

When creating bags, the crate copies files as they are added to the bag. Make sure you have enough storage space and write permission when creating your bags.

RFC limitations and quirks

  • UTF-8 only. This is a Rust crate, everything is UTF-8.
  • No multi checksums, only one at a time.
  • The RFC says in section 2.4 that bag creation and validation tools MUST support the SHA-256 and SHA-512 algorithms. This crate requires you to bring your own algorithm for checksums, so I guess this crate will never be RFC compliant?
  • Checksum encoding is only supported in lowercase.
  • This crate has been written for and on a UNIX system, so Windows file paths were not considered at all. Filename casing was not considered either. Feel free to improve this if it is an issue for you.
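
Because only lowercase checksums are supported, a manifest written by another tool with uppercase checksums may need normalizing before the bag is opened. The following is a minimal sketch under stated assumptions: each manifest line starts with the checksum, followed by whitespace and the file path. `normalize_manifest_line` and `normalize_manifest` are hypothetical helpers and not part of this crate. Note that rewriting a payload manifest also changes its own checksum in any tag manifest.

```rust
/// Lowercases the checksum (first column) of one manifest line and keeps
/// the separator and file path exactly as they were. Lines without any
/// whitespace are returned unchanged.
pub fn normalize_manifest_line(line: &str) -> String {
    match line.find(char::is_whitespace) {
        Some(idx) => format!("{}{}", line[..idx].to_ascii_lowercase(), &line[idx..]),
        None => line.to_string(),
    }
}

/// Applies `normalize_manifest_line` to every line of a manifest.
/// Each output line is terminated with `\n`.
pub fn normalize_manifest(manifest: &str) -> String {
    manifest
        .lines()
        .map(|line| normalize_manifest_line(line) + "\n")
        .collect()
}
```

Only the checksum column is touched, so paths containing spaces or uppercase letters, such as `data/My File.TXT`, are preserved as-is.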

TODO

  • fetch.txt: I do not have any use for this yet, and it seems like a lot of work to implement
  • Support multiple checksum algorithms at the same time
  • Respect the spec regarding filename casing

I do not expect this crate to be fully compliant with the RFC or with other tools handling BagIt bags (although I am not against it, on the contrary).

My usage of BagIt will be creating bags with this crate, and opening them with this crate as well.

Acknowledgments

If you are looking for a sync Rust implementation of BagIt, https://github.com/pwinckles/bagr is a nice one!
