crev-recursive-digest

Library implementing recursive digest for filesystem directories

4 releases (breaking)

0.6.0 Apr 4, 2023
0.5.0 Jul 31, 2021
0.4.0 Mar 8, 2020
0.3.0 Mar 8, 2020
0.1.1 Dec 19, 2018

#562 in Filesystem

1,410 downloads per month
Used in 8 crates (5 directly)

MPL-2.0 OR MIT OR Apache-2.0

16KB
297 lines

Recursive file-system digest

This library implements a simple but efficient recursive file-system digest algorithm. Given a directory with some content in it, it computes a cryptographic digest (hash) of that content.

It was created for the purpose of checksumming source code packages in crev, but it is generic and can be used for any other purpose.

Algorithm

Given any digest algorithm H (a hash function), RecursiveDigest(H, path) is defined as:

  • for a file: H("F" || file_content)
  • for a symlink: H("L" || symlink_content)
  • for a directory: H("D" || directory_content)

As you can see, a one-letter ASCII prefix marks the type of each entry, which makes it impossible to create a file that has the same digest as a directory, and so on. The drawback of this approach is that RecursiveDigest(H, path) of a single file is not the same as a plain digest of its content, H(file_content).
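The effect of the prefix can be checked with a few lines of Python. This is only an illustrative sketch, not the crate's API: SHA-256 from `hashlib` stands in for H, and the helper names `file_digest` and `symlink_digest` are made up for this example.

```python
import hashlib


def file_digest(content: bytes) -> bytes:
    # H("F" || file_content), with SHA-256 standing in for H (assumption).
    return hashlib.sha256(b"F" + content).digest()


def symlink_digest(target: bytes) -> bytes:
    # H("L" || symlink_content), where the content is the link target as bytes.
    return hashlib.sha256(b"L" + target).digest()


data = b"hello"

# The prefixed digest of a file differs from a plain digest of the same bytes.
assert file_digest(data) != hashlib.sha256(data).digest()

# A file and a symlink carrying the same bytes get different digests.
assert file_digest(data) != symlink_digest(data)
```

Anyone who needs the plain H(file_content) of a single file has to compute it separately, which is the drawback mentioned above.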

file_content is just the byte content of a file.

symlink_content is just the path the symlink is pointing to, as bytes.

directory_content is created by:

  • sorting all entries of a directory by name, in ascending order, using a simple byte-sequence comparison
  • for all entries concatenating pairs of:
    • H(entry_name)
    • RecursiveDigest(H, entry_path)
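The definition above can be sketched in Python as follows. This is a minimal illustration written from the description on this page, not the crate's implementation, and it is not guaranteed to reproduce the crate's digests byte for byte. The function name `recursive_digest` and its `new_hash` parameter are assumptions, SHA-256 is just one possible choice of H, and the tree is assumed to contain only regular files, symlinks and directories.

```python
import hashlib
import os


def recursive_digest(path, new_hash=hashlib.sha256) -> bytes:
    """Sketch of RecursiveDigest(H, path); new_hash plays the role of H."""
    path = os.fsencode(path)  # work with raw bytes paths throughout
    h = new_hash()
    if os.path.islink(path):
        # Symlink: H("L" || target path as bytes). Checked first so that
        # links pointing at directories are not followed.
        h.update(b"L")
        h.update(os.fsencode(os.readlink(path)))
    elif os.path.isdir(path):
        # Directory: H("D" || directory_content).
        h.update(b"D")
        # listdir on a bytes path returns bytes names; sorted() on bytes
        # is a plain byte-sequence comparison in ascending order.
        for name in sorted(os.listdir(path)):
            h.update(new_hash(name).digest())  # H(entry_name)
            h.update(recursive_digest(os.path.join(path, name), new_hash))
    else:
        # Regular file: H("F" || file_content), streamed in chunks.
        h.update(b"F")
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
    return h.digest()
```

Calling `recursive_digest("some/dir").hex()` would give a hex string for a whole tree. Feeding the hasher incrementally is equivalent to hashing the concatenation, so large files do not need to be held in memory.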

If the optional additional data extension is used, the H(entry_name) above becomes H(entry_name || 0 || additional data). The format and meaning of the additional data is unspecified, but it was intended for filesystem metadata such as permissions and ownership.
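A sketch of how the entry-name hash changes under this extension, again in Python with SHA-256 standing in for H. Two things here are assumptions rather than facts from the text: the `0` separator is interpreted as a single zero byte, and the helper name `entry_name_hash` is invented for this example. The permission encoding in the usage lines is purely hypothetical, since the format of the additional data is unspecified.

```python
import hashlib
from typing import Optional


def entry_name_hash(entry_name: bytes,
                    additional_data: Optional[bytes] = None) -> bytes:
    h = hashlib.sha256()
    h.update(entry_name)
    if additional_data is not None:
        # Assumption: "0" in the description is one zero byte.
        h.update(b"\x00")
        h.update(additional_data)
    return h.digest()


# Hypothetical encoding: Unix mode bits as 4 big-endian bytes.
mode = (0o644).to_bytes(4, "big")

plain = entry_name_hash(b"README.md")
with_mode = entry_name_hash(b"README.md", mode)

# Adding metadata changes the name hash, and therefore the digest
# of every directory that contains the entry.
assert plain != with_mode
```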

Dependencies

~0.6–7.5MB
~59K SLoC