2 releases

0.1.1 Mar 9, 2023
0.1.0 Feb 6, 2023

#11 in #backup-file

MIT/Apache

30KB
596 lines

bitwrought

bitwrought is a file integrity checker written primarily for MacOS (but it should work on linux). It stores file hashes and last modified timestamps as extended attributes on the file you point it toward. The next time you run it on those files, it will recalculate hashes based on the file contents and compare it with the saved values.

If the hashes match, the data hasn't changed. If the hashes are different but the file metadata says it was modified after the previously saved value, the file was likely modified. If the hashes do not match but the file metadata says it was not modified after the value saved, the file may have suffered from bit rot or data corruption.

The name "bitwrought" is a pun: "bit rot" is when some data that was written to disk appears different when read later. This can happen for any number of reasons: the data could have been copied around on disk incorrectly, or physical degradation could have flipped some bits. In practice, this can be ruinous: your treasured photo or video library, for example, could suffer bit rot and be corrupted. You then unknowingly back up the corrupted library, overwriting any uncorrupted backups. At some point in the future, you try unsuccessfully to open the library. The files, along with any backups, are corrupted.

Preventing this requires two things: 1) a way to notify you that files have been changed, and 2) a way to retrieve a backup from before the file changed. bitwrought fills the first role, and will tell you about the status of your files when you run it. bitwrought does NOT fill the second role: it doesn't make any backups of files. Are you wrought about your bits? Run bitwrought!

Installation

Head on over to the Releases page and download the zip. It comes with this README as well. Then put the binary anywhere you like to put binaries.

For rust nerds

You can also cargo install bitwrought if you have the necessary tooling.

Usage

Run bitwrought PATH to check the file at path, or all files residing in PATH. By default, bitwrought does not check directories recursively, but the behavior can be changed with the --recursive option. bitwrought's saved xattrs can also be removed with the --delete option. For each file specified:

  • if the file has no attributes saved by bitwrought, it will calculate a file hash and save that value, as well the last modified timestamp in file metadata, in xattrs on the file.
  • if the file has a hash, it calculates a new one and compares it to the saved one. The last modified timestamp in file metadata is compared to bitwrought's last modified timestamp (saved in an xattr) to determine if the file was modified or could have suffered from bit rot.
Usage: bitwrought [OPTIONS] <path>...

Arguments:
  <path>...  one or more files or directories

Options:
  -r, --recursive  check all files in the directory recursively
  -d, --delete     delete hash and timestamp xattrs saved by bitwrought
  -v, --verbose    verbose output including hashes and timestamps
  -h, --help       Print help
  -V, --version    Print version

Support

Bitwrought supports macOS. It is theoretically able to support Linux, but xattrs are not supported on the versions of NFS used in the GitHub actions (and ec2), so until I figure out another way to test bitwrought on Linux, I won't publish binaries for it. If you want to use bitwrought on Linux, let me know- maybe we can brainstorm a solution.

FAQ

Q: If my file suffers from bit rot, why should you expect that the saved extended attributes remain intact?

A: APFS checksums its own metadata, but not user data, meaning that if a file suffers from bit rot, there is a high chance the extended attributes are fine.

Q: How should I use bitwrought?

A: I think the best use case for bitwrought is for files that you care about that seldom change. For example, you might run bitwrought on a folder of music or photos.

Q: Is bitwrought fast?

A: Reasonably, yes. It is only single threaded right now, which limits its speed. But on my local machine it takes a couple of minutes per gigabyte. All data reads are buffered so it can handle large files, and changing the buffer size has surprisingly little impact on the speed of the program.

Dependencies

~3–12MB
~147K SLoC