4 releases
- 0.2.4: Nov 19, 2024
- 0.2.3: Nov 19, 2024
- 0.2.2: Nov 19, 2024
- 0.2.1: Nov 19, 2024
- 0.2.0
GZInspector
A robust command-line tool for inspecting and analyzing GZIP/ZLIB compressed files. GZInspector provides detailed information about compression chunks, headers, and content previews with support for both human-readable and JSON output formats.
Motivation
A GZIP file may consist of several concatenated members (chunks), each with its own header and trailer. Most GZIP implementations discard the boundaries between chunks during decompression, since they are irrelevant to the decompressed output. However, certain file formats rely on these boundaries as a core feature: when a chunk's byte offset and compressed length are known, that chunk can be decompressed on its own without reading the rest of the file.
This chunked compression approach is particularly prevalent in web archiving formats, including:
- WARC, WET, WAT files used by web archives to store crawled content
- CDX/J and ZipNum encoded CDX files that enable efficient index lookups
These formats are actively used by major web archiving initiatives like Common Crawl and the Internet Archive to manage and provide access to petabyte-scale web archives.
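Selective access to one chunk boils down to a seek plus a bounded read. The sketch below is a minimal illustration, not GZInspector's code: read_member is a hypothetical helper that takes an offset and a length (as an index such as a CDX file would supply) and returns that byte range after checking that it begins with the GZIP magic bytes 0x1f 0x8b. Because a chunk is a complete GZIP stream on its own, the returned bytes could be handed to any GZIP decoder (for example flate2's GzDecoder); decompression is left out to keep the sketch dependency-free.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: read the raw bytes of one GZIP member, given its
/// byte offset and compressed length (e.g. from a CDX index entry).
fn read_member<R: Read + Seek>(src: &mut R, offset: u64, length: usize) -> io::Result<Vec<u8>> {
    // Jump straight to the member; nothing before it is read or decompressed.
    src.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; length];
    // Fails with UnexpectedEof if the source is shorter than offset + length.
    src.read_exact(&mut buf)?;
    // Every GZIP member starts with the magic bytes 0x1f 0x8b (RFC 1952).
    if buf.len() < 2 || buf[0] != 0x1f || buf[1] != 0x8b {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "byte range does not start with a GZIP header",
        ));
    }
    Ok(buf)
}
```

The check only guards against an obviously wrong offset; a range that starts correctly but is cut short would still fail later, in the decoder.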
Features
- Chunk-by-chunk analysis of GZIP files
- Detailed compression statistics and ratios
- Content preview capabilities
- Support for concatenated GZIP files
- Multiple output formats (human-readable and JSON)
- Comprehensive header information including timestamps and flags
- Automatic encoding detection and handling
Installation
Using Rust Cargo
cargo install gzinspector
Pre-built Binary (Linux)
To install the pre-built binary for Linux:
# Download the latest release tarball, listed at:
# https://github.com/jt55401/gzinspector/releases/latest
wget $(curl -s https://api.github.com/repos/jt55401/gzinspector/releases/latest | grep "browser_download_url.*tar\.gz" | cut -d '"' -f 4)
# Or browse all releases at:
# https://github.com/jt55401/gzinspector/releases
# Extract the binary
tar -xzf gzinspector-linux-x86_64.tar.gz
# Move the binary to a directory in your PATH
sudo mv gzinspector /usr/local/bin/
From Source
To install GZInspector from source, you'll need Rust and Cargo installed on your system. Then:
# Clone the repository
git clone https://github.com/jt55401/gzinspector.git
# Build the project
cd gzinspector
cargo build --release
# The binary will be available at target/release/gzinspector
Usage
gzinspector [OPTIONS] <FILE>
Options
-o, --output-format <FORMAT>
: Output format (human or json) [default: human]
-p, --preview <PREVIEW>
: Preview content (format: HEAD:TAIL, e.g. '5:3' shows first 5 and last 3 lines)
-c, --chunks <CHUNKS>
: Only show first and last N chunks (format: HEAD:TAIL, e.g. '5:3' shows first 5 and last 3)
-e, --encoding <ENCODING>
: Encoding for preview [default: utf-8]
-h, --help
: Display help information
-V, --version
: Display version information
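Both --preview and --chunks take a HEAD:TAIL value. A minimal parser for that format might look like the following. parse_head_tail is a hypothetical name, and this README does not say whether GZInspector accepts variants such as an omitted side ('5:'), so the sketch simply rejects anything that is not two non-negative integers separated by a colon.

```rust
/// Hypothetical parser for a HEAD:TAIL spec such as "5:3" -> (5, 3).
fn parse_head_tail(spec: &str) -> Result<(usize, usize), String> {
    // Split on the first colon only; "5:3:1" then fails on the TAIL part.
    let (h, t) = spec
        .split_once(':')
        .ok_or_else(|| format!("expected HEAD:TAIL, got '{}'", spec))?;
    let head = h
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("invalid HEAD '{}': {}", h, e))?;
    let tail = t
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("invalid TAIL '{}': {}", t, e))?;
    Ok((head, tail))
}
```

Returning a descriptive String error keeps the message ready to print next to the offending option.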
Examples
Basic file inspection:
gzinspector example.gz
Show JSON output:
gzinspector -o json example.gz
Preview content (first 5 lines and last 3 lines):
gzinspector -p 5:3 example.gz
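Selecting the first HEAD and last TAIL items (lines for --preview, chunks for --chunks) can be done in one pass without knowing the total count in advance: keep the first HEAD items, and push every later item through a ring buffer that never holds more than TAIL of them. The helper below is a hypothetical sketch of that technique, not the tool's implementation. It also reports how many items were skipped in between, and it never shows an item twice when the input is shorter than HEAD + TAIL.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: returns (first `head` items, last `tail` items,
/// number of items skipped in between), using O(head + tail) memory.
fn head_and_tail<T, I: IntoIterator<Item = T>>(items: I, head: usize, tail: usize) -> (Vec<T>, Vec<T>, usize) {
    let mut first = Vec::new();
    let mut last: VecDeque<T> = VecDeque::new();
    let mut skipped = 0;
    for item in items {
        if first.len() < head {
            first.push(item);
        } else if tail == 0 {
            // Nothing is kept from the end, so everything after HEAD is skipped.
            skipped += 1;
        } else {
            if last.len() == tail {
                // Buffer is full: the oldest buffered item falls into the gap.
                last.pop_front();
                skipped += 1;
            }
            last.push_back(item);
        }
    }
    (first, last.into_iter().collect(), skipped)
}
```

For example, head_and_tail(1..=10, 5, 3) yields [1, 2, 3, 4, 5], [8, 9, 10] and 2 skipped items.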
Output Format
Human-readable Output
The human-readable output prints one line per chunk. For example, a first chunk at offset 0 whose 1.2KB of compressed data expands to 3.0KB is shown with a ratio of 2.5x and the header information deflate|EXTRA|NAME|example.txt.
The fields appear in this order:
- 📦 #N: Chunk number
- Offset in file
- Compression ratio (with direction indicator)
- 📥: Compressed size
- 📤: Uncompressed size
- ℹ️: Header information
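The header information field follows the fixed layout of a GZIP member header defined in RFC 1952: two magic bytes, the compression method (8 means deflate), a flag byte, a 4-byte little-endian modification time, two more bytes (XFL, OS), and then the optional EXTRA, NAME, COMMENT and header-CRC fields in that order. The sketch below parses those fields and builds a string shaped like deflate|EXTRA|NAME|example.txt. The names GzHeader, parse_gzip_header and header_info, and the exact label order, are assumptions for illustration, not GZInspector's API.

```rust
// FLG bits from RFC 1952.
const FTEXT: u8 = 0x01;
const FHCRC: u8 = 0x02;
const FEXTRA: u8 = 0x04;
const FNAME: u8 = 0x08;
const FCOMMENT: u8 = 0x10;

#[derive(Debug)]
struct GzHeader {
    method: u8,
    flags: u8,
    mtime: u32,            // Unix timestamp, 0 if not set
    name: Option<String>,  // original file name, if FNAME is set
    header_len: usize,     // offset at which the compressed data begins
}

/// Returns the index just past the zero terminator of a string starting at `pos`.
fn skip_zero_terminated(buf: &[u8], pos: usize) -> Result<usize, String> {
    let rest = buf.get(pos..).ok_or("truncated header")?;
    let end = rest.iter().position(|&b| b == 0).ok_or("unterminated string field")?;
    Ok(pos + end + 1)
}

fn parse_gzip_header(buf: &[u8]) -> Result<GzHeader, String> {
    if buf.len() < 10 || buf[0] != 0x1f || buf[1] != 0x8b {
        return Err("not a GZIP member".to_string());
    }
    let (method, flags) = (buf[2], buf[3]);
    let mtime = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let mut pos = 10; // bytes 8 and 9 (XFL, OS) are skipped
    if flags & FEXTRA != 0 {
        // 2-byte little-endian length XLEN, followed by XLEN bytes of data.
        let x = buf.get(pos..pos + 2).ok_or("truncated EXTRA field")?;
        pos += 2 + u16::from_le_bytes([x[0], x[1]]) as usize;
    }
    let mut name = None;
    if flags & FNAME != 0 {
        let end = skip_zero_terminated(buf, pos)?;
        // RFC 1952 specifies ISO-8859-1; lossy UTF-8 is a simplification.
        name = Some(String::from_utf8_lossy(&buf[pos..end - 1]).into_owned());
        pos = end;
    }
    if flags & FCOMMENT != 0 {
        pos = skip_zero_terminated(buf, pos)?;
    }
    if flags & FHCRC != 0 {
        pos += 2; // CRC16 of the header, not verified here
    }
    if pos > buf.len() {
        return Err("truncated header".to_string());
    }
    Ok(GzHeader { method, flags, mtime, name, header_len: pos })
}

/// Builds a "method|FLAGS...|name" summary string.
fn header_info(h: &GzHeader) -> String {
    let method = if h.method == 8 { "deflate".to_string() } else { format!("method {}", h.method) };
    let mut parts = vec![method];
    for (bit, label) in [(FTEXT, "TEXT"), (FHCRC, "HCRC"), (FEXTRA, "EXTRA"), (FNAME, "NAME"), (FCOMMENT, "COMMENT")] {
        if h.flags & bit != 0 {
            parts.push(label.to_string());
        }
    }
    if let Some(name) = &h.name {
        parts.push(name.clone());
    }
    parts.join("|")
}
```

The modification time is kept as a raw Unix timestamp; a crate such as chrono could turn it into a readable date.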
JSON Output
JSON output provides detailed information in a machine-readable format:
{
"chunk_number": 1,
"offset": 0,
"compressed_size": 1234,
"uncompressed_size": 3000,
"compression_ratio": 2.43,
"header_info": "deflate|EXTRA|NAME|example.txt"
}
File Summary
Both output formats include a summary showing:
- Total number of chunks
- Total compressed size
- Total uncompressed size
- Average compression ratio
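The JSON example is consistent with defining a chunk's ratio as uncompressed size divided by compressed size (3000 / 1234 ≈ 2.43). The README does not say how the summary's average is computed: it could be the mean of the per-chunk ratios or the ratio of the totals, and the two differ whenever chunks compress differently. The sketch below computes both under that assumption; ChunkStats, Summary, ratio and summarize are hypothetical names.

```rust
/// Hypothetical per-chunk record: sizes in bytes.
struct ChunkStats {
    compressed: u64,
    uncompressed: u64,
}

/// Ratio as uncompressed / compressed, e.g. 3000 / 1234 ≈ 2.43.
fn ratio(compressed: u64, uncompressed: u64) -> f64 {
    if compressed == 0 {
        0.0 // avoid dividing by zero for empty input
    } else {
        uncompressed as f64 / compressed as f64
    }
}

struct Summary {
    chunks: usize,
    total_compressed: u64,
    total_uncompressed: u64,
    /// Ratio of the totals: weights each chunk by its compressed size.
    overall_ratio: f64,
    /// Plain mean of the per-chunk ratios: every chunk counts equally.
    mean_chunk_ratio: f64,
}

fn summarize(chunks: &[ChunkStats]) -> Summary {
    let total_compressed: u64 = chunks.iter().map(|c| c.compressed).sum();
    let total_uncompressed: u64 = chunks.iter().map(|c| c.uncompressed).sum();
    let mean_chunk_ratio = if chunks.is_empty() {
        0.0
    } else {
        chunks.iter().map(|c| ratio(c.compressed, c.uncompressed)).sum::<f64>() / chunks.len() as f64
    };
    Summary {
        chunks: chunks.len(),
        total_compressed,
        total_uncompressed,
        overall_ratio: ratio(total_compressed, total_uncompressed),
        mean_chunk_ratio,
    }
}
```

The ratio of the totals answers "how much smaller is the whole file", while the mean of ratios is more sensitive to small, highly compressible chunks.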
Dependencies
flate2
: GZIP/ZLIB compression library
serde
: Serialization framework
clap
: Command line argument parsing
chrono
: Date and time functionality
crc32fast
: CRC32 checksum calculation
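crc32fast is used for CRC32 checksums. Every GZIP member ends with an 8-byte trailer holding the CRC-32 of the uncompressed data followed by its length modulo 2^32 (ISIZE), both little-endian, so a decompressed chunk can be checked against its own trailer. The std-only sketch below spells out what such a check involves, using a slow bitwise CRC-32 in place of crc32fast (which computes the same checksum much faster). read_trailer and verify_member are hypothetical helpers, and the README does not state that GZInspector performs this exact check.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum stored
/// in every GZIP trailer. Slow but dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Reads the last 8 bytes of one GZIP member: CRC32, then ISIZE.
fn read_trailer(member: &[u8]) -> Option<(u32, u32)> {
    if member.len() < 8 {
        return None;
    }
    let t = &member[member.len() - 8..];
    let crc = u32::from_le_bytes([t[0], t[1], t[2], t[3]]);
    let size = u32::from_le_bytes([t[4], t[5], t[6], t[7]]);
    Some((crc, size))
}

/// Checks already-decompressed data against the member's trailer.
fn verify_member(member: &[u8], decompressed: &[u8]) -> bool {
    match read_trailer(member) {
        // `as u32` truncates, matching ISIZE being stored modulo 2^32.
        Some((crc, size)) => crc == crc32(decompressed) && size == decompressed.len() as u32,
        None => false,
    }
}
```

ISIZE also gives a chunk's uncompressed size without decompressing it, but only reliably for chunks smaller than 4 GiB.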
Building from Source
- Ensure you have Rust installed (1.56.0 or later)
- Clone the repository
- Run
cargo build --release
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Jason Grey (jason@jason-grey.com)
Version History
- 0.1.0: Initial release
  - Basic GZIP file inspection
  - Human-readable and JSON output formats
  - Content preview functionality
- 0.2.0: Chunks release
  - Ability to show the first N and last N chunks of the file
  - Shows a progress bar during the tail scan of large files