hf-mem
A (simple) command-line tool to estimate inference memory requirements for models on the Hugging Face Hub
Usage
cargo install hf-mem
And then:
hf-mem --model-id meta-llama/Llama-3.1-8B-Instruct --token ...
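As a rough sanity check of what to expect: a model with about 8 billion parameters stored in bfloat16 (2 bytes per parameter) needs roughly 8 × 10⁹ × 2 bytes ≈ 16 GB for the weights alone, and hf-mem reports this kind of per-dtype estimate from the model's metadata.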
Features
- Fast and light command-line tool, shipped as a single installable binary
- Fetches just the required bytes of the `safetensors` files on the Hugging Face Hub that contain the metadata
- Provides an estimation based on the parameter counts for the different dtypes, as sketched below
- Supports both sharded files, i.e. `model-00000-of-00000.safetensors`, and non-sharded files, i.e. `model.safetensors`
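The gist of that estimation can be sketched in a few lines of Rust. This is a minimal sketch of the idea, not the crate's actual code: it assumes the `reqwest` crate (with the `blocking` feature) and `serde_json`, and the `estimate` and `bytes_per_dtype` names are made up for illustration. It fetches only the safetensors header via an HTTP Range request, sums parameter counts per dtype from the tensor shapes in that header, and converts the counts to bytes.

```rust
use std::collections::HashMap;

// Rough byte width of the safetensors dtype tags (1 byte for F8/I8/U8/BOOL).
fn bytes_per_dtype(dtype: &str) -> u64 {
    match dtype {
        "F64" | "I64" | "U64" => 8,
        "F32" | "I32" | "U32" => 4,
        "F16" | "BF16" | "I16" | "U16" => 2,
        _ => 1,
    }
}

fn estimate(url: &str, token: &str) -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // The first 8 bytes of a .safetensors file are the little-endian length
    // of the JSON header that follows, so only those bytes are requested.
    let head = client
        .get(url)
        .bearer_auth(token)
        .header("Range", "bytes=0-7")
        .send()?
        .bytes()?;
    let header_len = u64::from_le_bytes(head[..8].try_into()?);

    // Fetch only the JSON header, never the tensor data itself.
    let header = client
        .get(url)
        .bearer_auth(token)
        .header("Range", format!("bytes=8-{}", 7 + header_len))
        .send()?
        .bytes()?;
    let metadata: serde_json::Value = serde_json::from_slice(&header)?;

    // Count parameters per dtype from each tensor's shape, then convert to bytes.
    let mut totals: HashMap<String, u64> = HashMap::new();
    for (name, tensor) in metadata.as_object().unwrap() {
        if name == "__metadata__" {
            continue;
        }
        let dtype = tensor["dtype"].as_str().unwrap().to_string();
        let params: u64 = tensor["shape"]
            .as_array()
            .unwrap()
            .iter()
            .map(|d| d.as_u64().unwrap())
            .product();
        *totals.entry(dtype).or_insert(0) += params;
    }
    for (dtype, params) in &totals {
        let gib = (params * bytes_per_dtype(dtype)) as f64 / (1024.0 * 1024.0 * 1024.0);
        println!("{dtype}: {params} parameters ≈ {gib:.2} GiB");
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical invocation: a Hub `resolve` URL of a .safetensors file and a token.
    let url = std::env::args().nth(1).expect("usage: <safetensors-url> <token>");
    let token = std::env::args().nth(2).unwrap_or_default();
    estimate(&url, &token)
}
```

For a sharded checkpoint, pointing a function like this at the `resolve` URL of each shard and summing the per-dtype totals across shards would give the overall figure.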
What's next?
- Add tracing and progress bars when fetching from the Hub
- Support other file types, e.g. `gguf`
- Read metadata from local files when available, instead of fetching from the Hub every time
- Add more flags to support estimations assuming quantization, extended context lengths, any added memory overhead, etc.
License
This project is licensed under either of the following licenses, at your option:
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.