hf-mem
A (simple) command-line tool to estimate inference memory requirements for models on the Hugging Face Hub
Usage
cargo install hf-mem
And then:
hf-mem --model-id meta-llama/Llama-3.1-8B-Instruct --token ...
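As a rough sanity check of what to expect: a model with about 8 billion parameters stored in bfloat16 (2 bytes per parameter) needs roughly 8 × 10⁹ × 2 bytes ≈ 16 GB for the weights alone, and hf-mem reports this kind of per-dtype estimate from the model's metadata.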
Features
- Fast and light command-line tool, shipped as a single installable binary
- Fetches just the required bytes of the `safetensors` files on the Hugging Face Hub that contain the metadata
- Provides an estimation based on the parameter counts for the different dtypes, as sketched below
- Supports both sharded files, i.e. `model-00000-of-00000.safetensors`, and non-sharded files, i.e. `model.safetensors`
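The gist of that estimation can be sketched in a few lines of Rust. This is a minimal sketch of the idea, not the crate's actual code: it assumes the `reqwest` crate (with the `blocking` feature) and `serde_json`, and the `estimate` and `bytes_per_dtype` names are made up for illustration. It fetches only the safetensors header via an HTTP Range request, sums parameter counts per dtype from the tensor shapes in that header, and converts the counts to bytes.

```rust
use std::collections::HashMap;

// Rough byte width of the safetensors dtype tags (1 byte for F8/I8/U8/BOOL).
fn bytes_per_dtype(dtype: &str) -> u64 {
    match dtype {
        "F64" | "I64" | "U64" => 8,
        "F32" | "I32" | "U32" => 4,
        "F16" | "BF16" | "I16" | "U16" => 2,
        _ => 1,
    }
}

fn estimate(url: &str, token: &str) -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // The first 8 bytes of a .safetensors file are the little-endian length
    // of the JSON header that follows, so only those bytes are requested.
    let head = client
        .get(url)
        .bearer_auth(token)
        .header("Range", "bytes=0-7")
        .send()?
        .bytes()?;
    let header_len = u64::from_le_bytes(head[..8].try_into()?);

    // Fetch only the JSON header, never the tensor data itself.
    let header = client
        .get(url)
        .bearer_auth(token)
        .header("Range", format!("bytes=8-{}", 7 + header_len))
        .send()?
        .bytes()?;
    let metadata: serde_json::Value = serde_json::from_slice(&header)?;

    // Count parameters per dtype from each tensor's shape, then convert to bytes.
    let mut totals: HashMap<String, u64> = HashMap::new();
    for (name, tensor) in metadata.as_object().unwrap() {
        if name == "__metadata__" {
            continue;
        }
        let dtype = tensor["dtype"].as_str().unwrap().to_string();
        let params: u64 = tensor["shape"]
            .as_array()
            .unwrap()
            .iter()
            .map(|d| d.as_u64().unwrap())
            .product();
        *totals.entry(dtype).or_insert(0) += params;
    }
    for (dtype, params) in &totals {
        let gib = (params * bytes_per_dtype(dtype)) as f64 / (1024.0 * 1024.0 * 1024.0);
        println!("{dtype}: {params} parameters ≈ {gib:.2} GiB");
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical invocation: a Hub `resolve` URL of a .safetensors file and a token.
    let url = std::env::args().nth(1).expect("usage: <safetensors-url> <token>");
    let token = std::env::args().nth(2).unwrap_or_default();
    estimate(&url, &token)
}
```

For a sharded checkpoint, pointing a function like this at the `resolve` URL of each shard and summing the per-dtype totals across shards would give the overall figure.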
What's next?
- Add tracing and progress bars when fetching from the Hub
- Support other file types, e.g. `gguf`
- Read metadata from local files when available, instead of fetching from the Hub every time
- Add more flags to support estimations assuming quantization, extended context lengths, any added memory overhead, etc.
License
This project is licensed under either of the following licenses, at your option:
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.