
nightly llm_models

Load and Download LLM Models, Metadata, and Tokenizers

1 unstable release

0.0.1 Oct 10, 2024


MIT license

7MB
4.5K SLoC

llm_models: Load and Download LLM Models, Metadata, and Tokenizers

This crate is part of the llm_client crate.

  • GGUFs from local storage or Hugging Face
    • Parses model metadata from the GGUF file
    • Includes limited support for loading the tokenizer from the GGUF file
    • Also supports loading metadata and the tokenizer from their respective files
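
As a rough illustration of where GGUF metadata parsing begins, the sketch below reads the fixed-size fields at the start of a GGUF file: the `GGUF` magic bytes, a version, a tensor count, and a metadata key/value count, all little-endian. It assumes the layout used by GGUF version 2 and later (64-bit counts) and uses only the standard library. `GgufHeader` and `parse_gguf_header` are hypothetical names, not part of llm_models, and the crate's own parser may work differently.

```rust
/// Fixed-size fields at the start of a GGUF file (assumed v2+ layout).
/// Hypothetical type for illustration; not the llm_models API.
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Reads a little-endian u32 starting at byte offset `at`.
fn read_u32_le(bytes: &[u8], at: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[at..at + 4]);
    u32::from_le_bytes(buf)
}

/// Reads a little-endian u64 starting at byte offset `at`.
fn read_u64_le(bytes: &[u8], at: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(buf)
}

/// Parses the first 24 bytes of a GGUF file.
/// Returns None if the buffer is too short or the magic bytes are wrong.
pub fn parse_gguf_header(bytes: &[u8]) -> Option<GgufHeader> {
    if bytes.len() < 24 || !bytes.starts_with(b"GGUF") {
        return None;
    }
    Some(GgufHeader {
        version: read_u32_le(bytes, 4),
        tensor_count: read_u64_le(bytes, 8),
        metadata_kv_count: read_u64_le(bytes, 16),
    })
}
```

The metadata key/value pairs themselves follow this header; decoding them requires handling each value type and is left out of the sketch.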

LocalLlmModel

Everything you need for GGUF models. The GgufLoader wraps the individual loaders for convenience. All loaders return a LocalLlmModel, which contains the tokenizer, metadata, chat template, and anything else that can be extracted from the GGUF.

GgufPresetLoader

  • Presets for popular models like Llama 3, Phi, Mistral/Mixtral, and more
  • Picks the largest quantization that fits in your available VRAM and loads it

let model: LocalLlmModel = GgufLoader::default()
    .llama3_1_8b_instruct()
    .preset_with_available_vram_gb(48) // Load the largest quant that will fit in 48 GB of VRAM
    .load()?;
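
The selection step can be pictured with a minimal sketch, assuming each candidate quant is described only by its file size and that a fixed amount of memory is held back for context and runtime overhead. `QuantFile`, `largest_quant_that_fits`, and the `reserved_bytes` parameter are hypothetical and do not come from llm_models; the crate's real calculation may account for more factors.

```rust
/// A quantized model file and its size. Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct QuantFile {
    pub name: String,
    pub size_bytes: u64,
}

/// Returns the largest quant whose file size fits in the VRAM budget.
/// The budget is the available VRAM (treated here as GiB, an assumption)
/// minus `reserved_bytes` held back for context and runtime overhead.
pub fn largest_quant_that_fits(
    quants: &[QuantFile],
    available_vram_gb: u64,
    reserved_bytes: u64,
) -> Option<&QuantFile> {
    let total_bytes = available_vram_gb.saturating_mul(1024 * 1024 * 1024);
    let budget = total_bytes.saturating_sub(reserved_bytes);
    quants
        .iter()
        .filter(|q| q.size_bytes <= budget)
        .max_by_key(|q| q.size_bytes)
}
```

Returning `None` when nothing fits lets the caller decide whether to fall back to a smaller model or report an error.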

GgufHfLoader

GGUF models from Hugging Face.

let model: LocalLlmModel = GgufLoader::default()
    .hf_quant_file_url("https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
    .load()?;
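
To show what a URL like the one above encodes, here is a small standard-library sketch that splits a Hugging Face file URL into repository id, revision, and file name, and rebuilds the direct-download form (Hugging Face serves raw files under `/resolve/` rather than `/blob/`). `HfFileRef` and `parse_hf_file_url` are hypothetical names; how llm_models handles the URL internally is not stated in this README.

```rust
/// Parts of a Hugging Face file URL. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct HfFileRef {
    pub repo_id: String,
    pub revision: String,
    pub file_name: String,
}

/// Parses `https://huggingface.co/{owner}/{repo}/(blob|resolve)/{revision}/{file}`.
/// Simplification: revisions containing '/' and query strings are not handled.
pub fn parse_hf_file_url(url: &str) -> Option<HfFileRef> {
    let path = url.strip_prefix("https://huggingface.co/")?;
    let parts: Vec<&str> = path.split('/').collect();
    if parts.len() < 5 || parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    if parts[2] != "blob" && parts[2] != "resolve" {
        return None;
    }
    Some(HfFileRef {
        repo_id: format!("{}/{}", parts[0], parts[1]),
        revision: parts[3].to_string(),
        file_name: parts[4..].join("/"),
    })
}

impl HfFileRef {
    /// Builds the direct-download URL for this file.
    pub fn download_url(&self) -> String {
        format!(
            "https://huggingface.co/{}/resolve/{}/{}",
            self.repo_id, self.revision, self.file_name
        )
    }
}
```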

GgufLocalLoader

GGUF models from local storage.

let model: LocalLlmModel = GgufLoader::default()
    .local_quant_file_path("/root/.cache/huggingface/hub/models--bartowski--Meta-Llama-3.1-8B-Instruct-GGUF/blobs/9da71c45c90a821809821244d4971e5e5dfad7eb091f0b8ff0546392393b6283")
    .load()?;

ApiLlmModel

  • Supports OpenAI, Anthropic, Perplexity, and adding your own API models
  • Supports prompting, tokenization, and price estimation

    // Illustrative only: the `tokenizer` line shows the field's type, not a value,
    // so this snippet will not compile as written.
    assert_eq!(ApiLlmModel::gpt_4_o(), ApiLlmModel {
        model_id: "gpt-4o".to_string(),
        context_length: 128000,
        cost_per_m_in_tokens: 5.00,
        max_tokens_output: 4096,
        cost_per_m_out_tokens: 15.00,
        tokens_per_message: 3,
        tokens_per_name: 1,
        tokenizer: Arc<LlmTokenizer>,
    });
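
The two `cost_per_m_*` fields suggest how a price estimate can be derived from token counts. The sketch below is one plausible reading, assuming the fields are prices per one million tokens. `Pricing` and `estimate_cost` are hypothetical stand-ins, not the crate's types or methods.

```rust
/// Per-million-token prices, mirroring the two cost fields shown above.
/// Hypothetical stand-in; not the llm_models ApiLlmModel type.
pub struct Pricing {
    pub cost_per_m_in_tokens: f64,
    pub cost_per_m_out_tokens: f64,
}

/// Estimates the cost of one request from its input and output token counts.
/// The result is in whatever currency the price fields use.
pub fn estimate_cost(pricing: &Pricing, in_tokens: u64, out_tokens: u64) -> f64 {
    let in_cost = (in_tokens as f64 / 1_000_000.0) * pricing.cost_per_m_in_tokens;
    let out_cost = (out_tokens as f64 / 1_000_000.0) * pricing.cost_per_m_out_tokens;
    in_cost + out_cost
}
```

With the gpt-4o figures above, a request with 2,000 input tokens and 500 output tokens would come to about 0.0175 under this formula.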

LlmTokenizer

    let tok = LlmTokenizer::new_tiktoken("gpt-4o"); // Get a Tiktoken tokenizer
    let tok = LlmTokenizer::new_from_tokenizer_json("path/to/tokenizer.json"); // From local path
    let tok = LlmTokenizer::new_from_hf_repo(hf_token, "meta-llama/Meta-Llama-3-8B-Instruct"); // From repo
    // From LocalLlmModel or ApiLlmModel
    let tok = model.model_base.tokenizer;

Setter Traits

  • All setter traits are public, so you can integrate them into your own projects if you wish.
  • Examples include OpenAiModelTrait, GgufLoaderTrait, AnthropicModelTrait, and HfTokenTrait for loading models
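
One common way such setter traits are integrated is for your own builder type to implement a single accessor, after which the trait's provided methods become available on it. The sketch below shows that general pattern only. `ModelChoice`, `ModelSetterTrait`, `with_model_id`, and `MyBuilder` are all hypothetical, and the actual method names and signatures of the llm_models traits are not shown in this README.

```rust
/// State that the setter methods write into. Hypothetical type.
#[derive(Debug, Default, PartialEq)]
pub struct ModelChoice {
    pub model_id: Option<String>,
}

/// A setter trait: implementors supply one accessor,
/// and the chainable setter comes for free as a provided method.
pub trait ModelSetterTrait: Sized {
    /// Gives the trait mutable access to the implementor's model state.
    fn model_choice(&mut self) -> &mut ModelChoice;

    /// Sets the model id and returns the builder for chaining.
    fn with_model_id(mut self, id: &str) -> Self {
        self.model_choice().model_id = Some(id.to_string());
        self
    }
}

/// A builder from your own project that opts in to the setters.
#[derive(Debug, Default)]
pub struct MyBuilder {
    pub choice: ModelChoice,
    pub retries: u32,
}

impl ModelSetterTrait for MyBuilder {
    fn model_choice(&mut self) -> &mut ModelChoice {
        &mut self.choice
    }
}
```

A call such as `MyBuilder::default().with_model_id("gpt-4o")` then works on the custom builder without it reimplementing the setter.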
