#tokenizer #hugging-face #gguf #model #llm #metadata #openai

llm_models

llm_models: Load and download LLM models, metadata, and tokenizers

2 releases

0.0.2 Jan 29, 2025
0.0.1 Oct 10, 2024

#541 in Filesystem

Download history: weekly downloads from 2024-10-29 through 2025-02-11, peaking at 145/week @ 2025-01-28

196 downloads per month
Used in 3 crates

MIT license

8MB
4.5K SLoC

llm_models: Load and download LLM models, metadata, and tokenizers

API Documentation

The llm_models crate is a workspace member of the llm_client project.

Features

  • GGUFs from local storage or Hugging Face
    • Parses model metadata from the GGUF file
    • Includes limited support for loading the tokenizer from the GGUF file
    • Also supports loading the metadata and tokenizer from their respective files
  • API models from OpenAI, Anthropic, and Perplexity
  • Tokenizer abstraction for Hugging Face's Tokenizer and Tiktoken

LocalLlmModel

Everything you need for GGUF models. The GgufLoader struct wraps the individual loaders for convenience. All loaders return a LocalLlmModel, which contains the tokenizer, metadata, chat template, and anything else that can be extracted from the GGUF file; a usage sketch follows the loader examples below.

GgufPresetLoader

  • Presets for popular models like Llama 3, Phi, Mistral/Mixtral, and more
  • Loads the best quantized model by calculating the largest quant that will fit in your VRAM

let model: LocalLlmModel = GgufLoader::default()
    .llama3_1_8b_instruct()
    .preset_with_available_vram_gb(48) // Load the largest quant that will fit in your VRAM
    .load()?;

GgufHfLoader

GGUF models from Hugging Face.

let model: LocalLlmModel = GgufLoader::default()
    .hf_quant_file_url("https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
    .load()?;

GgufLocalLoader

GGUF models from local storage.

let model: LocalLlmModel = GgufLoader::default()
    .local_quant_file_path("/root/.cache/huggingface/hub/models--bartowski--Meta-Llama-3.1-8B-Instruct-GGUF/blobs/9da71c45c90a821809821244d4971e5e5dfad7eb091f0b8ff0546392393b6283")
    .load()?;
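
Whichever loader you use, the returned LocalLlmModel is consumed the same way. Here is a minimal sketch of using the loaded model's tokenizer; model.model_base.tokenizer appears in the LlmTokenizer section below, but the tokenize method name is an assumption, not a confirmed API.

// Hedged sketch: `tokenize` is an assumed LlmTokenizer method returning token ids;
// verify the exact name against the crate's API documentation.
let model: LocalLlmModel = GgufLoader::default()
    .local_quant_file_path("/root/.cache/huggingface/hub/models--bartowski--Meta-Llama-3.1-8B-Instruct-GGUF/blobs/9da71c45c90a821809821244d4971e5e5dfad7eb091f0b8ff0546392393b6283")
    .load()?;

let ids = model.model_base.tokenizer.tokenize("Hello, world!");
println!("prompt is {} tokens", ids.len());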

ApiLlmModel

  • Supports OpenAI, Anthropic, Perplexity, and adding your own API models
  • Supports prompting, tokenization, and price estimation

let model = ApiLlmModel::gpt_4_o();
assert_eq!(model.model_id, "gpt-4o".to_string());
assert_eq!(model.context_length, 128_000);
assert_eq!(model.cost_per_m_in_tokens, 5.00);
assert_eq!(model.max_tokens_output, 4096);
assert_eq!(model.cost_per_m_out_tokens, 15.00);
assert_eq!(model.tokens_per_message, 3);
assert_eq!(model.tokens_per_name, 1);
// plus model.tokenizer: Arc<LlmTokenizer>
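
Because the per-model pricing lives directly on the struct, a rough request-cost estimate falls out of the fields shown above; the token counts here are made-up inputs for illustration, not values produced by the crate.

// Back-of-the-envelope cost estimate using the per-million-token fields above.
let model = ApiLlmModel::gpt_4_o();
let input_tokens = 1_200.0_f64; // illustrative count
let output_tokens = 300.0_f64;  // illustrative count
let usd = input_tokens / 1_000_000.0 * model.cost_per_m_in_tokens
    + output_tokens / 1_000_000.0 * model.cost_per_m_out_tokens;
println!("estimated cost: ${usd:.4}"); // ~$0.0105 at the rates above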

LlmTokenizer

  • A simple, unified API for encoding and decoding lets you consume LLMs across multiple architectures without tokenizer-specific code
  • Uses Hugging Face's Tokenizer library for local models and tiktoken-rs for OpenAI and Anthropic (Anthropic doesn't have a publicly available tokenizer)

// Get a Tiktoken tokenizer
let tok = LlmTokenizer::new_tiktoken("gpt-4o");

// From local path
let tok = LlmTokenizer::new_from_tokenizer_json("path/to/tokenizer.json");

// From repo
let tok = LlmTokenizer::new_from_hf_repo(hf_token, "meta-llama/Meta-Llama-3-8B-Instruct");

// From LocalLlmModel or ApiLlmModel
let tok = model.model_base.tokenizer;
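
A hedged round-trip sketch follows; tokenize, detokenize, and count_tokens are assumed method names and signatures, not confirmed by this README.

// Assumed LlmTokenizer methods: tokenize -> Vec<u32>, count_tokens -> u32,
// detokenize -> String. Check the API docs before relying on these names.
let tok = LlmTokenizer::new_tiktoken("gpt-4o");
let text = "The quick brown fox";
let ids = tok.tokenize(text);
assert_eq!(tok.count_tokens(text), ids.len() as u32);
assert_eq!(tok.detokenize(&ids), text);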

Setter Traits

  • All setter traits are public, so you can integrate them into your own projects
  • Examples include OpenAiModelTrait, GgufLoaderTrait, AnthropicModelTrait, and HfTokenTrait for configuring and loading models
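
As a sketch of how a setter trait composes with the loaders, the snippet below supplies a Hugging Face token before downloading from a gated repo; the hf_token setter is an assumption for illustration, as only the trait names above come from the crate.

// Hypothetical: setter traits add builder-style methods to the loaders.
// `hf_token(...)` is an assumed setter from HfTokenTrait; the token is a placeholder.
let model: LocalLlmModel = GgufLoader::default()
    .hf_token("hf_xxx") // placeholder token
    .hf_quant_file_url("https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
    .load()?;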

Dependencies

~29–41MB
~518K SLoC