ezllama
An opinionated, simple Rust interface for local LLMs, powered by llama-cpp-2.
Features
- Simple API: Designed for ease of use with a clean, intuitive interface
- Text and Chat Completion: Support for both text and chat completion tasks
- Infinite token generation: Automatically manages the context cache so generation can continue indefinitely
- Tracing Integration: Built-in logging via the tracing ecosystem (see the sketch after this list)
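To see ezllama's log output, install a tracing subscriber before loading a model. A minimal sketch, assuming the tracing and tracing-subscriber crates are added to your Cargo.toml:
use tracing::Level;

fn main() {
    // Install a global subscriber that prints events at INFO and above
    // (written to stdout by default); ezllama's internal logging will
    // then be visible alongside your own tracing events.
    tracing_subscriber::fmt()
        .with_max_level(Level::INFO)
        .init();
    // ... create the Model and sessions as usual ...
}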
Right now it only supports the basics, but I might add more features in the future as I need them.
You can try out the chatbot example from this repo:
cargo run --example chatbot --features <backend> -- model.gguf
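For example, with the CUDA backend enabled (the backend features are listed under Installation below):
cargo run --example chatbot --features cuda -- model.gguf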
Installation
Add ezllama to your Cargo.toml:
[dependencies]
ezllama = "*"
For GPU acceleration, enable the appropriate feature (fair warning: CUDA takes long enough to compile that you can go run some errands):
[dependencies]
ezllama = { version = "*", features = ["cuda"] } # For CUDA support
# or
ezllama = { version = "*", features = ["metal"] } # For Metal support (macOS)
# or
ezllama = { version = "*", features = ["vulkan"] } # For Vulkan support
Quick Start
Note: Make sure you grab a GGUF model from Hugging Face or elsewhere.
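For example, using the huggingface-cli tool (the repo and file names here are only illustrations; any GGUF model works):
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct-GGUF qwen2.5-0.5b-instruct-q4_k_m.gguf --local-dir .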
use ezllama::{ContextParams, Model, ModelParams, Result};
use std::io::Write;
use std::path::PathBuf;

fn main() -> Result<()> {
    // Initialize the model
    let model_params = ModelParams {
        model_path: PathBuf::from("path/to/your/model.gguf"),
        ..Default::default()
    };
    let context_params = ContextParams {
        ctx_size: Some(2048),
        ..Default::default()
    };
    let model = Model::new(&model_params)?;
    let mut chat_session = model.create_chat_session(&context_params)?;

    // First turn: stream up to 128 tokens as they are generated
    chat_session.add_user_message("Hello, can you introduce yourself?");
    let response1 = chat_session.generate()?.take(128);
    print!("Assistant: ");
    for token in response1 {
        print!("{}", token);
        std::io::stdout().flush()?;
    }

    // Second turn (prompt() adds the message and generates in one step)
    let response2 = chat_session.prompt("What can you help me with?")?.join();
    println!("Assistant: {}", response2);

    Ok(())
}
Advanced Usage
Text Completion
// Create a text session for text completion
let mut text_session = model.create_text_session(&context_params)?;
let output = text_session.prompt("Once upon a time")?.join();
// Continue generating from the existing context
let more_output = text_session.prompt(" and then")?.join();
System Messages
// Create a chat session with a system message
let mut chat_session = model.create_chat_session_with_system(
    "You are a helpful assistant that specializes in Rust programming.",
    &context_params,
)?;

// Or add a system message to an existing session
let mut chat_session = model.create_chat_session(&context_params)?;
chat_session.add_system_message("You are a helpful assistant.");

// One-shot completion with system message
let response = model.chat_completion_with_system(
    "You are a concise assistant.",
    "Explain quantum computing.",
    &context_params,
)?.join();
Custom Chat Templates
// Create a chat session with a custom template
let template = "{{0_role}}: {{0_content}}\n{{1_role}}: {{1_content}}";
let mut chat_session = model.create_chat_session_with_template(template.to_string(), &context_params)?;
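Once created, the session is used the same way as one with the default template, e.g. with the prompt/join pattern shown in the Quick Start:
let reply = chat_session.prompt("Hello!")?.join();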
License
Licensed under MIT.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed as above, without any additional terms or conditions.