5 releases

Uses new Rust 2024

new 0.1.10 Mar 27, 2025
0.1.9 Mar 27, 2025
0.1.8 Mar 27, 2025
0.1.6 Mar 27, 2025
0.1.5 Mar 27, 2025

#2386 in Procedural macros

39 downloads per month
Used in rstructor

MIT license

68KB
1K SLoC

RStructor: Structured LLM Outputs for Rust

Rust 2024 License: MIT Tests Status Clippy Status

RStructor is a Rust library for extracting structured data from Large Language Models (LLMs) with built-in validation. Define your schemas as Rust structs/enums, and RStructor will handle the rest—generating JSON Schemas, communicating with LLMs, parsing responses, and validating the results.

Think of it as the Rust equivalent of Instructor + Pydantic for Python, bringing the same structured output capabilities to the Rust ecosystem.

✨ Features

  • 📝 Type-Safe Definitions: Define data models as standard Rust structs/enums with attributes
  • 🔄 JSON Schema Generation: Auto-generates JSON Schema from your Rust types
  • ✅ Built-in Validation: Type checking plus custom business rule validation
  • 🔌 Multiple LLM Providers: Support for OpenAI and Anthropic, with an extensible backend system
  • 🧩 Complex Data Structures: Support for nested objects, arrays, and optional fields
  • 🔍 Custom Validation Rules: Add domain-specific validation for reliable results
  • 🔁 Async API: Fully asynchronous API for efficient operations
  • ⚙️ Builder Pattern: Fluent API for configuring LLM clients
  • 📊 Feature Flags: Optional backends via feature flags

📦 Installation

Add RStructor to your Cargo.toml:

[dependencies]
rstructor = "0.1.0"
serde = { version = "1.0", features = ["derive"] }
tokio = { version = "1.0", features = ["rt-multi-thread", "macros"] }

🚀 Quick Start

Here's a simple example of extracting structured information about a movie from an LLM:

use rstructor::{LLMModel, LLMClient, OpenAIClient, OpenAIModel};
use serde::{Serialize, Deserialize};
use std::env;

// Define your data model
#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct Movie {
    #[llm(description = "Title of the movie")]
    title: String,
    
    #[llm(description = "Director of the movie")]
    director: String,
    
    #[llm(description = "Year the movie was released", example = 2010)]
    year: u16,
    
    #[llm(description = "Brief plot summary")]
    plot: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get API key from environment
    let api_key = env::var("OPENAI_API_KEY")?;
    
    // Create an OpenAI client
    let client = OpenAIClient::new(api_key)?
        .model(OpenAIModel::Gpt35Turbo)
        .temperature(0.0)
        .build();
    
    // Generate structured information with a simple prompt
    let movie: Movie = client.generate_struct("Tell me about the movie Inception").await?;
    
    // Use the structured data
    println!("Title: {}", movie.title);
    println!("Director: {}", movie.director);
    println!("Year: {}", movie.year);
    println!("Plot: {}", movie.plot);
    
    Ok(())
}

📝 Detailed Examples

Basic Example with Validation

Add custom validation rules to enforce business logic beyond type checking:

use rstructor::{LLMModel, LLMClient, OpenAIClient, OpenAIModel, RStructorError, Result};
use serde::{Serialize, Deserialize};

#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "Information about a movie")]
struct Movie {
    #[llm(description = "Title of the movie")]
    title: String,
    
    #[llm(description = "Year the movie was released", example = 2010)]
    year: u16,
    
    #[llm(description = "IMDB rating out of 10", example = 8.5)]
    rating: f32,
}

// Add custom validation
impl Movie {
    fn validate(&self) -> Result<()> {
        // Title can't be empty
        if self.title.trim().is_empty() {
            return Err(RStructorError::ValidationError(
                "Movie title cannot be empty".to_string()
            ));
        }
        
        // Year must be in a reasonable range
        if self.year < 1888 || self.year > 2030 {
            return Err(RStructorError::ValidationError(
                format!("Movie year must be between 1888 and 2030, got {}", self.year)
            ));
        }
        
        // Rating must be between 0 and 10
        if self.rating < 0.0 || self.rating > 10.0 {
            return Err(RStructorError::ValidationError(
                format!("Rating must be between 0 and 10, got {}", self.rating)
            ));
        }
        
        Ok(())
    }
}

Complex Nested Structures

RStructor supports complex nested data structures:

use rstructor::{LLMModel, LLMClient, OpenAIClient, OpenAIModel};
use serde::{Serialize, Deserialize};

// Define a nested data model for a recipe
#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct Ingredient {
    #[llm(description = "Name of the ingredient", example = "flour")]
    name: String,
    
    #[llm(description = "Amount of the ingredient", example = 2.5)]
    amount: f32,
    
    #[llm(description = "Unit of measurement", example = "cups")]
    unit: String,
}

#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct Step {
    #[llm(description = "Order number of this step", example = 1)]
    number: u16,
    
    #[llm(description = "Description of this step", 
          example = "Mix the flour and sugar together")]
    description: String,
}

#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "A cooking recipe with ingredients and instructions")]
struct Recipe {
    #[llm(description = "Name of the recipe", example = "Chocolate Chip Cookies")]
    name: String,
    
    #[llm(description = "List of ingredients needed")]
    ingredients: Vec<Ingredient>,
    
    #[llm(description = "Step-by-step cooking instructions")]
    steps: Vec<Step>,
}

// Usage:
// let recipe: Recipe = client.generate_struct("Give me a recipe for chocolate chip cookies").await?;

Working with Enums

RStructor supports both simple enums and enums with associated data.

Simple Enums

Use enums for categorical data:

use rstructor::{LLMModel, LLMClient, AnthropicClient, AnthropicModel};
use serde::{Serialize, Deserialize};

// Define an enum for sentiment analysis
#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "The sentiment of a text")]
enum Sentiment {
    #[llm(description = "Positive or favorable sentiment")]
    Positive,
    
    #[llm(description = "Negative or unfavorable sentiment")]
    Negative,
    
    #[llm(description = "Neither clearly positive nor negative")]
    Neutral,
}

#[derive(LLMModel, Serialize, Deserialize, Debug)]
struct SentimentAnalysis {
    #[llm(description = "The text to analyze")]
    text: String,
    
    #[llm(description = "The detected sentiment of the text")]
    sentiment: Sentiment,
    
    #[llm(description = "Confidence score between 0.0 and 1.0", 
          example = 0.85)]
    confidence: f32,
}

// Usage:
// let analysis: SentimentAnalysis = client.generate_struct("Analyze the sentiment of: I love this product!").await?;

Enums with Associated Data (Tagged Unions)

RStructor also supports more complex enums with associated data:

use rstructor::{LLMModel, SchemaType};
use serde::{Deserialize, Serialize};

// Enum with different types of associated data
#[derive(LLMModel, Serialize, Deserialize, Debug)]
enum UserStatus {
    #[llm(description = "The user is online")]
    Online,

    #[llm(description = "The user is offline")]
    Offline,

    #[llm(description = "The user is away with an optional message")]
    Away(String),

    #[llm(description = "The user is busy until a specific time in minutes")]
    Busy(u32),
}

// Using struct variants for more complex associated data
#[derive(LLMModel, Serialize, Deserialize, Debug)]
enum PaymentMethod {
    #[llm(description = "Payment with credit card")]
    Card {
        #[llm(description = "Credit card number")]
        number: String,
        
        #[llm(description = "Expiration date in MM/YY format")]
        expiry: String,
    },

    #[llm(description = "Payment via PayPal account")]
    PayPal(String),

    #[llm(description = "Payment will be made on delivery")]
    CashOnDelivery,
}

// Usage:
// let user_status: UserStatus = client.generate_struct("What's the user's status?").await?;

When serialized to JSON, these enum variants with data become tagged unions:

// UserStatus::Away("Back in 10 minutes")
{
  "Away": "Back in 10 minutes"
}

// PaymentMethod::Card { number: "4111...", expiry: "12/25" }
{
  "Card": {
    "number": "4111 1111 1111 1111",
    "expiry": "12/25"
  }
}

Configuring Different LLM Providers

Choose between different providers:

// Using OpenAI
let openai_client = OpenAIClient::new(openai_api_key)?
    .model(OpenAIModel::Gpt4)
    .temperature(0.2)
    .max_tokens(1500)
    .build();

// Using Anthropic
let anthropic_client = AnthropicClient::new(anthropic_api_key)?
    .model(AnthropicModel::Claude3Sonnet)
    .temperature(0.0)
    .max_tokens(2000)
    .build();

Handling Container-Level Attributes

Add metadata and examples at the container level:

#[derive(LLMModel, Serialize, Deserialize, Debug)]
#[llm(description = "Detailed information about a movie",
      title = "MovieDetails",
      examples = [
        ::serde_json::json!({
            "title": "The Matrix",
            "director": "Lana and Lilly Wachowski",
            "year": 1999,
            "genres": ["Sci-Fi", "Action"],
            "rating": 8.7,
            "plot": "A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers."
        })
      ])]
struct Movie {
    // fields...
}

📚 API Reference

LLMModel Trait

The LLMModel trait is the core of RStructor. It's implemented automatically via the derive macro and provides schema generation and validation:

pub trait LLMModel: SchemaType + DeserializeOwned + Serialize {
    fn validate(&self) -> Result<()> {
        Ok(())
    }
}

Override the validate method to add custom validation logic.

LLMClient Trait

The LLMClient trait defines the interface for all LLM providers:

#[async_trait]
pub trait LLMClient {
    async fn generate_struct<T>(&self, prompt: &str) -> Result<T>
    where
        T: LLMModel + DeserializeOwned + Send + 'static;

    async fn generate(&self, prompt: &str) -> Result<String>;
}

Supported Attributes

Field Attributes

  • description: Text description of the field
  • example: A single example value
  • examples: Multiple example values

Container Attributes

  • description: Text description of the struct or enum
  • title: Custom title for the JSON Schema
  • examples: Example instances as JSON objects

🔧 Feature Flags

Configure RStructor with feature flags:

[dependencies]
rstructor = { version = "0.1.0", features = ["openai", "anthropic"] }

Available features:

  • openai: Include the OpenAI client
  • anthropic: Include the Anthropic client
  • derive: Include the derive macro (enabled by default)

📋 Examples

See the examples/ directory for complete, working examples:

  • structured_movie_info.rs: Basic example of getting movie information with validation
  • nested_objects_example.rs: Working with complex nested structures for recipe data
  • news_article_categorizer.rs: Using enums for categorization
  • enum_with_data_example.rs: Working with enums that have associated data (tagged unions)
  • event_planner.rs: Interactive event planning with user input
  • weather_example.rs: Simple model with validation demonstration

▶️ Running the Examples

# Set environment variables
export OPENAI_API_KEY=your_openai_key_here
# or
export ANTHROPIC_API_KEY=your_anthropic_key_here

# Run examples
cargo run --example structured_movie_info
cargo run --example news_article_categorizer

🛣️ Roadmap

  • Core traits and interfaces
  • OpenAI backend implementation
  • Anthropic backend implementation
  • Procedural macro for deriving LLMModel
  • Schema generation functionality
  • Custom validation capabilities
  • Support for nested structures
  • Rich validation API with custom domain rules
  • Support for enums with associated data (tagged unions)
  • Streaming responses
  • Support for additional LLM providers
  • Integration with web frameworks (Axum, Actix)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


lib.rs:

Procedural macros for the rstructor library.

This crate provides the derive macro for implementing LLMModel and SchemaType traits from the rstructor library. It automatically generates JSON Schema representations of Rust types.

Dependencies

~0.6–1.5MB
~32K SLoC