23 releases (10 breaking)

0.13.0 Nov 15, 2024
0.11.0 Oct 9, 2024
0.8.0 Jul 26, 2024
0.4.14 Mar 25, 2024
0.0.1 Mar 18, 2023

#20 in Database implementations

1,139 downloads per month
Used in 4 crates (3 directly)

Apache-2.0

465KB
9K SLoC

LanceDB Rust

LanceDB Rust SDK, a serverless vector database.

Read more at: https://lancedb.com/

[!TIP] A transitive dependency of lancedb is lzma-sys, which uses dynamic linking by default. To link lzma-sys statically, activate its static feature by adding the following to your dependencies:

lzma-sys = { version = "*", features = ["static"] }

lib.rs:

LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.

The key features of LanceDB include:

  • Production-scale vector search with no servers to manage.
  • Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
  • Support for vector similarity search, full-text search and SQL.
  • Native Rust, Python, JavaScript/TypeScript support.
  • Zero-copy, automatic versioning: manage versions of your data without extra infrastructure.
  • GPU support in building vector indices[^note].
  • Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB, with more on the way.

[^note]: Only in Python SDK.

Getting Started

LanceDB runs in process. To use it in your Rust project, add it as a dependency:

cargo add lancedb

Crate Features

Experimental Features

These features are not enabled by default. They are experimental or in-development features that are not yet ready to be released.

  • remote - Enables the remote client for connecting to LanceDB Cloud. This is not yet fully implemented and should not be enabled.

Quick Start

Connect to a database.

let db = lancedb::connect("data/sample-lancedb").execute().await.unwrap();

LanceDB accepts several forms of database path:

  • /path/to/database - local database on file system.
  • s3://bucket/path/to/database or gs://bucket/path/to/database - database on cloud object store
  • db://dbname - LanceDB Cloud
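
The three path forms above can be told apart by their URI scheme. The following is a minimal sketch of that distinction only; `DbLocation` and `classify_uri` are hypothetical names invented for illustration and are not part of the lancedb API, which performs its own path handling inside `connect`.

```rust
/// Hypothetical classification of the path forms listed above.
/// This is an illustration, not how lancedb parses paths internally.
#[derive(Debug, PartialEq)]
enum DbLocation {
    /// A plain path such as /path/to/database.
    LocalFs,
    /// An s3:// or gs:// URI pointing at a cloud object store.
    ObjectStore,
    /// A db:// URI naming a hosted database.
    Cloud,
}

/// Decide which kind of location a database path refers to,
/// based purely on its scheme prefix.
fn classify_uri(uri: &str) -> DbLocation {
    if uri.starts_with("db://") {
        DbLocation::Cloud
    } else if uri.starts_with("s3://") || uri.starts_with("gs://") {
        DbLocation::ObjectStore
    } else {
        // Anything without a recognized scheme is treated as a file system path.
        DbLocation::LocalFs
    }
}
```

For instance, `classify_uri("data/sample-lancedb")` falls through to `DbLocation::LocalFs`, matching the local path used in the connection example.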

You can also use ConnectOptions to configure the connection to the database.

use object_store::aws::AwsCredential;
let db = lancedb::connect("data/sample-lancedb")
    .aws_creds(AwsCredential {
        key_id: "some_key".to_string(),
        secret_key: "some_secret".to_string(),
        token: None,
    })
    .execute()
    .await
    .unwrap();

LanceDB uses arrow-rs to define schemas, data types, and arrays. It treats FixedSizeList<Float16/Float32> columns as vector columns.
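
To make the fixed-size list idea concrete, here is a rough sketch using only the standard library. In Arrow's fixed-size list layout, every row has the same number of items and the items of all rows sit back to back in one child buffer, so row i starts at offset i * dim. `VectorColumn` is a hypothetical stand-in written for illustration; it is not an arrow-rs or lancedb type, and it ignores null handling.

```rust
/// Hypothetical stand-in for a FixedSizeList<Float32> column.
/// `values` holds every row back to back; `dim` is the fixed list size.
struct VectorColumn {
    dim: usize,
    values: Vec<f32>,
}

impl VectorColumn {
    /// Build a column from rows, rejecting any row whose length differs from `dim`.
    fn from_rows(dim: usize, rows: &[Vec<f32>]) -> Result<Self, String> {
        let mut values = Vec::with_capacity(dim * rows.len());
        for (i, row) in rows.iter().enumerate() {
            if row.len() != dim {
                return Err(format!("row {i} has length {}, expected {dim}", row.len()));
            }
            values.extend_from_slice(row);
        }
        Ok(Self { dim, values })
    }

    /// Number of rows stored in the column.
    fn len(&self) -> usize {
        if self.dim == 0 { 0 } else { self.values.len() / self.dim }
    }

    /// Borrow row `i` as a slice of the flat buffer; panics if `i` is out of range.
    fn row(&self, i: usize) -> &[f32] {
        &self.values[i * self.dim..(i + 1) * self.dim]
    }
}
```

Because the list size is fixed, no per-row offsets are needed, which is why a mismatched row length is an error rather than something the column can represent.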

For more details, please refer to LanceDB documentation.

Create a table

To create a Table, you need to provide an arrow_schema::Schema and an arrow_array::RecordBatch stream.

use std::sync::Arc;

use arrow_array::types::Float32Type;
use arrow_array::{FixedSizeListArray, Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};

let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new(
        "vector",
        DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 128),
        true,
    ),
]));
// Create a RecordBatch stream.
let batches = RecordBatchIterator::new(
    vec![RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int32Array::from_iter_values(0..256)),
            Arc::new(
                FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
                    (0..256).map(|_| Some(vec![Some(1.0); 128])),
                    128,
                ),
            ),
        ],
    )
    .unwrap()]
    .into_iter()
    .map(Ok),
    schema.clone(),
);
// Keep the returned table handle for indexing and querying.
let table = db.create_table("my_table", Box::new(batches))
    .execute()
    .await
    .unwrap();

Create vector index (IVF_PQ)

LanceDB can automatically create an appropriate index based on the data type of the column. For example,

  • If a column has a data type of FixedSizeList<Float16/Float32>, LanceDB will create an IVF-PQ vector index with default parameters.
  • Otherwise, it creates a BTree index by default.

use lancedb::index::Index;
table.create_index(&["vector"], Index::Auto)
   .execute()
   .await
   .unwrap();
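
The selection rule described above can be sketched as a single match on the column type. Everything here is hypothetical and simplified for illustration: `ColumnType` is not arrow-rs's `DataType`, `IndexKind` is not lancedb's `Index`, and `auto_index_kind` only restates the two bullets above rather than reproducing the library's actual logic.

```rust
/// Hypothetical, simplified column types; not arrow-rs's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    Int32,
    Utf8,
    Float16,
    Float32,
    /// Item type and fixed list size.
    FixedSizeList(Box<ColumnType>, usize),
}

/// Hypothetical index kinds named after the two cases in the text.
#[derive(Debug, PartialEq)]
enum IndexKind {
    IvfPq,
    BTree,
}

/// Pick an index kind from the column type: fixed-size lists of
/// Float16/Float32 get a vector index, everything else gets a BTree.
fn auto_index_kind(ty: &ColumnType) -> IndexKind {
    match ty {
        ColumnType::FixedSizeList(item, _)
            if matches!(**item, ColumnType::Float16 | ColumnType::Float32) =>
        {
            IndexKind::IvfPq
        }
        _ => IndexKind::BTree,
    }
}
```

Under this sketch, the "vector" column from the table example (a fixed-size list of 128 Float32 items) maps to `IndexKind::IvfPq`, while the "id" column maps to `IndexKind::BTree`.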

Users can also specify the index type explicitly; see Table::create_index.

To search the table for the vectors nearest to a query vector:

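use futures::TryStreamExt; // brings try_collect() into scope
use lancedb::query::ExecutableQuery; // brings execute() into scope

let results = table
    .query()
    .nearest_to(&[1.0; 128])
    .unwrap()
    .execute()
    .await
    .unwrap()
    .try_collect::<Vec<_>>()
    .await
    .unwrap();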
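
As a mental model for what a nearest-neighbor query returns, the sketch below does an exact, brute-force search with the standard library only. It assumes Euclidean (L2) distance and scans every row, whereas an IVF-PQ index answers the same question approximately and without a full scan. `l2_squared` and `nearest_rows` are hypothetical helpers for illustration, not lancedb functions.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Return the indices of the `k` rows closest to `query`, nearest first.
/// Exact brute-force scan: every row is compared against the query.
fn nearest_rows(rows: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| (i, l2_squared(row, query)))
        .collect();
    // Stable sort by distance, so ties keep their original row order.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Squared distance is used because it preserves the ordering of true L2 distance while avoiding a square root per row.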

Dependencies

~98MB
~1.5M SLoC