42 releases (4 breaking)

0.4.10 Feb 13, 2024
0.4.7 Jan 31, 2024
0.4.2 Dec 30, 2023
0.3.8 Nov 19, 2023
0.1.19 Jul 27, 2023

#68 in Database implementations


228 downloads per month
Used in heapswap

Apache-2.0

150KB
3K SLoC

LanceDB Rust


LanceDB Rust SDK, a serverless vector database.

Read more at: https://lancedb.com/


lib.rs:

VectorDB (LanceDB) -- Developer-friendly, serverless vector database for AI applications

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.

The key features of LanceDB include:

  • Production-scale vector search with no servers to manage.
  • Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
  • Support for vector similarity search, full-text search and SQL.
  • Native Rust, Python, Javascript/Typescript support.
  • Zero-copy, automatic versioning: manage versions of your data without extra infrastructure.
  • GPU support in building vector indices[^note].
  • Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.

[^note]: Only in Python SDK.

Getting Started

LanceDB runs in process. To use it in your Rust project, add the crate as a dependency by running the following in your project directory:

cargo add vectordb

Quick Start

The Rust API is not stable yet; expect breaking changes.

Connect to a database.

use vectordb::connect;
let db = connect("data/sample-lancedb").await.unwrap();

LanceDB accepts several forms of database path:

  • /path/to/database - local database on file system.
  • s3://bucket/path/to/database or gs://bucket/path/to/database - database on cloud object store
  • db://dbname - Lance Cloud
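The three path forms above differ only in their URI scheme. As a rough illustration of that distinction, the sketch below sorts a path into one of the three categories. The names `PathKind` and `classify_path` are hypothetical and are not part of the vectordb crate; the crate's own path handling may work differently.

```rust
/// Where a database path points. Illustrative categories only,
/// not types from the vectordb crate.
#[derive(Debug, PartialEq)]
enum PathKind {
    LocalFileSystem,
    ObjectStore,
    Cloud,
}

/// Hypothetical helper: classify a database path by its URI scheme.
fn classify_path(path: &str) -> PathKind {
    match path.split_once("://") {
        // s3:// and gs:// are the object-store schemes listed above.
        Some(("s3", _)) | Some(("gs", _)) => PathKind::ObjectStore,
        // db:// is the cloud scheme listed above.
        Some(("db", _)) => PathKind::Cloud,
        // No scheme, or one this sketch does not know: treat as a local path.
        _ => PathKind::LocalFileSystem,
    }
}
```

Under these assumptions, `classify_path("data/sample-lancedb")` falls through to the local file system case, while `classify_path("s3://bucket/path/to/database")` is treated as an object store.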

You can also use ConnectOptions to configure the connection to the database.

use vectordb::{connect_with_options, ConnectOptions};
let options = ConnectOptions::new("data/sample-lancedb")
    .index_cache_size(1024);
let db = connect_with_options(&options).await.unwrap();

LanceDB uses arrow-rs to define schemas, data types and the arrays themselves. It treats FixedSizeList<Float16/Float32> columns as vector columns.

For more details, refer to the LanceDB documentation.
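To give a feel for why a fixed-size list suits vector columns, the sketch below stores every vector in one flat buffer, so row i occupies the elements from i * dim up to (i + 1) * dim. It uses only the standard library; `VectorColumn` is a hypothetical type written for illustration and is not an arrow-rs or vectordb type.

```rust
/// A column of fixed-size vectors stored as one flat buffer,
/// mirroring the layout idea behind a fixed-size list column.
/// Hypothetical type for illustration; not part of arrow-rs or vectordb.
struct VectorColumn {
    dim: usize,
    values: Vec<f32>,
}

impl VectorColumn {
    /// Builds a column from rows, rejecting a zero dimension and
    /// any row whose length is not `dim`.
    fn from_rows(dim: usize, rows: &[Vec<f32>]) -> Result<Self, String> {
        if dim == 0 {
            return Err("dimension must be greater than zero".to_string());
        }
        let mut values = Vec::with_capacity(dim * rows.len());
        for (i, row) in rows.iter().enumerate() {
            if row.len() != dim {
                return Err(format!("row {i} has length {}, expected {dim}", row.len()));
            }
            values.extend_from_slice(row);
        }
        Ok(Self { dim, values })
    }

    /// Number of vectors in the column.
    fn len(&self) -> usize {
        self.values.len() / self.dim
    }

    /// Borrows row `i` as a slice of the flat buffer, without copying.
    fn row(&self, i: usize) -> Option<&[f32]> {
        let start = i.checked_mul(self.dim)?;
        self.values.get(start..start + self.dim)
    }
}
```

Because every row has the same length, a row lookup is plain index arithmetic with no per-row offsets to consult.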

Create a table

To create a Table, you need to provide an arrow_schema::Schema and an arrow_array::RecordBatch stream.

use std::sync::Arc;

use arrow_array::types::Float32Type;
use arrow_array::{FixedSizeListArray, Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};

// Schema with an integer id and a 128-dimensional Float32 vector column.
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new("vector", DataType::FixedSizeList(
        Arc::new(Field::new("item", DataType::Float32, true)), 128), true),
]));
// Create a RecordBatch stream holding 1000 rows.
let batches = RecordBatchIterator::new(vec![
    RecordBatch::try_new(schema.clone(),
        vec![
            Arc::new(Int32Array::from_iter_values(0..1000)),
            Arc::new(FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
                (0..1000).map(|_| Some(vec![Some(1.0); 128])), 128)),
        ]).unwrap()
    ].into_iter().map(Ok),
    schema.clone());
// Keep the returned table handle for indexing and search.
let tbl = db.create_table("my_table", Box::new(batches), None).await.unwrap();

Create vector index (IVF_PQ)

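// TryStreamExt provides try_collect on the result stream.
use futures::TryStreamExt;

// Build an IVF_PQ index on the vector column.
tbl.create_index(&["vector"])
    .ivf_pq()
    .num_partitions(256)
    .build()
    .await
    .unwrap();
// Search the table for the nearest neighbors of a query vector.
let results = tbl
    .search(&[1.0; 128])
    .execute_stream()
    .await
    .unwrap()
    .try_collect::<Vec<_>>()
    .await
    .unwrap();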
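IVF_PQ combines an inverted file (IVF) partitioning with product quantization (PQ). The sketch below illustrates only the IVF half, to show what num_partitions controls conceptually: each vector is assigned to its nearest centroid, and a query scans just the partition whose centroid is closest. It is a toy under stated assumptions, not LanceDB's implementation: `IvfIndex`, `nearest_centroid` and `l2_squared` are hypothetical names, the centroids are passed in rather than trained, only one partition is probed, and PQ compression is omitted.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `v`, or None if there are no centroids.
fn nearest_centroid(centroids: &[Vec<f32>], v: &[f32]) -> Option<usize> {
    centroids
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| l2_squared(a, v).total_cmp(&l2_squared(b, v)))
        .map(|(i, _)| i)
}

/// Toy inverted-file index: one list of row ids per centroid.
struct IvfIndex {
    centroids: Vec<Vec<f32>>,
    partitions: Vec<Vec<usize>>,
}

impl IvfIndex {
    /// Assigns every row to the partition of its nearest centroid.
    fn build(centroids: Vec<Vec<f32>>, rows: &[Vec<f32>]) -> Self {
        let mut partitions = vec![Vec::new(); centroids.len()];
        for (id, row) in rows.iter().enumerate() {
            if let Some(p) = nearest_centroid(&centroids, row) {
                partitions[p].push(id);
            }
        }
        Self { centroids, partitions }
    }

    /// Returns the id of the closest row inside the single partition
    /// nearest to `query`. Rows in other partitions are never examined,
    /// which is why this kind of search is approximate.
    fn search(&self, rows: &[Vec<f32>], query: &[f32]) -> Option<usize> {
        let p = nearest_centroid(&self.centroids, query)?;
        self.partitions[p]
            .iter()
            .copied()
            .min_by(|&a, &b| l2_squared(&rows[a], query).total_cmp(&l2_squared(&rows[b], query)))
    }
}
```

In this toy, more partitions mean fewer rows scanned per query but a higher chance that the true nearest neighbor sits in a partition that is not probed.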


Dependencies

~86MB
~1.5M SLoC