26 releases (12 breaking)
new 0.15.0 | Jan 14, 2025 |
---|---|
0.14.1 | Dec 24, 2024 |
0.13.0 | Nov 15, 2024 |
0.8.0 | Jul 26, 2024 |
0.0.1 | Mar 18, 2023 |
#17 in Database implementations
4,053 downloads per month
Used in 6 crates
540KB
11K
SLoC
LanceDB Rust
LanceDB Rust SDK, a serverless vector database.
Read more at: https://lancedb.com/
[!TIP] A transitive dependency of lancedb is lzma-sys, which uses dynamic linking by default. If you want to statically link lzma-sys, you should activate its static feature by adding the following to your dependencies:
lzma-sys = { version = "*", features = ["static"] }
lib.rs:
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
The key features of LanceDB include:
- Production-scale vector search with no servers to manage.
- Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
- Support for vector similarity search, full-text search and SQL.
- Native Rust, Python, JavaScript/TypeScript support.
- Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
- GPU support in building vector indices[^note].
- Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.
[^note]: Only in Python SDK.
Getting Started
LanceDB runs in process. To use it in your Rust project, add it as a dependency by running the following in your project directory:
cargo add lancedb
Crate Features
Experimental Features
These features are not enabled by default. They are experimental or in-development features that are not yet ready to be released.
remote - Enable the remote client to connect to LanceDB Cloud. This is not yet fully implemented and should not be enabled.
Quick Start
Connect to a database.
let db = lancedb::connect("data/sample-lancedb").execute().await.unwrap();
LanceDB accepts several forms of database path:
- /path/to/database - local database on the file system.
- s3://bucket/path/to/database or gs://bucket/path/to/database - database on a cloud object store.
- db://dbname - LanceDB Cloud.
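The three path forms above differ only in their URI scheme. As a rough illustration of that distinction, here is a minimal sketch using only the standard library. The names DbLocation and classify_uri are hypothetical and are not part of the lancedb crate, and this is not how LanceDB itself parses paths; schemes other than the ones listed above are simply reported as unrecognized.

```rust
/// Where a database path points. Illustrative only; not a lancedb type.
#[derive(Debug, PartialEq, Eq)]
pub enum DbLocation {
    /// Plain file-system path, e.g. `/path/to/database`.
    Local,
    /// Cloud object store, e.g. `s3://...` or `gs://...`.
    ObjectStore,
    /// LanceDB Cloud, e.g. `db://dbname`.
    Cloud,
    /// A scheme this sketch does not know about.
    Unrecognized,
}

/// Hypothetical helper: classify a database path by its URI scheme.
pub fn classify_uri(uri: &str) -> DbLocation {
    match uri.split_once("://") {
        // No scheme at all: treat the string as a file-system path.
        None => DbLocation::Local,
        Some(("s3" | "gs", _)) => DbLocation::ObjectStore,
        Some(("db", _)) => DbLocation::Cloud,
        Some(_) => DbLocation::Unrecognized,
    }
}
```

With this sketch, `classify_uri("data/sample-lancedb")` yields `DbLocation::Local` and `classify_uri("db://dbname")` yields `DbLocation::Cloud`.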
You can also use ConnectOptions to configure the connection to the database.
use object_store::aws::AwsCredential;
let db = lancedb::connect("data/sample-lancedb")
    .aws_creds(AwsCredential {
        key_id: "some_key".to_string(),
        secret_key: "some_secret".to_string(),
        token: None,
    })
    .execute()
    .await
    .unwrap();
LanceDB uses arrow-rs to define schemas, data types and arrays.
It treats FixedSizeList<Float16/Float32> columns as vector columns.
For more details, please refer to the LanceDB documentation.
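To make the idea of a fixed-size-list vector column concrete: every row holds the same number of floats, so the whole column can live in one flat buffer and row i starts at offset i * dim. The sketch below shows that layout with the standard library only. VectorColumn is a hypothetical stand-in, not an arrow-rs or lancedb type, and it ignores null handling.

```rust
/// A column of fixed-size vectors stored as one flat buffer.
/// Illustrative stand-in for the idea behind FixedSizeList<Float32>;
/// not an arrow-rs type, and nulls are not modelled.
pub struct VectorColumn {
    dim: usize,
    values: Vec<f32>,
}

impl VectorColumn {
    /// Build a column from rows. Returns `None` if `dim` is zero or
    /// any row does not have exactly `dim` elements.
    pub fn from_rows(dim: usize, rows: &[Vec<f32>]) -> Option<Self> {
        if dim == 0 || rows.iter().any(|r| r.len() != dim) {
            return None;
        }
        // Concatenate all rows into a single contiguous buffer.
        Some(Self { dim, values: rows.concat() })
    }

    /// Number of vectors in the column.
    pub fn len(&self) -> usize {
        self.values.len() / self.dim
    }

    /// Borrow row `i` as a slice, or `None` if `i` is out of range.
    pub fn row(&self, i: usize) -> Option<&[f32]> {
        let start = i.checked_mul(self.dim)?;
        let end = start.checked_add(self.dim)?;
        self.values.get(start..end)
    }
}
```

Because the row length is fixed, looking up a vector is a constant-time slice of the buffer with no per-row offsets to store.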
Create a table
To create a Table, you need to provide an arrow_schema::Schema and an arrow_array::RecordBatch stream.
use std::sync::Arc;
use arrow_array::types::Float32Type;
use arrow_array::{FixedSizeListArray, Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};
// An "id" column plus a 128-dimensional Float32 vector column.
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new(
        "vector",
        DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 128),
        true,
    ),
]));
// Create a RecordBatch stream.
let batches = RecordBatchIterator::new(
    vec![RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int32Array::from_iter_values(0..256)),
            Arc::new(
                FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
                    (0..256).map(|_| Some(vec![Some(1.0); 128])),
                    128,
                ),
            ),
        ],
    )
    .unwrap()]
    .into_iter()
    .map(Ok),
    schema.clone(),
);
db.create_table("my_table", Box::new(batches))
    .execute()
    .await
    .unwrap();
Create vector index (IVF_PQ)
LanceDB can automatically create appropriate indices based on the data types of the columns. For example,
- If a column has a data type of FixedSizeList<Float16/Float32>, LanceDB will create an IVF-PQ vector index with default parameters.
- Otherwise, it creates a BTree index by default.
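use lancedb::index::Index;
// `tbl` is a lancedb Table, e.g. one returned by
// create_table(...).execute() or open_table(...).execute().
tbl.create_index(&["vector"], Index::Auto)
    .execute()
    .await
    .unwrap();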
Users can also specify the index type explicitly; see Table::create_index.
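The automatic choice described in this section boils down to a single rule on the column's data type. The sketch below restates that rule as plain Rust. ColumnKind, IndexKind and auto_index_kind are hypothetical names made up for illustration; they are not lancedb types, and the real selection logic inside the crate may consider more cases.

```rust
/// Simplified view of a column's data type. Hypothetical; not a lancedb type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnKind {
    FixedSizeListFloat16,
    FixedSizeListFloat32,
    /// Any other data type (integers, strings, ...).
    Other,
}

/// The index families named in this section. Hypothetical; not a lancedb type.
#[derive(Debug, PartialEq, Eq)]
pub enum IndexKind {
    IvfPq,
    BTree,
}

/// Restates the documented rule: vector columns get IVF-PQ,
/// everything else gets a BTree index.
pub fn auto_index_kind(column: ColumnKind) -> IndexKind {
    match column {
        ColumnKind::FixedSizeListFloat16 | ColumnKind::FixedSizeListFloat32 => IndexKind::IvfPq,
        ColumnKind::Other => IndexKind::BTree,
    }
}
```

For instance, `auto_index_kind(ColumnKind::FixedSizeListFloat32)` returns `IndexKind::IvfPq`, while `auto_index_kind(ColumnKind::Other)` returns `IndexKind::BTree`.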
Open table and search
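use futures::TryStreamExt;
use lancedb::query::ExecutableQuery;
// Open the table created earlier, then run a nearest-neighbor query.
let table = db.open_table("my_table").execute().await.unwrap();
let results = table
    .query()
    .nearest_to(&[1.0; 128])
    .unwrap()
    .execute()
    .await
    .unwrap()
    .try_collect::<Vec<_>>()
    .await
    .unwrap();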
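Conceptually, a nearest-neighbor query ranks rows by their distance to the query vector and returns the closest ones. The sketch below does this by brute force with the standard library, assuming Euclidean (L2) distance. The function names are hypothetical and this is not LanceDB's implementation: a vector index such as IVF-PQ exists precisely to avoid scanning every row like this, at the cost of approximate results.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical helper: indices of the `k` rows closest to `query`,
/// nearest first. Rows whose length differs from the query are skipped.
pub fn nearest_rows(rows: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .filter(|(_, row)| row.len() == query.len())
        .map(|(i, row)| (i, l2_squared(row, query)))
        .collect();
    // Stable sort by distance, so ties keep their original row order.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Squared distance is used because it preserves the ranking while avoiding a square root per row.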
Dependencies
~80–120MB
~2M SLoC