36 releases
Version | Date |
---|---|
0.7.11 (new) | Jan 15, 2025 |
0.7.10 | Dec 18, 2024 |
0.7.9 | Nov 25, 2024 |
0.7.2 | Jul 16, 2024 |
0.2.6 | |
Rust ORM for ScyllaDB and Apache Cassandra
Charybdis is an ORM layer on top of the ScyllaDB Rust Driver, focused on ease of use and performance.
Usage considerations:
- Provide an expressive API for CRUD and complex query operations on the model as a whole
- Provide an easy way to work with a subset of model fields by using the automatically generated partial_<model>! macro
- Provide an easy way to run complex queries by using the automatically generated find_<model>! macro
- The automatic migration tool analyzes the project files and runs migrations according to differences between the model definitions and the database
Performance considerations:
- It uses prepared statements (shard/token aware) with bound values
- It expects CachingSession as the session argument for operations
- Queries are macro-generated str constants (no concatenation at runtime)
- By using the find_<model>! macro we can run complex queries that are generated at compile time as &'static str
- Although it has an expressive API, it is a thin layer on top of scylla_rust_driver and does not introduce any significant overhead
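To illustrate what "generated at compile time as &'static str" can mean in practice, here is a minimal sketch using only the standard library. The macro below is a hypothetical stand-in, not the macro that Charybdis generates; it only shows that a fixed SELECT prefix and a literal WHERE clause can be joined with concat! so that no string work happens at runtime.

```rust
/// Hypothetical stand-in for a generated `find_post_query!` macro.
/// `concat!` joins string literals during compilation.
macro_rules! find_post_query {
    ($clause:literal) => {
        concat!("SELECT date, category_id, title FROM posts WHERE ", $clause)
    };
}

/// The expansion is valid in a `const`, so the full query text is fixed
/// before the program starts.
const FIND_RECENT: &str = find_post_query!("category_id IN ? AND date > ?");

/// Returns the precomputed query; nothing is allocated or concatenated here.
fn find_recent_query() -> &'static str {
    FIND_RECENT
}
```

Because the query text is a constant, the same string is reused for every call, which fits well with a session that caches prepared statements by query text.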
Table of Contents
- Charybdis Models
- Automatic migration with charybdis-migrate
- Basic Operations
- Configuration Options
- Batch Operations
- Partial Model
- Callbacks
- Collection
- Ignored fields
- Roadmap
Charybdis Models
Define Tables
use charybdis::macros::charybdis_model;
use charybdis::types::{Text, Timestamp, Uuid};
#[charybdis_model(
table_name = users,
partition_keys = [id],
clustering_keys = [],
global_secondary_indexes = [username],
local_secondary_indexes = [],
static_columns = []
)]
pub struct User {
pub id: Uuid,
pub username: Text,
pub email: Text,
pub created_at: Timestamp,
pub updated_at: Timestamp,
pub address: Address,
}
Define UDT
use charybdis::macros::charybdis_udt_model;
use charybdis::types::Text;
#[charybdis_udt_model(type_name = address)]
pub struct Address {
pub street: Text,
pub city: Text,
pub state: Option<Text>,
pub zip: Text,
pub country: Text,
}
🚨 UDT fields must be in the same order as they are in the database.
Note that in order for the migration tool to correctly detect changes on each migration, type_name has to match the struct name. So if we have struct ReorderData, we have to use #[charybdis_udt_model(type_name = reorderdata)], without underscores.
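The naming rule above can be sketched as a small helper. This is only an illustration of the convention (lowercase, no underscores); the function name is hypothetical and not part of Charybdis.

```rust
/// Derives the `type_name` that the convention above expects for a UDT struct:
/// the struct name lowercased, with any underscores removed.
fn expected_type_name(struct_name: &str) -> String {
    struct_name.replace('_', "").to_lowercase()
}
```

For example, a struct named ReorderData maps to reorderdata, which is the value used in the attribute above.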
Define Materialized Views
use charybdis::macros::charybdis_view_model;
use charybdis::types::{Text, Timestamp, Uuid};
#[charybdis_view_model(
table_name=users_by_username,
base_table=users,
partition_keys=[username],
clustering_keys=[id]
)]
pub struct UsersByUsername {
pub username: Text,
pub id: Uuid,
pub email: Text,
pub created_at: Timestamp,
pub updated_at: Timestamp,
}
The resulting auto-generated migration query will be:
CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_username
AS SELECT created_at, updated_at, username, email, id
FROM users
WHERE username IS NOT NULL AND id IS NOT NULL
PRIMARY KEY (username, id)
Automatic migration
charybdis-migrate enables automatic migration to the database without the need to write migrations by hand. It iterates over project files and generates migrations based on differences between model definitions and the database. It supports the following operations:
- Create new tables
- Create new columns
- Drop columns
- Change field types (drops and recreates the column; requires the --drop-and-replace flag)
- Create secondary indexes
- Drop secondary indexes
- Create UDTs
- Create materialized views
- Table options
#[charybdis_model(
    table_name = commits,
    partition_keys = [object_id],
    clustering_keys = [created_at, id],
    global_secondary_indexes = [],
    local_secondary_indexes = [],
    table_options = r#"
        CLUSTERING ORDER BY (created_at DESC)
        AND gc_grace_seconds = 86400
    "#
)]
#[derive(Serialize, Deserialize, Default)]
pub struct Commit {...}
⚠️ If the table exists, table options will result in an ALTER TABLE query without the CLUSTERING ORDER and COMPACT STORAGE options.
Model dropping is not supported. If you remove a model, you need to drop the table manually.
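As a rough mental model of "migrations based on differences", the following sketch compares the columns of a model with the columns found in the database and plans ALTER TABLE statements. It is an assumption-laden illustration using only the standard library: plan_migration and its parameters are hypothetical, and the real tool also handles tables, indexes, UDTs and views.

```rust
use std::collections::BTreeMap;

/// Plans column-level changes for one table.
/// `model` and `db` map column name to CQL type (hypothetical inputs).
fn plan_migration(
    table: &str,
    model: &BTreeMap<&str, &str>,
    db: &BTreeMap<&str, &str>,
    drop_and_replace: bool,
) -> Vec<String> {
    let mut statements = Vec::new();
    for (column, cql_type) in model {
        match db.get(column) {
            // Column exists only in the model: create it.
            None => statements.push(format!("ALTER TABLE {table} ADD {column} {cql_type}")),
            // Type changed: only acted on when drop-and-replace is enabled.
            Some(db_type) if db_type != cql_type && drop_and_replace => {
                statements.push(format!("ALTER TABLE {table} DROP {column}"));
                statements.push(format!("ALTER TABLE {table} ADD {column} {cql_type}"));
            }
            _ => {}
        }
    }
    // Columns that exist only in the database are dropped.
    for column in db.keys() {
        if !model.contains_key(column) {
            statements.push(format!("ALTER TABLE {table} DROP {column}"));
        }
    }
    statements
}
```

Note that dropping and recreating a column discards its data, which is presumably why type changes sit behind an explicit flag.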
Running migration
cargo install charybdis-migrate
migrate --hosts <host> --keyspace <your_keyspace> --drop-and-replace (optional)
⚠️ Always run migrations from desired directories ('src' or 'test'), to avoid scanning 'target' or other large directories.
⚠️ If you are working with existing datasets, before running a migration you need to make sure that your model definitions match the database with respect to table names, column names, column types, partition keys, clustering keys and secondary indexes, so you don't alter the structure accidentally. If the structure matches, it will not run any migrations. As mentioned above, if there is no model definition for a table, it will not drop it. In the future, we will add a modelize command that will generate src/models files from an existing data source.
Programmatically running migrations
Within a testing or development environment, we can trigger migrations programmatically:
use charybdis::migrate::MigrationBuilder;

let migration = MigrationBuilder::new()
    .keyspace("test")
    .drop_and_replace(true)
    .build(&session)
    .await;

migration.run().await;
Global secondary indexes
If we have the model:
#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = [],
    global_secondary_indexes = [username]
)]
the resulting query will be:
CREATE INDEX ON users (username);
Local secondary indexes
Indexes that are scoped to the partition key:
#[charybdis_model(
    table_name = menus,
    partition_keys = [location],
    clustering_keys = [name, price, dish_type],
    global_secondary_indexes = [],
    local_secondary_indexes = [dish_type]
)]
the resulting query will be:
CREATE INDEX ON menus((location), dish_type);
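The difference between the two index statements can be sketched with two small formatting helpers. These functions are hypothetical and only reproduce the statement shapes shown above; they are not taken from the migration tool.

```rust
/// Global secondary index: indexes the column across all partitions.
fn global_index_cql(table: &str, column: &str) -> String {
    format!("CREATE INDEX ON {table} ({column});")
}

/// Local secondary index: the partition key is wrapped in its own parentheses,
/// which scopes the index to a single partition.
fn local_index_cql(table: &str, partition_keys: &[&str], column: &str) -> String {
    format!("CREATE INDEX ON {table}(({}), {column});", partition_keys.join(", "))
}
```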
Basic Operations
For each operation you need to bring the respective trait into scope. They are defined in the charybdis::operations module.
Insert
use charybdis::{CachingSession, Insert};

#[tokio::main]
async fn main() {
    let session: &CachingSession; // init scylla session

    // init user
    let user: User = User {
        id,
        email: "charybdis@nodecosmos.com".to_string(),
        username: "charybdis".to_string(),
        created_at: Utc::now(),
        updated_at: Utc::now(),
        // `address` is a plain `Address` in the model; only `state` is optional
        address: Address {
            street: "street".to_string(),
            city: "city".to_string(),
            state: Some("state".to_string()),
            zip: "zip".to_string(),
            country: "country".to_string(),
        },
    };

    // create
    user.insert().execute(&session).await;
}
Find
Find by primary key
let user = User {id, ..Default::default()};
let user = user.find_by_primary_key().execute(&session).await?;
Find by partition key
let users = User {id, ..Default::default()}.find_by_partition_key().execute(&session).await;
Find by primary key associated
let user = User::find_by_primary_key_value(val: User::PrimaryKey).execute(&session).await;
Available find functions
use scylla::CachingSession;
use charybdis::errors::CharybdisError;
use charybdis::macros::charybdis_model;
use charybdis::stream::CharybdisModelStream;
use charybdis::types::{Date, Text, Uuid};

#[charybdis_model(
    table_name = posts,
    partition_keys = [date],
    clustering_keys = [category_id, title],
    global_secondary_indexes = [category_id],
    local_secondary_indexes = [title]
)]
pub struct Post {
    pub date: Date,
    pub category_id: Uuid,
    pub title: Text,
}

impl Post {
    async fn find_various(db_session: &CachingSession) -> Result<(), CharybdisError> {
        let date = Date::default();
        let category_id = Uuid::new_v4();
        let title = Text::default();

        // find by (a prefix of) the primary key
        let posts: CharybdisModelStream<Post> = Post::find_by_date(date).execute(db_session).await?;
        let posts: CharybdisModelStream<Post> = Post::find_by_date_and_category_id(date, category_id).execute(db_session).await?;
        let post: Post = Post::find_by_date_and_category_id_and_title(date, category_id, title.clone()).execute(db_session).await?;

        let post: Post = Post::find_first_by_date(date).execute(db_session).await?;
        let post: Post = Post::find_first_by_date_and_category_id(date, category_id).execute(db_session).await?;

        let post: Option<Post> = Post::maybe_find_first_by_date(date).execute(db_session).await?;
        let post: Option<Post> = Post::maybe_find_first_by_date_and_category_id(date, category_id).execute(db_session).await?;
        let post: Option<Post> = Post::maybe_find_first_by_date_and_category_id_and_title(date, category_id, title.clone()).execute(db_session).await?;

        // find by local secondary index
        let posts: CharybdisModelStream<Post> = Post::find_by_date_and_title(date, title.clone()).execute(db_session).await?;
        let post: Post = Post::find_first_by_date_and_title(date, title.clone()).execute(db_session).await?;
        let post: Option<Post> = Post::maybe_find_first_by_date_and_title(date, title.clone()).execute(db_session).await?;

        // find by global secondary index
        let posts: CharybdisModelStream<Post> = Post::find_by_category_id(category_id).execute(db_session).await?;
        let post: Post = Post::find_first_by_category_id(category_id).execute(db_session).await?;
        let post: Option<Post> = Post::maybe_find_first_by_category_id(category_id).execute(db_session).await?;

        Ok(())
    }
}
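The names of the primary-key helpers listed above follow a regular pattern: a prefix, then the key fields joined by _and_. The sketch below reconstructs that pattern for illustration only; helper_names is a hypothetical function, and the index-based helpers in the listing are not covered by it.

```rust
/// Builds one helper name for every leading prefix of the primary key,
/// e.g. `find_by_date`, `find_by_date_and_category_id`, ...
fn helper_names(prefix: &str, primary_key: &[&str]) -> Vec<String> {
    (1..=primary_key.len())
        .map(|n| format!("{prefix}_{}", primary_key[..n].join("_and_")))
        .collect()
}
```

Calling it with the prefix find_first_by or maybe_find_first_by yields the names of the other two helper families in the same way.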
Custom filtering
Let's use our Post model as an example:
#[charybdis_model(
    table_name = posts,
    partition_keys = [category_id],
    clustering_keys = [date, title],
    global_secondary_indexes = []
)]
pub struct Post {...}
We get an automatically generated find_post! macro that follows the convention find_<struct_name>!. It can be used to create custom queries.
The following will return a stream of Post models, and the query will be constructed at compile time as &'static str.
// automatically generated macro rule
let posts = find_post!("category_id in ? AND date > ?", (category_vec, date))
    .execute(session)
    .await?;
We can also use the find_first_post! macro to get a single result (bind values are passed in the same order as the placeholders):
let post = find_first_post!("category_id in ? AND date > ? LIMIT 1", (category_vec, date))
    .execute(session)
    .await?;
If we just need the Query and not the result, we can use the find_post_query! macro:
let query = find_post_query!("date = ? AND category_id in ?", (date, category_vec));
Update
let mut user = User::from_json(json);
user.username = "scylla".to_string();
user.email = "some@email.com".to_string();
user.update().execute(&session).await;
Collection:
Let's use our User model as an example:
#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = [],
)]
pub struct User {
    id: Uuid,
    tags: Set<Text>,
    post_ids: List<Uuid>,
}
push_<field_name> and pull_<field_name> methods are generated for each collection field.
let user: User;
user.push_tags(vec![tag]).execute(&session).await;
user.pull_tags(vec![tag]).execute(&session).await;
user.push_post_ids(vec![post_id]).execute(&session).await;
user.pull_post_ids(vec![post_id]).execute(&session).await;
Counter
Let's define a post_counter model:
#[charybdis_model(
    table_name = post_counters,
    partition_keys = [id],
    clustering_keys = [],
)]
pub struct PostCounter {
    id: Uuid,
    likes: Counter,
    comments: Counter,
}
We can use the increment_<field_name> and decrement_<field_name> methods to update counter fields.
let post_counter: PostCounter;
post_counter.increment_likes(1).execute(&session).await;
post_counter.decrement_likes(1).execute(&session).await;
post_counter.increment_comments(1).execute(&session).await;
post_counter.decrement_comments(1).execute(&session).await;
Delete
let user = User::from_json(json);
user.delete().execute(&session).await;
Macro generated delete helpers
Let's use our Post model as an example:
#[charybdis_model(
    table_name = posts,
    partition_keys = [date],
    clustering_keys = [category_id, title],
    global_secondary_indexes = []
)]
pub struct Post {
    date: Date,
    category_id: Uuid,
    title: Text,
    id: Uuid,
    ...
}
We have macro generated functions for up to 3 fields from the primary key.
Post::delete_by_date(date: Date).execute(&session).await?;
Post::delete_by_date_and_category_id(date: Date, category_id: Uuid).execute(&session).await?;
Post::delete_by_date_and_category_id_and_title(date: Date, category_id: Uuid, title: Text).execute(&session).await?;
Custom delete queries
We can use the delete_post! macro to create custom delete queries.
delete_post!("date = ? AND category_id in ?", (date, category_vec)).execute(&session).await?
Configuration
Every operation returns a CharybdisQuery that can be configured before execution with method chaining.
let user: User = User::find_by_id(id)
.consistency(Consistency::One)
.timeout(Some(Duration::from_secs(5)))
.execute(&app.session)
.await?;
let result: QueryResult = user.update().consistency(Consistency::One).execute(&session).await?;
Supported configuration options:
- consistency
- serial_consistency
- timestamp
- timeout
- page_size
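The method-chaining style used for configuration is a consuming builder: each setter takes the value by ownership, changes one field and returns it. The sketch below shows the pattern with standard-library types only. QueryConfig, its fields and the Quorum default are assumptions made for illustration; they do not describe the internals of CharybdisQuery.

```rust
use std::time::Duration;

/// Hypothetical consistency levels, mirroring the names used above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Consistency {
    One,
    Quorum,
    All,
}

/// Hypothetical per-query settings collected before execution.
#[derive(Debug, Clone, PartialEq)]
struct QueryConfig {
    consistency: Consistency,
    timeout: Option<Duration>,
    page_size: Option<i32>,
}

#[allow(dead_code)]
impl QueryConfig {
    /// Starts from defaults (the default level here is an arbitrary choice).
    fn new() -> Self {
        QueryConfig { consistency: Consistency::Quorum, timeout: None, page_size: None }
    }

    /// Each setter consumes `self` and returns it, which enables chaining.
    fn consistency(mut self, consistency: Consistency) -> Self {
        self.consistency = consistency;
        self
    }

    fn timeout(mut self, timeout: Option<Duration>) -> Self {
        self.timeout = timeout;
        self
    }

    fn page_size(mut self, page_size: i32) -> Self {
        self.page_size = Some(page_size);
        self
    }
}
```

Options that are never set keep their defaults, so callers only mention what differs from the norm.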
Batch
CharybdisModelBatch operations are used to perform multiple operations in a single batch.
Batch Operations
let users: Vec<User>;
let batch = User::batch();

// inserts
batch.append_inserts(users);

// or updates
batch.append_updates(users);

// or deletes
batch.append_deletes(users);

batch.execute(&session).await?;
Chunked Batch Operations
Chunked batch operations are used to operate on large amounts of data in chunks.
let users: Vec<User>;
let chunk_size = 100;

User::batch().chunked_inserts(&session, users, chunk_size).await?;
User::batch().chunked_updates(&session, users, chunk_size).await?;
User::batch().chunked_deletes(&session, users, chunk_size).await?;
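Conceptually, a chunked operation splits the input into slices of at most chunk_size items and sends one batch per slice. A minimal sketch of that idea with the standard library follows; run_chunked and its callback are hypothetical, and the real implementation may differ.

```rust
/// Splits `items` into chunks of at most `chunk_size` and hands each chunk to
/// `execute_batch`. Returns how many batches were executed.
/// Note: `chunks` panics when `chunk_size` is 0.
fn run_chunked<T>(items: &[T], chunk_size: usize, mut execute_batch: impl FnMut(&[T])) -> usize {
    let mut batches = 0;
    for chunk in items.chunks(chunk_size) {
        execute_batch(chunk);
        batches += 1;
    }
    batches
}
```

With 250 items and a chunk size of 100 this produces three batches, the last one holding the remaining 50 items.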
Batch Configuration
Batch operations can be configured before execution with method chaining.
let batch = User::batch()
    .consistency(Consistency::One)
    .retry_policy(Some(Arc::new(DefaultRetryPolicy::new())))
    .chunked_inserts(&session, users, 100)
    .await?;
We could also use method chaining to append operations to the batch:
let batch = User::batch()
    .consistency(Consistency::One)
    .retry_policy(Some(Arc::new(DefaultRetryPolicy::new())))
    .append_update(&user_1)
    .append_update(&user_2)
    .execute(data.db_session())
    .await?;
Statements Batch
We can use batch statements to perform collection operations in a batch:
let batch = User::batch();
let users: Vec<User>;

for user in users {
    batch.append_statement(User::PUSH_TAGS_QUERY, (vec![tag], user.id));
}

batch.execute(&session).await;
Partial Model
Use the auto-generated partial_<model>! macro to run operations on a subset of the model fields. This macro generates a new struct with the same structure as the original model, but only with the provided fields. The macro is automatically generated by #[charybdis_model]. It follows the convention partial_<struct_name>!.
// auto-generated macro - available in crate::models::user
partial_user!(UpdateUsernameUser, id, username);
Now we have a new struct UpdateUsernameUser that is equivalent to the User model, but only with the id and username fields.
let mut update_user_username = UpdateUsernameUser {
    id,
    username: "updated_username".to_string(),
};

update_user_username.update().execute(&session).await?;
Partial Model Considerations:
- partial_<model> requires #[derive(Default)] on the native model
- partial_<model> requires the complete primary key in its definition
- All derives that are defined below the #[charybdis_model] macro will be automatically added to the partial model.
- The partial_<model> struct implements the same field attributes as the native model, so if we have #[serde(rename = "rootId")] on a native model field, it will be present on the partial model field.
- partial_<model> should be defined in the same file as the native model, so it can reuse imports required by the native model
As Native
In case we need to run operations on the native model, we can use the as_native method:
let native_user: User = update_user_username.as_native().find_by_primary_key().execute(&session).await?;

// action that requires native model
authorize_user(&native_user);
as_native works by returning a new instance of the native model with fields from the partial model. For other fields it uses default values.
The recommended naming convention is Purpose + Original Struct Name, e.g. UpdateAddressUser, UpdateDescriptionPost.
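The behaviour described for as_native (copy the partial fields, default the rest) can be sketched with struct update syntax. The types below are simplified stand-ins written by hand for illustration; in Charybdis the partial struct and its methods are generated by the macro, and the field types here are assumptions.

```rust
/// Simplified native model; `Default` supplies values for fields that the
/// partial model does not carry.
#[derive(Debug, Default, Clone, PartialEq)]
struct User {
    id: u64,
    username: String,
    email: String,
}

/// Hand-written stand-in for a partial model with only `id` and `username`.
struct UpdateUsernameUser {
    id: u64,
    username: String,
}

impl UpdateUsernameUser {
    /// Copies the known fields and fills every other field from `Default`.
    fn as_native(&self) -> User {
        User {
            id: self.id,
            username: self.username.clone(),
            ..Default::default()
        }
    }
}
```

This also shows why the native model needs #[derive(Default)]: without it there would be no value to put into the missing fields.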
Callbacks
Callbacks are a convenient way to run additional logic on a model before or after certain operations. E.g.
- we can use before_insert to set default values and/or validate the model before insert.
- we can use after_update to update other data sources, e.g. Elasticsearch.
Implementation:
Let's say we define a custom extension that will be used to update an Elasticsearch document on every post update:
pub struct AppExtensions {
    pub elastic_client: ElasticClient,
}
Now we can implement Callback that will utilize this extension:
#[charybdis_model(...)]
pub struct Post {}

impl Callback for Post {
    type Extension = AppExtensions;
    type Error = AppError; // From<CharybdisError>

    // use before_insert to set default values
    async fn before_insert(
        &mut self,
        _session: &CachingSession,
        _extension: &AppExtensions,
    ) -> Result<(), AppError> {
        self.id = Uuid::new_v4();
        self.created_at = Utc::now();

        Ok(())
    }

    // use before_update to set updated_at
    async fn before_update(
        &mut self,
        _session: &CachingSession,
        _extension: &AppExtensions,
    ) -> Result<(), AppError> {
        self.updated_at = Utc::now();

        Ok(())
    }

    // use after_update to update elastic document
    async fn after_update(
        &mut self,
        _session: &CachingSession,
        extension: &AppExtensions,
    ) -> Result<(), AppError> {
        extension.elastic_client.update(...).await?;

        Ok(())
    }

    // use after_delete to delete elastic document
    async fn after_delete(
        &mut self,
        _session: &CachingSession,
        extension: &AppExtensions,
    ) -> Result<(), AppError> {
        extension.elastic_client.delete(...).await?;

        Ok(())
    }
}
Possible Callbacks:
- before_insert
- before_update
- before_delete
- after_insert
- after_update
- after_delete
Triggering Callbacks
In order to trigger callbacks we use the <operation>_cb methods: insert_cb, update_cb, delete_cb, provided by the corresponding traits. This gives us a clear distinction between insert and insert with callbacks (insert_cb). Just as with the main operations, we can configure the callback operation query before execution.
use charybdis::operations::{DeleteWithCallbacks, InsertWithCallbacks, UpdateWithCallbacks};

post.insert_cb(app_extensions).execute(&session).await;
post.update_cb(app_extensions).execute(&session).await;
post.delete_cb(app_extensions).consistency(Consistency::All).execute(&session).await;
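The ordering that an operation with callbacks implies (before hook, then the write, then after hook) can be sketched without any database. Everything below is a simplified, synchronous stand-in written for illustration: the trait, the insert_cb function and the write closure are hypothetical and do not reproduce the crate's async API.

```rust
/// Simplified synchronous callbacks trait with no-op defaults.
trait Callbacks {
    type Extension;
    type Error;

    fn before_insert(&mut self, _ext: &Self::Extension) -> Result<(), Self::Error> {
        Ok(())
    }

    fn after_insert(&mut self, _ext: &Self::Extension) -> Result<(), Self::Error> {
        Ok(())
    }
}

/// Hypothetical extension carrying application state into the hooks.
struct AppExtensions {
    default_id: u64,
}

struct Post {
    id: u64,
    events: Vec<&'static str>,
}

impl Callbacks for Post {
    type Extension = AppExtensions;
    type Error = String;

    // Set a default value before the write happens.
    fn before_insert(&mut self, ext: &AppExtensions) -> Result<(), String> {
        self.id = ext.default_id;
        self.events.push("before_insert");
        Ok(())
    }

    // Runs only after the write has been performed.
    fn after_insert(&mut self, _ext: &AppExtensions) -> Result<(), String> {
        self.events.push("after_insert");
        Ok(())
    }
}

/// Runs the hooks around `write`; an error in `before_insert` skips the write.
fn insert_cb<M: Callbacks>(
    model: &mut M,
    ext: &M::Extension,
    write: impl FnOnce(&M),
) -> Result<(), M::Error> {
    model.before_insert(ext)?;
    write(&*model);
    model.after_insert(ext)?;
    Ok(())
}
```

Keeping the plain insert separate from insert_cb means callers opt into the hooks explicitly, which matches the distinction drawn above.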
Collections
For each collection field, we get the following:
- PUSH_<field_name>_QUERY static str
- PUSH_<field_name>_IF_EXISTS_QUERY static str
- PULL_<field_name>_QUERY static str
- PULL_<field_name>_IF_EXISTS_QUERY static str
- push_<field_name> method
- push_<field_name>_if_exists method
- pull_<field_name> method
- pull_<field_name>_if_exists method
Model:
#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = []
)]
pub struct User {
    id: Uuid,
    tags: Set<Text>,
    post_ids: List<Uuid>,
    books_by_genre: Map<Text, Frozen<List<Text>>>,
}
Generated Collection Queries:
Each generated query expects the value as the first bind value and the primary key fields as the following bind values.
impl User {
    const PUSH_TAGS_QUERY: &'static str = "UPDATE users SET tags = tags + ? WHERE id = ?";
    const PUSH_TAGS_IF_EXISTS_QUERY: &'static str = "UPDATE users SET tags = tags + ? WHERE id = ? IF EXISTS";

    const PULL_TAGS_QUERY: &'static str = "UPDATE users SET tags = tags - ? WHERE id = ?";
    const PULL_TAGS_IF_EXISTS_QUERY: &'static str = "UPDATE users SET tags = tags - ? WHERE id = ? IF EXISTS";

    const PUSH_POST_IDS_QUERY: &'static str = "UPDATE users SET post_ids = post_ids + ? WHERE id = ?";
    const PUSH_POST_IDS_IF_EXISTS_QUERY: &'static str = "UPDATE users SET post_ids = post_ids + ? WHERE id = ? IF EXISTS";

    const PULL_POST_IDS_QUERY: &'static str = "UPDATE users SET post_ids = post_ids - ? WHERE id = ?";
    const PULL_POST_IDS_IF_EXISTS_QUERY: &'static str = "UPDATE users SET post_ids = post_ids - ? WHERE id = ? IF EXISTS";

    const PUSH_BOOKS_BY_GENRE_QUERY: &'static str = "UPDATE users SET books_by_genre = books_by_genre + ? WHERE id = ?";
    const PUSH_BOOKS_BY_GENRE_IF_EXISTS_QUERY: &'static str = "UPDATE users SET books_by_genre = books_by_genre + ? WHERE id = ? IF EXISTS";

    const PULL_BOOKS_BY_GENRE_QUERY: &'static str = "UPDATE users SET books_by_genre = books_by_genre - ? WHERE id = ?";
    const PULL_BOOKS_BY_GENRE_IF_EXISTS_QUERY: &'static str = "UPDATE users SET books_by_genre = books_by_genre - ? WHERE id = ? IF EXISTS";
}
Now we can use these constants within batch operations.
let batch = User::batch();
let users: Vec<User>;

for user in users {
    batch.append_statement(User::PUSH_TAGS_QUERY, (vec![tag], user.id));
}

batch.execute(&session).await;
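All of the constants above share one template: the collection value is bound first, followed by the primary key fields, with an optional IF EXISTS suffix. The sketch below rebuilds that template at runtime purely to make the pattern visible. collection_query and CollectionOp are hypothetical names; the crate itself emits these strings as constants.

```rust
/// Whether elements are added to or removed from the collection.
enum CollectionOp {
    Push,
    Pull,
}

/// Rebuilds the shape of the generated collection queries.
/// Bind order: collection value first, then the primary key fields.
fn collection_query(
    table: &str,
    field: &str,
    op: CollectionOp,
    primary_key: &[&str],
    if_exists: bool,
) -> String {
    let sign = match op {
        CollectionOp::Push => '+',
        CollectionOp::Pull => '-',
    };
    // One `field = ?` condition per primary key field.
    let where_clause = primary_key
        .iter()
        .map(|key| format!("{key} = ?"))
        .collect::<Vec<_>>()
        .join(" AND ");
    let suffix = if if_exists { " IF EXISTS" } else { "" };
    format!("UPDATE {table} SET {field} = {field} {sign} ? WHERE {where_clause}{suffix}")
}
```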
Generated Collection Methods:
push_<field_name> and pull_<field_name> methods, along with their _if_exists variants, are generated for each collection field.
let user = User::new();

user.push_tags(tags: HashSet<T>).execute(&session).await;
user.push_tags_if_exists(tags: HashSet<T>).execute(&session).await;

user.pull_tags(tags: HashSet<T>).execute(&session).await;
user.pull_tags_if_exists(tags: HashSet<T>).execute(&session).await;

user.push_post_ids(ids: Vec<T>).execute(&session).await;
user.push_post_ids_if_exists(ids: Vec<T>).execute(&session).await;

user.pull_post_ids(ids: Vec<T>).execute(&session).await;
user.pull_post_ids_if_exists(ids: Vec<T>).execute(&session).await;

user.push_books_by_genre(map: HashMap<K, V>).execute(&session).await;
user.push_books_by_genre_if_exists(map: HashMap<K, V>).execute(&session).await;

user.pull_books_by_genre(map: HashMap<K, V>).execute(&session).await;
user.pull_books_by_genre_if_exists(map: HashMap<K, V>).execute(&session).await;
Ignored fields
We can ignore fields by using the #[charybdis(ignore)] attribute:
#[charybdis_model(...)]
pub struct User {
id: Uuid,
#[charybdis(ignore)]
organization: Option<Organization>,
}
So the organization field will be ignored in all operations, and the default value will be used when deserializing from other data sources. It can be used to hold data that is not persisted in the database.