# Cido
Cido is a framework for indexing events from blockchains and other services, with great scalability and an easy-to-use GraphQL API. Cido is a hosted service that takes care of deploying and managing indexers and GraphQL APIs for you.
## Table of contents

- Getting Started
  - Setting up a project
  - Working with generated code
  - Implementing Cidomap
- Handling Events
  - Defining an Event Handler
  - Event Handler Functions
    - Handler Function
    - Generator Function
    - Preprocessor Function
- Entities and Events
- Available Types
## Getting started

### Setting up a project
This crate contains all the core interfaces and implementations needed to index. It also needs an implementation of the `Network` trait. The only current implementation is for Ethereum (which covers all geth-API-compatible networks) in the cido-ethereum crate. We plan to add support for more networks soon. To get started with indexing, the minimum dependencies are this crate, a network crate, and the async-graphql crate (which we use for serving the GraphQL API).
```sh
# async-graphql is required because our generated code adds async-graphql derives,
# which requires end users to depend on it because of how their code is generated.
cargo add cido cido-ethereum async-graphql@=7.0.14
```
### Working with generated code
Due to the way things have been implemented, we generate a lot of code. Everything is done through the `cidomap`, `event_handler`, `entity`, and `event` attribute macros. If you ever run into issues and it's not clear what code is being generated or why something isn't working, each of the macros accepts a top-level flag called `embed_generated_code`. This causes the generated code to be written to disk and included, so the compiler gives better error messages instead of pointing at the annotation.

While you can get roughly the same output with `cargo-expand`, it requires rebuilding `proc-macro2` (which then requires rebuilding most of the project), and it expands everything, including `format!` and `tracing::info!` macros (which can be enormous and hard to ignore).
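As an illustrative sketch only (the exact spelling and placement of the flag on a real annotation may differ), enabling the flag might look like:

```rust
// Hypothetical sketch: `embed_generated_code` is the top-level flag
// described above; the rest of the config is elided.
#[cidomap(
    embed_generated_code,
    config = {
        // ... your normal config ...
    }
)]
struct Uniswap {
    // ... entities ...
}
```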
### Implementing Cidomap
Once the project is set up, you need an implementation of the `Cidomap` trait. This can be done by declaring a struct and annotating it with the `cidomap` attribute. There are several required fields, documented below. If you've worked with other indexing frameworks before: we don't use any YAML files. Everything is in Rust code, which can really cut down on boilerplate, since you only change the few things that differ across chains when the contracts are identical, like the starting block, contract addresses, etc.
```rust
// cido_ethereum::prelude contains everything in
// cido::prelude plus the ethereum specific items
use cido_ethereum::prelude::*;

#[cidomap(
    // config for implementing the Cidomap trait
    config = {
        // (required) What network implementation will be used (you can
        // create your own, but there are a lot of traits to it)
        network = EthereumNetwork,
        // (required) Will be optional in the future and default to
        // `CidomapError`, but you can define your own error type that all
        // handlers will return as the error.
        error = CidomapError,
        // (required) Which enum defines events (covered in the next section)
        events = UniswapEvents,
        // (required) Const expression of which block indexing should start
        // from (there is only one, so it needs to be set to the earliest
        // contract that you are interested in indexing)
        start_block = START_BLOCK,
        // (required) Name of the function that provides initial filters.
        initial_filters = initial_filters,
        // (optional) Name of a function to run once on first initialization
        init = init,
        // (optional) Name of an idempotent function to run once at each
        // startup after loading the cache and after each rollback.
        create = create,
        // (optional, defaults to 1) This can be set to 1, 2, or 3 depending
        // on how many levels of contracts need to be followed. (In the case
        // of Uniswap there is a factory contract that spawns off pairs, so
        // we need two processing cycles.)
        max_processing_order = 2,
    }
)]
struct Uniswap {
    // all entities/events that will be indexed and available in the graphql api
    pair: Pair,
}

// Block to start indexing from. This is tied to the network.
const START_BLOCK: EthereumBlockNumber = EthereumBlockNumber::new(10000834);

// Filters of starting contracts. These are tied to the network. The function
// must return a `Result<Vec<Network::TriggerFilter>, Network::Error>`.
fn initial_filters() -> Result<
    Vec<<EthereumNetwork as Network>::TriggerFilter>,
    <EthereumNetwork as Network>::Error,
> {
    Ok(vec![
        // ...
    ])
}

// init is any async function that takes a `Context`
// and returns a `Result<(), Cidomap::Error>`
async fn init(cx: Context<'_, Uniswap>) -> Result<(), CidomapError> {
    Ok(())
}

// create is any idempotent async function that takes a `Context`
// and returns a `Result<(), Cidomap::Error>`
async fn create(cx: Context<'_, Uniswap>) -> Result<(), CidomapError> {
    Ok(())
}
```
## Handling Events

### Processing order
Cido handles events in batches of blocks until caught up to the latest. There are multiple steps in processing that can happen in parallel. The only two steps that you need to be concerned with are what we call the `preprocessing` and `sync` steps. The `sync` step is where the main processing logic is handled, and it is done one event at a time, serially, just like the blockchain does. This prevents any inconsistencies between runs, but it can also be a bottleneck if you're waiting for database reads or for I/O over the network. The `preprocessing` step allows any I/O to be completed before the `sync`; those results are "cached" and then made available during the `generator` step and the `sync` step.

The difference between the `generator` and `sync` steps is that, when new events are being searched for, we need to run through all the steps multiple times. The `generator` function is called on every pass but the last, and the `sync` step is called on the last pass, once all the events have been gathered. All events are processed in blockchain order in both the `generator` and `sync` steps. The `generator` and `sync` functions do not need to be `Send` because they are always run on the same thread. This may change in the future when there are partitioned blockchains and we can process events in parallel like the blockchain does.
### Defining an Event Handler
To handle events, you need to create an enum that contains all of the events you're interested in processing. The `event_handler` macro does all the necessary implementations for you. This is what it looks like:
```rust
// The `cidomap` field allows us to reuse the `Network`
// and other definitions from the `Cidomap` struct.
#[event_handler(cidomap = Uniswap)]
pub enum UniswapEvents {
    // Each variant needs to be annotated with a handler.
    #[handler(
        // (required) This is the handler function where most logic is implemented
        fn = factory::handle_new_pair,
        // (optional) Only needed if spawning off new contracts to look for more events
        generator = factory::pair_generator,
        // (optional) Only needed if you need to access the `Network` before the
        // handler fn is called. In this example, some attributes of the `Token`s
        // that make up a pair need to be queried so that we don't have to wait
        // for those results. If this is not set, the cache type is set to
        // the unit type `()`.
        preprocessor = {
            // (required) Signature described below
            fn = factory::pair_preprocessor,
            // (required) The type that will be returned from the preprocessor
            // function and made available in the above handler and generator
            // functions
            cache = Option<factory::CachedPair>
        }
    )]
    PairCreated(factory_contract::PairCreated),

    // pair events:
    #[handler(fn = core::handle_burn)]
    Burn(pair_contract::Burn),
    #[handler(fn = core::handle_mint)]
    Mint(pair_contract::Mint),
    #[handler(fn = core::handle_sync)]
    Sync(pair_contract::Sync),
    #[handler(fn = core::handle_swap)]
    Swap(pair_contract::Swap),
    #[handler(
        fn = core::handle_transfer,
        preprocessor = {
            fn = core::transfer_preprocessor,
            cache = CachedTransfer
        }
    )]
    Transfer(pair_contract::Transfer),
}
```
### Event Handler Functions
Each of the different handler functions takes roughly the same types, with minor differences. They've been designed after web frameworks like Axum, so you can change the order or kind of values that your function accepts and don't need to keep ignored variables around. If the signature doesn't match what is expected, you'll get errors from the `event_handler` annotation about arguments being incorrect: borrowing in the handler, or expecting something owned in the other functions. You can also get an error like

```text
the trait `Handler<_, _, Cidomap>` is not implemented for fn item
```

if the types don't implement the necessary traits. In that case, make sure you are wrapping the event and cache types with the `Event` and `Cache` wrappers.
#### Handler Function
The handler option expects the path to a function that does not need to be `Send` and whose signature contains at least one of the following parameters, in any order:
```rust
async fn handle_event(
    // Context for interacting with the database and network
    cx: Context<'_, YourCidomap>,
    // Information about the event, like what filter generated it,
    // what Log/Block it is from, etc.
    meta: MetaEvent<YourCidomap>,
    // The actual generated event. For ethereum this is generally
    // events from a contract.
    Event(event): Event<network::Event>,
    // The struct generated from the preprocessor step. If not used,
    // you can ignore this as it will be the unit type
    Cache(cache): Cache<CacheStruct>,
)
```
The event and cache parameters come wrapped in their own types so that the compiler can be convinced that the impls allowing you to swap the ordering of the parameters don't conflict. This function is also somewhat different from the other two in that it takes ownership of the values: once it runs, there is no need to keep any of the event or block information around.
#### Generator Function
The generator option expects the path to a function that does not need to be `Send` and whose signature contains at least one of the following parameters, in any order:
```rust
async fn generator(
    cx: GeneratorContext<'_, YourCidomap>,
    meta: &MetaEvent<YourCidomap>,
    Event(event): Event<&network::Event>,
    Cache(cache): Cache<&CacheStruct>,
) {...}
```
This function borrows each of the types because they will be used in the handler function later. The `GeneratorContext` only has access to the network and to spawning off more event filters. This is the only place that can happen, to prevent subtle bugs caused by spawning event filters too late in the process.
#### Preprocessor Function
The preprocessor option expects the path to a function that must be `Send` and whose signature contains at least one of the following parameters, in any order:
```rust
async fn preprocessor(
    cx: &PreprocessingContext<YourCidomap>,
    meta: &MetaEvent<YourCidomap>,
    Event(event): Event<&network::Event>,
) -> Result<CacheStruct, Cidomap::Error> {...}
```
The return value needs to match the `cache` type in the annotation. Because this function produces the cache, the `Cache` parameter is not available here. The `PreprocessingContext` has access to the network and to a synchronization primitive that can be used to ensure consistent results every time.
## Entities and Events
There are currently two classifications of structs that can be stored in the database. Entities are structs that can change over time, so we keep track of the blocks where they change and create new rows, which allows historical point-in-time queries. Events are things that get stored once and never change: once an event has been created, any attempt to update it will fail after the block it was created on is finished processing. Annotating a struct with either `entity` or `event` will implement the required `Transformer` traits. Both annotations use mostly the same underlying code generation, but they are different enough that we believe they warranted top-level annotations instead of just an extra option like `immutable = true`.
Here is an example of creating the `Pair` struct mentioned in the `Uniswap` struct above:
```rust
#[entity(
    // (required) Sets the related Cidomap for implementing some required traits
    cidomap = Uniswap
)]
#[derive(Debug, Clone, PartialEq, Hash, SmartDefault)]
pub struct Pair {
    pub id: H160,
    #[default(Utc::now())]
    pub created_at_timestamp: DateTime<Utc>,
    pub created_at_blocknumber: BigInt,
    #[entity]
    pub token0: Token,
    #[entity]
    pub token1: Token,
    pub reserve0: BigDecimal,
    pub reserve1: BigDecimal,
    pub total_supply: BigDecimal,
    /// Price in terms of the asset pair
    pub token0_price: BigDecimal,
    pub token1_price: BigDecimal,
    // lifetime volume stats
    pub volume_token0: BigDecimal,
    pub volume_token1: BigDecimal,
    #[gql(rename = volumeUSD)]
    pub volume_usd: BigDecimal,
    #[gql(rename = untrackedVolumeUSD)]
    pub untracked_volume_usd: BigDecimal,
    pub tx_count: BigInt,
    pub liquidity_provider_count: BigInt,
    #[indexed]
    #[gql(rename = reserveETH)]
    pub reserve_eth: BigDecimal,
    #[indexed]
    #[gql(rename = reserveUSD)]
    pub reserve_usd: BigDecimal,
    #[indexed]
    #[gql(rename = trackedReserveETH)]
    pub tracked_reserve_eth: BigDecimal,
    #[derived_from(field = pair)]
    pub pair_hour_data: Vec<PairHourData>,
    #[derived_from(field = pair)]
    pub liquidity_positions: Vec<LiquidityPosition>,
    #[derived_from(field = pair)]
    pub liquidity_position_snapshots: Vec<LiquidityPositionSnapshot>,
    #[derived_from(field = pair)]
    pub mints: Vec<Mint>,
    #[derived_from(field = pair)]
    pub burns: Vec<Burn>,
    #[derived_from(field = pair)]
    pub swaps: Vec<Swap>,
}
```
Only the `cidomap` field is required in the annotation. There are several more options to customize functionality and naming in the GraphQL API. An id field is required, either by naming it `id` or by annotating it with `#[id]`.
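For example, a hypothetical entity whose id column is a field not named `id` could mark it with `#[id]` (the `Token` fields here are invented for illustration):

```rust
// Hypothetical sketch: `#[id]` marks `address` as the id field even
// though it is not named `id`. Field names are invented for illustration.
#[entity(cidomap = Uniswap)]
pub struct Token {
    #[id]
    pub address: H160,
    pub symbol: String,
}
```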
Any referenced fields (in this case `token0` and `token1`) need to be annotated with what type they are. This allows filtering based on the referenced type in the GraphQL API and makes code generation use the correct type (the id of the related type). Cido only indexes the fields indicated, so that we can better manage the cost of inserts/updates and of keeping data long term. Any field annotated with `#[entity]` or `#[event]` implies the `#[indexed]` annotation. Only index fields that you will be using as filters in queries.
Any fields annotated with `#[derived_from]` are not actually available in the struct; they are resolved in the GraphQL API, and the `field` option tells us how to tie the queries together.
## Available Types
To have a type be used for indexing, it must implement the necessary `async-graphql`, `sqlx`, and `stable-hash` traits. The following is an incomplete table of supported types:
| Type |
|---|
| `bool` |
| `i16` |
| `i32` |
| `i64` |
| `String` |
| `cido::H<N>` |
| `cido::U<N>` |
| `cido::BigDecimal` |
| `cido::BigInt` |
| `cido::Bytes` |
| `chrono::DateTime` |
| `uuid::Uuid` |