
Cido

Cido is a framework for indexing events from blockchains and other services, built for scalability and fronted by an easy-to-use GraphQL API. Cido is a hosted service and takes care of deploying and managing indexers and GraphQL APIs for you.

Table of contents

Getting started
  Setting up a project
  Working with generated code
Implementing Cidomap
Handling events
  Processing order
  Defining an Event Handler
  Event Handler Functions
    Handler Function
    Generator Function
    Preprocessor Function
Entities and Events
Available types

Getting started

Setting up a project

This crate contains all the core interfaces and implementations needed to index. You also need an implementation of the Network trait. The only current implementation is for Ethereum (covering all geth-API-compatible networks) in the cido-ethereum crate; we plan to add support for more networks soon. To get started with indexing, the minimum dependencies are this crate, a network crate, and the async-graphql crate (which we use to serve the GraphQL API).

# async-graphql is required because our generated code adds async-graphql
# derives, so end users must depend on it directly.
cargo add cido cido-ethereum async-graphql@=7.0.14

Working with generated code

We generate a lot of code. Everything is done through the cidomap, event_handler, entity, and event attribute macros. If you run into issues and it's not clear what code is being generated or why something isn't working, each of these macros accepts a top-level flag called embed_generated_code. This writes the generated code to disk and includes it, so the compiler gives better error messages that point at the actual code instead of at the annotation.
While cargo-expand can produce similar output, it requires rebuilding proc-macro2 (and therefore most of the project), and it expands everything, including format! and tracing::info! macros, which can be enormous and hard to ignore.
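For example, the flag can be passed alongside the other options (a sketch; the exact placement assumes the flag sits at the top level of the attribute, next to config):

#[cidomap(
  // writes the generated code to disk and includes it, so compiler
  // errors point at real lines instead of at this annotation
  embed_generated_code,
  config = {
    // ...
  }
)]
struct Uniswap {
  // ...
}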

Implementing Cidomap

Once the project is set up, you need an implementation of the Cidomap trait. Declare a struct and annotate it with the cidomap attribute; the required fields are documented below. If you've worked with other indexing frameworks before, note that we don't use any YAML files. Everything is Rust code, which can really cut down on boilerplate: when the contracts are identical across chains, you only swap out the few things that differ, such as the starting block and contract addresses.

// cido_ethereum::prelude contains everything in
// cido::prelude plus the ethereum specific items
use cido_ethereum::prelude::*;

#[cidomap(
  // config for implementing the Cidomap trait
  config = {
    // (required) Which network implementation to use (you can create
    // your own, but it involves implementing a lot of traits)
    network = EthereumNetwork,
    // (required) Will become optional in the future, defaulting to
    // `CidomapError`; you can define your own error type that all
    // handlers will return.
    error = CidomapError,
    // (required) Which enum defines events (covered in the next section)
    events = UniswapEvents,
    // (required) const expression for the block to start indexing from
    // (there is only one, so set it to the earliest block among the
    // contracts you are interested in indexing)
    start_block = START_BLOCK,
    // (required) Name of the function that provides the initial filters.
    initial_filters = initial_filters,
    // (optional) Name of a function to run once on first initialization
    init = init,
    // (optional) Name of an idempotent function to run at each startup
    // (after loading the cache) and after each rollback.
    create = create,
    // (optional, defaults to 1) Set to 1, 2, or 3 depending on how many
    // levels of contracts need to be followed. (Uniswap has a factory
    // contract that spawns off pairs, so we need two processing cycles.)
    max_processing_order = 2,
  }
)]
struct Uniswap {
  // all entities/events that will be indexed and available in the GraphQL API
  pair: Pair,
}

// Block to start indexing from. This is tied to the network.
const START_BLOCK: EthereumBlockNumber = EthereumBlockNumber::new(10000834);

// Filters of starting contracts. These are tied to the network. The function
// must return a `Result<Vec<Network::TriggerFilter>, Network::Error>`
fn initial_filters() -> Result<
  Vec<<EthereumNetwork as Network>::TriggerFilter>,
  <EthereumNetwork as Network>::Error
> {
  Ok(vec![
    // ...
  ])
}

// init is any async function that takes a `Context`
// and returns a `Result<(), Cidomap::Error>`
async fn init(_cx: Context<'_, Uniswap>) -> Result<(), CidomapError> {
  Ok(())
}

// create is any idempotent async function that takes a `Context`
// and returns a `Result<(), Cidomap::Error>`
async fn create(_cx: Context<'_, Uniswap>) -> Result<(), CidomapError> {
  Ok(())
}

Handling events

Processing order

Cido handles events in batches of blocks until it has caught up to the latest block. Processing involves multiple steps that can happen in parallel. The only two steps you need to be concerned with are what we call the preprocessing and sync steps. The sync step is where the main processing logic runs; it handles one event at a time, serially, just like the blockchain does. This prevents any inconsistencies between runs, but it can also become a bottleneck if you're waiting on database reads or on network I/O. The preprocessing step allows any I/O to complete before the sync; those results are "cached" and made available during the generator and sync steps.

The generator step exists because, when new events are being searched for, we need to run through all the steps multiple times. The generator function is called on every pass but the last; the sync step runs on the last pass, once all the events have been gathered. Events are processed in blockchain order in both the generator and sync steps. The generator and sync functions do not need to be Send because they are always run on the same thread. This may change in the future when there are partitioned blockchains and we can process events in parallel, like the blockchain does.
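
As a rough mental model, one batch proceeds like this (an illustrative sketch, not cido's actual internals; all names below are hypothetical):

// Sketch of one batch: the pipeline repeats once per processing level.
async fn process_batch(max_processing_order: u32) {
  for cycle in 1..=max_processing_order {
    preprocess().await; // parallel I/O up front; results are cached
    if cycle < max_processing_order {
      generate().await; // every pass but the last: may discover new filters
    } else {
      sync().await; // last pass: serial, one event at a time, in blockchain order
    }
  }
}

// hypothetical stubs standing in for the real pipeline steps
async fn preprocess() {}
async fn generate() {}
async fn sync() {}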

Defining an Event Handler

To handle events, create an enum containing all of the events you're interested in processing. The event_handler macro generates all the necessary implementations for you. This is what it looks like:

// The `cidomap` field allows us to reuse the `Network`
// and other definitions from the `Cidomap` struct.
#[event_handler(cidomap = Uniswap)]
pub enum UniswapEvents {
  // Each variant needs to be annotated with a handler.
  #[handler(
    // (Required) This is the handler function where most logic is implemented
    fn = factory::handle_new_pair,
    // (Optional) Only needed if spawning off new contracts to look for more events
    generator = factory::pair_generator,
    // (Optional) Only needed if you need to access the `Network` before
    // the handler fn is called. In this example, some attributes of the
    // `Token`s that make up a pair need to be queried ahead of time so
    // that the sync step doesn't have to wait on those results. If this
    // is not set, the cache type defaults to the unit type `()`
    preprocessor = {
      // (Required) Signature described below 
      fn = factory::pair_preprocessor,
      // (Required) The type that will be returned from the preprocessor function
      // and will be made available in the above handler and generator functions
      cache = Option<factory::CachedPair>
    }
  )]
  PairCreated(factory_contract::PairCreated),
  // pair events:
  #[handler(fn = core::handle_burn)]
  Burn(pair_contract::Burn),
  #[handler(fn = core::handle_mint)]
  Mint(pair_contract::Mint),
  #[handler(fn = core::handle_sync)]
  Sync(pair_contract::Sync),
  #[handler(fn = core::handle_swap)]
  Swap(pair_contract::Swap),
  #[handler(
    fn = core::handle_transfer,
    preprocessor = {
      fn = core::transfer_preprocessor,
      cache = CachedTransfer
    }
  )]
  Transfer(pair_contract::Transfer),
}

Event Handler Functions

Each of the handler functions takes roughly the same types, with minor differences. They've been designed after web frameworks like Axum, so you can change the order and selection of the values your function accepts and avoid unused parameters. If the signature doesn't match what is expected, the event_handler annotation will report errors about incorrect arguments, for example if you borrow in the handler function or expect owned values in the other two functions. You can also get an error like

the trait `Handler<_, _, Cidomap>` is not implemented for fn item

if the types don't implement the necessary traits. In that case, make sure you are wrapping the event and cache types with the Event and Cache wrappers.

Handler Function

The handler option expects the path to a function that does not need to be Send, with a signature containing at least one of the following parameters, in any order:

async fn handle_event(
  // Context for interacting with the database and network
  cx: Context<'_, YourCidomap>,
  // Information about the event, like what filter generated it,
  // what Log/Block it is from, etc.
  meta: MetaEvent<YourCidomap>,
  // The actual generated event. For ethereum this is generally
  // events from a contract.
  Event(event): Event<network::Event>,
  // The struct generated from the preprocessor step. If not used,
  // you can ignore this as it will be the unit type
  Cache(cache): Cache<CacheStruct>,
) -> Result<(), Cidomap::Error> {...}

The event and cache values come wrapped in their own types so that the compiler can be convinced that the impls allowing the parameters to appear in any order don't conflict.
This function also differs from the other two in that it takes ownership of its values. Once it has run, there is no need to keep any of the event or block information.
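
For example, a Sync handler that only needs the context and the event could look like this (a hedged sketch reusing the Uniswap names from above; it assumes the handler receives the concrete event type carried by its enum variant and returns the error type configured on the Cidomap):

async fn handle_sync(
  cx: Context<'_, Uniswap>,
  Event(event): Event<pair_contract::Sync>,
) -> Result<(), CidomapError> {
  // look up the Pair this event belongs to, update its reserves,
  // and store the new version
  Ok(())
}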

Generator Function

The generator option expects the path to a function that does not need to be Send, with a signature containing at least one of the following parameters, in any order:

async fn generator(
  cx: GeneratorContext<'_, YourCidomap>,
  meta: &MetaEvent<YourCidomap>,
  Event(event): Event<&network::Event>,
  Cache(cache): Cache<&CacheStruct>,
) -> Result<(), Cidomap::Error> {...}

This function borrows each of the types because they will be used later in the handler function. The GeneratorContext only has access to the network and to spawning new event filters. This is the only place filters can be spawned, which prevents subtle bugs from spawning event filters too late in the process.
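
For illustration, the pair_generator from the Uniswap example above might be shaped like this (hedged: the README only states that GeneratorContext can spawn new event filters, so the actual call is left as a hypothetical comment):

async fn pair_generator(
  cx: GeneratorContext<'_, Uniswap>,
  Event(event): Event<&factory_contract::PairCreated>,
) -> Result<(), CidomapError> {
  // register a filter for the newly created pair contract so that its
  // Mint/Burn/Swap/Sync events are picked up on the next pass, e.g.
  // cx.spawn_filter(...) (hypothetical name; see the crate docs)
  Ok(())
}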

Preprocessor Function

The preprocessor option expects the path to a function that must be Send, with a signature containing at least one of the following parameters, in any order:

async fn preprocessor(
  cx: &PreprocessingContext<YourCidomap>,
  meta: &MetaEvent<YourCidomap>,
  Event(event): Event<&network::Event>,
) -> Result<CacheStruct, Cidomap::Error> {...}

The return value needs to match the cache type in the annotation. Because this function produces the cache value, the Cache parameter is not available here. The PreprocessingContext has access to the network and to a synchronization primitive that can be used to ensure consistent results on every run.
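
Tying this back to the Uniswap example, the pair_preprocessor referenced in the event handler above might be shaped like this (a hedged sketch; the body is elided because the network-call API isn't shown in this README, and the return type matches the cache = Option<factory::CachedPair> annotation):

async fn pair_preprocessor(
  cx: &PreprocessingContext<Uniswap>,
  Event(event): Event<&factory_contract::PairCreated>,
) -> Result<Option<CachedPair>, CidomapError> {
  // do any network I/O here (e.g. fetch metadata for the two tokens)
  // so the serial sync step never has to wait on the network
  Ok(None)
}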

Entities and Events

There are currently two classifications of structs that can be stored in the database. Entities are structs that can change over time; we keep track of the blocks where they change and create new rows so that we can do historical point-in-time queries. Events are stored once and never change: after the block an event was created in has finished processing, any attempt to update it will fail. Annotating a struct with either entity or event will implement the required Transformer traits. Both annotations use mostly the same underlying code generation, but they are different enough that we believe they warranted separate top-level annotations instead of an extra option like immutable = true.

Here is an example of creating the Pair struct mentioned in the Uniswap struct above:

#[entity(
  // (required) Sets the related Cidomap for implementing some required traits
  cidomap = Uniswap
)]
#[derive(Debug, Clone, PartialEq, Hash, SmartDefault)]
pub struct Pair {
  pub id: H160,
  #[default(Utc::now())]
  pub created_at_timestamp: DateTime<Utc>,
  pub created_at_blocknumber: BigInt,
  #[entity]
  pub token0: Token,
  #[entity]
  pub token1: Token,
  pub reserve0: BigDecimal,
  pub reserve1: BigDecimal,
  pub total_supply: BigDecimal,
  /// Price in terms of the asset pair
  pub token0_price: BigDecimal,
  pub token1_price: BigDecimal,

  // lifetime volume stats
  pub volume_token0: BigDecimal,
  pub volume_token1: BigDecimal,
  #[gql(rename = volumeUSD)]
  pub volume_usd: BigDecimal,
  #[gql(rename = untrackedVolumeUSD)]
  pub untracked_volume_usd: BigDecimal,
  pub tx_count: BigInt,
  pub liquidity_provider_count: BigInt,
  #[indexed]
  #[gql(rename = reserveETH)]
  pub reserve_eth: BigDecimal,
  #[indexed]
  #[gql(rename = reserveUSD)]
  pub reserve_usd: BigDecimal,
  #[indexed]
  #[gql(rename = trackedReserveETH)]
  pub tracked_reserve_eth: BigDecimal,

  #[derived_from(field = pair)]
  pub pair_hour_data: Vec<PairHourData>,

  #[derived_from(field = pair)]
  pub liquidity_positions: Vec<LiquidityPosition>,

  #[derived_from(field = pair)]
  pub liquidity_position_snapshots: Vec<LiquidityPositionSnapshot>,

  #[derived_from(field = pair)]
  pub mints: Vec<Mint>,

  #[derived_from(field = pair)]
  pub burns: Vec<Burn>,

  #[derived_from(field = pair)]
  pub swaps: Vec<Swap>,
}

Only the cidomap field is required in the annotation; there are several more options for customizing functionality and naming in the GraphQL API. An id field is required, either by naming it id or by annotating it with #[id].

Any referenced fields (in this case token0 and token1) need to be annotated with the kind of struct they reference. This allows filtering on the referenced type in the GraphQL API and makes code generation use the correct type (the id of the related type). Cido only indexes the fields you indicate, so that we can better manage the cost of inserts/updates and of keeping data long term. Any field annotated with #[entity] or #[event] implies the #[indexed] annotation. Only index fields that you will use as filters in queries.

Fields annotated with #[derived_from] are not actually stored with the struct; they are resolved in the GraphQL API, and the field argument tells us how to tie the queries together.
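
For example, the Swap event referenced by the #[derived_from(field = pair)] annotation above could be declared like this (a hedged sketch: the exact fields are illustrative, and it assumes the event macro accepts the same cidomap option as entity; note the pair field, which is what field = pair points at):

#[event(cidomap = Uniswap)]
#[derive(Debug, Clone, PartialEq, Hash)]
pub struct Swap {
  pub id: Uuid,
  // referencing the Pair implies #[indexed], which is what lets
  // Pair::swaps be resolved via #[derived_from(field = pair)]
  #[entity]
  pub pair: Pair,
  pub timestamp: DateTime<Utc>,
  pub amount0_in: BigDecimal,
  pub amount1_in: BigDecimal,
  pub amount0_out: BigDecimal,
  pub amount1_out: BigDecimal,
  #[gql(rename = amountUSD)]
  pub amount_usd: BigDecimal,
}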

Available types

To use a type for indexing, it must implement the necessary async-graphql, sqlx, and stable-hash traits. The following is an incomplete list of supported types:

bool
i16
i32
i64
String
cido::H<N>
cido::U<N>
cido::BigDecimal
cido::BigInt
cido::Bytes
chrono::DateTime
uuid::Uuid
