DAL (Data Access Layer)
This crate provides read and write access to the main database (Postgres), which acts as the primary source of truth.
The current schema is managed by sqlx. Schema changes are stored in the migrations directory.
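As a rough sketch of how a sqlx-managed migrations directory is applied (the connection URL and pool settings here are placeholders, not the node's actual wiring):

```rust
use sqlx::postgres::PgPoolOptions;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Placeholder connection URL; the node configures this elsewhere.
    let pool = PgPoolOptions::new()
        .max_connections(5)
        .connect("postgres://localhost/zksync_local")
        .await?;

    // `sqlx::migrate!` embeds the `migrations` directory at compile time
    // and applies any pending migrations in order.
    sqlx::migrate!("./migrations").run(&pool).await?;
    Ok(())
}
```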
Schema overview
This overview skips prover-related and Ethereum sender-related tables, which are specific to the main node.
L2 blocks and L1 batches
- miniblocks. Stores L2 block headers. The naming is due to historical reasons.
- miniblocks_consensus. Stores L2 block data related to the consensus algorithm used by the decentralized sequencer. Tied one-to-one to L2 blocks; the consensus side of the relation is optional (see the sketch after this list).
- l1_batches. Stores L1 batch headers.
- commitments. Stores a part of L1 batch commitment data (event queue and bootloader memory commitments). In the future, other commitment-related columns will be moved here from l1_batches.
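To make the optional one-to-one relation concrete, here is a hedged sketch of a raw query joining the two tables; the column names (number, certificate-style consensus data) are assumptions for illustration, not the exact production schema:

```rust
use sqlx::{PgPool, Row};

/// Checks whether an L2 block has associated consensus data.
/// Column names here are illustrative, not the exact production schema.
async fn block_has_consensus_data(
    pool: &PgPool,
    number: i64,
) -> sqlx::Result<Option<bool>> {
    let row = sqlx::query(
        "SELECT mbc.number IS NOT NULL AS has_consensus \
         FROM miniblocks mb \
         LEFT JOIN miniblocks_consensus mbc ON mbc.number = mb.number \
         WHERE mb.number = $1",
    )
    .bind(number)
    .fetch_optional(pool)
    .await?;

    // `None` means the block itself is absent; `Some(false)` means the block
    // exists but the optional consensus side of the relation does not.
    row.map(|row| row.try_get::<bool, _>("has_consensus")).transpose()
}
```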
Transactions
- transactions. Stores all transactions received by the node, both L2 and L1 ones. Transactions in this table are not necessarily included in an L2 block; i.e., the table also serves as a persistent mempool (as sketched below).
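A sketch of what a query against the mempool part of the table could look like; the miniblock_number column (assumed NULL until the transaction is included in an L2 block) is an assumption for illustration:

```rust
use sqlx::PgPool;

/// Counts transactions that are persisted but not yet included in an L2
/// block, i.e., the "mempool" part of the table. The `miniblock_number`
/// column name is an assumption for illustration.
async fn count_mempool_txs(pool: &PgPool) -> sqlx::Result<i64> {
    sqlx::query_scalar("SELECT COUNT(*) FROM transactions WHERE miniblock_number IS NULL")
        .fetch_one(pool)
        .await
}
```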
VM storage
See the zksync_state crate for more context.
- storage_logs. Stores all the VM storage write logs for all transactions, as well as non-transaction writes generated by the bootloader. This is the source of truth for the VM storage; all other VM storage implementations (see the zksync_state crate) are based on it (e.g., by adding persistent or in-memory caching). Used by multiple components, including the Metadata calculator, Commitment generator, and API server (both for reading one-off values like account balances and as a part of the VM sandbox). See the sketch after this list.
- initial_writes. Stores initial write information for each L1 batch, i.e., the enumeration index assigned to each key. Used when generating L1 batch metadata in the Metadata calculator and Commitment generator components, and in the VM sandbox in the API server for fee estimation.
- protective_reads. Stores protective read information for each L1 batch, i.e., keys that influence VM execution for the batch but were not modified. Used when generating L1 batch metadata in the Commitment generator.
- factory_deps. Stores bytecodes of all deployed L2 contracts.
- storage. Obsolete, going to be removed; must not be used in new code.
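As a sketch of what a "source of truth" read looks like, the latest value for a key can be recovered by ordering the write logs; the column names (hashed_key, value, miniblock_number, operation_number) are assumptions for illustration:

```rust
use sqlx::PgPool;

/// Reads the latest value written under a storage key straight from
/// `storage_logs`. Column names are assumptions for illustration.
async fn latest_storage_value(
    pool: &PgPool,
    hashed_key: &[u8],
) -> sqlx::Result<Option<Vec<u8>>> {
    sqlx::query_scalar(
        "SELECT value FROM storage_logs \
         WHERE hashed_key = $1 \
         ORDER BY miniblock_number DESC, operation_number DESC \
         LIMIT 1",
    )
    .bind(hashed_key)
    .fetch_optional(pool)
    .await
}
```

The caching layers mentioned above exist so that hot paths don't have to run this kind of query against Postgres for every read.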
Other VM artifacts
- events. Stores all events (aka logs) emitted by smart contracts during VM execution.
- l2_to_l1_logs. Stores L2-to-L1 logs emitted by smart contracts during VM execution.
- call_traces. Stores call traces for transactions emitted during VM execution. (Unlike in L1 node implementations, call traces in Era are currently generated proactively for all transactions; see the sketch after this list.)
- tokens. Stores all ERC-20 tokens registered in the L1–L2 bridge.
- transaction_traces. Obsolete, going to be removed; must not be used in new code.
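Since call traces are generated proactively, fetching one is a plain lookup. This sketch assumes a tx_hash key and a serialized call_trace column, which may differ from the actual schema:

```rust
use sqlx::PgPool;

/// Fetches the serialized call trace of a transaction. Column names are
/// assumptions for illustration. Since Era stores traces for all
/// transactions, in principle every known transaction should have one.
async fn get_call_trace(pool: &PgPool, tx_hash: &[u8]) -> sqlx::Result<Option<Vec<u8>>> {
    sqlx::query_scalar("SELECT call_trace FROM call_traces WHERE tx_hash = $1")
        .bind(tx_hash)
        .fetch_optional(pool)
        .await
}
```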
Snapshot generation and recovery
See the snapshots_creator and snapshots_applier crates for an overview of application-level node snapshots.
- snapshots. Stores metadata for all snapshots generated by snapshots_creator, such as the L1 batch of the snapshot.
- snapshot_recovery. Stores metadata for the snapshot used during node recovery, if any. Currently, this table is expected to have no more than one row (see the sketch after this list).
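The at-most-one-row expectation maps naturally onto fetch_optional; a sketch (the l1_batch_number column name is an assumption):

```rust
use sqlx::PgPool;

/// Returns the L1 batch number of the snapshot used for node recovery, or
/// `None` if the node started from genesis. Relies on `snapshot_recovery`
/// holding at most one row; the column name is an assumption.
async fn recovery_batch_number(pool: &PgPool) -> sqlx::Result<Option<i64>> {
    sqlx::query_scalar("SELECT l1_batch_number FROM snapshot_recovery")
        .fetch_optional(pool)
        .await
}
```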
Logical invariants
In addition to foreign key constraints and other constraints manifested directly in the DB schema, the following invariants are expected to be upheld:
- If a header is present in the miniblocks table, it is expected that the DB contains all artifacts associated with the L2 block execution, such as events, l2_to_l1_logs, call_traces, tokens, etc. (See State keeper I/O logic for the exact definition of these artifacts.)
- Likewise, if a header is present in the l1_batches table, all artifacts associated with the L1 batch execution are also expected in the DB, e.g., initial_writes and protective_reads. (See State keeper I/O logic for the exact definition of these artifacts.)
- L2 blocks and L1 batches present in the DB form a continuous range of numbers. If a DB is recovered from a node snapshot, the first L2 block / L1 batch is the next one after the snapshot L2 block / L1 batch mentioned in the snapshot_recovery table. Otherwise, L2 blocks / L1 batches must start from number 0 (aka genesis).
- The address and key fields in the storage_logs table are not null for all blocks executed on the node (i.e., blocks whose header is present in miniblocks). On the other hand, address and key may be null for snapshot storage logs. These fields are needed by some components that post-process L1 batches, such as the Merkle tree and the commitment generator; both use (address, key) tuples to sort logs in a batch to get a canonical ordering (see the sketch after this list). Since a snapshot is not post-processed in this way, it is acceptable to skip these fields for snapshot logs (and only for them).
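To make the canonical ordering concrete, here is a minimal sketch of the sort that the post-processing components rely on; StorageLog is a simplified stand-in for the real type:

```rust
/// Simplified stand-in for a storage log; other columns are omitted.
struct StorageLog {
    address: [u8; 20],
    key: [u8; 32],
}

/// Sorts the logs of an L1 batch into canonical order. This is why `address`
/// and `key` must be non-null for all blocks executed on the node: a null in
/// either field would make the (address, key) ordering undefined.
fn canonicalize(logs: &mut [StorageLog]) {
    logs.sort_by_key(|log| (log.address, log.key));
}
```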
Contributing to DAL
Some tips and tricks to make contributing to DAL easier:
- If you want to add a new DB query, search the DAL code or the .sqlx directory for identical / equivalent queries. Reuse is almost always better than duplication.
- It usually makes sense to instrument your queries using the instrument tooling. See the instrument module docs for details, and the sketch after this list.
- It's best to cover added queries with unit tests to ensure they work and don't break in the future. sqlx has compile-time schema checking, but it's not a panacea.
- If there are doubts about query performance, run the query with an EXPLAIN / EXPLAIN ANALYZE prefix against a production-size database.
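As a rough illustration of the instrumentation pattern (exact trait paths and signatures live in the instrument module docs; the table, column, and surrounding type names below are assumptions for illustration):

```rust
use crate::instrument::InstrumentExt; // see the `instrument` module docs

// `Connection<'_, Core>` and `DalResult` stand for the connection and result
// types used throughout the DAL; treat them as assumptions here.
pub async fn get_batch_timestamp(
    storage: &mut Connection<'_, Core>,
    number: i64,
) -> DalResult<Option<i64>> {
    sqlx::query_scalar!("SELECT timestamp FROM l1_batches WHERE number = $1", number)
        .instrument("get_batch_timestamp") // names the query in logs and metrics
        .with_arg("number", &number) // recorded for slow / failing queries
        .fetch_optional(storage)
        .await
}
```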
Backward compatibility
All DB schema changes are expected to be backward-compatible. That is, old code must be able to function with the new schema. As an example, dropping / renaming columns is not allowed. Instead, a 2-phase migration should be used:
- The column should be marked as obsolete, with its mentions replaced in all queries. If the column should be renamed, a new column should be created and data (if any) should be copied from the old column (see also: Programmatic migrations).
- After a significant delay (order of months), the old column may be removed in a separate migration.
Programmatic migrations
We cannot afford a non-trivial amount of downtime caused by a data migration. That is, if a migration may cause such downtime (e.g., it copies a non-trivial amount of data), it must be organized as a programmatic migration and run in the node background, perhaps splitting the work into chunks with a delay between them so that the migration doesn't hog all DB resources (sketched below).
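A minimal sketch of such a background migration, assuming a hypothetical some_table whose old_column is being copied into new_column in chunks:

```rust
use std::time::Duration;

use sqlx::PgPool;

/// Sketch of a programmatic migration backfilling a hypothetical column in
/// chunks, with a delay between chunks to avoid hogging DB resources.
async fn backfill_new_column(pool: &PgPool) -> sqlx::Result<()> {
    const CHUNK_SIZE: i64 = 1_000;
    const DELAY: Duration = Duration::from_millis(100);

    loop {
        // Copy data from the obsolete column for one chunk of rows.
        let updated = sqlx::query(
            "UPDATE some_table SET new_column = old_column \
             WHERE id IN ( \
                 SELECT id FROM some_table \
                 WHERE new_column IS NULL AND old_column IS NOT NULL \
                 LIMIT $1 \
             )",
        )
        .bind(CHUNK_SIZE)
        .execute(pool)
        .await?
        .rows_affected();

        if updated == 0 {
            break; // Nothing left to migrate.
        }
        tokio::time::sleep(DELAY).await; // Yield DB resources between chunks.
    }
    Ok(())
}
```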