#cache #low-latency #distributed #storage #byte-vector #sha-256

yanked blobnet

Non-volatile, distributed file cache backed by content-addressed storage

25 releases

0.3.14 Jun 20, 2023
0.3.8 May 29, 2023
0.3.5 Mar 30, 2023
0.2.7 Dec 30, 2022
0.1.0 Jul 26, 2022

#5 in #sha256

45 downloads per month

MIT license

125KB
2.5K SLoC

blobnet


A configurable, low-latency blob storage server for content-addressed data.

See the API documentation for more information.

Installation

cargo install blobnet  # install server CLI / binary
cargo add blobnet      # add to your project

Authors

This library was created by the team behind Modal.


lib.rs:

Blobnet

A configurable, low-latency blob storage server for content-addressed data.

This acts as a non-volatile, over-the-network content cache. Clients can add binary blobs (fixed-size byte vectors) to the cache, and the data is indexed by its SHA-256 hash. Any blob can be retrieved given its hash and the range of bytes to read.

Data stored in blobnet is locally cached and durable.
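The content-addressing model described above can be sketched with a toy in-memory store. This is only an illustration, not blobnet's implementation: `ToyStore` is a hypothetical type, and `DefaultHasher` stands in for SHA-256 so the sketch needs no extra crates. The half-open `(start, end)` range and the clamping behavior are assumptions of this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy content-addressed store: the key is derived from the bytes themselves.
/// NOTE: `DefaultHasher` is only a stand-in; blobnet indexes blobs by SHA-256,
/// which this sketch does not compute.
#[derive(Default)]
pub struct ToyStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl ToyStore {
    /// Store a blob and return its content-derived key as a hex string.
    pub fn put(&mut self, data: &[u8]) -> String {
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let key = format!("{:016x}", hasher.finish());
        // Identical content maps to the same key, so duplicates are stored once.
        self.blobs.entry(key.clone()).or_insert_with(|| data.to_vec());
        key
    }

    /// Return the blob size, or `None` if the key is unknown.
    pub fn head(&self, key: &str) -> Option<usize> {
        self.blobs.get(key).map(|b| b.len())
    }

    /// Read a blob, optionally restricted to the half-open byte range `(start, end)`.
    /// The range is clamped to the blob length (an assumption of this sketch).
    pub fn get(&self, key: &str, range: Option<(usize, usize)>) -> Option<&[u8]> {
        let blob = self.blobs.get(key)?;
        let (start, end) = range.unwrap_or((0, blob.len()));
        let end = end.min(blob.len());
        let start = start.min(end);
        Some(&blob[start..end])
    }
}
```

Because the key depends only on the content, inserting the same bytes twice returns the same key and stores the data once.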

Providers

The core of blobnet is the Provider trait. This trait defines the interface shared by all blobnet instances. It is used like so:

use blobnet::ReadStream;
use blobnet::provider::{self, Provider};

// Assumes a Tokio runtime, since the provider methods are async.
// The boxed error type is an assumption made to keep the example short.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a new provider.
    let provider = provider::Memory::new();

    // Insert data, returning its hash.
    let data: ReadStream = Box::pin(b"hello blobnet world!" as &[u8]);
    let hash = provider.put(data).await?;

    // Check if a blob exists and return its size.
    let size = provider.head(&hash).await?;
    assert_eq!(size, 20);

    // Read the content as a binary stream.
    provider.get(&hash, None).await?;
    provider.get(&hash, Some((0, 10))).await?; // Requests the first 10 bytes.

    Ok(())
}

You can combine these operations in any order, and they can run in parallel, since they take shared &self receivers. Each operation should behave the same regardless of provider.
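To illustrate why `&self` receivers permit parallel use, here is a minimal sketch with a hypothetical `SharedStore` type. It uses OS threads and an `RwLock` from the standard library rather than blobnet's async providers, and it takes caller-chosen integer keys instead of content hashes, purely to keep the example short.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

/// Hypothetical in-memory store whose methods all take `&self`.
/// Interior mutability (an `RwLock`) is what lets callers share it across threads.
#[derive(Default)]
pub struct SharedStore {
    blobs: RwLock<HashMap<u64, Vec<u8>>>,
}

impl SharedStore {
    /// Insert a blob under a caller-chosen key.
    pub fn put(&self, key: u64, data: Vec<u8>) {
        self.blobs.write().unwrap().insert(key, data);
    }

    /// Return the blob size, or `None` if the key is unknown.
    pub fn head(&self, key: u64) -> Option<usize> {
        self.blobs.read().unwrap().get(&key).map(|b| b.len())
    }
}

/// Insert `n` blobs from `n` threads that all borrow the same store.
pub fn parallel_puts(store: &SharedStore, n: u64) {
    thread::scope(|s| {
        for i in 0..n {
            // Each thread only needs a shared reference to the store.
            s.spawn(move || store.put(i, vec![0u8; i as usize]));
        }
    });
}
```

If the methods took `&mut self` instead, the borrow checker would reject sharing the store between the spawned threads.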

The Provider trait is public, and several providers are offered, supporting storage in a local directory, on a network file system, or in AWS S3.

Network Server

Blobnet can also run as a server that answers blob operations over the network using the HTTP/2 protocol. For example, you can run a blobnet server on a local machine with:

export BLOBNET_SECRET=my-secret
blobnet --source localdir:/tmp/blobnet --port 7609

This specifies the provider using a string syntax for the --source flag. You can connect to the server as a provider in another process:

use blobnet::{client::FileClient, provider};

let client = FileClient::new_http("http://localhost:7609", "my-secret");
let provider = provider::Remote::new(client);
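The `--source` value shown earlier has a `kind:argument` shape. As a rough illustration of that shape only, the hypothetical helper below splits such a string at the first colon. It is not blobnet's parser: only the `localdir:` form appears in the text, and the real CLI's grammar may differ.

```rust
/// Split a source string such as "localdir:/tmp/blobnet" into (kind, argument).
/// Hypothetical helper: it splits at the first ':' and rejects empty halves.
pub fn split_source(source: &str) -> Option<(&str, &str)> {
    let (kind, arg) = source.split_once(':')?;
    if kind.is_empty() || arg.is_empty() {
        return None;
    }
    Some((kind, arg))
}
```

Splitting at the first colon keeps any later colons inside the argument intact.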

Why would you want to share a blobnet server over the network? One use case is for shared caches.

Caching

Blobnet supports two-tiered caching of data with the Cached provider. It breaks files into chunks of a configurable page size and stores them in a local cache directory and an in-memory page cache. Adding a cache in non-volatile storage can speed up file operations by multiple orders of magnitude compared to reading directly from a network file system. For example:

use blobnet::provider;

// Create a new provider targeting a local NFS mount.
let provider = provider::LocalDir::new("/mnt/nfs");

// Add a caching layer on top of the provider, with 2 MiB page size.
let provider = provider::Cached::new(provider, "/tmp/blobnet-cache", 1 << 21);
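The page size determines which chunks a ranged read touches. The arithmetic can be sketched as follows, assuming half-open byte ranges and pages numbered from zero. `pages_for_range` is a hypothetical helper, not part of blobnet's API.

```rust
/// Page size used in the example above: 2 MiB.
pub const PAGE_SIZE: u64 = 1 << 21;

/// Indices of the pages that overlap the half-open byte range `start..end`.
pub fn pages_for_range(start: u64, end: u64, page_size: u64) -> std::ops::Range<u64> {
    if end <= start {
        return 0..0; // An empty read touches no pages.
    }
    let first = start / page_size;
    let last = (end - 1) / page_size; // Page holding the final byte.
    first..last + 1
}
```

A read that straddles a page boundary touches two pages even if it is only a few bytes long, which is why smaller pages reduce over-fetching while larger pages reduce the number of requests.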

Caching is also useful for accessing remote blobnet servers. Caches compose well: you can add more tiers to the dataflow, improving system efficiency and reducing network load.

use blobnet::{client::FileClient, provider};

// Create a new provider fetching content over the network.
let client = FileClient::new_http("http://localhost:7609", "my-secret");
let provider = provider::Remote::new(client);

// Add a caching layer on top of the provider, with 2 MiB page size.
let provider = provider::Cached::new(provider, "/tmp/blobnet-cache", 1 << 21);
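The read path through a tiered cache can be sketched as a read-through lookup: check the fastest tier first, fall back to slower ones, and fill the faster tiers on the way back. The `TwoTier` type below is hypothetical and uses plain maps for both tiers. It is not how the Cached provider is implemented, and it omits eviction policy and concurrency.

```rust
use std::collections::HashMap;

/// Hypothetical two-tier read-through cache over a slow origin.
/// `memory` stands in for the in-memory page cache and `disk` for the local
/// cache directory; both are plain maps so the sketch stays self-contained.
pub struct TwoTier<F: Fn(u64) -> Vec<u8>> {
    memory: HashMap<u64, Vec<u8>>,
    disk: HashMap<u64, Vec<u8>>,
    origin: F,
    /// Number of times the origin had to be consulted.
    pub origin_reads: usize,
}

impl<F: Fn(u64) -> Vec<u8>> TwoTier<F> {
    pub fn new(origin: F) -> Self {
        Self { memory: HashMap::new(), disk: HashMap::new(), origin, origin_reads: 0 }
    }

    /// Fetch a page, checking the fastest tier first and filling tiers on a miss.
    pub fn get_page(&mut self, page: u64) -> Vec<u8> {
        if let Some(bytes) = self.memory.get(&page) {
            return bytes.clone();
        }
        if let Some(bytes) = self.disk.get(&page) {
            let bytes = bytes.clone();
            self.memory.insert(page, bytes.clone()); // Promote to the memory tier.
            return bytes;
        }
        self.origin_reads += 1;
        let bytes = (self.origin)(page);
        self.disk.insert(page, bytes.clone());
        self.memory.insert(page, bytes.clone());
        bytes
    }

    /// Simulate memory pressure by dropping the in-memory tier.
    pub fn evict_memory(&mut self) {
        self.memory.clear();
    }
}
```

After the memory tier is dropped, a repeated read is served from the second tier and does not reach the origin again.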

Together, these abstractions let you build a configurable, low-latency content-addressed storage system.

Dependencies

~68MB
~1M SLoC