#ann-data #structure #layer #object #capabilities #ecosystem

anndata-memory

Thread-safe AnnData-like structure for single-cell genomics data in Rust. Provides controlled mutability, efficient memory management, and flexible data manipulation. Ideal for concurrent bioinformatics applications.

11 releases (1 stable)

new 1.0.0 Mar 27, 2025
0.1.2-alpha.0 Dec 23, 2024
0.1.1-alpha.2 Sep 11, 2024
0.1.0-alpha.3 Aug 28, 2024

#82 in Biology

113 downloads per month
Used in single_rust

Custom license

63KB
1K SLoC

AnnData-Memory

A high-performance, thread-safe, in-memory implementation of the AnnData data structure for the SingleRust ecosystem.

Overview

AnnData-Memory provides a thread-safe, high-performance implementation of the AnnData data structure for single-cell genomics data analysis in Rust. It serves as a companion to the anndata-rs crate, focusing on efficient in-memory operations with controlled mutability and concurrent access patterns.

This library is designed to:

  • Accelerate AnnData operations through optimized in-memory structures
  • Enable safe multi-threaded access to AnnData objects
  • Provide flexible and efficient data manipulation capabilities
  • Seamlessly integrate with the broader SingleRust ecosystem

Key Features

  • Thread-Safe Data Access: Built on parking_lot locks for efficient concurrent operations
  • Controlled Mutability: Fine-grained locking mechanisms allow for safe concurrent reads and writes
  • Memory Efficiency: Optimized data structures to reduce memory overhead
  • Format Conversion: Seamless conversion between CSR and CSC sparse matrix formats
  • Efficient Subsetting: Fast subsetting operations (both in-place and copy-based)
  • H5 Interoperability: Convert between H5-backed AnnData and in-memory structures
  • Comprehensive Data Model: Full support for AnnData components (X, obs, var, layers, obsm, obsp, varm, varp, uns)

Installation

Add AnnData-Memory to your Cargo.toml:

[dependencies]
anndata-memory = "1.0.0"

Usage

Creating an AnnData Object

use anndata::{ArrayData, data::DynCsrMatrix};
use anndata_memory::{IMAnnData, IMArrayElement};
use nalgebra_sparse::{CooMatrix, CsrMatrix};

// Create a 3 x 3 sparse matrix (3 cells x 3 genes, matching the names below)
let (nrows, ncols) = (3, 3);
let mut coo_matrix = CooMatrix::new(nrows, ncols);
coo_matrix.push(0, 0, 1.0);
coo_matrix.push(1, 2, 2.0);
// ... add more entries

let csr_matrix = CsrMatrix::from(&coo_matrix);
let matrix = DynCsrMatrix::from(csr_matrix);
let array_data = ArrayData::CsrMatrix(matrix);

// Create the AnnData object
let adata = IMAnnData::new_basic(
    array_data,
    vec!["cell1".to_string(), "cell2".to_string(), "cell3".to_string()],
    vec!["gene1".to_string(), "gene2".to_string(), "gene3".to_string()]
).unwrap();

Converting from H5-backed AnnData

use anndata::AnnData;
use anndata_hdf5::H5;
use anndata_memory::convert_to_in_memory;

// Open an H5-backed AnnData file
let h5_file = H5::open("data.h5ad").unwrap();
let anndata = AnnData::<H5>::open(h5_file).unwrap();

// Convert to in-memory representation
let imanndata = convert_to_in_memory(anndata).unwrap();

Working with Layers

use anndata_memory::{IMAnnData, IMArrayElement};

// Add a layer to the AnnData object.
// `normalized_data` is assumed to be array data prepared earlier (not shown here).
let layer_name = "normalized".to_string();
adata.add_layer(layer_name, normalized_data).unwrap();

// Retrieve a layer
let layer = adata.get_layer("normalized").unwrap();

Subsetting Data

use anndata::data::SelectInfoElem;

// Create selection criteria
let obs_selection = SelectInfoElem::Index(vec![0, 2]); // Select observations 0 and 2
let var_selection = SelectInfoElem::Index(vec![1, 2]); // Select variables 1 and 2

// Create a subset of the data (creating a new object)
let subset = adata.subset(&[&obs_selection, &var_selection]).unwrap();

// Or subset in-place
adata.subset_inplace(&[&obs_selection, &var_selection]).unwrap();
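
To illustrate the difference between copy-based and in-place subsetting, here is a minimal sketch on a plain row-major dense matrix. The `Dense` type and its methods are hypothetical stand-ins written for this example; they are not part of anndata-memory, and the crate's own implementation may differ.

```rust
/// Hypothetical row-major dense matrix, used only for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Dense {
    pub nrows: usize,
    pub ncols: usize,
    pub data: Vec<f64>,
}

impl Dense {
    /// Copy-based subsetting: builds a new matrix and leaves `self` untouched.
    pub fn subset(&self, rows: &[usize], cols: &[usize]) -> Dense {
        let mut data = Vec::with_capacity(rows.len() * cols.len());
        for &r in rows {
            for &c in cols {
                data.push(self.data[r * self.ncols + c]);
            }
        }
        Dense { nrows: rows.len(), ncols: cols.len(), data }
    }

    /// In-place subsetting: replaces the contents of `self` with the selection.
    pub fn subset_inplace(&mut self, rows: &[usize], cols: &[usize]) {
        *self = self.subset(rows, cols);
    }
}

/// Builds a 3 x 3 matrix holding 0.0..=8.0 in row-major order.
pub fn example_dense() -> Dense {
    Dense { nrows: 3, ncols: 3, data: (0..9).map(|v| v as f64).collect() }
}
```

Selecting rows 0 and 2 and columns 1 and 2, as in the snippet above, keeps four values. The copy-based call leaves the original 3 x 3 matrix available, while the in-place call discards everything outside the selection.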

Matrix Format Conversion

use anndata_memory::IMArrayElement;

// Get the X matrix and convert between CSR and CSC formats
let x = adata.x();
x.convert_matrix_format().unwrap(); // Converts CSR to CSC or vice versa
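
For readers unfamiliar with the two layouts, the following sketch shows what a CSR to CSC conversion does at the array level. The `Csr` and `Csc` structs and the `csr_to_csc` function are hypothetical and written only for this example; they are not the nalgebra_sparse types and not the crate's conversion routine.

```rust
/// Hypothetical CSR matrix: values are stored row by row.
#[derive(Debug, Clone, PartialEq)]
pub struct Csr {
    pub nrows: usize,
    pub ncols: usize,
    pub indptr: Vec<usize>,  // length nrows + 1
    pub indices: Vec<usize>, // column index of each stored value
    pub values: Vec<f64>,
}

/// Hypothetical CSC matrix: values are stored column by column.
#[derive(Debug, Clone, PartialEq)]
pub struct Csc {
    pub nrows: usize,
    pub ncols: usize,
    pub indptr: Vec<usize>,  // length ncols + 1
    pub indices: Vec<usize>, // row index of each stored value
    pub values: Vec<f64>,
}

/// Converts CSR to CSC with a counting pass followed by a scatter pass.
pub fn csr_to_csc(m: &Csr) -> Csc {
    let nnz = m.values.len();
    // Count the stored entries in each column.
    let mut indptr = vec![0usize; m.ncols + 1];
    for &c in &m.indices {
        indptr[c + 1] += 1;
    }
    // A prefix sum turns the counts into column start offsets.
    for c in 0..m.ncols {
        indptr[c + 1] += indptr[c];
    }
    // next[c] is the next free position inside column c.
    let mut next = indptr.clone();
    let mut indices = vec![0usize; nnz];
    let mut values = vec![0.0; nnz];
    for row in 0..m.nrows {
        for k in m.indptr[row]..m.indptr[row + 1] {
            let col = m.indices[k];
            let dst = next[col];
            indices[dst] = row;
            values[dst] = m.values[k];
            next[col] += 1;
        }
    }
    Csc { nrows: m.nrows, ncols: m.ncols, indptr, indices, values }
}

/// 3 x 3 example with entries (0,0)=1.0, (1,2)=2.0, (2,0)=3.0.
pub fn example_csr() -> Csr {
    Csr {
        nrows: 3,
        ncols: 3,
        indptr: vec![0, 1, 2, 3],
        indices: vec![0, 2, 0],
        values: vec![1.0, 2.0, 3.0],
    }
}
```

CSR keeps each row's values contiguous, which suits per-cell (row-wise) access; CSC keeps each column's values contiguous, which suits per-gene (column-wise) access.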

Thread Safety

AnnData-Memory is designed for safe concurrent access. The IMAnnData structure itself isn't wrapped in a lock, but each of its fields (x, obs, var, layers, etc.) is individually wrapped in a thread-safe RwSlot that allows multiple readers or a single writer at any time. This provides fine-grained control over concurrency.

use std::thread;
use std::sync::{Arc, RwLock};
use anndata_memory::IMAnnData;

// For thread-safe access to the whole object, wrap it in Arc<RwLock<>>
let adata = Arc::new(RwLock::new(adata));

// Example 1: Multiple threads accessing individual fields (safer)
let handles: Vec<_> = (0..10).map(|i| {
    let adata_clone = Arc::clone(&adata);
    thread::spawn(move || {
        // Lock the whole object only briefly to get references to fields
        let data = adata_clone.read().unwrap();
        
        // Now work with the thread-safe fields
        let x = data.x(); // Each field is already in a RwSlot
        let shape = x.get_shape().unwrap();
        
        // Process field-specific data...
        println!("Thread {} working with matrix of shape {:?}", i, shape);
    })
}).collect();

// Example 2: When you need to modify the IMAnnData structure itself
// (obs_selection and var_selection are the SelectInfoElem values from "Subsetting Data")
let writer_handle = {
    let adata_clone = Arc::clone(&adata);
    thread::spawn(move || {
        // Get write lock on the entire object
        let mut data = adata_clone.write().unwrap();

        // Now you can safely modify any aspect of the IMAnnData
        data.subset_inplace(&[&obs_selection, &var_selection]).unwrap();
    })
};

// Wait for all threads to complete, including the writer
for handle in handles {
    handle.join().unwrap();
}
writer_handle.join().unwrap();

Note: When mutating from multiple threads, take extra care to avoid races between the per-field locks, because IMAnnData itself isn't thread-safe (only its individual fields are). For multi-threaded writes, consider wrapping your IMAnnData instance in a RwLock or Mutex, or use the deep_clone() method to give each thread an independent copy.
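
The per-field locking idea can be sketched with the standard library alone. In the example below, `Container` is a hypothetical stand-in with two individually locked fields; it uses `std::sync::RwLock` and is not the crate's `IMAnnData` or `RwSlot`. Several threads read one field while another thread writes a different field, and neither group blocks the other.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical container whose fields are locked individually.
pub struct Container {
    pub x: Arc<RwLock<Vec<f64>>>,
    pub obs_names: Arc<RwLock<Vec<String>>>,
}

/// Reads `x` from several threads while one thread appends to `obs_names`.
/// Returns the sum of `x` and the final number of observation names.
pub fn run_demo() -> (f64, usize) {
    let c = Container {
        x: Arc::new(RwLock::new(vec![1.0, 2.0, 3.0])),
        obs_names: Arc::new(RwLock::new(vec!["cell1".to_string()])),
    };

    // Readers share the read lock on `x`.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let x = Arc::clone(&c.x);
            thread::spawn(move || {
                let guard = x.read().unwrap();
                let sum: f64 = guard.iter().sum();
                sum
            })
        })
        .collect();

    // The writer takes the write lock on a different field only.
    let names = Arc::clone(&c.obs_names);
    let writer = thread::spawn(move || {
        names.write().unwrap().push("cell2".to_string());
    });

    let mut total = 0.0;
    for r in readers {
        total = r.join().unwrap();
    }
    writer.join().unwrap();

    let n = c.obs_names.read().unwrap().len();
    (total, n)
}
```

An operation that must update several fields consistently still needs coordination above the field level, which is why an outer lock or independent copies are suggested above.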

Performance Considerations

  • Use get_layer_shallow() for read-only access to layers to avoid unnecessary cloning
  • Consider converting between CSR and CSC formats based on your access patterns (row-wise vs. column-wise)
  • For multi-threaded applications, balance the granularity of operations to minimize lock contention
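
The cost difference between a shallow handle and a deep copy can be sketched as follows. `Slot` is a hypothetical type built on `Arc<RwLock<T>>` for this example; whether `get_layer_shallow()` and `deep_clone()` are implemented this way inside the crate is an assumption.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical shared slot, used only to contrast shallow and deep copies.
pub struct Slot<T>(Arc<RwLock<T>>);

impl<T: Clone> Slot<T> {
    pub fn new(value: T) -> Self {
        Slot(Arc::new(RwLock::new(value)))
    }

    /// Shallow copy: a new handle to the same underlying data (no data copied).
    pub fn shallow_clone(&self) -> Self {
        Slot(Arc::clone(&self.0))
    }

    /// Deep copy: clones the data into a new, independent slot.
    pub fn deep_clone(&self) -> Self {
        Slot::new(self.0.read().unwrap().clone())
    }

    /// Returns a copy of the current value.
    pub fn get(&self) -> T {
        self.0.read().unwrap().clone()
    }

    /// Overwrites the current value.
    pub fn set(&self, value: T) {
        *self.0.write().unwrap() = value;
    }
}

/// Writes through the original handle, then reads both copies back.
pub fn shallow_vs_deep() -> (Vec<f64>, Vec<f64>) {
    let original = Slot::new(vec![1.0, 2.0]);
    let shallow = original.shallow_clone();
    let deep = original.deep_clone();
    original.set(vec![9.0]);
    (shallow.get(), deep.get())
}
```

In this sketch the shallow handle observes the later write because it shares storage with the original, while the deep copy keeps the old values. A shallow handle costs one reference-count increment; a deep copy costs a full clone of the data.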

Architecture

AnnData-Memory uses a component-based architecture:

  • IMAnnData: The main container structure, containing individually thread-safe fields
  • IMArrayElement: Thread-safe wrapper for array data (using RwSlot)
  • IMDataFrameElement: Thread-safe wrapper for DataFrames with index (using RwSlot)
  • IMAxisArrays: Thread-safe collection of arrays associated with an axis (using RwSlot)
  • IMElementCollection: Thread-safe collection of unstructured annotations (using RwSlot)
  • RwSlot: Basic building block providing controlled access to data with read-write locking

Limitations

  • View support is limited (subsetting creates copies, not views)
  • The IMAnnData structure itself isn't thread-safe, only its individual fields are
  • Some operations may involve lock races when writing to multiple fields from different threads
  • Care must be taken with concurrent operations to prevent deadlocks (as noted in the source code comments)
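
One common way to avoid deadlocks when an operation needs more than one lock is to acquire the locks in the same order everywhere. The sketch below uses two plain `std::sync::RwLock` values as hypothetical stand-ins for two fields; it illustrates the general technique and is not taken from the crate's source.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Moves one unit from `field_a` to `field_b` in each of eight threads.
/// Every thread locks `field_a` first and `field_b` second, so no thread
/// can hold one lock while waiting for a thread that holds the other.
pub fn transfer_in_fixed_order() -> (i32, i32) {
    let field_a = Arc::new(RwLock::new(10));
    let field_b = Arc::new(RwLock::new(0));

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let (a, b) = (Arc::clone(&field_a), Arc::clone(&field_b));
            thread::spawn(move || {
                // Fixed order: always `a`, then `b`.
                let mut ga = a.write().unwrap();
                let mut gb = b.write().unwrap();
                *ga -= 1;
                *gb += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let ra = *field_a.read().unwrap();
    let rb = *field_b.read().unwrap();
    (ra, rb)
}
```

If some threads locked `field_b` first instead, two threads could each hold one lock and wait forever for the other.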

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the BSD 3-Clause License - see the LICENSE.md file for details.

Acknowledgments

  • anndata-rs team for the core AnnData implementation in Rust
  • The SingleRust ecosystem contributors

Dependencies

~64MB
~1M SLoC