9 releases
new 0.3.1 | Nov 26, 2024 |
---|---|
0.3.0 | Nov 19, 2024 |
0.2.5 | Nov 16, 2024 |
0.1.0 | Nov 16, 2024 |
#261 in Data structures
1,022 downloads per month
37KB
287 lines
Chunk
A Rust implementation of a persistent data structure that provides O(1) append and concatenation operations through structural sharing.
Features
- O(1) Append/Prepend Operations: Add elements to your chunk in constant time
- O(1) Concatenation: Combine two chunks efficiently
- Immutable/Persistent: All operations create new versions while preserving the original
- Memory Efficient: Uses structural sharing via reference counting
- Safe Rust: Implemented using 100% safe Rust
Theoretical Background
This implementation is inspired by the concepts presented in Hinze and Paterson's work on Finger Trees[^1], though simplified for our specific use case. While our implementation differs in structure, it shares similar performance goals and theoretical foundations.
Relationship to Finger Trees
Finger Trees are a functional data structure that supports:
- Access to both ends in amortized constant time
- Concatenation in logarithmic time
- Persistence through structural sharing
Our Chunk
implementation achieves similar goals through a simplified approach:
- We use
Append
nodes for constant-time additions - The
Concat
variant enables efficient concatenation Rc
(Reference Counting) provides persistence and structural sharing
Like Finger Trees, our structure can be viewed as an extension of Okasaki's implicit deques[^2], but optimized for our specific use cases. While Finger Trees offer a more general-purpose solution with additional capabilities, our implementation focuses on providing:
- Simpler implementation
- More straightforward mental model
- Specialized performance characteristics for append/concat operations
Performance Trade-offs
While Finger Trees achieve logarithmic time for concatenation, our implementation optimizes for constant-time operations through lazy evaluation. This means:
- Append, Prepend and concatenation are always O(1)
- The cost is deferred to when we need to materialize the sequence (via
as_vec()
) - Memory usage grows with the number of operations until materialization
This trade-off is particularly beneficial in scenarios where:
- Write operations need to be performed extensively
- Multiple transformations are chained
- Not all elements need to be materialized
- Structural sharing can be leveraged across operations
Installation
Add this to your Cargo.toml
:
[dependencies]
tailcall-chunk = "0.1.0"
Quick Start
use chunk::Chunk;
// Create a new chunk and append some elements
let chunk1 = Chunk::default()
.append(1)
.append(2);
// Create another chunk
let chunk2 = Chunk::default()
.append(3)
.append(4);
// Concatenate chunks in O(1) time
let combined = chunk1.concat(chunk2);
// Convert to vector when needed
assert_eq!(combined.as_vec(), vec![1, 2, 3, 4]);
Detailed Usage
Working with Custom Types
use chunk::Chunk;
#[derive(Debug, PartialEq)]
struct Person {
name: String,
age: u32,
}
let people = Chunk::default()
.append(Person {
name: "Alice".to_string(),
age: 30
})
.append(Person {
name: "Bob".to_string(),
age: 25
});
// Access elements
let people_vec = people.as_vec();
assert_eq!(people_vec[0].name, "Alice");
assert_eq!(people_vec[1].name, "Bob");
Memory Efficiency
The Chunk
type uses structural sharing through reference counting (Rc
), which means:
- Appending or concatenating chunks doesn't copy the existing elements
- Memory is automatically freed when no references remain
- Multiple versions of the data structure can coexist efficiently
use chunk::Chunk;
let original = Chunk::default().append(1).append(2);
let version1 = original.clone().append(3); // Efficient cloning
let version2 = original.clone().append(4); // Both versions share data
Performance Characteristics
Operation | Time Complexity | Space Complexity |
---|---|---|
new() |
O(1) | O(1) |
append() |
O(1) | O(1) |
concat() |
O(1) | O(1) |
transform() |
O(1) | O(1) |
transform_flatten() |
O(1) | O(1) |
as_vec() |
O(n) | O(n) |
clone() |
O(1) | O(1) |
Benchmark Comparison
The following table compares the actual performance of Chunk vs Vector operations based on our benchmarks (lower is better):
Operation | Chunk Performance | Vector Performance | Faster |
---|---|---|---|
Append | 604.02 µs | 560.58 µs | Vec is ~1.08 times faster |
Prepend | 1.63 ms | 21.95 ms | Chunk is ~13.47 times faster |
Concat | 71.17 ns | 494.45 µs | Chunk is ~6,947 times faster |
Clone | 4.16 ns | 1.10 µs | Chunk is ~264 times faster |
Note: These benchmarks represent specific test scenarios and actual performance may vary based on usage patterns. Chunk operations are optimized for bulk operations and scenarios where structural sharing provides benefits. View the complete benchmark code and results in our operations.rs benchmark file.
Implementation Details
The Chunk<A>
type is implemented as an enum with four variants:
Empty
: Represents an empty chunkSingle
: Represents a chunk with a single elementConcat
: Represents the concatenation of two chunksTransformFlatten
: Represents a lazy transformation and flattening of elements
The data structure achieves its performance characteristics through:
- Structural sharing using
Rc
- Lazy evaluation of concatenation and transformations
- Immutable operations that preserve previous versions
Contributing
We welcome contributions! Here's how you can help:
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature
) - Commit your changes (
git commit -am 'Add some amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
Please make sure to:
- Update documentation
- Add tests for new features
- Follow the existing code style
- Update the README.md if needed
License
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
References
[^1]: Ralf Hinze and Ross Paterson. "Finger Trees: A Simple General-purpose Data Structure", Journal of Functional Programming 16(2):197-217, 2006. [^2]: Chris Okasaki. "Purely Functional Data Structures", Cambridge University Press, 1998.