5 releases
0.2.1 | Feb 26, 2020 |
---|---|
0.2.0 | Feb 26, 2020 |
0.1.2 | Oct 21, 2019 |
0.1.1 | Jan 1, 2018 |
0.1.0 | Jan 1, 2018 |
#2234 in Algorithms
24KB
312 lines
Lazy Transducer
Lazy transducers are generic, lazy, parallel, indexable iterators transforming one data source into n
output data types.
See the online documentation for more information.
Using
Add this to your Cargo.toml
[dependencies]
lazy_transducer = "0.1"
Example
extern crate lazy_transducer;
extern crate rayon;
use rayon::prelude::*;
use lazy_transducer::LazyTransducer;
fn main() {
let data = [0xdeadbeefu32, 0xcafed00d];
let lt: LazyTransducer<&[u32], u64> = LazyTransducer::new(&data, 2, |input, idx| input[idx] as u64);
let cafedood = lt.get(1).expect("has 2 elements");
assert_eq!(cafedood, 0xcafed00d);
lt.into_par_iter().for_each(|elem| {
println!("{}", elem);
});
}
lib.rs
:
Introduction
A lazy transducer is a declarative specification for the transformation of one type of data to another.
The transformer is called the transducer, which receives the original source input, and an index
corresponding to the i
th element, and returns the corresponding i
th element out of that source.
Importantly, lazy transducers are:
- Lazy - it never parses any elements unless you request it to do so
- Iterable - one can iterate over every element
- Indexable - accessing an element is O(1)
- Parallel - one can iterate in parallel over every element
When constructing a lazy transducer, it's important to remember that the transducer must be a fn
with signature (Input, usize -> Output)
; a closure which does not capture its environment also
can be used.
The input source needs to be copy; i.e., it should typically be a reference to Vec<_>
, or a slice of bytes,
etc.
Another detail that you should keep in mind is that the index is the i
th element; therefore if
your backing data source is bytes, you need to know the fixed length of your data structure you
are parsing out of the byte slice. It is up to the transducer implementation to determine this.
In some cases, the size of the data structure/output element isn't known statically via sizeof
, etc. In these cases
it is suggested to pass this "context" in the input source during construction as a tuple, e.g.:
(datasource, context)
, and have your transducer use the datasource + context + the index
to correctly
marshall your data structure out of the bytes.
This is the approach that the scroll-based transducer takes. See also the bincode example for a similar approach.
The parallel implementation uses rayon.
Example
extern crate lazy_transducer;
extern crate rayon;
use rayon::prelude::*;
use lazy_transducer::LazyTransducer;
use std::mem::size_of;
use std::mem::transmute;
let bytes: Vec<u8> = vec![1u8, 0, 2, 0, 3, 0, 4, 0];
let lt: LazyTransducer<_, &u16> = LazyTransducer::new(&bytes, 4, |input, index| {
let start = size_of::<u16>() * index;
unsafe { transmute::<_, &u16>(&input[start]) }
});
// now iterate in parallel
lt.into_par_iter().for_each(|n| {
// do stuff
println!("{}", n);
});
Dependencies
~1.4–2MB
~42K SLoC