1 unstable release
Uses new Rust 2024
new 0.1.0 | Mar 30, 2025 |
---|
#410 in #table
Used in tpchgen-cli
3.5MB
4K
SLoC
TPCH DataGenerator in Arrow format
This crate generates TPCH data directly into Apache Arrow format using the arrow crate
Example usage:
See docs.rs: TODO once this is published
Testing:
This crate ensures correct results using two methods.
- Basic functional tests are in Rust doc tests in the source code (
cargo test --doc
) - The
reparse
integration test ensures that the Arrow generators produce the same results as parsing the originaltbl
format (cargo test --test reparse
)
Contributing:
Please see CONTRIBUTING.md for more information on how to contribute to this project.
lib.rs
:
Generate TPCH data as Arrow RecordBatches
This crate provides generators for TPCH tables that directly produces
Arrow RecordBatch
es. This is significantly faster than generating TBL or CSV
files and then parsing them into Arrow.
Example
// Create a SF=1 generator for the LineItem table
let generator = LineItemGenerator::new(1.0, 1, 1);
let mut arrow_generator = LineItemArrow::new(generator)
.with_batch_size(10);
// The generator is a Rust iterator, producing RecordBatch
let batch = arrow_generator.next().unwrap();
// compare the output by pretty printing it
let formatted_batches = pretty_format_batches(&[batch]).unwrap().to_string();
assert_eq!(formatted_batches.lines().collect::<Vec<_>>(), vec![
"+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------------------+",
"| l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |",
"+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------------------+",
"| 1 | 155190 | 7706 | 1 | 17.00 | 21168.23 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | egular courts above the |",
"| 1 | 67310 | 7311 | 2 | 36.00 | 45983.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | MAIL | ly final dependencies: slyly bold |",
"| 1 | 63700 | 3701 | 3 | 8.00 | 13309.60 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | riously. regular, express dep |",
"| 1 | 2132 | 4633 | 4 | 28.00 | 28955.64 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | lites. fluffily even de |",
"| 1 | 24027 | 1534 | 5 | 24.00 | 22824.48 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | pending foxes. slyly re |",
"| 1 | 15635 | 638 | 6 | 32.00 | 49620.16 | 0.07 | 0.02 | N | O | 1996-01-30 | 1996-02-07 | 1996-02-03 | DELIVER IN PERSON | MAIL | arefully slyly ex |",
"| 2 | 106170 | 1191 | 1 | 38.00 | 44694.46 | 0.00 | 0.05 | N | O | 1997-01-28 | 1997-01-14 | 1997-02-02 | TAKE BACK RETURN | RAIL | ven requests. deposits breach a |",
"| 3 | 4297 | 1798 | 1 | 45.00 | 54058.05 | 0.06 | 0.00 | R | F | 1994-02-02 | 1994-01-04 | 1994-02-23 | NONE | AIR | ongside of the furiously brave acco |",
"| 3 | 19036 | 6540 | 2 | 49.00 | 46796.47 | 0.10 | 0.00 | R | F | 1993-11-09 | 1993-12-20 | 1993-11-24 | TAKE BACK RETURN | RAIL | unusual accounts. eve |",
"| 3 | 128449 | 3474 | 3 | 27.00 | 39890.88 | 0.06 | 0.07 | A | F | 1994-01-16 | 1993-11-22 | 1994-01-23 | DELIVER IN PERSON | SHIP | nal foxes wake. |",
"+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------------------+"
]);
Dependencies
~15–22MB
~296K SLoC