#read-write #async-read #parquet #async-write #thrift #hadoop #file-format

parquet-format-async-temp

Temporary crate containing thrift library + parquet definitions compiled to support read+write async

5 unstable releases

0.3.1 Jun 14, 2022
0.3.0 Mar 15, 2022
0.2.0 Aug 28, 2021
0.1.1 Aug 10, 2021
0.1.0 Aug 8, 2021

#611 in Asynchronous

1,093 downloads per month

Apache-2.0

465KB
10K SLoC

parquet-format-async-temp

This is a temporary crate containing a subset of Rust's thrift library and the parquet definitions, to support native async parquet read and write.

Specifically, it:

  • supports async read API (via futures)
  • supports async write API (via futures)
  • the write API returns the number of written bytes

It must be used with the fork of thrift's compiler available at https://github.com/jorgecarleitao/thrift/tree/write_size.

Why

To read and write files with thrift (e.g. parquet) without committing to a particular runtime (e.g. tokio, hyper, etc.), the protocol needs to support AsyncRead + AsyncSeek for reading and AsyncWrite for writing.
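As a rough illustration of why reading needs seeking: a parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes PAR1, so a reader has to jump to the end before it can locate the thrift-encoded metadata. The sketch below uses the synchronous std traits only to stay self-contained; the same steps would apply with AsyncRead + AsyncSeek. The helper name read_metadata_len is hypothetical and is not part of this crate.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Magic bytes that terminate every parquet file.
const PARQUET_MAGIC: [u8; 4] = *b"PAR1";

/// Hypothetical helper (not this crate's API): returns the length in bytes
/// of the thrift-encoded file metadata stored just before the file tail.
pub fn read_metadata_len<R: Read + Seek>(reader: &mut R) -> io::Result<u32> {
    // The last 8 bytes are: 4-byte little-endian metadata length + "PAR1".
    reader.seek(SeekFrom::End(-8))?;
    let mut tail = [0u8; 8];
    reader.read_exact(&mut tail)?;

    // Reject inputs that do not end with the parquet magic.
    if tail[4..] != PARQUET_MAGIC[..] {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing PAR1 magic at end of file",
        ));
    }

    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}
```

With the length in hand, a reader would seek back by that many bytes (plus the 8-byte tail) and hand the stream to the thrift protocol to decode the metadata.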

To avoid requiring Seek and AsyncSeek on write, each write_* method of the protocol must return the number of bytes it wrote, so that the writer can track its position in the output by summing these counts.
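A minimal sketch of this idea, assuming a made-up length-prefixed encoding rather than the real thrift protocol: every write returns its byte count, and the caller derives the offset of each chunk from the running total without ever querying the stream position. It is written against the synchronous std::io::Write to stay self-contained; write_bytes and write_chunks are hypothetical names, not functions of this crate.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a protocol `write_*` method: writes a
/// length-prefixed byte string and returns how many bytes were written.
/// Assumes the payload is shorter than 4 GiB.
pub fn write_bytes<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<usize> {
    let len = (payload.len() as u32).to_le_bytes();
    writer.write_all(&len)?;
    writer.write_all(payload)?;
    Ok(len.len() + payload.len())
}

/// Writes each chunk and records the offset at which it starts.
/// Only `Write` is required: offsets come from the returned byte counts.
/// Returns the chunk offsets and the total number of bytes written.
pub fn write_chunks<W: Write>(
    writer: &mut W,
    chunks: &[&[u8]],
) -> io::Result<(Vec<u64>, u64)> {
    let mut offsets = Vec::with_capacity(chunks.len());
    let mut position = 0u64;
    for chunk in chunks {
        offsets.push(position);
        position += write_bytes(writer, chunk)? as u64;
    }
    Ok((offsets, position))
}
```

A parquet writer needs this kind of bookkeeping because the file metadata written at the end refers to byte offsets of data written earlier; with the counts returned by each write, those offsets are known even when the sink is a non-seekable stream.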

This crate addresses these two concerns for parquet. It is essentially:

Dependencies

~1–1.8MB
~35K SLoC