4 releases
Version | Released |
---|---|
0.1.3 | Feb 23, 2023 |
0.1.2 | Feb 23, 2023 |
0.1.1 | Feb 23, 2023 |
0.1.0 | Feb 10, 2023 |
CSV Uploader
A custom CSV -> DB uploader program.
Speed
Trust me, you'll need speed when uploading 5M records.
Parallelized as a two-step process that runs in a loop:
- We buffer records in an array as we read and parse (e.g. 1000 records). This is the reader (main thread).
- Once that array fills up, we push the asynchronous upload future/task onto a stack to be executed (e.g. by 4 uploader threads).
Warning!: the parallelization between threads (step 2) is still being worked on. I'm still reading up on the tokio library lol. :)
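The two steps above can be sketched roughly as follows. This is not the crate's actual code: it swaps tokio tasks for plain std threads and a bounded channel so it stays self-contained, and `upload_batch` and `upload_all` are hypothetical names, with `upload_batch` standing in for the real DB insert.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a real database insert; returns how many
/// records it "uploaded".
fn upload_batch(batch: &[String]) -> usize {
    batch.len()
}

/// Buffers `records` into batches of `batch_size` on the calling thread and
/// hands each full batch to one of `workers` uploader threads.
fn upload_all<I>(records: I, batch_size: usize, workers: usize) -> usize
where
    I: Iterator<Item = String>,
{
    assert!(batch_size > 0 && workers > 0);

    // Bounded queue: the reader blocks once the queue is full.
    let (tx, rx) = sync_channel::<Vec<String>>(workers);
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut uploaded = 0;
                loop {
                    // Hold the lock only while receiving, not while uploading.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(batch) => uploaded += upload_batch(&batch),
                        Err(_) => break, // channel closed: the reader is done
                    }
                }
                uploaded
            })
        })
        .collect();

    // Step 1: the reader fills a buffer. Step 2: a full buffer is queued.
    let mut buffer = Vec::with_capacity(batch_size);
    for record in records {
        buffer.push(record);
        if buffer.len() == batch_size {
            let full = std::mem::replace(&mut buffer, Vec::with_capacity(batch_size));
            tx.send(full).unwrap();
        }
    }
    if !buffer.is_empty() {
        tx.send(buffer).unwrap(); // flush the final partial batch
    }
    drop(tx); // closing the channel lets the workers exit

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling `upload_all(rows, 1000, 4)` would mirror the example numbers above: batches of 1000 records and 4 uploader threads. The bounded queue keeps the reader from running far ahead of the uploaders, which matters when the input is millions of rows.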
Custom Data
As a secondary goal, we normalize the data while we parse it.
This is highly variable and dependent on two things:
- The DB and the Data Types it uses.
- The datasets we're uploading and the type of data we've seen so far.
So our current process is:
- Parse to JSON data types
- Drop any empty String values
- Parse "False" -> false, "True" -> true
- Replace ' with " inside Strings and try parsing again (because some datasets we've seen use single quotes)
Supported DBs (for now)
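A minimal sketch of that process for a single CSV field, assuming only scalar values. The `Value` enum, `parse_json_scalar`, and `normalize_field` are hypothetical names for illustration (a real uploader would more likely use a JSON library's value type), and the exact order of the steps is an assumption.

```rust
/// Minimal stand-in for a JSON value.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
    Text(String),
}

/// Parses a field as a JSON scalar: `true`/`false`, a finite number, or a
/// double-quoted string. Returns `None` for anything else.
fn parse_json_scalar(raw: &str) -> Option<Value> {
    match raw {
        "true" => return Some(Value::Bool(true)),
        "false" => return Some(Value::Bool(false)),
        _ => {}
    }
    if let Ok(n) = raw.parse::<f64>() {
        // Rust also accepts "inf" and "NaN", which are not JSON numbers.
        if n.is_finite() {
            return Some(Value::Number(n));
        }
    }
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        return Some(Value::Text(raw[1..raw.len() - 1].to_string()));
    }
    None
}

/// Normalizes one field; `None` means the field is dropped.
fn normalize_field(raw: &str) -> Option<Value> {
    let value = match raw {
        // Capitalized booleans are mapped to JSON booleans.
        "True" => Value::Bool(true),
        "False" => Value::Bool(false),
        _ => parse_json_scalar(raw)
            // Retry with ' swapped for " before giving up on parsing.
            .or_else(|| parse_json_scalar(&raw.replace('\'', "\"")))
            // Anything still unparsed is kept as plain text.
            .unwrap_or_else(|| Value::Text(raw.to_string())),
    };
    match value {
        Value::Text(ref s) if s.is_empty() => None, // drop empty strings
        other => Some(other),
    }
}
```

With this sketch, a field like `'abc'` fails the first parse, succeeds after the quote swap, and ends up as the text `abc`, while a field like `it's` fails both attempts and is kept unchanged.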
- RethinkDB
Data Tested