21 releases
0.10.3 | Mar 31, 2024 |
---|---|
0.10.1 | Dec 20, 2023 |
0.10.0 | Aug 1, 2023 |
0.9.4 | Jun 7, 2023 |
0.2.1 |
#427 in Parser implementations
1,848 downloads per month
Used in 2 crates
56KB, 1K SLoC
qsv CSV sniffer
qsv-sniffer provides methods to infer CSV file metadata (delimiter choice, quote character, number of fields, field names, field data types, etc.). See the documentation for more details.
It's a detached fork of csv-sniffer with these additional detection capabilities:
- utf-8 encoding
- field names
- number of rows
- average record length
- additional data types - Date/DateTime and NULL
- smarter Boolean type detection - besides "true" and "false", it also detects 1/0, yes/no, y/n and t/f, all case-insensitive
ℹ️ NOTE: This fork is optimized to support qsv, and its development will be primarily dictated by qsv's requirements.
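To make the Boolean detection rule above concrete, here is a minimal sketch of a case-insensitive check over the listed values. The function name looks_like_boolean is hypothetical and is not part of the qsv-sniffer API; trimming surrounding whitespace is also an assumption, and the crate's actual implementation may differ.

```rust
/// Hypothetical helper (not part of qsv-sniffer): returns true when a field
/// holds one of the Boolean spellings listed above, ignoring ASCII case.
fn looks_like_boolean(field: &str) -> bool {
    // Assumption: surrounding whitespace is not significant.
    let normalized = field.trim().to_ascii_lowercase();
    matches!(
        normalized.as_str(),
        "true" | "false" | "t" | "f" | "yes" | "no" | "y" | "n" | "1" | "0"
    )
}
```

Under this sketch, a column would be typed as Boolean only if every sampled value passes the check; a single value such as "maybe" would rule it out.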
Setup
As a Command-line application
cargo install qsv-sniffer
This will install a binary named sniff.
As a Library
Add this to your Cargo.toml:
[dependencies]
qsv-sniffer = "0.9"
and this to your crate root:
use qsv_sniffer;
Feature flags
- cli - to build the sniff binary.
- runtime-dispatch-simd - enables detection of SIMD capabilities at runtime, which allows using the SSE2 and AVX2 code paths (only works on Intel and AMD architectures; ignored on other architectures).
- generic-simd - enables architecture-agnostic SIMD capabilities, but only works with Rust nightly.
The SIMD features are mutually exclusive and increase sampling performance.
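As a rough illustration of what runtime SIMD detection means in general, the sketch below uses the standard library's is_x86_feature_detected! macro to pick a code path. It is not the crate's actual dispatch code; the function name best_simd_path and the returned labels are hypothetical.

```rust
/// Hypothetical helper (not part of qsv-sniffer): reports which code path a
/// runtime-dispatching implementation could choose on the current CPU.
fn best_simd_path() -> &'static str {
    // The detection macro only exists on x86/x86_64, so this block is
    // compiled out on other architectures and the scalar path is used.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    "scalar"
}
```

Checking the widest instruction set first means the fastest available path wins, with a scalar fallback that works everywhere.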
Example
This example shows how to write a simple command-line tool for discovering the metadata of a CSV file:
    use qsv_sniffer;
    use std::env;

    fn main() {
        // expect exactly one argument: the path of the CSV file to sniff
        let args: Vec<String> = env::args().collect();
        if args.len() != 2 {
            eprintln!("Usage: {} <file>", args[0]);
            ::std::process::exit(1);
        }

        // sniff the path provided by the first argument
        match qsv_sniffer::Sniffer::new().sniff_path(&args[1]) {
            Ok(metadata) => {
                println!("{}", metadata);
            },
            Err(err) => {
                eprintln!("ERROR: {}", err);
            }
        }
    }
This example is provided as the primary binary for this crate. In the source directory, this can be run as:
$ cargo run -- tests/data/library-visitors.csv
Dependencies
~8–15MB
~158K SLoC