5 releases
new 0.2.1 | Feb 20, 2025 |
---|---|
0.2.0 | Feb 20, 2025 |
0.1.2 | Feb 6, 2025 |
0.1.1 | Feb 6, 2025 |
0.1.0 | Feb 5, 2025 |
#573 in Parser implementations
391 downloads per month
270KB
7K
SLoC
json-five-rs
This project provides a handwritten JSON5 tokenizer and recursive descent parser compatible with serde
.
Key Features
- Compatible with
serde
data model - Supports round-trip use cases with preservation/editing of whitespace and comments
- Supports formatting (indent, compact formats, etc.) in serialization
- Supports both model-based (AST) edits and token-based round-tripping
- Performance-focused default tokenizer/parser that avoids copying input
- Ergonomics-focused round-trip tokenizer/parser that produce structures with solely owned types for ease of editing
- Supports basic parsing and serialization without
serde
(you may disable the defaultserde
feature!)
Usage
You can use this lib with serde
in the typical way:
use json_five::from_str;
use serde::Deserialize;
#[derive(Debug, PartialEq, Deserialize)]
struct MyData {
name: String,
count: i64,
maybe: Option<f64>,
}
fn main() {
let source = r#"
// A possible JSON5 input
{
name: 'Hello',
count: 42,
maybe: null
}
"#;
let parsed = from_str::<MyData>(source).unwrap();
let expected = MyData {name: "Hello".to_string(), count: 42, maybe: None};
assert_eq!(parsed, expected);
}
Serializing also works in the usual way. The re-exported to_string
function comes from the ser
module and works
how you'd expect with default formatting.
use serde::Serialize;
use json_five::to_string;
#[derive(Serialize)]
struct Test {
int: u32,
seq: Vec<&'static str>,
}
let test = Test {
int: 1,
seq: vec!["a", "b"],
};
let expected = r#"{"int": 1, "seq": ["a", "b"]}"#;
assert_eq!(to_string(&test).unwrap(), expected);
You may also use the to_string_formatted
with a FormatConfiguration
to control the output format, including
indentation, trailing commas, and key/item separators.
use serde::Serialize;
use json_five::{to_string_formatted, FormatConfiguration, TrailingComma};
#[derive(Serialize)]
struct Test {
int: u32,
seq: Vec<&'static str>,
}
let test = Test {
int: 1,
seq: vec!["a", "b"],
};
let config = FormatConfiguration::with_indent(4, TrailingComma::ALL);
let formatted_doc = to_string_formatted(&test, config).unwrap();
let expected = r#"{
"int": 1,
"seq": [
"a",
"b",
],
}"#;
assert_eq!(formatted_doc, expected);
Examples
See the examples/
directory for examples of programs that utilize round-tripping features.
examples/json5-doublequote-fixer
gives an example of tokenization-based round-tripping editsexamples/json5-trailing-comma-formatter
gives an example of model-based round-tripping edits
Benchmarking
Benchmarks are available in the benches/
directory. Test data is in the data/
directory. A couple of benchmarks use
big files that are not committed to this repo. So run ./data/setupdata.sh
to download the required data files
so that you don't skip the big benchmarks. The benchmarks compare json_five
(this crate) to
serde_json and json5-rs.
Notwithstanding the general caveats of benchmarks, in initial testing, json_five
definitively outperforms json5-rs
.
In typical scenarios observations have been 3-4x performance, and up to 20x faster in some synthetic tests on extremely large data.
At time of writing (pre- v0) no performance optimizations have been done. I expect performance to improve,
if at least marginally, in the future.
These benchmarks were run on Windows on an i9-10900K with rustc 1.83.0 (90b35a623 2024-11-26). This table won't be updated unless significant changes happen.
test | json_five | json5 | serde_json |
---|---|---|---|
big (25MB) | 580.31 ms | 3.0861 s | 150.39 ms |
medium-ascii (5MB) | 199.88 ms | 706.94 ms | 59.008 ms |
empty | 228.62 ns | 708.00 ns | 38.786 ns |
arrays | 578.24 ns | 1.3228 µs | 100.95 ns |
objects | 922.91 ns | 2.0748 µs | 205.75 ns |
nested-array | 22.990 µs | 29.356 µs | 5.0483 µs |
nested-objects | 50.659 µs | 132.75 µs | 14.755 µs |
string | 421.17 ns | 3.5691 µs | 91.051 ns |
number | 238.75 ns | 779.13 ns | 36.179 ns |
deserialize (size 10) | 6.9898µs | 58.398µs | 886.33ns |
deserialize (size 10) | 6.9898µs | 58.398µs | 886.33ns |
deserialize (size 10) | 6.9898µs | 58.398µs | 886.33ns |
deserialize (size 100) | 66.005µs | 830.79µs | 9.9705µs |
deserialize (size 100) | 66.005µs | 830.79µs | 9.9705µs |
deserialize (size 100) | 66.005µs | 830.79µs | 9.9705µs |
deserialize (size 1000) | 599.39µs | 8.4952ms | 69.110µs |
deserialize (size 1000) | 599.39µs | 8.4952ms | 69.110µs |
deserialize (size 1000) | 599.39µs | 8.4952ms | 69.110µs |
deserialize (size 10000) | 5.9841ms | 82.591ms | 734.40µs |
deserialize (size 10000) | 5.9841ms | 82.591ms | 734.40µs |
deserialize (size 10000) | 5.9841ms | 82.591ms | 734.40µs |
deserialize (size 100000) | 66.841ms | 955.37ms | 11.638ms |
deserialize (size 100000) | 66.841ms | 955.37ms | 11.638ms |
deserialize (size 100000) | 66.841ms | 955.37ms | 11.638ms |
deserialize (size 1000000) | 674.13ms | 9.5758s | 119.03ms |
deserialize (size 1000000) | 674.13ms | 9.5758s | 119.03ms |
deserialize (size 1000000) | 674.13ms | 9.5758s | 119.03ms |
serialize (size 10) | 2.3496µs | 48.915µs | 891.85ns |
serialize (size 10) | 2.3496µs | 48.915µs | 891.85ns |
serialize (size 10) | 2.3496µs | 48.915µs | 891.85ns |
serialize (size 100) | 19.602µs | 458.98µs | 6.7109µs |
serialize (size 100) | 19.602µs | 458.98µs | 6.7109µs |
serialize (size 100) | 19.602µs | 458.98µs | 6.7109µs |
serialize (size 1000) | 194.19µs | 4.6035ms | 62.667µs |
serialize (size 1000) | 194.19µs | 4.6035ms | 62.667µs |
serialize (size 1000) | 194.19µs | 4.6035ms | 62.667µs |
serialize (size 10000) | 2.2104ms | 47.253ms | 761.10µs |
serialize (size 10000) | 2.2104ms | 47.253ms | 761.10µs |
serialize (size 10000) | 2.2104ms | 47.253ms | 761.10µs |
serialize (size 100000) | 24.418ms | 502.35ms | 11.410ms |
serialize (size 100000) | 24.418ms | 502.35ms | 11.410ms |
serialize (size 100000) | 24.418ms | 502.35ms | 11.410ms |
serialize (size 1000000) | 245.26ms | 4.6211s | 115.84ms |
serialize (size 1000000) | 245.26ms | 4.6211s | 115.84ms |
serialize (size 1000000) | 245.26ms | 4.6211s | 115.84ms |
Notes
Status
This project is in very early phases. While the crate is usable right now, more thorough testing is needed to ensure that the tokenizer/parser rejects invalid documents.
Questions, discussions, and contributions are welcome. Right now, things are moving fast, so the best way to contribute is likely to just open an issue.
Expect breaking changes for now, even in patch releases.
Serde is optional
Using serde
is actually optional. Some use cases may not require the use of serde's various deserialization methods
and may only need to rely on the tokenizer and/or AST tree features. By default, the serde feature is enabled,
but this can be disabled. Even without the serde
feature, the parser modules provide functions and methods for
parsing and serialization, including the ability to customize the style.
Dependencies
~320–540KB
~11K SLoC