#json5 #serde #json #parser #data-model #derive-debug #round-trip

json-five

JSON5 parser with round-trip capabilities and compatible with the serde data model

5 releases

new 0.2.1 Feb 20, 2025
0.2.0 Feb 20, 2025
0.1.2 Feb 6, 2025
0.1.1 Feb 6, 2025
0.1.0 Feb 5, 2025

#573 in Parser implementations

Download history 61/week @ 2025-01-30 330/week @ 2025-02-06

391 downloads per month

MIT license

270KB
7K SLoC

json-five-rs

This project provides a handwritten JSON5 tokenizer and recursive descent parser compatible with serde.

Crates.io Version docs.rs

Key Features

  • Compatible with serde data model
  • Supports round-trip use cases with preservation/editing of whitespace and comments
  • Supports formatting (indent, compact formats, etc.) in serialization
  • Supports both model-based (AST) edits and token-based round-tripping
  • Performance-focused default tokenizer/parser that avoids copying input
  • Ergonomics-focused round-trip tokenizer/parser that produce structures with solely owned types for ease of editing
  • Supports basic parsing and serialization without serde (you may disable the default serde feature!)

Usage

You can use this lib with serde in the typical way:

use json_five::from_str;
use serde::Deserialize;
#[derive(Debug, PartialEq, Deserialize)]
struct MyData {
    name: String,
    count: i64,
    maybe: Option<f64>,
}

fn main() {
    let source = r#"
    // A possible JSON5 input
    {
      name: 'Hello',
      count: 42,
      maybe: null
    }
"#;

    let parsed = from_str::<MyData>(source).unwrap();
    let expected = MyData {name: "Hello".to_string(), count: 42, maybe: None};
    assert_eq!(parsed, expected);
}

Serializing also works in the usual way. The re-exported to_string function comes from the ser module and works how you'd expect with default formatting.

use serde::Serialize;
use json_five::to_string;
#[derive(Serialize)]
struct Test {
    int: u32,
    seq: Vec<&'static str>,
}
let test = Test {
    int: 1,
    seq: vec!["a", "b"],
};
let expected = r#"{"int": 1, "seq": ["a", "b"]}"#;
assert_eq!(to_string(&test).unwrap(), expected);

You may also use the to_string_formatted with a FormatConfiguration to control the output format, including indentation, trailing commas, and key/item separators.

use serde::Serialize;
use json_five::{to_string_formatted, FormatConfiguration, TrailingComma};
#[derive(Serialize)]
struct Test {
    int: u32,
    seq: Vec<&'static str>,
}
let test = Test {
    int: 1,
    seq: vec!["a", "b"],
};

let config = FormatConfiguration::with_indent(4, TrailingComma::ALL);
let formatted_doc = to_string_formatted(&test, config).unwrap();

let expected = r#"{
    "int": 1,
    "seq": [
        "a",
        "b",
    ],
}"#;

assert_eq!(formatted_doc, expected);

Examples

See the examples/ directory for examples of programs that utilize round-tripping features.

  • examples/json5-doublequote-fixer gives an example of tokenization-based round-tripping edits
  • examples/json5-trailing-comma-formatter gives an example of model-based round-tripping edits

Benchmarking

Benchmarks are available in the benches/ directory. Test data is in the data/ directory. A couple of benchmarks use big files that are not committed to this repo. So run ./data/setupdata.sh to download the required data files so that you don't skip the big benchmarks. The benchmarks compare json_five (this crate) to serde_json and json5-rs.

Notwithstanding the general caveats of benchmarks, in initial testing, json_five definitively outperforms json5-rs. In typical scenarios observations have been 3-4x performance, and up to 20x faster in some synthetic tests on extremely large data. At time of writing (pre- v0) no performance optimizations have been done. I expect performance to improve, if at least marginally, in the future.

These benchmarks were run on Windows on an i9-10900K with rustc 1.83.0 (90b35a623 2024-11-26). This table won't be updated unless significant changes happen.

test json_five json5 serde_json
big (25MB) 580.31 ms 3.0861 s 150.39 ms
medium-ascii (5MB) 199.88 ms 706.94 ms 59.008 ms
empty 228.62 ns 708.00 ns 38.786 ns
arrays 578.24 ns 1.3228 µs 100.95 ns
objects 922.91 ns 2.0748 µs 205.75 ns
nested-array 22.990 µs 29.356 µs 5.0483 µs
nested-objects 50.659 µs 132.75 µs 14.755 µs
string 421.17 ns 3.5691 µs 91.051 ns
number 238.75 ns 779.13 ns 36.179 ns
deserialize (size 10) 6.9898µs 58.398µs 886.33ns
deserialize (size 10) 6.9898µs 58.398µs 886.33ns
deserialize (size 10) 6.9898µs 58.398µs 886.33ns
deserialize (size 100) 66.005µs 830.79µs 9.9705µs
deserialize (size 100) 66.005µs 830.79µs 9.9705µs
deserialize (size 100) 66.005µs 830.79µs 9.9705µs
deserialize (size 1000) 599.39µs 8.4952ms 69.110µs
deserialize (size 1000) 599.39µs 8.4952ms 69.110µs
deserialize (size 1000) 599.39µs 8.4952ms 69.110µs
deserialize (size 10000) 5.9841ms 82.591ms 734.40µs
deserialize (size 10000) 5.9841ms 82.591ms 734.40µs
deserialize (size 10000) 5.9841ms 82.591ms 734.40µs
deserialize (size 100000) 66.841ms 955.37ms 11.638ms
deserialize (size 100000) 66.841ms 955.37ms 11.638ms
deserialize (size 100000) 66.841ms 955.37ms 11.638ms
deserialize (size 1000000) 674.13ms 9.5758s 119.03ms
deserialize (size 1000000) 674.13ms 9.5758s 119.03ms
deserialize (size 1000000) 674.13ms 9.5758s 119.03ms
serialize (size 10) 2.3496µs 48.915µs 891.85ns
serialize (size 10) 2.3496µs 48.915µs 891.85ns
serialize (size 10) 2.3496µs 48.915µs 891.85ns
serialize (size 100) 19.602µs 458.98µs 6.7109µs
serialize (size 100) 19.602µs 458.98µs 6.7109µs
serialize (size 100) 19.602µs 458.98µs 6.7109µs
serialize (size 1000) 194.19µs 4.6035ms 62.667µs
serialize (size 1000) 194.19µs 4.6035ms 62.667µs
serialize (size 1000) 194.19µs 4.6035ms 62.667µs
serialize (size 10000) 2.2104ms 47.253ms 761.10µs
serialize (size 10000) 2.2104ms 47.253ms 761.10µs
serialize (size 10000) 2.2104ms 47.253ms 761.10µs
serialize (size 100000) 24.418ms 502.35ms 11.410ms
serialize (size 100000) 24.418ms 502.35ms 11.410ms
serialize (size 100000) 24.418ms 502.35ms 11.410ms
serialize (size 1000000) 245.26ms 4.6211s 115.84ms
serialize (size 1000000) 245.26ms 4.6211s 115.84ms
serialize (size 1000000) 245.26ms 4.6211s 115.84ms

Notes

Status

This project is in very early phases. While the crate is usable right now, more thorough testing is needed to ensure that the tokenizer/parser rejects invalid documents.

Questions, discussions, and contributions are welcome. Right now, things are moving fast, so the best way to contribute is likely to just open an issue.

Expect breaking changes for now, even in patch releases.

Serde is optional

Using serde is actually optional. Some use cases may not require the use of serde's various deserialization methods and may only need to rely on the tokenizer and/or AST tree features. By default, the serde feature is enabled, but this can be disabled. Even without the serde feature, the parser modules provide functions and methods for parsing and serialization, including the ability to customize the style.

Dependencies

~320–540KB
~11K SLoC