bin+lib headj

A utility that converts input JSON arrays into valid JSON that contains only a subset of the elements

2 releases

0.1.1 Sep 30, 2022
0.1.0 Sep 28, 2022

#11 in #json-array

MIT license

29KB
554 lines

headj

Description

A utility to take enormous JSON files & cut them down to size.

Sometimes you have a JSON file with a very, very large array in it, but you only want a subset of the data. For example, you have a JSON file representing a DB dump of millions of records & you'd like a workable number of rows to test with, code with, or just examine.

You could use an editor, but even if the editor can load a file that large, it's likely to be very unpleasant to use.

You could use a text processor (like the Unix/Linux head command), but there are two issues with that:

  1. If the file doesn't have newlines, that's not helpful.
  2. The text processor will know nothing about JSON, so it will mangle it.
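
Both issues are easy to reproduce. The Python sketch below is an illustration only and has nothing to do with how headj is implemented. It shows that a single-line JSON file has no line boundary to cut at, and that cutting by character count leaves text that no longer parses.

```python
import json

# A JSON array serialized on a single line, as many dump tools write it.
text = json.dumps(list(range(1, 6)), separators=(",", ":"))  # "[1,2,3,4,5]"

# Issue 1: a line-based cut (what `head -n 1` does) keeps the whole file.
first_line = text.splitlines()[0]

# Issue 2: a character-based cut knows nothing about JSON structure.
truncated = text[:6]  # "[1,2,3"


def is_valid_json(s: str) -> bool:
    """Return True if s parses as JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


print(first_line == text)        # True: nothing was trimmed
print(is_valid_json(truncated))  # False: the array is never closed
```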

You could write a script/program to do the work for you. I did that. Then I decided to package it up, so you don't have to.

headj is a command line utility, similar to the head command, for producing a subset of a JSON file that is itself valid JSON. It allows you to ingest JSON containing a huge JSON array & produce JSON with a manageable JSON array.

THIS IS ALPHA LEVEL SOFTWARE

It appears to work, but I'm working on it.

One very large caveat is: It freely discards JSON that surrounds the array of interest. So, if you have a complex JSON object with a huge array in it, you will get a JSON file back that only includes the (reduced) array & whatever JSON structure it was found in. Everything else will be elided. (This actually works correctly now)

For example:

Input

{
  "a": 1,
  "b": [
    1,
    2,
    3,
    4,
    5
  ],
  "c": true
}

Command: headj --key 'b' --count 3

Output

{
  "b": [
    1,
    2,
    3
  ]
}
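
The result of this example can be modelled in a few lines of Python. This is a sketch of the observable behaviour only: the function name `head_key` is hypothetical, it handles a single top-level key, and it loads the whole document into memory, which says nothing about how headj itself processes large files.

```python
import json


def head_key(doc: dict, key: str, count: int) -> dict:
    """Keep only `key` and the first `count` elements of its array.

    Hypothetical helper mirroring `headj --key KEY --count COUNT`
    for a top-level key; every sibling key is dropped.
    """
    array = doc[key]
    if not isinstance(array, list):
        raise TypeError(f"value at {key!r} is not a JSON array")
    return {key: array[:count]}


source = json.loads('{"a": 1, "b": [1, 2, 3, 4, 5], "c": true}')
print(json.dumps(head_key(source, "b", 3)))  # {"b": [1, 2, 3]}
```

Note that "a" and "c" do not survive, which is the caveat described above.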

Installation

With Cargo

cargo install headj

Usage

USAGE:
    headj [OPTIONS] [INPUT_FILE]

ARGS:
    <INPUT_FILE>    The JSON file to read from. If none is specified, reads from Standard Input

OPTIONS:
    -c, --count <COUNT>          Number of elements to copy to the output (default: 100) [default:
                                 100]
    -d, --debug                  Activate extra debugging output
    -f, --format-output          Nicely format the output JSON with indentation & newlines
    -h, --help                   Print help information
    -k, --key <KEY>              The JSON key of the array to copy from. If none specified, treat
                                 the input JSON as an array
    -n, --no-context             Output _only_ the target JSON array
    -o, --out-file <OUT_FILE>    File to write the JSON results to (default: Standard Output)
    -q, --quiet                  Don't print any status, diagnostic or error messages
    -s, --skip <SKIP>            Number of elements to skip before copying (default: 0) [default: 0]
    -V, --version                Print version information

Examples

headj <<- JSON
[1,2,3,4,5]
JSON
# Output: [1, 2, 3, 4, 5]

headj -c 1 <<- JSON
[1,2,3,4,5]
JSON
# Output: [1]

headj -c 1 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3]

headj -c 2 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3, 4]

headj -k 'foo' <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: {"foo": [1, 2, 3, 4, 5]}

headj -k 'foo' -n <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: [1, 2, 3, 4, 5]

headj -c 25 -s 2 <<- JSON
[1,2,3,4,5]
JSON
# Output: [3, 4, 5]

headj -c 2 -s 2 -f <<- JSON
[1,2,3,4,5]
JSON
# Output: [\n     3,\n     4\n]

headj -k 'foo.bar' -c 2 -s 2 -n <<- JSON
{"foo":{"bar":[1,2,3,4,5]}}
JSON
# Output: [3, 4]

headj -k 'foo.bar' -c 2 -s 2 <<- JSON
{"foo":{
"bar":[1,2,3,4,5]}
}
JSON
# Output: {"foo": {"bar": [3, 4]}}

headj -k 'foo' -c 2 -s 2 <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: {"foo": [3, 4]}

headj -k 'foo' -c 2 -s 2 -n <<- JSON
{"foo":[1,2,3,4,5]}
JSON
# Output: [3, 4]
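
For readers who prefer code to flag tables, here is a small Python model of how --key, --count, --skip and --no-context combine in the examples above. It is a sketch under assumptions, not headj's implementation: `headj_model` is a hypothetical name, the key is split on plain dots with no escape handling, the whole document is held in memory, and the surrounding objects are assumed to be re-created in their original nesting order.

```python
import json


def headj_model(doc, key=None, count=100, skip=0, no_context=False):
    """In-memory model of headj's documented options (hypothetical).

    key: dot-separated path to the array, or None if `doc` is the array.
    count / skip: defaults follow the usage text (100 and 0).
    """
    path = key.split(".") if key else []
    node = doc
    for name in path:
        node = node[name]  # walk down to the target array
    if not isinstance(node, list):
        raise TypeError("the selected value is not a JSON array")
    result = node[skip:skip + count]  # a too-large count is clamped
    if no_context:
        return result
    # Re-wrap the slice in the chain of objects it was found in.
    for name in reversed(path):
        result = {name: result}
    return result


data = [1, 2, 3, 4, 5]
nested = {"foo": {"bar": data}}
print(json.dumps(headj_model(data, count=25, skip=2)))           # [3, 4, 5]
print(json.dumps(headj_model(nested, "foo.bar", 2, 2)))          # {"foo": {"bar": [3, 4]}}
print(json.dumps(headj_model(nested, "foo.bar", 2, 2, True)))    # [3, 4]
```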

Documentation

TBD

  • The --key syntax is based, in the vaguest sense, on JSONPath. It's more of a gesture in the general direction of JSONPath. It doesn't use full JSONPath because the most natural way to write a key, if you don't feel like reading a specification, would not necessarily start at the root, which could be confusing. Insisting that users begin with a '$' would likely seem arbitrary & annoying. So, the "just dots & backslashes" implementation seemed reasonable.
  • The deletion of all JSON elements except the ones of interest is "bad". It needs to be fixed (or at least optional).
  • The error messages can be comically unhelpful.
  • The examples could be improved a trifle.
  • The --format-output (-f) option currently DOES NOTHING. Working on it.
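
To make the "just dots & backslashes" idea concrete, here is one way such a key could be split into path segments. This is a guess at the convention, not headj's actual parser: it assumes a backslash makes the next character literal (so a backslash followed by a dot is a dot inside a key name), and `split_key` is a hypothetical name.

```python
def split_key(key: str) -> list:
    """Split a dotted key path into segments.

    Assumption: a backslash escapes the character that follows it,
    so an escaped dot stays inside the segment instead of splitting.
    """
    parts, current, escaped = [], [], False
    for ch in key:
        if escaped:
            current.append(ch)   # take the escaped character literally
            escaped = False
        elif ch == "\\":
            escaped = True       # remember the escape, emit nothing yet
        elif ch == ".":
            parts.append("".join(current))  # unescaped dot ends a segment
            current = []
        else:
            current.append(ch)
    if escaped:
        raise ValueError("key ends with a dangling backslash")
    parts.append("".join(current))
    return parts


print(split_key("foo.bar"))   # ['foo', 'bar']
print(split_key(r"a\.b.c"))   # ['a.b', 'c']
```

Because no leading '$' is required, a key such as foo.bar reads the way most people would write it without consulting a specification.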

License

MIT

Dependencies

~6–14MB
~179K SLoC