
bin+lib sumcol

A command-line tool to sum a column of numbers

4 releases

0.1.3 Dec 18, 2023
0.1.2 Dec 14, 2023
0.1.1 Nov 10, 2023
0.1.0 Nov 10, 2023


MIT/Apache

17KB
177 lines

sumcol


sumcol is a simple Unix-style command-line tool for summing numbers from a column of text. It replaces tried-and-true Unix-isms like awk '{s += $3} END {print s}' (which prints the sum of the numbers in the third whitespace-delimited column), without all the verbosity.

Quick Install

$ cargo install sumcol

Examples

NOTE: If you don't have sumcol installed on your PATH, you can run the following commands directly from this repo by replacing sumcol with cargo run -q --.

Help

$ sumcol -h
A command-line tool to sum a column of numbers.

Usage: sumcol [OPTIONS] [FILES]...

Arguments:
  [FILES]...  Files to read input from, otherwise uses stdin

Options:
  -f, --field <FIELD>          The field to sum. If not specified, uses the full line [default: 0]
  -x, --hex                    Treat all numbers as hex, not just those with a leading 0x
  -d, --delimiter <DELIMITER>  The regex on which to split fields [default: \s+]
  -v, --verbose                Print each number that's being summed, along with some metadata
  -h, --help                   Print help
  -V, --version                Print version

Sum file sizes

Here we'll sum the sizes of all the files in my current directory:

$ ls -l
total 48
-rw-r--r--  1 greg  staff  14938 Nov 10 13:56 Cargo.lock
-rw-r--r--  1 greg  staff    399 Nov 10 15:06 Cargo.toml
-rw-r--r--  1 greg  staff   1871 Nov 10 15:16 README.md
drwxr-xr-x  3 greg  staff     96 Nov 10 11:55 src
drwxr-xr-x@ 6 greg  staff    192 Nov 10 11:59 target
drwxr-xr-x  3 greg  staff     96 Nov 10 11:59 tests

The size is shown in column (or field) number 5, counting from 1, so we can use sumcol as follows:

$ ls -l | sumcol -f5
17469

This is equivalent to, but shorter than, the classic awk incantation:

$ ls -l | awk '{s += $5} END {print s}'
17469
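To make the field numbering concrete, here is a minimal Rust sketch of what ls -l | sumcol -f5 computes. It is an illustration, not sumcol's actual code: the sum_field function is hypothetical, it splits lines with split_whitespace (which, like awk, ignores leading whitespace) instead of the \s+ regex sumcol uses by default, it handles integers only, and it counts missing or non-numeric fields as 0.

```rust
/// Hypothetical helper: sums one whitespace-delimited field across all lines.
/// `field` is 1-based like sumcol's -f; 0 means "use the whole line".
/// Missing or non-integer fields contribute 0.
fn sum_field(input: &str, field: usize) -> i64 {
    input
        .lines()
        .map(|line| {
            let text = if field == 0 {
                Some(line.trim())
            } else {
                // nth() is 0-based, so field N is element N - 1.
                line.split_whitespace().nth(field - 1)
            };
            text.and_then(|t| t.parse::<i64>().ok()).unwrap_or(0)
        })
        .sum()
}

fn main() {
    // Three lines taken from the ls -l listing above; "total 48" has no field 5.
    let listing = "total 48\n\
                   -rw-r--r--  1 greg  staff    399 Nov 10 15:06 Cargo.toml\n\
                   -rw-r--r--  1 greg  staff   1871 Nov 10 15:16 README.md";
    println!("{}", sum_field(listing, 5)); // prints 2270 (399 + 1871)
}
```

The 0 => whole line convention in the sketch mirrors the [default: 0] shown for -f in the help text.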

Sum all input

Sometimes you use other tools to extract a column of numbers. In that case you can still use sumcol with no arguments to simply sum all of the input. Using the file listing from above, we could do the following:

$ ls -l | awk '{print $5}' | sumcol 
17469
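Summing all of the input amounts to parsing each whole line as a number. The sketch below shows that loop written against Rust's std::io::BufRead trait, so the same function could read from an in-memory Cursor, from stdin, or from each of the FILES arguments. The sum_lines name is hypothetical and the sketch handles integers only; it is not taken from sumcol's source.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: sums every line of `reader` as a whole number,
/// like sumcol with no -f. Lines that don't parse as integers contribute 0;
/// I/O errors are returned to the caller rather than hidden.
fn sum_lines<R: BufRead>(reader: R) -> io::Result<i64> {
    let mut sum = 0i64;
    for line in reader.lines() {
        let line = line?;
        sum += line.trim().parse::<i64>().unwrap_or(0);
    }
    Ok(sum)
}

fn main() -> io::Result<()> {
    // A Cursor stands in for stdin here. io::stdin().lock() also implements
    // BufRead, as does BufReader::new(File::open(path)?) for a FILES argument.
    let sizes = Cursor::new("14938\n399\n1871\n");
    println!("{}", sum_lines(sizes)?); // prints 17208
    Ok(())
}
```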

Summing hex numbers

Programmers often deal with numbers written in hex, typically in forms like 0x123abc or simply 0000abcd. When sumcol sees a number starting with 0x, it always assumes it's written in hex and parses it accordingly. A hex number written without that prefix, however, requires that we tell sumcol to use hex.
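The parsing rule just described can be sketched in a few lines of Rust using i64::from_str_radix. The parse_int function and its force_hex flag (standing in for -x) are hypothetical names for illustration, and the sketch only recognizes a lowercase 0x prefix, matching the examples here.

```rust
/// Hypothetical helper: parses one integer following the rule above.
/// A leading "0x" always means hex; otherwise hex is used only when
/// `force_hex` (standing in for -x) is set, and decimal is the default.
fn parse_int(raw: &str, force_hex: bool) -> Option<i64> {
    let s = raw.trim();
    if let Some(digits) = s.strip_prefix("0x") {
        i64::from_str_radix(digits, 16).ok()
    } else if force_hex {
        i64::from_str_radix(s, 16).ok()
    } else {
        s.parse::<i64>().ok()
    }
}

fn main() {
    println!("{:?}", parse_int("0x123abc", false)); // hex because of the prefix
    println!("{:?}", parse_int("0000abcd", false)); // None: not valid decimal
    println!("{:?}", parse_int("0000abcd", true)); // hex because of -x
}
```

Note that a prefix-less string made only of decimal digits, such as 00000280, is valid in both radixes, so without -x it is read as decimal.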

For this example we'll sum the sizes of each section in the compiled sumcol binary. We can see this information with the objdump command.

$ objdump -h target/release/sumcol

target/release/sumcol:     file format mach-o-arm64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0014c350  0000000100000c0c  0000000100000c0c  00000c0c  2**2
                  CONTENTS, ALLOC, LOAD, CODE
  1 __TEXT.__stubs 000003b4  000000010014cf5c  000000010014cf5c  0014cf5c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .const        0004f458  000000010014d310  000000010014d310  0014d310  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 __TEXT.__gcc_except_tab 0000cae8  000000010019c768  000000010019c768  0019c768  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 __TEXT.__unwind_info 000087c8  00000001001a9250  00000001001a9250  001a9250  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  5 .eh_frame     0002e5e0  00000001001b1a18  00000001001b1a18  001b1a18  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 __DATA_CONST.__got 00000280  00000001001e0000  00000001001e0000  001e0000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  7 __DATA_CONST.__const 0002c9c0  00000001001e0280  00000001001e0280  001e0280  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  8 .data         00000028  0000000100210000  0000000100210000  00210000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  9 __DATA.__thread_vars 00000108  0000000100210028  0000000100210028  00210028  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 10 __DATA.__thread_data 00000040  0000000100210130  0000000100210130  00210130  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 11 __DATA.__thread_bss 00000090  0000000100210170  0000000100210170  00210170  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 12 __DATA.__common 00000038  0000000100210200  0000000100210200  00000000  2**3
                  ALLOC
 13 .bss          00000148  0000000100210238  0000000100210238  00000000  2**3
                  ALLOC

We see here that the size is in field three and it's written in hex without a leading 0x. Let's look at field three:

$ objdump -h target/release/sumcol | awk '{print $3}'

format


Size
0014c350
LOAD,
000003b4
LOAD,
0004f458
LOAD,
0000cae8
LOAD,
000087c8
LOAD,
0002e5e0
LOAD,
00000280
LOAD,
0002c9c0
LOAD,
00000028
LOAD,
00000108
LOAD,
00000040
LOAD,
00000090
LOAD,
00000038

00000148

Yuck. That has both numbers and non-numbers. Luckily, sumcol handles this easily: it quietly ignores non-numbers, treating them as 0. So let's see what answer we get:

$ objdump -h target/release/sumcol | sumcol -f3
[2023-11-10T21:02:06Z WARN  sumcol] Failed to parse "0014c350". Consider using -x
[2023-11-10T21:02:06Z WARN  sumcol] Failed to parse "000003b4". Consider using -x
[2023-11-10T21:02:06Z WARN  sumcol] Failed to parse "0004f458". Consider using -x
[2023-11-10T21:02:06Z WARN  sumcol] Failed to parse "0000cae8". Consider using -x
[2023-11-10T21:02:06Z WARN  sumcol] Failed to parse "000087c8". Consider using -x
[2023-11-10T21:02:06Z WARN  sumcol] Failed to parse "0002e5e0". Consider using -x
[2023-11-10T21:02:06Z WARN  sumcol] Failed to parse "0002c9c0". Consider using -x
732

Interesting. sumcol still quietly ignores non-numbers like LOAD, but here it warns us that it's seeing strings that look like hex numbers even though we didn't tell it to parse numbers as hex. Strings made up only of decimal digits, such as 00000280, were parsed as decimal, which is where the total of 732 comes from. Let's try again, following the recommendation to use -x.
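The 732 total and the seven warnings above can be reproduced with a small Rust sketch. The hint rule used here (a string that fails to parse as decimal but consists entirely of hex digits) is an assumption made for illustration; sumcol's real heuristic may differ, and sum_decimal_with_hints is a hypothetical name.

```rust
/// Hypothetical helper: sums fields as decimal integers, treating anything
/// unparseable as 0. Collects a hint for strings that fail decimal parsing but
/// consist only of hex digits (an assumed heuristic, not sumcol's actual code).
fn sum_decimal_with_hints(fields: &[&str]) -> (i64, Vec<String>) {
    let mut sum: i64 = 0;
    let mut hints = Vec::new();
    for &raw in fields {
        match raw.parse::<i64>() {
            Ok(n) => sum += n,
            Err(_) => {
                let looks_hex = !raw.is_empty() && raw.chars().all(|c| c.is_ascii_hexdigit());
                if looks_hex {
                    hints.push(format!("Failed to parse {raw:?}. Consider using -x"));
                }
                // Other non-numbers, like "LOAD,", are silently treated as 0.
            }
        }
    }
    (sum, hints)
}

fn main() {
    // The Size column from the objdump output above, plus one non-number.
    let fields = [
        "0014c350", "000003b4", "0004f458", "0000cae8", "000087c8", "0002e5e0", "00000280",
        "0002c9c0", "00000028", "00000108", "00000040", "00000090", "00000038", "00000148",
        "LOAD,",
    ];
    let (sum, hints) = sum_decimal_with_hints(&fields);
    for hint in &hints {
        eprintln!("WARN {hint}");
    }
    println!("{sum}"); // prints 732: only the all-digit strings parsed as decimal
}
```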

$ objdump -h target/release/sumcol | sumcol -f3 -x
0x20C3AC

NOTE: If the hex numbers had started with a leading 0x, sumcol would have silently parsed them as hex and omitted the warning.
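With -x, every field is read as hex and, judging by the output above, the total is printed in hex as well. The following minimal sketch reproduces the 0x20C3AC total (2147244 in decimal) from the Size column. The sum_hex helper is hypothetical, not part of sumcol's API; it uses Rust's {:#X} format, which prints uppercase hex digits after a 0x prefix.

```rust
/// Hypothetical helper: sums fields as hex, the way -x treats every number,
/// accepting an optional "0x" prefix, and formats the total like the example
/// output above.
fn sum_hex(fields: &[&str]) -> String {
    let total: i64 = fields
        .iter()
        .filter_map(|&raw| {
            let digits = raw.strip_prefix("0x").unwrap_or(raw);
            // Non-hex strings such as "LOAD," fail to parse and are skipped.
            i64::from_str_radix(digits, 16).ok()
        })
        .sum();
    // {:#X} prints uppercase hex digits with a "0x" prefix.
    format!("{total:#X}")
}

fn main() {
    // The Size column from the objdump output above.
    let sizes = [
        "0014c350", "000003b4", "0004f458", "0000cae8", "000087c8", "0002e5e0", "00000280",
        "0002c9c0", "00000028", "00000108", "00000040", "00000090", "00000038", "00000148",
    ];
    println!("{}", sum_hex(&sizes)); // prints 0x20C3AC
}
```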

Debugging

If sumcol doesn't seem to be working right, feel free to look at the code on GitHub (it's pretty straightforward), run it with the -v or --verbose flag, or even run it with the RUST_LOG=debug environment variable set. For example:

$ printf "1\n2.5\nOOPS\n3" | sumcol -v
1       # n=Integer(1) sum=Integer(1) cnt=1 radix=10 raw_str="1"
2.5     # n=Float(2.5) sum=Float(3.5) cnt=2 radix=10 raw_str="2.5"
0       # n=Integer(0) sum=Float(3.5) cnt=2 radix=10 raw_str="OOPS" err="ParseFloatError { kind: Invalid }"
3       # n=Integer(3) sum=Float(6.5) cnt=3 radix=10 raw_str="3"
==
6.5

The metadata displayed on each line is:

Name     Description
n        The parsed numeric value
sum      The running sum up to and including the current n
cnt      The running count of successfully parsed numbers. If a number fails to parse and 0 is used instead, it is not included in cnt
radix    The radix used when trying to parse the number as an integer
raw_str  The raw string data that was parsed
err      If present, the error from trying to parse the string into a number
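The n, sum, and cnt columns in the verbose example suggest a running sum that starts as an integer and becomes a float once a float is added. Below is a minimal Rust sketch of that idea. The Num enum, parse_num, and sum_verbose names are hypothetical, and the integer-first parsing order is an assumption inferred from the output above rather than taken from sumcol's source.

```rust
use std::ops::Add;

/// Either an integer or a float, loosely modeled on the Integer(..) and
/// Float(..) values shown in sumcol's verbose output (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Num {
    Integer(i64),
    Float(f64),
}

impl Add for Num {
    type Output = Num;

    /// Integer + Integer stays an Integer; anything involving a Float becomes a Float.
    fn add(self, other: Num) -> Num {
        match (self, other) {
            (Num::Integer(a), Num::Integer(b)) => Num::Integer(a + b),
            (Num::Integer(a), Num::Float(b)) => Num::Float(a as f64 + b),
            (Num::Float(a), Num::Integer(b)) => Num::Float(a + b as f64),
            (Num::Float(a), Num::Float(b)) => Num::Float(a + b),
        }
    }
}

/// Tries an integer parse first and falls back to a float parse (assumed order).
fn parse_num(raw: &str) -> Result<Num, std::num::ParseFloatError> {
    match raw.parse::<i64>() {
        Ok(i) => Ok(Num::Integer(i)),
        Err(_) => raw.parse::<f64>().map(Num::Float),
    }
}

/// Returns the final sum and the count of successfully parsed lines,
/// printing per-line metadata similar in spirit to -v.
fn sum_verbose(input: &str) -> (Num, usize) {
    let mut sum = Num::Integer(0);
    let mut cnt = 0;
    for raw in input.lines() {
        match parse_num(raw.trim()) {
            Ok(n) => {
                sum = sum + n;
                cnt += 1;
                println!("# n={n:?} sum={sum:?} cnt={cnt} raw_str={raw:?}");
            }
            // A failed parse adds nothing to sum and is not counted in cnt.
            Err(e) => println!("# n=Integer(0) sum={sum:?} cnt={cnt} raw_str={raw:?} err=\"{e:?}\""),
        }
    }
    (sum, cnt)
}

fn main() {
    let (sum, cnt) = sum_verbose("1\n2.5\nOOPS\n3");
    println!("==\n{sum:?} from {cnt} numbers"); // Float(6.5) from 3 numbers
}
```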

This should usually be enough to debug the problem you're seeing. If it isn't, try running sumcol again with RUST_LOG=debug set.

Dependencies

~4–16MB
~146K SLoC