Lib.rs

›

#suffix #list #domain #parser #nom #public #cache

nom-psl

Fast public suffix list domain parsing, written in nom

by Daniel N. Werner

4 releases (stable)

1.2.0	Jun 3, 2019
1.1.0	May 16, 2019
1.0.0	Oct 25, 2018
0.1.0	Oct 25, 2018

#18 in #suffix

BSD-3-Clause

75KB
468 lines

Faster public suffix domain parsing.

The scope of this library is limited to finding the tld+1 of a given domain from the public suffix list.

Approach:

Load public suffix list entries into memory
Match immutable, owned values of domains to be parsed
Leverage a user-sized lru cache for entries

Goals:

provide (mostly) compliant public suffix domain parsing.
avoid allocations during domain parsing.
offload as much work as possible to parsing stage.
avoid depedencies that might themselves bring unwanted baggage
inputs are not mutated, outputs are slices of inputs

Caveats:

still rely on idna crate for punycode parsing
we don't lower-case anything (for performance we ignore this)

Environment Variables

PUBLIC_SUFFIX_LIST_FILE=somefile - override which file will be loaded in place of public_suffix_list.dat

Example:

lazy_static! {
    static ref LIST: List = {
        let list = List::parse_source_file("public_suffix_list.dat", 10_000_000);
        list.expect("unable to parse PSL file")
    };
}

...

fn foo() {
    let domain = "abc.one.two.example.co.uk";
    let tldp1 = LIST.parse_domain(domain);
    
    assert_eq!(tldp1, Some("example.co.uk"));
}

TODO:

benchmarks

Dependencies

~2.5MB
~60K SLoC