#suffix #list #domain #parser #nom #public #cache

nom-psl

Fast public suffix list domain parsing, written in nom

4 releases (stable)

1.2.0 Jun 3, 2019
1.1.0 May 16, 2019
1.0.0 Oct 25, 2018
0.1.0 Oct 25, 2018


BSD-3-Clause

75KB
468 lines


Faster public suffix domain parsing.

The scope of this library is limited to finding the eTLD+1 (the public suffix plus one more label, e.g. example.co.uk) of a given domain, using the public suffix list.
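As an illustration of what "eTLD+1 from the public suffix list" involves, here is a minimal sketch of the standard PSL matching rules: exception rules win, otherwise the longest normal or wildcard rule applies, and with no match the default rule "*" treats the last label as the suffix. The names Rules, parse, registrable_domain, suffixes and parent are hypothetical and are not nom-psl's API. The crate parses the list with nom, not with str::lines as done here.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory rule store (not nom-psl's `List`).
struct Rules {
    normal: HashSet<String>,    // "co.uk"
    wildcard: HashSet<String>,  // "*.ck" is stored as "ck"
    exception: HashSet<String>, // "!www.ck" is stored as "www.ck"
}

impl Rules {
    /// One rule per line; text after the first whitespace and `//` lines are ignored.
    fn parse(src: &str) -> Rules {
        let mut r = Rules {
            normal: HashSet::new(),
            wildcard: HashSet::new(),
            exception: HashSet::new(),
        };
        for line in src.lines() {
            let rule = line.split_whitespace().next().unwrap_or("");
            if rule.is_empty() || rule.starts_with("//") {
                continue;
            }
            if let Some(rest) = rule.strip_prefix('!') {
                r.exception.insert(rest.to_string());
            } else if let Some(rest) = rule.strip_prefix("*.") {
                r.wildcard.insert(rest.to_string());
            } else {
                r.normal.insert(rule.to_string());
            }
        }
        r
    }

    /// Returns the public suffix plus one label, as a slice of `domain`.
    fn registrable_domain<'a>(&self, domain: &'a str) -> Option<&'a str> {
        // 1. An exception rule prevails; the suffix is the rule minus its leftmost label.
        let exception = suffixes(domain)
            .find(|s| self.exception.contains(*s))
            .and_then(parent);
        // 2. Otherwise the longest normal or wildcard match (candidates come longest first).
        let suffix = exception
            .or_else(|| {
                suffixes(domain).find(|s| {
                    self.normal.contains(*s)
                        || parent(s).map_or(false, |p| self.wildcard.contains(p))
                })
            })
            // 3. Default rule "*": the last label is the public suffix.
            .unwrap_or_else(|| domain.rsplit('.').next().unwrap_or(domain));
        let start = domain.len() - suffix.len();
        if start == 0 {
            return None; // the domain is itself a public suffix
        }
        let prefix = &domain[..start - 1]; // drop the '.' before the suffix
        let label_start = prefix.rfind('.').map_or(0, |i| i + 1);
        Some(&domain[label_start..])
    }
}

/// Every label-boundary suffix of `domain`, longest first, without allocating.
fn suffixes<'a>(domain: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    std::iter::once(domain).chain(domain.match_indices('.').map(move |(i, _)| &domain[i + 1..]))
}

/// The suffix with its leftmost label removed, if there is more than one label.
fn parent(s: &str) -> Option<&str> {
    s.split_once('.').map(|(_, rest)| rest)
}
```

With the rules "co.uk", "*.ck" and "!www.ck", this sketch maps abc.one.two.example.co.uk to example.co.uk, foo.bar.ck to foo.bar.ck, and a.b.www.ck to www.ck.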

Approach:

  • Load public suffix list entries into memory
  • Match immutable, owned values of domains to be parsed
  • Use an LRU cache of user-chosen size for entries
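The README only says the LRU cache is user-sized. The sketch below assumes it memoizes per-domain lookup results. LruCache and get_or_insert_with are hypothetical names, and the crate's actual cache type and eviction details may differ. Caching a byte offset instead of a String lets a cached answer still be returned as a slice of the caller's input.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU cache: domain -> byte offset where its registrable part starts.
struct LruCache {
    capacity: usize,
    map: HashMap<String, Option<usize>>,
    order: VecDeque<String>, // front = least recently used
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        LruCache { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Return the cached result for `domain`, or compute it with `f` and cache it.
    fn get_or_insert_with<F>(&mut self, domain: &str, f: F) -> Option<usize>
    where
        F: FnOnce(&str) -> Option<usize>,
    {
        if let Some(&hit) = self.map.get(domain) {
            // Mark as most recently used (a linear scan; fine for a sketch).
            if let Some(pos) = self.order.iter().position(|k| k == domain) {
                let key = self.order.remove(pos).unwrap();
                self.order.push_back(key);
            }
            return hit;
        }
        let value = f(domain);
        if self.capacity > 0 {
            // Evict the least recently used entry when full.
            if self.map.len() == self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.map.insert(domain.to_string(), value);
            self.order.push_back(domain.to_string());
        }
        value
    }
}
```

A real implementation would typically use a linked structure for O(1) recency updates. The VecDeque keeps this example short.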

Goals:

  • provide (mostly) compliant public suffix domain parsing
  • avoid allocations during domain parsing
  • offload as much work as possible to the list-parsing stage
  • avoid dependencies that might bring unwanted baggage of their own
  • never mutate inputs; outputs are slices of inputs
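The last two goals can be illustrated with a small sketch: once the number of labels in the public suffix is known, the registrable domain is just a tail slice of the input, found by scanning for dots from the right. registrable is a hypothetical helper, and how nom-psl determines the suffix is not shown here.

```rust
/// Hypothetical helper: given a domain and the label count of its public suffix,
/// return the suffix plus one label as a slice of the input. Nothing is allocated;
/// only byte offsets of '.' are computed.
fn registrable(domain: &str, suffix_labels: usize) -> Option<&str> {
    // Dot positions, scanned from the right.
    let mut dots = domain.rmatch_indices('.').map(|(i, _)| i);
    match dots.nth(suffix_labels) {
        // The label before the suffix starts just after this dot.
        Some(i) => Some(&domain[i + 1..]),
        // Exactly `suffix_labels` dots: the whole input is the registrable part.
        None if domain.matches('.').count() == suffix_labels => Some(domain),
        // Fewer labels than suffix + 1: the input is itself a suffix.
        None => None,
    }
}
```

Because the return type borrows from the argument, the compiler ties each result's lifetime to the input and no owned copy is made.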

Caveats:

  • still relies on the idna crate for punycode handling
  • nothing is lower-cased (skipped for performance), so mixed-case input may not match list entries
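Since the library skips case normalization, a caller can normalize before parsing. This hypothetical helper lower-cases ASCII only, and only when an uppercase letter is present, so already-lowercase input passes through without allocating. Non-ASCII labels are outside its scope (that is where punycode/IDNA handling comes in).

```rust
use std::borrow::Cow;

/// Hypothetical caller-side helper: lower-case ASCII letters only when needed.
fn normalize(domain: &str) -> Cow<'_, str> {
    if domain.bytes().any(|b| b.is_ascii_uppercase()) {
        Cow::Owned(domain.to_ascii_lowercase()) // allocates only in this branch
    } else {
        Cow::Borrowed(domain) // zero-copy pass-through
    }
}
```

Assuming parse_domain takes a &str as in the example, usage would look like LIST.parse_domain(&normalize(domain)). When normalization allocates, the result is a slice of the normalized copy rather than of the original input.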

Environment Variables

PUBLIC_SUFFIX_LIST_FILE=somefile: load somefile in place of public_suffix_list.dat
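A sketch of how such an override can be resolved with the standard library. The README does not say whether the variable is read at build time (e.g. in a build script) or at run time. std::env::var works the same way in both places. choose_psl_path and psl_path are hypothetical names, not part of the crate.

```rust
use std::env;

const DEFAULT_PSL_FILE: &str = "public_suffix_list.dat";

/// Pure selection logic: an override value wins, otherwise the default file name.
fn choose_psl_path(override_value: Option<String>) -> String {
    override_value.unwrap_or_else(|| DEFAULT_PSL_FILE.to_string())
}

/// Read the override from PUBLIC_SUFFIX_LIST_FILE (unset or non-UTF-8 counts as absent).
fn psl_path() -> String {
    choose_psl_path(env::var("PUBLIC_SUFFIX_LIST_FILE").ok())
}
```

Keeping the choice in a pure function makes it testable without mutating the process environment.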

Example:

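```rust
use lazy_static::lazy_static;
use nom_psl::List; // import path assumed; check the crate docs

lazy_static! {
    // Parsed once, on first access, then shared for the life of the program.
    static ref LIST: List = {
        // Second argument: presumably the user-chosen LRU cache size
        let list = List::parse_source_file("public_suffix_list.dat", 10_000_000);
        list.expect("unable to parse PSL file")
    };
}

// ...

fn foo() {
    let domain = "abc.one.two.example.co.uk";
    // Returns the public suffix (co.uk) plus one label, as a slice of `domain`.
    let tldp1 = LIST.parse_domain(domain);

    assert_eq!(tldp1, Some("example.co.uk"));
}
```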

TODO:

  • benchmarks

Dependencies

~2.5MB
~60K SLoC