7 releases

0.3.0 Apr 15, 2024
0.2.6 Apr 1, 2024
0.2.5 Feb 10, 2022
0.2.4 May 15, 2021
0.1.1 Feb 27, 2021

#39 in Parser tooling


20,013 downloads per month
Used in 34 crates (10 directly)

Apache-2.0

56KB
564 lines


A safe regular expression library.

Features

  • forbid(unsafe_code)
  • Good test coverage (~80%)
  • Runtime is linear in the length of the input.
  • Memory usage is constant. Does not allocate.
  • Compiles your regular expression to a simple Rust function
  • Rust compiler checks and optimizes the matcher
  • Supports basic regular expression syntax:
    • Any byte: .
    • Sequences: abc
    • Classes: [-ab0-9], [^ab]
    • Repetition: a?, a*, a+, a{1}, a{1,}, a{,1}, a{1,2}, a{,}
    • Alternates: a|b|c
    • Capturing groups: a(bc)?
    • Non-capturing groups: a(?:bc)?
  • no_std, by omitting the default "std" feature
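To illustrate what "compiles your regular expression to a simple Rust function" with linear runtime and constant memory can mean, here is a hand-written sketch for `[ab][0-9]*`. It is not the code that `regex!` actually generates. The function name `match_ab_digits` and the two-state layout are assumptions made for illustration. It simulates a small NFA one byte at a time using a fixed-size array of state flags. The work per byte is constant, nothing is heap-allocated, and only `core` is used.

```rust
/// Hypothetical hand-compiled matcher for `[ab][0-9]*` (full match on a byte slice).
/// Illustration only; NOT the output of `safe_regex::regex!`.
fn match_ab_digits(input: &[u8]) -> bool {
    // NFA states: 0 = start, 1 = after `[ab]` (accepting, loops on `[0-9]`).
    // The live-state set is a fixed-size array, so memory use is constant.
    let mut live = [true, false];
    for &byte in input {
        let mut next = [false, false];
        // From the start state, `[ab]` moves to state 1.
        if live[0] && (byte == b'a' || byte == b'b') {
            next[1] = true;
        }
        // State 1 loops on any ASCII digit.
        if live[1] && byte.is_ascii_digit() {
            next[1] = true;
        }
        live = next;
        // No live states left: no match is possible, so stop early.
        if !live[0] && !live[1] {
            return false;
        }
    }
    // The input matches only if we end in the accepting state.
    live[1]
}
```

For example, `match_ab_digits(b"a42")` returns `true` and `match_ab_digits(b"X")` returns `false`, mirroring the `is_match` calls in the Examples section.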

Limitations

  • Only works on byte slices, not strings.

  • Partially optimized. Matching is typically about 10 times slower than with the regex crate, though the ratio varies widely by expression. Here are relative runtimes (regex = 1) measured with safe-regex-rs/bench run on a 2018 MacBook Pro:

    regex  safe_regex  benchmark         expression
    1      6           find phone num    .*([0-9]{3})[-. ]?([0-9]{3})[-. ]?([0-9]{4}).*
    1      20          find date time    .*([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+).*
    1      0.75        parse date time   ([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+)
    1      50          check PEM Base64  [a-zA-Z0-9+/]{0,64}=*
    1      20-500      substring search  .*(2G8H81RFNZ).*
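Because matchers only accept byte slices, string input has to be passed in as bytes, and captures come back as byte slices. Below is a minimal std-free sketch of that adaptation. `capture_digits` is a hypothetical hand-written byte matcher for `[a-z]+([0-9]+)` that stands in for a safe-regex matcher; it is not part of the crate. The wrapper passes `str::as_bytes()` in and converts the capture back with `core::str::from_utf8`.

```rust
use core::str;

/// Hypothetical byte-slice matcher for `[a-z]+([0-9]+)` returning the captured digits.
/// A stand-in for a safe-regex matcher, not its generated code.
fn capture_digits(input: &[u8]) -> Option<&[u8]> {
    // Count the leading lowercase ASCII letters (at least one is required).
    let letters = input.iter().take_while(|b| b.is_ascii_lowercase()).count();
    let rest = &input[letters..];
    // The remainder must be one or more ASCII digits.
    if letters == 0 || rest.is_empty() || !rest.iter().all(u8::is_ascii_digit) {
        return None;
    }
    Some(rest)
}

/// Adapter for `&str` input: match on the UTF-8 bytes, then turn the capture back into `&str`.
fn capture_digits_str(input: &str) -> Option<&str> {
    let digits = capture_digits(input.as_bytes())?;
    // A byte-level capture need not fall on character boundaries in general,
    // so the conversion is fallible.
    str::from_utf8(digits).ok()
}
```

With this sketch, `capture_digits_str("abc123")` yields `Some("123")`, while `"123"` and `"abc"` yield `None`.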

Alternatives

  • regex
    • Mature & Popular
    • Maintained by the core Rust language developers
    • Contains unsafe code.
    • Allocates
    • Compiles your regular expression at runtime at first use.
    • Subsequent uses must retrieve it from the cache.
  • pcre2
    • Uses PCRE library which is written in unsafe C.
  • regular-expression
    • No documentation
  • rec
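The regex bullets describe a compile-at-runtime-then-cache usage pattern: the pattern is processed on first use and later calls fetch the cached result. Here is a std-only sketch of that pattern using `std::sync::OnceLock` (stable since Rust 1.70). `CompiledClass`, `compile_class`, and `lower_alnum` are hypothetical stand-ins for a compiled regex. Here the "compiled" form is just a 256-entry table for `[a-z0-9]`. None of this is the regex crate's API or internals. `regex!` instead aims to do this work at compile time, which avoids both the first-use cost and the cache lookup.

```rust
use std::sync::OnceLock;

/// Hypothetical runtime-compiled character class: one flag per byte value.
struct CompiledClass {
    allowed: [bool; 256],
}

/// Hypothetical "compiler": builds the table from inclusive byte ranges.
fn compile_class(ranges: &[(u8, u8)]) -> CompiledClass {
    let mut allowed = [false; 256];
    for &(lo, hi) in ranges {
        for b in lo..=hi {
            allowed[b as usize] = true;
        }
    }
    CompiledClass { allowed }
}

/// Returns the shared compiled class, building it only on the first call.
fn lower_alnum() -> &'static CompiledClass {
    static CACHE: OnceLock<CompiledClass> = OnceLock::new();
    CACHE.get_or_init(|| compile_class(&[(b'a', b'z'), (b'0', b'9')]))
}

/// Full match of `[a-z0-9]*` using the cached table.
fn is_lower_alnum(input: &[u8]) -> bool {
    let class = lower_alnum();
    input.iter().all(|&b| class.allowed[b as usize])
}
```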

Cargo Geiger Safety Report


Metric output format: x/y
    x = unsafe code used by the build
    y = total unsafe code found in the crate

Symbols: 
    🔒  = No `unsafe` usage found, declares #![forbid(unsafe_code)]
    ❓  = No `unsafe` usage found, missing #![forbid(unsafe_code)]
    ☢️  = `unsafe` usage found

Functions  Expressions  Impls  Traits  Methods  Dependency

0/0        0/0          0/0    0/0     0/0      🔒  safe-regex 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒  └── safe-regex-macro 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒      ├── safe-proc-macro2 1.0.68
0/0        0/0          0/0    0/0     0/0      🔒      │   └── unicode-xid 0.2.4
0/0        0/0          0/0    0/0     0/0      🔒      └── safe-regex-compiler 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒          ├── safe-proc-macro2 1.0.68
0/0        0/0          0/0    0/0     0/0      🔒          └── safe-quote 1.0.15
0/0        0/0          0/0    0/0     0/0      🔒              └── safe-proc-macro2 1.0.68

0/0        0/0          0/0    0/0     0/0    

Examples

use safe_regex::{regex, Matcher0};
let matcher: Matcher0<_> =
    regex!(br"[ab][0-9]*");
assert!(matcher.is_match(b"a42"));
assert!(!matcher.is_match(b"X"));

use safe_regex::{regex, Matcher3};
let matcher: Matcher3<_> =
    regex!(br"([ab])([0-9]*)(suffix)?");
let (prefix, digits, suffix) =
    matcher.match_slices(b"a42").unwrap();
assert_eq!(b"a", prefix);
assert_eq!(b"42", digits);
assert_eq!(b"", suffix);
let (prefix_range, digits_r, suffix_r)
    = matcher.match_ranges(b"a42").unwrap();
assert_eq!(0..1_usize, prefix_range);
assert_eq!(1..3_usize, digits_r);
assert_eq!(0..0_usize, suffix_r);
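In the example above, `match_ranges` and `match_slices` report the same captures in two forms. Indexing the input with each range gives the corresponding slice, and the optional `(suffix)` group that did not participate comes back as `0..0`, which indexes to an empty slice. Here is a std-free sketch of that relationship for `([ab])([0-9]*)(suffix)?`. `match_ranges_3` and `match_slices_3` are hypothetical hand-written stand-ins for `Matcher3::match_ranges` and `Matcher3::match_slices`, not the crate's implementation.

```rust
use core::ops::Range;

/// Hypothetical stand-in for `Matcher3::match_ranges` on `([ab])([0-9]*)(suffix)?`.
/// Full match only; an optional group that did not participate is reported as `0..0`.
fn match_ranges_3(input: &[u8]) -> Option<(Range<usize>, Range<usize>, Range<usize>)> {
    // Group 1: exactly one `a` or `b`.
    match input.first() {
        Some(&b'a') | Some(&b'b') => {}
        _ => return None,
    }
    let prefix = 0..1;
    // Group 2: zero or more ASCII digits.
    let digit_count = input[1..].iter().take_while(|b| b.is_ascii_digit()).count();
    let digits = 1..1 + digit_count;
    // Group 3: the optional literal `suffix`, which must end the input.
    let rest = &input[digits.end..];
    let suffix = if rest.is_empty() {
        0..0 // group did not participate
    } else if rest == b"suffix" {
        digits.end..input.len()
    } else {
        return None;
    };
    Some((prefix, digits, suffix))
}

/// Hypothetical slice form, derived by indexing the input with each range.
fn match_slices_3(input: &[u8]) -> Option<(&[u8], &[u8], &[u8])> {
    let (p, d, s) = match_ranges_3(input)?;
    Some((&input[p], &input[d], &input[s]))
}
```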

Changelog

  • v0.3.0 - Add assert_match and default std feature.
  • v0.2.6 - Fix some Clippy warnings on regex! macro invocation sites.
  • v0.2.5 - Fix no_std. Thank you, Soares Chen! github.com/soareschen gitlab.com/soareschen-informal
  • v0.2.4
    • Bug fixes, at some cost in performance.
    • Optimize non-match runtime.
  • v0.2.3
    • Rename match_all -> match_slices.
    • Add match_ranges.
  • v0.2.2 - Simplify match_all return type
  • v0.2.1 - Non-capturing groups, bug fixes
  • v0.2.0
    • Linear-time & constant-memory algorithm! :)
    • Work around rustc optimizer hang on regexes with exponential execution paths like "a{,30}". See src/bin/uncompilable/main.rs.
  • v0.1.1 - Bug fixes and more tests.
  • v0.1.0 - First published version

TO DO

  • 11+ capturing groups
  • Increase coverage
  • Add fuzzing tests
  • Common character classes: whitespace, letters, punctuation, etc.
  • Match strings
  • Repeated capturing groups: (ab|cd)*. Idea: Return a MatcherNIter struct that is an iterator that returns MatcherN structs.
  • Implement optimizations explained in https://swtch.com/%7Ersc/regexp/regexp3.html . Some of the code already exists in tests/dfa_single_pass.rs and tests/nfa_without_capturing.rs.
  • Once const generics are stable, use the feature to simplify some types.
  • Once trait bounds on const fn parameters are stable, make the MatcherN::new functions const.
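One way the repeated-capturing-group idea could look is sketched below for `(ab|cd)*`: an iterator that yields the range of each repetition in turn. `RepeatRanges` is an illustrative name, not a planned safe-regex type. To keep the sketch small, it yields ranges rather than the `MatcherN` structs the note proposes. It stops at the first byte pair that is neither `ab` nor `cd`. A full-match API would also need to report whether the whole input was consumed, which is what `fully_consumed` does here.

```rust
use core::ops::Range;

/// Hypothetical iterator over the repetitions of `(ab|cd)*` in `input`.
struct RepeatRanges<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> RepeatRanges<'a> {
    fn new(input: &'a [u8]) -> Self {
        RepeatRanges { input, pos: 0 }
    }

    /// True if every byte was consumed by some repetition (i.e. a full match).
    fn fully_consumed(&self) -> bool {
        self.pos == self.input.len()
    }
}

impl<'a> Iterator for RepeatRanges<'a> {
    type Item = Range<usize>;

    fn next(&mut self) -> Option<Range<usize>> {
        // Both alternatives are exactly two bytes long; `get` returns None past the end.
        let pair = self.input.get(self.pos..self.pos + 2)?;
        if pair == b"ab" || pair == b"cd" {
            let range = self.pos..self.pos + 2;
            self.pos += 2;
            Some(range)
        } else {
            None
        }
    }
}
```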

Development

License: Apache-2.0

Dependencies

~315KB