7 releases

0.3.0	Apr 15, 2024
0.2.6	Apr 1, 2024
0.2.5	Feb 10, 2022
0.2.4	May 15, 2021
0.1.1	~~Feb 27, 2021~~

#380 in Parser implementations

52,690 downloads per month
Used in 78 crates (10 directly)

Apache-2.0

56KB
564 lines

A safe regular expression library.

Features

forbid(unsafe_code)
Good test coverage (~80%)
Runtime is linear.
Memory usage is constant. Does not allocate.
Compiles your regular expression to a simple Rust function
Rust compiler checks and optimizes the matcher
Supports basic regular expression syntax:
- Any byte: .
- Sequences: abc
- Classes: [-ab0-9], [^ab]
- Repetition: a?, a*, a+, a{1}, a{1,}, a{,1}, a{1,2}, a{,}
- Alternates: a|b|c
- Capturing groups: a(bc)?
- Non-capturing groups: a(?:bc)?
no_std, by omitting the default "std" feature

Limitations

Only works on byte slices, not strings.

Partially optimized. Runtime is about 10 times slower than regex crate. Here are relative runtimes measured with safe-regex-rs/bench run on a 2018 Macbook Pro:

`regex`	`safe_regex`	expression
1	6	find phone num `.([0-9]{3})[-. ]?([0-9]{3})[-. ]?([0-9]{4}).`
1	20	find date time `.([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+).`
1	0.75	parse date time `([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+)`
1	50	check PEM Base64 `[a-zA-Z0-9+/]{0,64}=*`
1	20-500	substring search `.(2G8H81RFNZ).`

Alternatives

regex
- Mature & Popular
- Maintained by the core Rust language developers
- Contains unsafe code.
- Allocates
- Compiles your regular expression at runtime at first use.
- Subsequent uses must retrieve it from the cache.
pcre2
- Uses PCRE library which is written in unsafe C.
regular-expression
- No documentation
rec

Cargo Geiger Safety Report


Metric output format: x/y
    x = unsafe code used by the build
    y = total unsafe code found in the crate

Symbols: 
    🔒  = No `unsafe` usage found, declares #![forbid(unsafe_code)]
    ❓  = No `unsafe` usage found, missing #![forbid(unsafe_code)]
    ☢️  = `unsafe` usage found

Functions  Expressions  Impls  Traits  Methods  Dependency

0/0        0/0          0/0    0/0     0/0      🔒  safe-regex 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒  └── safe-regex-macro 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒      ├── safe-proc-macro2 1.0.68
0/0        0/0          0/0    0/0     0/0      🔒      │   └── unicode-xid 0.2.4
0/0        0/0          0/0    0/0     0/0      🔒      └── safe-regex-compiler 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒          ├── safe-proc-macro2 1.0.68
0/0        0/0          0/0    0/0     0/0      🔒          └── safe-quote 1.0.15
0/0        0/0          0/0    0/0     0/0      🔒              └── safe-proc-macro2 1.0.68

0/0        0/0          0/0    0/0     0/0

Examples

use safe_regex::{regex, Matcher0};
let matcher: Matcher0<_> =
    regex!(br"[ab][0-9]*");
assert!(matcher.is_match(b"a42"));
assert!(!matcher.is_match(b"X"));

use safe_regex::{regex, Matcher3};
let matcher: Matcher3<_> =
    regex!(br"([ab])([0-9]*)(suffix)?");
let (prefix, digits, suffix) =
    matcher.match_slices(b"a42").unwrap();
assert_eq!(b"a", prefix);
assert_eq!(b"42", digits);
assert_eq!(b"", suffix);
let (prefix_range, digits_r, suffix_r)
    = matcher.match_ranges(b"a42").unwrap();
assert_eq!(0..1_usize, prefix_range);
assert_eq!(1..3_usize, digits_r);
assert_eq!(0..0_usize, suffix_r);

Changelog

v0.3.0 - Add assert_match and default std feature.
v0.2.6 - Fix some Clippy warnings on regex! macro invocation sites.
v0.2.5 - Fix no_std. Thank you, Soares Chen! github.com/soareschen gitlab.com/soareschen-informal
v0.2.4
- Bug fixes, reducing performance.
- Optimize non-match runtime.
v0.2.3
- Rename match_all -> match_slices.
- Add match_ranges.
v0.2.2 - Simplify match_all return type
v0.2.1 - Non-capturing groups, bug fixes
v0.2.0
- Linear-time & constant-memory algorithm! :)
- Work around rustc optimizer hang on regexes with exponential execution paths like "a{,30}". See src/bin/uncompilable/main.rs.
v0.1.1 - Bug fixes and more tests.
v0.1.0 - First published version

TO DO

11+ capturing groups
Increase coverage
Add fuzzing tests
Common character classes: whitespace, letters, punctuation, etc.
Match strings
Repeated capturing groups: (ab|cd)*. Idea: Return an MatcherNIter struct that is an iterator that returns MatcherN structs.
Implement optimizations explained in https://swtch.com/%7Ersc/regexp/regexp3.html . Some of the code already exists in tests/dfa_single_pass.rs and tests/nfa_without_capturing.rs.
Once const generics are stable, use the feature to simplify some types.
Once trait bounds on const fn parameters are stable, make the MatcherN::new functions const.

Development

An overview of how this library works: https://news.ycombinator.com/item?id=27301320

License: Apache-2.0

Dependencies

~390KB