2 unstable releases

0.1.0	Nov 29, 2023
0.0.0	Aug 8, 2023

#1885 in Text processing

MIT license

16KB
212 lines

webreg (web regex)

CLI tool for testing regexes against web pages.

Test if a list of websites match a given regex

Installation

cargo install webreg

Usage

webreg [OPTIONS] <REGEX>

Arguments:
<REGEX> A regular expression to match against the site content

Options:
-u, --urls <URLS>       Comma separated list of urls
-i, --file <FILE>       A file containing a list of urls
-c, --case-insensitive  Case insensitive search
-f, --fix-urls          Fix urls that don't start with http:// or https://
-r, --retry             Retry failed urls
-s, --save              Saves the output to the results folder (./results/<regex>)
-h, --help              Print help

Examples

Basic usage

webreg -u "https://example.com" "Hello World"

This will check if the string "Hello World" is present in the content of https://example.com. If it is, it will print the url to stdout.

Multiple urls

webreg -u "https://example.com,https://example.org" "Hello World"

Domains

webreg -u -f "example.com,example.org" "Hello World"

The -f flag will fix urls that don't start with http:// or https://

Case insensitive

webreg -u -c "https://example.com" "hello world"

The -c flag will make the search case insensitive.

File input

webreg -i urls.txt "Hello World"

urls.txt:

https://example.com
https://example.org

The -i flag will read the urls from a file. The file should contain one url per line. Empty lines will be ignored and whitespace will be trimmed.

Pipe input

cat urls.txt | webreg -i "Hello World"

urls.txt:

https://example.com
https://example.org

Save the output

webreg -u -s "https://example.com" "Hello World"

The -s flag will save the output to the results folder (./results/<regex>). This will also output lists urls that couldn't be fetched and urls that didn't match the regex.

Dependencies

~33–48MB
~847K SLoC