2 stable releases

1.0.2 Mar 5, 2025

#5 in #web-crawler

91 downloads per month

Custom license

18KB
189 lines

spider

A command line interface for crawling websites and storing their content.

usage

USAGE:
    ss [FLAGS] [OPTIONS] --domain <DOMAIN>

FLAGS:
    -h, --help              Prints help information
    -r, --respect-robots    Respect the robots.txt file and do not scrape disallowed files
    -V, --version           Prints version information
    -v, --verbose           Turn verbose logging on

OPTIONS:
    -c, --concurrency <NUM>                 How many requests can run simultaneously
    -d, --domain <DOMAIN>                   Domain to crawl
    -p, --polite-delay <DELAY_IN_MILLIS>    Polite crawling delay in milliseconds
    -m, --max-depth <DEPTH>                 Maximum crawl depth from the starting URL
    -t, --timeout <SECONDS>                 Timeout for HTTP requests in seconds
    -u, --user-agent <USER_AGENT>           Custom User-Agent string for HTTP requests
    -o, --output-dir <OUTPUT_DIR>           Directory to store output (default: ./spider-output)
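
For example, a polite crawl with verbose logging might look like the following (the domain and option values are placeholders for illustration, not defaults):

    ss --domain example.com --respect-robots --verbose --concurrency 4 --polite-delay 500 --max-depth 3 --output-dir ./spider-output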

Dependencies

~9–21MB
~292K SLoC