bin+lib sitemap2urllist

Read a sitemap and output a list of URLs

5 releases

0.1.4 Jan 2, 2025
0.1.3 Jan 2, 2025
0.1.2 Jan 1, 2025
0.1.1 Jan 1, 2025
0.1.0 Jan 1, 2025

BlueOak-1.0.0

🌐
sitemap2urllist

Read a sitemap and output a list of URLs.


sitemap2urllist is a CLI tool for parsing a sitemap and outputting a simple list of URLs, which can easily be piped into other tools (e.g., lychee).
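To make the sitemap-to-list transformation concrete, here is a minimal Rust sketch that pulls the `<loc>` entries out of a sitemap document. It is an illustration only, not this crate's implementation: `extract_locs` is a hypothetical name, and the sketch uses naive string scanning from the standard library instead of a real XML parser. As a result it ignores CDATA sections and entity escapes such as `&amp;`, and it does not follow sitemap index files.

```rust
/// Collect the text inside every `<loc>...</loc>` element, in document order.
/// Hypothetical helper for illustration; not part of sitemap2urllist's API.
fn extract_locs(xml: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut rest = xml;
    // Repeatedly find the next opening tag, then its matching closing tag.
    while let Some(start) = rest.find("<loc>") {
        let after_open = &rest[start + "<loc>".len()..];
        match after_open.find("</loc>") {
            Some(end) => {
                // Trim surrounding whitespace so each URL is clean on its own line.
                urls.push(after_open[..end].trim().to_string());
                rest = &after_open[end + "</loc>".len()..];
            }
            // Unterminated element: stop scanning rather than guess.
            None => break,
        }
    }
    urls
}
```

Printing each returned string on its own line gives the kind of newline-separated list that can be piped into another tool.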

Install

cargo install --locked sitemap2urllist

Usage

Read a sitemap and output a list of URLs.

Usage: sitemap2urllist [OPTIONS] <URL>

Arguments:
  <URL>  The URL to a sitemap

Options:
  -c, --cache                          Use request cache stored on disk at `.sitemapcache` (recommended)
      --max-cache-age <MAX_CACHE_AGE>  Discard all cached requests older than this duration [default: 14d]
  -v, --verbose...                     Increase logging verbosity
  -q, --quiet...                       Decrease logging verbosity
  -h, --help                           Print help (see more with '--help')
  -V, --version                        Print version
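The `--max-cache-age` option discards cached requests older than a given duration, with a default of `14d`. The following Rust sketch shows one way such a check could work. It rests on assumptions: `parse_age` and `is_stale` are hypothetical names, and the sketch accepts only a single integer followed by one of the suffixes `s`, `m`, `h`, or `d`. The duration syntax the tool actually accepts may be richer.

```rust
use std::time::Duration;

/// Parse a duration such as "14d", "12h", "30m", or "45s".
/// Assumption: only one integer plus one unit suffix is handled;
/// the real tool may accept a richer syntax.
fn parse_age(text: &str) -> Option<Duration> {
    let text = text.trim();
    // The last character is the unit; everything before it is the number.
    let unit = text.chars().last()?;
    let number = &text[..text.len() - unit.len_utf8()];
    let value: u64 = number.parse().ok()?;
    let seconds_per_unit = match unit {
        's' => 1,
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        _ => return None,
    };
    // checked_mul avoids overflow on absurdly large inputs.
    value.checked_mul(seconds_per_unit).map(Duration::from_secs)
}

/// A cached response is discarded when it is older than the allowed maximum.
fn is_stale(entry_age: Duration, max_age: Duration) -> bool {
    entry_age > max_age
}
```

Under these assumptions, an entry cached 15 days ago is stale against the default of `14d`, while one cached 10 days ago is still usable.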

Example Usage with Lychee

Link checkers like lychee will likely make this tool unnecessary at some point by implementing recursive link checking.

In the meantime, you can check every link on a website, as defined by its sitemap, from your local machine with a command like the following:

sitemap2urllist https://www.numbersstation.ai/sitemap.xml --cache | xargs lychee --cache

You can combine this with lychee's configuration to, for example, cache results or ignore certain errors.
