#pdf #archive #wayback #machine #url #links #internet

app archive-pdf-urls

Extract all links from a PDF and archive the URLs in the Internet Archive's Wayback Machine

11 unstable releases (3 breaking)

new 0.5.1 Mar 17, 2025
0.4.6 Mar 7, 2025
0.4.5 Nov 25, 2024
0.4.1 Jun 17, 2024
0.3.0 Mar 27, 2024

#155 in Compression


294 downloads per month

Apache-2.0

48KB
615 lines

Archive PDF URLs

This command-line tool extracts URLs from a PDF file and archives them using the Wayback Machine.


Installation

You can build and install the tool using Cargo:

cargo install archive-pdf-urls

Usage

The tool takes the path to a PDF file as its argument, extracts the URLs found in it, and archives each one using the Wayback Machine. The --exclude option skips URLs that match the given pattern.

Example usage:

archive-pdf-urls file.pdf --exclude "https://some.pattern/*"
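
To make the exclude step in the example above more concrete, here is a minimal Rust sketch of how extracted URLs might be filtered and turned into archive requests. It is not the tool's actual implementation. The helper names (is_excluded, wayback_save_url, plan_archive_requests) are hypothetical, the sketch assumes a single trailing * in the pattern means "any suffix", and it assumes archiving goes through the Wayback Machine's public https://web.archive.org/save/ endpoint. It only builds the request addresses and performs no network calls.

```rust
/// Hypothetical helper: returns true if `url` matches `pattern`.
/// Assumption: one trailing `*` matches any suffix; without it, the
/// pattern must equal the URL exactly. The real tool may match differently.
fn is_excluded(pattern: &str, url: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => url.starts_with(prefix),
        None => url == pattern,
    }
}

/// Builds the Wayback Machine "Save Page Now" address for `url`.
/// Assumption: the public /save/ endpoint is the one being used.
fn wayback_save_url(url: &str) -> String {
    format!("https://web.archive.org/save/{url}")
}

/// Drops every URL matched by an exclude pattern and maps the rest
/// to their save addresses, preserving the input order.
fn plan_archive_requests(urls: &[&str], excludes: &[&str]) -> Vec<String> {
    urls.iter()
        .copied()
        .filter(|url| !excludes.iter().any(|pattern| is_excluded(pattern, url)))
        .map(wayback_save_url)
        .collect()
}
```

Calling plan_archive_requests with the URLs pulled from a PDF and the pattern https://some.pattern/* would leave only the URLs outside that prefix, each rewritten to its save address.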

Docker usage

docker run --rm -v ./file.pdf:/file.pdf ghcr.io/thoth-pub/archive-pdf-urls file.pdf

Dependencies

~25–38MB
~525K SLoC