#pdf #archive #pdf-file #url #extract #machine #wayback

app archive-pdf-urls

Extract all links from a PDF and archive the URLs in the Internet Archive's Wayback Machine

8 releases

new 0.4.5 Nov 25, 2024
0.4.4 Nov 25, 2024
0.4.2 Aug 8, 2024
0.4.1 Jun 17, 2024
0.2.0 Mar 27, 2024

#172 in Compression

223 downloads per month

Apache-2.0

48KB
596 lines

Archive PDF URLs

This command-line tool extracts URLs from a PDF file and archives them using the Wayback Machine.
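
As a rough illustration of the two steps described above (find URLs, then submit them for archiving), here is a minimal sketch using only the Rust standard library. It is not the tool's actual implementation: `find_urls` and `save_page_now` are hypothetical helpers, the input is assumed to be text already extracted from the PDF, and the sketch only builds the public Save Page Now address (`https://web.archive.org/save/<url>`) without sending any request.

```rust
/// Collect tokens that look like HTTP(S) URLs from already-extracted text.
/// Hypothetical helper: a real PDF needs its link annotations or text
/// streams decoded first, which this sketch does not attempt.
fn find_urls(text: &str) -> Vec<String> {
    text.split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        // Drop punctuation that commonly trails a URL in running text.
        .map(|t| {
            t.trim_end_matches(|c: char| matches!(c, '.' | ',' | ';' | ')'))
                .to_string()
        })
        .collect()
}

/// Build the Wayback Machine "Save Page Now" address for a URL.
/// Requesting this address asks the Internet Archive to capture the page.
fn save_page_now(url: &str) -> String {
    format!("https://web.archive.org/save/{url}")
}
```

Calling `find_urls` on a line of text and mapping each result through `save_page_now` yields the addresses an HTTP client would then request. Rate limiting and retries, which any real client of the Wayback Machine needs, are left out.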

Installation

You can build and install the tool using Cargo:

cargo install archive-pdf-urls

Usage

The tool takes the path to a PDF file as its argument, extracts the URLs it contains, and archives them using the Wayback Machine. The `--exclude` option takes a URL pattern to exclude.

Example usage:

archive-pdf-urls file.pdf --exclude https://some.pattern/\*
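
The README does not spell out how `--exclude` patterns are matched. The sketch below assumes glob-style matching in which `*` stands for any run of characters, which is how the pattern in the example reads. `matches_pattern` and `retain_unexcluded` are hypothetical names, not part of the tool.

```rust
/// Return true if `url` matches `pattern`, where `*` matches any run of
/// characters (including none). Assumed semantics, not taken from the tool.
fn matches_pattern(pattern: &str, url: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        // No wildcard: require an exact match.
        return pattern == url;
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if !url.starts_with(first) {
        return false;
    }
    // Match the literal pieces between wildcards left to right.
    let mut rest = &url[first.len()..];
    for mid in &parts[1..parts.len() - 1] {
        match rest.find(*mid) {
            Some(i) => rest = &rest[i + mid.len()..],
            None => return false,
        }
    }
    // Whatever remains must end with the final literal piece.
    rest.ends_with(last)
}

/// Keep only the URLs that match none of the exclude patterns.
fn retain_unexcluded<'a>(urls: &[&'a str], excludes: &[&str]) -> Vec<&'a str> {
    urls.iter()
        .copied()
        .filter(|u| !excludes.iter().any(|p| matches_pattern(p, u)))
        .collect()
}
```

Under this assumption, the pattern `https://some.pattern/*` would drop every URL under that host and leave all others to be archived.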

Docker usage

docker run --rm -v ./file.pdf:/file.pdf ghcr.io/thoth-pub/archive-pdf-urls file.pdf

Dependencies

~24–38MB
~519K SLoC