
pdf_process

Library for rendering and extracting metadata/text from PDF files using poppler

3 releases (breaking)

0.2.0 Aug 20, 2024
0.1.0 Aug 11, 2024
0.0.0 Aug 5, 2024

111 downloads per month

MIT license

57KB
976 lines

PDF Process

A library for processing PDF files in Rust. It wraps the command-line utilities provided by Poppler, specifically pdftotext (text extraction), pdftocairo (image rendering) and pdfinfo (basic document details).
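As a rough illustration of what wrapping these utilities involves, the sketch below calls pdftotext and pdftocairo directly through std::process::Command. It is not pdf_process's own API: the helper names (run_poppler, extract_text, render_page_png) and the example.pdf path are hypothetical, and it assumes the Poppler tools are on your PATH.

```rust
use std::path::Path;
use std::process::Command;

// Hypothetical helper (not part of pdf_process): runs a Poppler CLI tool and
// returns its stdout. Fails if the tool cannot be spawned (e.g. it is not on
// PATH) or if it exits with a non-zero status.
fn run_poppler(tool: &str, args: &[&str]) -> Result<Vec<u8>, String> {
    let output = Command::new(tool)
        .args(args)
        .output()
        .map_err(|e| format!("failed to run {tool}: {e}"))?;
    if !output.status.success() {
        return Err(format!(
            "{tool} exited with {}: {}",
            output.status,
            String::from_utf8_lossy(&output.stderr).trim()
        ));
    }
    Ok(output.stdout)
}

// Text extraction: an output file of "-" makes pdftotext write to stdout.
fn extract_text(pdf: &Path) -> Result<String, String> {
    let path = pdf.to_str().ok_or("path is not valid UTF-8")?;
    let stdout = run_poppler("pdftotext", &[path, "-"])?;
    Ok(String::from_utf8_lossy(&stdout).into_owned())
}

// Image rendering: writes page `page` to `<out_prefix>.png` at `dpi`.
// -singlefile stops pdftocairo from appending a page number to the file name.
fn render_page_png(pdf: &Path, page: u32, dpi: u32, out_prefix: &str) -> Result<(), String> {
    let path = pdf.to_str().ok_or("path is not valid UTF-8")?;
    let page = page.to_string();
    let dpi = dpi.to_string();
    let args = [
        "-png", "-singlefile",
        "-r", dpi.as_str(),
        "-f", page.as_str(),
        "-l", page.as_str(),
        path, out_prefix,
    ];
    run_poppler("pdftocairo", &args)?;
    Ok(())
}

fn main() {
    let pdf = Path::new("example.pdf"); // hypothetical input file
    match extract_text(pdf) {
        Ok(text) => println!("{text}"),
        Err(e) => eprintln!("text extraction failed: {e}"),
    }
    if let Err(e) = render_page_png(pdf, 1, 150, "example-page1") {
        eprintln!("rendering failed: {e}");
    }
}
```

Reading stdout directly avoids temporary files for text extraction; for rendering, pdftocairo writes the image file itself, so the helper only checks the exit status.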

Provides functionality for:

  • Extracting PDF text contents
  • Rendering PDF files to images (PNG/JPEG/TIFF)
  • Basic PDF details (encryption, page count, subject, title, creator, author, etc.)
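pdfinfo prints one "Key: value" pair per line (for example Title, Pages, Encrypted, CreationDate), so reading the basic details mostly comes down to parsing that text. A minimal sketch, assuming that line format; parse_pdfinfo, page_count and is_encrypted are hypothetical names, and the sample output is made up for illustration.

```rust
use std::collections::HashMap;

// Hypothetical parser for `pdfinfo` output, which prints one "Key: value"
// pair per line. Splitting on the first ':' keeps colons inside values
// (e.g. times in CreationDate) and allows keys with spaces ("Page size").
fn parse_pdfinfo(output: &str) -> HashMap<String, String> {
    output
        .lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

// Page count from the "Pages" field, if present and numeric.
fn page_count(info: &HashMap<String, String>) -> Option<u32> {
    info.get("Pages")?.parse().ok()
}

// Assumes pdfinfo reports "no", or "yes" followed by permission details.
fn is_encrypted(info: &HashMap<String, String>) -> Option<bool> {
    info.get("Encrypted").map(|v| v.starts_with("yes"))
}

fn main() {
    // Made-up sample output, for illustration only.
    let sample = "Title:           Quarterly Report\n\
                  CreationDate:    Mon Aug  5 12:00:00 2024 UTC\n\
                  Pages:           3\n\
                  Encrypted:       no\n\
                  Page size:       612 x 792 pts (letter)\n";
    let info = parse_pdfinfo(sample);
    println!("title     = {:?}", info.get("Title"));
    println!("pages     = {:?}", page_count(&info));
    println!("encrypted = {:?}", is_encrypted(&info));
}
```

Keeping every field as a string and converting only on access means unknown or version-specific keys are still preserved.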

Prerequisites

The library was developed against a Linux host. Windows is not supported.

Requires Poppler to be installed on your system, with its utilities on your PATH. Many distributions ship with it pre-installed. You can check whether it is installed by running pdfinfo -v, which should produce output similar to:

pdfinfo version 24.02.0
Copyright 2005-2024 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC,
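The same check can be scripted for all three tools. A minimal sketch, assuming each tool accepts -v and prints a line of the form "<tool> version X.Y.Z" as shown above; since Poppler tools may write this banner to stderr rather than stdout, both streams are searched. The function names are hypothetical.

```rust
use std::process::Command;

// Hypothetical helper: extracts (major, minor, patch) from a banner line
// such as "pdfinfo version 24.02.0". A missing patch component defaults to 0.
fn parse_poppler_version(text: &str) -> Option<(u32, u32, u32)> {
    let line = text.lines().find(|l| l.contains(" version "))?;
    let version = line.split(" version ").nth(1)?.trim();
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    Some((parts.next()??, parts.next()??, parts.next().flatten().unwrap_or(0)))
}

// Hypothetical helper: runs `<tool> -v` and returns the parsed version,
// or None if the tool cannot be run or its output is not recognised.
fn installed_version(tool: &str) -> Option<(u32, u32, u32)> {
    let output = Command::new(tool).arg("-v").output().ok()?;
    let text = format!(
        "{}\n{}",
        String::from_utf8_lossy(&output.stdout),
        String::from_utf8_lossy(&output.stderr)
    );
    parse_poppler_version(&text)
}

fn main() {
    for tool in ["pdftotext", "pdftocairo", "pdfinfo"] {
        match installed_version(tool) {
            Some((major, minor, patch)) => println!("{tool}: {major}.{minor:02}.{patch}"),
            None => println!("{tool}: not found on PATH"),
        }
    }
}
```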

Otherwise, you can install it with the command below:

Fedora:

sudo dnf install poppler-utils

Adjust the command above for your specific Linux distribution.

Installation

Install with cargo:

cargo add pdf_process

Or add the following to the [dependencies] section of your Cargo.toml:

pdf_process = "0.2.0"

Tested

Tested against:

  • pdftotext version 24.02.0
  • pdftocairo version 24.02.0
  • pdfinfo version 24.02.0

Dependencies

~17–27MB
~483K SLoC