#epub #ebook #book #convert #parse #structure #content

epubparse

Parse epub and convert to text-only Book structure

4 releases

0.2.2 Jan 16, 2022
0.2.1 May 12, 2021
0.2.0 May 12, 2021
0.1.0 Mar 29, 2021

#1849 in Text processing

24 downloads per month

MIT license

39KB
845 lines

Epubparse-rs

⚠️ Work in progress

Requires Rust 1.56 to compile

This library aims to convert Epub files into text-only Book structures that can be used to analyze the contained text. It is published both as a Rust crate on crates.io and as an npm package (ES module) on npm. See the project repo for all components.
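To make "text-only Book structure" concrete, here is a minimal sketch of what such a structure and a simple analysis over it could look like. The type names, field names, and the `word_count` helpers are assumptions made for illustration; they are not taken from the crate's actual API, which may differ.

```rust
/// Hypothetical text-only chapter: a title, its plain text, and nested subchapters.
/// (Illustrative only; not the crate's real type.)
#[derive(Debug, Clone, PartialEq)]
struct Chapter {
    title: String,
    text: String,
    subchapters: Vec<Chapter>,
}

/// Hypothetical text-only book: metadata plus a list of top-level chapters.
/// (Illustrative only; not the crate's real type.)
#[derive(Debug, Clone, PartialEq)]
struct Book {
    title: String,
    author: Option<String>,
    chapters: Vec<Chapter>,
}

impl Chapter {
    /// Count whitespace-separated words in this chapter and all nested subchapters.
    fn word_count(&self) -> usize {
        self.text.split_whitespace().count()
            + self
                .subchapters
                .iter()
                .map(Chapter::word_count)
                .sum::<usize>()
    }
}

impl Book {
    /// Count whitespace-separated words across all chapters of the book.
    fn word_count(&self) -> usize {
        self.chapters.iter().map(Chapter::word_count).sum()
    }
}
```

Because all markup is already stripped in a structure like this, text analysis reduces to plain string processing over `text`, applied recursively through `subchapters`.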

Design goals

  • ✅ serve as core to the epubparse-wasm library (must compile to WASM)
  • ✅ perform a reasonable conversion into a book with chapters
  • ✅ support Epub version 2 table of contents (.ncx)
  • ❌ support Epub version 3 table of contents (.xhtml); not yet implemented, but
    many version 3 epubs also include a version 2 table of contents, so these should also work
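For readers unfamiliar with the version 2 table of contents: an .ncx file lists `navPoint` entries, each with a `<navLabel><text>…</text></navLabel>` label and a `<content src="…"/>` target. The sketch below pulls out those pairs with a deliberately naive string scan and flattens any nesting. It is an assumption-laden illustration, not how this crate parses the file; the names `TocEntry`, `between`, and `parse_ncx_flat` are hypothetical, and real code should use a proper XML parser (this scan ignores entities, single-quoted attributes, and namespaces prefixes).

```rust
/// Hypothetical flat table-of-contents entry: a label and the file it points to.
#[derive(Debug, PartialEq)]
struct TocEntry {
    label: String,
    src: String,
}

/// Return the text between the first `open` and the next `close` in `s`,
/// together with the remainder of `s` after `close`.
fn between<'a>(s: &'a str, open: &str, close: &str) -> Option<(&'a str, &'a str)> {
    let start = s.find(open)? + open.len();
    let end = start + s[start..].find(close)?;
    Some((&s[start..end], &s[end + close.len()..]))
}

/// Naively collect (label, src) pairs from the navMap of an .ncx document.
/// Nested navPoints are flattened in document order.
fn parse_ncx_flat(ncx: &str) -> Vec<TocEntry> {
    let mut entries = Vec::new();
    let mut rest = ncx;

    // Skip everything before <navMap>, since <docTitle> also contains a <text> element.
    if let Some(pos) = rest.find("<navMap") {
        rest = &rest[pos..];
    }

    // Each navPoint contributes one <text> label followed by one src="..." attribute.
    while let Some((label, after_label)) = between(rest, "<text>", "</text>") {
        match between(after_label, "src=\"", "\"") {
            Some((src, after_src)) => {
                entries.push(TocEntry {
                    label: label.trim().to_string(),
                    src: src.to_string(),
                });
                rest = after_src;
            }
            // A label without a following target: stop scanning.
            None => break,
        }
    }
    entries
}
```

Calling `parse_ncx_flat` on the contents of an .ncx file yields the entries in reading order; mapping each `src` to the corresponding content file is what would turn such a list into chapters.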

lib.rs:

A library to parse epub files

Dependencies

~3.5–5.5MB
~99K SLoC