recipe-scraper

A library for parsing structured recipes from the web

2 releases

0.1.1 Dec 27, 2024
0.1.0 Nov 24, 2024

MIT license

19KB
481 lines

recipe-scraper is a Rust library for scraping structured recipe data from the web.

It provides a small set of APIs for finding and parsing recipes published in supported structured formats.

Support

recipe-scraper is deliberately pragmatic and extracts only a minimal set of data from recipes. It currently extracts the following (meta-)data:

And supports the following structured recipe formats:

recipe-scraper provides methods that operate on HTML strings, as well as directly on JSON. It explicitly does not provide HTTP client functionality.
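
Since fetching is left to the caller, the input is simply a string. To illustrate what locating structured data in such a string can involve, here is a minimal, standard-library-only sketch that collects the bodies of `<script type="application/ld+json">` elements, a common way for pages to embed Schema.org data. The function `json_ld_blocks` is hypothetical and is not part of recipe-scraper; the crate's own parsing may work differently and is likely more robust than this naive scan.

```rust
/// Hypothetical helper (not part of recipe-scraper): returns the trimmed text
/// of every `<script ...application/ld+json...>` element found in `html`.
/// Naive by design: assumes lowercase tag names and no `>` inside attributes.
fn json_ld_blocks(html: &str) -> Vec<&str> {
    const MARKER: &str = "application/ld+json";
    const CLOSE: &str = "</script>";

    let mut blocks = Vec::new();
    let mut rest = html;

    while let Some(open) = rest.find("<script") {
        let after_open = &rest[open..];
        // Locate the end of the opening tag; stop on malformed input.
        let Some(tag_end) = after_open.find('>') else { break };
        let tag = &after_open[..tag_end];
        let body_and_rest = &after_open[tag_end + 1..];
        // Locate the matching closing tag; stop if it is missing.
        let Some(close) = body_and_rest.find(CLOSE) else { break };

        // Keep only scripts whose opening tag mentions the JSON-LD media type.
        if tag.contains(MARKER) {
            blocks.push(body_and_rest[..close].trim());
        }
        rest = &body_and_rest[close + CLOSE.len()..];
    }
    blocks
}

fn main() {
    let html = r#"<html><head>
<script type="text/javascript">var x = 1;</script>
<script type="application/ld+json">{"@type": "Recipe", "name": "Pancakes"}</script>
</head></html>"#;

    for block in json_ld_blocks(html) {
        println!("{block}");
    }
}
```

Each returned slice is raw JSON text that would still need to be parsed and checked before it could be treated as a recipe.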

Usage & Examples

Scraping a recipe from an HTML string (which can be obtained with an HTTP client such as reqwest or ureq):

use recipe_scraper::{
  Extract,
  Scrape,
  SchemaOrgEntry,
  SchemaOrgRecipe,
};
// let html = ...;
let schema_entries = SchemaOrgEntry::scrape_html(&html);
if let Some(first_valid_recipe) = schema_entries.into_iter().flat_map(Extract::extract_recipes).next() {
  println!("Found recipe: {first_valid_recipe:?}");
}
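
The `into_iter().flat_map(...).next()` chain in the example above picks the first recipe across all scraped entries and skips entries that yield none. The sketch below reproduces that pattern with standard-library types only. `Entry`, `Recipe`, `extract_recipes`, and `first_recipe` are hypothetical stand-ins for illustration and do not reflect the crate's actual types or signatures.

```rust
/// Hypothetical stand-in for an extracted recipe.
#[derive(Debug, PartialEq)]
struct Recipe {
    name: String,
}

/// Hypothetical stand-in for a scraped entry, which may hold zero or more recipes.
struct Entry {
    recipes: Vec<Recipe>,
}

/// Stand-in for an extraction step: consumes an entry and yields its recipes.
fn extract_recipes(entry: Entry) -> Vec<Recipe> {
    entry.recipes
}

/// Flattens the recipes of all entries into one lazy stream and takes the first.
/// Entries without recipes contribute nothing, so they are skipped.
fn first_recipe(entries: Vec<Entry>) -> Option<Recipe> {
    entries.into_iter().flat_map(extract_recipes).next()
}

fn main() {
    let entries = vec![
        Entry { recipes: vec![] },
        Entry {
            recipes: vec![
                Recipe { name: "Pancakes".to_string() },
                Recipe { name: "Waffles".to_string() },
            ],
        },
    ];

    match first_recipe(entries) {
        Some(recipe) => println!("Found recipe: {recipe:?}"),
        None => println!("No recipe found"),
    }
}
```

Because iterators are lazy, entries after the first one that yields a recipe are never passed to the extraction step.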
