1 unstable release

0.1.0 Dec 22, 2023


Used in sws-lua

MIT/Apache

30KB
781 lines

Sitemap Web Scraper

Sitemap Web Scraper (sws) is a tool for simple, flexible, yet performant web page scraping.

It consists of a CLI written in Rust that crawls web pages and executes a LuaJIT script to scrape them, writing the results to a CSV file.

sws crawl --script examples/fandom_mmh7.lua -o result.csv

Check out the doc for more details.


lib.rs:

Web crawler with pluggable scraping logic.

The main function crawl_site crawls and scrapes web pages. It is configured through a CrawlerConfig and a Scrapable implementation. The latter defines the Seed used for crawling, as well as the scraping logic. Note that robots.txt seeds are supported, and the parsed file is exposed as a texting_robots::Robot in the CrawlingContext and ScrapingContext.
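The split between configuration, seed, and scraping logic described above can be illustrated with a small, self-contained sketch. It does not use the crate's real API: ToySeed, ToyConfig, ToyScrapable, TitleScraper, toy_crawl_site, and demo_rows are hypothetical stand-ins invented for this example, the max_pages field is an assumed option, and pages come from an in-memory map rather than the network. The actual signatures of crawl_site, CrawlerConfig, Scrapable, and Seed should be taken from the crate documentation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `Seed`: where crawling starts.
/// Only a plain page list is modelled here.
enum ToySeed {
    Pages(Vec<String>),
}

/// Hypothetical stand-in for `CrawlerConfig`; `max_pages` is an assumed option.
struct ToyConfig {
    max_pages: usize,
}

/// Hypothetical stand-in for `Scrapable`: it supplies both the seed
/// and the scraping logic, as the description above suggests.
trait ToyScrapable {
    fn seed(&self) -> ToySeed;
    /// Returns one CSV-like row for a page, or `None` to skip it.
    fn scrap(&self, url: &str, body: &str) -> Option<String>;
}

/// Example implementation that extracts the `<title>` of each page.
struct TitleScraper {
    start: Vec<String>,
}

impl ToyScrapable for TitleScraper {
    fn seed(&self) -> ToySeed {
        ToySeed::Pages(self.start.clone())
    }

    fn scrap(&self, url: &str, body: &str) -> Option<String> {
        let start = body.find("<title>")? + "<title>".len();
        let end = body[start..].find("</title>")? + start;
        Some(format!("{},{}", url, &body[start..end]))
    }
}

/// Toy counterpart of `crawl_site`: visits the seeded pages (looked up in
/// an in-memory `site` map instead of fetched) and collects scraped rows.
fn toy_crawl_site<S: ToyScrapable>(
    scraper: &S,
    config: &ToyConfig,
    site: &HashMap<String, String>,
) -> Vec<String> {
    let ToySeed::Pages(urls) = scraper.seed();
    urls.iter()
        .take(config.max_pages)
        .filter_map(|url| site.get(url).and_then(|body| scraper.scrap(url, body)))
        .collect()
}

/// Builds a two-page fake site and runs the toy crawler over it.
fn demo_rows() -> Vec<String> {
    let mut site = HashMap::new();
    site.insert(
        "https://example.com/a".to_string(),
        "<html><title>Page A</title></html>".to_string(),
    );
    site.insert(
        "https://example.com/b".to_string(),
        "<html>no title here</html>".to_string(),
    );
    let scraper = TitleScraper {
        start: vec![
            "https://example.com/a".to_string(),
            "https://example.com/b".to_string(),
        ],
    };
    toy_crawl_site(&scraper, &ToyConfig { max_pages: 10 }, &site)
}
```

Calling demo_rows() yields a single row for the page that has a title; the page without one is skipped. Keeping the seed and the scraping logic behind one trait means the crawl loop never needs to know what is being extracted, which is presumably the motivation for the pluggable design described above.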

Dependencies

~11–25MB
~365K SLoC