11 releases

| Version | Released |
|---|---|
| 0.0.11 | Oct 25, 2024 |
| 0.0.10 | Aug 29, 2024 |
Used in 2 crates (via spider_transformations)
27KB, 746 lines
llm_readability
The Rust readability library built for performance, AI, and multiple locales. The library is used on Spider Cloud for data cleaning.
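Readability-style extraction typically works by scoring candidate blocks of a page and keeping the ones that look like main content rather than navigation or boilerplate. The sketch below is a hypothetical, std-only illustration of that general idea: longer text and more commas raise a block's score, and a high share of link text lowers it. The `Block` type and the `score` and `best_block` functions are invented for this example; they are not llm_readability's API, and the weights are not its actual scoring rules.

```rust
// Hypothetical sketch of a readability-style heuristic.
// NOT llm_readability's algorithm; it only shows the general idea of
// scoring candidate blocks and keeping the most "content-like" one.

/// A candidate block of text plus how many of its characters sit inside links.
struct Block<'a> {
    text: &'a str,
    link_chars: usize,
}

/// Score a block: longer text and more commas suggest prose,
/// while a high share of link text suggests menus or footers.
fn score(block: &Block) -> f64 {
    let len = block.text.chars().count();
    if len == 0 {
        return 0.0;
    }
    let commas = block.text.matches(',').count() as f64;
    // Length bonus is capped so one huge block cannot dominate on size alone.
    let base = 1.0 + commas + (len as f64 / 100.0).min(3.0);
    // Fraction of the block's characters that are link text, clamped to [0, 1].
    let link_density = (block.link_chars as f64 / len as f64).min(1.0);
    base * (1.0 - link_density)
}

/// Return the text of the highest-scoring block, if any.
fn best_block<'a>(blocks: &[Block<'a>]) -> Option<&'a str> {
    blocks
        .iter()
        .max_by(|a, b| score(a).total_cmp(&score(b)))
        .map(|b| b.text)
}

fn main() {
    let blocks = [
        // A navigation bar: short and almost entirely link text.
        Block { text: "Home | About | Contact", link_chars: 18 },
        // An article paragraph: longer, with commas and no links.
        Block {
            text: "Readability tools keep the main article, drop menus, ads, and footers, and return clean text.",
            link_chars: 0,
        },
    ];
    println!("{:?}", best_block(&blocks));
}
```

Production extractors combine many more signals (tag names, class and id hints, sibling merging), so treat this only as a picture of the scoring step.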
Usage
```toml
[dependencies]
llm_readability = "0"
```
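```rust
use llm_readability::extractor;

fn main() {
    // Pass the HTML as a mutable reference to its bytes, along with the page URL.
    match extractor::extract(&mut "<html>...</html>".as_bytes(), "https://example.com", None) {
        Ok(product) => {
            // `content` holds the cleaned article HTML.
            println!("------- html ------");
            println!("{}", product.content);
            // `text` holds the plain-text version.
            println!("---- plain text ---");
            println!("{}", product.text);
        }
        Err(_) => println!("error occurred"),
    }
}
```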
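The example prints two views of the result: `product.content` (cleaned HTML) and `product.text` (plain text). To make the difference concrete, here is a deliberately naive, hypothetical `strip_tags` helper in plain Rust that turns an HTML string into text. It is only an illustration and is not how llm_readability derives `text`: it does not decode entities such as `&amp;`, does not skip `<script>` or `<style>` contents, and treats every tag as a word break.

```rust
/// Hypothetical helper: naively remove HTML tags, keeping only the text between them.
/// Not part of llm_readability; entities and <script>/<style> bodies are not handled.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            // Entering a tag: stop copying characters.
            '<' => in_tag = true,
            // Leaving a tag: resume copying and insert a word break.
            '>' => {
                in_tag = false;
                out.push(' ');
            }
            // Outside any tag: keep the character.
            _ if !in_tag => out.push(c),
            // Inside a tag: drop the character.
            _ => {}
        }
    }
    // Collapse runs of whitespace left behind by removed tags.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let content = "<div><h1>Title</h1><p>Hello, <b>world</b></p></div>";
    println!("{}", strip_tags(content));
}
```

Because every tag becomes a space, inline tags placed directly before punctuation can leave a stray space; a robust conversion would work on a parsed DOM rather than raw characters, which is one reason to rely on the library's `text` field instead of ad hoc stripping.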
This project is a rewrite of readability-rs, focused on performance improvements and bug fixes.
Dependencies
~9–17MB, ~274K SLoC