154 stable releases
Version | Released |
---|---|
2.13.20 (new) | Nov 24, 2024 |
2.13.14 | Nov 21, 2024 |
2.11.0 | Oct 31, 2024 |
2.6.21 | Sep 30, 2024 |
0.0.3 | Sep 21, 2024 |
#1704 in Web programming
7,113 downloads per month
Used in spider_utils
200KB, 4.5K SLoC
spider_transformations
A Rust transformation library for Spider Cloud, built for performance, AI, and multiple locales. Spider Cloud uses it for data cleaning.
Usage
[dependencies]
spider_transformations = "0"
use spider_transformations::transformation::content;

fn main() {
    // `page` comes from the spider object when streaming.
    let conf = content::TransformConfig::default();
    let content = content::transform_content(&page, &conf, &None, &None);
}
Transform types
- Markdown
- Commonmark
- Text
- Markdown (Text Map) or HTML2Text
- WIP: HTML2XML
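To give a rough idea of what the Text transform type means, here is a minimal, standard-library-only sketch that strips tags from an HTML fragment. This is not the crate's implementation (the crate relies on its own forks of html2md and html2text); the function name `strip_tags` is hypothetical and the approach is deliberately naive.

```rust
/// Hypothetical helper, not part of spider_transformations.
/// Strips tags from an HTML fragment and collapses whitespace.
/// Naive by design: it does not decode entities or skip the bodies
/// of <script>/<style> elements or comments.
fn strip_tags(html: &str) -> String {
    let mut text = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            // Entering a tag: stop copying characters.
            '<' => in_tag = true,
            // Leaving a tag: treat it as a word boundary.
            '>' if in_tag => {
                in_tag = false;
                text.push(' ');
            }
            // Plain text outside of any tag is kept.
            _ if !in_tag => text.push(ch),
            // Characters inside a tag are dropped.
            _ => {}
        }
    }
    // Collapse runs of whitespace into single spaces.
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Calling `strip_tags("<h1>Title</h1><p>Hello <b>world</b></p>")` yields `Title Hello world`. Real pages need a proper parser, which is why the library uses dedicated HTML-to-text code rather than a character scan like this one.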
Enhancements
- Readability
- Encoding
Chunking
The transformation module includes several chunking utilities.
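As an illustration of what chunking transformed text can look like, the sketch below splits text into fixed-size groups of words using only the standard library. It is an assumption-laden example, not the crate's API: the function name `chunk_by_words` and its signature are hypothetical, and the utilities the crate actually exposes may differ.

```rust
/// Hypothetical helper, not part of spider_transformations.
/// Splits `text` into chunks of at most `max_words`
/// whitespace-separated words each.
fn chunk_by_words(text: &str, max_words: usize) -> Vec<String> {
    // `slice::chunks` panics on a chunk size of zero, so guard against it.
    if max_words == 0 {
        return Vec::new();
    }
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_words)
        // Re-join each group of words with single spaces.
        .map(|group| group.join(" "))
        .collect()
}
```

For example, `chunk_by_words("one two three four five", 2)` returns `["one two", "three four", "five"]`. Word-based chunks are simple to reason about; splitting by sentence or character length is the same idea with a different boundary.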
This project includes rewrites and forks of html2md and html2text, made for performance and bug fixes.
License
MIT
Dependencies
~22–37MB
~639K SLoC