Version 0.1.0 (unstable release), published Jan 17, 2025
HTMLFixinator
A Rust library for cleaning and transforming HTML content through a composable filter system. HTMLFixinator provides a set of filters that can be used individually or chained together to modify HTML documents.
Features
- 🔍 Attribute Filter: Remove specific HTML attributes while preserving others
- 🗑️ Comment Filter: Strip HTML comments from the document
- 📦 Element Filter: Remove or unwrap specific HTML elements
- 🧹 Empty Filter: Remove empty elements while preserving non-empty ones
- 🔗 URL Filter: Convert relative URLs to absolute URLs
- ⛓️ Filter Chain: Combine multiple filters for complex transformations
- 🎯 Case-insensitive: All filters work case-insensitively for robustness
Installation
Add this to your Cargo.toml:
[dependencies]
htmlfixinator = { version = "0.1.0" }
Or run this command:
cargo add htmlfixinator
Usage
Basic Example
use htmlfixinator::{
filters::{Filter, FilterTrait},
string_to_node, node_to_string,
};
// Create a filter to remove class and style attributes
let filter = Filter::attribute(&["class", "style"]);
// Apply the filter to some HTML
let html = r#"<div class="test" style="color: red;">Content</div>"#;
let doc = string_to_node(html);
let result = filter.apply(doc);
assert_eq!(node_to_string(result), "<div>Content</div>");
Chaining Filters
use htmlfixinator::{
filters::{Filter, FilterChain, FilterTrait},
string_to_node, node_to_string,
};
// Create a chain of filters
let chain = FilterChain::new()
.add(Filter::comment()) // Remove comments
.add(Filter::empty()) // Remove empty elements
.add(Filter::attribute(&["class"])); // Remove class attributes
let html = r#"<!-- Comment --><div class="test"><span></span><p>Content</p></div>"#;
let doc = string_to_node(html);
let result = chain.apply(doc);
assert_eq!(node_to_string(result), "<div><p>Content</p></div>");
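The chaining behaviour above boils down to folding a document through a list of filters in order. The sketch below shows that idea on a hypothetical toy tree; `Node`, `HtmlFilter`, `Chain` and `StripComments` are illustrative names only, not htmlfixinator's actual types or implementation.

```rust
// Hypothetical toy tree; NOT htmlfixinator's node type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element { tag: String, children: Vec<Node> },
    Text(String),
    Comment(String),
}

/// Every filter consumes a node and returns the transformed node.
trait HtmlFilter {
    fn apply(&self, node: Node) -> Node;
}

/// Applies its filters in insertion order, feeding each output into the next.
#[derive(Default)]
struct Chain {
    filters: Vec<Box<dyn HtmlFilter>>,
}

impl Chain {
    fn new() -> Self {
        Self::default()
    }

    /// Builder-style `add`, so calls can be chained.
    fn add(mut self, filter: impl HtmlFilter + 'static) -> Self {
        self.filters.push(Box::new(filter));
        self
    }
}

// A chain is itself a filter, so chains can be nested.
impl HtmlFilter for Chain {
    fn apply(&self, node: Node) -> Node {
        self.filters.iter().fold(node, |acc, f| f.apply(acc))
    }
}

/// Example filter: drops comment nodes at every depth below the root.
struct StripComments;

impl HtmlFilter for StripComments {
    fn apply(&self, node: Node) -> Node {
        match node {
            Node::Element { tag, children } => Node::Element {
                tag,
                children: children
                    .into_iter()
                    .filter(|c| !matches!(c, Node::Comment(_)))
                    .map(|c| self.apply(c))
                    .collect(),
            },
            other => other,
        }
    }
}
```

Because each filter only sees the previous filter's output, order matters: running a comment filter before an empty-element filter lets elements that contained only a comment be recognised as empty.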
Available Filters
AttributeFilter
Removes specified attributes from all elements.
use htmlfixinator::filters::Filter;
let filter = Filter::attribute(&["class", "style"]);
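Conceptually, attribute removal is a recursive walk that drops matching attribute names, compared case-insensitively as the feature list describes. This is a minimal sketch on a hypothetical toy `Node` type with a hypothetical `strip_attributes` helper, not the crate's own code.

```rust
// Hypothetical toy tree; NOT htmlfixinator's node type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element { tag: String, attrs: Vec<(String, String)>, children: Vec<Node> },
    Text(String),
}

/// Removes every attribute whose name matches one of `names`, comparing
/// ASCII case-insensitively, on this node and all of its descendants.
fn strip_attributes(node: Node, names: &[&str]) -> Node {
    match node {
        Node::Element { tag, mut attrs, children } => {
            attrs.retain(|(name, _)| !names.iter().any(|n| n.eq_ignore_ascii_case(name)));
            Node::Element {
                tag,
                attrs,
                children: children
                    .into_iter()
                    .map(|c| strip_attributes(c, names))
                    .collect(),
            }
        }
        text => text,
    }
}
```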
CommentFilter
Removes all HTML comments from the document.
use htmlfixinator::filters::Filter;
let filter = Filter::comment();
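For intuition about what comment removal does, here is a deliberately naive, string-level sketch built around a hypothetical `strip_comments` function. htmlfixinator works on a parsed node tree (via `string_to_node`), so this is only an illustration of the effect, not of how the library does it.

```rust
/// Naive sketch: removes every `<!-- ... -->` span from raw HTML text.
/// Unlike a tree-based filter, it knows nothing about <script>, <textarea>
/// or attribute values, so it can misfire if `<!--` appears inside them.
/// An unterminated comment is removed through to the end of the input.
fn strip_comments(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find("<!--") {
        out.push_str(&rest[..start]);
        match rest[start + 4..].find("-->") {
            // Skip past the comment body and its closing "-->".
            Some(end) => rest = &rest[start + 4 + end + 3..],
            // Unterminated comment: drop the remainder.
            None => return out,
        }
    }
    out.push_str(rest);
    out
}
```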
ElementFilter
Either removes elements completely or unwraps them (removes the element but keeps its content).
use htmlfixinator::filters::Filter;
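// Remove mode: drop <script> and <style> elements together with their content
let remove_filter = Filter::element(&["script", "style"], false);
// Unwrap mode: drop the <div> and <span> tags but keep their content
let unwrap_filter = Filter::element(&["div", "span"], true);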
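The difference between the two modes can be sketched as a recursive pass over a list of children: a matching element is either skipped entirely or replaced by its own (filtered) children. The toy `Node` type and the `filter_children` helper are hypothetical and only illustrate the semantics described above.

```rust
// Hypothetical toy tree; NOT htmlfixinator's node type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element { tag: String, children: Vec<Node> },
    Text(String),
}

/// Filters a list of sibling nodes. Elements whose tag matches `tags`
/// (ASCII case-insensitive) are dropped with their whole subtree when
/// `unwrap` is false, or replaced by their filtered children when true.
fn filter_children(children: Vec<Node>, tags: &[&str], unwrap: bool) -> Vec<Node> {
    let mut out = Vec::new();
    for child in children {
        match child {
            Node::Element { tag, children } => {
                let matched = tags.iter().any(|t| t.eq_ignore_ascii_case(&tag));
                if matched && !unwrap {
                    continue; // remove mode: skip the element and everything inside it
                }
                let kept = filter_children(children, tags, unwrap);
                if matched {
                    out.extend(kept); // unwrap mode: keep the content, lose the tag
                } else {
                    out.push(Node::Element { tag, children: kept });
                }
            }
            text => out.push(text),
        }
    }
    out
}
```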
EmptyFilter
Removes elements that have no content (elements containing <img> tags are preserved).
use htmlfixinator::filters::Filter;
let filter = Filter::empty();
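Empty-element removal works naturally bottom-up: clean the children first, then check whether anything is left. The sketch below uses a hypothetical toy `Node` type and `drop_empty` helper. Two rules in it are assumptions, not documented behaviour: `<img>` counts as content even though it has no children, and whitespace-only text counts as empty.

```rust
// Hypothetical toy tree; NOT htmlfixinator's node type.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element { tag: String, children: Vec<Node> },
    Text(String),
}

/// Returns `None` for nodes that should be removed as empty.
fn drop_empty(node: Node) -> Option<Node> {
    match node {
        // Assumption: an <img> is content even though it has no children.
        Node::Element { tag, children } if tag.eq_ignore_ascii_case("img") => {
            Some(Node::Element { tag, children })
        }
        // Clean the children first, then keep the element only if something survived.
        Node::Element { tag, children } => {
            let kept: Vec<Node> = children.into_iter().filter_map(drop_empty).collect();
            if kept.is_empty() {
                None
            } else {
                Some(Node::Element { tag, children: kept })
            }
        }
        // Assumption: whitespace-only text does not count as content.
        Node::Text(t) if t.trim().is_empty() => None,
        text => Some(text),
    }
}
```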
RelativeToAbsoluteFilter
Converts relative URLs in href attributes to absolute URLs, resolved against the base URL passed to Filter::relabs.
use htmlfixinator::filters::Filter;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let filter = Filter::relabs("https://example.com/path/")?;
Ok(())
}
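The core of this filter is resolving one href against a base URL. The following std-only sketch, with hypothetical `has_scheme` and `resolve` helpers, handles the common cases (absolute, scheme-relative `//`, root-relative `/`, fragment, query and plain relative paths). It is a simplification, not RFC 3986: it assumes the base looks like `scheme://host[/path]` and does not normalise `.` or `..` segments. For real use, a standards-compliant parser such as the `url` crate's `Url::join` is the safer choice; whether htmlfixinator uses it internally is not stated here.

```rust
/// True if `href` starts with a URL scheme such as "https:" or "mailto:".
fn has_scheme(href: &str) -> bool {
    match href.find(':') {
        Some(i) if i > 0 => {
            let scheme = &href[..i];
            scheme.starts_with(|c: char| c.is_ascii_alphabetic())
                && scheme.chars().all(|c| c.is_ascii_alphanumeric() || "+-.".contains(c))
        }
        _ => false,
    }
}

/// Simplified resolution of `href` against an absolute `base` URL.
fn resolve(base: &str, href: &str) -> String {
    if has_scheme(href) {
        return href.to_string(); // already absolute
    }
    let scheme_end = match base.find("://") {
        Some(i) => i,
        None => return href.to_string(), // base is not absolute: leave href alone
    };
    let host_start = scheme_end + 3;
    // Index where the path (or query/fragment) begins after the host.
    let path_start = base[host_start..]
        .find(&['/', '?', '#'][..])
        .map_or(base.len(), |i| host_start + i);
    // Base without its query string and fragment.
    let base_no_query = &base[..base.find(&['?', '#'][..]).unwrap_or(base.len())];

    if let Some(rest) = href.strip_prefix("//") {
        format!("{}//{}", &base[..scheme_end + 1], rest) // keep only the scheme
    } else if href.starts_with('/') {
        format!("{}{}", &base[..path_start], href) // keep scheme and host
    } else if href.starts_with('#') {
        format!("{}{}", &base[..base.find('#').unwrap_or(base.len())], href)
    } else if href.starts_with('?') {
        format!("{}{}", base_no_query, href)
    } else {
        // Plain relative path: replace everything after the last '/' of the base path.
        let path = &base_no_query[path_start..];
        match path.rfind('/') {
            Some(i) => format!("{}{}", &base_no_query[..path_start + i + 1], href),
            None => format!("{}/{}", &base_no_query[..path_start], href),
        }
    }
}
```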
License
This project is licensed under the GNU Lesser General Public License.