7 releases (1 stable)

1.0.0	Aug 8, 2020
0.4.0	Apr 30, 2020
0.3.0	Apr 30, 2020
0.2.1	Apr 29, 2020
0.1.1	Apr 27, 2020

MIT license

html-extractor

A Rust crate for extracting data from HTML.

Examples

Extracting a simple value from HTML

use html_extractor::{html_extractor, HtmlExtractor};
html_extractor! {
    #[derive(Debug, PartialEq)]
    Foo {
        foo: usize = (text of "#foo"),
    }
}

fn main() {
    let input = r#"
        <div id="foo">1</div>
    "#;
    let foo = Foo::extract_from_str(input).unwrap();
    assert_eq!(foo, Foo { foo: 1 });
}

Extracting a collection from HTML

use html_extractor::{html_extractor, HtmlExtractor};
html_extractor! {
    #[derive(Debug, PartialEq)]
    Foo {
        foo: Vec<usize> = (text of ".foo", collect),
    }
}

fn main() {
    let input = r#"
        <div class="foo">1</div>
        <div class="foo">2</div>
        <div class="foo">3</div>
        <div class="foo">4</div>
    "#;
    let foo = Foo::extract_from_str(input).unwrap();
    assert_eq!(foo, Foo { foo: vec![1, 2, 3, 4] });
}

Extracting with regex

use html_extractor::{html_extractor, HtmlExtractor};
html_extractor! {
    #[derive(Debug, PartialEq)]
    Foo {
        (foo: usize,) = (text of "#foo", capture with "^foo=(.*)$"),
    }
}

fn main() {
    let input = r#"
        <div id="foo">foo=1</div>
    "#;
    let foo = Foo::extract_from_str(input).unwrap();
    assert_eq!(foo, Foo { foo: 1 });
}
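
To make the capture step concrete, here is a minimal standard-library sketch of what the pattern "^foo=(.*)$" pulls out of the element text before it is parsed into the field type. The helper `capture_foo` is hypothetical and not part of html_extractor; it only approximates the regex with `strip_prefix` and is not how the crate is implemented.

```rust
// Hypothetical helper, not part of html_extractor: approximates what the
// pattern "^foo=(.*)$" captures from an element's text, using only std.
fn capture_foo(text: &str) -> Option<usize> {
    // `strip_prefix` plays the role of the anchored literal "^foo=".
    let captured = text.strip_prefix("foo=")?;
    // The captured group is then parsed into the field's type (usize here).
    captured.parse().ok()
}

fn main() {
    // Text matching the pattern yields the parsed capture.
    assert_eq!(capture_foo("foo=1"), Some(1));
    // Text that does not match the pattern yields nothing.
    assert_eq!(capture_foo("bar=1"), None);
    // A capture that cannot be parsed as usize also yields nothing.
    assert_eq!(capture_foo("foo=abc"), None);
}
```

In the macro example above, a failed match or a failed parse would presumably surface as an `Err` from `extract_from_str` rather than a `None`; the `Option` here is only a simplification for the sketch.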

Changelog

v0.4.0

Add the "presence of .." target specifier

v0.3.0

Add parser specifier
Add inner_html target specifier
Change text node extraction to trim whitespace from both ends
Fix error message
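
The trimming change in this release matters because Rust's integer parsing rejects surrounding whitespace, which is common in formatted HTML. The sketch below shows the effect with the standard library only; `parse_trimmed` is a hypothetical helper for illustration and is not taken from the crate's source.

```rust
// Hypothetical helper, not part of html_extractor: trims the text of a node
// before parsing it, mirroring the behavior described for v0.3.0.
fn parse_trimmed(text: &str) -> Option<usize> {
    text.trim().parse().ok()
}

fn main() {
    // Text as it might appear inside an indented <div>.
    let raw = "\n    1\n";
    // Parsing the untrimmed text fails because of the whitespace.
    assert!(raw.parse::<usize>().is_err());
    // After trimming, the same text parses cleanly.
    assert_eq!(parse_trimmed(raw), Some(1));
}
```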

v0.2.1

Fix the internal usage of the Rust standard library

v0.2.0

Rename "collect specifier" to "collector specifier"
Add "optional" collector

v0.1.1

Fix the links in the documentation
