5 releases
Version | Released |
---|---|
0.2.3 | Apr 22, 2023 |
0.2.2 | Nov 3, 2019 |
0.2.1 | Jul 9, 2017 |
0.2.0 | Jul 8, 2017 |
0.1.0 | Jul 7, 2017 |
Utility for extracting data from HTML tables.
This library allows you to parse tables from HTML documents and iterate over their rows. There are three entry points:
- Table::find_first finds the first table.
- Table::find_by_id finds a table by its HTML id.
- Table::find_by_headers finds a table that has certain headers.
Each of these returns an Option<Table>, since there might not be any matching table in the HTML. Once you have a table, you can iterate over it and access the contents of each Row.
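To make the header-based lookup and its Option result more concrete, here is a standalone sketch of the matching idea using only the standard library. It is not the crate's implementation: the function name find_by_headers_sketch, the representation of each table as a list of header strings, and the exact-match comparison are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of table-extract): returns the index of the
/// first table whose header row contains every wanted header, or None if no
/// table qualifies.
fn find_by_headers_sketch(tables: &[Vec<&str>], wanted: &[&str]) -> Option<usize> {
    tables
        .iter()
        .position(|headers| wanted.iter().all(|w| headers.contains(w)))
}

fn main() {
    // Each inner Vec stands in for the header row of one table in a document.
    let tables = vec![vec![], vec!["Name", "Age"]];

    // Handle the "no matching table" case explicitly instead of unwrapping.
    match find_by_headers_sketch(&tables, &["Age"]) {
        Some(index) => println!("matching table at index {}", index),
        None => println!("no table has the requested headers"),
    }
}
```

Matching on the Option, rather than calling unwrap, keeps a document without a suitable table from causing a panic.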
Examples
Here is a simple example that uses Table::find_first to print the cells in each row of a table:
let html = r#"
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_first(html).unwrap();
for row in &table {
    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    )
}
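Cells come back as text, so numeric columns such as Age usually need a conversion step. The sketch below assumes the lookup yields an optional string slice, as the unwrap_or calls above suggest. The helper name parse_age is hypothetical and not part of the crate, and the values in main are stand-ins rather than real table output.

```rust
/// Hypothetical helper (not part of table-extract): converts an optional
/// cell into a number, returning None when the cell is absent or not numeric.
fn parse_age(cell: Option<&str>) -> Option<u32> {
    // Trim first: text pulled out of HTML often carries stray whitespace.
    cell.and_then(|text| text.trim().parse::<u32>().ok())
}

fn main() {
    // Stand-ins for the values a row lookup might yield.
    let cells = [Some("20"), Some(" 35 "), Some("n/a"), None];
    for &cell in cells.iter() {
        match parse_age(cell) {
            Some(age) => println!("age = {}", age),
            None => println!("age missing or not a number: {:?}", cell),
        }
    }
}
```

Keeping the conversion in one small function lets a missing column and a malformed value be handled in the same place.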
If the document has multiple tables, we can use Table::find_by_headers to identify the one we want:
let html = r#"
    <table></table>
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_by_headers(html, &["Age"]).unwrap();
for row in &table {
    for cell in row {
        println!("Table cell: {}", cell);
    }
}
TableExtract
TableExtract is a Rust library for extracting data from HTML tables. It is inspired by Perl's HTML::TableExtract.
Check out the crate documentation for more information.
Usage
TableExtract is on crates.io. To use it, just add this to your Cargo.toml:
[dependencies]
table-extract = "0.2"
Contributing
Contributions are welcome! There are two things to keep in mind:
- This project uses the stable Rust toolchain from rustup.
- This project uses cargo fmt to keep the code tidy.
License
© 2019 Mitchell Kember
TableExtract is available under the MIT License; see LICENSE for details.