5 releases
Version | Released |
---|---|
0.2.3 | Apr 22, 2023 |
0.2.2 | Nov 3, 2019 |
0.2.1 | Jul 9, 2017 |
0.2.0 | Jul 8, 2017 |
0.1.0 | Jul 7, 2017 |
#2844 in Parser implementations
700 downloads per month
Used in uupdump
23KB
441 lines
TableExtract
TableExtract is a Rust library for extracting data from HTML tables. It is inspired by Perl's HTML::TableExtract.
Check out the crate documentation for more information.
Usage
TableExtract is on crates.io. To use it, add this to your Cargo.toml:
[dependencies]
table-extract = "0.2"
Contributing
Contributions are welcome! There are two things to keep in mind:
- This project uses the stable Rust toolchain from rustup.
- This project uses cargo fmt to keep the code tidy.
License
© 2019 Mitchell Kember
TableExtract is available under the MIT License; see LICENSE for details.
lib.rs:
Utility for extracting data from HTML tables.
This library allows you to parse tables from HTML documents and iterate over their rows. There are three entry points:
- Table::find_first finds the first table.
- Table::find_by_id finds a table by its HTML id.
- Table::find_by_headers finds a table that has certain headers.
Each of these returns an Option<Table>, since there might not be any matching table in the HTML. Once you have a table, you can iterate over it and access the contents of each Row.
Examples
Here is a simple example that uses Table::find_first to print the cells in each row of a table:
let html = r#"
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_first(html).unwrap();
for row in &table {
    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    )
}
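To make the header-based lookup in row.get("Name") more concrete, here is a standalone sketch of the idea using only the standard library. It is not TableExtract's implementation: SketchRow and header_index are hypothetical names, and the sketch simply assumes that each header name maps to a column index.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table row: the cell texts plus a shared
/// map from header name to column index. Not the crate's actual types.
struct SketchRow<'a> {
    headers: &'a HashMap<String, usize>,
    cells: &'a [String],
}

impl<'a> SketchRow<'a> {
    /// Returns the cell under `header`, or None if the header is unknown
    /// or the row is too short to have that column.
    fn get(&self, header: &str) -> Option<&'a str> {
        let index = *self.headers.get(header)?;
        self.cells.get(index).map(String::as_str)
    }
}

/// Builds the header-to-index map from the header row's cell texts.
fn header_index(header_cells: &[&str]) -> HashMap<String, usize> {
    header_cells
        .iter()
        .enumerate()
        .map(|(i, name)| (name.to_string(), i))
        .collect()
}

fn main() {
    let headers = header_index(&["Name", "Age"]);
    let cells = vec!["John".to_string(), "20".to_string()];
    let row = SketchRow { headers: &headers, cells: &cells };

    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    );
    // A header that is not in the table yields None rather than panicking.
    assert_eq!(row.get("Email"), None);
}
```

Returning an Option from the lookup is what lets the caller supply a fallback with unwrap_or, as the example above does.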
If the document has multiple tables, we can use Table::find_by_headers to identify the one we want:
let html = r#"
    <table></table>
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_by_headers(html, &["Age"]).unwrap();
for row in &table {
    for cell in row {
        println!("Table cell: {}", cell);
    }
}
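The selection step in the example above can be sketched on its own, without any HTML parsing. This is only an illustration under the assumption that "has certain headers" means every requested header appears in the table's header row. The helper first_with_headers is hypothetical and not part of TableExtract, and each table is reduced to a list of its header texts.

```rust
/// Returns the index of the first table whose header row contains every
/// name in `wanted`. Each table is represented only by its header texts.
/// Hypothetical helper for illustration; not part of TableExtract.
fn first_with_headers(tables: &[Vec<&str>], wanted: &[&str]) -> Option<usize> {
    tables
        .iter()
        .position(|headers| wanted.iter().all(|w| headers.iter().any(|h| h == w)))
}

fn main() {
    // Mirrors the document above: an empty table, then a Name/Age table.
    let tables = vec![vec![], vec!["Name", "Age"]];
    assert_eq!(first_with_headers(&tables, &["Age"]), Some(1));
    // No table has an "Email" header, so there is no match.
    assert_eq!(first_with_headers(&tables, &["Email"]), None);
}
```

With an empty wanted slice this sketch matches the first table, because all on an empty iterator is true.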
Dependencies
~4.5–9.5MB
~102K SLoC