3 releases
Version | Date
---|---
0.1.2 | Jan 26, 2025
0.1.1 | Jan 22, 2025
0.1.0 | Jan 22, 2025
markitdown-rs
markitdown-rs is a Rust library for converting documents in various formats to Markdown. It is a Rust port of the original markitdown Python library.
Features
It supports:
- Excel (.xlsx)
- Word (.docx)
- PowerPoint
- Images
- Audio
- HTML
- Text-based formats (plain text, .csv, .xml, .rss, .atom)
- ZIP
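To give a sense of what converting a text-based format to Markdown involves, here is a minimal, std-only sketch that turns simple CSV into a Markdown table. It is not markitdown-rs's own converter: the function name `csv_to_markdown` is hypothetical, and the sketch assumes CSV without quoted fields or embedded commas.

```rust
/// Hypothetical helper (not part of markitdown-rs): converts simple CSV
/// (no quoting, no embedded commas) into a Markdown table.
/// The first non-empty row is treated as the header.
fn csv_to_markdown(csv: &str) -> String {
    // Split into rows of trimmed cells, skipping blank lines.
    let rows: Vec<Vec<&str>> = csv
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.split(',').map(str::trim).collect())
        .collect();

    let mut out = String::new();
    if let Some((header, body)) = rows.split_first() {
        out.push_str(&format!("| {} |\n", header.join(" | ")));
        // One separator cell per header column.
        out.push_str(&format!("|{}\n", " --- |".repeat(header.len())));
        for row in body {
            out.push_str(&format!("| {} |\n", row.join(" | ")));
        }
    }
    out
}

fn main() {
    let csv = "name,qty\napple,3\npear,5";
    print!("{}", csv_to_markdown(csv));
}
```

A real converter would also need to handle quoting, escaping of `|` inside cells, and rows with differing column counts.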
Usage
Command-Line
Installation
cargo install markitdown
Convert a File
markitdown path-to-file.pdf
Or use -o to specify the output file:
markitdown path-to-file.pdf -o document.md
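If you want to call the CLI from another Rust program, `std::process::Command` can assemble the same invocations. This sketch only builds the command without running it; it assumes the `markitdown` binary installed by `cargo install` is on your `PATH`, and the helper name `markitdown_command` is hypothetical.

```rust
use std::process::Command;

/// Hypothetical helper: builds `markitdown <input> [-o <output>]`
/// without spawning it.
fn markitdown_command(input: &str, output: Option<&str>) -> Command {
    let mut cmd = Command::new("markitdown");
    cmd.arg(input);
    if let Some(path) = output {
        cmd.arg("-o").arg(path);
    }
    cmd
}

fn main() {
    let cmd = markitdown_command("path-to-file.pdf", Some("document.md"));
    // Inspect the program and arguments instead of running the process.
    let args: Vec<_> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
}
```

To actually run it, call `.status()` on the returned `Command` and check `status.success()`.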
Rust API
Installation
Add the following to your Cargo.toml:
[dependencies]
markitdown = "0.1.2"
Initialize MarkItDown
use markitdown::MarkItDown;
let mut md = MarkItDown::new();
Convert a File
use markitdown::{ConversionOptions, DocumentConverterResult};
let options = ConversionOptions {
    file_extension: Some(".xlsx".to_string()),
    url: None,
};
let result: Option<DocumentConverterResult> = md.convert("path/to/file.xlsx", Some(options));
if let Some(conversion_result) = result {
    println!("Converted Text: {}", conversion_result.text_content);
} else {
    println!("Conversion failed or unsupported file type.");
}
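In the example above, `file_extension` is given with a leading dot (".xlsx"). If you would rather derive that hint from the input path than hard-code it, a small std-only helper is enough. `extension_hint` is a hypothetical name, and lowercasing the extension is an assumption about what the library expects.

```rust
use std::path::Path;

/// Hypothetical helper: returns the extension of `path` with a leading dot
/// (e.g. ".xlsx"), lowercased, or None if the path has no extension.
fn extension_hint(path: &str) -> Option<String> {
    Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| format!(".{}", ext.to_lowercase()))
}

fn main() {
    println!("{:?}", extension_hint("path/to/file.XLSX")); // Some(".xlsx")
    println!("{:?}", extension_hint("path/to/README")); // None
}
```

You could then set `file_extension: extension_hint(path)` when building `ConversionOptions`, since the field takes an `Option<String>`.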
Register a Custom Converter
You can extend MarkItDown by implementing the DocumentConverter trait for your own converters and registering them:
use markitdown::{DocumentConverter, MarkItDown};
struct MyCustomConverter;
impl DocumentConverter for MyCustomConverter {
    // Implement the required methods here
}
let mut md = MarkItDown::new();
md.register_converter(Box::new(MyCustomConverter));
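The `register_converter(Box::new(...))` call suggests that converters are stored as boxed trait objects. The std-only sketch below illustrates that pattern with a stand-in trait. `Converter`, `Registry`, `PlainText`, and `LogConverter` are hypothetical names, not markitdown-rs's `DocumentConverter` API, and trying the most recently registered converter first is an assumption.

```rust
/// Hypothetical stand-in for a converter trait (not markitdown-rs's API).
trait Converter {
    /// Returns Some(markdown) if this converter handles `extension`.
    fn convert(&self, input: &str, extension: &str) -> Option<String>;
}

/// Built-in example: passes plain text through unchanged.
struct PlainText;

impl Converter for PlainText {
    fn convert(&self, input: &str, extension: &str) -> Option<String> {
        (extension == ".txt").then(|| input.to_string())
    }
}

/// Custom example: puts `.log` contents under a "Log" heading.
struct LogConverter;

impl Converter for LogConverter {
    fn convert(&self, input: &str, extension: &str) -> Option<String> {
        (extension == ".log").then(|| format!("# Log\n\n{}", input))
    }
}

struct Registry {
    converters: Vec<Box<dyn Converter>>,
}

impl Registry {
    fn new() -> Self {
        Registry { converters: vec![Box::new(PlainText)] }
    }

    fn register_converter(&mut self, converter: Box<dyn Converter>) {
        self.converters.push(converter);
    }

    /// Tries converters from most to least recently registered (an assumption),
    /// so custom converters take precedence over built-in ones.
    fn convert(&self, input: &str, extension: &str) -> Option<String> {
        self.converters
            .iter()
            .rev()
            .find_map(|c| c.convert(input, extension))
    }
}

fn main() {
    let mut registry = Registry::new();
    registry.register_converter(Box::new(LogConverter));
    println!("{:?}", registry.convert("started", ".log"));
    println!("{:?}", registry.convert("hello", ".txt"));
    println!("{:?}", registry.convert("data", ".bin"));
}
```

Returning `None` for unhandled extensions lets the registry fall through to the next converter, mirroring the "conversion failed or unsupported file type" branch in the API example.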
License
MarkItDown is licensed under the MIT License. See LICENSE for more details.