#parser #xml #derive #macro-derive #struct #automate

deserialize_xml

Facilitates parsing structs from XML, particularly via a derive macro to automate the implementation

2 unstable releases

0.2.1 Nov 19, 2022
0.2.0 Nov 18, 2022
0.1.0 Sep 22, 2022

#21 in #automate

GPL-3.0-or-later

18KB
103 lines

This crate provides tools to deserialize structs from XML; most notably, it provides a [derive macro][derive@DeserializeXml] to automate that process (by implementing DeserializeXml for you).

Note: the implementation is highly limited and inelegant. I wrote this purely to help power a feed reader I'm working on as a personal project; don't expect anything "production-ready..." (See the caveats below.)

Examples

Basic

Here's how you could use this crate to easily parse a very simple XML structure:

use deserialize_xml::DeserializeXml;

#[derive(Default, Debug, DeserializeXml)]
struct StringOnly {
title: String,
author: String,
}

let input = "<stringonly><title>Title</title><author>Author</author></stringonly>";
// `from_str` here was provided by `#[derive(DeserializeXml)]` above
let result = StringOnly::from_str(input).unwrap();
assert_eq!(result.title, "Title");
assert_eq!(result.author, "Author");

Advanced

This example shows more advanced functionality:

use deserialize_xml::DeserializeXml;

#[derive(Default, Debug, DeserializeXml)]
// This attribute indicates we should parse this struct upon encountering an <item> tag
#[deserialize_xml(tag = "item")]
struct StringOnly {
title: String,
author: String,
}

#[derive(Default, Debug, DeserializeXml)]
struct Channel {
title: String,
// This allows us to use an idiomatic name for the
// struct member instead of the raw tag name
#[deserialize_xml(tag = "lastUpdated")]
last_updated: String,
ttl: u32,
// (unfortunately, we need to repeat `tag = "item"` here for now)
#[deserialize_xml(tag = "item")]
entries: Vec<StringOnly>,
}

let input = r#"<channel>
<title>test channel please ignore</title>
<lastUpdated>2022-09-22</lastUpdated>
<ttl>3600</ttl>
<item><title>Article 1</title><author>Guy</author></item>
<item><title>Article 2</title><author>Dudette</author></item>
</channel>"#;

let result = Channel::from_str(input).unwrap();
assert_eq!(result.title, "test channel please ignore");
assert_eq!(result.last_updated, "2022-09-22");
assert_eq!(result.ttl, 3600);
assert_eq!(result.entries.len(), 2);
assert_eq!(result.entries[0].title, "Article 1");
assert_eq!(result.entries[0].author, "Guy");
assert_eq!(result.entries[1].title, "Article 2");
assert_eq!(result.entries[1].author, "Dudette");

Caveats

  • The support for Vec<T>/Option<T> is very limited at the moment. Namely, the macro performs a textual check to see if the member type is, e.g., Vec<T>; if so, it creates an empty vec and pushes the results of DeserializeXml::from_reader for the inner type (T) when it encounters the matching tag. Note the emphasis on textual check: the macro will fail if you "spell" Vec<T> differently (e.g., by aliasing it), or use your own container type. (The same limitations apply for Option<T>.)

  • The macro only supports structs.

  • An implementation of DeserializeXml is provided for Strings and numeric types (i.e. u8, i8, ...). To add support for your own type, see this section.

  • Struct fields of type Option<T>, where T is also a struct to which #[derive(DeserializeXml)] has been applied, are seemingly skipped during parsing unless the tag attribute is set correctly. (This might also arise in other edge cases, but this one is instructive.) This is easiest to illustrate with an example:

use deserialize_xml::DeserializeXml;

#[derive(Default, Debug, DeserializeXml)]
struct Post {
title: String,
// The inner type has a weird name, but the generated parser uses the field name
// by default, so it will look for <attachment> tags--all good, or so you think...
attachment: Option<WeirdName>,
};

#[derive(Default, Debug, DeserializeXml)]
#[deserialize_xml(tag = "attachment")] // (*) - necessary!
struct WeirdName {
path: String,
mime_type: String,
}

let input = r#"<post>
<title>A Modest Proposal</title>
<attachment>
<path>./proposal_banner.jpg</path>
<mime_type>image/jpeg</mime_type>
</attachment>
</post>"#;

// So far, this looks like a very standard example...
let result = Post::from_str(input).unwrap();
assert_eq!(result.title, "A Modest Proposal");
// ..but without the line marked (*) above, result.attachment is None!
let attachment = result.attachment.unwrap();
assert_eq!(attachment.path, "./proposal_banner.jpg");
assert_eq!(attachment.mime_type, "image/jpeg");

Without line (*), what goes wrong? Post::from_reader (which is called by Post::from_str) will look for <attachment> tags and dutifully call WeirdName::from_reader when it sees one. However, WeirdName::from_reader has no knowledge that someone else is referring to it as attachment, so the body of that implementation assumes it should only parse <weirdname> tags. Since it won't find any, we won't parse our <attachment>. By adding the #[deserialize_xml(tag = "attachment")] attribute to WeirdName, we ensure that the implementation of WeirdName::from_reader instead looks for <attachment> tags, not <weirdname> tags. Unfortunately, at the moment there is no convenient way to associate WeirdName with multiple tags.

Implementing DeserializeXml for your own struct

Of course, you can implement DeserializeXml yourself from scratch, but doing so tends to involve frequently repeating some boilerplate XML parser manipulation code. Instead, see the documentation and implementation of impl_deserialize_xml_helper for a more ergonomic way of handling the common case.

Dependencies

~2MB
~42K SLoC