pubchem

Rust data structures and client for the PubChem API

2 releases

0.1.1 Jan 15, 2022
0.1.0 Jan 15, 2022

Used in proteinogenic

MIT license

pubchem.rs

Rust data structures and client for the PubChem REST API.

🔌 Usage

💊 Compound

Create a Compound to query the PubChem API for a single compound. It can be constructed from a compound ID, from a compound name, from an InChI or InChIKey, or from a SMILES string:

extern crate pubchem;

let alanine = pubchem::Compound::new(5950);
let aspirin = pubchem::Compound::with_name("aspirin");
let acetone = pubchem::Compound::with_inchi("InChI=1S/C3H6O/c1-3(2)4/h1-2H3");
let lysine  = pubchem::Compound::with_inchikey("KDXKERNSBIXSRK-YFKPBYRVSA-N");
let benzene = pubchem::Compound::with_smiles("C1=CC=CC=C1");
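
Each constructor above corresponds to one identifier namespace of the PubChem PUG REST API. As a rough sketch of that correspondence, the following maps each kind of identifier to the path segment PUG REST uses for it. The `Namespace` enum and `input_path` function are hypothetical names introduced for illustration, not part of the crate, and identifier encoding is left out:

```rust
/// Kinds of identifier accepted by the constructors above.
/// Hypothetical illustration only; this is not a type from the crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Namespace {
    Cid,
    Name,
    Inchi,
    InchiKey,
    Smiles,
}

impl Namespace {
    /// Path segment that PUG REST uses for this namespace.
    fn as_path_segment(self) -> &'static str {
        match self {
            Namespace::Cid => "cid",
            Namespace::Name => "name",
            Namespace::Inchi => "inchi",
            Namespace::InchiKey => "inchikey",
            Namespace::Smiles => "smiles",
        }
    }
}

/// Build the input part of a PUG REST path, e.g. "compound/cid".
/// The identifier itself is omitted because it may need URL encoding.
fn input_path(ns: Namespace) -> String {
    format!("compound/{}", ns.as_path_segment())
}

fn main() {
    let all = [
        Namespace::Cid,
        Namespace::Name,
        Namespace::Inchi,
        Namespace::InchiKey,
        Namespace::Smiles,
    ];
    for ns in all {
        println!("{:?} -> {}", ns, input_path(ns));
    }
}
```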

Use the methods of a Compound to query the REST API; requests are sent with ureq. Dedicated methods exist for common single properties:

let alanine = pubchem::Compound::new(5950);

alanine.title().unwrap(); // "Alanine"
alanine.molecular_formula().unwrap(); // "C3H7NO2"
alanine.canonical_smiles().unwrap(); // "CC(C(=O)O)N"
alanine.isomeric_smiles().unwrap();  // "C[C@@H](C(=O)O)N"

Each method will perform a single query to the PubChem API, which is inefficient if you wish to retrieve several properties at once. In that case, use the properties method and select which properties you want to retrieve in a single query:

use pubchem::CompoundProperty::*;

let properties = pubchem::Compound::new(5950)
    .properties(&[Title, MolecularFormula, CanonicalSMILES])
    .unwrap();

properties.molecular_formula; // Some("C3H7NO2")
properties.canonical_smiles; // Some("CC(C(=O)O)N")
properties.title; // Some("Alanine")
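
To make the cost difference concrete, here is a minimal sketch of the request URLs involved, assuming the standard PUG REST property endpoint. The `property_url` helper is a hypothetical name and is not how the crate necessarily builds its requests; it only shows that several properties fit into one URL:

```rust
/// Base URL of the PubChem PUG REST service.
const PUG_REST: &str = "https://pubchem.ncbi.nlm.nih.gov/rest/pug";

/// Build one property request URL for a single compound ID.
/// `properties` holds PUG REST property names such as "Title".
/// Hypothetical helper for illustration; not part of the crate.
fn property_url(cid: u32, properties: &[&str]) -> String {
    format!(
        "{}/compound/cid/{}/property/{}/JSON",
        PUG_REST,
        cid,
        properties.join(",")
    )
}

fn main() {
    let wanted = ["Title", "MolecularFormula", "CanonicalSMILES"];

    // One request per property: three round trips to the server.
    for name in wanted {
        println!("{}", property_url(5950, &[name]));
    }

    // One batched request carrying all three properties.
    println!("{}", property_url(5950, &wanted));
}
```

With three properties, the per-property approach issues three requests where the batched form issues one.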

To retrieve metadata for multiple compounds at once, use the Compounds struct and its properties method to pack everything into a single query:

use pubchem::CompoundProperty::*;

// retrieve metadata from the three aromatic L-amino acids at once
for prop in pubchem::Compounds::new([6140, 6057, 6305])
    .properties(&[Title, IUPACName, ExactMass])
    .unwrap()
{
    println!(
        "[{cid}] {title} {iupac} {mass}g/mol",
        cid = prop.cid,
        title = prop.title.unwrap(),
        iupac = prop.iupac_name.unwrap(),
        mass = prop.exact_mass.unwrap(),
    );
}
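
The loop above calls unwrap on every optional field, which panics if a property is missing from a record. A more defensive variant might substitute placeholders instead. The sketch below uses a hypothetical `Record` struct as a stand-in for the crate's result type: the field names mirror the loop above, but the field types are assumptions, and the values in `main` are made-up placeholders rather than PubChem data:

```rust
/// Hypothetical stand-in for one record returned by a property query.
/// Field names mirror the loop above; the field types are assumptions.
struct Record {
    cid: i32,
    title: Option<String>,
    iupac_name: Option<String>,
    exact_mass: Option<f64>,
}

/// Format a record, substituting a placeholder for any missing property
/// instead of panicking.
fn describe(rec: &Record) -> String {
    let title = rec.title.as_deref().unwrap_or("<no title>");
    let iupac = rec.iupac_name.as_deref().unwrap_or("<no IUPAC name>");
    match rec.exact_mass {
        Some(mass) => format!("[{}] {} {} {}g/mol", rec.cid, title, iupac, mass),
        None => format!("[{}] {} {} <no mass>", rec.cid, title, iupac),
    }
}

fn main() {
    // Placeholder values, not real PubChem data.
    let rec = Record {
        cid: 1,
        title: Some("example".to_string()),
        iupac_name: None,
        exact_mass: Some(1.5),
    };
    println!("{}", describe(&rec));
}
```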

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

🔍 See Also

If you're a bioinformatician and a Rustacean, you may be interested in these other libraries:

  • uniprot.rs: Rust data structures for the UniProtKB databases.
  • obofoundry.rs: Rust data structures for the OBO Foundry.
  • fastobo: Rust parser and abstract syntax tree for Open Biomedical Ontologies.

📜 License

This library is provided under the open-source MIT license.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the PubChem developers. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
