Chapter 7 Annotation

Once you have a peak table or feature table, annotating the peaks helps you interpret them. See this review (Domingo-Almenara et al. 2018) for detailed notes on annotation. The authors proposed five levels to classify current computational annotation strategies:

  • Level 1: Peak grouping: MS pseudospectra extraction based on peak shape similarity and peak abundance correlation

  • Level 2: Peak annotation: adducts, neutral losses, isotopes, and other mass relationships based on mass distances

  • Level 3: Biochemical knowledge based on putative identification, potential biochemical reactions and related statistical analysis

  • Level 4: Use and integration of tandem MS data based on data-dependent/independent acquisition mode or in silico prediction

  • Level 5: Retention time prediction based on library-available retention indices or quantitative structure-retention relationship (QSRR) models
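As a rough illustration of Level 1, the sketch below groups features that co-elute and whose abundances correlate across samples. It is a minimal greedy approach under stated assumptions, not the algorithm of any particular package: the function names, the feature layout (`id`, `rt`, `intensities`), the retention time tolerance and the correlation cutoff are all hypothetical choices.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def group_peaks(features, rt_tol=2.0, min_cor=0.9):
    """Greedily assign each feature to the first group whose seed feature
    co-elutes (within rt_tol seconds) and correlates (>= min_cor) with it."""
    groups = []
    for feat in sorted(features, key=lambda f: f["rt"]):
        for group in groups:
            seed = group[0]
            if (abs(feat["rt"] - seed["rt"]) <= rt_tol
                    and pearson(feat["intensities"], seed["intensities"]) >= min_cor):
                group.append(feat)
                break
        else:
            # No existing group fits, so this feature seeds a new one.
            groups.append([feat])
    return [[f["id"] for f in g] for g in groups]

# Hypothetical features: intensities are measured across four samples.
features = [
    {"id": "A", "rt": 120.0, "intensities": [10, 20, 30, 40]},
    {"id": "B", "rt": 120.5, "intensities": [1, 2, 3, 4.1]},
    {"id": "C", "rt": 120.2, "intensities": [40, 30, 20, 10]},
    {"id": "D", "rt": 300.0, "intensities": [5, 10, 15, 20]},
]
print(group_peaks(features))
```

Here A and B end up in one pseudospectrum, while C (anti-correlated) and D (different retention time) stay separate. Real tools also compare peak shapes, which this sketch ignores.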

Most software operates at level 1 or 2. If we only have a compound's structure, we can predict the ions formed under different ionization methods. If we have a mass spectrum, we can match it against a database by similarity analysis. In metabolomics, we usually have only mass spectra or mass-to-charge ratios, and a single mass-to-charge ratio is not enough for identification. This is one bottleneck for annotation, so prediction is usually performed on MS/MS data.
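Spectral matching by similarity can be sketched with a binned cosine score. This is a minimal illustration, assuming spectra are given as lists of (m/z, intensity) pairs. The function names and the 0.01 bin width are assumptions, and real search tools typically add intensity weighting and tolerance-based peak matching.

```python
import math

def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) pairs to {bin index: summed intensity}."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(query, reference, bin_width=0.01):
    """Cosine score between two spectra: 0 = no shared bins, 1 = identical."""
    a = bin_spectrum(query, bin_width)
    b = bin_spectrum(reference, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical query and library spectra.
query = [(85.03, 30.0), (127.04, 100.0), (145.05, 55.0)]
library_hit = [(85.03, 25.0), (127.04, 100.0), (145.05, 60.0)]
unrelated = [(91.05, 100.0), (119.09, 40.0)]

print(cosine_similarity(query, library_hit))  # close to 1
print(cosine_similarity(query, unrelated))    # no shared peaks
```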

7.1 Issues in annotation

The major issue in annotation is redundant peaks from the same metabolite. Unlike in genomics, the peaks or features from peak selection are not independent of each other. Adducts, in-source fragments and isotopes can lead to misannotation. A common solution is to compare the mass distances between peaks with those expected for known adducts, neutral losses, molecular multimers or multiply charged ions.
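The mass distance comparison can be sketched as follows. This is a minimal example, assuming positive mode and a small hand-picked table of distances. The table, the tolerances and the function name are illustrative assumptions, not a complete adduct list.

```python
from itertools import combinations

# Mass distances (Da) between commonly related positive-mode ions.
KNOWN_DISTANCES = {
    "13C isotope": 1.003355,
    "[M+NH4]+ vs [M+H]+": 17.026549,
    "[M+Na]+ vs [M+H]+": 21.981944,
    "[M+K]+ vs [M+H]+": 37.955881,
}

def annotate_pairs(peaks, rt_tol=2.0, mz_tol=0.005):
    """peaks: list of (m/z, retention time in seconds).
    Returns (lower m/z, higher m/z, label) for co-eluting peak pairs whose
    m/z difference matches a known distance within mz_tol."""
    hits = []
    for (mz1, rt1), (mz2, rt2) in combinations(sorted(peaks), 2):
        if abs(rt1 - rt2) > rt_tol:
            continue  # not co-eluting, so unlikely to be the same compound
        diff = mz2 - mz1
        for label, dist in KNOWN_DISTANCES.items():
            if abs(diff - dist) <= mz_tol:
                hits.append((mz1, mz2, label))
    return hits

# Hypothetical peak list: three ions of glucose plus an unrelated peak.
peaks = [(181.0707, 120.0), (203.0526, 120.5), (219.0265, 119.8), (300.1234, 300.0)]
for hit in annotate_pairs(peaks):
    print(hit)
```

In this example the peaks at m/z 203.0526 and 219.0265 are flagged as the sodium and potassium adducts of the ion at m/z 181.0707, so only one of the three needs to be carried forward as an independent feature.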

Another issue concerns MS/MS databases. Only 10% of known metabolites in databases have experimental spectral data, so in silico prediction is required. Some work tries to bridge the gap by combining experimental data, theoretical values (from chemical databases such as ChemSpider) and predictions. Here is a nice review of MS/MS prediction (Hufsky, Scheubert, and Böcker 2014).

7.2 Peak misidentification

  • Isomers

Use separation methods such as chromatography, ion mobility MS, or MS/MS. Reversed-phase ion-pairing chromatography and HILIC are useful. Chemical derivatization is another option.

  • Interfering compounds

HRMS should provide a mass accuracy of 20 ppm or better.
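To make the ppm figure concrete, the short sketch below converts between a ppm tolerance and an absolute m/z window. The function names are illustrative.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def mz_window(theoretical_mz, ppm):
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = theoretical_mz * ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta

# At m/z 200, a 20 ppm tolerance corresponds to a window of +/- 0.004.
low, high = mz_window(200.0, 20)
print(low, high)
print(ppm_error(200.004, 200.0))
```

Any interfering compound whose m/z falls inside this window cannot be distinguished from the target by accurate mass alone.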

  • In-source degradation products

7.3 Annotation vs. identification

According to the definition from the Chemical Analysis Working Group of the Metabolomics Standards Initiative (Sumner et al. 2007; Viant et al. 2017), four levels of confidence can be assigned to identification:

  • Level 1 ‘Identified metabolites’
  • Level 2 ‘Putatively annotated compounds’
  • Level 3 ‘Putatively characterised compound classes’
  • Level 4 ‘Unknown’

In practice, annotation based on data analysis alone can reach level 2. For level 1, we need extra evidence such as MS/MS, retention time, accurate mass, 2D NMR spectra, and so on to confirm the compounds. Authentic standards are always required for solid proof.

7.4 Cheminformatics

  • RDKit Open-Source Cheminformatics Software
  • cdk The Chemistry Development Kit (CDK) is a scientific, LGPL-ed library for bio- and cheminformatics and computational chemistry written in Java.
  • Open Babel Open Babel is a chemical toolbox designed to speak the many languages of chemical data.

7.5 MS Database for annotation

7.5.1 MS/MS

  • MoNA: a platform that collects spectra from other open source databases

  • MassBank

  • GNPS uses the internal correlations in the data and performs network analysis at the peak level, instead of at the level of annotated compounds, to annotate the data (Wang et al. 2016).

  • ReSpect: phytochemicals

  • Metlin is another useful online application for annotation(Guijas et al. 2018).

  • LipidBlast: in silico prediction

  • MZcloud

  • NIST: Not free

7.5.2 MS

7.6 Compounds Database

  • PubChem is an open chemistry database at the National Institutes of Health (NIH).

  • Chemspider is a free chemical structure database providing fast text and structure search access to over 67 million structures from hundreds of data sources.

  • ChEBI is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.

  • RefMet A Reference list of Metabolite names.

  • CAS Largest substance database

7.7 Software

7.7.1 Adducts list

An adduct list is available from the commonMZ project.

7.7.2 Molgen

Molgen generates all structures (connectivity isomers, constitutions) that correspond to a given molecular formula, with optional further restrictions, e.g. presence or absence of particular substructures.

7.7.3 Isotope

Isotope pattern prediction
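As a minimal sketch of what isotope pattern prediction involves, the example below estimates the M+1 and M+2 peaks from carbon alone with a binomial model. It assumes a 13C natural abundance of about 1.07% and ignores every other element, so it is a simplification rather than what a full isotope calculator does. The function name is hypothetical.

```python
from math import comb

P_13C = 0.0107  # approximate natural abundance of 13C

def carbon_isotope_pattern(n_carbons, max_peaks=3):
    """Relative intensities of M, M+1, M+2, ... from 13C only,
    normalised so that the monoisotopic peak equals 1."""
    n_peaks = min(max_peaks, n_carbons + 1)
    probs = [
        comb(n_carbons, k) * P_13C ** k * (1 - P_13C) ** (n_carbons - k)
        for k in range(n_peaks)
    ]
    return [p / probs[0] for p in probs]

# For a compound with six carbons, the M+1 peak is roughly 6.5% of M.
pattern = carbon_isotope_pattern(6)
print([round(x, 4) for x in pattern])
```

Because the M+1 ratio scales almost linearly with the number of carbons, the observed ratio can be used to estimate the carbon count of an unknown.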

7.7.4 mfFinder

mfFinder predicts molecular formulas based on accurate mass.
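The idea of predicting a formula from an accurate mass can be sketched as a brute-force search. The example below is not mfFinder's algorithm. It assumes a neutral monoisotopic mass, only the elements C, H, N and O, and a simple upper bound on hydrogens. The function name and the element limits are hypothetical.

```python
# Monoisotopic masses of the elements considered in this sketch.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def candidate_formulas(neutral_mass, ppm=5.0, max_n=5, max_o=10):
    """Enumerate CHNO formulas whose monoisotopic mass lies within
    `ppm` of `neutral_mass`. Returns a list of (formula, mass)."""
    tol = neutral_mass * ppm / 1e6
    results = []
    for c in range(1, int(neutral_mass // MONO["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = neutral_mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                h = round(rest / MONO["H"])
                # Valence-based upper bound on hydrogens for CHNO compounds.
                if h < 0 or h > 2 * c + n + 2:
                    continue
                mass = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
                if abs(mass - neutral_mass) <= tol:
                    counts = (("C", c), ("H", h), ("N", n), ("O", o))
                    formula = "".join(
                        f"{el}{cnt if cnt > 1 else ''}" for el, cnt in counts if cnt > 0
                    )
                    results.append((formula, mass))
    return results

# Neutral monoisotopic mass of a hexose such as glucose.
for formula, mass in candidate_formulas(180.06339):
    print(formula, round(mass, 5))
```

The number of candidates grows quickly with mass and with the number of elements allowed, which is why practical tools add isotope pattern and chemical plausibility filters.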

7.7.5 CAMERA

CAMERA is a commonly used annotation package for the xcms workflow (Kuhl et al. 2012).

7.7.6 RAMClustR

The software is available online (Broeckling et al. 2014). The package includes a vignette that describes its usage, which can be read from within R.

7.7.7 pmd

Paired Mass Distance (PMD) analysis for GC/LC-MS based nontarget analysis to find independent peaks.
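The idea behind paired mass distances can be illustrated without any prior adduct table: compute the m/z differences between co-eluting peaks and keep the distances that recur across retention time groups. The sketch below is a simplified illustration under these assumptions, not the pmd package's implementation. The function name, tolerances and rounding are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pmds(peaks, rt_tol=2.0, digits=2, min_count=2):
    """peaks: list of (m/z, retention time in seconds).
    Counts rounded m/z distances between co-eluting peaks and returns
    those observed at least min_count times as {distance: count}."""
    counts = Counter()
    for (mz1, rt1), (mz2, rt2) in combinations(peaks, 2):
        if abs(rt1 - rt2) <= rt_tol:
            counts[round(abs(mz1 - mz2), digits)] += 1
    return {dist: n for dist, n in counts.items() if n >= min_count}

# Hypothetical peak list: two compounds, each seen as two co-eluting ions.
peaks = [
    (181.0707, 120.0), (203.0526, 120.3),
    (166.0863, 240.0), (188.0682, 240.4),
]
print(frequent_pmds(peaks))
```

A distance that recurs in many retention time groups, such as 21.98 Da here, points to a shared relationship like a sodium adduct, so one peak of each pair can be treated as redundant.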

7.7.8 nontarget

nontarget performs isotope and adduct peak grouping and homologue series detection.

7.7.9 xMSannotator

The software is available online (Uppal, Walker, and Jones 2017).

7.7.10 CFM-ID

CFM-ID uses Metlin’s data to make predictions (Allen et al. 2014).

7.7.11 MINE

MINE is an open access database of computationally predicted enzyme promiscuity products for untargeted metabolomics. The annotation would be accurate for general compound databases.

7.7.12 InterpretMSSpectrum

This package annotates and interprets deconvoluted mass spectra (mass*intensity pairs) from high resolution mass spectrometry devices. You can use it to find molecular ions in GC-MS data.

7.7.13 For Ident

For-ident can score an identification with the help of logD (relative retention time) and/or MS/MS.

7.7.15 mz.unity

mz.unity detects and explores complex relationships in accurate-mass mass spectrometry data. The source code is available online (Mahieu et al. 2016).

7.7.16 MAIT

The source code is available online (Fernández-Albert et al. 2014).

7.7.17 ProbMetab

ProbMetab provides a probability ranking of the candidate compounds assigned to masses, under the prior assumption of a connected sample and with additional prior and spectral information modeled by the user. The source code is available online (Silva et al. 2014).

7.7.18 RAMSI

The method is described in the paper (Baran and Northen 2013).

7.7.19 Sirius

Sirius is a Java-based software framework (Dührkop et al. 2015) for de novo identification of metabolites using single and tandem mass spectrometry. It can be used together with CSI:FingerID.

7.7.20 MI-Pack

The Python software is available online (Weber and Viant 2010).

7.7.21 Plantmat

An Excel-based library for the prediction of plant metabolites (Qiu et al. 2016).

7.7.22 MetFamily

Shiny app for MS and MS/MS data annotation (Treutler et al. 2016).

7.7.23 Lipidmatch

LipidMatch performs an in silico lipid mass spectrum search (Koelmel et al. 2017).

7.7.24 MolFind

The Java-based MolFind annotates unknown chemical structures by prediction based on RI, ECOM50, drift time and CID spectra (Menikarachchi et al. 2012).

7.7.25 MetFusion

Java-based integration of compound identification strategies. The application is available online (Gerlich and Neumann 2013).

7.7.26 iMet

This online application uses a network-based computational method for annotation (Aguilar-Mogas et al. 2017).

7.7.27 Metscape

Metscape uses the Debiased Sparse Partial Correlation (DSPC) algorithm (Basu et al. 2017) for annotation.

7.7.28 MetFrag

MetFrag can be used for in silico prediction and matching of MS/MS data (Ruttkies et al. 2016).

7.7.29 LipidFrag

LipidFrag can be used for in silico prediction and matching of lipid-related MS/MS data (Witting et al. 2017).

7.7.30 MycompoundID

MycompoundID can be used online to search for known and unknown metabolites (Li et al. 2013).

7.7.31 magma

magma can predict and match MS/MS files.

7.7.32 MetExpert

MetExpert is an expert system that helps users with limited expertise in informatics interpret GC-MS data for metabolite identification without querying spectral databases (Qiu, Lei, and Sumner 2018).

7.7.33 MS-DIA

  • decoMS2 requires two different collision energies, low (usually 0 V) and high, in each precursor range to solve the mathematical equations (Nikolskiy et al. 2013).

  • MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis (Tsugawa et al. 2015).

  • MetDIA: targeted metabolite extraction of multiplexed MS/MS spectra generated by data-independent acquisition.

  • DIA-Umpire: a comprehensive computational framework for data-independent acquisition proteomics (Tsou et al. 2015).

  • MetaboDIA: quantitative metabolomics analysis using DIA-MS (Chen et al. 2017).

  • SWATHtoMRM: development of a high-coverage targeted metabolomics method using SWATH technology for biomarker discovery (Zha et al. 2018).


Aguilar-Mogas, Antoni, Marta Sales-Pardo, Miriam Navarro, Roger Guimerà, and Oscar Yanes. 2017. “iMet: A Network-Based Computational Tool to Assist in the Annotation of Metabolites from Tandem Mass Spectra.” Anal. Chem. 89 (6): 3474–82.

Allen, Felicity, Allison Pon, Michael Wilson, Russ Greiner, and David Wishart. 2014. “CFM-ID: A Web Server for Annotation, Spectrum Prediction and Metabolite Identification from Tandem Mass Spectra.” Nucleic Acids Res. 42 (W1): W94–W99.

Baran, Richard, and Trent R. Northen. 2013. “Robust Automated Mass Spectra Interpretation and Chemical Formula Calculation Using Mixed Integer Linear Programming.” Anal. Chem. 85 (20): 9777–84.

Basu, Sumanta, William Duren, Charles R. Evans, Charles F. Burant, George Michailidis, and Alla Karnovsky. 2017. “Sparse Network Modeling and Metscape-Based Visualization Methods for the Analysis of Large-Scale Metabolomics Data.” Bioinformatics 33 (10): 1545–53.

Broeckling, C. D., F. A. Afsar, S. Neumann, A. Ben-Hur, and J. E. Prenni. 2014. “RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching-Based Annotation for Metabolomics Data.” Anal. Chem. 86 (14): 6812–7.

Chen, Gengbo, Scott Walmsley, Gemmy C. M. Cheung, Liyan Chen, Ching-Yu Cheng, Roger W. Beuerman, Tien Yin Wong, Lei Zhou, and Hyungwon Choi. 2017. “Customized Consensus Spectral Library Building for Untargeted Quantitative Metabolomics Analysis with Data Independent Acquisition Mass Spectrometry and MetaboDIA Workflow.” Anal. Chem. 89 (9): 4897–4906.

Domingo-Almenara, Xavier, J. Rafael Montenegro-Burke, H. Paul Benton, and Gary Siuzdak. 2018. “Annotation: A Computational Solution for Streamlining Metabolomics Analysis.” Anal. Chem. 90 (1): 480–89.

Dührkop, Kai, Huibin Shen, Marvin Meusel, Juho Rousu, and Sebastian Böcker. 2015. “Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID.” PNAS 112 (41): 12580–5.

Fernández-Albert, Francesc, Rafael Llorach, Cristina Andrés-Lacueva, and Alexandre Perera. 2014. “An R Package to Analyse LC/MS Metabolomic Data: MAIT (Metabolite Automatic Identification Toolkit).” Bioinformatics 30 (13): 1937–9.

Gerlich, Michael, and Steffen Neumann. 2013. “MetFusion: Integration of Compound Identification Strategies.” J. Mass Spectrom. 48 (3): 291–98.

Guijas, Carlos, J. Rafael Montenegro-Burke, Xavier Domingo-Almenara, Amelia Palermo, Benedikt Warth, Gerrit Hermann, Gunda Koellensperger, et al. 2018. “METLIN: A Technology Platform for Identifying Knowns and Unknowns.” Anal. Chem. 90 (5): 3156–64.

Hufsky, Franziska, Kerstin Scheubert, and Sebastian Böcker. 2014. “Computational Mass Spectrometry for Small-Molecule Fragmentation.” TrAC Trends in Analytical Chemistry 53 (January): 41–48.

Koelmel, Jeremy P., Nicholas M. Kroeger, Candice Z. Ulmer, John A. Bowden, Rainey E. Patterson, Jason A. Cochran, Christopher W. W. Beecher, Timothy J. Garrett, and Richard A. Yost. 2017. “LipidMatch: An Automated Workflow for Rule-Based Lipid Identification Using Untargeted High-Resolution Tandem Mass Spectrometry Data.” BMC Bioinformatics 18 (July): 331.

Kuhl, Carsten, Ralf Tautenhahn, Christoph Böttcher, Tony R. Larson, and Steffen Neumann. 2012. “CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets.” Anal. Chem. 84 (1): 283–89.

Li, Liang, Ronghong Li, Jianjun Zhou, Azeret Zuniga, Avalyn E. Stanislaus, Yiman Wu, Tao Huan, et al. 2013. “MyCompoundID: Using an Evidence-Based Metabolome Library for Metabolite Identification.” Anal. Chem. 85 (6): 3401–8.

Mahieu, Nathaniel G., Jonathan L. Spalding, Susan J. Gelman, and Gary J. Patti. 2016. “Defining and Detecting Complex Peak Relationships in Mass Spectral Data: The Mz.Unity Algorithm.” Anal. Chem. 88 (18): 9037–46.

Menikarachchi, Lochana C., Shannon Cawley, Dennis W. Hill, L. Mark Hall, Lowell Hall, Steven Lai, Janine Wilder, and David F. Grant. 2012. “MolFind: A Software Package Enabling HPLC/MS-Based Identification of Unknown Chemical Structures.” Anal. Chem. 84 (21): 9388–94.

Nikolskiy, Igor, Nathaniel G. Mahieu, Ying-Jr Chen, Ralf Tautenhahn, and Gary J. Patti. 2013. “An Untargeted Metabolomic Workflow to Improve Structural Characterization of Metabolites.” Anal. Chem. 85 (16): 7713–9.

Qiu, Feng, Dennis D. Fine, Daniel J. Wherritt, Zhentian Lei, and Lloyd W. Sumner. 2016. “PlantMAT: A Metabolomics Tool for Predicting the Specialized Metabolic Potential of a System and for Large-Scale Metabolite Identifications.” Anal. Chem. 88 (23): 11373–83.

Qiu, Feng, Zhentian Lei, and Lloyd W. Sumner. 2018. “MetExpert: An Expert System to Enhance Gas Chromatography-Mass Spectrometry-Based Metabolite Identifications.” Analytica Chimica Acta, Analytical Metabolomics, 1037 (December): 316–26.

Ruttkies, Christoph, Emma L. Schymanski, Sebastian Wolf, Juliane Hollender, and Steffen Neumann. 2016. “MetFrag Relaunched: Incorporating Strategies Beyond in Silico Fragmentation.” Journal of Cheminformatics 8 (January): 3.

Silva, Ricardo R., Fabien Jourdan, Diego M. Salvanha, Fabien Letisse, Emilien L. Jamin, Simone Guidetti-Gonzalez, Carlos A. Labate, and Ricardo Z. N. Vêncio. 2014. “ProbMetab: An R Package for Bayesian Probabilistic Annotation of LCMS-Based Metabolomics.” Bioinformatics 30 (9): 1336–7.

Sumner, Lloyd W., Alexander Amberg, Dave Barrett, Michael H. Beale, Richard Beger, Clare A. Daykin, Teresa W.-M. Fan, et al. 2007. “Proposed Minimum Reporting Standards for Chemical Analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI).” Metabolomics 3 (3): 211–21.

Treutler, Hendrik, Hiroshi Tsugawa, Andrea Porzel, Karin Gorzolka, Alain Tissier, Steffen Neumann, and Gerd Ulrich Balcke. 2016. “Discovering Regulated Metabolite Families in Untargeted Metabolomics Studies.” Anal. Chem. 88 (16): 8082–90.

Tsou, Chih-Chiang, Dmitry Avtonomov, Brett Larsen, Monika Tucholska, Hyungwon Choi, Anne-Claude Gingras, and Alexey I. Nesvizhskii. 2015. “DIA-Umpire: Comprehensive Computational Framework for Data-Independent Acquisition Proteomics.” Nat. Methods 12 (3): 258–64.

Tsugawa, Hiroshi, Tomas Cajka, Tobias Kind, Yan Ma, Brendan Higgins, Kazutaka Ikeda, Mitsuhiro Kanazawa, Jean VanderGheynst, Oliver Fiehn, and Masanori Arita. 2015. “MS-DIAL: Data-Independent MS/MS Deconvolution for Comprehensive Metabolome Analysis.” Nat. Methods 12 (6): 523–26.

Uppal, Karan, Douglas I. Walker, and Dean P. Jones. 2017. “xMSannotator: An R Package for Network-Based Annotation of High-Resolution Metabolomics Data.” Anal. Chem. 89 (2): 1063–7.

Viant, Mark R., Irwin J. Kurland, Martin R. Jones, and Warwick B. Dunn. 2017. “How Close Are We to Complete Annotation of Metabolomes?” Current Opinion in Chemical Biology, Omics, 36 (February): 64–69.

Wang, Mingxun, Jeremy J. Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don Duy Nguyen, et al. 2016. “Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking.” Nat. Biotechnol. 34 (8): 828–37.

Weber, Ralf J. M., and Mark R. Viant. 2010. “MI-Pack: Increased Confidence of Metabolite Identification in Mass Spectra by Integrating Accurate Masses and Metabolic Pathways.” Chemometrics and Intelligent Laboratory Systems, OMICS, 104 (1): 75–82.

Witting, Michael, Christoph Ruttkies, Steffen Neumann, and Philippe Schmitt-Kopplin. 2017. “LipidFrag: Improving Reliability of in Silico Fragmentation of Lipids and Application to the Caenorhabditis Elegans Lipidome.” PLOS ONE 12 (3): e0172311.

Zha, Haihong, Yuping Cai, Yandong Yin, Zhuozhong Wang, Kang Li, and Zheng-Jiang Zhu. 2018. “SWATHtoMRM: Development of High-Coverage Targeted Metabolomics Method Using SWATH Technology for Biomarker Discovery.” Anal. Chem. 90 (6): 4062–70.