Chapter 11 Demo

11.1 Project Setup

I suggest building your data analysis projects in RStudio(Click File - New project - New dictionary - Empty project). Then assign a name for your project. I also recommend the following tips if you are familiar with it.

  • Use git/github to make version control of your code and sync your project online.

  • Don’t use your name for your project because other peoples might cooperate with you and someone might check your data when you publish your papers. Each project should be a work for one paper or one chapter in your thesis.

  • Use workflow document(txt or doc) in your project to record all of the steps and code you performed for this project. Treat this document as digital version of your experiment notebook

  • Use data folder in your project folder for the raw data and the results you get in data analysis

  • Use figure folder in your project folder for the figure

  • Use munuscript folder in your project folder for the manuscript (you could write paper in rstudio with the help of template in Rmarkdown)

  • Just double click [yourprojectname].Rproj to start your project

11.2 Data input

xcms does not support all of the raw files formate from every mass spectrometry. You need to convert your raw data into some open-source data format such as mzData, mzXML or CDF files. The tool is MSconvert from ProteoWizard.

11.4 Wrap function

Here we would use the optimized parameters for peak picking, retention time correction and peaks filling.

load('para.RData')
library(xcms)
library(stringr)
getrtmz <- function(path,index = NULL){
  peakwidth <- c(para$min_peakwidth,para$max_peakwidth)
  ppm <- para$ppm
  noise <- para$noise
  snthresh <- para$snthresh
  mzdiff <- para$mzdiff
  prefilter <- c(para$prefilter,para$value_of_prefilter)
  integrate <- para$integrate
  profStep <- para$profStep
  center <- para$center
  response <- para$response
  gapInit <- para$gapInit
  gapExtend <- para$gapExtend
  factorDiag <- para$factorDiag
  factorGap <- para$factorGap
  localAlignment <- para$localAlignment
  bw <- para$bw
  mzwid <- para$mzwid
  minfrac <- para$minfrac
  minsamp <- para$minsamp
  max <- para$max
  distFunc <- para$distFunc
  plottype <- para$plottype
  retcorMethod <- para$retcorMethod
  mzCenterFun <- para$mzCenterFun
  fitgauss <- para$fitgauss
  verboseColumns <- para$verbose.columns
  files <- list.files(path,full.names = T,recursive = T)
  if(!is.null(index)){
    files <- files[index]
  }
  xset <- xcmsSet(files,
                  method = "centWave",
                  peakwidth       = peakwidth,
                  ppm             = ppm,
                  noise           = noise,
                  snthresh        = snthresh,
                  mzdiff          = mzdiff,
                  prefilter       = prefilter,
                  mzCenterFun     = mzCenterFun,
                  integrate       = integrate,
                  fitgauss        = fitgauss,
                  verbose.columns = verboseColumns)
  xset <- retcor( 
    xset,
    method         = retcorMethod,
    plottype       = plottype,
    distFunc       = distFunc,
    profStep       = profStep,
    center         = center,
    response       = response,
    gapInit        = gapInit,
    gapExtend      = gapExtend,
    factorDiag     = factorDiag,
    factorGap      = factorGap,
    localAlignment = localAlignment)
  xset <- group( 
    xset,
    method  = "density",
    bw      = bw,
    mzwid   = mzwid,
    minfrac = minfrac,
    minsamp = minsamp,
    max     = max)
  
  xset <- fillPeaks(xset)
  return(xset)
}

11.4.1 Peak picking

The first step to process the MS data is that find the peaks against the noises. In xcms, all of related staffs are handled by xcmsSet function.

For any functions in xcms or R, you could get their documents by type ? before certain function. Another geek way is input the name of the function in the console of Rstudio and press F1 for help.

In the document of xcmsset, we could set the sample classes, profmethod, profparam, polarity,etc. In the online version, such configurations are shown in certain windows. In the local analysis environment, such parameters are setup by yourselves. However, I think the default configurations could satisfied most of the analysis because related information should have been recorded in your Raw data and xcms could find them. All you need to do is that show the data dictionary for xcmsSet.

If your data have many groups such as control and treated group, just put them in separate subfolder of the data folder and xcmsSet would read them as separated groups.

11.4.2 Data correction

Reasons of data correction might come from many aspects such as the unstable instrument and pollution on column. In xcms, the most important correction is retention time correction. Remember the original retention time might changed and use another object to save the new object.

11.4.3 Peaks filling

After the retention time correction, filling the missing peaks could be done by fillpeaks. Peaks filling could avoid two many NAs for false statistical analysis. The algorithm could use the baseline signal to impute the data.

11.6 Peaks filtering

After we get the peaks list, it’s nessary to filter the peaks list to retain the peaks with high quality for further analysis. Normally, we could use the relative standard deviation of blank and pooled QC samples to control the peaks list.

11.10 Pathway Analysis

We could output the files for online pathway analysis with mummichog algorithm.

11.11 MetaboAnalyst

Actully, after you perform data correction, you have got the data matrix for statistic analysis. You might choose MetaboAnalyst online or offline to make furthor analysis, which supplied more statistical choices than xcms.

The input data format for MetaboAnalyst should be rows for peaks and colomns for samples. You could also add groups infomation if possible. Use the following code to get the data for analysis.

11.12 Summary

This is the offline metaboliomics data process workflow. For each study, details would be different and F1 is always your best friend.

Enjoy yourself in data mining!