1  Example data

I will use a metagenomic dataset as the running example in the following sections, to help you become familiar with downstream analysis of omics data and with the use of pctax.

Let's look at the example data, a simulated microbiome dataset. Loading otutab from pcutils also makes the accompanying metadata and taxonomy objects used below available:

library(pctax)
library(pcutils)
data(otutab, package = "pcutils")
# help(otutab)

Metadata

First, the metadata. This table records essential information about the study samples. Typically, each row is one sample, identified by its sample ID, and each column is a sample-level feature: experimental group (control or treatment), sampling time, location, environmental factors at the sampling point, phenotypic characteristics of the host, and so on.

print(metadata)
     Id Group     env1      env2     env3      env4      env5       env6      lat     long
NS1 NS1    NS 3.057248 10.235708 5.554576  8.084997 25.007946 -1.1545668 26.94422 103.4767
NS2 NS2    NS 4.830219 11.134527 5.613455  8.556829 16.676898  0.8116874 29.08733 109.6196
NS3 NS3    NS 3.753133 10.062318 5.582916 10.226572 21.689255  1.4073321 28.25164 104.0361
NS4 NS4    NS 4.262264 10.844010 5.258419  9.002256 24.810460  1.4780532 33.82415 106.8651
NS5 NS5    NS 2.476135  7.525840 6.255314  9.357587 19.705527  0.0581309 33.51011 105.4571
NS6 NS6    NS 5.131004 10.827615 5.180966  8.141506 18.390209 -1.7003257 31.86864 102.7832
WS1 WS1    WS 4.690185  8.868384 5.534423  2.922556 13.066594 -0.9073270 25.67656 102.2946
WS2 WS2    WS 5.500007  8.270563 6.698076  3.711924  6.344009 -0.1699797 27.69990 106.0343
WS3 WS3    WS 3.220505  8.435364 7.462542  3.906052 15.703113 -1.5205620 28.04572 108.9124
WS4 WS4    WS 5.624307  7.174707 5.387799  2.777254 12.503655  1.6144087 33.86966 110.1844
WS5 WS5    WS 5.013274  7.678983 6.478364  3.527165  7.391619 -0.6876136 28.36314 107.0412
WS6 WS6    WS 6.321235  7.822989 6.262504  3.238742 10.298175  0.0661551 30.07997 105.0054
CS1 CS1    CS 5.242789 12.053449 8.383412  7.175002 17.666552  1.0230426 32.83965 103.8978
CS2 CS2    CS 5.402243  9.865916 6.760709  5.050641 19.775379  1.7248702 30.29499 101.6969
CS3 CS3    CS 5.474717 12.489934 5.729690  4.215989 16.861294 -0.8506381 29.90803 106.0819
CS4 CS4    CS 6.915080 12.492414 6.845870  5.280682 15.011610  0.5285857 31.87761 104.2137
CS5 CS5    CS 6.355684 13.085380 6.474958  5.893205 17.686923 -0.5588746 27.94134 103.1896
CS6 CS6    CS 6.381007 10.461389 7.432614  7.173710 17.387503 -0.0904096 35.29004 106.2336

Here, the metadata simulates a soil microbiome study:

  • Id: Unique identifier for each sample (name).
  • Group: Experimental grouping (NS, WS, or CS; what the labels stand for does not matter for this simulation).
  • env1 to env6: Environmental factors at the sampling points (e.g., pH, temperature, humidity).
  • lat and long: Latitude and longitude of the simulated sampling location (the coordinates carry no real-world meaning).
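
A quick way to get a feel for a metadata table with this layout is to count the samples per group and summarise one environmental factor by group. The sketch below is only illustrative: toy_meta is a small hypothetical data frame with the same column layout, not the real metadata, and its values are made up. With the example data loaded, the same base R calls should work on metadata.

```r
# Hypothetical stand-in for `metadata`: same layout, invented values
toy_meta <- data.frame(
  Id = c("NS1", "NS2", "WS1", "WS2"),
  Group = c("NS", "NS", "WS", "WS"),
  env1 = c(3.0, 5.0, 4.0, 6.0),
  row.names = c("NS1", "NS2", "WS1", "WS2")
)

# Number of samples in each experimental group
group_sizes <- table(toy_meta$Group)
print(group_sizes)

# Mean of one environmental factor within each group
env1_means <- tapply(toy_meta$env1, toy_meta$Group, mean)
print(env1_means)
```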

Feature abundance table

Next is the feature abundance table produced by upstream processing: for example, microbial abundance in metagenomics, gene abundance in transcriptomics, or metabolite abundance in metabolomics.

Typically, rows are features and columns are samples. It is common practice to make the column names of the abundance table match the row names of the metadata exactly, in the same order. Downstream steps usually pair samples across the two tables, so keeping them aligned avoids mismatched samples later on.
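
The alignment between the two tables can be checked, and repaired if needed, in a few lines of base R. The helper below, check_sample_alignment(), is a hypothetical function written for this illustration (it is not part of pctax or pcutils), and toy_tab and toy_meta are small made-up tables standing in for otutab and metadata.

```r
# Hypothetical helper: report whether the abundance table columns and the
# metadata rows name the same samples in the same order
check_sample_alignment <- function(abundance, meta) {
  samples_tab <- colnames(abundance)
  samples_meta <- rownames(meta)
  list(
    aligned = identical(samples_tab, samples_meta),
    only_in_abundance = setdiff(samples_tab, samples_meta),
    only_in_metadata = setdiff(samples_meta, samples_tab)
  )
}

# Toy tables: same samples, but the abundance columns are in a different order
toy_meta <- data.frame(
  Group = c("NS", "NS", "WS"),
  row.names = c("NS1", "NS2", "WS1")
)
toy_tab <- data.frame(
  NS1 = c(10, 0), WS1 = c(3, 7), NS2 = c(5, 1),
  row.names = c("sp_a", "sp_b")
)

res <- check_sample_alignment(toy_tab, toy_meta)
print(res$aligned)

# Reorder the columns so that they follow the metadata rows
toy_tab_aligned <- toy_tab[, rownames(toy_meta), drop = FALSE]
print(check_sample_alignment(toy_tab_aligned, toy_meta)$aligned)
```

Reordering by name only works when both tables contain the same samples. If only_in_abundance or only_in_metadata is non-empty, resolve the missing samples first.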

head(otutab)
                                   NS1  NS2  NS3  NS4  NS5  NS6  WS1  WS2  WS3  WS4  WS5  WS6  CS1  CS2  CS3  CS4  CS5  CS6
s__un_f__Thermomonosporaceae      1092 1920  810 1354 1064 1070 1252 1597 1330  941 1233 1011 2313 2518 1709 1975 1431 1527
s__Pelomonas_puraquae             1962 1234 2362 2236 2903 1829  644  495 1230 1284  953  635 1305 1516  844 1128 1483 1174
s__Rhizobacter_bergeniae           588  458  889  901 1226  853  604  470 1070 1028  846  670 1029 1802 1002 1200 1194  762
s__Flavobacterium_terrae           244  234 1810  673 1445  491  318 1926 1493  995  577  359 1080 1218  754  423 1032 1412
s__un_g__Rhizobacter              1432  412  533  759 1289  506  503  590  445  620  657  429 1132 1447  550  583 1105  903
s__un_o__Burkholderiales           886  683  824  912 1502 1029  235  252  359  381  387  351  551  540  477  559  513  496
s__un_g__Streptomyces              516  510  621  424  205  322  340  548 1590  776  493  508  624  757  560 1058  449  512
s__Lentzea_flaviverrucosa          424 1033  310  440  311  485  414  416  309  505  673  407  805  600  815  415  683  463
s__un_g__Actinoplanes              338  805  349  443  261  549  297  448  632  382  552  417  579  322  439  441  752  512
s__un_g__Rhizobium                 369  357  684  774 1033  666  213  186  281  274  408  279  360  598  243  274  517  273
s__un_g__Noviherbaspirillum        321  344  317  364  561  364  470  386  235  415  351  184  435  497  511  419  320  383
s__un_f__Comamonadaceae            170  176  375  367  521  385  194  509  484  304  503  194  386  285  410  281  578  544
s__Bradyrhizobium_neotropicale     318  415  449  330  371  380  365  315  279  238  406  375  306  274  330  358  352  368
s__Streptomyces_ederensis          234  262  524  248  148  211  145  232  593  289  224  175  445  296  245  305  817  354
s__Actinocorallia_herbida          260  315   58  454  144  184  162  277  151  268  253  194  396  470  240  310  463  233
s__un_g__Amycolatopsis             198  429   90  258  154  150   81  115   59  184  106  107  243  284   99  142 1547  103
s__Actinophytocola_burenkhanensis  117  140 1152   58   30   64  268  140   74  186  175  125  139   31  296  251  201  368
s__un_p__Proteobacteria            210  173  144  130   87  192  256  193  182  171  227  273  220  183  325  252  251  320
s__Kribbella_catacumbae            152  370  194  121   99  129  174  194  163  166  196  158  209  313  195  295  377  222
s__un_o__Rhizobiales               202  205  322  237  235  215  254  161  147  161  178  215  156  183  222  166  146  203

Here, otutab holds the abundance of each identified microbial species in every sample.
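
Column totals of a count table like this usually differ between samples, so counts are often converted to per-sample proportions before samples are compared. This section does not prescribe that step. The sketch below is just one base R way to do it, using a hypothetical helper to_relative() and a made-up toy_tab rather than the real otutab.

```r
# Hypothetical helper: divide every column (sample) by its own total so that
# each sample's abundances sum to 1
to_relative <- function(abundance) {
  mat <- as.matrix(abundance)
  sweep(mat, 2, colSums(mat), "/")
}

# Toy count table: features in rows, samples in columns
toy_tab <- data.frame(
  NS1 = c(30, 10),
  NS2 = c(5, 15),
  row.names = c("sp_a", "sp_b")
)

rel <- to_relative(toy_tab)
print(rel)

# Each sample now sums to 1
print(colSums(rel))
```

A sample whose counts are all zero would produce NaN values here, so such samples are best removed or handled explicitly beforehand.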

Feature annotation (optional)

With metadata and a feature abundance table alone, many analyses are already possible.

Sometimes a third table is available: the feature annotation, which holds details about each feature. In metagenomic data, this might be taxonomic information such as the phylum, class, order, family, and genus of each microbial species. In transcriptomics, it could be the functional descriptions and classifications that correspond to gene IDs.

This extra layer of annotation helps in interpreting the features being analyzed. As with the metadata, it helps if the row names of the feature annotation match the row names of the feature abundance table.

head(taxonomy)
Kingdom Phylum Class Order Family Genus Species
s__un_f__Thermomonosporaceae k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Thermomonosporaceae g__un_f__Thermomonosporaceae s__un_f__Thermomonosporaceae
s__Pelomonas_puraquae k__Bacteria p__Proteobacteria c__Betaproteobacteria o__Burkholderiales f__Comamonadaceae g__Pelomonas s__Pelomonas_puraquae
s__Rhizobacter_bergeniae k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae g__Rhizobacter s__Rhizobacter_bergeniae
s__Flavobacterium_terrae k__Bacteria p__Bacteroidetes c__Flavobacteriia o__Flavobacteriales f__Flavobacteriaceae g__Flavobacterium s__Flavobacterium_terrae
s__un_g__Rhizobacter k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae g__Rhizobacter s__un_g__Rhizobacter
s__un_o__Burkholderiales k__Bacteria p__Proteobacteria c__Betaproteobacteria o__Burkholderiales f__un_o__Burkholderiales g__un_o__Burkholderiales s__un_o__Burkholderiales
s__un_g__Streptomyces k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Streptomycetaceae g__Streptomyces s__un_g__Streptomyces
s__Lentzea_flaviverrucosa k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Pseudonocardiaceae g__Lentzea s__Lentzea_flaviverrucosa
s__un_g__Actinoplanes k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Micromonosporaceae g__Actinoplanes s__un_g__Actinoplanes
s__un_g__Rhizobium k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rhizobiales f__Rhizobiaceae g__Rhizobium s__un_g__Rhizobium
s__un_g__Noviherbaspirillum k__Bacteria p__Proteobacteria c__Betaproteobacteria o__Burkholderiales f__Oxalobacteraceae g__Noviherbaspirillum s__un_g__Noviherbaspirillum
s__un_f__Comamonadaceae k__Bacteria p__Proteobacteria c__Betaproteobacteria o__Burkholderiales f__Comamonadaceae g__un_f__Comamonadaceae s__un_f__Comamonadaceae
s__Bradyrhizobium_neotropicale k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rhizobiales f__Bradyrhizobiaceae g__Bradyrhizobium s__Bradyrhizobium_neotropicale
s__Streptomyces_ederensis k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Streptomycetaceae g__Streptomyces s__Streptomyces_ederensis
s__Actinocorallia_herbida k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Thermomonosporaceae g__Actinocorallia s__Actinocorallia_herbida
s__un_g__Amycolatopsis k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Pseudonocardiaceae g__Amycolatopsis s__un_g__Amycolatopsis
s__Actinophytocola_burenkhanensis k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Pseudonocardiaceae g__Actinophytocola s__Actinophytocola_burenkhanensis
s__un_p__Proteobacteria k__Bacteria p__Proteobacteria c__un_p__Proteobacteria o__un_p__Proteobacteria f__un_p__Proteobacteria g__un_p__Proteobacteria s__un_p__Proteobacteria
s__Kribbella_catacumbae k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Nocardioidaceae g__Kribbella s__Kribbella_catacumbae
s__un_o__Rhizobiales k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rhizobiales f__un_o__Rhizobiales g__un_o__Rhizobiales s__un_o__Rhizobiales

Here, taxonomy gives the lineage of each species from kingdom down to species, which is useful when exploring community composition.
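
One typical use of such an annotation table is to collapse species-level abundances to a higher rank. The sketch below does this with base R's rowsum(). The helper summarise_by_rank() is hypothetical (written for this illustration, not a pctax or pcutils function), and toy_tab and toy_tax are made-up stand-ins for otutab and taxonomy. It assumes the two tables have identical row names, as recommended above.

```r
# Hypothetical helper: sum feature abundances within each taxon of a given rank
summarise_by_rank <- function(abundance, annotation, rank = "Phylum") {
  # The feature order must agree, otherwise counts go to the wrong taxa
  stopifnot(identical(rownames(abundance), rownames(annotation)))
  rowsum(as.matrix(abundance), group = annotation[[rank]])
}

# Toy abundance table and matching annotation
toy_tab <- data.frame(
  NS1 = c(10, 20, 5),
  WS1 = c(1, 2, 3),
  row.names = c("sp_a", "sp_b", "sp_c")
)
toy_tax <- data.frame(
  Phylum = c("p__Proteobacteria", "p__Actinobacteria", "p__Proteobacteria"),
  row.names = c("sp_a", "sp_b", "sp_c")
)

phylum_tab <- summarise_by_rank(toy_tab, toy_tax)
print(phylum_tab)
```

The result has one row per phylum and keeps the sample columns unchanged, so it can be used wherever a feature abundance table is expected.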