FAIR data in Aspergillus research

The FAIR datastation? A solution for Aspergillus research?

Sibbe Bakker
sibbe.bakker@wur.nl

To start out with a question

Where do you look up or share (research) data?

there is a poll here

  • Mention that you are studying for an MSc in Bioinformatics.

This presentation was made with Quarto and RStudio, please check them out!

What is the ASPAR_KR project again?

A project needed to…

  • Standardise research data.

  • Improve data sharing.

  • Note that you are doing this for your Master's thesis in Bioinformatics.

Aims and scope of ASPAR_KR

Introducing an existing standard.

We don’t want to do this [1]

Easy to understand and use.

Findings so far…

What standards are there?

  • No standards are fully ready yet.

  • Standards are impossible to develop without cooperation.

  • The FAIRDS [2] platform may be useful here.

FAIRDS, what is it for?

Standardisation of omics data.

Users register their study, and make it FAIR from the start.

  • For standardised data management.

  • Automated analysis pipelines.

Made by the friendly folks at WUR's Laboratory of Systems and Synthetic Biology:

Jasper Koehorst

Bart Nijsse (see footnote 1)

Peter J Schaap

  • Web app.

  • Distributed as open source.

  • Excel-based.

FAIRDS, how does it work?

  • Classes available within the ASPAR_KR database. Each class "owns" lower-level classes. For example, a sample has associated assays.
  • Each class is a sheet in an Excel file.
  • Mention that the programme is written in Java,
    meaning that many programmers can contribute to it.
  • Mention that I have permission to adapt a branch of the software.
  • Experimental design is part of the dataset.

  • Minimum information standards are used.

  • New templates can be introduced.

  • You make Excel templates.
  • Fill these in with your data.
  • Upload them, FAIRDS makes RDF.
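
The last step, turning a filled-in sheet into RDF, can be pictured with a small base R sketch. It is an illustration only and not how FAIRDS itself does the conversion: the function `row_to_ntriples()`, the column names and the sample row are hypothetical, and the base IRI is borrowed from the example output shown later in this talk.

```r
# Hypothetical sketch: turn one row of a filled-in sheet into N-Triples lines.
# FAIRDS performs the real conversion; the names below are made up for illustration.
# No escaping of special characters is done, so this is not production code.
row_to_ntriples <- function(row, id_column, base_iri) {
  subject <- paste0("<", base_iri, row[[id_column]], ">")
  fields <- setdiff(names(row), id_column)
  vapply(fields, function(field) {
    predicate <- paste0("<", base_iri, field, ">")
    object <- paste0('"', row[[field]], '"')
    paste(subject, predicate, object, ".")
  }, character(1), USE.NAMES = FALSE)
}

# One row as it might appear in a sample sheet (assumed column names).
sample_row <- data.frame(
  identifier = "sam_arnhem2",
  packageName = "DeltaTrap",
  medium_type = "flamingo medium",
  stringsAsFactors = FALSE
)

triples <- row_to_ntriples(sample_row, "identifier",
                           "http://fairbydesign.nl/ontology/")
cat(triples, sep = "\n")
```

Every non-identifier column becomes one statement about the row's subject, which is why a well-designed template maps so directly onto linked data.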

Investigation class

  • Title of the Investigation
  • Study authors.
  • Publication details.
  • Abstract.

Study class, sub question of Investigation.

  • Title of the study.
  • Aim and description.

Observation class, what a study observed.

  • Name and description
  • Observation level factors.

Sample class, the material sampled from an observation.

  • Name and description
  • Sample level factors.
  • Sample specific metadata

Assay class, what was measured.

  • Name and description
  • Assay specific metadata
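
The ownership between classes also seems to show up in the identifiers: in the example output later in this talk, each level gets a prefix (`inv_`, `stu_`, `obs_`, `sam_`) and the levels are joined with `/`. The base R sketch below rebuilds such an identifier. The function `build_iri()` is hypothetical, and the naming pattern is only what can be read off that output, not a documented FAIRDS rule.

```r
# Sketch: compose a nested identifier in the style seen in the example output,
# where each level is prefixed (inv_, stu_, obs_, sam_) and joined with "/".
# build_iri() is a made-up helper, not part of FAIRDS.
build_iri <- function(ids, base_iri = "http://fairbydesign.nl/ontology/") {
  prefixes <- c(investigation = "inv_", study = "stu_",
                observation = "obs_", sample = "sam_")
  stopifnot(all(names(ids) %in% names(prefixes)))
  # Keep the hierarchy order, regardless of the order of the input.
  levels_present <- names(prefixes)[names(prefixes) %in% names(ids)]
  parts <- paste0(prefixes[levels_present], ids[levels_present])
  paste0(base_iri, paste(parts, collapse = "/"))
}

iri <- build_iri(c(
  investigation = "arnhemVsNijmegenComparison",
  study = "arnhemVsNijmegen",
  observation = "nijmegenAirSamples",
  sample = "CultureNijmegenStation2"
))
print(iri)
```

Reading an identifier from left to right therefore tells you which study and investigation a sample belongs to.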

A demonstration

That’s all well and good, but how does it work?

Data entry using a template.

Imagine the following situation.

  • A researcher takes air samples in Arnhem and Nijmegen.
  • He wants to know whether the resistance fraction is higher in Arnhem or in Nijmegen.
  • He uses the air sampling method of Hylke et al. [3], with delta traps.

Hylke will explain more after me.

The delta trap method – Image by Bo Briggeman

  • Per city, 4 locations are sampled.

  • Using the two layer culture…

    • Strips are grown on Flamingo agar…
    • And Flamingo agar with ITR.
  • Let's see how to enter these things in ASPAR_KR.


Map data from the OpenStreetMap project [4].

The data set to be FAIRified.

Our FAIRification programme.

The FAIR data, how can we use this?

Analysis of FAIR data.

  • Great! We’ve done it, we made FAIR data.
  • How do we analyse it?

We need an

To explain linked data concepts

  • RDF files are plain text of various formats.

  • The basis is the triple.

An example of an RDF statement.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<https://example.com/data#statement> rdf:type "triple" .

This…

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .
<#green-goblin>
    rel:enemyOf <#spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .
<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .

Turns into this …

  • SPARQL is a query language for RDF.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://www.perceive.net/schemas/relationship/>
PREFIX ex: <http://example.org/#>
SELECT ?name ?enemy
WHERE
  {
    ?person a foaf:Person ;
      foaf:name ?name ;
      rel:enemyOf ?enemy .
    FILTER(STRSTARTS(?name, "Green")) .
  }

Should return: "Green Goblin" and ex:spiderman.
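
To make the pattern matching concrete, the same lookup can be imitated on a plain table of triples in base R. This is only a teaching sketch under simplifying assumptions: short names stand in for the full IRIs, the Russian-language name is left out for brevity, and a real SPARQL engine does far more than these subsets and one join.

```r
# The Turtle example above, written out as one row per triple.
# Short names stand in for the full IRIs to keep the table readable.
triples <- data.frame(
  subject = c("green-goblin", "green-goblin", "green-goblin",
              "spiderman", "spiderman", "spiderman"),
  predicate = c("rel:enemyOf", "rdf:type", "foaf:name",
                "rel:enemyOf", "rdf:type", "foaf:name"),
  object = c("spiderman", "foaf:Person", "Green Goblin",
             "green-goblin", "foaf:Person", "Spiderman"),
  stringsAsFactors = FALSE
)

# Each triple pattern in the WHERE clause becomes a subset of the table.
persons <- triples[triples$predicate == "rdf:type" &
                     triples$object == "foaf:Person", "subject"]
names_tbl <- triples[triples$predicate == "foaf:name", c("subject", "object")]
enemies <- triples[triples$predicate == "rel:enemyOf", c("subject", "object")]
colnames(names_tbl) <- c("person", "name")
colnames(enemies) <- c("person", "enemy")

# Sharing ?person across patterns is a join; FILTER is a row selection.
matches <- merge(names_tbl[names_tbl$person %in% persons, ], enemies,
                 by = "person")
result <- matches[startsWith(matches$name, "Green"), c("name", "enemy")]
print(result)
```

The variable `?person` plays the role of the join key: it must bind to the same node in all three patterns.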

Analysis of RDF data.


Using the R package rdflib [5, 6]:

# read in the RDF file.
rdf <- rdflib::rdf_parse("hylke_air_method_example/data.ttl",
                         format = "turtle")
rdf
Total of 425 triples, stored in hashes
-------------------------------
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://schema.org/description> "A two layer culture made from the delta trap taken from the station square in Nijmegen" .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://fairbydesign.nl/ontology/biosafety_level> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen> <http://schema.org/identifier> "arnhemVsNijmegen" .
<http://fairbydesign.nl/ontology/selection_medium> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenPark1> <http://fairbydesign.nl/ontology/antibiotics> "CHEMBL1835949" .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_arnhemAirSamples/sam_CultureArnhemStation2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://jermontology.org/ontology/JERMOntology#Sample> .
<http://fairbydesign.nl/ontology/biosafety_level> <http://schema.org/valueRequired> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://jermontology.org/ontology/JERMOntology#Sample> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://fairbydesign.nl/ontology/medium_type> "flamingo medium" .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_arnhemAirSamples/sam_arnhem2> <http://fairbydesign.nl/ontology/packageName> "DeltaTrap" .

... with 415 more triples
sparql_query <- 
  '
  prefix ppeo:     <http://purl.org/ppeo/PPEO.owl#> 
  prefix jerm:     <http://jermontology.org/ontology/JERMOntology#> 
  prefix fair:     <http://fairbydesign.nl/ontology/> 
  prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> 
  prefix schema:   <http://schema.org/>
  SELECT 
  ?observation_label ?sample_label 
  ?total_cfu
  ?selection_cfu
  WHERE {
    # Get the samples of interest
    ?observation_unit a ppeo:observation_unit .
    ?observation_unit jerm:hasPart ?parts .
    ?parts a jerm:Sample .
    ?parts fair:packageName "DeltaTrap" .
    ?parts fair:derives ?cultures .
    ?observation_unit schema:name ?observation_label .
    ?parts schema:name ?sample_label .

    # Experimental data
    ?cultures fair:total_cfu ?total_cfu .
    ?cultures fair:selection_cfu ?selection_cfu .
  }
  '
result <- rdflib::rdf_query(rdf, sparql_query)
result
# A tibble: 8 × 4
  observation_label    sample_label             total_cfu selection_cfu
  <chr>                <chr>                        <dbl>         <dbl>
1 The city of Nijmegen Nijmegen Station plein 2        57            21
2 The city of Nijmegen Nijmegen Station plein 1        53            24
3 The city of Nijmegen Kronenburger park 2             61            26
4 The city of Nijmegen Kronenburger park 1             70            30
5 The city of Arnhem   Arnhem Centraal 2               66            28
6 The city of Arnhem   Arnhem Centraal 1               55            18
7 The city of Arnhem   Sonsbeek park 2                 51            23
8 The city of Arnhem   Sonsbeek park 1                 52            20
# Plot the result.
result |> 
  dplyr::mutate(resistance_fraction = selection_cfu / total_cfu) |> 
  ggplot2::ggplot(ggplot2::aes(x = observation_label, y = resistance_fraction)) +
  ggplot2::labs(x = "City", y = "Resistance fraction") +
  ggplot2::geom_boxplot()
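
Besides the box plot of per-sample fractions, the counts can also be pooled per city. The base R sketch below does that with the eight rows of the query result typed in by hand, so it runs without the RDF file. Pooling is just one possible summary and is an addition here, not part of the original analysis.

```r
# The query result shown above, typed in by hand so the sketch runs
# without the RDF file or the rdflib package.
result <- data.frame(
  observation_label = rep(c("The city of Nijmegen", "The city of Arnhem"),
                          each = 4),
  total_cfu = c(57, 53, 61, 70, 66, 55, 51, 52),
  selection_cfu = c(21, 24, 26, 30, 28, 18, 23, 20)
)

# Sum the counts per city, then divide the selection count by the total count.
per_city <- aggregate(cbind(total_cfu, selection_cfu) ~ observation_label,
                      data = result, FUN = sum)
per_city$resistance_fraction <- per_city$selection_cfu / per_city$total_cfu
print(per_city)
```

With only four samples per city, such a summary is descriptive; it does not by itself show that the two cities differ.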

How do we go forward?

  • With more feedback, the programme can be made better.
  • We need feedback on the FAIRDS.
  • We need community engagement.

And…

Questions?

Need more information?

Check out the git page.

Check out the documentation.

Thanks to

  • Martin Weichert, Bart Fraaije & Johanna Rhodes,
    for providing feedback on the prototype of ASPAR_KR.
  • Jasper Koehorst and Bart Nijsse,
    for helping me contribute to FAIRDS.
  • Mariana and Anna,
    for supervision during my stay at the genetics department.
  • Murambia Nyati,
    for providing comments on the presentation.

For general questions, reach out to sibbe.bakker@wur.nl

Additional slides

Extra slides for extra questions

Examples of ASPAR_KR alternatives

Alternative: seek4science
  • Pros: supports ISA; templates can be shared online.
  • Cons: a bit more complicated to contribute to; not available locally via Excel sheets.

Alternative: FAIRshare
  • Pros: locally available.
  • Cons: limited in scope to genomics or immunology; not clear how to expand it.

References

Used literature

1. Standards. https://xkcd.com/927/. Accessed 26 Oct 2023
2. Nijsse B, Schaap PJ, Koehorst JJ (2023) FAIR data station for lightweight metadata management and validation of omics studies. GigaScience 12:giad014. https://doi.org/10.1093/gigascience/giad014
3. Kortenbosch HH, Van Leuven F, Zwaan BJ, Snelders E (2022) Catching more air: An effective and simple-to-use air sampling approach to assess aerial resistance fractions in Aspergillus fumigatus. Microbiology
4. OpenStreetMap contributors (2017) Planet dump retrieved from https://planet.osm.org
5. Boettiger C, Mecum B, Krystalli A, Senderov V (2023) Rdflib: Tools to Manipulate and Query Semantic Data
6. Jones MB, Slaughter P, Ooms J, et al (2023) Redland: RDF Library Bindings in R

Footnotes

  1. Does not want to share his likeness

II azole resistance international meeting
Source code of this presentation here
