Map data from the OpenStreetMap project [4].
A look at the A. fumigatus field
Have you ever been confused about data?
Most research data is \(\dots\)
Stored mostly in Excel formats;
Fickle;
Impossible to validate;
At best, well described only for humans.
Delayed research;
Critical errors in data analysis [1];
Increased barrier to entry for labs that lack resources;
Unanswerable questions (missing metadata);
For important data, this may directly affect people's lives.
What requirements did I find?
We need a knowledge base \(\rightarrow\) ASPAR_KR.
Excel documents should be central to the solution.
A solution should not require much technical background.
It must integrate with existing standards such as ISA, MIxS and JERM.
The FAIRDS
Validates Excel sheets against ENA standards.
Converts them to a stable and scalable format.
Allows addition of new standards via excel sheets.
Made by Nijsse et al. [2].
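The validation step can be pictured with a minimal sketch. It assumes a spreadsheet row has already been read into a Python dict; the checklist fields (`sample_name`, `collection_date`, `geographic_location_country`) and their rules are hypothetical stand-ins, not the actual ENA checklists and not the FAIRDS code itself, which is a Java programme.

```python
import re
from datetime import date

# Hypothetical checklist: these three fields and their rules are illustrative
# assumptions, not the actual ENA checklist that FAIRDS validates against.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_iso_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.match(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2023-02-30
    except ValueError:
        return False
    return True


CHECKLIST = {
    "sample_name": lambda v: v != "",
    "collection_date": is_iso_date,
    "geographic_location_country": lambda v: v != "",
}


def validate_row(row):
    """Return a list of human-readable problems found in one spreadsheet row."""
    problems = []
    for field, check in CHECKLIST.items():
        if field not in row:
            problems.append(f"missing column: {field}")
            continue
        value = str(row[field]).strip()
        if not check(value):
            problems.append(f"invalid value for {field}: {value!r}")
    return problems


# Usage: a row with an impossible date and a missing column.
row = {"sample_name": "S1", "collection_date": "2023-02-30"}
for problem in validate_row(row):
    print(problem)
# invalid value for collection_date: '2023-02-30'
# missing column: geographic_location_country
```

The point of machine-checkable rules is that the same row is accepted or rejected the same way every time, which is exactly what a free-form Excel sheet cannot guarantee.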
Going from Excel \(\rightarrow\) Resource Description Framework (RDF).
Experimental design is part of the dataset.
Minimum information standards are used.
New templates can be introduced.
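The core idea of this conversion, each spreadsheet cell becoming one subject, predicate, object statement, can be sketched in a few lines. This is not how FAIRDS does it: the base IRIs, the one-predicate-per-column mapping and the plain string literals are assumptions for illustration; a real converter would attach datatypes and use the predicates defined by the chosen standard.

```python
from urllib.parse import quote

# Hypothetical namespaces: the IRIs FAIRDS actually mints may differ.
BASE = "http://example.org/study/"
VOCAB = "http://example.org/vocab/"


def escape_literal(text):
    """Escape a string so it is valid inside an N-Triples string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_ntriples(row, id_column):
    """Turn one spreadsheet row into N-Triples lines, one per non-empty cell.

    The identifier column becomes the subject, every other column name a
    predicate, and the cell value a plain string literal.
    """
    subject = f"<{BASE}{quote(str(row[id_column]))}>"
    lines = []
    for column, value in row.items():
        if column == id_column or value is None or str(value) == "":
            continue
        predicate = f"<{VOCAB}{quote(column)}>"  # percent-encodes spaces etc.
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


row = {"identifier": "trap01", "packageName": "DeltaTrap", "collection_date": "2021-06-01"}
print("\n".join(row_to_ntriples(row, "identifier")))
```

Because every statement is self-describing, rows from different sheets (or different labs) can be merged without agreeing on a column layout first.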
Excel templates. FAIRDS makes RDF.
Imagine the following situation.
Per city, 4 locations are sampled.
Using the two-layer culture…
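The situation sketched above is nested: each city has four sampled locations, each location holds a DeltaTrap, and a two-layer culture is derived from each trap. A minimal sketch of that nesting as RDF links, reusing the `jerm:hasPart` and `fair:derives` predicates that appear in the query later in this deck; the city names, the base IRI and the one-trap-per-location assumption are hypothetical.

```python
# Namespaces copied from the PREFIX lines of the query in this deck.
JERM = "http://jermontology.org/ontology/JERMOntology#"
FAIR = "http://fairbydesign.nl/ontology/"
# Hypothetical base IRI for the example's own resources.
BASE = "http://example.org/airsampling/"


def design_triples(cities, locations_per_city=4):
    """Build N-Triples linking sampled locations, DeltaTraps and cultures.

    Assumes (hypothetically) one DeltaTrap per location and one two-layer
    culture derived from each trap.
    """
    triples = []
    for city in cities:
        for n in range(1, locations_per_city + 1):
            unit = f"<{BASE}{city}/location{n}>"             # observation unit
            trap = f"<{BASE}{city}/location{n}/trap>"        # DeltaTrap sample
            culture = f"<{BASE}{city}/location{n}/culture>"  # two-layer culture
            triples.append(f"{unit} <{JERM}hasPart> {trap} .")
            triples.append(f"{trap} <{FAIR}derives> {culture} .")
    return triples


# Two hypothetical cities, four locations each: 8 locations, 16 triples.
for triple in design_triples(["cityA", "cityB"]):
    print(triple)
```

Keeping the design in the data itself is what later lets a query walk from a location to its trap and on to the culture.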
Let's see how to enter this in the FAIRDS.
Map data from the OpenStreetMap project [4].
The data set to be FAIRified.
The FAIRification programme.
The FAIR data: how can we use it?
Hylke’s air sampling data
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX sf: <http://www.opengis.net/ont/sf#>
PREFIX cats: <http://cats.org/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX fair: <http://fairbydesign.nl/ontology/>
PREFIX jerm: <http://jermontology.org/ontology/JERMOntology#>
SELECT *
WHERE {
  # Get properties of the DeltaTrap.
  ?deltaTrap fair:packageName 'DeltaTrap' ;
    <https://w3id.org/mixs/terms/0000011> ?arrival_date ;
    geo:hasGeometry/geo:asWKT ?point .
  ?observationalUnit jerm:hasPart ?deltaTrap ;
    schema:identifier ?obsId .
  # Get properties of the two-layer culture derived from the DeltaTrap.
  ?deltaTrap fair:derives ?twoLayerCulture .
  ?twoLayerCulture fair:start_date ?start_date ;
    fair:end_date ?end_date ;
    <https://w3id.org/mixs/terms/0000011> ?analysis_date .
  # Subtracting dates and applying day() to the result go beyond SPARQL 1.1;
  # they rely on engine-specific extensions.
  # How many days were the seals exposed to air?
  BIND(day(?end_date - ?start_date) as ?air_exposure_days)
  # What is the transfer time?
  BIND(day(?arrival_date - ?end_date) as ?transfer_time)
  # Time until analysis after arrival?
  BIND(day(?analysis_date - ?arrival_date) as ?time_to_analysis)
  # Distance from WUR
  SERVICE <https://query.wikidata.org/sparql> {
    # Which things are a municipality?
    ?municipality wdt:P31 wd:Q2039348 .
    # Get the coordinate location (P625) of each municipality.
    ?municipality wdt:P625 ?placeOfInterest .
    # Keep only the municipality of interest.
    FILTER(?municipality = wd:Q1305) . # Arnhem
  }
  BIND (geof:distance(?point, ?placeOfInterest, uom:kilometre) as ?d_km)
}
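For readers less familiar with SPARQL, the derived columns the query computes can be reproduced in plain Python. This is a sketch under assumptions: dates are taken as already-parsed `datetime.date` values, the example dates are made up, and the distance uses the haversine formula on a spherical Earth, which is not necessarily the method a GeoSPARQL engine uses for `geof:distance`, so results may differ slightly.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius; a GeoSPARQL engine may use a different (ellipsoidal) model.
EARTH_RADIUS_KM = 6371.0


def day_span(start, end):
    """Whole days from start to end, the quantity the BIND(day(...)) lines aim for."""
    return (end - start).days


def parse_wkt_point(wkt):
    """Parse a WKT literal of the form 'Point(lon lat)' into (lon, lat) floats."""
    inner = wkt[wkt.index("(") + 1:wkt.rindex(")")]
    lon, lat = inner.split()
    return float(lon), float(lat)


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    (lon1, lat1), (lon2, lat2) = p1, p2
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical example values, not measurements from the data set.
start, end, arrival = date(2021, 6, 1), date(2021, 6, 8), date(2021, 6, 10)
print(day_span(start, end))    # air exposure in days: 7
print(day_span(end, arrival))  # transfer time in days: 2
print(round(haversine_km(parse_wkt_point("Point(0 0)"), parse_wkt_point("Point(0 1)")), 1))  # 111.2
```

Note that WKT, like Wikidata's coordinate literals, lists longitude before latitude, which is why the helpers keep (lon, lat) order throughout.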
What lies ahead?
Get started today!
Is your data FAIR yet?
Every bit helps.
Recommendation
Departmental databases?
One central repository?
ASPAR_KR
Uses?
Besides transparency, FAIR data can also be used for \(\dots\)
How is the genotype related to environmental factors?
Knowing the coordinates in a standard format is a good first step.
Can we predict the anti-fungal resistance of fungi better?
Describing all resistance assays in the same way is a good first step.
Researchers need to be mindful of their data.
An Excel-based workflow is prone to error.
FAIR data tooling like the FAIRDS needs to be adopted and improved.
Thank you!
Challenges encountered during the project
Why I picked the FAIRDS
Improvements to the FAIRDS
Interviewing people.
Contributing to the Java programme.
Running all of the software correctly
Understanding what people are doing in the lab.
There is no programme yet in widespread use for minimal metadata validation.
Alternative | + | - |
---|---|---|
seek4science | * Supports ISA * Sharing of templates online. | * A bit more complicated to contribute to. * Not available locally via Excel sheets. |
FAIRshare | * Locally available | * Limited in scope to genomics or immunology. * Not clear how to expand it. |
ENA and NCBI | * Allow upload of large datasets with searchable metadata | * Do not enforce the quality of their metadata. |
4TU and Zenodo | * General storage of large datasets | * Only a small amount of metadata annotation is supported. |
Should allow more user-friendly addition of new standards.
Non-interactive batch mode should be supported.
Should interoperate with more data formats:
Should interoperate with more databases:
A complete refactor of the programme is required.
My thesis could not have been made without the help of many collaborators:
Jasper, for offering his help with the FAIRDS; Anna and Mariana, for support and guidance during the thesis; Martin and Hylke, for their data and help with the use cases; Murambiwa and Christopher, for their data and critical discussion; and Sijmen, for insight into the structure of my thesis.