Chapter 2 Data Organization and Validation

2.1 About Brain Imaging Data Structure (BIDS)

Neuroimaging data are large, complex, and difficult to organize. A scanner produces many files per participant, and later analyses generate even more. Because these files can be arranged in many different ways, researchers working in the same lab often adopt different, idiosyncratic layouts, and time is wasted rearranging data and rewriting scripts. Adopting BIDS saves time and reduces errors and misunderstandings. BIDS is for everyone: all users can share in the benefits of organized data, reproducible research, and data sharing.

2.2 Benefits

Disorganized data can lead to data corruption, swapped participant images, and other mistakes. Neuroimaging analyses can take years to complete, and having to repeat them because of uncertainty about the dataset is a tremendous waste of time and resources.

Other benefits of using BIDS include:

  • It will be easy for another researcher to work on your data. To explain the organization of the files and their format, you will only need to refer them to this document. This is especially important if you run your own lab and anticipate more than one person working on the same data over time. By using BIDS, you will save time when trying to understand and reuse data acquired by a graduate student or postdoc who has already left the lab.
  • There is a growing number of data analysis software packages that can understand data organized according to BIDS.
  • Databases such as OpenNeuro.org, LORIS, COINS, XNAT, SciTran, and others will accept and export datasets organized according to BIDS. If you ever plan to share your data publicly (nowadays some journals require this) you can speed up the curation process by using BIDS.
  • There are validation tools (also available online) that can check your dataset integrity and let you quickly spot missing values.

2.3 File Formats

BIDS focuses on raw (minimally processed) NIfTI data, not source data (e.g., DICOM) or derived data (e.g., post-processed images). Imaging data should be in NIfTI format, converted from DICOM using the dcm2niix program. BIDS includes other files as well. Non-imaging data such as demographics and neuropsychological measures are kept in tabular files (.tsv). Imaging metadata are saved as JSON files. Other metadata files include data dictionaries for the tabular files, the dataset description, and so on.
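
The conversion step can be scripted. The following is a minimal sketch, not a prescribed workflow: it only assembles a dcm2niix command line using the -b (write a JSON sidecar), -z (gzip the image), -f (output file name), and -o (output directory) options. The helper name build_dcm2niix_command and the sourcedata path are hypothetical and chosen for illustration.

```python
import shlex


def build_dcm2niix_command(dicom_dir, out_dir, filename):
    """Return the dcm2niix argument list for one DICOM series directory."""
    return [
        "dcm2niix",
        "-b", "y",       # write a BIDS JSON sidecar next to the image
        "-z", "y",       # gzip the output image (.nii.gz)
        "-f", filename,  # output file name, without extension
        "-o", out_dir,   # output directory (create it before running)
        dicom_dir,       # input directory holding the DICOM files
    ]


cmd = build_dcm2niix_command(
    "sourcedata/sub-001/T1_series",   # hypothetical DICOM location
    "STUDY_DIR/sub-001/ses-1/anat",
    "sub-001_ses-1_T1w",
)
print(shlex.join(cmd))

# To run the conversion (requires dcm2niix on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when directory names contain spaces.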

2.4 Example Dataset

STUDY_DIR/
|–– sub-001/
    |–– ses-1/
        |–– anat/
            |–– sub-001_ses-1_T1w.json
            |–– sub-001_ses-1_T1w.nii
            |–– sub-001_ses-1_T2w.json
            |–– sub-001_ses-1_T2w.nii
        |–– dwi/
            |–– sub-001_ses-1_acq-AP_dwi.json
            |–– sub-001_ses-1_acq-AP_dwi.nii.gz
            |–– sub-001_ses-1_acq-AP_dwi.bvec
            |–– sub-001_ses-1_acq-AP_dwi.bval
            |–– sub-001_ses-1_acq-PA_dwi.json
            |–– sub-001_ses-1_acq-PA_dwi.nii.gz
            |–– sub-001_ses-1_acq-PA_dwi.bvec
            |–– sub-001_ses-1_acq-PA_dwi.bval
        |–– func/
            |–– sub-001_ses-1_task-restEO_bold.json
            |–– sub-001_ses-1_task-restEO_bold.nii
            |–– sub-001_ses-1_task-restEC_bold.json
            |–– sub-001_ses-1_task-restEC_bold.nii
        |–– asl/
            |–– sub-001_ses-1_asl.json
            |–– sub-001_ses-1_asl.nii
            |–– sub-001_ses-1_cbf.json
            |–– sub-001_ses-1_cbf.nii
            |–– sub-001_ses-1_M0.json
            |–– sub-001_ses-1_M0.nii
    |–– ses-2/
        |–– anat/
            |–– sub-001_ses-2_T1w.json
            |–– sub-001_ses-2_T1w.nii
            |–– sub-001_ses-2_T2w.json
            |–– sub-001_ses-2_T2w.nii
        |–– dwi/
            |–– sub-001_ses-2_acq-AP_dwi.json
            |–– sub-001_ses-2_acq-AP_dwi.nii.gz
            |–– sub-001_ses-2_acq-AP_dwi.bvec
            |–– sub-001_ses-2_acq-AP_dwi.bval
            |–– sub-001_ses-2_acq-PA_dwi.json
            |–– sub-001_ses-2_acq-PA_dwi.nii.gz
            |–– sub-001_ses-2_acq-PA_dwi.bvec
            |–– sub-001_ses-2_acq-PA_dwi.bval
        |–– func/
            |–– sub-001_ses-2_task-restEO_bold.json
            |–– sub-001_ses-2_task-restEO_bold.nii
            |–– sub-001_ses-2_task-restEC_bold.json
            |–– sub-001_ses-2_task-restEC_bold.nii
        |–– asl/
            |–– sub-001_ses-2_asl.json
            |–– sub-001_ses-2_asl.nii
            |–– sub-001_ses-2_cbf.json
            |–– sub-001_ses-2_cbf.nii
            |–– sub-001_ses-2_M0.json
            |–– sub-001_ses-2_M0.nii
|–– participants.tsv
|–– dataset_description.json
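
The naming pattern in the tree above (sub-, then ses-, then optional key-value pairs such as acq- or task-, then a suffix and extension) can be generated programmatically. The sketch below is illustrative only: bids_path is a hypothetical helper, and it places optional entities in the order the caller passes them, whereas the BIDS specification fixes the entity order when several are combined.

```python
from pathlib import PurePosixPath


def bids_path(subject, session, datatype, suffix, extension, **entities):
    """Build a relative path such as sub-001/ses-1/anat/sub-001_ses-1_T1w.nii."""
    parts = [f"sub-{subject}", f"ses-{session}"]
    # Optional key-value entities (e.g. task="restEO", acq="AP") go between
    # the session label and the suffix, in the order they are passed.
    parts += [f"{key}-{value}" for key, value in entities.items()]
    filename = "_".join(parts + [suffix]) + extension
    return PurePosixPath(f"sub-{subject}", f"ses-{session}", datatype, filename)


print(bids_path("001", "1", "anat", "T1w", ".nii"))
print(bids_path("001", "1", "dwi", "dwi", ".nii.gz", acq="AP"))
print(bids_path("001", "2", "func", "bold", ".json", task="restEC"))
```

Generating names from one function keeps the image, its JSON sidecar, and any .bvec/.bval files consistent, because only the extension changes between calls.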

2.5 Tabular Files

A TSV file is a delimited text file that uses tabs to separate values; a CSV file uses commas. Both are simple plain-text formats for storing tabular data (numbers and text), with one data record per line. Files in either format can be imported into and exported from programs that store data in tables, such as Microsoft Excel and R, so if creating a .tsv file seems daunting, converting from a comma-separated file (.csv) is easy. The only required participant property is participant_id, which should match the sub- directory names in your study.

participant_id   group   gender   age     WASI2FSIQ
sub-001          tbi     male     15.82   75
sub-002          oi      male     12.88   106
sub-003          oi      female   13.48   110
sub-004          tbi     female   9.33    72
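
As one way to do the CSV-to-TSV conversion described above, the sketch below uses Python's standard csv module. The function name csv_to_tsv is hypothetical, and the example works on in-memory text so that it runs without any files; for real data you would read and write files opened with newline="".

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert comma-separated text to tab-separated text."""
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


csv_text = (
    "participant_id,group,gender,age,WASI2FSIQ\n"
    "sub-001,tbi,male,15.82,75\n"
)
tsv_text = csv_to_tsv(csv_text)
print(tsv_text)
```

Using the csv module rather than a plain find-and-replace of commas keeps quoted fields that contain commas intact.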

2.6 Metadata

The dataset description file (dataset_description.json) must contain basic information about the dataset and, if applicable, its open license.

{
    "BIDSVersion": "1.0.0",
    "Name": "Mild Injury Outcome Study"
}
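
A description file like the one above can be produced and checked with Python's json module. This is a minimal sketch: the helper missing_required is hypothetical, and it checks only the two fields shown in the example, Name and BIDSVersion.

```python
import json

description = {
    "Name": "Mild Injury Outcome Study",
    "BIDSVersion": "1.0.0",
}


def missing_required(desc, required=("Name", "BIDSVersion")):
    """Return the required keys that are absent from the description."""
    return [key for key in required if key not in desc]


text = json.dumps(description, indent=4)
print(text)
print("missing fields:", missing_required(description))

# To save the file at the top level of the dataset:
# with open("STUDY_DIR/dataset_description.json", "w") as f:
#     f.write(text + "\n")
```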

JSON files for the imaging data are generated automatically by the dcm2niix program and contain information about the scanner and sequence.

{
    "Modality": "MR",
    "MagneticFieldStrength": 3,
    "Manufacturer": "Siemens",
    "ManufacturersModelName": "Skyra",
    "InstitutionName": "Anonymous_Institution",
    "DeviceSerialNumber": "45603",
    "BodyPartExamined": "HEAD",
    "PatientPosition": "HFS",
    "ProcedureStepDescription": "HEAD_RESEARCH_BRAIN",
    "SoftwareVersions": "syngo_MR_E11",
    "MRAcquisitionType": "3D",
    "SeriesDescription": "SAG_IR-MPRAGE",
    "ProtocolName": "SAG_IR-MPRAGE",
    "ScanningSequence": "GR_IR",
    "SequenceVariant": "SK_SP_MP_OSP",
    "ScanOptions": "IR",
    "SequenceName": "_tfl3d1_16ns",
    "ImageType": ["ORIGINAL", "PRIMARY", "M", "ND", "NORM"],
    "SeriesNumber": 5,
    "AcquisitionTime": "19:07:19.265000",
    "AcquisitionNumber": 1,
    "SliceThickness": 1.2,
    "SAR": 0.0799092,
    "EchoTime": 0.00229,
    "RepetitionTime": 2.3,
    "InversionTime": 0.9,
    "FlipAngle": 8,
    "PartialFourier": 1,
    "BaseResolution": 256,
    "ShimSetting": [
        3993,
        -5236,
        -3657,
        0,
        -115,
        11,
        129,
        -2
    ],
    "TxRefAmp": 349.117,
    "PhaseResolution": 1,
    "PhaseOversampling": 0.15,
    "ReceiveCoilName": "Spine_32",
    "ReceiveCoilActiveElements": "HE1-4;NE1,2;SP1",
    "PulseSequenceDetails": "%SiemensSeq%_tfl",
    "ConsistencyInfo": "N4_VE11A_LATEST_20140830",
    "PercentPhaseFOV": 90.625,
    "PhaseEncodingSteps": 266,
    "AcquisitionMatrixPE": 232,
    "ReconMatrixPE": 256,
    "ParallelReductionFactorInPlane": 2,
    "PixelBandwidth": 200,
    "DwellTime": 9.8e-06,
    "ImageOrientationPatientDICOM": [
        -0.0414933,
        0.902626,
        0.428422,
        -0.00450578,
        0.428618,
        -0.903475
    ],
    "InPlanePhaseEncodingDirectionDICOM": "ROW",
    "ConversionSoftware": "dcm2niix",
    "ConversionSoftwareVersion": "v1.0.20171215 Clang9.0.0"
}
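
Sidecar fields such as those above are easy to read back when you need acquisition parameters for a report or an analysis script. The sketch below parses a shortened copy of the example sidecar from an in-memory string; with real data you would load the .json file that sits next to the image. Timing fields such as EchoTime and RepetitionTime are stored in seconds, so the example converts them to milliseconds.

```python
import json

# Shortened copy of the sidecar shown above, kept in memory for the example.
sidecar_text = """
{
    "Manufacturer": "Siemens",
    "MagneticFieldStrength": 3,
    "EchoTime": 0.00229,
    "RepetitionTime": 2.3,
    "FlipAngle": 8
}
"""

sidecar = json.loads(sidecar_text)

# Timing fields are in seconds; convert to milliseconds for reporting.
te_ms = sidecar["EchoTime"] * 1000
tr_ms = sidecar["RepetitionTime"] * 1000

print(
    f'{sidecar["Manufacturer"]} {sidecar["MagneticFieldStrength"]}T, '
    f"TR={tr_ms:.0f} ms, TE={te_ms:.2f} ms, flip angle={sidecar['FlipAngle']}"
)
```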

2.7 Checking Compliance

You can check whether your dataset is compliant with the BIDS format using the online validator: http://incf.github.io/bids-validator.
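
Before running the validator, a quick local check can catch one common problem: participant_id values in participants.tsv that do not match the sub- directories. The sketch below is not a substitute for the validator and covers only this single rule. The function check_participants is hypothetical, and the demonstration builds a throwaway dataset with a deliberate mismatch.

```python
import csv
import tempfile
from pathlib import Path


def check_participants(study_dir):
    """Compare sub-* directories with participant_id values in participants.tsv."""
    study_dir = Path(study_dir)
    on_disk = {p.name for p in study_dir.glob("sub-*") if p.is_dir()}
    with open(study_dir / "participants.tsv", newline="") as f:
        listed = {row["participant_id"] for row in csv.DictReader(f, delimiter="\t")}
    # First list: directories with no table row; second: rows with no directory.
    return sorted(on_disk - listed), sorted(listed - on_disk)


# Demonstration on a temporary dataset with one deliberate mismatch.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub-001").mkdir()
    (root / "sub-002").mkdir()
    (root / "participants.tsv").write_text(
        "participant_id\tgroup\nsub-001\ttbi\nsub-003\toi\n"
    )
    not_listed, no_directory = check_participants(root)
    print("directories missing from participants.tsv:", not_listed)
    print("participants without a directory:", no_directory)
```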