Chapter 18 Data donation / Transfer

If you are considering sharing your data with ExELang, you are probably in one of the following situations:

  1. You already have collected data and you want to share it with us; or
  2. You don’t have data yet but you are willing to collect it and share it with us.

18.1 You don’t have data yet

Let’s start with the easier option: you haven’t collected data yet. In this case, we’ll typically need to sign a bilateral agreement. This is ideal because we are going to agree on several things:

  • you’ll always keep ownership (and main responsibility) over the data;
  • what you get from us can include expertise hours, equipment, data processing, and funding for logistics;
  • what we would like to get is the possibility to re-analyze your data within the scope of ExELang.

Please note that in the agreement, we’ll agree on an appropriate scope for intellectual property. At ExELang, we are specifically interested in relating children’s experiences with how they talk, so we will ask for the ability to analyze and publish about this one point. If that covers your own research goals, we will discuss how to frame intellectual property so that it is ideal for both of us.

If this sounds complicated, you can always just start collecting data on your own, and eventually get back to us when you have your data – in which case, you can watch the video “you already have data”.

18.2 You already have data

If you already have data, there are four different options:

18.2.1 You have already shared your data in a scientific repository

Great, we only need to access you data!

18.2.2 You haven’t shared your data in a scientific repository yet but you are willing to do so

This is our preferred option! Sharing data in a scientific repository might be good for you and your study because it allows you to get many potential collaborators and citations by making the effort of uploading your data only once. Moreover, you are contributing to make science an open source for everybody.

If you are not familiar with scientific repositories online, we can give you suggestions based on our experience with each one of the repositories, so to make it easier for you to decide which one works best for you.

Feel free to comment if you have other questions for us!

Repository name Project file Formatting How to update data Data access by non-curators Other features
[Open Science Framework] (https://osf.io) No specific requirements for files or project structure Via browser A choice of: None (completely private), Invited people can read (and write), Anyone can read Plugins for software such as Google Drive, GitHub (GitHub, 2018), and others; Storage in USA or Europe
[Databrary] (https://nyu.databrary.org) Some aspects of project structure specified Via browser A choice of: None (completely private), Invited people can read (and write), Anyone can read Data annotation with Datavyu software (Datavyu Team, 2014), some APIs
[HomeBank] (https://homebank.talkbank.org) Project structure and file structure must follow one specific format Through personal contact with HomeBank personnel A choice of: None (completely private); Any HomeBank member can read; Selected HomeBank members can read; Anyone can read Data annotation and analyses with CLAN software
[The Language Archive] (https://archive.mpi.nl/tla/) Some aspects of project structure specified Via browser A choice of: None (completely private), Invited people can read (and write), Anyone can read Data annotation with ELAN software

18.2.3 You don’t want to upload your data on a scientific repository but you are willing to create a license

A License is an official permission granted by the owner of some Work (the “Licensor”) to other people (the “Licensee”) and governing how the Licensee is allowed to use the Licensor’s Work. This option allows you to make the effort of establishing rules for potential collaborators/citations once, and then re-use this license with others.

When you create your license, you make the rules, so you can specify whatever you feel is best. See the resources section for some examples.

You can also use open source licences, but you need to know that they allow for re-distribution (i.e., the people who get hands on your corpus can re-distribute it). Just in case, we made a list of the most commonly used ones:

  • GNU: The GNU General Public Licence (GPL) is probably one of the most commonly used licenses for open-source projects. The GPL grants and guarantees a wide range of rights to developers who work on open-source projects. Basically, it allows users to legally copy, distribute and modify software. It’s a copyleft license: it means that the derived works need to have the same type of license.

  • MIT: The MIT License is the least restrictive license out there. It basically says that anyone can do whatever they want with the licensed material, as long as it’s accompanied by the license.

  • APACHE: The Apache License, Version 2.0, grants a number of rights to users. These rights can be applied to both copyrights and patents. 1) Rights are perpetual, 2) Rights are worldwide, 3) Rights are granted for no fee or royalty, 4) Rights are non-exclusive, 5) Rights are irrevocable, 6) Redistributing code also has special requirements, mostly pertaining to giving proper credit to those who have worked on the code and to maintaining the same license. It’s a non copyleft license: it means that the works derived can have a different type of license.

  • CREATIVE COMMONS: Creative Commons (CC) licenses aren’t quite open-source licenses, but they are commonly used for design projects. A wide variety of CC licenses is available, each granting certain rights. A CC license has four basic parts, which can be enacted individually or in combination.

    • Attribution. The author must be attributed as the creator of the work. Beyond that, the work can be modified, distributed, copied and otherwise used.
    • Share Alike. The work can be modified, distributed and so forth, but only under the same CC license.
    • Non-Commercial. The work can be modified, distributed and so on, but not for commercial purposes.
    • No Derivative Works. This means you can copy and distribute the licensed work, but you can’t modify it in any way or create work based on the original.

These parts of the CC license terms can be combined. The most restrictive license would be the “Attribution, Non-Commercial, No Derivatives” license, which means that you can freely share the work, but not change it or charge for it, and you must attribute it to the creator. This is a good license to get your work out there but still maintain more or less complete control over how it is used. The least restrictive would be the “Attribution” license, which means that as long as people credit you, they can do whatever they like with the work.

18.2.4 None of the above mentioned options works well for you

In this case, we are back to the bilateral agreement. This is ideal if we want to agree on something very specific (e.g., we want to make an exchange – for instance, we fund your re-consenting the families who participated, in exchange for the possibility of re-using the data). Please note that in the agreement, you’ll always keep ownership over the data; and that we’ll agree on an appropriate scope for intellectual property. At ExELang, we are specifically interested in relating children’s experiences with how they talk, so we will ask for the ability to analyze and publish about this one point. If that covers your own research goals, we will discuss how to frame intellectual property so that it is ideal for both. See the resources for an example of a Data Transfer Agreement.

18.3 Resources