3.6 Preservation

Preservation ensures that your data will be around for the long haul, that it will persist even if you don’t. Why?

  • You might need to build on your prior research.

  • You or someone else might want to reproduce your study or use your data.

  • You might need to establish a precedent: if your study is challenged, you will need access to your data (and all the connected documentation).

Conversely, in some cases sensitive data needs to be destroyed at the end of a project.

3.6.1 Preservation Issues

Just because you store your data somewhere doesn’t mean that it’s being preserved.

Preservation focuses on

  • Making sure that your data will be available for a long period of time

  • Making data available in the same way that you used it when it was collected.

Preservation helps protect you from hardware obsolescence.

Ask what is required to read your storage media. When the manufacturer of a storage medium, or of the hardware needed to read it, goes out of business, support for both becomes harder to find. Migrate your data to current storage media as older ones are phased out, so that it remains available long term.

Software becomes obsolete too. The software you currently use may no longer exist, may be very difficult to obtain, or may not run on the computers of the future. As with hardware obsolescence, you should think about how to save your data in open software formats. This matters especially for lab researchers: if you use proprietary or homegrown software to collect data, that data might be unreadable in the future.

Because of this, it’s important to make a distinction between how you collect data, versus how you disseminate it.

You might collect data in one way using specific hardware or software, but you might need to transfer it to a different format that is suitable for dissemination and preservation.

3.6.2 Open Software Formats

We have talked about open software formats, but what are they? Here are four trusted open formats for long-term storage. Together they cover nearly all the data you are likely to encounter or work with:

  • XML: can be used to save documents or web service content

  • CSV: an ideal way to preserve spreadsheet data

  • PDF: should be used to freeze a document in time once it will no longer be changed

  • TIFF: the gold standard for saving image files

These four formats can be opened on any major operating system with widely available software. This is especially important if you consider your data to be irreplaceable: that is, you can never collect it again, it captures a particular place and time, or it took years to collect.
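As a minimal sketch of what saving tabular data to CSV can look like, the Python example below writes a small table to a CSV file and reads it back using only the standard library. The file name, column names, and measurements are invented for illustration, and the helper functions `write_csv` and `read_csv` are hypothetical names, not part of any tool described here.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header row and data rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file back as a list of rows (every value is a string)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Hypothetical measurements, invented for this example.
header = ["sample_id", "collected_on", "temperature_c"]
rows = [["S001", "2020-01-15", 21.4], ["S002", "2020-01-16", 19.8]]

with tempfile.TemporaryDirectory() as workdir:
    csv_path = Path(workdir) / "measurements.csv"
    write_csv(csv_path, header, rows)
    restored = read_csv(csv_path)

print(restored[0])  # the header row
print(restored[1])  # first data row; numbers come back as text
```

Note that CSV stores plain text only: numbers and dates come back as strings, so the documentation kept with your data should record the units and data type of each column.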

However, it’s not always easy to transfer all of your data to an open format. Use the most suitable format for preservation available.

For example, a widely used format such as Microsoft Excel is likely to remain readable for much longer than the format of a program that is used only by a specific research community.

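Also, information can be lost when converting between file formats, so compare the converted files against the originals (and use checksums to verify that stored copies have not changed). If possible, keep the original files with the converted ones.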
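One way to verify checksums is sketched below in Python, assuming SHA-256 as the checksum algorithm (the text does not prescribe one). The function `sha256_of_file` and the file names are hypothetical. Keep in mind what a checksum can and cannot tell you: it confirms that a copy is bit-for-bit identical to the original, but it cannot confirm that a conversion preserved your content, because a converted file is expected to differ from its source.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "survey_results.csv"  # hypothetical file
    backup = Path(workdir) / "survey_results_backup.csv"
    original.write_text("id,score\n1,42\n", encoding="utf-8")
    shutil.copy2(original, backup)

    # An intact copy has exactly the same checksum as the original.
    copy_is_intact = sha256_of_file(original) == sha256_of_file(backup)

    # Simulate silent corruption of the backup by changing one value.
    backup.write_text("id,score\n1,43\n", encoding="utf-8")
    corruption_detected = sha256_of_file(original) != sha256_of_file(backup)

print(copy_is_intact, corruption_detected)
```

In practice you would record the checksum of each file when you deposit it, store that record with the documentation, and recompute it whenever the data is moved to new storage.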

3.6.3 Data Formats

Also, to protect your data from degradation, avoid encrypting or compressing it when possible (at least for long-term storage and preservation).

  • Encryption is useful for keeping your data safe and transferring or sending it

  • Encrypting data can make it difficult to access your data later, for example if the password or key is lost

Always keep an original dataset in secure storage, make a copy, and then compress or encrypt the copy to send to others.
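The workflow above can be sketched in Python as follows, using gzip compression from the standard library. Encryption is left out because it depends on the tools your institution approves. The function `make_compressed_copy`, the folder names `archive` and `outbox`, and the sample data are all hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def make_compressed_copy(original, outbox):
    """Copy `original` into `outbox` and gzip the copy; the original is untouched."""
    outbox = Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    working_copy = outbox / Path(original).name
    shutil.copy2(original, working_copy)
    compressed = working_copy.with_name(working_copy.name + ".gz")
    with open(working_copy, "rb") as source, gzip.open(compressed, "wb") as target:
        shutil.copyfileobj(source, target)
    working_copy.unlink()  # only the compressed copy is needed for sending
    return compressed


with tempfile.TemporaryDirectory() as workdir:
    archive_dir = Path(workdir) / "archive"  # stands in for secure storage
    archive_dir.mkdir()
    original = archive_dir / "dataset.csv"
    original_bytes = b"id,value\n1,3.5\n2,4.1\n"
    original.write_bytes(original_bytes)

    compressed = make_compressed_copy(original, Path(workdir) / "outbox")

    # The original is unchanged, and the compressed copy decompresses
    # back to exactly the same bytes.
    original_unchanged = original.read_bytes() == original_bytes
    with gzip.open(compressed, "rb") as handle:
        round_trip_ok = handle.read() == original_bytes
    compressed_name = compressed.name

print(compressed_name, original_unchanged, round_trip_ok)
```

The point of the design is that compression only ever touches the copy in the outbox, so a failed or corrupted transfer never puts the preserved original at risk.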

Finally, for preservation as well as for storage of data that is still being collected, it is important to know who owns the data you have collected.

  • You can’t assume that you own your data

  • Before you share or delete data, check funder and institutional policies to understand your rights.