3.4 File Organization

Talking about file naming might seem like a no brainer – but it can prevent a data disaster.

3.4.1 File Naming Issues

File naming and versioning can cause disasters in all kinds of research, especially when working with teams.

Question: Is this a good filename?

What is the number? What date is it? Is it a date?

Including date is a good idea, but this date could be interpreted several ways. Depending on how you read it, it could be any of the three dates listed.

For dates, use the international standard date format (ISO), representing date as a four digit year, followed by month and day. Time can also be included, as follows.

What does Sam mean?

  • Is this data from a scanning acoustic microscope?

  • An experiment involving systolic anterior motion?

  • Or was the experiment done by Sam Lee, the postdoc

This example is from a researcher who studies rat hearts. Each of the hearts this researcher studies are sliced into hundreds of slices. The slices are then mounted on slides, three to a slide. Thousands of TIF images are then created from those slides. Those TIF images are then reassembled into hundreds of huge images. There are three postdocs in the lab doing these experiments. And these experiments are conducted between five and seven times each week. It is not hard to see that unless a clear file naming convention is established, and is used consistently by everyone who does these experiments, the likelihood of confusion around what was in each file would be very high. The following examples will demonstrate good file naming conventions

3.4.2 File Naming Conventions

Here are a few simple rules to follow in naming files.

  1. the file name should include major parameters of the data in the file.

  1. It is best to use non-cryptic, intuitive names whenever possible.

So use DataValidation rather than DatVal or DV. What seems clear at the time of an experiment, may seem less so when you look back at files in the future.

  1. File names should be extensible (extendable). Consider the potential number of files and choose the number of digits so that they will sort properly.

So if you think your number of files will be in the 100s, do what you see on the right. You want your files to sort numerically, as on the right.

  1. Avoid giving files the same name, even if they are in different folders.

…because if files get moved around, you run the risk of overwriting data.

  1. Do not use any special characters. While you may create the file on a platform that allows these, if you move them to one that does not, you might have problems.

Stick to numbers, letters, and underscores.

3.4.3 File Naming Recap

Here’s the final word: if you use consistent, documentable names, the contents of each file will be clear.

In this example, the file name contains the

  • experiment type
  • experiment number
  • sample number
  • Stain
  • coordinates of image
  • stage of data processing.

This would work the same for clinical lab samples, MRI images, spreadsheets from forms data, etc.

The files will sort well and it will be relatively easy to identify exactly what you need

Of course, you do need to actually document your file naming structure. A simple Readme file in the directory with the files can prevent a great deal of future confusion.