Chapter 3 Preprocessing and labelling
Once you have deployed your camera traps and brought your SD cards. We have several steps we need to perform before we can start analyzing the data:
- backup the data
- pre-process the files
- label the footage
- update deployment end dates
We summarise each step below and point to useful tools where necessary.
3.1 Data storage
The file structure of your data backups depends on the structure of your project. We use one of two different options, which each have their merits:
1) Location based
This is likely the most intuitive method if you are manually sorting data or using an image labeller (software to manage your camera data) which uses the location as the key organizing element. You would make a folder using the ‘placename’ (unique location where a camera is deployed), then copy all of the data relating to that site within it (left). Note, if you had multiple camera deployments you would have nested folders with the ‘deployment_id’ as the name:
2) Deployment based
Increasingly camera trap management platforms are ‘deployment’ driven rather than location based (e.g. Wildlife Insights). In this instance, the images are placed within a folder named with the deployment_id
(the unique code corresponding to that deployment), typically within a single folder.
In this scenario, we would have a folder called ‘to upload’ with all of the unique deployment folders within it. Then, once the folder has been upload to the platform, then the folder is moved to an “uploaded” folder:
Crucially - make redundant copies to ensure you do not lose data. We make both local and cloud-based copies of our data sets.
3.2 Preprocessing
The following steps represent optional elements to apply to your data. Whether you need them depends on your questions, the platform you are using to label your data, and the volume of images you will be processing.
3.2.1 Renaming
When a camera takes images, it applies sequential names which are duplicated across cameras (e.g. RCNX0001, RCNX0002 etc). In the future, if files are accidentally moved it would be difficult (if not impossible) to trace them back their origin. One way to get around this is to rename every camera image with a unique code (e.g. placename_datetime) which will ensure that line of data you generate can be traced back to an image, regardless of how it is stored.
We have created a tool which can be applied to folders of images organised by location and deployment, to create unique codes for each image:the WildCo Image Renamer. The repository has an example dataset which you can play around with to get familiar with the tool.
3.2.2 Automated Labelers
Once you have backed up and renamed your images, you may want to process them with an Artificial Intelligence (AI) labeler. Although they are pretty cool and in vogue
right now, the desicion to use one (or not) should be based on several points:
- The number of image you have to process If you only have a small dataset (a few thousand images) it is likely easier to do manually
- Whether there is an AI labeler validated for your study area and strata Despite the claims of their authors, AI labelers are not perfect. If they haven’t been validated in your survey location then use extreme caution when applying it. For example, an AI algorithm developed for terrestrial camera traps data will likely not work well on an arboreal dataset.
- How much money you have For AI labelers to run quickly, you may need some very expensive computer gear or cloud computing time. Do not assume that this is cheaper than manual labor!
- The resolution of the labels you require AI labelers are getting pretty good at sifting out blank images, but they have a long way to be before they can reliably split ground squirrel species (Urocitellus sp), or long-nosed armadillo species (Dasypus sp.)!
For a very pragmatic and informed take on the current state of the art, see Saul Greenberg’s Automated Image Recognition for Wildlife Camera Traps: Making it Work for You. report.
One of the biggest players in the game is undoubtedly Megadetector. Click the link for an overview of the machine learning model and how it might work for you.
Finally, some platforms now have their own inbuilt labeling AI (e.g. Wildlife Insights), which is certainly much more accessible than developing your own. Our only advice is be weary of the identifications they generate and always check your data (a.k.a. keep a human in the loop - at least for now).
3.2.3 Sensitive images
One of the benefits of AI labelers is you can use them to remove sensitive information (such as peoples identities) from images without ever looking at them. An example of this would be camera trapping in protected areas where it is not possible to ask every person if they are happy being photographed for science. Instead, we can use megadetector (or another AI labeler) to tell us when a human is detected in an image, then blur the area of that photo to remove individually identifying information. Previously researchers had to delete the human images to be compliant with privacy requirements - which throws away valuable data of human use.
The WildCo lab has developed a tool to blur human images using Megadetector outputs: WildCo_Face_Blur. Click the link for details on how to use it.
For a discussion of its application in a recreational ecology context see:
3.2.4 Timelapse extraction
Timelapse photographs can be critical to determine when cameras are functioning, particularly in low productivity environments where wildlife detections are rare. We highly recommend you take a photo at noon each day! They can also be used to generate site-level vegetation indices, such as NDVI, as they are taken at the same time every day. However, you likely don’t want to be sort through thousands of images of leaves and grass, or if you want to extract the images to run through a different program (e.g. phenopix
package - see the covariates chapter).
To quickly extract timelapse images we develop some code which uses the metadata of the images to filter out timelapse photos from motion detected photos. It is packaged up as part of the WildCo_Image_renamer script.
3.3 Labelling
We often get asked what the best software/data platform is for labeling images… and the pragmatic answer is that it does not matter as long as you export your data in a standardised format (see the data standardisation chapter. The truth is that different projects have different needs:
- If you have a poor internet connection you might need to use a standalone offline software, such as Timelapse
- Or if you work internationally with a large team of labelers who will tag images simultaneously, an online data platform, such as Wildlife Insights, might be essential
Dan Morris has curated a fantastic list of currently available tools here: Everything I know about machine learning and camera traps.
In a nutshell:
Data platforms are web- and desktop-based tools used for efficient and standardized data management, sharing, and analysis of remote camera data. A number of platforms exist so it is important that users choose the one best suited to their needs. To help camera trap users make this decision, the Wildcam network has developed a comparison of different camera data platforms. It provides an overview of platforms and software used in remote camera research in western Canada. As software and online tools are often subject to frequent updates and change, we recognize this as a document subject to change over time. Click here to review the comparison (last updated June 2020). We welcome feedback at any time (info@wildcams.ca)
Software are programs specifically designed for camera trap photos and their associated data is now recognized as the best method for data processing. There are quite a few programs available for practitioners, but many of them have most of the same functionalities. The relatively few unique features that distinguish programs will help to determine what software to use, and what features are needed for specific studies will vary depending on their study designs. See:
3.4 End dates and outages
It is very important to note that camera deployments do not end when you pickup the camera - they end when the camera stops collecting comparable data. The best time to record date a camera stops functioning probably is when you are labeling images. Do not cut this corner!
Below is the same camera station, at two points in time. The data from these are not comparable - if a tree fell on you whilst you were out counting animals you would probably count less effectively too! We would edit the deployment end to to reflect when it stopped recording comparable data (not all examples are as clear cut as this one).