Chapter 5 Non-Lena Softwares
In this video we’re going to explain how to use a software that is not LENA in order to do your analysis. One important point I want to make right off the bat is that you need to know a little bit of coding to use this software so if you have never used a terminal and if you don’t know what bash or Python are then you probably want to get some training on that first. We provide some links for this in the further resources section of this video. We are going to base our explanations on the software that was developed in the ACLEW project, where the analysis was broken down into several phases.
The first phase involves deciding who speaks when. For this phase you need to use VTC, short for Voice Type Classifier. VTC was trained on a large corpus of over 200 hours of child-centred recordings that were put together by combining some data that we had in our lab with data available on CHILDES. Most of the data actually came from CHILDES, in particular from the Tsay corpus that contains samples from Chinese language. The rest of the data comes from several different languages including English, French and other languages from Oceania, America, and Africa. So one big difference compared to Lena is that it was not trained solely and exclusively on children learning English.
The second phase of analysis applies only to sections that the Voice Type Classifier identified as being adult speech. For this we use another piece of software that’s called Alice, short for Automatic Linguistic Units Counter. Alice was also trained with multiple languages although much of the data was English from the US and from the UK.
Both of these are open source, which means that you can download them and reuse them – you can even change them anytime as you think best, since both of them can be retrained. Even if both of them can be retrained, you can also use them out of the box.
Both of these tools have been benchmarked against LENA and both of them are competitive. For more information, see the 14 Video.
5.1 Resources
- Räsänen, O., Seshadri, S., Lavechin, M., Cristia, A., & Casillas, M. (2020). ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behavior Research Methods. pdf code
- Lavechin, M., Bousbib, R., Bredin, H., Dupoux, E., & Cristia, A. (2020). An open-source voice type classifier for child-centered daylong recordings. Interspeech. pdf code