Finding Datasets for the Final Product!
- Kavan Mehta
- Feb 20, 2023
- 1 min read
I have been exploring the LRW, LRS2, and LRS3-TED datasets, which were built by researchers around the world to advance audio-visual speech recognition. Each dataset provides audio and video that are aligned with each other, along with labeled transcripts of the words or sentences being spoken. I am currently going through sample recordings to understand which speaker groups (by ethnicity and gender) are represented, and whether the dialects and accents are similar enough across speakers to keep training of the machine learning model consistent. The dataset I choose will shape my approach, because my machine-learning algorithm will need to adapt to that dataset's characteristics, particularly how lip movements are segmented and how the video frames are temporally aligned with the audio. Hence, I will need to think about the datasets and the algorithms together as I make this decision with the guidance of my mentor, Dr. Paschall.
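To make the temporal-alignment concern concrete, here is a minimal sketch of matching audio samples to video frames. The frame rate and sample rate below are assumptions (25 fps video and 16 kHz audio are common for these datasets), and the function name is mine; the actual rates would need to be checked against whichever dataset I pick.

```python
import numpy as np

VIDEO_FPS = 25          # assumed video frame rate
AUDIO_SR = 16_000       # assumed audio sample rate (Hz)
SAMPLES_PER_FRAME = AUDIO_SR // VIDEO_FPS  # 640 audio samples per video frame

def align_audio_to_frames(audio: np.ndarray, num_frames: int) -> np.ndarray:
    """Chunk a 1-D waveform into one audio segment per video frame."""
    needed = num_frames * SAMPLES_PER_FRAME
    # Pad or trim so the waveform length matches the frame count exactly.
    if len(audio) < needed:
        audio = np.pad(audio, (0, needed - len(audio)))
    else:
        audio = audio[:needed]
    return audio.reshape(num_frames, SAMPLES_PER_FRAME)

# Example: a 3-second clip has 75 frames and 48,000 audio samples.
frames = 75
waveform = np.zeros(frames * SAMPLES_PER_FRAME, dtype=np.float32)
segments = align_audio_to_frames(waveform, frames)
print(segments.shape)  # (75, 640)
```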
After picking my dataset, I will sample it to create an equally representative subset (a smaller portion of the larger dataset) that I can train my machine learning algorithm on with the computing resources I have, as sketched below. I will also need to experiment with machine learning approaches such as 3D convolutional neural networks, transformers, and even conformers to find which one gives the most accurate speech recognition.
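Here is a rough sketch of one way to build that equally representative subset: sampling the same number of clips from each speaker group. The metadata field names (like "accent") and the file paths are hypothetical placeholders; the real fields depend on the dataset I end up choosing.

```python
import random
from collections import defaultdict

def stratified_subset(clips, group_key, per_group, seed=0):
    """Draw up to per_group clips from each group defined by group_key."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for clip in clips:
        buckets[clip[group_key]].append(clip)
    subset = []
    for group, items in buckets.items():
        rng.shuffle(items)
        subset.extend(items[:per_group])  # cap each group at per_group clips
    return subset

# Hypothetical usage with made-up metadata records:
clips = [
    {"path": "clip_001.mp4", "accent": "US"},
    {"path": "clip_002.mp4", "accent": "UK"},
    {"path": "clip_003.mp4", "accent": "US"},
    {"path": "clip_004.mp4", "accent": "IN"},
]
subset = stratified_subset(clips, group_key="accent", per_group=1)
print(len(subset))  # one clip per accent group
```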
So see you next week, same place, same time.