This is it!
- Kavan Mehta
- Jan 30, 2023
- 1 min read
Let's start the Final Product! I think combining the video and audio of speech to recognize and understand it more efficiently could be a great problem to solve! I am currently asking some of the professionals I have interviewed about mentorship! This week I plan to conduct independent research on semantics and on using both lip movements and audio segments (converted into spectrograms) to improve recognition. Furthermore, many mainstream algorithms, such as linear regression, logistic regression, decision trees, and deep neural networks (feed-forward and recurrent), could be applied to model the various sounds of speech effectively.
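To make the spectrogram step a bit more concrete, here is a minimal sketch of turning an audio signal into a log-magnitude spectrogram. It assumes NumPy is available and uses a synthetic 1 kHz tone in place of real speech. The function name `spectrogram`, the frame length, and the hop size are illustrative choices, not part of any settled design for the Final Product.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration: frame_len and hop are assumed values.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one with the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # The real-input FFT of each frame gives its frequency content.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range, as is common for audio features.
    return np.log1p(magnitude)


# Synthetic stand-in for speech: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
# Each frequency bin spans sample_rate / frame_len Hz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_hz)
```

Each row of `spec` is one short time slice and each column is a frequency band, so the result can be treated like an image and fed to a model alongside the lip-movement frames.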
I also think that tutorials and academic research available online could improve my understanding of the topic and help me build a machine-learning solution. Apart from research, I could draw on my future mentor's experience to better understand the entire speech recognition problem and frame it so that I can address its root causes, shifting my focus toward lip movements or audio depending on the results on the training data. I think working with simplified algorithms can also help me see where the problem lies, and I am hoping to start my Final Product implementation next week.
So see you next week, same place, same time.
