AI/ML Voice Cloning and Transcription at Your Fingertips
AI/ML Techniques in Emotion-Preserving Multilingual Video Translation for Practical Cross-Cultural Communication
A free, fully responsive AI/ML solution created by
Kevin Gomes
and Major Project Group no. 2 (Aditya Yadav, Akshit Kumar Tiwari,
Harsh Anand Gupta).
The first step extracts the video and audio streams from the input file using FFmpeg, producing a separate video-only file and audio-only file.
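A minimal sketch of this step, shelling out to the `ffmpeg` CLI (the file names and the 16 kHz mono WAV settings are illustrative choices, not mandated by the project):

```python
import subprocess

def build_extract_commands(input_path, video_out="video.mp4", audio_out="audio.wav"):
    """Return the two ffmpeg invocations that split a file into a
    video-only copy and a 16 kHz mono WAV (a speech-recognition-friendly format)."""
    video_cmd = ["ffmpeg", "-y", "-i", input_path,
                 "-an", "-c:v", "copy", video_out]            # -an drops the audio
    audio_cmd = ["ffmpeg", "-y", "-i", input_path, "-vn",     # -vn drops the video
                 "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", audio_out]
    return video_cmd, audio_cmd

def extract_streams(input_path):
    """Run both extractions; raises CalledProcessError if ffmpeg fails."""
    for cmd in build_extract_commands(input_path):
        subprocess.run(cmd, check=True)
```

Copying the video stream (`-c:v copy`) avoids a lossy re-encode, since the video frames are reused untouched in the final merge step.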
The script then uses the noisereduce library to remove background noise from the extracted audio; the cleaned audio is saved as "vocals.wav".
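noisereduce works by spectral gating: frequency bins whose energy does not rise above an estimated noise floor are attenuated. Below is a deliberately simplified numpy-only sketch of that idea (this is not the library's actual implementation, and the `threshold_factor` knob is an invented name); the real library call is shown in the trailing comment:

```python
import numpy as np

def spectral_gate(signal, noise_clip, threshold_factor=1.5):
    """Zero out frequency bins whose magnitude stays below the noise
    floor estimated from a noise-only clip, scaled by threshold_factor."""
    spec = np.fft.rfft(signal)
    noise_mag = np.abs(np.fft.rfft(noise_clip, n=len(signal)))
    mask = np.abs(spec) > threshold_factor * noise_mag
    return np.fft.irfft(spec * mask, n=len(signal))

# Actual project usage (requires `pip install noisereduce soundfile`):
#   import noisereduce as nr
#   import soundfile as sf
#   data, rate = sf.read("audio.wav")
#   sf.write("vocals.wav", nr.reduce_noise(y=data, sr=rate), rate)
```

The library version gates per short-time frame rather than over the whole signal at once, which is what lets it track non-stationary noise.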
The cleaned audio is transcribed with the SpeechRecognition library (Google Web Speech API backend), and the transcript is translated into the target language using the translate library. The translated text is saved to a file named "vocal_text.txt".
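A sketch of this stage. The transcription and translation callables are injected so the pipeline stays testable offline; the real wiring with `recognize_google` and `Translator` (both actual APIs of those libraries) is shown in the comment, while the function and file names are the project's:

```python
def transcribe_and_translate(audio_path, transcribe, translate_text,
                             out_path="vocal_text.txt"):
    """Run speech-to-text, translate the transcript, persist the result."""
    text = transcribe(audio_path)
    translated = translate_text(text)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(translated)
    return translated

# Real wiring (requires `pip install SpeechRecognition translate`; both
# calls go over the network):
#   import speech_recognition as sr
#   from translate import Translator
#   def google_stt(path):
#       r = sr.Recognizer()
#       with sr.AudioFile(path) as source:
#           return r.recognize_google(r.record(source))
#   transcribe_and_translate("vocals.wav", google_stt,
#                            Translator(to_lang="hi").translate)
```

Keeping the two stages behind plain callables also makes it easy to swap in a different STT engine or translation backend later.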
The translated text is then converted to speech with Coqui's multilingual XTTS-v2 text-to-speech model ("xtts_v2"), which can clone the original speaker's voice from a reference clip. The generated speech is saved as "output.wav".
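A sketch of the synthesis call using the Coqui TTS API (`TTS` and `tts_to_file` are the library's real entry points; passing "vocals.wav" as the cloning reference is an assumption about how the project wires it up):

```python
def synthesize_speech(text, speaker_wav, language, out_path="output.wav"):
    """Synthesize `text` in `language`, cloning the voice in `speaker_wav`."""
    # Imported lazily: the Coqui TTS package and the multi-GB XTTS-v2
    # checkpoint are only needed when synthesis actually runs.
    from TTS.api import TTS
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=language, file_path=out_path)
    return out_path

# Example: synthesize_speech(translated_text, "vocals.wav", "hi")
```

XTTS-v2 is what makes the pipeline "emotion-preserving": the reference clip conditions the output on the original speaker's timbre and delivery rather than a stock voice.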
Finally, the script merges the original video with the synthesized multilingual audio to create the final output video file named "Final_output.mp4."
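The merge can be sketched as one more ffmpeg invocation (the stream-mapping flags are standard ffmpeg; the exact options the project uses may differ):

```python
import subprocess

def build_merge_command(video_path, audio_path, out_path="Final_output.mp4"):
    """ffmpeg command that replaces the original soundtrack with the
    synthesized one, copying the video stream without re-encoding."""
    return ["ffmpeg", "-y", "-i", video_path, "-i", audio_path,
            "-map", "0:v:0",      # video stream from the first input
            "-map", "1:a:0",      # translated audio from the second input
            "-c:v", "copy",       # reuse video frames as-is
            "-shortest",          # stop at the shorter of the two streams
            out_path]

# subprocess.run(build_merge_command("video.mp4", "output.wav"), check=True)
```

`-shortest` guards against a small duration mismatch between the original video and the synthesized audio.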
What you get is the same video, in the same voice, but speaking a different language. Isn't that cool?