Xiaomi Open-Sources OmniVoice, Zero-Shot Voice Cloning Model Supporting 646 Languages

According to Beating, Xiaomi's AI Lab Kaldi team has open-sourced OmniVoice, a zero-shot voice cloning text-to-speech (TTS) model that supports 646 languages. The model clones a speaker's voice characteristics from only a few seconds of reference audio and works across languages: a single cloned voice can synthesize speech in Mandarin, Japanese, Korean, and other languages. All code, weights, and training data are released under the Apache-2.0 license.

OmniVoice uses a simplified architecture in which a single bidirectional Transformer maps text directly to discrete acoustic tokens. In PyTorch, inference reportedly runs about 40 times faster than real time. The model was trained on 580,000 hours of audio drawn from 50 open-source datasets. It reportedly outperformed commercial systems in voice similarity and intelligibility across the 24 languages tested, and matched or exceeded human recordings in 102 languages.
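
To put the reported speed in concrete terms, the following minimal Python sketch converts a "times faster than real time" figure into a real-time factor (processing seconds per second of audio). The 40x value comes from the report above; the function names are hypothetical and are not part of OmniVoice, and the hardware and batch settings behind the figure are not stated in the report.

```python
def real_time_factor(speedup: float) -> float:
    """Processing seconds needed per second of generated audio."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimated wall-clock time to synthesize a clip of the given length."""
    return audio_seconds * real_time_factor(speedup)


# Reported figure: about 40x faster than real time (assumed constant here).
rtf = real_time_factor(40.0)
one_minute = synthesis_seconds(60.0, 40.0)
print(f"real-time factor: {rtf:.3f}")
print(f"time to synthesize 60 s of audio: {one_minute:.1f} s")
```

Under that assumption, one minute of speech would take roughly 1.5 seconds to generate; actual throughput depends on the hardware and configuration used.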
