Ecosystem overview of everything related to diffusion models.
OmniVoice by k2-fsa is a state-of-the-art, massively multilingual zero-shot text-to-speech (TTS) model that supports over 600 languages. It uses a diffusion language model-style architecture to generate high-quality speech with fast inference. Its core capabilities are voice cloning, voice design through attributes such as gender, age, pitch, and accent, control over non-verbal symbols, and pronunciation correction. This combination of broad language coverage and fast inference suits it to multilingual content creation, personalized voice synthesis, and real-time interactive systems.