#speech-synthesis

Ecosystem overview for everything related to speech-synthesis.

Products (3)

voice-pro
Open Source

Voice-Pro by ABUS-AIKOREA is an AI-driven desktop web application for multimedia content creation and processing. It integrates YouTube video downloading, voice separation, speech recognition, multilingual translation, and text-to-speech, including zero-shot voice cloning and multilingual TTS. It is aimed at content creators, researchers, and multilingual professionals. Built on the Whisper series, F5-TTS, E2-TTS, and CosyVoice, it provides speech recognition, voice cloning, and translation.

#audiobook #faster-whisper #gradio #karaoke
VoiceFlow-TTS
Open Source

VoiceFlow is an efficient text-to-speech (TTS) system based on rectified flow matching, which addresses the sampling inefficiency of traditional diffusion models in speech synthesis. The repository is the official implementation of the corresponding ICASSP 2024 paper. The model generates mel-spectrograms by learning a continuous flow between noise and data. A flow rectification step then optimizes the sampling trajectory, so that synthesis quality stays high even with a limited number of sampling steps. It features Kaldi-style data organization, flexible training configurations, supervised duration modeling, and experimental voice conversion.

#conditional-flow-matching #generative-models #probabilistic-models #rectified-flow-matching
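
To make the flow-matching idea in the VoiceFlow description more concrete, here is a minimal toy sketch in plain Python. It is not VoiceFlow's code: the function names are hypothetical, the "data" is a single fixed point rather than a mel-spectrogram, and a closed-form velocity field stands in for the trained neural network. It only illustrates why a straight path between noise and data can be followed with very few Euler steps; it does not implement the rectification procedure itself.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the straight path: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def toy_velocity(x, t, target):
    """Assumed stand-in for a trained network.

    When every data sample equals `target`, the velocity field that moves
    any point x at time t < 1 onto the data is (target - x) / (1 - t).
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, target)]


def euler_sample(x0, target, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h  # the largest t used is 1 - h, so 1 - t is never zero
        v = toy_velocity(x, t, target)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


noise = [0.5, -1.0]   # hypothetical starting noise sample
data = [2.0, 3.0]     # hypothetical data point
print(euler_sample(noise, data, num_steps=1))
print(euler_sample(noise, data, num_steps=4))
```

Because the path in this toy case is exactly straight, the velocity along it is constant and a single Euler step already lands on the data point. Real models only approximate this, which is why a small number of steps is still typically used.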
VoxCPM
Open Source

VoxCPM is a tokenizer-free text-to-speech system that generates continuous speech representations directly, using an end-to-end diffusion-autoregressive architecture to produce natural and expressive speech. VoxCPM2, the latest model, has 2B parameters and is trained on over 2 million hours of multilingual speech data. It supports 30 languages, Voice Design, Controllable Voice Cloning, and 48 kHz studio-quality audio output with built-in super-resolution.

#audio #deeplearning #minicpm #multilingual