#tts
Ecosystem overview for everything related to text-to-speech (TTS).
Products (3)
VoiceFlow is an efficient text-to-speech (TTS) system based on rectified flow matching, designed to address the efficiency limitations of traditional diffusion models in speech synthesis. It is the official implementation of the corresponding ICASSP 2024 paper and generates high-quality mel-spectrograms by learning a continuous flow between noise and data. A flow rectification process then straightens the sampling trajectory, which improves synthesis quality and efficiency when only a limited number of sampling steps is used. It features Kaldi-style data organization, flexible training configurations, supervised duration modeling, and experimental voice conversion capabilities.
Pixelle-Video by AIDC-AI is an AI-powered, fully automated short-video engine. Users enter a topic, and it handles scriptwriting, AI image/video generation, voiceover synthesis, background music, and video compilation automatically. Built on a modular design and the ComfyUI architecture, it supports flexible customization and integration of various AI models, so that videos can be created with no editing experience required.
VoxCPM is a tokenizer-free text-to-speech system that directly generates continuous speech representations via an end-to-end diffusion-autoregressive architecture, achieving highly natural and expressive synthesis. VoxCPM2, the latest 2B-parameter model, is trained on over 2 million hours of multilingual speech data and supports 30 languages, Voice Design, Controllable Voice Cloning, and 48 kHz studio-quality audio output with built-in super-resolution.
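The VoiceFlow entry above mentions learning a flow between noise and data and sampling it in few steps. As a rough illustration of that general idea, here is a minimal NumPy sketch. It is not VoiceFlow's code: the function names `flow_matching_pair`, `euler_sample`, and `toy_velocity` are hypothetical, the linear interpolation path is the common flow matching setup rather than anything confirmed by the listing, and the velocity field is a hand-written toy instead of a trained network.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Return a point on the straight line from noise x0 to data x1,
    plus the constant velocity a model would be trained to predict there."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy stand-in for a learned velocity field (assumption for illustration only):
# every trajectory heads in a straight line toward one fixed target vector.
target = np.array([1.0, -2.0, 0.5])

def toy_velocity(x, t):
    # On a straight path, the remaining displacement divided by the remaining time
    # equals the constant velocity of the whole path.
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.standard_normal(3)

one_step = euler_sample(toy_velocity, noise, num_steps=1)
ten_steps = euler_sample(toy_velocity, noise, num_steps=10)
```

Because the toy trajectories are perfectly straight, a single Euler step lands on the same endpoint as ten steps. Real learned flows are only approximately straight, which is the gap a rectification step aims to narrow.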