VoiceFlow-TTS

by X-LANCE

Open source · Python · Free, available globally

About

VoiceFlow is a text-to-speech (TTS) system based on rectified flow matching, designed to address the slow sampling of conventional diffusion models in speech synthesis. This repository is the official implementation of the ICASSP 2024 paper. The model generates mel-spectrograms by learning a continuous flow, defined by an ordinary differential equation, that transports noise to data. A subsequent flow rectification step straightens the sampling trajectories, so that synthesis quality holds up with only a small number of sampling steps. The project also provides Kaldi-style data organization, flexible training configurations, supervised duration modeling, and experimental voice conversion.
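The idea of learning a flow between noise and data can be illustrated with a minimal NumPy sketch. It is not taken from the VoiceFlow code base: the function names, the convention that t = 0 is noise and t = 1 is data, and the toy array shapes are all assumptions made for illustration. It shows the linear interpolation used to build a training target, the regression loss on the velocity, and Euler sampling of the resulting ODE.

```python
import numpy as np

def flow_matching_pair(noise, data, t):
    """Build one training example for flow matching (hypothetical helper).

    Assumes a straight path from noise (t = 0) to data (t = 1).
    Returns the point on the path and the velocity the model should predict.
    """
    x_t = (1.0 - t) * noise + t * data
    target_velocity = data - noise  # constant along a straight path
    return x_t, target_velocity

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

def euler_sample(velocity_fn, noise, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy demonstration with random arrays standing in for mel-spectrograms.
rng = np.random.default_rng(0)
noise = rng.standard_normal((80, 10))
data = rng.standard_normal((80, 10))

# Stand-in for a trained model whose trajectories are perfectly straight.
straight_field = lambda x, t: data - noise
one_step = euler_sample(straight_field, noise, num_steps=1)
```

When the velocity field is perfectly straight, as in this toy case, a single Euler step already lands on the data. This is the motivation for rectification: the straighter the learned trajectories, the fewer sampling steps are needed.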

Features

Efficient Text-to-Speech synthesis with Rectified Flow Matching
Supports Flow Rectification (ReFlow) for optimized sampling efficiency and quality
Compatible with Kaldi-style data organization and processing
Integrates supervised duration modeling and Monotonic Alignment Search (MAS)
Offers experimental features like voice conversion and likelihood estimation
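
Monotonic Alignment Search, mentioned in the list above, can be sketched as a small dynamic program. The version below is a plain NumPy illustration of the general technique, not the repository's implementation; the function name and the toy score matrix are hypothetical. Given a score for every (text token, mel frame) pair, it finds the best alignment in which each frame maps to one token, tokens appear in order, and no token is skipped.

```python
import numpy as np

def monotonic_alignment_search(log_lik):
    """Return, for each frame, the index of the token it is aligned to.

    log_lik[i, j] is the score of aligning text token i with mel frame j.
    The path starts at token 0, ends at the last token, and at each frame
    either stays on the current token or advances to the next one.
    """
    n_tokens, n_frames = log_lik.shape
    if n_frames < n_tokens:
        raise ValueError("need at least one frame per token")

    # best[i, j]: best total score of a valid path ending at (token i, frame j)
    best = np.full((n_tokens, n_frames), -np.inf)
    best[0, 0] = log_lik[0, 0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = best[i, j - 1]
            advance = best[i - 1, j - 1] if i > 0 else -np.inf
            best[i, j] = max(stay, advance) + log_lik[i, j]

    # Backtrack from the last token at the last frame.
    path = np.zeros(n_frames, dtype=int)
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if i > 0 and (i == j or best[i - 1, j - 1] >= best[i, j - 1]):
            i -= 1
    return path

# Toy scores: token 0 fits frames 0-1, token 1 fits frames 2-3.
scores = np.array([[0.0, 0.0, -5.0, -5.0],
                   [-5.0, -5.0, 0.0, 0.0]])
alignment = monotonic_alignment_search(scores)
durations = np.bincount(alignment, minlength=scores.shape[0])
```

The per-token frame counts in `durations` are the kind of quantity a duration model is trained to predict.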

Supported Platforms

desktop

