VoiceFlow-TTS

by X-LANCE
🔓 Open Source Python 🌍 Global free

About

VoiceFlow is an efficient text-to-speech (TTS) system based on rectified flow matching, designed to address the slow, many-step sampling of traditional diffusion models in speech synthesis. It is the official implementation of the ICASSP 2024 paper and generates mel-spectrograms by learning a continuous flow between noise and data. A flow rectification (ReFlow) process then straightens the sampling trajectory, so that synthesis quality is preserved even with a small number of sampling steps. The project uses Kaldi-style data organization and provides flexible training configurations, supervised duration modeling, and experimental voice conversion.
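The idea of learning a flow between noise and data, and then rectifying it, can be illustrated with a toy sketch. This is not VoiceFlow's code: the function names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, `make_reflow_pairs`) are hypothetical, vectors are plain Python lists instead of mel-spectrogram tensors, text conditioning is omitted, and the convention that noise sits at t=0 and data at t=1 is an assumption.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the vector field along that path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_reflow_pairs(velocity_fn, noises, num_steps):
    """Pair each noise sample with the output the current model maps it to.

    Retraining on these (noise, output) pairs is the rectification step.
    """
    return [(x0, euler_sample(velocity_fn, x0, num_steps)) for x0 in noises]
```

In this sketch, `velocity_fn(x, t)` stands in for the trained network. If the learned field is perfectly straight (constant velocity along each path), a single Euler step already lands on the data point, which is the intuition behind using rectification to reduce the number of sampling steps.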

Features

  • Efficient Text-to-Speech synthesis with Rectified Flow Matching
  • Supports Flow Rectification (ReFlow) for optimized sampling efficiency and quality
  • Compatible with Kaldi-style data organization and processing
  • Integrates supervised duration modeling and Monotonic Alignment Search (MAS)
  • Offers experimental features like voice conversion and likelihood estimation
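
For readers unfamiliar with the Monotonic Alignment Search (MAS) mentioned in the list above, the following is a minimal sketch of the general dynamic program, assuming a matrix of per-token, per-frame log-likelihoods. It is not taken from VoiceFlow: the names `monotonic_alignment_search` and `durations_from_path` are hypothetical, and a practical implementation would be vectorized or compiled.

```python
def monotonic_alignment_search(log_prob):
    """Return, for each frame, the index of the token it is aligned to.

    log_prob[i][j] is the log-likelihood of frame j under token i.
    The path starts at token 0, ends at the last token, never moves
    backwards, and never skips a token.
    """
    n_tokens = len(log_prob)
    n_frames = len(log_prob[0])
    if n_frames < n_tokens:
        raise ValueError("need at least one frame per token")

    neg_inf = float("-inf")
    # q[i][j]: best total score of a valid path ending at token i, frame j.
    q = [[neg_inf] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_prob[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else neg_inf
            q[i][j] = log_prob[i][j] + max(stay, advance)

    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if j > 0 and i > 0 and (i == j or q[i - 1][j - 1] > q[i][j - 1]):
            i -= 1
    return path


def durations_from_path(path, n_tokens):
    """Count how many frames are assigned to each token."""
    durations = [0] * n_tokens
    for token_index in path:
        durations[token_index] += 1
    return durations
```

The per-token frame counts returned by `durations_from_path` are the kind of target a duration predictor is usually trained on; when durations come from an external aligner instead (supervised duration modeling), this search is not needed.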

Supported Platforms

desktop