Ecosystem overview for everything related to MiniCPM.
VoxCPM is a tokenizer-free text-to-speech (TTS) system that generates continuous speech representations directly through an end-to-end diffusion autoregressive architecture, producing highly natural and expressive synthesis. VoxCPM2, the latest model, has 2B parameters and is trained on over 2 million hours of multilingual speech data. It supports 30 languages, Voice Design, Controllable Voice Cloning, and 48kHz studio-quality audio output with built-in super-resolution.
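
To make "tokenizer-free" and "diffusion autoregressive" more concrete, here is a toy sketch of the general idea: each step produces one continuous latent vector by refining noise, conditioned on the text and on the latents generated so far, with no discrete token ids involved. This is not VoxCPM's code or API. Every name below (`toy_condition`, `toy_denoise`, `generate_latents`, `LATENT_DIM`, `DENOISE_STEPS`) is hypothetical, and the "model" is a deterministic stand-in rather than a trained network.

```python
import random

LATENT_DIM = 4      # hypothetical size of one continuous speech latent
DENOISE_STEPS = 8   # hypothetical number of refinement steps per frame


def toy_condition(text, history):
    """Stand-in for a learned conditioner: maps text and past latents to a target."""
    seed = sum(ord(ch) for ch in text) + len(history)
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
    if history:
        # Blend with the previous frame so each frame depends on earlier output.
        target = [0.5 * t + 0.5 * p for t, p in zip(target, history[-1])]
    return target


def toy_denoise(target, rng):
    """Stand-in for a diffusion sampler: start from noise, step toward the target."""
    x = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    for _ in range(DENOISE_STEPS):
        x = [xi + 0.5 * (ti - xi) for xi, ti in zip(x, target)]
    return x


def generate_latents(text, num_frames, seed=0):
    """Autoregressive loop: one continuous latent per frame, no token vocabulary."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_frames):
        target = toy_condition(text, history)
        history.append(toy_denoise(target, rng))
    return history


if __name__ == "__main__":
    for frame in generate_latents("hello", num_frames=3):
        print([round(v, 3) for v in frame])
```

In a real system the conditioner and denoiser would be trained networks and the latents would be decoded to a waveform; the sketch only shows the control flow of generating continuous frames one after another.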