Overview
Read our guides and technical documentation to gain a deeper understanding of Vocu's features.
Welcome to the Vocu.ai user guide. We will walk you through every step, from registering an account to cloning your first voice and generating your first speech. We will also show you how to improve overall generation quality by optimizing your audio samples and editing the target text. Finally, we will be upfront about some current technical limitations so that you can get the most out of the service.
First, complete account registration and log in using any of the supported methods. After logging in, start with Voice Management, where you can create characters, add audio samples for voice cloning, and give each character a name and description. Once your characters are ready, open the Vocu Studio page and use their voices to generate your first speech.
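If you automate this workflow with scripts rather than the web interface, the clone-then-generate sequence looks roughly like the sketch below. This is a minimal illustration only: the base URL, endpoint paths, field names, and authentication scheme are assumptions for the sake of example, not Vocu's documented interface; consult the official API reference for the real calls.

```python
# Hypothetical sketch of the clone-then-generate workflow described above.
# Endpoint paths, field names, and auth handling are illustrative assumptions.
import requests

BASE_URL = "https://api.example-vocu-endpoint.com"  # placeholder, not a real endpoint
API_KEY = "YOUR_API_KEY"                            # obtained after account registration
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: create a character and attach an audio sample for voice cloning.
with open("sample.wav", "rb") as audio:
    character = requests.post(
        f"{BASE_URL}/characters",                   # hypothetical endpoint
        headers=HEADERS,
        data={"name": "Narrator", "description": "Calm audiobook narrator"},
        files={"audio": audio},
    ).json()

# Step 2: generate speech with the cloned character voice (the Vocu Studio step).
speech = requests.post(
    f"{BASE_URL}/speech",                           # hypothetical endpoint
    headers=HEADERS,
    json={"character_id": character["id"], "text": "Hello, this is my first generated speech."},
)

with open("output.mp3", "wb") as f:
    f.write(speech.content)
```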
How AI Models Work
Our Vocu Large Voice Model is pre-trained on massive amounts of audio covering many types of content, most notably audiobooks and everyday conversational speech. If your cloning samples and target text fall into these categories, you will typically get better results. The model tries to mimic the tone, speed, emotion, pauses, loudness, acoustic environment, breathing, accent, and vocal characteristics of the cloning samples, understands the context of the target text as well as it can, and combines the two to produce the closest-matching speech.
Shortcomings and Limitations
The current generation of the voice model (V2.9 and higher) produces human-like speech, but it is not perfect. You may encounter the following issues:
Occasional Unstable Results: You may occasionally get poor-quality generations. Setting the generation style to Stable improves overall stability, though it may sacrifice some expressiveness. You can also generate the same text several times and keep the best result.
Stability or Quality of Other-Language Content May Be Lower Than the Character's Language: V2 and higher models support bilingual (Chinese and English) cloning and synthesis, and the V3 series adds support for more languages. In cross-language generation, the model uses imitation and inference to pronounce the foreign language in the character's voice, but because different languages have different pronunciation systems, cross-language results may be slightly worse than content in the character's native language.
Not Well Suited to Excessively Exaggerated, Sharp, or Highly Unique Cloning Samples: With such samples you may see reduced audio quality, similarity, or stability. A common workaround is to generate a single sentence several times and reuse the most satisfactory take as a new cloning sample (a minimal sketch of this retry approach follows this list).
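The retry workaround mentioned above can be scripted as shown below: generate the same sentence several times (optionally with the Stable style), save every candidate, then audition them and keep the best take. The generate_speech() helper, its style parameter, and the file layout are hypothetical stand-ins for whichever generation call you actually use.

```python
# Minimal sketch, assuming a hypothetical generate_speech() wrapper around the
# actual Vocu generation call; all names here are placeholders for illustration.
from pathlib import Path


def generate_speech(text: str, character_id: str, style: str = "stable") -> bytes:
    """Hypothetical wrapper around the speech-generation request; returns audio bytes."""
    raise NotImplementedError("Replace with your actual Vocu generation call.")


def generate_candidates(text: str, character_id: str, n: int = 5) -> list[Path]:
    """Produce n takes of the same sentence so the best one can be picked by ear."""
    out_dir = Path("candidates")
    out_dir.mkdir(exist_ok=True)
    paths = []
    for i in range(n):
        audio = generate_speech(text, character_id, style="stable")
        path = out_dir / f"take_{i:02d}.mp3"
        path.write_bytes(audio)
        paths.append(path)
    return paths
```

The take you like most can then be added back to the character as a cloning sample, which is the workaround suggested above for highly unique source voices.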