Overview

Here you can learn how to create characters and perform voice cloning, along with various usage tips.

You can perform instant voice cloning by creating a character and uploading or recording a short audio sample for it. You can also perform professional voice cloning by providing 1 to 60 minutes of audio samples; training completes within 3 to 60 minutes.

After that, you can assign these characters to different pieces of text in speech synthesis so that the AI reads each one in the assigned character's voice.

Currently, you can open the character creation panel in two ways: click the "Add Character" button on the Voice Management page, or select "Create New Character..." in the popup that appears when you choose a character on the Vocu Studio page.

Instant Cloning

Instant cloning lets you clone a voice almost instantly from a very short sample. Note that instant cloning does not create or train a new model from the sample you provide. Instead, the AI makes a reasonable guess at the voice and imitates it, drawing on the large amount of data it has already learned. Our model has been trained on a large amount of regular speech, so in theory it should work well for most natural speech.

However, our model still has some imperfections. If the voice sample you provide is relatively unique and our AI has not learned similar voices before, the results may be poor or the voice may not be replicated well. For details on each of our models, including their shortcomings and limitations, please refer to Model Introduction.

Sample quality matters more than length. Noisy samples may produce poor results, so please provide the highest-quality sample speech you can. Currently, a sample must be longer than 2 seconds and the file must not exceed 20 MB. You can also try to extract a high-quality vocal sample from any audio by using features such as vocal separation, noise reduction, vocal enhancement, and loudness normalization in audio editing software.
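The length and size limits above can be checked locally before uploading. The following is a minimal sketch, not part of Vocu itself: the function names are hypothetical, the sample is assumed to be a WAV file (so Python's standard `wave` module can read its duration), and the file-size limit is interpreted as 20 × 1024 × 1024 bytes.

```python
import os
import wave

MIN_DURATION_SECONDS = 2.0         # from the text: longer than 2 seconds
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: the size limit read as 20 MiB


def check_instant_sample(duration_seconds: float, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    if duration_seconds <= MIN_DURATION_SECONDS:
        problems.append(
            f"sample is {duration_seconds:.2f} s; it must be longer than "
            f"{MIN_DURATION_SECONDS:.0f} s"
        )
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; the limit is {MAX_SIZE_BYTES} bytes"
        )
    return problems


def check_wav_file(path: str) -> list[str]:
    """Hypothetical helper: measure a WAV file on disk, then apply the checks."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return check_instant_sample(duration, os.path.getsize(path))
```

Calling `check_wav_file("sample.wav")` on a local recording returns an empty list when both limits are met. The check says nothing about noise or overall quality, which the text above treats as the more important factor.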

Professional Cloning

With professional voice cloning, you only need to provide one minute or more (up to 60 minutes) of voice samples. Within 3 to 60 minutes, our AI deeply trains on and learns every detail of those samples, including tone, pronunciation, rhythm, and prosody. The result is top-level cloning and synthesis that is indistinguishable from the original voice while retaining all the cutting-edge features of the Vocu large voice model, such as language understanding and emotional expressiveness.

V2 series models (V2.9) support only Chinese and English. When using a V2 series model, please make sure the input text contains only Chinese and English; characters from other languages, such as Japanese or Korean, may cause generation failures and other issues.

Starting with the V3 series, we have added Cantonese, Japanese, Korean, French, German, Spanish, and Portuguese in addition to Chinese and English, along with more than 30 accent variants of these languages in total. Please make sure the text you submit is in a language supported by the model version you are using.
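Before sending text to a V2 series model, it can help to scan it for Japanese or Korean characters. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the check covers only Japanese kana and Korean Hangul by Unicode block. It cannot tell Japanese kanji apart from Chinese characters, and it does not look for other scripts.

```python
# Unicode blocks for the two scripts named in the text (Japanese kana, Korean Hangul).
# Assumption: these ranges are a partial check, not Vocu's own validation logic.
UNSUPPORTED_V2_RANGES = [
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0xAC00, 0xD7A3, "Hangul Syllables"),
]


def find_v2_unsupported_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, block name) for each kana or Hangul character."""
    hits = []
    for index, char in enumerate(text):
        code = ord(char)
        for start, end, name in UNSUPPORTED_V2_RANGES:
            if start <= code <= end:
                hits.append((index, char, name))
                break
    return hits
```

An empty result means no kana or Hangul was found; any hit points to text that should be removed or sent to a V3 series model instead.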


Last updated 4 months ago
