Instant Cloning

Learn how to add a character and assign it a voice sample for instant cloning.

With instant voice cloning, you only need to provide a 5-30 second sample, and cloning completes immediately with no model training. Drawing on millions of hours of experience, our AI tries to mimic the tone, speed, emotion, pauses, loudness, acoustic environment, breathing sounds, accent, and vocalization characteristics of the audio sample during generation. It also interprets the context of the target text as far as possible, then combines the two to produce speech that is as expressive and well matched as it can be.

Currently, you can open the character creation panel in two ways: click the "Add Character" button on the Voice Management page, or choose "Create New Character..." in the popup that appears when you select a character on the Vocu Studio page. The first step is to select the type of creation. Different character types differ slightly in how they perform. Which model versions and types are available depends on our current maintenance plan.

Next, upload an audio file or record a clip to serve as the default style sample for this character. The default style sample defines the character's default voice performance, including timbre, emotion, speed, tone, and prosody. You can add more style samples later on the character details page.

The audio uploader includes a simple audio processing function that lets you quickly edit a clip before uploading it.

After the audio is uploaded, confirm that the language of your voice sample is one the model supports. The system automatically detects supported languages in the audio. If you need a more accurate result, you can select the language manually. For Cantonese samples, the language must be selected manually.

If the audio sample contains background sound, you can turn on the "Remove Background Sound" switch, and the system will optimize the sample when creating the character.

Next, give the character a name and, optionally, a description and an avatar. Currently, the name, description, and avatar are for display only and do not affect cloning results.

Finally, review the details of this creation on the last page, click the submit button in the lower right corner, and wait for processing to complete.

Sample quality matters more than length. Noisy samples may produce poor results, so provide the highest-quality sample speech you can. Currently, the sample must be longer than 2 seconds and the file must not exceed 20 MB. You can also extract a clean vocal sample from existing audio by using the vocal separation, noise reduction, vocal enhancement, or loudness normalization features of audio editing software.
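
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of Vocu's tooling: the function name check_sample is hypothetical, it reads only uncompressed WAV files (via Python's standard wave module), and it assumes the 20M limit means 20 × 1024 × 1024 bytes. The uploader's own validation remains authoritative.

```python
import os
import wave

MIN_DURATION_S = 2.0                # from the text: sample must be longer than 2 seconds
MAX_SIZE_BYTES = 20 * 1024 * 1024   # assumption: the 20M limit is read as 20 MiB


def check_sample(path):
    """Return a list of problems with a WAV sample; an empty list means it passes.

    Hypothetical helper for illustration only. It checks file size and
    duration, not audio quality or language.
    """
    problems = []

    size = os.path.getsize(path)
    if size > MAX_SIZE_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_SIZE_BYTES}-byte limit")

    # Duration = number of frames / frames per second.
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration <= MIN_DURATION_S:
        problems.append(f"sample is {duration:.2f} s, must be longer than {MIN_DURATION_S} s")

    return problems
```

Calling check_sample("my_voice.wav") on a hypothetical file returns an empty list when both limits are met. For compressed formats such as MP3, the duration would have to be read with a different tool, since the wave module does not decode them.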

For detailed precautions and best practices about instant cloning sample audio, please refer to this page.
