Professional Cloning

Learn how to use professional cloning to achieve the highest-quality voice generation.

With professional voice cloning, you can provide up to 60 minutes of voice samples (at least 1 minute is recommended). Within 3-60 minutes, our AI trains in depth on the samples and learns their details, including tone, pronunciation, rhythm, and prosody. The result is top-level cloning and synthesis that is indistinguishable from the original voice, while retaining all the cutting-edge features of the Vocu large voice model, such as language understanding and emotional expressiveness.

Professional cloning is a paid, value-added self-service feature. Each cloning run draws on a separate professional cloning quota. You can purchase additional professional cloning runs for your account in the Credit Store.

Operation Process

Preparation

Before you start professional cloning, prepare one or more audio sample files.

The audio sample files must meet the following requirements:

  • The combined duration of all audio sample files must be at least 1 minute and at most 60 minutes. Within this range, a longer total duration gives a better cloning result.

  • Each audio file must be in one of these formats: wav, mp3, mp4 (converting it to an audio format first is recommended), flac, m4a, or ogg.

  • Provide the highest-quality audio you can, and make sure it contains recognizable sentences (for supported languages, see Model Introduction). For more tips on cloning quality, see Important Notes.

Once the audio sample files are ready, you can select them manually, drag them into the upload box, or package them into an unencrypted Zip archive. The system organizes the sample files automatically. The total size of the uploaded files cannot exceed 256MB.
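As a rough illustration, the requirements above can be checked locally before uploading. The following Python sketch is not part of Vocu: the function names `check_samples` and `pack_samples` are hypothetical, the durations are supplied by the caller rather than read from the files, and it assumes that 256MB means 256 × 1024 × 1024 bytes and that the limit applies to the files before compression.

```python
import zipfile
from pathlib import Path

# Limits taken from the requirements above.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".flac", ".m4a", ".ogg"}
MAX_TOTAL_BYTES = 256 * 1024 * 1024  # assumption: 256MB counted in binary units
MIN_TOTAL_SECONDS = 60               # at least 1 minute in total
MAX_TOTAL_SECONDS = 60 * 60          # at most 60 minutes in total


def check_samples(files):
    """Return a list of problems for an iterable of (path, duration_seconds).

    Durations are passed in by the caller (hypothetical input), because
    reading them depends on the audio format.
    """
    problems = []
    total_seconds = 0.0
    total_bytes = 0
    for path, seconds in files:
        path = Path(path)
        if path.suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{path.name}: unsupported format")
        total_seconds += seconds
        total_bytes += path.stat().st_size
    if total_seconds < MIN_TOTAL_SECONDS:
        problems.append("total duration is shorter than 1 minute")
    if total_seconds > MAX_TOTAL_SECONDS:
        problems.append("total duration is longer than 60 minutes")
    if total_bytes > MAX_TOTAL_BYTES:
        problems.append("total size exceeds 256MB")
    return problems


def pack_samples(paths, archive_path):
    """Write the files into a plain (unencrypted) Zip archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file at the top level of the archive.
            archive.write(path, arcname=Path(path).name)
    return archive_path
```

An empty list from `check_samples` means none of the checked limits were exceeded; it says nothing about audio quality, which still needs to be judged by ear.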

Start Cloning

Once the above is ready, follow these steps:

  1. Open the character creation panel with the Add Character button on the Voice Management page, or the "Clone Character Voice" button on the Vocu Studio page.

  2. Click the Professional Cloning button to switch to the professional cloning tab.

  3. Click Next to see the "Professional Cloning Sample Package" section. You can select audio files manually, drag them into the upload box, or package them into an unencrypted Zip archive. The system organizes the sample files automatically. The total size of the uploaded files cannot exceed 256MB.

  4. In the "Default Style Sample" section, cut out and select the most representative, highest-quality segment of about 5-30 seconds from the audio samples you prepared for professional cloning. This segment serves as the default style guide sample for this cloning and defines the character's default voice performance, including timbre, emotion, speed, tone, and prosody (you can add more style samples later on the character details page). You can also select or record audio that is not part of the sample package as the default style sample, in the same way as in instant cloning, but using a style sample from the training sample package is recommended because it reproduces the pronunciation style more faithfully.

  5. Confirm that the language of your uploaded voice samples is supported by the model, and manually select the language of the samples.

  6. Specify a name for the new character, and optionally a description and an avatar. Currently, the name, description, and avatar are for display only and do not affect voice cloning behavior.

  7. After confirming that everything is correct, click the submit button in the lower right corner to upload the samples and submit the professional cloning task.

Your professional cloning task starts automatically, and its status is shown as "Training". You only need to wait for training to complete before using the character. This usually takes 3-60 minutes, depending on sample length.
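The default style sample can be cut in the upload panel itself, but if you prefer to prepare the 5-30 second segment beforehand, a minimal sketch using Python's standard `wave` module might look like the following. It handles uncompressed WAV files only, and the function name `cut_style_sample` and its parameters are hypothetical, not part of Vocu.

```python
import wave

# Length range for the default style sample, taken from the steps above.
MIN_STYLE_SECONDS = 5
MAX_STYLE_SECONDS = 30


def cut_style_sample(src_path, dst_path, start_seconds, length_seconds):
    """Copy length_seconds of audio, starting at start_seconds, to dst_path.

    Returns the length of the written segment in seconds.
    """
    if not MIN_STYLE_SECONDS <= length_seconds <= MAX_STYLE_SECONDS:
        raise ValueError("style sample should be about 5-30 seconds long")
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        start_frame = int(start_seconds * rate)
        frame_count = int(length_seconds * rate)
        if start_frame + frame_count > src.getnframes():
            raise ValueError("requested segment runs past the end of the file")
        src.setpos(start_frame)            # jump to the start of the segment
        frames = src.readframes(frame_count)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width, and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return frame_count / rate
```

For example, `cut_style_sample("sample.wav", "style.wav", 12, 20)` would write the 20 seconds that begin 12 seconds into `sample.wav`; both file names are placeholders.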
