# Important Notes

## Instant Cloning/Style Guide Sample Precautions

As noted in the [Overview](https://docs.vocu.ai/voices/overview), if the voice in your sample is unusual and our AI has not learned similar voices before, the generated results may be poor or the voice may not be replicated well.

{% hint style="info" %}
Sample quality matters more than length, and noisy samples may produce poor results, so please provide the highest-quality speech sample you can. Currently, a sample must be **longer than 2 seconds** and its **file size must not exceed 20 MB**. You can also extract a clean vocal sample from other audio by using the **vocal separation, noise reduction, vocal enhancement, and loudness normalization** features of **audio editing software**.
{% endhint %}

For best results, we recommend 10-20 seconds of clear speech with no reverberation, echo, or background noise. We also recommend audio with a source bitrate of 128 kbps or higher, so that as much information as possible is preserved.
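The limits above can be checked locally before uploading. The following is a minimal sketch, not part of any official tooling: the function names are hypothetical, the file-size limit is read as 20 × 1024 × 1024 bytes (an assumption about the unit), and the duration helper only handles uncompressed WAV files.

```python
import wave

MIN_DURATION_S = 2.0                 # stated: must be longer than 2 seconds
MAX_SIZE_BYTES = 20 * 1024 * 1024    # assumption: size limit read as 20 MiB
RECOMMENDED_S = (10.0, 20.0)         # stated recommendation: 10-20 seconds


def wav_duration_seconds(path):
    """Duration of an uncompressed WAV file, using only the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_instant_sample(duration_s, size_bytes):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if duration_s <= MIN_DURATION_S:
        findings.append("error: sample must be longer than 2 seconds")
    elif not RECOMMENDED_S[0] <= duration_s <= RECOMMENDED_S[1]:
        findings.append("warning: 10-20 seconds of clear speech is recommended")
    if size_bytes > MAX_SIZE_BYTES:
        findings.append("error: file is larger than the size limit")
    return findings
```

For a WAV file, the inputs could come from `wav_duration_seconds(path)` and `os.path.getsize(path)`. This check covers only length and size; it says nothing about noise, reverberation, or bitrate.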

## Professional Cloning Precautions

Before starting professional cloning, prepare **one or more** audio sample files.

Audio sample files need to meet the following requirements:

* The combined duration of all audio sample files must be **at least 1 minute** and **at most 60 minutes**. Within this range, the longer the total duration, the better the cloning result.
* Each audio file must be in **wav, mp3, mp4, flac, m4a, or ogg** format. For mp4 files, we recommend converting them to an audio format first.
* Provide **high-quality** audio whenever possible, and **make sure the audio contains recognizable sentences** (for supported languages, see the [**Model Introduction**](https://docs.vocu.ai/introduction/models)). Also avoid heavy noise, multiple speakers, and other interference in the audio.

Once the audio sample files are ready, you can select them manually, drag them onto the upload box, or package them into an **unencrypted Zip archive**. The system will organize the sample files automatically. The total size of uploaded files **cannot exceed 256 MB**.
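The requirements above can be sketched as a local pre-upload check plus a packaging step. Everything here is illustrative: the function names are hypothetical, durations are assumed to be measured by some other tool, the size limit is read as 256 × 1024 × 1024 bytes, and the size check is applied to the raw files because the text does not say whether the limit refers to raw or compressed size.

```python
import os
import zipfile

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".flac", ".m4a", ".ogg"}
MIN_TOTAL_S = 60                      # at least 1 minute in total
MAX_TOTAL_S = 60 * 60                 # at most 60 minutes in total
MAX_TOTAL_BYTES = 256 * 1024 * 1024   # assumption: limit read as 256 MiB


def check_professional_set(files):
    """files: iterable of (filename, duration_seconds, size_bytes) tuples."""
    files = list(files)
    problems = []
    for name, _, _ in files:
        ext = os.path.splitext(name)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"unsupported format: {name}")
    total_s = sum(duration for _, duration, _ in files)
    total_bytes = sum(size for _, _, size in files)
    if total_s < MIN_TOTAL_S:
        problems.append("total duration is under 1 minute")
    elif total_s > MAX_TOTAL_S:
        problems.append("total duration is over 60 minutes")
    if total_bytes > MAX_TOTAL_BYTES:
        problems.append("total size is over the upload limit")
    return problems


def pack_samples(paths, archive_path):
    """Write files into a plain Zip archive; zipfile cannot encrypt on write."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))
```

An empty list from `check_professional_set` means none of the stated limits were violated; it does not assess audio quality or whether the speech is recognizable.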

## Comprehensive Precautions

Our AI voice model tries to imitate everything it hears in the audio: the speaker's tone, speed, accent, breathing, intensity, hesitations and pauses, and even background noise and vocal noise. Anything present in the sample audio may therefore be imitated by the AI and appear in the final synthesis.

In other words, if you speak in a slow, flat voice, the result will usually sound the same; if you speak in an excited, fast manner, the AI will try to imitate that as well.

**Most importantly**, we recommend keeping the voice performance as consistent as possible throughout the sample. If the first 2 seconds are excited and fast, the rest of the sample should stay close to that in tone, speed, volume, and other aspects. If the performance fluctuates too much within one sample, it may confuse the AI and make each generation less predictable.

In general:

* The voice performance, accent, and recording quality all strongly affect the final cloning result
* **For instant cloning**, the length of the audio is less important, but we recommend at least five seconds so that the sample contains enough information
* Keep the voice performance and recording quality as consistent as possible throughout the sample, and avoid large changes within the same segment
* The AI may also replicate the volume of the audio, so we recommend adjusting it to a reasonable, balanced level that is neither too loud nor too quiet
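One rough way to spot volume fluctuation within a sample is to compare the RMS level of consecutive windows. The sketch below is an illustration only, not how the model evaluates audio: it assumes the samples are already decoded as 16-bit mono PCM integers, the function names are hypothetical, and no numeric threshold is implied by the guidance above.

```python
import math

FULL_SCALE = 32768  # magnitude of full scale for 16-bit PCM


def window_levels_db(samples, window):
    """RMS level in dBFS for each consecutive, non-overlapping window."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        # Floor at one quantization step so digital silence avoids log10(0).
        levels.append(20 * math.log10(max(rms, 1.0) / FULL_SCALE))
    return levels


def loudness_spread_db(samples, window):
    """Difference between the loudest and quietest window, in dB."""
    levels = window_levels_db(samples, window)
    return max(levels) - min(levels) if levels else 0.0
```

A large spread suggests the volume changes noticeably across the sample. Natural pauses also count as quiet windows, so a longer window (for example, one or two seconds of samples) gives a steadier picture than a short one.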

{% hint style="warning" %}
V2 series models **(V2.9) support only Chinese and English**. When using a V2 series model, make sure the input text contains no characters from other languages, such as Japanese or Korean; otherwise, **generation may fail** or other issues may occur.

Starting with the V3 series, we added Cantonese, Japanese, Korean, French, German, Spanish, and Portuguese alongside Chinese and English, plus more than 30 accent variants across these languages. Make sure your text content matches the languages supported by the model version you use.
{% endhint %}
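For the V2 restriction above, input text can be screened for Japanese kana and Korean Hangul before it is submitted. This is a rough sketch with a hypothetical function name. It relies on Unicode block ranges only, so it cannot tell Japanese kanji from Chinese characters (they share the same code points), and it does not check for other languages such as French or German.

```python
# Unicode blocks used as a rough signal for Japanese kana and Korean Hangul.
UNSUPPORTED_V2_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def find_kana_or_hangul(text):
    """Return (index, character) pairs for kana or Hangul found in text."""
    hits = []
    for index, char in enumerate(text):
        code = ord(char)
        if any(low <= code <= high for low, high in UNSUPPORTED_V2_RANGES):
            hits.append((index, char))
    return hits
```

An empty result means no kana or Hangul was found; it does not guarantee that a V2 model will accept the text.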
