Important Notes
Some important notes and best practices for voice cloning
Instant Cloning/Style Guide Sample Considerations
As mentioned in the Overview, if the voice sample you provide is highly distinctive and our AI has not learned similar voices before, the generated audio may be of poor quality or may fail to reproduce the voice faithfully.
We recommend using 5-8 seconds of clear speech that is free of reverb, echo, and background noise. For file quality, use audio with a source bit rate of at least 128kbps so that the sample carries as much of the original information as possible.
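You can check the duration and bit-rate recommendations before uploading. The following Python sketch is only an illustration and not part of the service. The function name `check_instant_sample` is hypothetical. It reads only uncompressed PCM WAV files through the standard-library `wave` module and computes the bit rate as sample rate × bits per sample × channels. Compressed formats such as MP3 record their bit rate in their own headers, so checking them would require a third-party library. The sketch cannot detect reverb, echo, or background noise, so you still need to listen to the recording for those.

```python
import wave

# Recommended values from this guide; adjust if the service's guidance changes.
MIN_SECONDS, MAX_SECONDS = 5.0, 8.0
MIN_KBPS = 128


def check_instant_sample(path: str) -> list[str]:
    """Return warnings for an uncompressed PCM WAV sample (empty list if none).

    Hypothetical helper for illustration; not an official tool.
    """
    warnings = []
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample
    duration = frames / rate
    # For uncompressed PCM: bit rate = sample rate * bits per sample * channels.
    kbps = rate * width * 8 * channels / 1000
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        warnings.append(f"duration {duration:.1f}s is outside {MIN_SECONDS}-{MAX_SECONDS}s")
    if kbps < MIN_KBPS:
        warnings.append(f"bit rate {kbps:.0f} kbps is below {MIN_KBPS} kbps")
    return warnings
```

For example, `check_instant_sample("my_sample.wav")` (the file name is a placeholder) returns an empty list for a 6-second, 16 kHz, 16-bit mono recording, because its PCM bit rate is 256 kbps.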
Professional Cloning Considerations
Before you start Professional Cloning, prepare one or more audio sample files for cloning.
Audio sample files need to meet the following requirements:
The combined duration of all audio sample files should be at least 1 minute and at most 60 minutes. Within this range, a longer total duration generally produces a better clone.
Each audio file must be in WAV, MP3, MP4, FLAC, M4A, or OGG format.
Provide the highest-quality audio you can. The audio must contain clearly recognizable Chinese or English speech and, for best results, should be free of reverb, echo, and background noise.
Once the sample files are ready, pack them into a single unencrypted ZIP archive. The archive must not exceed 256MB.
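You can sketch the packaging requirements above as a pre-upload check. The following `package_samples` function is hypothetical and is not an official tool. It makes three assumptions. First, it takes durations you have measured yourself, because decoding MP3, M4A, or OGG requires a third-party library. Second, it reads "256MB" as 256 × 10^6 bytes, the stricter interpretation. Third, it relies on the fact that Python's `zipfile` module writes archives without encryption.

```python
import os
import zipfile

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".flac", ".m4a", ".ogg"}
MIN_TOTAL_S, MAX_TOTAL_S = 60, 60 * 60  # 1 minute to 60 minutes combined
MAX_ZIP_BYTES = 256 * 1000 * 1000       # assumption: stricter reading of "256MB"


def package_samples(samples: dict[str, float], zip_path: str) -> int:
    """Validate samples and pack them into an unencrypted ZIP archive.

    `samples` maps each file path to its duration in seconds, measured
    beforehand. Returns the archive size in bytes and raises ValueError
    on any violation. Hypothetical helper for illustration only.
    """
    for path in samples:
        ext = os.path.splitext(path)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"unsupported format: {path}")
    total = sum(samples.values())
    if not MIN_TOTAL_S <= total <= MAX_TOTAL_S:
        raise ValueError(f"total duration {total:.0f}s is outside 60-3600s")
    # zipfile cannot write encrypted archives, so the result is unencrypted.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in samples:
            zf.write(path, arcname=os.path.basename(path))
    size = os.path.getsize(zip_path)
    if size > MAX_ZIP_BYTES:
        raise ValueError(f"archive is {size} bytes, over the 256MB limit")
    return size
```

The function stores files under their base names, so rename any samples that share a name across different folders before you pack them.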
General Considerations
Our AI voice model tries to imitate everything it hears in the audio. This includes the speaker's tone, speed, accent, breathing patterns, intensity, background noise, vocal noises, and hesitant pauses. The AI may imitate anything present in the sample audio and carry it into the final synthesis.
For example, if you speak slowly and flatly, the result will usually sound the same. If you speak in an excited, fast manner, the AI will try to reproduce that too.
Above all, keep your voice performance as consistent as possible throughout the sample. If the first 2 seconds are excited and fast, the rest of the sample should keep a similar performance in tone, speed, volume, and other respects. If your performance fluctuates too much within one sample, it may confuse the AI and make each generation less predictable.
In summary:
Voice performance, accent, and recording quality strongly affect the final cloning result.
For instant cloning, audio length matters less, but we recommend at least 5 seconds so that the sample contains enough information.
Keep voice performance and recording quality consistent throughout the sample, and avoid large changes within the same segment.
The AI may also reproduce the volume of the audio, so keep the level in a balanced range that is neither too loud nor too soft.
Currently, we support only Chinese and English sample audio. Make sure your sample contains clearly recognizable Chinese or English speech and no other languages. Otherwise, character creation may fail or other problems may occur.
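You can roughly approximate the volume and consistency advice above with simple level measurements. This Python sketch assumes a 16-bit PCM WAV file. Its thresholds (`CLIP_PEAK_DBFS`, `QUIET_RMS_DBFS`, `MAX_SPREAD_DB`) are hypothetical values chosen for illustration, because the guide gives no numeric targets. The sketch measures the peak level, the overall RMS level, and the spread between the loudest and quietest one-second windows. That spread is only a coarse proxy for consistent volume. The sketch does not assess tone, speed, accent, or language.

```python
import math
import struct
import wave

# Hypothetical thresholds for illustration only; the guide gives no numbers.
CLIP_PEAK_DBFS = -1.0   # peaks above this suggest the audio is too loud or clipped
QUIET_RMS_DBFS = -35.0  # an overall RMS below this suggests the audio is too soft
MAX_SPREAD_DB = 10.0    # allowed gap between loudest and quietest 1-second windows


def _dbfs(level: float) -> float:
    """Convert a 16-bit sample magnitude to dB relative to full scale (32768)."""
    return 20 * math.log10(level / 32768) if level > 0 else float("-inf")


def _rms(values) -> float:
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0


def level_report(path: str) -> dict:
    """Measure peak, RMS, and per-second level spread of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        window = wav.getframerate() * wav.getnchannels()  # samples per second
        raw = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)  # little-endian int16
    peak_db = _dbfs(max((abs(s) for s in samples), default=0))
    rms_db = _dbfs(_rms(samples))
    # Level of each full one-second window; fully silent windows are ignored.
    per_second = [_dbfs(_rms(samples[i:i + window]))
                  for i in range(0, len(samples) - window + 1, window)]
    audible = [d for d in per_second if d != float("-inf")]
    spread_db = max(audible) - min(audible) if audible else 0.0

    warnings = []
    if peak_db > CLIP_PEAK_DBFS:
        warnings.append("peaks near full scale: may be too loud or clipped")
    if rms_db < QUIET_RMS_DBFS:
        warnings.append("low average level: may be too soft")
    if spread_db > MAX_SPREAD_DB:
        warnings.append("level varies a lot between seconds: keep volume consistent")
    return {"peak_dbfs": peak_db, "rms_dbfs": rms_db,
            "spread_db": spread_db, "warnings": warnings}
```

A recording that starts loud and then trails off to a whisper produces a large `spread_db`, which is the kind of within-sample change this guide advises against.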
Do not use our services to clone or generate any content that infringes copyright, is unethical, or violates the laws and regulations of the People's Republic of China or your local jurisdiction. All content we generate is subject to detailed logging, automatic and manual review, and traceable invisible audio watermarks. If we find that you have violated these rules, we reserve the right to terminate your service and report the violation to government agencies and other institutions.
For more information, see the Service Agreement, Account Agreement, and Privacy Statement.