Creates a custom voice by cloning from an audio file.
modelId determines which provider (ElevenLabs or Minimax) handles TTS at generation time.
ElevenLabs V3 (eleven_v3)
eleven_v3
Settings:
voiceStability (0-1, default: 0.5): Higher = more consistent, lower = more expressive
voiceSimilarity (0-1, default: 0.5): Clarity and similarity to the original voice
voiceStyle (0-1, default: 0.0): Style exaggeration. Higher values are more expressive but slower.
voiceSpeed (0.7-1.2, default: 1.0): Playback speed

ElevenLabs models (eleven_multilingual_v2, etc.)
eleven_multilingual_v2 (recommended), eleven_multilingual_v1, eleven_monolingual_v1, eleven_turbo_v2, eleven_turbo_v2_5, eleven_flash_v2_5
Settings:
voiceStability (0-1, default: 0.8): Higher = more consistent, lower = more expressive
voiceSimilarity (0-1, default: 0.5): Clarity and similarity to the original voice
voiceStyle (0-1, default: 0.0): Style exaggeration. Higher values are more expressive but slower.
voiceSpeed (0.7-1.2, default: 1.0): Playback speed
speakerBoost (boolean, default: true): Enhances speaker similarity

Minimax models (speech-*)
speech-2.8-hd (recommended), speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-02-hd, speech-02-turbo
Settings:
voiceSpeed (0.7-1.2, default: 1.0): Playback speed
languageBoost (enum, default: "auto"): Boost a specific language for better pronunciation
emotion (enum, default: "auto"): Voice emotion: happy, sad, angry, fearful, disgusted, surprised, neutral

If modelId is omitted, the default is speech-2.5-hd-preview (Minimax).
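The per-model settings above can be sketched as a small lookup that fills in defaults and rejects fields a model does not support. This is a minimal illustration, not the service's own validation: the function names are hypothetical, accepting "auto" as an emotion value is an assumption based on the stated default, and languageBoost values are not checked because the enum is not listed here.

```python
from typing import Any, Optional

DEFAULT_MODEL_ID = "speech-2.5-hd-preview"  # used when modelId is omitted

ELEVEN_V2_FAMILY = {
    "eleven_multilingual_v2", "eleven_multilingual_v1", "eleven_monolingual_v1",
    "eleven_turbo_v2", "eleven_turbo_v2_5", "eleven_flash_v2_5",
}
MINIMAX_MODELS = {
    "speech-2.8-hd", "speech-2.8-turbo", "speech-2.6-hd", "speech-2.6-turbo",
    "speech-2.5-hd-preview", "speech-2.5-turbo-preview",
    "speech-02-hd", "speech-02-turbo",
}
# Assumption: "auto" is accepted alongside the listed emotions.
EMOTIONS = {"auto", "happy", "sad", "angry", "fearful",
            "disgusted", "surprised", "neutral"}

# Numeric settings and their documented (min, max) ranges.
RANGES = {
    "voiceStability": (0.0, 1.0),
    "voiceSimilarity": (0.0, 1.0),
    "voiceStyle": (0.0, 1.0),
    "voiceSpeed": (0.7, 1.2),
}


def default_settings(model_id: Optional[str] = None) -> dict[str, Any]:
    """Return the documented defaults for the given model family."""
    model_id = model_id or DEFAULT_MODEL_ID
    if model_id == "eleven_v3":
        return {"voiceStability": 0.5, "voiceSimilarity": 0.5,
                "voiceStyle": 0.0, "voiceSpeed": 1.0}
    if model_id in ELEVEN_V2_FAMILY:
        return {"voiceStability": 0.8, "voiceSimilarity": 0.5,
                "voiceStyle": 0.0, "voiceSpeed": 1.0, "speakerBoost": True}
    if model_id in MINIMAX_MODELS:
        return {"voiceSpeed": 1.0, "languageBoost": "auto", "emotion": "auto"}
    raise ValueError(f"unknown modelId: {model_id}")


def resolve_settings(model_id: Optional[str] = None,
                     overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Merge overrides into the defaults, checking names and ranges."""
    settings = default_settings(model_id)
    for key, value in (overrides or {}).items():
        if key not in settings:
            raise ValueError(f"{key} is not available for this model")
        if key in RANGES:
            low, high = RANGES[key]
            if not low <= value <= high:
                raise ValueError(f"{key} must be between {low} and {high}")
        elif key == "speakerBoost" and not isinstance(value, bool):
            raise ValueError("speakerBoost must be a boolean")
        elif key == "emotion" and value not in EMOTIONS:
            raise ValueError(f"unsupported emotion: {value}")
        settings[key] = value
    return settings
```

Calling `resolve_settings("eleven_v3", {"voiceSpeed": 1.1})` returns the V3 defaults with only the speed changed, while passing `speakerBoost` for that model raises an error because the field is only listed for the other ElevenLabs models.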
The new voice is created with status IDLE, meaning it is ready to be used in video generation immediately.

API key to be included in the x-api-key header
Name of the voice
Length: 1 - 256

HTTPS URL to the audio file for voice cloning (MP3, WAV, M4A). Duration must be between 30 seconds and 4 minutes.
Pattern: ^https://.*

Voice model to use for TTS. Determines which provider (ElevenLabs or Minimax) and which settings are available.
Allowed values: eleven_multilingual_v2, eleven_multilingual_v1, eleven_monolingual_v1, eleven_turbo_v2, eleven_turbo_v2_5, eleven_flash_v2_5, eleven_v3, speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-2.6-hd, speech-2.6-turbo, speech-2.8-hd, speech-2.8-turbo

Model-specific voice settings. Available fields depend on the chosen modelId. See the Settings by Model section for details.
Language, one of: ENGLISH, SPANISH, FRENCH, PORTUGUESE, GERMAN, RUSSIAN, HINDI, CHINESE, DUTCH, ARABIC, POLISH, BULGARIAN, JAPANESE, ITALIAN

Gender, one of: MALE, FEMALE

Response: Voice created successfully
Language, one of: ENGLISH, SPANISH, FRENCH, PORTUGUESE, GERMAN, RUSSIAN, HINDI, CHINESE, DUTCH, ARABIC, POLISH, BULGARIAN, JAPANESE, ITALIAN

Gender, one of: MALE, FEMALE
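As a rough illustration of how the request constraints above fit together, the sketch below checks the inputs locally and builds (but does not send) a request. Several details are assumptions not stated in this reference: the endpoint URL is a placeholder, the POST method is inferred from the "creates" wording, and the body field names name, audioUrl, settings, language, and gender are hypothetical. Only modelId, the x-api-key header, and the listed constraints come from the text.

```python
import json
import re
import urllib.request
from typing import Any, Optional

API_URL = "https://api.example.com/voices"  # placeholder, not the real endpoint

MODEL_IDS = {
    "eleven_multilingual_v2", "eleven_multilingual_v1", "eleven_monolingual_v1",
    "eleven_turbo_v2", "eleven_turbo_v2_5", "eleven_flash_v2_5", "eleven_v3",
    "speech-02-hd", "speech-02-turbo", "speech-2.5-hd-preview",
    "speech-2.5-turbo-preview", "speech-2.6-hd", "speech-2.6-turbo",
    "speech-2.8-hd", "speech-2.8-turbo",
}
LANGUAGES = {
    "ENGLISH", "SPANISH", "FRENCH", "PORTUGUESE", "GERMAN", "RUSSIAN", "HINDI",
    "CHINESE", "DUTCH", "ARABIC", "POLISH", "BULGARIAN", "JAPANESE", "ITALIAN",
}
GENDERS = {"MALE", "FEMALE"}


def build_create_voice_request(
    api_key: str,
    name: str,
    audio_url: str,
    model_id: Optional[str] = None,
    settings: Optional[dict[str, Any]] = None,
    language: Optional[str] = None,
    gender: Optional[str] = None,
) -> urllib.request.Request:
    """Validate inputs against the documented constraints and build a request."""
    if not 1 <= len(name) <= 256:
        raise ValueError("name must be 1 to 256 characters long")
    if not re.match(r"^https://.*", audio_url):
        raise ValueError("audio URL must start with https://")
    if model_id is not None and model_id not in MODEL_IDS:
        raise ValueError(f"unknown modelId: {model_id}")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if gender is not None and gender not in GENDERS:
        raise ValueError(f"unsupported gender: {gender}")

    # Field names other than modelId are hypothetical.
    body: dict[str, Any] = {"name": name, "audioUrl": audio_url}
    if model_id is not None:
        body["modelId"] = model_id  # omitted means the server-side default applies
    if settings:
        body["settings"] = settings
    if language is not None:
        body["language"] = language
    if gender is not None:
        body["gender"] = gender

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",  # assumption: creation uses POST
    )
```

The function only constructs the request object; passing it to `urllib.request.urlopen` would perform the call. The audio duration limit (30 seconds to 4 minutes) cannot be checked from the URL alone, so it is left to the server.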