Create a voice from audio

Overview

Creates a custom voice by cloning it from an audio file that you supply as an HTTPS URL. The voice is available immediately after creation.

Settings by Model

Settings vary depending on the model you choose. The modelId determines which provider (ElevenLabs or Minimax) handles TTS at generation time.

ElevenLabs V3 (eleven_v3)

Model: eleven_v3

Settings:

voiceStability (0-1, default: 0.5) — Higher = more consistent, lower = more expressive
voiceSimilarity (0-1, default: 0.5) — Clarity and similarity to the original voice
voiceStyle (0-1, default: 0.0) — Style exaggeration. Higher values are more expressive but slower.
voiceSpeed (0.7-1.2, default: 1.0) — Playback speed

Speaker Boost is not available for this model.

ElevenLabs models (eleven_multilingual_v2, etc.)

Models: eleven_multilingual_v2 (recommended), eleven_multilingual_v1, eleven_monolingual_v1, eleven_turbo_v2, eleven_turbo_v2_5, eleven_flash_v2_5

Settings:

voiceStability (0-1, default: 0.8) — Higher = more consistent, lower = more expressive
voiceSimilarity (0-1, default: 0.5) — Clarity and similarity to the original voice
voiceStyle (0-1, default: 0.0) — Style exaggeration. Higher values are more expressive but slower.
voiceSpeed (0.7-1.2, default: 1.0) — Playback speed
speakerBoost (boolean, default: true) — Enhances speaker similarity

Minimax models (speech-*)

Models: speech-2.8-hd (recommended), speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-02-hd, speech-02-turbo

Settings:

voiceSpeed (0.7-1.2, default: 1.0) — Playback speed
languageBoost (enum, default: “auto”) — Boost a specific language for better pronunciation
emotion (enum, default: “auto”) — Voice emotion: happy, sad, angry, fearful, disgusted, surprised, neutral
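
The per-model settings above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names `allowed_settings` and `validate_settings` are hypothetical, the ranges are copied from the lists above, and the sketch assumes that every `eleven_*` model other than `eleven_v3` shares one settings table and that every `speech-*` model shares another. Enum and boolean settings are only checked for availability, not for value.

```python
from typing import Any

# Allowed settings per model family, transcribed from the lists above.
# Numeric settings map to (min, max); None marks a non-numeric setting.
ELEVEN_V3 = {
    "voiceStability": (0.0, 1.0),
    "voiceSimilarity": (0.0, 1.0),
    "voiceStyle": (0.0, 1.0),
    "voiceSpeed": (0.7, 1.2),
}
ELEVEN_OTHER = {**ELEVEN_V3, "speakerBoost": None}
MINIMAX = {"voiceSpeed": (0.7, 1.2), "languageBoost": None, "emotion": None}


def allowed_settings(model_id: str) -> dict[str, Any]:
    """Return the settings table for a model ID (hypothetical helper)."""
    if model_id == "eleven_v3":
        return ELEVEN_V3
    if model_id.startswith("eleven_"):
        return ELEVEN_OTHER
    if model_id.startswith("speech-"):
        return MINIMAX
    raise ValueError(f"unknown modelId: {model_id}")


def validate_settings(model_id: str, settings: dict[str, Any]) -> list[str]:
    """Collect readable problems instead of raising on the first one."""
    table = allowed_settings(model_id)
    problems = []
    for key, value in settings.items():
        if key not in table:
            problems.append(f"{key} is not available for {model_id}")
            continue
        bounds = table[key]
        if bounds is None:
            continue  # enum/boolean settings are not range-checked here
        low, high = bounds
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number")
        elif not low <= value <= high:
            problems.append(f"{key} must be between {low} and {high}")
    return problems
```

Returning a list of problems rather than raising lets a caller report every invalid field at once, for example a `speakerBoost` value passed alongside `eleven_v3`.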

Default Model

If modelId is omitted, the default is speech-2.5-hd-preview (Minimax).

Audio Requirements

Must be accessible via HTTPS URL
Duration must be between 30 seconds and 4 minutes
Supported formats: MP3, WAV, M4A
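
These requirements can be pre-checked locally before calling the API. The sketch below is illustrative only: `check_audio` is a hypothetical helper, the duration is assumed to be known already (measuring it would need an audio library), the 30-second and 4-minute bounds are assumed to be inclusive, and the format is guessed from the file extension in the URL path, which not every valid URL carries.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MIN_SECONDS = 30
MAX_SECONDS = 4 * 60


def check_audio(audio_url: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not audio_url.startswith("https://"):
        problems.append("audioUrl must use HTTPS")
    # Look only at the URL path so query strings do not hide the extension.
    extension = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unrecognized file extension: {extension!r}")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration must be between 30 seconds and 4 minutes")
    return problems
```

A passing local check does not guarantee acceptance, since the server may inspect the actual file contents rather than the URL.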

Voice Status

The created voice is returned with status IDLE, meaning it is ready to be used in video generation immediately.

Authorizations

x-api-key

string

header

required

API key to be included in the x-api-key header

Body

application/json

name

string

required

Name of the voice

Required string length: 1 - 256

audioUrl

string<uri>

required

HTTPS URL to the audio file for voice cloning (MP3, WAV, M4A). Duration must be between 30 seconds and 4 minutes.

Pattern: ^https://.*

modelId

enum<string>

Voice model to use for TTS. Determines which provider (ElevenLabs or Minimax) and which settings are available.

Available options: eleven_multilingual_v2, eleven_multilingual_v1, eleven_monolingual_v1, eleven_turbo_v2, eleven_turbo_v2_5, eleven_flash_v2_5, eleven_v3, speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-2.6-hd, speech-2.6-turbo, speech-2.8-hd, speech-2.8-turbo

settings

object

Model-specific voice settings. Available fields depend on the chosen modelId. See the Settings by Model section for details.

language

enum<string>

Available options: ENGLISH, SPANISH, FRENCH, PORTUGUESE, GERMAN, RUSSIAN, HINDI, CHINESE, DUTCH, ARABIC, POLISH, BULGARIAN, JAPANESE, ITALIAN

gender

enum<string>

Available options: MALE, FEMALE
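
Putting the authorization header and body fields together, a request could be assembled as in the sketch below. Several things here are assumptions and not taken from this page: the endpoint URL `https://api.example.com/voices` is a placeholder, the HTTP method is assumed to be POST, and `build_create_voice_request` is a hypothetical helper. The sketch only builds the request object and does not send it.

```python
import json
import urllib.request
from typing import Any, Optional

# Placeholder endpoint: the real URL is not stated on this page.
ENDPOINT = "https://api.example.com/voices"


def build_create_voice_request(
    api_key: str,
    name: str,
    audio_url: str,
    model_id: Optional[str] = None,
    settings: Optional[dict[str, Any]] = None,
    language: Optional[str] = None,
    gender: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a create-voice request."""
    if not 1 <= len(name) <= 256:
        raise ValueError("name must be 1 to 256 characters long")
    if not audio_url.startswith("https://"):
        raise ValueError("audioUrl must match ^https://.*")

    # Required fields first; optional fields are included only when set,
    # so an omitted modelId falls back to the server-side default.
    body: dict[str, Any] = {"name": name, "audioUrl": audio_url}
    optional = {
        "modelId": model_id,
        "settings": settings,
        "language": language,
        "gender": gender,
    }
    body.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",  # assumed; the method is not stated on this page
    )
```

Once the real endpoint is substituted, the returned object could be passed to `urllib.request.urlopen` to send the request.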

Response

Voice created successfully

string<uuid>

name

string

createdAt

string<date-time>

updatedAt

string<date-time>

status

string

sampleUrl

string

language

enum<string>

Available options: ENGLISH, SPANISH, FRENCH, PORTUGUESE, GERMAN, RUSSIAN, HINDI, CHINESE, DUTCH, ARABIC, POLISH, BULGARIAN, JAPANESE, ITALIAN

gender

enum<string>

Available options: MALE, FEMALE
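
A response with these fields could be parsed into a typed object as sketched below. The `Voice` class and `parse_voice` function are hypothetical, and the key `id` for the `string<uuid>` field is an assumption because the field name does not appear in the listing above. The sketch also assumes that `sampleUrl`, `language`, and `gender` may be absent.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voice:
    id: str  # assumed key name for the string<uuid> field
    name: str
    status: str
    created_at: str
    updated_at: str
    sample_url: Optional[str] = None
    language: Optional[str] = None
    gender: Optional[str] = None

    @property
    def is_ready(self) -> bool:
        # IDLE means the voice can be used in video generation.
        return self.status == "IDLE"


def parse_voice(raw: str) -> Voice:
    """Parse a JSON response body into a Voice (hypothetical helper)."""
    data = json.loads(raw)
    return Voice(
        id=data["id"],
        name=data["name"],
        status=data["status"],
        created_at=data["createdAt"],
        updated_at=data["updatedAt"],
        sample_url=data.get("sampleUrl"),
        language=data.get("language"),
        gender=data.get("gender"),
    )
```

Checking `is_ready` after parsing gives a single place to confirm the IDLE status before the voice is used elsewhere.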
