POST /voices
curl --request POST \
  --url https://api.argil.ai/v1/voices \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "name": "My Custom Voice",
  "audioUrl": "https://example.com/my-audio.mp3"
}
'
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "<string>",
  "createdAt": "2023-11-07T05:31:56Z",
  "updatedAt": "2023-11-07T05:31:56Z",
  "status": "<string>",
  "sampleUrl": "<string>",
  "language": "ENGLISH",
  "gender": "MALE"
}

Overview

Creates a custom voice by cloning an audio file that you supply as an HTTPS URL. The voice is available immediately after creation.

Settings by Model

Settings vary depending on the model you choose. The modelId determines which provider (ElevenLabs or Minimax) handles TTS at generation time.
Model: eleven_v3
Settings:
  • voiceStability (0-1, default: 0.5) — Higher = more consistent, lower = more expressive
  • voiceSimilarity (0-1, default: 0.5) — Clarity and similarity to the original voice
  • voiceStyle (0-1, default: 0.0) — Style exaggeration. Higher values are more expressive but slower.
  • voiceSpeed (0.7-1.2, default: 1.0) — Playback speed
Speaker Boost is not available for this model.
Models: eleven_multilingual_v2 (recommended), eleven_multilingual_v1, eleven_monolingual_v1, eleven_turbo_v2, eleven_turbo_v2_5, eleven_flash_v2_5
Settings:
  • voiceStability (0-1, default: 0.8) — Higher = more consistent, lower = more expressive
  • voiceSimilarity (0-1, default: 0.5) — Clarity and similarity to the original voice
  • voiceStyle (0-1, default: 0.0) — Style exaggeration. Higher values are more expressive but slower.
  • voiceSpeed (0.7-1.2, default: 1.0) — Playback speed
  • speakerBoost (boolean, default: true) — Enhances speaker similarity
Models: speech-2.8-hd (recommended), speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-02-hd, speech-02-turbo
Settings:
  • voiceSpeed (0.7-1.2, default: 1.0) — Playback speed
  • languageBoost (enum, default: “auto”) — Boost a specific language for better pronunciation
  • emotion (enum, default: “auto”) — Voice emotion: happy, sad, angry, fearful, disgusted, surprised, neutral
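
The per-model lists above can be turned into a small client-side check that runs before a request is sent. The following is a minimal sketch, not part of the Argil API: the names `allowed_fields` and `check_settings` are hypothetical, the ranges are transcribed from the lists above, and treating "auto" as an accepted `emotion` value is an assumption based on it being the documented default. This page does not say whether the server rejects or ignores unlisted settings, so the check is advisory only.

```python
# Hypothetical client-side helper. Field names and ranges are copied from the
# "Settings by Model" lists; the server's own validation may differ.
ELEVEN_V3 = {"eleven_v3"}
ELEVEN_OTHER = {
    "eleven_multilingual_v2", "eleven_multilingual_v1", "eleven_monolingual_v1",
    "eleven_turbo_v2", "eleven_turbo_v2_5", "eleven_flash_v2_5",
}
MINIMAX = {
    "speech-2.8-hd", "speech-2.8-turbo", "speech-2.6-hd", "speech-2.6-turbo",
    "speech-2.5-hd-preview", "speech-2.5-turbo-preview",
    "speech-02-hd", "speech-02-turbo",
}

# Inclusive (min, max) bounds for the numeric settings.
NUMERIC_RANGES = {
    "voiceStability": (0.0, 1.0),
    "voiceSimilarity": (0.0, 1.0),
    "voiceStyle": (0.0, 1.0),
    "voiceSpeed": (0.7, 1.2),
}

# Assumption: "auto" (the documented default) is accepted alongside the listed emotions.
EMOTIONS = {"auto", "happy", "sad", "angry", "fearful",
            "disgusted", "surprised", "neutral"}


def allowed_fields(model_id):
    """Return the setting names listed for the given modelId."""
    if model_id in ELEVEN_V3:
        return {"voiceStability", "voiceSimilarity", "voiceStyle", "voiceSpeed"}
    if model_id in ELEVEN_OTHER:
        return {"voiceStability", "voiceSimilarity", "voiceStyle",
                "voiceSpeed", "speakerBoost"}
    if model_id in MINIMAX:
        return {"voiceSpeed", "languageBoost", "emotion"}
    raise ValueError(f"unknown modelId: {model_id}")


def check_settings(model_id, settings):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    fields = allowed_fields(model_id)
    for key, value in settings.items():
        if key not in fields:
            problems.append(f"{key} is not listed for {model_id}")
        elif key in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[key]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not low <= value <= high:
                problems.append(f"{key} must be a number in [{low}, {high}]")
        elif key == "speakerBoost" and not isinstance(value, bool):
            problems.append("speakerBoost must be a boolean")
        elif key == "emotion" and value not in EMOTIONS:
            problems.append(f"emotion must be one of {sorted(EMOTIONS)}")
        # languageBoost is an enum whose values are not listed here, so it is not checked.
    return problems
```

For example, `check_settings("eleven_v3", {"speakerBoost": True})` reports one problem, because Speaker Boost is not available for that model.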

Default Model

If modelId is omitted, the default is speech-2.5-hd-preview (Minimax).

Audio Requirements

  • Must be accessible via HTTPS URL
  • Duration must be between 30 seconds and 4 minutes
  • Supported formats: MP3, WAV, M4A
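
These requirements can be partly checked before calling the endpoint. The sketch below is hypothetical (`check_audio` is not an API function). It tests the URL against the documented `^https://` pattern and, if the caller already knows the clip length, the 30 second to 4 minute window. The file-extension test is only a heuristic and an assumption on my part: a URL with no extension may still serve a supported format.

```python
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = (".mp3", ".wav", ".m4a")
MIN_SECONDS = 30
MAX_SECONDS = 4 * 60


def check_audio(audio_url, duration_seconds=None):
    """Return a list of problems found with the audio source (empty if none)."""
    problems = []
    # Mirrors the documented pattern ^https://.*
    if not audio_url.startswith("https://"):
        problems.append("audioUrl must start with https://")
    # Heuristic only: looks at the URL path, not at the actual file contents.
    path = urlparse(audio_url).path.lower()
    if not path.endswith(SUPPORTED_EXTENSIONS):
        problems.append("URL path does not end in .mp3, .wav or .m4a")
    # The duration is only checked when the caller supplies it.
    if duration_seconds is not None:
        if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
            problems.append("duration must be between 30 and 240 seconds")
    return problems
```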

Voice Status

The created voice is returned with status IDLE, meaning it is ready to be used in video generation immediately.

Authorizations

x-api-key (string, header, required)

API key to be included in the x-api-key header

Body

application/json
name (string, required)

Name of the voice

Required string length: 1 - 256

audioUrl (string<uri>, required)

HTTPS URL to the audio file for voice cloning (MP3, WAV, M4A). Duration must be between 30 seconds and 4 minutes.

Pattern: ^https://.*

modelId (enum<string>)

Voice model to use for TTS. Determines which provider (ElevenLabs or Minimax) and which settings are available.

Available options: eleven_multilingual_v2, eleven_multilingual_v1, eleven_monolingual_v1, eleven_turbo_v2, eleven_turbo_v2_5, eleven_flash_v2_5, eleven_v3, speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-2.6-hd, speech-2.6-turbo, speech-2.8-hd, speech-2.8-turbo

settings (object)

Model-specific voice settings. Available fields depend on the chosen modelId. See the Settings by Model section for details.

language (enum<string>)

Available options: ENGLISH, SPANISH, FRENCH, PORTUGUESE, GERMAN, RUSSIAN, HINDI, CHINESE, DUTCH, ARABIC, POLISH, BULGARIAN, JAPANESE, ITALIAN

gender (enum<string>)

Available options: MALE, FEMALE
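
As a rough illustration of how the body fields fit together, the sketch below assembles the request with Python's standard library. The function name `build_create_voice_request` is hypothetical. Only `name` and `audioUrl` are always sent, and optional fields are left out when not provided so that the server applies its defaults. The request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.argil.ai/v1/voices"


def build_create_voice_request(api_key, name, audio_url, model_id=None,
                               settings=None, language=None, gender=None):
    """Build (but do not send) a POST /voices request."""
    # Documented constraint: name length must be between 1 and 256.
    if not 1 <= len(name) <= 256:
        raise ValueError("name must be 1 to 256 characters long")

    payload = {"name": name, "audioUrl": audio_url}
    optional = {
        "modelId": model_id,
        "settings": settings,
        "language": language,
        "gender": gender,
    }
    # Omit unset optional fields rather than sending explicit nulls.
    payload.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Sending it would look like `with urllib.request.urlopen(request) as response: voice = json.load(response)`, where `request` is the object returned above. Error handling is left out of the sketch.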

Response

Voice created successfully

id (string<uuid>)
name (string)
createdAt (string<date-time>)
updatedAt (string<date-time>)
status (string)
sampleUrl (string)

language (enum<string>)

Available options: ENGLISH, SPANISH, FRENCH, PORTUGUESE, GERMAN, RUSSIAN, HINDI, CHINESE, DUTCH, ARABIC, POLISH, BULGARIAN, JAPANESE, ITALIAN

gender (enum<string>)

Available options: MALE, FEMALE
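
A client might map the response onto a typed object and check for the IDLE status described under Voice Status. This is a sketch under assumptions: the `Voice` class and `parse_voice` function are hypothetical, and `sampleUrl`, `language` and `gender` are treated as optional because the schema above does not say which response fields are always present.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Voice:
    id: str
    name: str
    created_at: datetime
    updated_at: datetime
    status: str
    sample_url: Optional[str] = None
    language: Optional[str] = None
    gender: Optional[str] = None

    @property
    def is_ready(self):
        # Per the Voice Status section, IDLE means the voice can be used.
        return self.status == "IDLE"


def _parse_time(value):
    # Rewrite the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_voice(body):
    """Parse the JSON text of a successful POST /voices response."""
    data = json.loads(body)
    return Voice(
        id=data["id"],
        name=data["name"],
        created_at=_parse_time(data["createdAt"]),
        updated_at=_parse_time(data["updatedAt"]),
        status=data["status"],
        sample_url=data.get("sampleUrl"),
        language=data.get("language"),
        gender=data.get("gender"),
    )
```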