Speech Synthesis
POST /ent/v2/audio-tts
- Documentation: https://platform.vidu.cn/docs/speech-synthesis
Authorizations
bearer
Type
HTTP (bearer)
Request Body
application/json
text
string
Required
Text to synthesize
- Must be under 10000 characters
- Use line breaks to separate paragraphs
- Pause control: insert <#x#>, where x is the pause duration in seconds, in the range [0.01, 99.99] with at most 2 decimal places. Place markers between speakable segments; do not chain multiple pause markers
- Example: Hello<#2#>I am vidu<#2#>Nice to meet you
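The pause rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names `pause` and `validate_text` are hypothetical, and treating "chained" as two markers separated only by whitespace is an assumption about how the service interprets the rule.

```python
import re

MARKER_RE = re.compile(r"<#(.*?)#>")              # any <#...#> marker
VALUE_RE = re.compile(r"\d{1,2}(\.\d{1,2})?")     # at most 2 decimal places
CHAINED_RE = re.compile(r"#>\s*<#")               # assumption: adjacent markers count as chained


def pause(seconds):
    """Format a pause marker, enforcing the documented [0.01, 99.99] range."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be within [0.01, 99.99] seconds")
    if round(seconds, 2) != seconds:
        raise ValueError("pause allows at most 2 decimal places")
    return f"<#{seconds:g}#>"


def validate_text(text):
    """Return a list of rule violations; an empty list means the text looks valid."""
    problems = []
    if len(text) >= 10000:
        problems.append("text must be under 10000 characters")
    if CHAINED_RE.search(text):
        problems.append("pause markers must not be chained")
    for match in MARKER_RE.finditer(text):
        value = match.group(1)
        if not VALUE_RE.fullmatch(value) or not 0.01 <= float(value) <= 99.99:
            problems.append(f"invalid pause marker: {match.group(0)}")
    return problems


text = "Hello" + pause(2) + "I am vidu" + pause(2) + "Nice to meet you"
print(validate_text(text))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a long script at once.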
voice_setting_voice_id
string
Required
Voice ID for synthesis
See voice list: https://shengshu.feishu.cn/sheets/EgFvs6DShhiEBStmjzccr5gonOg
voice_setting_speed
string
Speech rate, default 1.0
1.0 is normal speed. Range [0.5, 2]: 0.5 is slowest, 2 is fastest
voice_setting_volume
string
Volume
Range [0, 10], default 0 (normal). Higher values are louder
voice_setting_pitch
string
Pitch
Range [-12, 12], default 0 (original voice)
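Because speed, volume, and pitch are all typed as strings but carry numeric ranges, a small helper can validate the numbers and then serialize them. This is a sketch under assumptions: the function name `voice_settings` is hypothetical, and sending plain decimal strings such as "1.0" is a guess at the expected encoding, since the source does not show a sample value.

```python
# Documented (min, max) range for each numeric voice setting.
RANGES = {
    "voice_setting_speed": (0.5, 2.0),
    "voice_setting_volume": (0.0, 10.0),
    "voice_setting_pitch": (-12.0, 12.0),
}


def voice_settings(speed=1.0, volume=0, pitch=0):
    """Validate the settings against the documented ranges and return string values."""
    values = {
        "voice_setting_speed": speed,
        "voice_setting_volume": volume,
        "voice_setting_pitch": pitch,
    }
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    # Assumption: the API accepts the plain decimal representation as a string.
    return {name: str(value) for name, value in values.items()}


print(voice_settings(speed=1.25, pitch=-3))
```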
voice_setting_emotion
string
Emotion for synthesized speech
- Allowed: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"
- The model usually matches the emotion to the text automatically
pronunciation_dict_tone
string
Pronunciation overrides for polyphones
- Defines replacement readings for specific words or characters; for Chinese, write tones as digits 1–5
- Example:
["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]
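The entries in the example follow a `word/reading` pattern, with each pinyin syllable wrapped in parentheses. The sketch below builds such entries and reproduces the example list. The helper names are hypothetical, and encoding the list as a JSON string is an assumption made only because the field is typed as a string; the service may instead expect a JSON array.

```python
import json


def pinyin_entry(word, *syllables):
    """Build an entry such as '达菲/(da2)(fei1)' from tone-numbered syllables."""
    return word + "/" + "".join(f"({syllable})" for syllable in syllables)


def alias_entry(word, spoken):
    """Build an entry such as 'omg/oh my god' that replaces a word with spoken text."""
    return f"{word}/{spoken}"


entries = [
    pinyin_entry("燕少飞", "yan4", "shao3", "fei1"),
    pinyin_entry("达菲", "da2", "fei1"),
    alias_entry("omg", "oh my god"),
]

# Assumption: the string-typed field carries the list as JSON text.
pronunciation_dict_tone = json.dumps(entries, ensure_ascii=False)
print(pronunciation_dict_tone)
```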
payload
string
Passthrough parameter
Not processed by the service; used for data transfer only
Note: maximum 1048576 characters
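Putting the request body together, a call to this endpoint might be assembled as follows. This is a sketch using only the Python standard library. The host in `BASE_URL` is a placeholder because the source gives only the path, the voice ID shown is hypothetical (real IDs come from the voice list), and `build_tts_request` is an illustrative name, not part of any SDK.

```python
import json
import urllib.request

# Placeholder host: replace with the real API host, which the source does not state.
BASE_URL = "https://api.example.invalid"
ENDPOINT = "/ent/v2/audio-tts"
ALLOWED_EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"}
MAX_PAYLOAD_CHARS = 1048576


def build_tts_request(api_key, text, voice_id, emotion=None, payload=None):
    """Build (but do not send) a POST request for the speech synthesis endpoint."""
    if emotion is not None and emotion not in ALLOWED_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    if payload is not None and len(payload) > MAX_PAYLOAD_CHARS:
        raise ValueError("payload exceeds 1048576 characters")
    # Only the two required fields are always present; optional ones are added on demand.
    body = {"text": text, "voice_setting_voice_id": voice_id}
    if emotion is not None:
        body["voice_setting_emotion"] = emotion
    if payload is not None:
        body["payload"] = payload
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "voice-id-placeholder" is hypothetical; pick a real ID from the voice list.
request = build_tts_request("YOUR_API_KEY", "Hello<#2#>I am vidu", "voice-id-placeholder", emotion="calm")
print(request.get_method(), request.full_url)
```

Sending the request is then a matter of passing it to `urllib.request.urlopen(request)` once the real host and a valid key are in place.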
Responses
Success
application/json
object
task_id
string
Required
state
string
Required
model
string
Required
prompt
string
Required
duration
integer
Required
seed
integer
Required
created_at
string
Required
credits
integer
Required
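Since every field in the success response is marked as required, a client can parse the body into a typed record and fail early when a field is absent. The sketch below assumes the fields arrive as top-level JSON keys. The `TtsTask` and `parse_tts_response` names are hypothetical, and the sample values are invented purely for illustration; the source does not document the possible `state` values or the `created_at` format.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TtsTask:
    task_id: str
    state: str
    model: str
    prompt: str
    duration: int
    seed: int
    created_at: str
    credits: int


def parse_tts_response(raw):
    """Parse a success response body, raising if any required field is missing."""
    data = json.loads(raw)
    missing = [f.name for f in fields(TtsTask) if f.name not in data]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Ignore unknown keys so that fields added later do not break parsing.
    return TtsTask(**{f.name: data[f.name] for f in fields(TtsTask)})


# Invented sample values, for illustration only.
sample = json.dumps({
    "task_id": "123", "state": "created", "model": "example-model",
    "prompt": "Hello", "duration": 3, "seed": 0,
    "created_at": "2025-01-01T00:00:00Z", "credits": 2,
})
task = parse_tts_response(sample)
print(task.task_id, task.state)
```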