Speech synthesis

POST /ent/v2/audio-tts
  • Documentation: https://platform.vidu.cn/docs/speech-synthesis

Authorizations

bearer
Type: HTTP (bearer)

Request Body

application/json
object

Text to synthesize

  1. Must be fewer than 10000 characters
  2. Use line breaks to separate paragraphs
  3. Pause control: insert <#x#>, where x is the pause duration in seconds, in the range [0.01, 99.99] with at most 2 decimal places. Place each marker between two speakable segments; do not put several pause markers back to back
  • Example: Hello<#2#>I am vidu<#2#>Nice to meet you
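
The text rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the function name `check_tts_text` is hypothetical, and it assumes that "characters" means Unicode code points (Python's `len`) and that "chained" markers are markers separated only by whitespace.

```python
import re

MAX_TEXT_CHARS = 10000  # the text must be shorter than this
PAUSE_MARKER = re.compile(r"<#([^<>#]*)#>")
# Up to two integer digits and, optionally, up to two decimal places.
PAUSE_VALUE = re.compile(r"\d{1,2}(\.\d{1,2})?")


def check_tts_text(text: str) -> list[str]:
    """Return a list of problems found in `text`; an empty list means it passed."""
    problems = []
    if len(text) >= MAX_TEXT_CHARS:
        problems.append("text must be under 10000 characters")

    markers = list(PAUSE_MARKER.finditer(text))
    previous_end = None
    for match in markers:
        value = match.group(1)
        if not PAUSE_VALUE.fullmatch(value) or not 0.01 <= float(value) <= 99.99:
            problems.append(f"bad pause duration: {match.group(0)}")
        # Assumption: a marker is "chained" when only whitespace
        # separates it from the previous marker.
        if previous_end is not None and not text[previous_end:match.start()].strip():
            problems.append(f"chained pause marker at offset {match.start()}")
        previous_end = match.end()

    # Markers should sit between speakable segments, so there must be
    # some text before the first marker and after the last one.
    if markers:
        before = text[:markers[0].start()].strip()
        after = text[markers[-1].end():].strip()
        if not before or not after:
            problems.append("pause marker must sit between speakable segments")
    return problems
```

Calling `check_tts_text("Hello<#2#>I am vidu<#2#>Nice to meet you")` returns an empty list, while a string such as `"Hello<#2#><#3#>world"` is reported as containing a chained marker.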

Voice ID for synthesis
See voice list: https://shengshu.feishu.cn/sheets/EgFvs6DShhiEBStmjzccr5gonOg

Speech rate, default 1.0
Range [0.5, 2]; 1.0 is normal speed, 0.5 is the slowest and 2 is the fastest

Volume
Range [0, 10], default 0 (normal volume). Higher values are louder

Pitch
Range [-12, 12], default 0 (original voice)
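
The numeric ranges for speech rate, volume, and pitch can be enforced before sending a request. This is a sketch only: the class `VoiceSettings` and its attribute names `speed`, `volume`, and `pitch` are hypothetical and are not the API's field names, which this section does not state. Whether the service expects integers or decimals for volume and pitch is also not stated here.

```python
from dataclasses import dataclass

# (attribute, lowest allowed, highest allowed), taken from the ranges above
RANGES = (
    ("speed", 0.5, 2),
    ("volume", 0, 10),
    ("pitch", -12, 12),
)


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container; attribute names are illustrative only."""

    speed: float = 1.0   # 1.0 is normal speed
    volume: float = 0    # 0 is normal volume
    pitch: float = 0     # 0 keeps the original voice

    def __post_init__(self) -> None:
        # Reject any value outside its documented closed interval.
        for name, low, high in RANGES:
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
```

`VoiceSettings()` yields the documented defaults, and `VoiceSettings(speed=2.5)` raises `ValueError` instead of letting an out-of-range value reach the API.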

Emotion for synthesized speech

  1. Allowed values: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"
  2. The model usually matches the emotion to the text automatically

Pronunciation overrides for polyphones

  • Rules that override how specific text is read, each written as text/reading; for Chinese, tones are written as the digits 1–5
  • Example:
    ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]
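
Entries in the format shown in the example can be generated rather than typed by hand. The helpers below are a sketch with hypothetical names (`pinyin_rule`, `alias_rule`); the syllable check assumes lowercase pinyin letters followed by a single tone digit from 1 to 5, which matches the example but may be stricter or looser than what the service accepts.

```python
import re

# Assumption: a syllable is lowercase letters (including "ü") plus a tone digit 1-5.
PINYIN_SYLLABLE = re.compile(r"[a-zü]+[1-5]")


def pinyin_rule(word: str, syllables: list[str]) -> str:
    """Build an entry such as '达菲/(da2)(fei1)' from a word and its syllables."""
    if "/" in word:
        raise ValueError("word must not contain '/'")
    for syllable in syllables:
        if not PINYIN_SYLLABLE.fullmatch(syllable):
            raise ValueError(f"not a pinyin syllable with tone 1-5: {syllable!r}")
    return word + "/" + "".join(f"({s})" for s in syllables)


def alias_rule(word: str, spoken_as: str) -> str:
    """Build an entry such as 'omg/oh my god' that replaces a word when read."""
    if "/" in word:
        raise ValueError("word must not contain '/'")
    return f"{word}/{spoken_as}"


rules = [
    pinyin_rule("燕少飞", ["yan4", "shao3", "fei1"]),
    pinyin_rule("达菲", ["da2", "fei1"]),
    alias_rule("omg", "oh my god"),
]
```

The resulting `rules` list is identical to the example array above.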

Passthrough parameter
Passed through unchanged; the service does not process it and only carries the data
Note: at most 1048576 characters
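
Putting the pieces together, a request to this endpoint might be assembled as follows. This is a sketch under several assumptions that are not confirmed by this section: the host in `API_BASE` is a placeholder, and the JSON keys `text`, `voice_id`, and `payload` are hypothetical stand-ins, because the actual field names are not listed here. Only the method, the path, the JSON content type, the bearer scheme, and the passthrough length limit come from the text above.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder host, not stated in this section
ENDPOINT = "/ent/v2/audio-tts"
MAX_PAYLOAD_CHARS = 1048576


def build_tts_request(token: str, text: str, voice_id: str,
                      payload: str = "", **options) -> urllib.request.Request:
    """Build (but do not send) a POST request with a bearer token and JSON body."""
    if len(payload) > MAX_PAYLOAD_CHARS:
        raise ValueError("passthrough data exceeds 1048576 characters")
    # Hypothetical key names; replace them with the real field names.
    body = {"text": text, "voice_id": voice_id, **options}
    if payload:
        body["payload"] = payload
    return urllib.request.Request(
        API_BASE + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tts_request("YOUR_TOKEN", "Hello<#2#>I am vidu", "some-voice-id")
```

The request can then be sent with `urllib.request.urlopen(request)`. The shape of the response object is not described in this section, so the sketch stops before parsing it.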

Responses

Success

application/json
object
