Speech synthesis

POST

/ent/v2/audio-tts

Documentation: https://platform.vidu.cn/docs/speech-synthesis

Authorizations

bearer

Type

HTTP (bearer)

Request Body

application/json

object

Text to synthesize

Maximum length: fewer than 10000 characters
Use line breaks to separate paragraphs
Pause control: insert <#x#>, where x is the pause duration in seconds, in the range [0.01, 99.99] with at most 2 decimal places. Place each marker between two speakable text segments; do not put several pause markers in a row

Example: Hello<#2#>I am vidu<#2#>Nice to meet you
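The pause rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `check_pauses` is hypothetical, and it assumes that "between speakable segments" means a marker may not open or close the text, which the source does not state explicitly.

```python
import re

# Any <#...#> marker, so malformed values are caught rather than silently ignored.
MARKER_RE = re.compile(r"<#(.*?)#>")
# A number with at most 2 decimal places.
VALUE_RE = re.compile(r"\d+(?:\.\d{1,2})?")


def check_pauses(text: str) -> list[str]:
    """Return a list of problems with pause markers in text (empty if none)."""
    problems = []
    segments = []
    pos = 0
    matches = list(MARKER_RE.finditer(text))
    for m in matches:
        value = m.group(1)
        if not VALUE_RE.fullmatch(value):
            problems.append(f"malformed pause value: {m.group(0)}")
        elif not 0.01 <= float(value) <= 99.99:
            problems.append(f"pause out of range: {m.group(0)}")
        # Collect the text that precedes this marker.
        segments.append(text[pos:m.start()])
        pos = m.end()
    segments.append(text[pos:])
    # Assumption: every marker needs non-empty text on both sides,
    # which also rules out chained markers.
    if matches and any(not s.strip() for s in segments):
        problems.append("pause marker is not between two speakable segments")
    return problems


print(check_pauses("Hello<#2#>I am vidu<#2#>Nice to meet you"))
print(check_pauses("Hello<#2#><#3#>world"))
```

The first call uses the documented example and reports no problems; the second chains two markers and is flagged.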

Voice ID for synthesis
See voice list: https://shengshu.feishu.cn/sheets/EgFvs6DShhiEBStmjzccr5gonOg

Speech rate, default 1.0
1.0 is normal; range [0.5, 2]. 0.5 slowest, 2 fastest

Volume
Range [0, 10], default 0 (normal volume). Higher values are louder

Pitch
Range [-12, 12], default 0 (original voice)
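The three numeric ranges above are easy to validate locally. This sketch assumes hypothetical attribute names (`speed`, `volume`, `pitch`); the actual JSON field names are not shown in this section, and whether volume and pitch must be integers is also not stated.

```python
from dataclasses import dataclass

# Documented inclusive ranges; the keys are hypothetical names, not confirmed API fields.
LIMITS = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}


@dataclass
class VoiceSettings:
    speed: float = 1.0   # 1.0 is normal; 0.5 slowest, 2 fastest
    volume: float = 0    # 0 is normal; higher is louder
    pitch: float = 0     # 0 keeps the original voice

    def validate(self) -> None:
        """Raise ValueError if any setting is outside its documented range."""
        for name, (low, high) in LIMITS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")


VoiceSettings(speed=1.5, pitch=-3).validate()  # within range, no error
```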

Emotion for synthesized speech

Allowed: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"
The model usually infers the emotion from the text automatically

Pronunciation overrides for polyphones

Rules that define special readings. For Chinese, write tones as the digits 1–5
Example:
["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]
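Each rule in the example has the shape `source/reading`, where the reading is either a run of `(pinyin + tone digit)` groups or plain replacement text. The sketch below parses that shape; it is inferred from the three example rules only, and the helper `parse_rule` is hypothetical rather than part of any SDK.

```python
import re

# One "(pinyin + tone digit)" group such as "(yan4)".
SYLLABLE_RE = re.compile(r"\(([a-zü]+)([1-5])\)")


def parse_rule(rule: str):
    """Split 'source/reading' into the source text and its parsed reading."""
    source, sep, reading = rule.partition("/")
    if not sep or not source or not reading:
        raise ValueError(f"expected 'source/reading', got {rule!r}")
    syllables = SYLLABLE_RE.findall(reading)
    # Treat the reading as pinyin only if it consists entirely of tone groups.
    if syllables and SYLLABLE_RE.sub("", reading) == "":
        return source, [(pinyin, int(tone)) for pinyin, tone in syllables]
    return source, reading


for rule in ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]:
    print(parse_rule(rule))
```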

Passthrough parameter
Not processed by the service; used only to pass data through
Note: maximum 1048576 characters
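Putting the pieces together, a request could be assembled roughly as follows. This is a sketch under several assumptions: the host in `BASE_URL` is a placeholder, the JSON keys (`text`, `voice_id`, and any optional ones) are hypothetical because this section does not list the field names, and the helper `build_tts_request` is illustrative. Only the path, the POST method, the bearer scheme, and the JSON content type come from the text above. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, replace with the real one


def build_tts_request(token: str, text: str, voice_id: str, **optional):
    """Build (but do not send) a POST request for /ent/v2/audio-tts."""
    if len(text) >= 10000:
        raise ValueError("text must be fewer than 10000 characters")
    # Hypothetical key names; check the official documentation for the real ones.
    body = {"text": text, "voice_id": voice_id, **optional}
    return urllib.request.Request(
        BASE_URL + "/ent/v2/audio-tts",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("YOUR_TOKEN", "Hello<#2#>I am vidu", "some-voice-id", speed=1.2)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it once the placeholder host and key names are replaced with the real values.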

Responses

Success

Content-Type

application/json

object
