Digital Human
POST
/kling/v1/videos/avatar/image2video
Generate a digital human video task from a reference image and audio.
Authorizations
bearer
Type
HTTP (bearer)
Request Body
application/json
image
string
Required
Reference image for the digital human.
Supports Base64 or an accessible image URL.
Formats: .jpg / .jpeg / .png.
Max file size 10MB; width and height must each be at least 300px; aspect ratio between 1:2.5 and 2.5:1.
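The image constraints above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the API: the function name check_image and its parameters are hypothetical, the caller is assumed to already know the file size and pixel dimensions, and "10MB" is read here as 10 * 1024 * 1024 bytes, which the documentation does not confirm.

```python
import os
from fractions import Fraction

ALLOWED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB
MIN_SIDE_PX = 300
MIN_RATIO = Fraction(2, 5)  # 1:2.5 (width:height)
MAX_RATIO = Fraction(5, 2)  # 2.5:1 (width:height)


def check_image(filename, size_bytes, width, height):
    """Return a list of violated constraints; an empty list means the image passes."""
    problems = []

    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_IMAGE_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")

    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file larger than 10MB")

    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("width and height must each be at least 300px")

    # Exact rational arithmetic avoids float rounding at the 2.5:1 boundary.
    if width > 0 and height > 0:
        ratio = Fraction(width, height)
        if not (MIN_RATIO <= ratio <= MAX_RATIO):
            problems.append("aspect ratio outside 1:2.5 to 2.5:1")

    return problems
```

A call such as check_image("portrait.png", 2_000_000, 1024, 1024) returns an empty list, while an image that is 900 by 300 pixels is reported as outside the allowed aspect ratio.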
audio_id
string
Audio ID from the preview (TTS) API.
Only audio generated within the last 30 days, with a duration of 2–300 seconds, is accepted.
Provide exactly one of audio_id or sound_file.
sound_file
string
Audio file.
Supports Base64 or an accessible audio URL.
Formats: .mp3 / .wav / .m4a / .aac; max 5MB; duration 2–300 seconds.
Provide exactly one of audio_id or sound_file.
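Because audio_id and sound_file are mutually exclusive and one of them must be present, a client can enforce the rule before sending the request. This is a small sketch under that reading; the helper name pick_audio_field is hypothetical and not part of the API.

```python
from typing import Optional


def pick_audio_field(audio_id: Optional[str] = None,
                     sound_file: Optional[str] = None) -> dict:
    """Return the single audio field to place in the request body.

    Raises ValueError when both or neither of the two values is supplied.
    """
    has_id = bool(audio_id)
    has_file = bool(sound_file)
    if has_id == has_file:  # True when both are set or both are missing
        raise ValueError("provide exactly one of audio_id or sound_file")
    if has_id:
        return {"audio_id": audio_id}
    return {"sound_file": sound_file}
```

The returned dictionary can be merged into the JSON body, so the request never carries both fields.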
prompt
string
Positive text prompt
mode
string
Required
Video generation mode.
Enum: std, pro
callback_url
string
Callback URL to notify when the task completes.
external_task_id
string
Custom task ID supplied by the caller.
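Putting the request body fields together, a request might be assembled as in the sketch below. It uses only the Python standard library. BASE_URL is a placeholder, since this page does not state the API host, and the function name build_request is hypothetical. The field names, the endpoint path, the bearer scheme, and the std / pro values come from the reference above.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: the real host is not given here
PATH = "/kling/v1/videos/avatar/image2video"


def build_request(api_key, image, mode, audio_id=None, sound_file=None,
                  prompt=None, callback_url=None, external_task_id=None):
    """Build (but do not send) the POST request for a digital human video task."""
    if mode not in ("std", "pro"):
        raise ValueError("mode must be 'std' or 'pro'")
    if bool(audio_id) == bool(sound_file):
        raise ValueError("provide exactly one of audio_id or sound_file")

    # Required fields first, then only the optional fields that were supplied.
    payload = {"image": image, "mode": mode}
    optional = {
        "audio_id": audio_id,
        "sound_file": sound_file,
        "prompt": prompt,
        "callback_url": callback_url,
        "external_task_id": external_task_id,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        BASE_URL + PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(req) and the response body parsed with json.loads. The reference only states that the response is a JSON object, so the sketch makes no assumption about its fields.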
Responses
OK
application/json
object