bytedance/seedance-2.0
ByteDance's advanced video generation model with native audio, keyframe control, and multimodal references for consistent character and style.
$1 / request
GPU: H100
Readme
Seedance 2.0 is ByteDance's next-generation video model, generating high-quality video with synchronized native audio in a single pass.
Generation modes:
- Text-to-video — describe a scene in natural language
- Keyframe mode — provide start_image (and optional end_image) to anchor the video
- Multi-ref mode — combine reference images and reference videos (up to 11 references in total, of which at most 3 may be videos) for character consistency and motion reference
Note: keyframe mode and multi-ref mode are mutually exclusive within a single request.
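The mode rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the model's actual validation logic: `start_image` and `end_image` are the input names mentioned above, while `reference_images`, `reference_videos`, and the function `pick_mode` are hypothetical names chosen for illustration. It also assumes that `end_image` is only meaningful together with `start_image`.

```python
from typing import Optional, Sequence

MAX_TOTAL_REFS = 11  # images + videos combined
MAX_VIDEO_REFS = 3   # videos alone


def pick_mode(
    start_image: Optional[str] = None,
    end_image: Optional[str] = None,
    reference_images: Sequence[str] = (),  # hypothetical parameter name
    reference_videos: Sequence[str] = (),  # hypothetical parameter name
) -> str:
    """Return the generation mode implied by the inputs, or raise ValueError."""
    keyframe = start_image is not None or end_image is not None
    multi_ref = bool(reference_images) or bool(reference_videos)

    if keyframe and multi_ref:
        raise ValueError("keyframe mode and multi-ref mode are mutually exclusive")
    # Assumption: end_image is optional, but only alongside start_image.
    if end_image is not None and start_image is None:
        raise ValueError("end_image requires start_image")
    if len(reference_videos) > MAX_VIDEO_REFS:
        raise ValueError(f"at most {MAX_VIDEO_REFS} reference videos are allowed")
    if len(reference_images) + len(reference_videos) > MAX_TOTAL_REFS:
        raise ValueError(f"at most {MAX_TOTAL_REFS} references are allowed in total")

    if keyframe:
        return "keyframe"
    if multi_ref:
        return "multi-ref"
    return "text-to-video"
```

For example, `pick_mode(start_image="first.png")` returns `"keyframe"`, while passing both `start_image` and a reference image raises `ValueError`.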
Capabilities:
- Native audio generation — dialogue, sound effects, and music generated together with video
- Better motion and physics for complex interactions like sports, dancing, and object collisions
- Character consistency across multi-shot narratives via reference images
- Video editing and extension via reference videos
Prompt references:
- Reference images: @IMG_1 .. @IMG_11
- Reference videos: @VID_1 .. @VID_3
- For dialogue, put spoken words in double quotes (e.g. The man said: "Remember this moment.") to drive lip-sync
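A prompt that uses these tags can be sanity-checked before submission. The sketch below assumes that `@IMG_n` and `@VID_n` refer to the n-th supplied reference image or video in order, which the text above does not state explicitly. The helper names `check_prompt_tags` and `spoken_lines` are hypothetical.

```python
import re
from typing import List

TAG_PATTERN = re.compile(r"@(IMG|VID)_(\d+)")
QUOTE_PATTERN = re.compile(r'"([^"]*)"')


def check_prompt_tags(prompt: str, n_images: int, n_videos: int) -> List[str]:
    """Return the tags in the prompt that do not match a supplied reference."""
    # Assumption: tags are 1-based and follow the order references were supplied.
    limits = {"IMG": n_images, "VID": n_videos}
    unmatched = []
    for match in TAG_PATTERN.finditer(prompt):
        kind, index = match.group(1), int(match.group(2))
        if not 1 <= index <= limits[kind]:
            unmatched.append(match.group(0))
    return unmatched


def spoken_lines(prompt: str) -> List[str]:
    """Return the double-quoted passages, i.e. the text intended as dialogue."""
    return QUOTE_PATTERN.findall(prompt)


prompt = (
    'The man from @IMG_1 moves like the dancer in @VID_2, '
    'turns to @IMG_4 and said: "Remember this moment."'
)
print(check_prompt_tags(prompt, n_images=2, n_videos=2))  # ['@IMG_4']
print(spoken_lines(prompt))  # ['Remember this moment.']
```

Here `@IMG_4` is reported because only two reference images were supplied.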
Supported durations: 5s, 10s, 15s
Supported resolutions: 480p, 720p (default), 1080p
Supported aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9
Reference video constraints: each <=15s, longest side <=1280px (<=720p)
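These limits are easy to encode as a pre-flight check. The following is a minimal sketch using only the values listed above. The function names and the way settings are passed are hypothetical, not the model's input schema, and reference video metadata (duration, width, height) is assumed to have been measured already by some other tool.

```python
from typing import List

SUPPORTED_DURATIONS = (5, 10, 15)  # seconds
SUPPORTED_RESOLUTIONS = ("480p", "720p", "1080p")
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "21:9")
MAX_REF_VIDEO_SECONDS = 15
MAX_REF_VIDEO_LONG_SIDE = 1280  # pixels


def output_problems(duration: int, aspect_ratio: str, resolution: str = "720p") -> List[str]:
    """Return a list of human-readable problems; an empty list means the settings are listed as supported."""
    problems = []
    if duration not in SUPPORTED_DURATIONS:
        problems.append(f"unsupported duration: {duration}s")
    if resolution not in SUPPORTED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    return problems


def reference_video_ok(seconds: float, width: int, height: int) -> bool:
    """Check a reference video against the duration and longest-side limits."""
    return seconds <= MAX_REF_VIDEO_SECONDS and max(width, height) <= MAX_REF_VIDEO_LONG_SIDE


print(output_problems(10, "21:9"))            # []
print(output_problems(7, "2:1", "4k"))        # three problems
print(reference_video_ok(12.0, 1280, 720))    # True
print(reference_video_ok(12.0, 1920, 1080))   # False
```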
Pricing: $0.20 per second of output video.
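As a rough illustration of the per-second rate, the sketch below estimates the cost of a clip. It assumes billing is strictly linear in output duration, with no dependence on resolution and no additional fees, which the listing does not confirm. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.20")  # USD, from the pricing line above


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimated cost in USD, assuming purely linear per-second billing."""
    return PRICE_PER_SECOND * duration_seconds


for seconds in (5, 10, 15):
    print(f"{seconds}s -> ${estimate_cost(seconds)}")
# 5s -> $1.00
# 10s -> $2.00
# 15s -> $3.00
```

Under this assumption a 5-second clip comes to $1.00, which presumably corresponds to the $1 / request figure shown at the top of the listing.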