zsxkib/mmaudio

Add sound to video using the MMAudio V2 model, an advanced AI model that synthesizes high-quality audio from video content, enabling seamless video-to-audio transformation.

$0.01 / request
GPU: A100

Pricing

Pricing for Synexa AI models works differently from other providers. Instead of being billed by time, you are billed by input and output, making pricing more predictable.

Output: $0.0100 / video (or 100 videos / $1)

For example, generating 100 videos should cost around $1.00.
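Because billing is a flat price per output video rather than GPU time, the estimate is simple multiplication. The sketch below assumes the listed $0.0100 / video rate applies uniformly, with no minimums, taxes, or volume discounts; `estimate_cost` is a hypothetical helper, not part of any Synexa SDK.

```python
# Hypothetical helper: estimate MMAudio cost on Synexa at the listed flat rate.
# Assumes a uniform per-video price (no minimums, taxes, or volume discounts).
PRICE_PER_VIDEO_USD = 0.01


def estimate_cost(num_videos: int, price_per_video: float = PRICE_PER_VIDEO_USD) -> float:
    """Return the estimated cost in USD for generating num_videos videos."""
    if num_videos < 0:
        raise ValueError("num_videos must be non-negative")
    # Round to cents to avoid floating-point noise in the displayed total.
    return round(num_videos * price_per_video, 2)


print(estimate_cost(100))   # 1.0
print(estimate_cost(2500))  # 25.0
```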

Check out our docs for more information about how per-request pricing works on Synexa.

Provider     Price ($)    Saving (%)
Synexa       0.0100       -
Replicate    0.0150       33.3
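The saving column is the price difference expressed as a fraction of the other provider's price: (0.0150 - 0.0100) / 0.0150 ≈ 33.3%. A minimal sketch of that calculation, with `saving_percent` as a hypothetical helper name:

```python
def saving_percent(synexa_price: float, other_price: float) -> float:
    """Percentage saved by paying synexa_price instead of other_price."""
    if other_price <= 0:
        raise ValueError("other_price must be positive")
    # Relative saving, rounded to one decimal place as in the table.
    return round((other_price - synexa_price) / other_price * 100, 1)


print(saving_percent(0.0100, 0.0150))  # 33.3
```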

Readme

MMAudio - Video-to-Audio Synthesis

MMAudio V2 is an advanced AI model that synthesizes high-quality audio from video content, enabling seamless video-to-audio transformation. It analyzes visual elements, actions, and environments to generate contextually appropriate sound.

Key Features

  • High-Quality Audio Synthesis: Generates rich, realistic audio that matches visual content
  • Context-Aware Sound Generation: Understands visual context to produce appropriate sounds
  • Precise Temporal Synchronization: Audio accurately aligns with video events and actions
  • Environmental Audio Synthesis: Creates ambient sounds matching the video environment
  • Action-Sound Mapping: Maps visual actions to corresponding sound effects
  • Text-Guided Generation: Use text prompts to guide the audio generation
  • Negative Prompting: Specify sounds to avoid (e.g., "music" to prevent background music)
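Text guidance and negative prompting in diffusion and flow-matching generators are commonly implemented with classifier-free guidance: the model makes one prediction conditioned on the prompt and one conditioned on the negative prompt, then extrapolates away from the negative branch. This page does not describe MMAudio's internals, so the following is a sketch of that general technique on toy numbers, not MMAudio's actual code; `guided_prediction` is a hypothetical name, and plain lists stand in for model outputs.

```python
from typing import List, Sequence


def guided_prediction(cond: Sequence[float], neg: Sequence[float], cfg_strength: float) -> List[float]:
    """Classifier-free guidance with a negative prompt.

    cond: prediction conditioned on the text prompt (e.g. "galloping").
    neg:  prediction conditioned on the negative prompt (e.g. "music").
    cfg_strength = 1.0 returns cond; larger values push further away from neg.
    """
    if len(cond) != len(neg):
        raise ValueError("predictions must have the same length")
    return [n + cfg_strength * (c - n) for c, n in zip(cond, neg)]


# Toy 3-element "predictions" standing in for real model outputs.
cond = [0.2, 0.5, -0.1]
neg = [0.1, 0.4, 0.0]
print(guided_prediction(cond, neg, 4.5))
```

With the documented default cfg_strength of 4.5, each element moves 4.5 times the gap between the two branches away from the negative-prompt prediction, which is why higher values follow the prompt more closely.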

Input Parameters

  • video (required): Input video file for audio generation
  • prompt: Text description to guide the audio output (e.g., "galloping", "ocean waves")
  • negative_prompt: Sounds to avoid in the output (default: "music")
  • duration: Output duration in seconds (default: 8)
  • num_steps: Number of inference steps; more steps can improve quality at the cost of longer processing (default: 25)
  • cfg_strength: Classifier-free guidance strength; higher values follow the prompt more closely (default: 4.5)
  • seed: Random seed for reproducible results; use -1 for a random seed
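The parameters above can be collected into a single input object before submitting a request. The sketch below is a hypothetical helper that merges overrides into the documented defaults and rejects obviously invalid values; it is not tied to any specific client library. The defaults for `prompt` ("") and `seed` (-1) are assumptions, since this page does not list them, and the video URL is a placeholder.

```python
from typing import Any, Dict

# Documented defaults; "prompt" and "seed" defaults are ASSUMPTIONS (not listed).
DEFAULTS: Dict[str, Any] = {
    "prompt": "",
    "negative_prompt": "music",
    "duration": 8,
    "num_steps": 25,
    "cfg_strength": 4.5,
    "seed": -1,  # -1 requests a random seed
}


def build_mmaudio_input(video: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into defaults and sanity-check them."""
    if not video:
        raise ValueError("video is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {"video": video, **DEFAULTS, **overrides}
    if params["duration"] <= 0:
        raise ValueError("duration must be positive")
    if params["num_steps"] < 1:
        raise ValueError("num_steps must be at least 1")
    return params


# Placeholder URL; a fixed seed makes repeated runs reproducible.
payload = build_mmaudio_input("https://example.com/horse.mp4", prompt="galloping", seed=42)
print(payload["negative_prompt"], payload["duration"], payload["seed"])  # music 8 42
```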

Use Cases

  • Film and Video Post-Production: Add sound effects and ambient audio to video content
  • Silent Film Enhancement: Bring silent footage to life with generated audio
  • Educational Content: Add appropriate audio to instructional videos
  • Gaming and VR Sound Design: Generate environmental and action sounds
  • Accessibility: Add sound cues that convey on-screen events in visual content
  • Content Creation: Quickly add professional-sounding audio to video projects

Limitations

  • Processing time increases with video length
  • Complex acoustic environments may produce variable results
  • Output quality depends on input video clarity
  • Performance may vary with rapid scene changes
  • If the video is shorter than the requested duration, audio will be truncated to match the video length
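Per the last note, the output audio length is capped by the video length. The sketch below models this as the smaller of the two values; the case where the video is longer than `duration` (audio covering only the requested seconds) is an assumption, since the page only describes the shorter-video case. `effective_audio_duration` is a hypothetical helper.

```python
def effective_audio_duration(requested_s: float, video_s: float) -> float:
    """Expected audio length in seconds.

    Shorter video: audio is truncated to the video length (documented).
    Longer video: ASSUMED to produce audio for the requested duration only.
    """
    if requested_s <= 0 or video_s <= 0:
        raise ValueError("durations must be positive")
    return min(requested_s, video_s)


print(effective_audio_duration(8, 5.2))   # 5.2
print(effective_audio_duration(8, 12.0))  # 8
```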