Video Creation

Generate AI videos directly from text descriptions or images using 14 different models.

How to Create Videos

Request video generation in your chat:

"Create a video of waves crashing on a beach"
"Generate a timelapse of clouds moving"
"Make a video from this image" (attach an image)

Available Models

Video generation is available on Plus, Pro, and Max plans. Plus users have access to budget and mid-tier models, while Pro and Max unlock all premium models.

Text-to-Video Models (Plus, Pro, and Max)

Mochi 1 (Genmo)

Budget-friendly text-to-video generation.

Duration: 5 seconds
Aspect ratio: 16:9
Usage level: Lower

Hunyuan Video (Tencent)

Affordable text-to-video with multiple aspect ratios.

Duration: 5 seconds
Aspect ratios: 16:9, 9:16, 1:1
Usage level: Lower to moderate

Luma Ray 2 (Luma AI)

High-quality text-to-video with realistic physics and motion.

Duration: 5-9 seconds
Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
Usage level: Moderate

Image-to-Video Models (Plus, Pro, and Max)

Luma Ray 2 I2V (Luma AI)

Animate a single image into video.

Duration: 5-9 seconds
Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
Usage level: Moderate

Wan FLF2V (Alibaba)

Create videos from first and last frame images.

Duration: 5 seconds
Requires 2 images (first + last frame)
Usage level: Moderate to higher

Premium Models (Pro and Max)

MiniMax Video 01 (MiniMax)

High-quality video generation with optional image input.

Duration: 6 seconds
Text-to-video and image-to-video
Usage level: Higher

Grok Imagine Video (xAI)

Long-duration videos with automatic audio generation.

Duration: 5-15 seconds
Audio always included (automatic)
7 aspect ratios supported
Usage level: Higher

Kling 2.6 Pro (Kuaishou)

Premium quality with optional audio generation.

Duration: 5-10 seconds
Audio: optional (adds extra usage)
Usage level: Higher

Veo 2 I2V (Google)

Google's image-to-video model.

Duration: 5-8 seconds
Single image required
Usage level: High to very high

Veo 3 (Google)

Google's flagship video model with optional audio (dialogue, SFX, and ambient sounds).

Duration: 4-8 seconds
Audio: optional (adds extra usage) - dialogue, SFX, ambient
Text-to-video and image-to-video
Usage level: Highest

Kling 2.6 Pro I2V (Kuaishou)

Image-to-video generation with premium motion quality and optional audio.

Duration: 5-10 seconds
Single image required
Audio: optional (adds extra usage)
Usage level: Higher

Kling 3.0 Pro I2V (Kuaishou)

Latest Kling image-to-video model with start-frame animation, optional end-frame guidance, and optional audio.

Duration: 3-15 seconds
Requires a start image; optional end image
Audio: optional (adds extra usage)
Usage level: Highest

Grok Imagine Video I2V (xAI)

Animate an image into a fast social-style clip with automatic audio.

Duration: 5-15 seconds
Single image required
Audio always included (automatic)
Usage level: Higher

Seedance 2.0 I2V (ByteDance)

ByteDance image-to-video generation for fast creative motion from a still frame.

Duration: 5-10 seconds
Single image required
Usage level: Higher

Video Specifications

Duration: 3-15 seconds per generation (varies by model)
Resolution: 720p-1080p
Format: MP4
Generation time: 1-5 minutes
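
Because the supported duration varies by model, it can help to check a requested length against the model's range before asking for it. The sketch below transcribes the durations listed on this page into a lookup table. The names DURATION_SECONDS and clamp_duration are hypothetical, and snapping to the nearest supported value is an assumption made for illustration; this page does not say how the service handles an out-of-range request, or whether every whole number inside a range is accepted.

```python
# Duration ranges in seconds (min, max), transcribed from the model list on this page.
# DURATION_SECONDS and clamp_duration are illustrative names, not part of the product.
DURATION_SECONDS = {
    "Mochi 1": (5, 5),
    "Hunyuan Video": (5, 5),
    "Luma Ray 2": (5, 9),
    "Luma Ray 2 I2V": (5, 9),
    "Wan FLF2V": (5, 5),
    "MiniMax Video 01": (6, 6),
    "Grok Imagine Video": (5, 15),
    "Kling 2.6 Pro": (5, 10),
    "Veo 2 I2V": (5, 8),
    "Veo 3": (4, 8),
    "Kling 2.6 Pro I2V": (5, 10),
    "Kling 3.0 Pro I2V": (3, 15),
    "Grok Imagine Video I2V": (5, 15),
    "Seedance 2.0 I2V": (5, 10),
}


def clamp_duration(model: str, seconds: int) -> int:
    """Return the closest duration inside the model's listed range.

    Assumption: out-of-range requests are snapped to the nearest bound.
    """
    low, high = DURATION_SECONDS[model]
    return max(low, min(high, seconds))
```

For example, clamp_duration("Veo 3", 12) falls back to the 8-second maximum listed for Veo 3.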

Audio Generation

Some models support generating audio (dialogue, sound effects, ambient sounds) alongside the video.

Audio is Optional (adds extra usage)

On models where audio is optional, audio generation is disabled by default to keep usage lower. Enabling it adds extra usage on top of the base model's usage. Grok Imagine Video models always include audio.

To enable audio, ask explicitly: "Generate a video with audio" or "Create a video with sound".

Models with Audio Support:

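Veo 3 - Dialogue, SFX, ambient sounds (optional)
Kling 2.6 Pro - Sound effects and ambient audio (optional)
Kling 2.6 Pro I2V - Optional audio for image-to-video
Kling 3.0 Pro I2V - Optional audio for image-to-video
Grok Imagine Video and Grok Imagine Video I2V - Automatic audio (always included)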
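
The audio rules can be summarized as a small decision function. This is a sketch based only on the per-model entries on this page: the set names and audio_included are hypothetical, and the function models the documented behavior (always on, opt-in, or unavailable) rather than any real API.

```python
# Audio behavior per model, transcribed from the model entries on this page.
# ALWAYS_AUDIO, OPTIONAL_AUDIO, and audio_included are illustrative names.
ALWAYS_AUDIO = {"Grok Imagine Video", "Grok Imagine Video I2V"}
OPTIONAL_AUDIO = {
    "Veo 3",
    "Kling 2.6 Pro",
    "Kling 2.6 Pro I2V",
    "Kling 3.0 Pro I2V",
}


def audio_included(model: str, requested: bool = False) -> bool:
    """Return True if the finished video would contain audio."""
    if model in ALWAYS_AUDIO:
        return True       # audio is automatic and cannot be turned off
    if model in OPTIONAL_AUDIO:
        return requested  # off by default; adds extra usage when enabled
    return False          # no audio support listed for this model
```

In this model, audio_included("Veo 3") is False until audio is requested explicitly, while both Grok Imagine Video models always return True.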

Tips for Better Videos

Describe the motion or action clearly
Specify camera movement (pan, zoom, static)
Include lighting and atmosphere details
Keep prompts focused on a single scene
For audio-enabled models: Describe sounds explicitly (for example, "birds chirping" or "footsteps on gravel")
For dialogue: Include character speech in quotes for lip-synced audio
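
If you write prompts often, the tips above can be turned into a reusable template. The helper below is a hypothetical sketch: build_video_prompt and its parameters are illustrative names, and the output is ordinary prompt text to paste into chat, not a format the service requires.

```python
def build_video_prompt(subject, motion, camera="", lighting="", sounds=(), dialogue=""):
    """Assemble one single-scene prompt from the elements listed in the tips."""
    # Lead with the subject and its motion so the action is stated clearly.
    parts = [f"{subject} {motion}".strip()]
    if camera:
        parts.append(f"camera: {camera}")
    if lighting:
        parts.append(f"lighting: {lighting}")
    if sounds:
        # Explicit sound descriptions, useful for audio-enabled models.
        parts.append("sounds: " + ", ".join(sounds))
    if dialogue:
        # Speech goes in quotes, as suggested for lip-synced dialogue.
        parts.append(f'a character says "{dialogue}"')
    return "; ".join(parts)


prompt = build_video_prompt(
    "waves",
    "crashing on a beach",
    camera="slow pan",
    lighting="golden hour",
    sounds=("surf", "gulls calling"),
)
```

Keeping one subject and one motion per call mirrors the advice to focus each prompt on a single scene.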

Availability & Pricing

Plus Plan ($10/month)

5 video models available
Text-to-video: Mochi 1, Hunyuan, Luma Ray 2
Image-to-video: Luma Ray 2 I2V, Wan FLF2V
Lower to moderate usage per video

Pro and Max

All 14 video models available
Access to Veo 3, Grok, Kling, and Seedance models
Lower to highest usage per video (depends on model)
Extra usage for audio generation (optional)
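
The plan tiers above amount to a simple lookup: Plus gets the five budget and mid-tier models, and Pro and Max get those plus the nine premium models. The sketch below encodes that using the model names from this page. PLUS_MODELS, PREMIUM_MODELS, and models_for_plan are hypothetical names chosen for illustration.

```python
# Model tiers transcribed from this page; all names here are illustrative.
PLUS_MODELS = [
    "Mochi 1", "Hunyuan Video", "Luma Ray 2",  # text-to-video
    "Luma Ray 2 I2V", "Wan FLF2V",             # image-to-video
]
PREMIUM_MODELS = [
    "MiniMax Video 01", "Grok Imagine Video", "Kling 2.6 Pro",
    "Veo 2 I2V", "Veo 3", "Kling 2.6 Pro I2V",
    "Kling 3.0 Pro I2V", "Grok Imagine Video I2V", "Seedance 2.0 I2V",
]


def models_for_plan(plan: str) -> list[str]:
    """Return the video models a plan can use, per the tiers described above."""
    key = plan.strip().lower()
    if key == "plus":
        return list(PLUS_MODELS)
    if key in ("pro", "max"):
        return PLUS_MODELS + PREMIUM_MODELS
    raise ValueError(f"unknown plan: {plan!r}")
```

Under these assumptions, models_for_plan("Plus") returns 5 models and models_for_plan("Max") returns all 14.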

Related Pages

Subscription Plans - Compare Plus, Pro, and Max features
Usage Limits - Credit costs and limits
Image Generation - Create images with AI
