Video Creation
Generate AI videos directly from text descriptions or images using 14 different models.
How to Create Videos
Request video generation in your chat:
- "Create a video of waves crashing on a beach"
- "Generate a timelapse of clouds moving"
- "Make a video from this image" (attach an image)
Available Models
Video generation is available on Plus, Pro, and Max plans. Plus users have access to budget and mid-tier models, while Pro and Max unlock every model, including the premium ones.
Text-to-Video Models (Plus, Pro, and Max)
Mochi 1 (Genmo)
Budget-friendly text-to-video generation.
- Duration: 5 seconds
- Aspect ratio: 16:9
- Usage level: Lower
Hunyuan Video (Tencent)
Affordable text-to-video with multiple aspect ratios.
- Duration: 5 seconds
- Aspect ratios: 16:9, 9:16, 1:1
- Usage level: Lower to moderate
Luma Ray 2 (Luma AI)
High-quality text-to-video with realistic physics and motion.
- Duration: 5-9 seconds
- Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
- Usage level: Moderate
Image-to-Video Models (Plus, Pro, and Max)
Luma Ray 2 I2V (Luma AI)
Animate a single image into a video.
- Duration: 5-9 seconds
- Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
- Usage level: Moderate
Wan FLF2V (Alibaba)
Create videos from first-frame and last-frame images.
- Duration: 5 seconds
- Requires 2 images (first + last frame)
- Usage level: Moderate to higher
Premium Models (Pro and Max)
MiniMax Video 01 (MiniMax)
High-quality video generation with optional image input.
- Duration: 6 seconds
- Text-to-video and image-to-video
- Usage level: Higher
Grok Imagine Video (xAI)
Long-duration videos with automatic audio generation.
- Duration: 5-15 seconds
- Audio always included (automatic)
- 7 aspect ratios supported
- Usage level: Higher
Kling 2.6 Pro (Kuaishou)
Premium quality with optional audio generation.
- Duration: 5-10 seconds
- Audio: optional (adds extra usage)
- Usage level: Higher
Veo 2 I2V (Google)
Google's image-to-video model.
- Duration: 5-8 seconds
- Single image required
- Usage level: High to very high
Veo 3 (Google)
Google's flagship video model with optional audio: dialogue, sound effects (SFX), and ambient sounds.
- Duration: 4-8 seconds
- Audio: optional (adds extra usage); supports dialogue, SFX, and ambient sound
- Text-to-video and image-to-video
- Usage level: Highest
Kling 2.6 Pro I2V (Kuaishou)
Image-to-video generation with premium motion quality and optional audio.
- Duration: 5-10 seconds
- Single image required
- Audio: optional (adds extra usage)
- Usage level: Higher
Kling 3.0 Pro I2V (Kuaishou)
Latest Kling image-to-video model with start-frame animation, optional end-frame guidance, and optional audio.
- Duration: 3-15 seconds
- Requires a start image; optional end image
- Audio: optional (adds extra usage)
- Usage level: Highest
Grok Imagine Video I2V (xAI)
Animate an image into a fast social-style clip with automatic audio.
- Duration: 5-15 seconds
- Single image required
- Audio always included (automatic)
- Usage level: Higher
Seedance 2.0 I2V (ByteDance)
Image-to-video generation that adds fast, creative motion to a still frame.
- Duration: 5-10 seconds
- Single image required
- Usage level: Higher
Video Specifications
- Duration: 3-15 seconds per generation (varies by model)
- Resolution: 720p-1080p
- Format: MP4
- Generation time: typically 1-5 minutes
Audio Generation
Some models support generating audio (dialogue, sound effects, ambient sounds) alongside the video.
Audio is Optional (adds extra usage)
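On models where audio is optional, audio generation is disabled by default to keep usage lower. When enabled, it adds extra usage on top of the base model. Grok Imagine Video models are the exception: they always include audio.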
To enable audio, ask explicitly: "Generate a video with audio" or "Create a video with sound".
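Models with Audio Support:
- Veo 3 - Dialogue, SFX, ambient sounds (optional)
- Kling 2.6 Pro - Sound effects and ambient audio (optional)
- Kling 2.6 Pro I2V - Optional audio for image-to-video
- Kling 3.0 Pro I2V - Optional audio for image-to-video
- Grok Imagine Video - Automatic audio (always included)
- Grok Imagine Video I2V - Automatic audio (always included)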
Tips for Better Videos
- Describe the motion or action clearly
- Specify camera movement (pan, zoom, static)
- Include lighting and atmosphere details
- Keep prompts focused on a single scene
- For audio-enabled models: describe sounds explicitly (for example, "birds chirping" or "footsteps on gravel")
- For dialogue (Veo 3): put character speech in quotes for lip-synced audio
Availability & Pricing
Plus Plan ($10/month)
- 5 video models available
- Text-to-video: Mochi 1, Hunyuan, Luma Ray 2
- Image-to-video: Luma Ray 2 I2V, Wan FLF2V
- Usage per video ranges from lower (Mochi 1) to moderate-to-higher (Wan FLF2V)
Pro and Max
- All 14 video models available
- Adds the premium MiniMax, Grok, Kling, Veo, and Seedance models
- Lower to highest usage per video (depends on model)
- Extra usage for audio generation (optional)
Related Pages
- Subscription Plans - Compare Plus, Pro, and Max features
- Usage Limits - Credit costs and limits
- Image Generation - Create images with AI