AI Video Stack 2026: Live Comparison & Rankings

This is RCTV’s living reference to the AI video software stack — models, orchestration agents, and open-source generation. Updated as products launch, pricing changes, and capabilities evolve. Last updated: June 28, 2026.

Quick Verdict: Best AI Video Model by Use Case (2026)

These routing decisions are based on what each model actually wins at: not benchmark Elo alone, but which model handles a specific use case best. Scroll down for full model breakdowns and the detailed routing framework.

Best for photorealism — Veo 3.1. 4K native, the cleanest photorealistic rendering available; free 10 clips/month via Google Vids on any Google account.
Best for broadcast 4K and motion quality — Kling 3.0. Native 4K at 60fps with multi-cut storyboards; the only production-grade model meeting broadcast delivery standards without upscaling.
Best for character consistency across shots — Seedance 2.0 Pro. Native multi-shot storytelling with frame-level character and scene control; available in the US via CapCut with real-face restrictions.
Best for stylized and VFX work — Runway Gen-4 Turbo. The most mature professional ecosystem for non-photorealistic aesthetics; motion brushes, scene consistency tools, and the backing of a $315M Series C.
Best for editing existing video — Aleph 2.0 in Runway Edit Studio. Multishot edit propagation up to 30 seconds at 1080p; edit one frame, model carries it across the sequence; available on all paid Runway plans.
Best for free consumer distribution — Gemini Omni Flash. Free inside YouTube Shorts and YouTube Create; multimodal any-to-any input (image / audio / video / text → video); SynthID provenance by default.
Best open-source for local generation — LTX-2.3. True 4K native on consumer GPUs (12GB+ VRAM); Apache 2.0 license; standalone desktop editor plus ComfyUI integration.
Best benchmark quality with API access — HappyHorse-1.0. Still #1 on Artificial Analysis T2V no-audio (Elo 1,290); HappyHorse-1.1 debuted at #2 (Elo 1,285) in the same week. The order flips on the audio-included board, where Seedance leads. Commercial API via fal.ai and Alibaba Cloud Bailian; open weights still pending.
Cheapest commercial API — Grok Imagine Video 1.5. $0.08/sec ($4.80/min), audio-native, GA as of June 16, 2026; it undercuts every audio-capable tier, and only the original Grok Imagine ($0.05/sec, no audio) is cheaper. The 720p ceiling holds; the 1080p Pro tier remains unshipped past Musk’s April commitment. Arena positions are listed in the Grok Imagine entry below.
Best multi-model orchestration agent — Pika Agents. Broadest model roster in the category (Kling, Veo, Seedance, MiniMax, Sora API, Pika Video); runs inside Slack, Telegram, Discord, Notion, Figma, and 12+ other surfaces; persistent memory across sessions. See The Agentic Layer.
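
The verdicts above amount to a lookup table from use case to model. Below is a minimal Python sketch of how a pipeline might encode that table. The key strings and the `route` helper are hypothetical names chosen for illustration, not part of any vendor API, and the mapping goes stale whenever the rankings change.

```python
# Use-case -> model routing table, copied from the Quick Verdict list above.
# The key strings and the route() helper are hypothetical, for illustration only.
ROUTES = {
    "photorealism": "Veo 3.1",
    "broadcast_4k": "Kling 3.0",
    "character_consistency": "Seedance 2.0 Pro",
    "stylized_vfx": "Runway Gen-4 Turbo",
    "video_editing": "Aleph 2.0",
    "free_consumer": "Gemini Omni Flash",
    "local_open_source": "LTX-2.3",
    "benchmark_api": "HappyHorse-1.0",
    "cheapest_api": "Grok Imagine Video 1.5",
    "orchestration_agent": "Pika Agents",
}


def route(use_case: str) -> str:
    """Return the recommended model for a use case, or raise if it is unknown."""
    try:
        return ROUTES[use_case]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for shot in ("photorealism", "character_consistency"):
        print(f"{shot} -> {route(shot)}")
```

Keeping the table as data rather than branching logic makes it cheap to update when a ranking flips.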

Quick Reference: All Models at a Glance

Model	Best For	Max Resolution	Free Tier	Paid From	API
Veo 3.1 (Google DeepMind)	Photorealism, widest free access	4K	✓ 10 clips/mo via Google Vids	$19.99/mo (AI Pro)	✓ (also via Adobe Firefly)
Gemini Omni Flash (Google DeepMind)	Multimodal input, broadest free distribution, SynthID provenance by default	Not published at launch	✓ YouTube Shorts + YouTube Create	$19.99/mo (Google AI Plus)	Coming weeks
Kling 3.0 / 3.0 Omni (Kuaishou)	Broadcast-ready 4K, 60fps, multi-shot storyboards	4K native	✓	~$8/mo	✓ (also via Adobe Firefly)
Seedance 2.0 Pro (ByteDance)	Character consistency, multi-shot	4K native (confirmed June 23, 2026); platform tiers vary	Via CapCut (US, with restrictions)	Via CapCut / third-party	Via third-party
Luma Ray 3.2 / Ray 3.14 (Luma AI)	Production volume, frame-level control (Ray 3.2); duration/loop workflows (Ray 3.14)	1080p native; 20s max (Ray 3.2)	✓	Available	✓ (Ray 3.2 API launched June 9, 2026)
Runway Gen-4 Turbo / Aleph 2.0	Stylized/VFX, real-time avatars, multishot edit propagation	1080p	✗	$12/mo	✓ (Gen-4.5 also via Adobe Firefly)
Pika 2.5 / Pika Agents (Pika Labs)	Budget creators; multi-model agent orchestration (Kling, Veo, Seedance, MiniMax, Sora)	1080p	✗	$8/mo	✓
Grok Imagine / Grok Imagine Video 1.5 (xAI)	Speed, cheapest audio API; native I2V (#2 arena)	720p (Pro 1080p delayed)	✗	X Premium / SuperGrok	✓ Original: $0.05/sec ($3.00/min, no audio); 1.5 GA: $0.08/sec ($4.80/min, audio-native)
LTX-2.3 (Lightricks)	Local / private generation	4K	✓ Open source	Free (Apache 2.0)	ComfyUI
HappyHorse-1.0 / 1.1 (Alibaba)	1.0: #1 T2V no-audio (Elo 1,290); 1.1: #2 T2V no-audio (Elo 1,285), #2 I2V audio (Elo 1,119); API live	1080p (joint audio, 7-language lip-sync)	1.0 weights still pending; API live	$0.14/sec 720p · $0.28/sec 1080p	✓ via fal.ai + Alibaba Cloud Bailian
Wan 2.7 (Alibaba)	Thinking Mode, 5 unified task types	1080p	✓ Open source	Free	ComfyUI / Model Studio
SkyReels V4 (Skywork AI)	Joint audio-video, open-source	1080p	✓ 70 credits/mo	Free (open source)	—

Rankings and pricing change weekly. Scroll down for full model breakdowns.
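
The per-second rates in the table are easier to compare once converted to a per-minute or per-clip cost. A small sketch using only the rates listed above; the labels and the 10-second clip length are arbitrary illustrative choices, and the rates themselves are a snapshot that will drift.

```python
# Per-second API rates (USD) as listed in the table above.
# Labels are informal; treat the numbers as a snapshot, not a price list.
RATES_PER_SEC = {
    "Grok Imagine (original, no audio)": 0.05,
    "Grok Imagine Video 1.5 (audio-native)": 0.08,
    "HappyHorse 720p": 0.14,
    "HappyHorse 1080p": 0.28,
}


def per_minute(rate_per_sec: float) -> float:
    """Convert a per-second rate to a per-minute rate, rounded to cents."""
    return round(rate_per_sec * 60, 2)


def clip_cost(rate_per_sec: float, seconds: float) -> float:
    """Cost of a single clip of the given length, rounded to cents."""
    return round(rate_per_sec * seconds, 2)


if __name__ == "__main__":
    clip_seconds = 10  # arbitrary example length
    for label, rate in sorted(RATES_PER_SEC.items(), key=lambda kv: kv[1]):
        print(
            f"{label}: ${per_minute(rate):.2f}/min, "
            f"${clip_cost(rate, clip_seconds):.2f} per {clip_seconds}-second clip"
        )
```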

The Big Eight: Commercial Models

These are the production-grade models dominating professional and creator workflows in 2026. The market has matured to the point where no single model leads across all dimensions — the professional standard is now multi-model routing, choosing the right tool for each specific shot.

Luma Ray 3.2 / Ray 3.14 — Luma AI

Best for: Professional production volume, frame-level keyframe control, HDR/EXR output, cost-efficient multi-shot workflows

Max resolution: 1080p native
Max duration: Up to 20 seconds at 1080p (Ray 3.2); variable (Ray 3.14)
Key features (Ray 3.2, June 9, 2026): Up to 16 keyframes per clip for precise motion direction; performance tracking across up to 8 faces simultaneously; native HDR generation with 16-bit EXR export; Enhanced Reframe (aspect ratio, frame extension, background replacement); API launch (first API availability for Ray 3.2)
Key features (Ray 3.14): Duration-change and loop workflows; Ray3 Modify (hybrid performance/acting control)
Timeline editor / EDL Export (June 19, 2026): Multi-clip timeline editor with Edit Decision List export for NLE handoff; integrates with Ray 3.2 generation workflow
Luma Connectors (June 24, 2026): Native integrations with Airtable, Dropbox, and Google Drive for asset management and workflow automation
Speed: Ray 3.14 generates 4× faster than the previous Ray 3 model; Ray 3.2 maintains the same 1080p baseline
Pricing: Ray 3.14 is 3× cheaper per second than the previous Ray 3 model
Access: Luma AI subscription; free tier available
API: Available for both variants; Ray 3.2 API launched June 9, 2026; enterprise deployments via Luma Agents
Note: Ray 3.2 and Ray 3.14 are parallel sub-models in the Ray3 family. Ray 3.2 handles standard video-to-video transformation and multi-keyframe guidance; Ray 3.14 handles duration-change and loop workflows. Both remain available.

Luma AI’s Ray 3.14 shipped in January 2026 as the model that stepped into the commercial tier vacated by Sora’s shutdown. (Weekly Roundup — March 27, 2026) Native 1080p output, generation 4× faster than the previous Ray 3 model, per-second pricing 3× cheaper. Ray3 Modify, a companion tool for hybrid performance and acting workflows, gives studios more control over scene continuity and character consistency across shots.

Luma launched Ray 3.2 on June 9, 2026 — framing the update as a shift “from prompting to directing.” The headline addition is keyframe control: up to 16 keyframes per clip at arbitrary positions, letting operators set precise motion direction and pacing rather than prompting for results. Alongside it: performance tracking across up to 8 simultaneous faces, native HDR generation with 16-bit EXR export for post-production pipelines, Enhanced Reframe for aspect-ratio changes and background replacement, and full API availability — the first Ray 3.2 API access. Duration extends to 20 seconds at 1080p. Ray 3.14 stays available and is the better choice specifically for duration-change and loop workflows, where its fixed-length output is an asset rather than a constraint.
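
The limits described for Ray 3.2 (at most 16 keyframes, clips up to 20 seconds) can be expressed as a simple pre-flight check on a shot plan. This is a minimal sketch, assuming a hypothetical `Keyframe` record with a timestamp in seconds. It is not Luma's API or request schema, only an illustration of the stated constraints.

```python
from dataclasses import dataclass

# Limits as stated for Ray 3.2 in this section.
MAX_KEYFRAMES = 16
MAX_DURATION_S = 20.0


@dataclass(frozen=True)
class Keyframe:
    """Hypothetical record: where in the clip a keyframe sits and what it shows."""
    time_s: float
    description: str


def validate_clip(duration_s: float, keyframes: list[Keyframe]) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if not 0 < duration_s <= MAX_DURATION_S:
        problems.append(f"duration {duration_s}s is outside (0, {MAX_DURATION_S}]")
    if len(keyframes) > MAX_KEYFRAMES:
        problems.append(
            f"{len(keyframes)} keyframes exceeds the limit of {MAX_KEYFRAMES}"
        )
    for kf in keyframes:
        if not 0 <= kf.time_s <= duration_s:
            problems.append(f"keyframe at {kf.time_s}s falls outside the clip")
    return problems


if __name__ == "__main__":
    plan = [Keyframe(0.0, "wide establishing shot"), Keyframe(12.5, "push in on subject")]
    print(validate_clip(20.0, plan) or "plan fits the stated limits")
```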

Luma is positioning Ray explicitly as professional infrastructure priced for production volume rather than a consumer app — a distinction that looks strategically deliberate given Sora’s failure. The company’s $900M Series C led by HUMAIN, a London office, and enterprise Luma Agents deployments at Publicis, Adidas, and Mazda all reinforce this direction. The Mazda relationship produced a concrete deliverable in April 2026: Boundless, a Johannesburg agency, used Luma Agents to deliver Mazda’s first AI-produced commercial in under two weeks — the most credible production-deployment signal for any AI video platform to date.

Kling 3.0 / Kling 3.0 Omni / Kling 3.0 Turbo — Kuaishou

Best for: Feature density, broadcast-ready output, motion quality; Turbo for rapid iteration and cost-efficiency

Max resolution: 4K native (60fps) — Standard and O3; 480p–720p — Turbo
Frame rate: Up to 60fps
Audio: Native built-in audio in six languages
Key feature: Multi-cut storyboard generation (up to 6 camera cuts, 15s); Omni/O3 adds shot/camera/character controls; Turbo adds faster generation, lower cost, improved lip-sync, stable motion
Kling 3.0 Turbo (June 17, 2026): Speed-and-cost tier; faster generation, lower price, improved lip-sync, stable motion; targets rapid iteration workflows
Kling 3.0 O3 (June 17, 2026): Up to 15-second clips at full 4K; stronger prompt-and-reference consistency; targets production-quality delivery
Platform rollout (June 17–18, 2026): fal, SeaArt, Clipfly, Fotor, GlamAI, Runware, Morphic — seven partner platforms in a coordinated same-day launch
Pricing (Turbo/O3): Kuaishou has not published official pricing for either variant; capability claims sourced from partner-platform announcements
Access: Free tier available; paid plans from ~$8/mo; also via Adobe Firefly (Creative Cloud subscription)
API: Available via Kuaishou and third-party platforms

The most capability-dense model available. Kling 3.0 is the first AI video model to meet broadcast delivery standards without upscaling, offering native 4K at 60fps. The storyboard feature generates up to six camera cuts in a single generation with visual consistency — a production-first capability no other model matches. The Kling 3.0 Omni (O3) variant adds finer-grained controls for shot duration, camera angle, and character movement across multi-shot sequences, with clips up to 15 seconds at full 4K.

On June 17–18, 2026, Kuaishou rolled two new Kling variants to seven partner platforms simultaneously — fal, SeaArt, Clipfly, Fotor, GlamAI, Runware, and Morphic — with no direct Kuaishou announcement on klingai.com (which remains 446-blocked). Kling 3.0 Turbo targets speed and cost: faster generation, lower price, improved lip-sync, and more stable motion than Standard, designed for rapid iteration and volume workflows. Kling 3.0 O3 targets production quality: clips up to 15 seconds at full 4K, with stronger prompt and reference consistency. One tier for iteration; one for delivery. Pricing for both variants has not been published by Kuaishou; all capability claims are sourced from partner-platform announcements, which agree across all seven independently.

In April 2026, both Kling 3.0 and Kling 3.0 Omni joined Adobe Firefly’s multi-model video hub alongside Veo 3.1, Runway Gen-4.5, and 30+ other AI models — significantly broadening Kling’s distribution to Adobe Creative Cloud’s existing professional user base.

Kuaishou confirmed in May 2026 it is weighing a restructuring that would bring external financing into Kling AI, characterizing the plan as preliminary with no definitive agreements signed (Weekly Roundup — May 18, 2026). The figures around it are press attribution, not company-confirmed: a reported ~$2B pre-IPO round at a ~$20B target valuation, Tencent among potential investors, and Kling annualized revenue near $500M — roughly double its January run rate, the majority from overseas markets. Treated with appropriate caution, it is still the clearest commercial-scale signal in AI video to date, and the first pure-play comparable since Sora’s shutdown removed one.

Veo 3.1 — Google DeepMind

Best for: Photorealism, 4K native output, integrated workflows, broadest free access

Max resolution: 4K native (Flow/Vertex AI); 1080p via Veo 3.1 Lite; 720p via Google Vids free tier
Audio: Native synchronized audio
Key features: Flow unified workspace; Google Vids integration (avatars, Lyria 3 music, YouTube export); Veo 3.1 Lite developer tier; voice-driven generation on Gemini-enabled Google TV
Access: Free — 10 clips/month via Google Vids (any Google account); Google AI Pro ($19.99/mo) and Ultra for higher limits; Flow is free; also via Adobe Firefly multi-model hub (April 2026) and Gemini-enabled TCL Google TVs in the US (April 2026)
API: Vertex AI ($12/min); Veo 3.1 Lite via Gemini API ($0.05/sec 720p, $0.08/sec 1080p); Veo 3.1 Fast pricing reduced April 7, 2026 (check Gemini API docs for current per-second rates)
Milestone: 1.5 billion images and videos created by Flow users

Google’s model pushes photorealistic rendering to the point where trained observers struggle to identify generated footage in blind tests. It is the engine behind Google Flow (merged creative workspace with Whisk, ImageFX, and multi-clip sequencing) and Google Vids. Veo 3.1 has been freely available to any Google account holder since April 2026 (Weekly Roundup — April 4, 2026) — 10 generations per month, 8 seconds at 720p, from text prompts or uploaded images. Google AI Pro and Ultra subscribers get more: up to 1,000 Veo clips per month, Lyria 3 custom music generation (tracks up to 3 minutes), customizable AI avatars with scene placement and wardrobe control, and direct YouTube export. This is the first time a production-grade AI video model has been made freely accessible to Google’s full account base.

Veo arrived on Gemini-enabled TCL Google TVs in the US in April 2026 (Weekly Roundup — May 11, 2026) — voice-driven generation through Gemini’s Create tab, either from scratch or by animating still images. TCL-only and US-only at launch, with no public timeline for other manufacturers or markets.

On the developer side, Google launched Veo 3.1 Lite in March 2026 via the Gemini API and Google AI Studio (Weekly Roundup — April 4, 2026), priced at $0.05/sec for 720p and $0.08/sec for 1080p, less than half the cost of the existing Veo 3.1 Fast tier. Veo 3.1 Fast received a further price reduction on April 7, 2026, so prices have now come down across the range from the free consumer tier to production-grade API calls. Check the Gemini API pricing documentation for current per-second rates.

Gemini Omni Flash — Google DeepMind

Best for: Multimodal input across image / audio / video / text → video output; broadest free consumer distribution via YouTube Shorts; SynthID provenance baked in by default

Max resolution: Not published at launch
Max duration: Not published at launch
Audio: Native multimodal output (video grounded in audio / image / video / text input)
Watermark: SynthID embedded in every output by default; verifiable through Gemini app, Chrome, and Search
Key feature: Any-to-any multimodal generation; SynthID provenance by default; free distribution via YouTube Shorts
Access: Free via YouTube Shorts and YouTube Create; paid via Google AI Plus ($19.99/mo), Pro, or Ultra in the Gemini app and Google Flow
API: Coming weeks (developer and enterprise; pricing unannounced)
Tier ladder: Omni Flash shipped May 19, 2026; Omni Pro is promised “soon,” with no date
Next: Omni Pro shipping window; first independent benchmarks against Veo 3.1, Kling 3.0, and HappyHorse-1.0; API rollout

Google’s any-to-any multimodal video model launched May 19, 2026 at Google I/O (Gemini Omni launch blog, Weekly Roundup — May 25, 2026). Omni Flash is what shipped that day: paid-gated to Google AI Plus, Pro, and Ultra subscribers inside the Gemini app and Google Flow, with every output watermarked via SynthID by default. Distribution rolled out free in YouTube Shorts and the YouTube Create app, putting a conversational video generator inside a Google surface with over two billion logged-in users. Developer and enterprise APIs are promised in the “coming weeks.” A heavier Omni Pro tier was named at the keynote with no shipping date attached.

The launch keynote framed Omni as a “world model” with physics-aware reasoning, but the published evidence at launch is thin — no benchmarks, no architecture paper, no resolution or duration specs (RCTV flagship analysis →). What is verifiable is the distribution side: this is the largest consumer-AI-video distribution move since Sora launched on Plus a year and a half ago. The world-model framing waits on a benchmark.

Seedance 2.0 Pro — ByteDance

Best for: Character consistency, cinematic motion, multi-shot storytelling

Max resolution: 4K native, 10-bit color depth (confirmed at Volcano Engine FORCE, June 23, 2026); standard platform tiers vary
Audio: Native audio with lip-sync
Key feature: Multi-shot storytelling, quad-modal input, frame-level precision
Access: China via Jimeng/Dreamina; Africa, South America, Middle East, SE Asia, and US via CapCut/Dreamina Seedance 2.0; global via BigMotion ($35–$95/mo), LumeFlow AI, other third-party platforms; unlimited access via Higgsfield (exclusive BytePlus-powered unlimited tier, June 17, 2026)
API: Official global API paused; available via third-party integrations (fal.ai, Higgsfield/BytePlus, others)
US restrictions: Real-face image-to-video disabled; unauthorized IP generation blocked; invisible watermarks on all output
Seedance 2.0 mini (June 16, 2026): Lightweight tier launched via Dreamina at approximately $0.02/sec for emerging markets (Southeast Asia, Middle East, Africa, Europe, South America); US-excluded; roughly 7× below the standard 720p tier ($0.151/sec per Artificial Analysis)
Higgsfield “Seedance Unlimited” (June 17, 2026): 30-day unlimited video generation on “Enhanced Seedance 2.0 Fast” (a purpose-built speed-optimized model from ByteDance’s BytePlus B2B cloud); 1-day, 7-day, and 30-day add-on tiers for Higgsfield subscribers; 480p–720p; Higgsfield is the exclusive non-ByteDance unlimited-Seedance surface globally per BytePlus commercial agreement
Note: Benchmark position (June 28, 2026): #1 on the Artificial Analysis T2V audio-included board (Elo 1,219) and #1 on I2V with-audio (1,194); #3 on T2V no-audio (Elo 1,274), behind HappyHorse-1.0 (#1, 1,290) and HappyHorse-1.1 (#2, 1,285). Copyright legislative battle remains a three-way standoff (Blackburn vs. White House vs. CLEAR Act)
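
To put these Elo gaps in perspective, a rating difference can be converted into an expected head-to-head preference rate. The sketch below assumes the conventional logistic Elo formula with a 400-point scale. Artificial Analysis may compute its arena scores differently, so read the results as rough intuition rather than the board's own statistics.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Conventional Elo expectation that A is preferred over B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Ratings quoted in the note above (T2V no-audio board, June 28, 2026).
    happyhorse_1_0 = 1290
    happyhorse_1_1 = 1285
    seedance_2_0 = 1274

    print(f"HappyHorse-1.0 vs HappyHorse-1.1: "
          f"{expected_win_rate(happyhorse_1_0, happyhorse_1_1):.3f}")
    print(f"HappyHorse-1.0 vs Seedance 2.0:   "
          f"{expected_win_rate(happyhorse_1_0, seedance_2_0):.3f}")
```

Under that assumption, gaps of 5 and 16 points translate to preference rates only a point or two above 50%, which is why small reorderings at the top of the board carry little weight on their own.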

The leading commercial model for character consistency and cinematic motion quality. Seedance 2.0 Pro’s Dual-Branch Diffusion Transformer generates audio and video simultaneously in a single pass. Its quad-modal input system accepts text, images, video, and audio in a single prompt. Multi-shot native storytelling and frame-level control over character appearance, object placement, and scene timing remain best-in-class for narrative work.

ByteDance’s official global API rollout was paused indefinitely in late February 2026 after the Motion Picture Association and major studios (Disney, Netflix, Paramount, Sony, Warner Bros.) issued cease-and-desist letters over copyright concerns. The “Face-to-Voice” feature was suspended in February 2026 after it was shown to clone voices from a single photo. Japan opened a separate inquiry over unauthorized anime character reproductions.

ByteDance relaunched the model in March 2026 as Dreamina Seedance 2.0 (Weekly Roundup — March 27, 2026) across markets in Africa, South America, the Middle East, and Southeast Asia. As of April 2026, Dreamina Seedance 2.0 is available in the US via CapCut (Weekly Roundup — April 11, 2026) — a significant reversal of the prior exclusion. The deployment comes with content restrictions: image-to-video generation from inputs containing real faces is disabled, and generation of unauthorized intellectual property is blocked. All output carries an invisible watermark for off-platform identification.

In June 2026, ByteDance expanded Seedance’s distribution through two parallel channels. On June 16, Dreamina Seedance 2.0 mini launched for emerging markets — Southeast Asia, the Middle East, Africa, Europe, and South America — at approximately $0.02/sec, roughly 7× below the standard 720p tier. The US is excluded. On June 17, Higgsfield launched “Seedance Unlimited” — 30-day unlimited access to an “Enhanced Seedance 2.0 Fast” model, delivered through an exclusive partnership with BytePlus (ByteDance’s global B2B cloud platform). Higgsfield is the only non-ByteDance surface with unlimited Seedance globally; the commercial infrastructure runs through BytePlus while Higgsfield owns the creator-facing product. Two channels, one underlying model family, different regulatory profiles — the structure gives ByteDance revenue from Western markets without operating a direct consumer product there.

At Volcano Engine FORCE on June 23, 2026, ByteDance confirmed that Seedance 2.0 now supports native 4K with 10-bit color depth — the underlying capability specification the current three-platform distribution wave runs on. At the same event, ByteDance previewed Seedance 2.5; see What’s Coming.

The copyright landscape around Seedance is a three-way Washington standoff: the White House’s National Policy Framework for AI (March 2026) stated that AI training on copyrighted works does not constitute infringement — the opposite of the Blackburn bill’s position. Separately, the bipartisan CLEAR Act (Schiff/Curtis) would require public disclosure of training data without resolving the fair use question either way.

Grok Imagine — xAI

Best for: Speed, low-cost audio-native API, rapid iteration, social media distribution

Max resolution: 720p — both models (Pro 1080p tier still unshipped, 2+ months past Musk’s April commitment; no new public timeline)
Arena (Grok Imagine): #5 on Artificial Analysis T2V no-audio (Elo 1,235 — effectively tied with Kling 3.0 Omni at #4)
Arena (Grok Imagine Video 1.5): #2 on the Artificial Analysis I2V with-audio arena (Elo 1,113, as of June 2026)
Max duration: 30 seconds (via chained extensions)
Audio: Original Grok Imagine: synchronized audio; Video 1.5: audio-native (synchronized in the same generation pass)
Key feature: Video extension from frame; dual generation modes (Quality + Speed); native video understanding (Grok 4.3 Beta); fastest iteration cycle in the industry; Video 1.5 Fast variant (~25s for 6-sec 720p)
Access: X Premium / SuperGrok subscription required
API pricing:
- Grok Imagine (original): $0.05/sec ($3.00/min) — no native audio
- Grok Imagine Video 1.5 (GA June 16, 2026): $0.08/sec ($4.80/min) — audio-native; single published rate, no resolution tier on any xAI docs page; tiered figures circulating elsewhere are unconfirmed
Engine: Aurora autoregressive MoE model on 110,000 NVIDIA GB200 GPUs
Next: Grok Imagine Pro (1080p) overdue past Musk’s late-April commitment; track xAI release notes for the actual ship date
Caution: Faced regulatory scrutiny over content moderation (UK ICO, France, California AG); image editing now restricted to paid subscribers

xAI shipped four major updates between January and March 2026: API launch (January 28), Grok Imagine 1.0 with 720p video and audio (February 3), Grok 4.20 (February 17), and video extension (March 2). The “Extend from Frame” feature lets users chain clips by continuing from the final frame, enabling sequences up to 30 seconds while preserving lighting, motion, and character positioning.

In April 2026, xAI released Grok 4.3 Beta with native video understanding — letting Grok analyze video as a coherent temporal sequence rather than as isolated frames. The understanding capability is distinct from Grok Imagine’s generation pipeline, but the two now stack: Grok can both generate and reason about video within the same model family. No other major lab currently offers vertical integration of native generation, native understanding, and platform-scale distribution under a single subscription.

On June 3, 2026, xAI shipped Grok Imagine Video 1.5 as a public API preview (Weekly Roundup — June 8, 2026) — the company’s first native image-to-video model. Per xAI’s docs: image-to-video with audio output, $0.08 per second, model alias grok-imagine-video-1.5-2026-05-30. It opened at #2 on the Artificial Analysis I2V with-audio arena, behind Seedance 2.0. Secondary reports described a higher-resolution tier; the docs list a single rate, so treat the tiered-pricing figures circulating elsewhere as unconfirmed by xAI.

On June 16, 2026, xAI moved Grok Imagine Video 1.5 to general availability across the Imagine API, grok.com, iOS, and Android simultaneously. The GA launch added a Video 1.5 Fast variant — a 6-second 720p clip in approximately 25 seconds, down from 40-plus seconds in preview. The single published rate remains $0.08/sec; xAI’s docs list no resolution-tiered pricing. At GA, Grok Imagine Video 1.5 holds #2 on the Artificial Analysis I2V with-audio arena (Elo 1,113), behind Seedance 2.0 (1,194). xAI’s #1 claim at launch wasn’t borne out by the live board. The “$4.20/min” figure circulating refers to the original Grok Imagine’s $0.07/sec@720p rate (a different, older model with no native audio) — not to Video 1.5.

Grok Imagine’s API pricing undercuts every audio-capable commercial tier except the original Grok Imagine itself. The trade-off is a 720p resolution ceiling; every other major model offers 1080p or higher. The 1080p Grok Imagine Pro tier telegraphed by Elon Musk for late April 2026 missed its window and has no new public timeline from xAI. Each week it slips, the per-minute price advantage matters less and the resolution gap matters more: Veo 3.1 Lite ships 1080p at $0.08/sec, Kling 3.0 ships native 4K at the $8/month tier, and even open-source LTX-2.3 outputs true 4K. The Video 1.5 launch puts the two subjects of RCTV’s Grok vs. Seedance comparison at one and two on the I2V with-audio arena.

Runway Gen-4 Turbo / Aleph 2.0 — Runway

Best for: Stylized content, VFX aesthetics, professional ecosystem, real-time avatars, agentic production, multishot edit propagation

Max resolution: 1080p (Gen-4 Turbo); 1080p multishot up to 30s (Aleph 2.0 in Edit Studio); 720p real-time (Characters)
Audio: Supported
Key features: Motion brushes, style control, API maturity (Gen-4 Turbo); Characters real-time avatar API (GWM-1); Runway Agent (conversational end-to-end production, May 13, 2026); Aleph 2.0 multishot edit propagation in Edit Studio (May 21, 2026)
Access: From $12/mo (runwayml.com); Gen-4.5 also via Adobe Firefly (Creative Cloud subscription); Runway Agent and Edit Studio (Aleph 2.0) at runwayml.com on all paid plans
API: Most mature video generation API available; Characters API at dev.runwayml.com
Next: Standalone Gen-4.5 launch on runwayml.com still pending; real-time video model research preview on Vera Rubin hardware (time to first frame under 100ms); independent hands-on for Aleph 2.0 against Veo 3.1 and Gemini Omni editing capabilities
Note: Characters is an enterprise API product built on GWM-1, separate from the Gen-4 Turbo generation pipeline

Runway leads in non-photorealistic and stylized video — VFX-oriented aesthetics, abstract content, and artistic directions where other models default to photorealism. Gen-4 Turbo has the most mature professional ecosystem with motion brushes, scene consistency tools, and a robust API. Runway closed a $315M Series C in February 2026 at a $5.3B valuation.

In March 2026, Runway launched Characters — a real-time video agent API built on its GWM-1 world model (Weekly Roundup — March 13, 2026). Characters generates fully conversational AI avatars from a single reference image with no fine-tuning required. The avatars sustain realistic lip-sync, facial expressions, eye contact, and gesture across extended multi-minute conversations, running at 24fps at 720p in real time. BBC and Silverside are early enterprise partners.

At NVIDIA GTC in March 2026, Runway demoed a research preview of a new real-time video generation model (Weekly Roundup — March 20, 2026) running on NVIDIA Vera Rubin hardware — achieving time-to-first-frame under 100ms for HD video. Gen-4.5 became accessible via Adobe Firefly’s multi-model video hub in April 2026 (Weekly Roundup — April 27, 2026) — Runway’s first major distribution beyond its own platform.

In May 2026, Runway launched Runway Agent (Weekly Roundup — May 18, 2026) — a conversational creative partner that runs ideation, generation, sound design, and editing end-to-end from a chat interface. The same month, Runway opened a Tokyo office and committed $40M to Japan, its third-largest market, with its enterprise base tripled in twelve months.

In May 2026, Runway released Aleph 2.0 (Weekly Roundup — May 25, 2026) — an upgraded video editing model that propagates a single-frame edit across the rest of a clip while preserving everything else. Multishot sequences up to 30 seconds at 1080p, edited across cuts in one pass instead of shot-by-shot. Available on all paid Runway plans on the desktop web app.

Pika 2.5 / Pika Agents — Pika Labs

Best for: Budget-conscious creators, rapid iteration, social media content; multi-model agent orchestration via Pika Agents

Max resolution: 1080p (Pika 2.5); 480p real-time (PikaStream 1.0)
Max duration: 42 seconds (clip); persistent for live (PikaStream)
Audio: Supported (Pika Video native; ElevenLabs / MiniMax / OpenAI Whisper via Pika Agents)
Key feature: Pikaswaps, Pikaffects, fast batch generation (Pika 2.5); PikaStream 1.0 for live agent video; Pika Agents for multi-model orchestration over Kling, Veo, Seedance, MiniMax, and Sora
Access: From $8/mo (lowest entry price among major models); Pika Agents available at pika.me and across 17+ platform surfaces
API: Available

The most accessible entry point to AI video generation. Pika’s strength is speed and volume: generate 20–30 variations of a concept in minutes, then refine. Features like Pikaswaps (face/object replacement) and Pikaffects (style transfer) add creative flexibility at an $8/mo entry price that no other model in this comparison undercuts.

In April 2026, Pika launched PikaStream 1.0 — a real-time AI video engine for live agent meetings at 24fps/480p with ~1.5s speech-to-video latency and persistent identity across calls.

In late April 2026, Pika reintroduced its product line as Pika Agents — a multi-modal AI creative partner that orchestrates other companies’ video models from a conversational interface. The video roster includes Pika’s own model alongside ByteDance’s Seedance 2.0, Kuaishou’s Kling, MiniMax, Google’s Veo 3, and OpenAI’s Sora. On audio: ElevenLabs, MiniMax Music and Voice, OpenAI Whisper. On images: Gemini, ChatGPT Images 2, SeedDream. The agents run inside Slack, Telegram, WhatsApp, Discord, X, Notion, GitHub, Figma, and a dozen other surfaces with persistent memory and personality across sessions.

Sora 2 — OpenAI

Status: Discontinuation announced March 24, 2026; consumer app shutdown executed April 26, 2026. RCTV analysis →

OpenAI announced Sora’s discontinuation on March 24, 2026 — the app, the API, and the Disney licensing deal announced with it in December 2025. The stated reason was compute reallocation toward “world simulation for robotics.” The numbers tell the fuller story: estimated $15M/day peak inference cost against $2.1M in total lifetime in-app revenue, and a 66% download decline from its November 2025 peak to February 2026. Sora is removed from active tracking. See Weekly Roundup — March 27, 2026 for the full breakdown.
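
The gap between those two figures is easier to see as a ratio. Here is a back-of-the-envelope calculation using only the estimates quoted above. Both numbers are estimates, and a peak daily cost overstates the average.

```python
# Figures quoted in the paragraph above; both are estimates.
PEAK_INFERENCE_COST_PER_DAY = 15_000_000  # USD per day at peak
LIFETIME_IN_APP_REVENUE = 2_100_000       # USD, total

days_covered = LIFETIME_IN_APP_REVENUE / PEAK_INFERENCE_COST_PER_DAY
hours_covered = days_covered * 24

print(f"Lifetime revenue covers {days_covered:.2f} days "
      f"(about {hours_covered:.1f} hours) of peak inference cost")
```

In other words, total lifetime in-app revenue would have paid for a little over three hours of peak-day inference.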

Shutdown timeline: The Sora consumer app and web interface went dark on April 26, 2026 (Weekly Roundup — April 27, 2026) — the export window closed at that time. The Sora API remains accessible through September 24, 2026, giving developers time to migrate integrations before the model line fully retires.

The Agentic Layer: Orchestration on Top of the Models

The model is no longer the product. The five agents in this section don’t generate video — they decide which model generates video, and on what schedule, across what surfaces. Think of the commercial models above as engines: Veo 3.1, Kling 3.0, Seedance 2.0. The agents are steering wheels. The operator-relevant question has shifted up a layer: not “which model is best?” but “which agent puts the right model on the right shot, in the right workflow, with the right context carried forward?” That question didn’t exist twelve months ago. Five companies have already shipped an answer.

Agent	Vendor	Orchestrates	Single/Multi-model	Surfaces	What it automates	Access
Luma Agents	Luma AI	Ray 3.2 / Ray 3.14, Veo 3, Nano Banana Pro, Seedream, Seedance 2.0, ElevenLabs	Multi	Luma platform, enterprise	Multi-shot composition; sustained character and style across shots; full campaign packages	Enterprise; Luma platform subscription
Pika Agents	Pika Labs	Pika Video, Kling, Veo 3, Seedance 2.0, MiniMax, Sora API; audio via ElevenLabs, MiniMax, Whisper; image via Gemini, ChatGPT Images 2, Seedream	Multi	Slack, Telegram, WhatsApp, Discord, X, Notion, GitHub, Figma + 17 other surfaces	Conversational prompting; cross-model orchestration; persistent memory and personality across sessions	Pika subscription tiers (pika.me)
Runway Agent	Runway	Runway Gen-4 Turbo, Aleph 2.0, Edit Studio	Single-vendor	Runway web app (app.runwayml.com)	Concept ideation, multi-shot generation, sound design, editing — end to end from a single conversation	All paid Runway plans ($12/mo+)
Higgsfield Supercomputer	Higgsfield AI	Higgsfield video stack + Claude Opus 4.8, GPT-5.5 Pro, Gemini 3.1 Pro (Claude backbone upgraded to Opus 4.8, May 29)	Multi	Browser, Telegram + 30 integrations (Slack, Google Drive, Notion, Figma, Gmail)	Marketing, production, and creative-direction workflows; research-to-document conversion; scheduled content tasks	Higgsfield subscriber plans (higgsfield.ai)
Adobe Firefly AI Assistant	Adobe	Photoshop, Premiere Pro, Lightroom, Illustrator, Express, Firefly — full Creative Cloud stack	Single-vendor	Standalone Firefly web app + embedded in each Creative Cloud app; Premiere Pro with project-metadata access	Multi-step CC workflows via Creative Skills; multi-app handoff; format conversion; Frame.io feedback integration	Creative Cloud subscription; public beta since April 27, 2026

Luma Agents

Luma AI shipped the first production-grade conversational agent in this category in March 2026. By March 10, deployments were live at Publicis Groupe and Serviceplan Group — no beta waiting period. The agent works from a Luma Uni-1 reasoning layer that plans and coordinates across video, image, audio, and text before generating anything. It calls Luma’s own Ray 3.2 / Ray 3.14 for video, Google’s Veo 3 for photorealistic shots, Nano Banana Pro, ByteDance’s Seedream, and ElevenLabs for voice. Seedance 2.0 joined the roster in May 2026. The Mazda commercial deliverable in April 2026 — Johannesburg agency Boundless produced Mazda’s first AI-generated commercial using Luma Agents in under two weeks — is the most credible production-deployment signal any AI video agent has produced.

The critical technical feature: persistent context across the full asset suite. Luma Agents remembers what was generated in earlier steps and can revise upstream elements when downstream evaluation surfaces a problem. That is what makes it an orchestrator rather than a fancy prompt box.
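The pattern described above, persistent context plus upstream revision, can be sketched as a small dependency store. This is an illustrative toy, not Luma's implementation: `Asset`, `CampaignContext`, `generate`, and `revise` are hypothetical names, generation is stubbed out, and the dependency graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    depends_on: list = field(default_factory=list)
    version: int = 1


class CampaignContext:
    """Hypothetical persistent store holding every asset generated so far."""

    def __init__(self):
        self.assets = {}

    def generate(self, name, depends_on=()):
        # A real agent would call a video, image, or audio model here.
        self.assets[name] = Asset(name, list(depends_on))
        return self.assets[name]

    def revise(self, name):
        """Regenerate one asset, then everything that was built from it."""
        self.assets[name].version += 1
        for asset in list(self.assets.values()):
            if name in asset.depends_on:
                self.revise(asset.name)


ctx = CampaignContext()
ctx.generate("character_sheet")
ctx.generate("shot_1", depends_on=["character_sheet"])
ctx.generate("shot_2", depends_on=["character_sheet"])

# Downstream review of shot_2 flags a character problem, so the fix goes upstream.
ctx.revise("character_sheet")
print({a.name: a.version for a in ctx.assets.values()})
```

Because the store remembers which shots were built from the character sheet, revising the sheet regenerates both shots. A stateless prompt box would have no record of that dependency.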

Pika Agents

Pika’s April 28, 2026 launch was the moment the agentic-orchestration pattern became industry news rather than one lab’s experiment. Pika Agents orchestrates a broader model roster than any competitor — Kling, Veo 3, Seedance 2.0, MiniMax, Sora’s API, and Pika’s own model on the video side; ElevenLabs, MiniMax, and Whisper on audio — all from a conversational interface that runs inside 17 surfaces where creators already work. RCTV covered this as the R#2 lede because the framing was explicit: “the prompt is no longer the product.” That line has since become the editorial spine for this entire category.

PikaStream 1.0 (April 2, 2026): the real-time video engine that runs inside Pika Agents as its live-avatar capability — 24fps at 480p, ~1.5-second speech-to-video latency, persistent identity across calls. It is a streaming runtime, not a standalone orchestrator; the agent layer above it is Pika Agents.

Runway Agent

Runway shipped its agent in May 2026 — single-vendor, full Runway stack. The positioning is end-to-end: Runway Agent handles concept ideation, generation via Gen-4 Turbo or Aleph 2.0, sound design, and editing inside the same conversation thread. Runway’s “single-vendor” constraint is a deliberate product posture, not a technical limitation — they own the generation and editing stack, so the agent never needs to leave it. Whether that narrows or focuses the use case depends on whether the operator’s workflow already lives in Runway. Available on all paid plans, starting at $12/month.

Higgsfield Supercomputer

Higgsfield’s Supercomputer, launched in mid-May 2026, is the most enterprise-positioned agent in the set. “Supercomputer” is the framing: orchestrate Claude, GPT-5.5, and Gemini models alongside Higgsfield’s video stack to plan and execute full content campaigns end to end. On May 19, 2026 — the same day as Google I/O — Higgsfield updated the orchestration layer to Gemini, describing the swap as “8× cheaper, 3× faster.” On May 29, Higgsfield upgraded its Claude backbone to Opus 4.8, and on May 30 shipped Higgsfield Reframe — an MCP-native aspect-ratio reframing tool available inside Claude. The multi-model roster is the broadest LLM coverage in this category. Distribution extends via browser, Telegram, and 30+ third-party integrations. In a 12-day sprint through early June 2026 (Weekly Roundup — June 15, 2026), Higgsfield added five external surfaces — Claude MCP (May 28), Adobe Premiere/After Effects plugins (May 29), Figma (June 4), a Minecraft mod (June 5), and a DaVinci Resolve plugin (June 8) — and folded Grok Imagine 1.5 into its own platform, the clearest expression yet of the routing bet: a competitor’s model generating video inside Higgsfield’s surface. Higgsfield’s Supercomputer page carries the full capability description.

Adobe Firefly AI Assistant

Adobe is the incumbent here, and the only one that arrives with creative-workflow context already in hand rather than starting from a blank slate. Firefly AI Assistant (previewed as Project Moonlight at MAX; public beta April 27, 2026) orchestrates the Creative Cloud stack conversationally: Premiere Pro with full project metadata access, Photoshop, Lightroom, Illustrator, Express. Creative Skills are the execution layer: predefined multi-step workflows that fire from a single natural-language instruction. The operative claim is multi-app handoff without context loss; the operator verdict is still forming in the public beta.

The AI-native vs. incumbent frame is worth naming: Luma, Pika, Runway, and Higgsfield are building agents on top of AI video first, while Adobe is extending an agent over a suite it already controls. The two approaches answer different operator questions, and which one wins the relationship depends on whether the operator’s workflow is already Premiere-centric or starting from scratch.

What to watch

Three axes will decide how this layer develops. First: single-vendor vs. multi-vendor convergence. Runway’s Runway-only posture contrasts with Pika’s eight-model roster; whether Runway opens to third-party models is the specific question. Second: AI-native startup agents vs. legacy-incumbent agents. Adobe’s Creative Skills framework is embedded in the world’s most-used professional NLE; Pika Agents runs in Slack. Neither is clearly winning the operator relationship yet. Third: agent-to-agent interoperability. None of the five currently calls another vendor’s agent — they call models. The day Pika Agents calls Runway Agent, the competitive dynamics of this category change entirely.

Open-Source & Local Generation

The open-source AI video ecosystem has matured enough that local generation on consumer hardware (12GB+ VRAM for LTX-2.3) is a viable option for privacy-conscious creators and developers.

LTX-2.3 — Lightricks

Best for: Local/desktop generation, consumer GPU workflows, high-frame-rate output

Max resolution: 4K native (true 4K, not upscaled)
Max duration: 20 seconds
Frame rate: Up to 50fps (24/48fps options also available)
Audio: Native synchronized audio (improved HiFi-GAN vocoder)
Portrait mode: Yes (9:16, up to 1080×1920)
Hardware: Runs on GPUs with 12GB+ VRAM; optimized for RTX 50 Series (2.5× faster via NVFP4)
Integration: ComfyUI native; standalone desktop video editor (shipped March 2026)
License: Apache 2.0 (free for companies under $10M revenue; commercial license required above that threshold)

A comprehensive rebuild released in March 2026 (Weekly Roundup — March 20, 2026): a new VAE for sharper detail, a 4× larger text connector for better prompt understanding, and an improved HiFi-GAN vocoder for cleaner native audio. The model ships alongside a dedicated desktop video editor, making the entire local pipeline accessible without a ComfyUI node graph.

Key capabilities: native portrait mode (9:16 up to 1080×1920), last-frame interpolation for seamless clip chaining, and 24/48fps output options. At GDC 2026, NVIDIA announced 2.5× performance gains on RTX 50 Series via NVFP4 quantization, 60% lower VRAM usage, and RTX Video Super Resolution for ComfyUI delivering 4K upscaling 30× faster than competing local alternatives. The ComfyUI App View strips the node-graph interface into a simplified prompt-in/video-out UI for non-technical users.

Wan 2.7 — Alibaba (Tongyi Lab)

Best for: Multi-task video generation with Thinking Mode, open-source flexibility

Max resolution: 1080p
Max duration: 2–15 seconds
Task types: T2V, I2V, video continuation, reference-to-video (up to 5 persons), video editing
Key feature: Thinking Mode (chain-of-thought reasoning before generation)
Integration: ComfyUI 0.18.5+, Alibaba Cloud Model Studio, wan.video
License: Open source

Alibaba’s Wan 2.7, released April 3, 2026 (Weekly Roundup — April 17, 2026), is a major upgrade from the 2.2 line. The headline feature is Thinking Mode — a chain-of-thought reasoning approach where the model analyzes the prompt, plans composition, then generates. This produces noticeably more coherent output with fewer artifacts than single-pass generation.

Wan 2.7 Video unifies five task types in a single model: text-to-video, image-to-video (first-frame, first-and-last-frame, audio-driven), video continuation with text guidance, reference-to-video with up to five real-person inputs, and video editing via text, reference images, or style transfer. ComfyUI added support the same day in version 0.18.5 with workflow templates for all five task types.

HappyHorse-1.0 — Alibaba ATH AI Innovation Unit

Best for: Top-ranked benchmark quality (T2V + I2V); commercial API access with joint audio-video and seven-language native lip-sync

Max resolution: 1080p
Audio: Joint audio-video generation in a single forward pass; native synced output
Lip-sync languages: 7 — English, Mandarin, Cantonese, Japanese, Korean, German, French
Architecture: 15B-parameter unified 40-layer self-attention Transformer
Inference speed: ~38 seconds for 1080p on a single NVIDIA H100
Benchmark position (June 28, 2026): #1 on Artificial Analysis T2V no-audio (Elo 1,290); HappyHorse-1.1 debuted at #2 (Elo 1,285); Seedance 2.0 #3 (1,274). Elo scores update continuously. On the audio-included T2V board the order flips: Seedance 2.0 first (1,219). On I2V with-audio: Seedance 2.0 #1 (1,194), HappyHorse-1.1 #2 (Elo 1,119), HappyHorse-1.0 #3 (1,089)
Access: API live via fal.ai ($0.14/sec 720p, $0.28/sec 1080p) and Alibaba Cloud Bailian (enterprise from April 27, 2026); open weights still pending despite ATH’s marketing claim
API: ✓ via fal.ai (4 endpoints) and Alibaba Cloud Bailian (enterprise tier)

HappyHorse-1.0 debuted anonymously on Artificial Analysis on April 7, 2026 (Weekly Roundup — April 11, 2026) and immediately ranked #1 in both text-to-video and image-to-video blind testing, surpassing Seedance 2.0. Alibaba confirmed on April 10 that the model belongs to its ATH AI Innovation Unit. The 15-billion-parameter model uses a unified 40-layer self-attention Transformer that generates audio and video jointly in a single forward pass, with no cross-attention modules and no separate audio post-processing.

On April 27, 2026, fal launched HappyHorse-1.0 as official API partner with four endpoints at $0.14 per second for 720p output and $0.28 per second for 1080p (pay-per-second, no minimums). Alibaba Cloud Bailian opened enterprise-grade access the same day.
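For budgeting, the per-second rates quoted above reduce to simple multiplication. The sketch below is a minimal illustration, not fal's billing code: the function names are hypothetical, and it assumes whole-second clip lengths, since how fractional seconds are billed is not stated here.

```python
from decimal import Decimal

# Per-second rates quoted above for HappyHorse-1.0 on fal.ai (USD).
RATE_PER_SECOND = {
    "720p": Decimal("0.14"),
    "1080p": Decimal("0.28"),
}


def clip_cost(seconds, resolution):
    """Cost in USD of one clip billed per second, with no minimum charge."""
    return RATE_PER_SECOND[resolution] * Decimal(seconds)


def per_minute(resolution):
    """The same rate per generated minute, for comparing with per-minute price lists."""
    return RATE_PER_SECOND[resolution] * 60


print(clip_cost(10, "1080p"))  # 2.80
print(per_minute("720p"))      # 8.40
```

`Decimal` is used so that currency amounts do not pick up binary floating-point rounding. The per-minute helper makes these rates directly comparable with per-minute prices quoted elsewhere on this page.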

The open-weights story is messier. ATH’s happyhorse.me/open-source landing page describes HappyHorse-1.0 as “fully open-sourced,” but independent verification finds a public GitHub repo with no model weights, no inference code, and no license file; the Hugging Face profile remains auth-gated. Alibaba has effectively separated commercial API access (live) from open-weight distribution (still unscheduled). Until weights ship, treat HappyHorse-1.0 as a commercial model with an open-source promise — the API is the actual access surface.

HappyHorse-1.1 appeared on the Artificial Analysis leaderboard in late June 2026 — the first model update from ATH’s AI Innovation Unit since the 1.0 debut in April. On the T2V no-audio board, 1.1 debuted at #2 (Elo 1,285), one position below 1.0 (1,290). On the I2V with-audio board, 1.1 outranks 1.0: #2 (Elo 1,119) versus 1.0’s #3 (1,089). No separate pricing or distinct API endpoint has been announced for 1.1; access is through the same fal.ai and Alibaba Cloud Bailian surfaces. Open-weights delivery for 1.0 remains unchanged — nine weeks past the “fully open-sourced” marketing claim.
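To put those Elo gaps in perspective, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the standard logistic Elo formula with a 400-point scale; whether Artificial Analysis computes its ratings exactly this way is an assumption, and the function name is illustrative.

```python
def expected_win_rate(rating_a, rating_b, scale=400):
    """Standard Elo expectation that A is preferred over B in one blind comparison."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / scale))


# Ratings quoted above (June 28, 2026 snapshot).
t2v_no_audio = expected_win_rate(1290, 1285)  # HappyHorse-1.0 vs. 1.1
i2v_audio = expected_win_rate(1119, 1089)     # HappyHorse-1.1 vs. 1.0

print(f"T2V no-audio, 1.0 over 1.1: {t2v_no_audio:.1%}")
print(f"I2V with-audio, 1.1 over 1.0: {i2v_audio:.1%}")
```

Under that assumption, a 5-point lead is close to a coin flip (roughly 51%) and a 30-point lead a modest edge (roughly 54%). The rank order therefore says less than the position numbers suggest.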

Other Notable Open-Source Models

SkyReels V4 (Skywork AI) — Released April 3, 2026 (Weekly Roundup — April 17, 2026). First open-source model to co-generate video and synchronized audio in a single forward pass. Dual-stream Multimodal Diffusion Transformer (MMDiT) architecture; 1080p at 32 FPS, clips up to 15 seconds. Accepts text, images, video clips, masks, and audio references. Ranked among the top models on the Artificial Analysis T2V with-audio leaderboard (Elo ~1,135). Free tier: 70 monthly credits on skyreels.dev; open-source weights available for local deployment
Mochi 1 — High-fidelity short video with strong prompt alignment
HunyuanVideo / HY-World 2.0 (Tencent) — HunyuanVideo offers solid image-to-video with coherent motion. In April 2026, Tencent’s Hunyuan team released HY-World 2.0 — a multi-modal world model that generates editable 3D scenes (meshes plus Gaussian Splattings) from text prompts or single reference images, with WorldMirror 2.0 inference code and weights open-sourced (github.com/Tencent-Hunyuan/HY-World-2.0). The combination of editable 3D geometry and open weights makes HY-World 2.0 the more pipeline-friendly counterpart to Alibaba’s still-gated Happy Oyster
Happy Oyster (Alibaba ATH) — Released April 16, 2026 (Weekly Roundup — April 17, 2026). World model that generates interactive, physics-aware 3D environments from text prompts; targets gaming, film, and VR. Directing and Wandering modes are designed for real-time exploration but don’t expose the underlying 3D representation in a standards-friendly way (unlike Tencent’s HY-World 2.0 above). Live demo accessible via Artificial Analysis arena; weights gated
MAGI-1 — Long-form video synthesis capabilities
Helios (Peking University / ByteDance / Canva) — 14B autoregressive diffusion model; 19.5fps real-time generation on a single NVIDIA H100; capable of minute-scale video; Apache 2.0 license. Released March 2026. Notable for real-time throughput on a single accelerator
NVIDIA Cosmos 3 (NVIDIA) — Released June 1, 2026 at Computex (Weekly Roundup — June 8, 2026). An open-weights omnimodel for physical AI that generates text, images, video, ambient sound, and actions in a unified architecture; shipped as Cosmos 3 Super and Cosmos 3 Nano on Hugging Face under the OpenMDW-1.1 license, corroborated by a 291-author arXiv paper. NVIDIA claimed top open-source rank on Artificial Analysis for text-to-image and image-to-video at launch; as of June 2026 the live board confirms Cosmos3-Super leading open-weight image-to-video at Elo 1,251 (Weekly Roundup — June 15, 2026) — the claim now borne out on the I2V side. Positioned as physical-AI / world-model infrastructure rather than a creator generation endpoint, but the open weights and unified video generation make it a tracked open-source entrant

How to Choose: A Routing Framework

The right model depends on the shot, not the project. Here’s a practical decision framework:

Which AI video model is best for broadcast-ready 4K? Kling 3.0 or Veo 3.1. Kling hits 4K at 60fps with multi-cut storyboards. Veo 3.1 leads on photorealism.

Which AI video model wins on benchmark quality with commercial API access? HappyHorse-1.0 via fal.ai ($0.14/sec 720p, $0.28/sec 1080p) or Alibaba Cloud Bailian — #1 on Artificial Analysis T2V and I2V no-audio leaderboards; joint audio-video; seven-language native lip-sync.

What’s the best free AI video model to start with? Veo 3.1 via Google Vids (10 free clips/month, any Google account).

Which AI video model is free inside an app you already use? Gemini Omni Flash via YouTube Shorts and YouTube Create (available since May 2026, any Google account).

Which AI video model accepts multimodal input (image + audio + video → video)? Gemini Omni.

Which AI video model wins character consistency across shots? Seedance 2.0 Pro via CapCut (US available since April 2026, with real-face restrictions) or Luma Ray 3.2.

Which AI video model is best for stylized and VFX work? Runway Gen-4 Turbo.

Which AI video model propagates a single-frame edit across a 30-second multishot? Runway Aleph 2.0 in Edit Studio.

Which AI video model handles professional production volume at scale? Luma Ray 3.2 (4× faster, 3× cheaper than previous Ray; adds 16-keyframe control and HDR/EXR export).

What’s the best low-cost AI video model for volume work? Pika 2.5.

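What’s the cheapest AI video API from a major lab in 2026? Grok Imagine ($4.20/min generated). Agnes-Video-V2.0 lists lower at $0.30/min but ranks near the bottom of the Artificial Analysis board on quality.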

Which AI video model is best for local generation and privacy? LTX-2.3 via ComfyUI or desktop editor.

Which AI video API is best for real-time interactive avatars? Runway Characters (GWM-1).

What’s the best real-time AI video for live agent meetings? PikaStream 1.0 (24fps/480p, ~1.5s latency).

Which AI video model wins multi-shot narrative? Seedance 2.0 Pro via CapCut (US, with restrictions), Luma Ray 3.2, or Kling 3.0 Omni.

Which AI video models work inside Adobe Creative Cloud? Adobe Firefly multi-model hub (Veo 3.1, Kling 3.0/Omni, Runway Gen-4.5, Luma, plus 30+ others).

Which AI video orchestration agent runs Kling + Veo + Seedance + MiniMax from one chat? Pika Agents (April 28, 2026; Slack/Telegram/Discord/X/Notion/Figma, persistent memory) or Higgsfield Supercomputer (mid-May 2026; orchestrates Seedance 2.0, Gemini, GPT-5.5 on web and Telegram). Runway Agent (May 13, 2026) covers the single-vendor end-to-end case. Luma Agents for enterprise campaign production (Publicis, Adidas, Mazda deployments). Adobe Firefly AI Assistant for teams already in Creative Cloud (public beta, CC subscription). See The Agentic Layer for the full comparison table.

Which AI model is best for editable 3D world generation? Tencent HY-World 2.0 (open weights) or Alibaba Happy Oyster (gated early access).

Which AI video model has the largest built-in distribution? Grok Imagine (500M+ X users).

Most professional workflows use 2-3 models per project, routing different shots to different engines based on the specific requirements of each scene.
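The framework above amounts to a lookup table keyed by shot requirement. The sketch below encodes a few of its entries. The use-case keys, shot names, and `route_project` helper are hypothetical, and taking the first-listed model as the default pick is an assumption of this sketch rather than a rule stated above.

```python
# Use-case keys are hypothetical labels; model picks are copied from the list above.
ROUTES = {
    "broadcast_4k": ["Kling 3.0", "Veo 3.1"],
    "character_consistency": ["Seedance 2.0 Pro", "Luma Ray 3.2"],
    "stylized_vfx": ["Runway Gen-4 Turbo"],
    "edit_propagation": ["Runway Aleph 2.0"],
    "local_private": ["LTX-2.3"],
    "realtime_meetings": ["PikaStream 1.0"],
}


def route_project(shots):
    """Map each shot to its candidate models and collect the default engines needed."""
    plan = {shot: ROUTES[need] for shot, need in shots.items()}
    engines = sorted({models[0] for models in plan.values()})
    return plan, engines


shots = {
    "opening_aerial": "broadcast_4k",
    "dialogue_two_shot": "character_consistency",
    "dream_sequence": "stylized_vfx",
}
plan, engines = route_project(shots)
print(engines)
```

With these three example shots the project lands on three engines, in line with the 2-3 models per project noted above.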

What’s Coming

Reactor — real-time world models as infrastructure — Reactor emerged from stealth in May 2026 with $59M (Lightspeed led; Jeffrey Katzenberg’s WndrCo, Amplify Partners, Sky9 Capital, FPV Ventures also in). Co-founders are former Apple Vision Pro technical leads. The pitch: a unified SDK and API that makes real-time world models available to developers “with a few lines of code,” targeting media and entertainment, physical AI, and robotics. This is a different category from the commercial models and agents tracked above — it is infrastructure for interactive AI worlds, not a generation endpoint. No Stack row warranted yet; worth watching as the definition of “AI video” expands toward interactive, real-time, and physics-driven output.
MCP as a distribution layer for AI video — In May 2026, Runway launched Runway MCP, connecting its model roster (Gen-4.5, Seedance 2.0, GPT Images 2.0, Kling) to Claude, ChatGPT, Cursor, Replit, and any MCP-compatible client. The following day, Pika and Higgsfield both shipped their own MCP skills — Pika’s Founder Starter Kit (four Claude skills: Build-a-Brand, App Screens, Product Sizzle, Founder Video) and Higgsfield Supercomputer as a Claude skill. Three Tier-1 AI video labs shipped MCP integrations inside 24 hours; Higgsfield also launched five Adobe Premiere Pro and After Effects plugins in the same window. MCP is becoming the standard distribution channel for AI video capability into developer and coding-agent workflows — a second distribution layer running alongside the model APIs.
Seedance 2.5 — 30-second native clips, enterprise beta — ByteDance previewed Seedance 2.5 at Volcano Engine FORCE (June 23, 2026). Capabilities per ByteDance: single clips up to 30 seconds with no post-stitching (scene changes and tempo shifts in a single generation pass); up to 50 multimodal reference inputs; targeted post-generation editing that redrafts part of a frame without touching the rest. Global enterprise beta is open now; public launch targeted early July 2026. No independent benchmarks exist yet — every spec is ByteDance’s own claim from a developer conference. If they hold, 30-second native generation from the model currently holding #1 on the T2V audio-included leaderboard resets the category’s duration ceiling.
Dreamina Octo — ByteDance’s “next chapter” beyond Seedance 2.0. Revealed at AI on the Lot (Culver City, May 27, 2026) under the framing “From Generation to Emergence” and “when the prompt isn’t the point.” Early access survey live; product confirmed as “arriving soon,” not yet shipped. If Octo ships as a conversational orchestrator rather than a generation model, it belongs in the agentic section; if it ships as a new Seedance-tier model, it belongs in the Big Eight. Watch @dreamina_ai for the actual launch.
Gemini Omni Pro shipping window — Google announced Omni Pro at I/O 2026 alongside Omni Flash with no shipping date attached. “Soon” is the operative word. Independent benchmarks for Omni Flash against Veo 3.1, Kling 3.0, and HappyHorse-1.0 are the test that converts the launch keynote’s world-model framing from rhetoric to evidence — or not (RCTV flagship analysis →)
Agentic-orchestration layer — Five conversational agents now sit on top of the model layer: Luma Agents (March 5, 2026), Pika Agents (April 28, 2026), Runway Agent (May 13, 2026), Higgsfield Supercomputer (mid-May 2026), and Adobe Firefly AI Assistant (public beta April 27, 2026). See The Agentic Layer for the full comparison
Grok Imagine Pro (1080p) — Slipped past Musk’s late-April commitment with no new public timeline as of June 2026. Grok Imagine Video 1.5 reached GA on June 16 — audio-native at $0.08/sec across the Imagine API, grok.com, iOS, and Android — but both models still top out at 720p. Until the Pro tier ships, Grok stays out of broadcast and large-format work. Track xAI release notes for the actual ship date
HappyHorse-1.0 open-source weights — Now roughly nine weeks past the April 27, 2026 commercial API launch on fal, and the weights still haven’t shipped: the public GitHub repo remains empty (no weights, no inference code, no license file) and the Hugging Face profile still reads “coming soon.” Artificial Analysis now lists HappyHorse as an API-only product. ATH’s happyhorse.me/open-source “fully open-sourced” claim hasn’t softened, hasn’t acknowledged the gap, and hasn’t put a date on the artifact. Independent verification by WaveSpeedAI remains the canonical source on the marketing-vs-artifact gap
TAKE IT DOWN Act first enforcement action — The FTC opened formal enforcement after the May 19, 2026 deadline (FTC press release, business-guidance framework). The agency stood up TakeItDown.ftc.gov as the victim-facing intake surface and on the same day sent a second-wave warning letter to twelve “nudify” tool sites, on top of the fifteen platforms named May 13, 2026. The first enforcement target, whether platform, tool, or generator, will set the operative precedent for the rest of the year. Civil penalties up to $53,088 per violation; 48-hour removal SLA; Section 5 enforcement (Section 230 is no shield)
AI executive order — signed June 2, 2026, light-touch — Trump signed Promoting Advanced Artificial Intelligence Innovation and Security on June 2, 2026. The version that landed is voluntary, not the FDA-style framework early drafts floated: developers of “covered frontier models” may give federal agencies a 30-day pre-release look, and the text explicitly bars any “mandatory governmental licensing, preclearance, or permitting requirement.” OpenAI, Anthropic, and Google all welcomed it. For AI video the exposure runs through the frontier-model layer, and the DOJ criminal-AI priority stacks onto TAKE IT DOWN. The month-long regulatory overhang resolved in industry’s favor (Weekly Roundup — June 8, 2026)
Runway Cosmos Coalition — world models as an industry alliance — In June 2026, Runway, NVIDIA, and “leading AI labs” announced the Cosmos Coalition to build and open-source frontier world models; Runway joins as a founding member. As announced it’s a mission statement — no specs, no timeline, no architecture. It landed the same day as NVIDIA’s actual Cosmos 3 release (now in Open-Source above) and Luma’s Open Physical AI Lab, making “world model” three different things in 24 hours (Weekly Roundup — June 8, 2026)
Agnes-Video — a new API price floor — Singapore’s Sapiens AI entered the Artificial Analysis arena with Agnes-Video-V2.0 at $0.30/min — the cheapest price on the board, roughly a tenth of the next commercial tier. The catch is quality: it debuts near the bottom (Elo 905, #24 on audio T2V). A price-floor entrant, not a quality threat — yet (Weekly Roundup — June 8, 2026)
Varya — India’s government-backed open-weight entrant — Bangalore’s Avataar.ai launched Varya (June 11, 2026), an open-weight video model distilled from Alibaba’s Wan 2.2 (4 steps vs. 50; 5-second 720p in 45s on an H200), weights on India’s AI Kosh repository under the $1.2B IndiaAI Mission. The hook is price: a vendor-reported ₹0.48/sec (~$0.005), roughly 20× below the global leaders. Treated as a watch entrant, not a tracked open-source row: the 14B parameter figure is Indian-outlet-only (not confirmed by TechCrunch), Artificial Analysis hasn’t benchmarked it, and Avataar’s own model page was unreachable at writing. Real as a price-floor and sovereign-AI signal; specs unverified (Weekly Roundup — June 15, 2026)
Black Forest Labs enters the video conversation — Martin Scorsese joined BFL as an adviser (June 2, 2026), publicly using FLUX to storyboard his next film; FLUX.2 shipped a multi-reference feature the same day. BFL is primarily an image-model company — a video model is in development — but it sits on $300M in Series B funding ($3.25B valuation, December 2025; a16z, NVIDIA, Salesforce Ventures) and is already a Firefly multi-model partner (Weekly Roundup — June 8, 2026)
State deepfake legislation map — Federal preemption of state AI law isn’t happening in 2026. Connecticut HB 5312 (May 2026) establishes a private right of action against creators of non-consensual AI-generated intimate imagery. Vermont’s election-deepfake bill, Iowa’s chatbot-safety law, and Utah’s nine AI bills moved in the same period. Together with Tennessee’s ELVIS Act, California’s AB 2655, and New York’s election-deepfake law, the state map is denser than the federal one. AI video labs allowing image-to-video from real-face inputs need to model state civil exposure alongside federal compliance
State synthetic-media provenance mandates — A second state-law track, distinct from the NCII-liability bills above: provenance-embedding requirements that reach the generators directly rather than punishing misuse. Connecticut SB 5 passed both chambers in May 2026 and awaits Gov. Lamont’s signature; it requires providers with >1M monthly users to embed C2PA-aligned, tamper-resistant provenance data in generated audio, image, and video (provenance obligation effective Oct 1, 2026; detectability standard Oct 1, 2027). Arizona SB 1786 passed the state Senate and awaits a House vote (reconciliation stalled as of late May 2026). California SB 1000 advanced through the state Senate 33–1 with an urgency clause in May 2026 — urgency designation triggers immediate effect on signature rather than the standard January 1 timeline. California AB 2713 (California AI Transparency Act) — a parallel California track requiring content-provenance disclosure; distinct from SB 1000’s technical watermarking mandate. Hawaii HB 2137 has sat on Governor Green’s desk without a signature; status verified against the Transparency Coalition tracker. Unlike removal mandates, provenance embedding is a model-build requirement — it changes what the model ships, not just what a platform takes down (Weekly Roundup — May 25, 2026)
Runway Gen-4.5 — Accessible via Adobe Firefly’s multi-model hub since April 2026; standalone Gen-4.5 launch on runwayml.com still pending. Previewed on NVIDIA Vera Rubin hardware at GTC (March 2026)
NVIDIA Vera Rubin cloud deployment — AWS, Google Cloud, Microsoft Azure, and OCI all confirmed H2 2026 availability. Vera Rubin delivers 10× lower inference token cost versus Blackwell — the number that will reshape per-second AI video pricing across all major cloud platforms
DLSS 5 — NVIDIA’s neural rendering technology, launching Fall 2026. Explicitly positioned for filmmaking and VFX beyond gaming; uses generative AI to infuse photoreal lighting and materials anchored to source 3D geometry
Blackburn draft AI bill — GOP Senate draft (March 19, 2026) declares AI training on copyrighted works not fair use; targets deepfakes and Section 230. Not yet introduced as legislation; path to passage uncertain
White House AI framework vs. CLEAR Act — White House (March 2026) takes the opposite position from Blackburn: AI training is not infringement; courts should decide. Bipartisan CLEAR Act (Schiff/Curtis) proposes mandatory training data disclosure without resolving fair use. Three irreconcilable positions now active in Washington simultaneously
Seedance 2.0 copyright litigation — US CapCut access available since April 2026 with real-face and IP restrictions, but the underlying copyright dispute with Disney, Paramount, Warner Bros., and Netflix remains unresolved. The restrictions are a negotiating posture, not a settlement
OpenAI robotics / world simulation — OpenAI redirected Sora’s compute toward “world simulation for robotics” after shutting the product down. The consumer app went dark on April 26, 2026 as scheduled; the Sora API remains accessible until September 24, 2026
Adobe Firefly multi-model expansion — Firefly’s video hub now hosts 30+ third-party AI models including Kling 3.0/Omni, Veo 3.1, Runway Gen-4.5, ElevenLabs Multilingual v2, Luma AI, Black Forest Labs, and Topaz Labs. Firefly AI Assistant orchestrates multi-step workflows across Photoshop, Premiere, Lightroom, Express, and Illustrator
Tencent vs. Alibaba 3D world model race — Two of China’s largest AI labs shipped 3D world models on the same day, April 16, 2026 (Alibaba’s Happy Oyster, gated; Tencent’s HY-World 2.0, open weights). Western labs have nothing comparable in production; the 6-to-12 month head start is real if world simulation matters as much as OpenAI’s Sora-shutdown framing implied
Google Vids / Workspace expansion — YouTube export is live; paid creative tiers (Pro/Ultra) include Lyria 3 music generation and AI avatars. Further Workspace AI integration expected throughout 2026
EU AI Act Article 50 — a two-step deadline, not one cliff: the disclosure and deepfake-labeling obligations take effect August 2, 2026, but the May 2026 AI Omnibus agreement granted a four-month grace period — to December 2, 2026 — on the harder machine-readable watermarking requirement (Art. 50(2)) for generative systems already on the market before August 2. The Code of Practice defining the technical standard is still in draft (Weekly Roundup — June 15, 2026)
Unlimited-length AI video — EPFL’s drift elimination breakthrough (presenting at ICLR 2026) could remove the duration ceiling entirely
xAI targeting 30-minute video — Announced goal for late 2026, with full-length films targeted for 2027

This page is maintained by RCTV as a public reference. For weekly updates on model releases and industry shifts, see our Weekly Roundup.

Have a correction or update? Contact us at rctv.oxncw@simplelogin.com

Changelog

Showing the four most recent updates. Full changelog archive →

June 28, 2026

Last updated date: Advanced from June 21 to June 28, 2026. lastmod set to 6/28 deliberately so the June 29 Weekly Roundup (R#10) leads as the headline article rather than tying with this Stack update (lead-by-one rule). Ships Sunday per the standard Stack-paired-with-Roundup cadence.
HappyHorse-1.1 enters the Artificial Analysis leaderboard: First model update from Alibaba ATH’s AI Innovation Unit since the 1.0 debut in April. T2V no-audio: HappyHorse-1.0 remains #1 (Elo 1,290, revised from 1,293); HappyHorse-1.1 debuts at #2 (Elo 1,285); Seedance 2.0 falls to #3 (1,274). I2V audio: Seedance 2.0 #1 (1,194), HappyHorse-1.1 #2 (1,119), HappyHorse-1.0 #3 (1,089). Audio T2V: Seedance 2.0 #1 (1,219). Updated Quick Verdict, Quick Reference table, HappyHorse spec card (Benchmark position bullet), and HappyHorse prose (new HH-1.1 paragraph). No separate 1.1 pricing or endpoint announced; open-weights for 1.0 remain undelivered at week 9. Source: Artificial Analysis T2V leaderboard, Artificial Analysis I2V leaderboard (verified June 28, 2026 — cited in R#10 sourcing).
Seedance 2.0 — 4K native confirmed at FORCE: At Volcano Engine FORCE (June 23, 2026), ByteDance confirmed Seedance 2.0 now supports native 4K with 10-bit color depth. Seedance spec card Max resolution updated from 2K to 4K native, 10-bit color depth; Seedance prose adds FORCE note. Benchmark Note bullet updated: audio T2V Elo 1,215 → 1,219; T2V no-audio position #2 → #3 (behind HH-1.0 and HH-1.1). Source: The Decoder — ByteDance’s Seedance 2.5 breaks the 30-second barrier.
Luma — Timeline editor / EDL Export (June 19) + Luma Connectors (June 24): Two Luma AI feature updates from the R#10 coverage window that cleared the roundup’s scope gate but belong in the Stack. Timeline editor adds multi-clip editing with Edit Decision List export for NLE handoff. Luma Connectors adds native Airtable, Dropbox, and Google Drive integrations. Both added as spec card bullets in the Luma Ray 3.2 / Ray 3.14 entry.
What’s Coming — Seedance 2.5 added: ByteDance previewed Seedance 2.5 at FORCE (June 23): 30-second native clips with no post-stitching, up to 50 multimodal reference inputs, targeted post-generation editing. Enterprise beta now; public launch early July. All specs vendor-claimed; no independent benchmarks yet. Added before Dreamina Octo in the ByteDance cluster.
What’s Coming — California AB 2713 added: Added AB 2713 (California AI Transparency Act) to the State synthetic-media provenance mandates item as a parallel California track distinct from SB 1000. Sources: R#10 Story 3 leginfo verification.
params.toml ogImage: Updated from roundup-2026-06-22.png to roundup-2026-06-29.png. Landscape counts unchanged: commercial 8 / open_source 10 / agentic 5 — no new rows this week.
Changelog trim: June 7, 2026 block removed from main page inline section (already in archive). Main page inline now shows exactly four blocks: June 28 / June 21 / June 14 / June 11.
Considered and excluded: Kling 3.0 Turbo pricing details (confirmed this week via third-party resellers; the Kling entry already carries the caveat “Kuaishou published no official pricing,” so the reseller spread stays in the roundup and does not become a Stack correction until Kuaishou publishes official rates). HappyHorse-1.1 as a separate Big Eight row (it is a model update to an existing entry on the same API surfaces, so it is updated in place rather than added as a new row).

June 21, 2026

Last updated date: Advanced from June 14 to June 21, 2026. lastmod set to 6/21 deliberately so the June 22 Weekly Roundup (R#9) leads as the headline article rather than tying with this Stack update (lead-by-one rule). Ships Sunday per the standard Stack-paired-with-Roundup cadence.
Luma — Ray 3.2 (June 9, 2026): Section renamed from “Luma Ray 3.14” to “Luma Ray 3.2 / Ray 3.14.” Ray 3.2 and Ray 3.14 are parallel sub-models in the Ray3 family — not sequential versions. Ray 3.2 (June 9) adds up to 16 keyframes per clip, performance tracking across 8 faces, native HDR with 16-bit EXR export, Enhanced Reframe, clips up to 20 seconds at 1080p, and first API availability. Ray 3.14 remains available for duration-change and loop workflows where its fixed-length output is an asset. Spec card, prose, Luma Agents table, and all Routing Framework deep-links updated. Source: Luma blog — introducing-ray-3-2, Luma learning center — ray-3-2-introduction-and-core-concepts.
Grok Imagine — Video 1.5 GA (June 16, 2026): Updated from “shipped as public API preview June 3” to general availability June 16 across Imagine API, grok.com, iOS, and Android. Added Video 1.5 Fast variant (~25s for a 6-sec 720p clip). Pricing reconciled: original Grok Imagine is $0.05/sec ($3.00/min, no native audio); Video 1.5 GA is $0.08/sec ($4.80/min, audio-native) — single published rate, no resolution tier on any xAI docs page. The “$4.20/min” figure that was in the Quick Verdict pointed to the old $0.07/sec@720p rate of the original model — corrected. Elo updated to 1,113 (from 1,110) per June 2026 AA leaderboard. Source: docs.x.ai/developers/models (re-verified June 21).
Kling — Turbo and O3 (June 17–18, 2026): Added Kling 3.0 Turbo (speed/cost tier: faster generation, lower price, improved lip-sync, stable motion) and Kling 3.0 O3 (up to 15s at full 4K, stronger prompt/reference consistency). Both rolled out to 7 partner platforms simultaneously: fal, SeaArt, Clipfly, Fotor, GlamAI, Runware, Morphic. Kuaishou published no official pricing; capability claims are attributed to partner-platform announcements (independently corroborated across all 7 platforms). klingai.com remained 446-blocked. Source: fal.ai/kling-3 + 6 partner confirmations.
Seedance — two new access channels (June 16–17, 2026): Added Higgsfield “Seedance Unlimited” (June 17): the exclusive non-ByteDance surface for unlimited Seedance generation, served via BytePlus (ByteDance’s B2B cloud), with an “Enhanced Seedance 2.0 Fast” model, 30-day add-on tiers, and 480p–720p output. Added Dreamina Seedance 2.0 mini (June 16): emerging markets (SE Asia, ME, Africa, Europe, SA; US excluded) at ~$0.02/sec, roughly one-seventh of the standard 720p rate. Both added to spec card and Seedance prose. Source: @higgsfield X post June 17 (primary, xservice); @dreamina_ai X post June 16 (primary, xservice).
What’s Coming — Grok 1080p item updated: Noted Video 1.5 GA while preserving the 1080p-Pro-still-missing tracking.
Changelog trim: May 25, 2026 block moves to the full changelog archive to restore the four-most-recent display (per the June 14 block’s flagged pending action).
Considered and excluded: No new Big Eight rows (Ray 3.2 is an update to the Luma entry; Kling Turbo/O3 are variants of the existing Kling 3.0 entry; Seedance mini and Higgsfield Unlimited are access-channel additions, not new Big Eight entries). params.toml landscape counts unchanged: commercial 8 / open_source 10 / agentic 5.
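
The per-minute figures in the Grok Imagine entry above follow directly from the per-second rates. A minimal sketch of that conversion, assuming flat per-second billing with no resolution tier or minimum charge; the function name per_minute and the rate labels are illustrative only and are not part of any xAI tooling:

```python
from decimal import Decimal

SECONDS_PER_MINUTE = 60


def per_minute(rate_per_second: str) -> Decimal:
    """Convert a per-second USD price to a per-minute USD price."""
    # Decimal avoids binary floating-point rounding on currency values.
    total = Decimal(rate_per_second) * SECONDS_PER_MINUTE
    return total.quantize(Decimal("0.01"))


# Per-second rates quoted in the June 21 changelog entry.
# The labels are descriptive placeholders, not official product names.
rates = {
    "Grok Imagine (original)": "0.05",
    "Grok Imagine Video 1.5 GA": "0.08",
    "Superseded 720p rate": "0.07",
}

for label, rate in rates.items():
    print(f"{label}: ${rate}/sec = ${per_minute(rate)}/min")
```

Passing rates as strings keeps the Decimal values exact; the same helper can be reused for any other per-second rate on this page under the same flat-billing assumption.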

June 14, 2026

Last updated date: Advanced from June 11 to June 14, 2026. lastmod set to 6/14 deliberately so the June 15 Weekly Roundup (R#8) leads as the headline article rather than tying with this Stack update (lead-by-one rule). Ships Sunday per the standard Stack-paired-with-Roundup cadence.
The Agentic Layer — Higgsfield Supercomputer: Added the early-June distribution sprint — five external surfaces in 12 days (Claude MCP May 28, Adobe plugins May 29, Figma June 4, Minecraft mod June 5, DaVinci Resolve plugin June 8) plus the Grok Imagine 1.5 integration into Higgsfield’s own platform — the clearest routing-bet expression yet (a competitor’s model inside Higgsfield’s surface).
Other Notable Open-Source — NVIDIA Cosmos 3: Updated the launch-time “claim not independently confirmed” to live-board confirmation — Cosmos3-Super leads open-weight image-to-video at Elo 1,251 (June 2026).
What’s Coming — two updates: Added Varya (Avataar.ai, June 11 — India IndiaAI-Mission open-weight Wan-2.2 distillation, vendor-reported ~$0.005/sec; watch entrant, specs unverified, not a tracked row). Rewrote the EU AI Act Article 50 item from a flat August cliff to the two-step deadline — disclosure August 2, machine-readable watermarking December 2 for in-market systems (May 2026 AI Omnibus grace period; surfaced by the R#8 Cowork exit-test).
Considered and excluded: Varya as a counted open-source row (specs Indian-outlet-only, AA-unbenchmarked, vendor page down — watch entrant until verified); the Cowork “HappyHorse weights flipped to published” claim (our render still showed “Coming Soon” — held out as unverified, HappyHorse open-weights item unchanged).
Paired update outside this file: params.toml ogImage → roundup-2026-06-15.png; [landscape] counts unchanged (commercial 8 / open_source 10 / agentic 5 — Varya is a watch entrant, not a counted row). Changelog trim: the May 25 block moves to the archive on the next pass to restore the four-most-recent display.
Process note: Stack deltas were drafted blind to the Cowork comparison, which was read only after R#8 was built (exit-test discipline). The comparison’s one verified catch, the EU Art. 50(2) grace period, is reflected here and in the roundup’s What-to-Watch.

June 11, 2026

Page restructure (reference-card format): Every model and agent entry now leads with its spec card — the canonical location for volatile facts (pricing, resolution, arena positions, access) — followed by current-state analysis. Relative-time references removed page-wide (11 instances); duplicated stats reconciled to single canonical values (HappyHorse-1.0 T2V no-audio Elo confirmed at 1,293 against the live Artificial Analysis board). No facts removed: dated event history lives in this changelog.
Changelog split: This page now carries the four most recent update blocks; the full history lives in the changelog archive.
Navigation anchors: Stable anchors added to every model and agent entry; Quick Verdict bullets and Routing Framework answers now deep-link to entries.
