This is RCTV’s living reference to the AI video software stack — models, orchestration agents, and open-source generation. Updated as products launch, pricing changes, and capabilities evolve. Last updated: June 11, 2026.
Quick Verdict: Best AI Video Model by Use Case (2026)
Routing decisions based on what each model actually wins at — not benchmark Elo alone, but the practical question of which model handles a specific use case best. Scroll down for full model breakdowns and the detailed routing framework.
- Best for photorealism — Veo 3.1. 4K native, the cleanest photorealistic rendering available; free 10 clips/month via Google Vids on any Google account.
- Best for broadcast 4K and motion quality — Kling 3.0. Native 4K at 60fps with multi-cut storyboards; the only production-grade model meeting broadcast delivery standards without upscaling.
- Best for character consistency across shots — Seedance 2.0 Pro. Native multi-shot storytelling with frame-level character and scene control; available in the US via CapCut with real-face restrictions.
- Best for stylized and VFX work — Runway Gen-4 Turbo. The most mature professional ecosystem for non-photorealistic aesthetics: motion brushes, scene consistency tools, and the backing of a $315M Series C.
- Best for editing existing video — Aleph 2.0 in Runway Edit Studio. Multishot edit propagation up to 30 seconds at 1080p; edit one frame, model carries it across the sequence; available on all paid Runway plans.
- Best for free consumer distribution — Gemini Omni Flash. Free inside YouTube Shorts and YouTube Create; multimodal any-to-any input (image / audio / video / text → video); SynthID provenance by default.
- Best open-source for local generation — LTX-2.3. True 4K native on consumer GPUs (12GB+ VRAM); Apache 2.0 license; standalone desktop editor plus ComfyUI integration.
- Best benchmark quality with API access — HappyHorse-1.0. #1 on Artificial Analysis T2V no-audio (Elo 1,293); the order flips on the audio-included board, where Seedance leads. Commercial API via fal.ai and Alibaba Cloud Bailian; open weights still pending.
- Cheapest commercial API — Grok Imagine. $4.20 per minute of generated video; the trade-off is a 720p ceiling, with the 1080p Pro tier overdue past Musk’s April commitment. Grok Imagine Video 1.5 (June 2026) adds native image-to-video at $0.08/sec; its arena positions are listed in the Grok Imagine entry below.
- Best multi-model orchestration agent — Pika Agents. Broadest model roster in the category (Kling, Veo, Seedance, MiniMax, Sora API, Pika Video); runs inside Slack, Telegram, Discord, Notion, Figma, and 12+ other surfaces; persistent memory across sessions. See The Agentic Layer.
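The verdict list above amounts to a lookup from use case to model. A minimal sketch of that routing table in Python follows. The key names and the `route` helper are hypothetical, chosen only for illustration; they are not part of any vendor SDK, and the picks simply mirror the list as of this update.

```python
# Use-case -> model routing table mirroring the quick-verdict list.
# Keys and the route() helper are hypothetical names chosen for illustration.
ROUTES = {
    "photorealism": "Veo 3.1",
    "broadcast_4k": "Kling 3.0",
    "character_consistency": "Seedance 2.0 Pro",
    "stylized_vfx": "Runway Gen-4 Turbo",
    "video_editing": "Aleph 2.0",
    "free_consumer": "Gemini Omni Flash",
    "local_open_source": "LTX-2.3",
    "benchmark_api": "HappyHorse-1.0",
    "cheapest_api": "Grok Imagine",
    "orchestration_agent": "Pika Agents",
}


def route(use_case: str) -> str:
    """Return the recommended model for a use case, or raise if unknown."""
    try:
        return ROUTES[use_case]
    except KeyError:
        raise ValueError(f"no route for use case: {use_case!r}") from None


if __name__ == "__main__":
    for shot in ("photorealism", "stylized_vfx", "local_open_source"):
        print(f"{shot} -> {route(shot)}")
```

Keeping the table as plain data makes it cheap to revise when rankings or pricing shift.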
Quick Reference: All Models at a Glance
| Model | Best For | Max Resolution | Free Tier | Paid From | API |
|---|---|---|---|---|---|
| Veo 3.1 (Google DeepMind) | Photorealism, widest free access | 4K | ✓ 10 clips/mo via Google Vids | $19.99/mo (AI Pro) | ✓ (also via Adobe Firefly) |
| Gemini Omni Flash (Google DeepMind) | Multimodal input, broadest free distribution, SynthID provenance by default | Not published at launch | ✓ YouTube Shorts + YouTube Create | $19.99/mo (Google AI Plus) | Coming weeks |
| Kling 3.0 / 3.0 Omni (Kuaishou) | Broadcast-ready 4K, 60fps, multi-shot storyboards | 4K native | ✓ | ~$8/mo | ✓ (also via Adobe Firefly) |
| Seedance 2.0 Pro (ByteDance) | Character consistency, multi-shot | 2K | Via CapCut (US, with restrictions) | Via CapCut / third-party | Via third-party |
| Luma Ray 3.14 (Luma AI) | Production volume, cost efficiency | 1080p native | ✓ | Available | ✓ |
| Runway Gen-4 Turbo / Aleph 2.0 | Stylized/VFX, real-time avatars, multishot edit propagation | 1080p | ✗ | $12/mo | ✓ (Gen-4.5 also via Adobe Firefly) |
| Pika 2.5 / Pika Agents (Pika Labs) | Budget creators; multi-model agent orchestration (Kling, Veo, Seedance, MiniMax, Sora) | 1080p | ✗ | $8/mo | ✓ |
| Grok Imagine / Grok Imagine 1.5 (xAI) | Speed, cheapest API; new native I2V (#2 arena) | 720p (Pro 1080p delayed); 1.5 native I2V | ✗ | X Premium / SuperGrok | ✓ $4.20/min; 1.5 at $0.08/sec |
| LTX-2.3 (Lightricks) | Local / private generation | 4K | ✓ Open source | Free (Apache 2.0) | ComfyUI |
| HappyHorse-1.0 (Alibaba) | #1 T2V no-audio; #3 I2V with-audio; commercial API live | 1080p (joint audio, 7-language lip-sync) | Weights pending; API live | $0.14/sec 720p · $0.28/sec 1080p | ✓ via fal.ai + Alibaba Cloud Bailian |
| Wan 2.7 (Alibaba) | Thinking Mode, 5 unified task types | 1080p | ✓ Open source | Free | ComfyUI / Model Studio |
| SkyReels V4 (Skywork AI) | Joint audio-video, open-source | 1080p | ✓ 70 credits/mo | Free (open source) | — |
Rankings and pricing change weekly. Scroll down for full model breakdowns.
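The table can also be queried by constraint rather than by use case. The sketch below transcribes only the rows whose values are unambiguous in the table; the `ModelRow` type, the `shortlist` function, and the choice to encode resolution as vertical pixels (4K as 2160) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    name: str
    max_height: int   # vertical resolution in pixels (assumed: 4K -> 2160)
    free_tier: bool
    has_api: bool


# Rows transcribed from the table above; only entries with unambiguous values.
MODELS = [
    ModelRow("Veo 3.1", 2160, True, True),
    ModelRow("Kling 3.0", 2160, True, True),
    ModelRow("Luma Ray 3.14", 1080, True, True),
    ModelRow("Runway Gen-4 Turbo", 1080, False, True),
    ModelRow("Pika 2.5", 1080, False, True),
    ModelRow("Grok Imagine", 720, False, True),
]


def shortlist(min_height: int, need_free: bool = False, need_api: bool = False):
    """Return names of models meeting every requested constraint."""
    return [
        m.name
        for m in MODELS
        if m.max_height >= min_height
        and (m.free_tier or not need_free)
        and (m.has_api or not need_api)
    ]


if __name__ == "__main__":
    print(shortlist(2160, need_free=True))  # 4K-capable models that also have a free tier
    print(shortlist(1080, need_api=True))   # 1080p or better with an API
```

Maximum resolution and free tier are independent columns here. A free tier may cap output lower than the model's maximum, as with Veo 3.1's 720p free clips.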
The Big Eight: Commercial Models
These are the production-grade models dominating professional and creator workflows in 2026. The market has matured to the point where no single model leads across all dimensions — the professional standard is now multi-model routing, choosing the right tool for each specific shot.
Luma Ray 3.14 — Luma AI
Best for: Professional production volume, 1080p native output, cost-efficient multi-shot workflows
- Max resolution: 1080p native
- Key features: Ray3 Modify (hybrid performance/acting control), Luma Agents platform (enterprise creative automation)
- Speed: 4× faster generation than previous Ray model
- Pricing: 3× cheaper per-second than previous Ray
- Access: Luma AI subscription; free tier available
- API: Available; enterprise deployments via Luma Agents
Luma AI’s Ray 3.14 shipped in March 2026 as the model that stepped into the commercial tier vacated by Sora’s shutdown. (Weekly Roundup — March 27, 2026) It delivers native 1080p output, generates 4× faster than the previous Ray 3 model, and is priced 3× cheaper per second. Ray3 Modify, a companion tool for hybrid performance and acting workflows, gives studios more control over scene continuity and character consistency across shots.
Luma is positioning Ray explicitly as professional infrastructure priced for production volume rather than a consumer app — a distinction that looks strategically deliberate given Sora’s failure. The company’s $900M Series C led by HUMAIN, a London office, and enterprise Luma Agents deployments at Publicis, Adidas, and Mazda all reinforce this direction. The Mazda relationship produced a concrete deliverable in April 2026: Boundless, a Johannesburg agency, used Luma Agents to deliver Mazda’s first AI-produced commercial in under two weeks — the most credible production-deployment signal for any AI video platform to date.
Kling 3.0 / Kling 3.0 Omni — Kuaishou
Best for: Feature density, broadcast-ready output, motion quality
- Max resolution: 4K native (60fps)
- Frame rate: Up to 60fps
- Audio: Native built-in audio in six languages
- Key feature: Multi-cut storyboard generation (up to 6 camera cuts, 15s); Omni adds shot/camera/character controls
- Access: Free tier available; paid plans from ~$8/mo; also via Adobe Firefly (Creative Cloud subscription)
- API: Available via Kuaishou and third-party platforms
The most capability-dense model available. Kling 3.0 is the first AI video model to meet broadcast delivery standards without upscaling, offering native 4K at 60fps. The storyboard feature generates up to six camera cuts in a single generation with visual consistency — a production-first capability no other model matches. The Kling 3.0 Omni variant adds finer-grained controls for shot duration, camera angle, and character movement across multi-shot sequences.
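The storyboard limits quoted above (up to six camera cuts within 15 seconds) are easy to check before submitting a plan. This is a sketch under stated assumptions: each list entry is treated as one cut, and `validate_storyboard` is a hypothetical helper, not a Kling API call.

```python
MAX_CUTS = 6               # "up to 6 camera cuts" per the spec list above
MAX_TOTAL_SECONDS = 15.0   # storyboard length quoted in the same entry


def validate_storyboard(cut_durations):
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if not cut_durations:
        problems.append("storyboard has no cuts")
    if len(cut_durations) > MAX_CUTS:
        problems.append(f"{len(cut_durations)} cuts exceeds the limit of {MAX_CUTS}")
    if any(d <= 0 for d in cut_durations):
        problems.append("every cut needs a positive duration")
    total = sum(cut_durations)
    if total > MAX_TOTAL_SECONDS:
        problems.append(f"total {total:.1f}s exceeds {MAX_TOTAL_SECONDS:.0f}s")
    return problems


if __name__ == "__main__":
    print(validate_storyboard([3, 3, 3, 3, 3]))  # five 3-second cuts
    print(validate_storyboard([3] * 6))          # six cuts, but 18 seconds in total
```

Returning every problem at once, rather than failing on the first, suits a planning tool where the operator fixes the whole shot list in one pass.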
In April 2026, both Kling 3.0 and Kling 3.0 Omni joined Adobe Firefly’s multi-model video hub alongside Veo 3.1, Runway Gen-4.5, and 30+ other AI models — significantly broadening Kling’s distribution to Adobe Creative Cloud’s existing professional user base.
Kuaishou confirmed in May 2026 it is weighing a restructuring that would bring external financing into Kling AI, characterizing the plan as preliminary with no definitive agreements signed (Weekly Roundup — May 18, 2026). The figures around it are press attribution, not company-confirmed: a reported ~$2B pre-IPO round at a ~$20B target valuation, Tencent among potential investors, and Kling annualized revenue near $500M — roughly double its January run rate, the majority from overseas markets. Treated with appropriate caution, it is still the clearest commercial-scale signal in AI video to date, and the first pure-play comparable since Sora’s shutdown removed one.
Veo 3.1 — Google DeepMind
Best for: Photorealism, 4K native output, integrated workflows, broadest free access
- Max resolution: 4K native (Flow/Vertex AI); 1080p via Veo 3.1 Lite; 720p via Google Vids free tier
- Audio: Native synchronized audio
- Key features: Flow unified workspace; Google Vids integration (avatars, Lyria 3 music, YouTube export); Veo 3.1 Lite developer tier; voice-driven generation on Gemini-enabled Google TV
- Access: Free — 10 clips/month via Google Vids (any Google account); Google AI Pro ($19.99/mo) and Ultra for higher limits; Flow is free; also via Adobe Firefly multi-model hub (April 2026) and Gemini-enabled TCL Google TVs in the US (April 2026)
- API: Vertex AI ($12/min); Veo 3.1 Lite via Gemini API ($0.05/sec 720p, $0.08/sec 1080p); Veo 3.1 Fast pricing reduced April 7, 2026 (check Gemini API docs for current per-second rates)
- Milestone: 1.5 billion images and videos created by Flow users
Google’s model pushes photorealistic rendering to the point where trained observers struggle to identify generated footage in blind tests. It is the engine behind Google Flow (merged creative workspace with Whisk, ImageFX, and multi-clip sequencing) and Google Vids. Veo 3.1 has been freely available to any Google account holder since April 2026 (Weekly Roundup — April 4, 2026) — 10 generations per month, 8 seconds at 720p, from text prompts or uploaded images. Google AI Pro and Ultra subscribers get more: up to 1,000 Veo clips per month, Lyria 3 custom music generation (tracks up to 3 minutes), customizable AI avatars with scene placement and wardrobe control, and direct YouTube export. This is the first time a production-grade AI video model has been made freely accessible to Google’s full account base.
Veo arrived on Gemini-enabled TCL Google TVs in the US in April 2026 (Weekly Roundup — May 11, 2026) — voice-driven generation through Gemini’s Create tab, either from scratch or by animating still images. TCL-only and US-only at launch, with no public timeline for other manufacturers or markets.
On the developer side, Google launched Veo 3.1 Lite in March 2026 via the Gemini API (Weekly Roundup — April 4, 2026) and Google AI Studio, priced at $0.05/sec for 720p and $0.08/sec for 1080p, less than half the cost of the existing Veo 3.1 Fast tier. Veo 3.1 Fast received a further price reduction on April 7, 2026, compressing the price range across the full developer stack, from the free consumer tier through production-grade API calls. Check the Gemini API pricing documentation for current per-second rates.
Gemini Omni Flash — Google DeepMind
Best for: Multimodal input across image / audio / video / text → video output; broadest free consumer distribution via YouTube Shorts; SynthID provenance baked in by default
- Max resolution: Not published at launch
- Max duration: Not published at launch
- Audio: Native multimodal output (video grounded in audio / image / video / text input)
- Watermark: SynthID embedded in every output by default; verifiable through Gemini app, Chrome, and Search
- Key feature: Any-to-any multimodal generation; SynthID provenance by default; free distribution via YouTube Shorts
- Access: Free via YouTube Shorts and YouTube Create; paid via Google AI Plus ($19.99/mo), Pro, or Ultra in the Gemini app and Google Flow
- API: Coming weeks (developer and enterprise; pricing unannounced)
- Tier ladder: Omni Flash shipped May 19, 2026; Omni Pro on “soon” with no date
- Next: Omni Pro shipping window; first independent benchmarks against Veo 3.1, Kling 3.0, and HappyHorse-1.0; API rollout
Google’s any-to-any multimodal video model launched May 19, 2026 at Google I/O (Gemini Omni launch blog, Weekly Roundup — May 25, 2026). Omni Flash is what shipped that day: paid-gated to Google AI Plus, Pro, and Ultra subscribers inside the Gemini app and Google Flow, with every output watermarked via SynthID by default. Distribution rolled free into YouTube Shorts and the YouTube Create app — putting a conversational video generator inside a Google surface with over two billion logged-in users. Developer and enterprise APIs are on “coming weeks.” A heavier Omni Pro tier was named at the keynote with no shipping date attached.
The launch keynote framed Omni as a “world model” with physics-aware reasoning, but the published evidence at launch is thin — no benchmarks, no architecture paper, no resolution or duration specs (RCTV flagship analysis →). What is verifiable is the distribution side: this is the largest consumer-AI-video distribution move since Sora launched on Plus a year and a half ago. The world-model framing waits on a benchmark.
Seedance 2.0 Pro — ByteDance
Best for: Character consistency, cinematic motion, multi-shot storytelling
- Max resolution: 2K
- Audio: Native audio with lip-sync
- Key feature: Multi-shot storytelling, quad-modal input, frame-level precision
- Access: China via Jimeng/Dreamina; Africa, South America, Middle East, SE Asia, and US via CapCut/Dreamina Seedance 2.0; global via BigMotion ($35–$95/mo), LumeFlow AI, other third-party platforms
- API: Official global API paused; available via third-party integrations (fal.ai, others)
- US restrictions: Real-face image-to-video disabled; unauthorized IP generation blocked; invisible watermarks on all output
- Note: Benchmark position (June 2026): #1 on the Artificial Analysis T2V audio-included board (Elo 1,215) and #1 on I2V with-audio (1,194); #2 on T2V no-audio (Elo 1,274) behind HappyHorse-1.0. Copyright legislative battle remains a three-way standoff (Blackburn vs. White House vs. CLEAR Act)
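The Elo figures in the note above can be read as head-to-head expectations. The sketch below uses the conventional Elo expected-score formula on a 400-point logistic scale. That scale is an assumption: this guide does not state how Artificial Analysis computes its ratings, so treat the outputs as a rough reading aid.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected head-to-head score for A against B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # T2V no-audio: HappyHorse-1.0 (1,293) vs. Seedance 2.0 (1,274)
    print(round(expected_score(1293, 1274), 3))
    # I2V with-audio: Seedance 2.0 (1,194) vs. Grok Imagine Video 1.5 (1,110)
    print(round(expected_score(1194, 1110), 3))
```

Under that assumption, a 19-point lead works out to only slightly better than a coin flip, while an 84-point lead is a clearer preference of roughly six in ten comparisons.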
The leading commercial model for character consistency and cinematic motion quality. Seedance 2.0 Pro’s Dual-Branch Diffusion Transformer generates audio and video simultaneously in a single pass. Its quad-modal input system accepts text, images, video, and audio in a single prompt. Multi-shot native storytelling and frame-level control over character appearance, object placement, and scene timing remain best-in-class for narrative work.
ByteDance’s official global API rollout was paused indefinitely in late February 2026 after the Motion Picture Association and major studios (Disney, Netflix, Paramount, Sony, Warner Bros.) issued cease-and-desist letters over copyright concerns. The “Face-to-Voice” feature was suspended in February 2026 after it was shown to clone voices from a single photo. Japan opened a separate inquiry over unauthorized anime character reproductions.
ByteDance relaunched the model in March 2026 as Dreamina Seedance 2.0 (Weekly Roundup — March 27, 2026) across markets in Africa, South America, the Middle East, and Southeast Asia. As of April 2026, Dreamina Seedance 2.0 is available in the US via CapCut (Weekly Roundup — April 11, 2026) — a significant reversal of the prior exclusion. The deployment comes with content restrictions: image-to-video generation from inputs containing real faces is disabled, and generation of unauthorized intellectual property is blocked. All output carries an invisible watermark for off-platform identification.
The copyright landscape around Seedance is a three-way Washington standoff: the White House’s National Policy Framework for AI (March 2026) stated that AI training on copyrighted works does not constitute infringement — the opposite of the Blackburn bill’s position. Separately, the bipartisan CLEAR Act (Schiff/Curtis) would require public disclosure of training data without resolving the fair use question either way.
Grok Imagine — xAI
Best for: Speed, low-cost API, rapid iteration, social media distribution
- Max resolution: 720p (Pro 1080p tier slipped past its late-April commitment; no new public timeline); Grok Imagine Video 1.5 adds native I2V (June 2026)
- Arena: #5 on Artificial Analysis T2V no-audio (Elo 1,235 — effectively tied with Kling 3.0 Omni at #4); Grok Imagine Video 1.5 #2 on the I2V with-audio arena (Elo 1,110, June 2026)
- Max duration: 30 seconds (via chained extensions)
- Audio: Synchronized audio
- Key feature: Video extension from frame; dual generation modes (Quality + Speed); native video understanding (Grok 4.3 Beta); fastest iteration cycle in the industry
- Access: X Premium / SuperGrok subscription required
- API: Available ($4.20/min generated video — cheapest major model)
- Engine: Aurora autoregressive MoE model on 110,000 NVIDIA GB200 GPUs
- Next: Grok Imagine Pro (1080p) overdue past Musk’s late-April commitment; track xAI release notes for the actual ship date
- Caution: Faced regulatory scrutiny over content moderation (UK ICO, France, California AG); image editing now restricted to paid subscribers
xAI shipped four major updates between January and March 2026: API launch (January 28), Grok Imagine 1.0 with 720p video and audio (February 3), Grok 4.20 (February 17), and video extension (March 2). The “Extend from Frame” feature lets users chain clips by continuing from the final frame, enabling sequences up to 30 seconds while preserving lighting, motion, and character positioning.
In April 2026, xAI released Grok 4.3 Beta with native video understanding — letting Grok analyze video as a coherent temporal sequence rather than as isolated frames. The understanding capability is distinct from Grok Imagine’s generation pipeline, but the two now stack: Grok can both generate and reason about video within the same model family. No other major lab currently offers vertical integration of native generation, native understanding, and platform-scale distribution under a single subscription.
Grok Imagine’s API pricing dramatically undercuts the field. The trade-off is a 720p resolution ceiling — every other major model offers 1080p or higher. The 1080p Grok Imagine Pro tier telegraphed by Elon Musk for late April 2026 missed its April window and has no new public timeline from xAI. Each week it slips, the per-minute price advantage matters less and the resolution gap matters more: Veo 3.1 Lite ships 1080p at $0.08/sec, Kling 3.0 ships native 4K at the $8/month tier, and even open-source LTX-2.3 outputs true 4K.
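Because vendors quote per-minute and per-second rates interchangeably, it helps to normalize before comparing. The sketch below converts the rates quoted in this guide to dollars per generated minute. The `QUOTED` table and the helper names are illustrative, and the prices are only as current as this page.

```python
SECONDS_PER_MINUTE = 60

# Rates as quoted in this guide; units differ by vendor, so normalize first.
QUOTED = {
    "Grok Imagine (720p)": ("per_minute", 4.20),
    "Grok Imagine Video 1.5": ("per_second", 0.08),
    "Veo 3.1 Lite (720p)": ("per_second", 0.05),
    "Veo 3.1 Lite (1080p)": ("per_second", 0.08),
    "Veo 3.1 via Vertex AI": ("per_minute", 12.00),
    "HappyHorse-1.0 (720p)": ("per_second", 0.14),
    "HappyHorse-1.0 (1080p)": ("per_second", 0.28),
}


def per_minute(unit: str, price: float) -> float:
    """Convert a quoted price to US dollars per generated minute."""
    if unit == "per_minute":
        return price
    if unit == "per_second":
        return price * SECONDS_PER_MINUTE
    raise ValueError(f"unknown unit: {unit}")


def normalized():
    """Return (name, dollars_per_minute) pairs sorted from cheapest up."""
    rows = [(name, round(per_minute(u, p), 2)) for name, (u, p) in QUOTED.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, dollars in normalized():
        print(f"{name:28s} ${dollars:.2f}/min")
```

On these quoted figures alone, Veo 3.1 Lite at 720p normalizes to $3.00 per minute, below Grok Imagine's $4.20, and Lite at 1080p matches Grok Imagine Video 1.5 at $4.80. Which API counts as cheapest therefore depends on the tier and resolution being compared.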
On June 3, 2026, xAI shipped Grok Imagine Video 1.5 as a public API (Weekly Roundup — June 8, 2026) — the company’s first native image-to-video model. Per xAI’s docs: image-to-video with audio output, $0.08 per second, a 60-requests-per-minute cap, model alias grok-imagine-video-1.5-2026-05-30. It opened at #2 on the Artificial Analysis I2V with-audio arena (Elo 1,110), behind Seedance 2.0 (1,194) and ahead of HappyHorse-1.0 (1,094) — a strong debut, though xAI’s launch claim of a #1 finish isn’t borne out by the live board. Secondary reports describe a higher-resolution tier; the docs list a single rate, so treat the tiered-pricing figures circulating elsewhere as unconfirmed by xAI. The launch puts the two subjects of RCTV’s Grok vs. Seedance comparison at one and two on the I2V arena.
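A 60-requests-per-minute cap is simplest to respect on the client side. The sketch below is a generic sliding-window limiter; the `MinuteRateLimiter` class is hypothetical, and no xAI client call is shown because the request format is not covered in this guide.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self._stamps = deque()      # send times of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        return self.window - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._stamps.append(self.clock())

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a request slot is free, then claim it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.record()


if __name__ == "__main__":
    limiter = MinuteRateLimiter(limit=60)
    limiter.acquire()  # call this before each generation request
    print("slot claimed; seconds to wait for the next one:", limiter.wait_time())
```

Calling `acquire()` before each request keeps a batch job under the cap without relying on the server to reject the overflow.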
Runway Gen-4 Turbo / Aleph 2.0 — Runway
Best for: Stylized content, VFX aesthetics, professional ecosystem, real-time avatars, agentic production, multishot edit propagation
- Max resolution: 1080p (Gen-4 Turbo); 1080p multishot up to 30s (Aleph 2.0 in Edit Studio); 720p real-time (Characters)
- Audio: Supported
- Key features: Motion brushes, style control, API maturity (Gen-4 Turbo); Characters real-time avatar API (GWM-1); Runway Agent (conversational end-to-end production, May 13, 2026); Aleph 2.0 multishot edit propagation in Edit Studio (May 21, 2026)
- Access: From $12/mo (runwayml.com); Gen-4.5 also via Adobe Firefly (Creative Cloud subscription); Runway Agent and Edit Studio (Aleph 2.0) at runwayml.com on all paid plans
- API: Most mature video generation API available; Characters API at dev.runwayml.com
- Next: Standalone Gen-4.5 launch on runwayml.com still pending; real-time video model research preview on Vera Rubin hardware (time-to-first-frame under 100ms); independent hands-on for Aleph 2.0 against Veo 3.1 and Gemini Omni editing capabilities
- Note: Characters is an enterprise API product built on GWM-1, separate from the Gen-4 Turbo generation pipeline
Runway leads in non-photorealistic and stylized video — VFX-oriented aesthetics, abstract content, and artistic directions where other models default to photorealism. Gen-4 Turbo has the most mature professional ecosystem with motion brushes, scene consistency tools, and a robust API. Runway closed a $315M Series C in February 2026 at a $5.3B valuation.
In March 2026, Runway launched Characters — a real-time video agent API built on its GWM-1 world model (Weekly Roundup — March 13, 2026). Characters generates fully conversational AI avatars from a single reference image with no fine-tuning required. The avatars sustain realistic lip-sync, facial expressions, eye contact, and gesture across extended multi-minute conversations, running at 24fps at 720p in real time. BBC and Silverside are early enterprise partners.
At NVIDIA GTC in March 2026, Runway demoed a research preview of a new real-time video generation model (Weekly Roundup — March 20, 2026) running on NVIDIA Vera Rubin hardware — achieving time-to-first-frame under 100ms for HD video. Gen-4.5 became accessible via Adobe Firefly’s multi-model video hub in April 2026 (Weekly Roundup — April 27, 2026) — Runway’s first major distribution beyond its own platform.
In May 2026, Runway launched Runway Agent (Weekly Roundup — May 18, 2026) — a conversational creative partner that runs ideation, generation, sound design, and editing end-to-end from a chat interface. The same month, Runway opened a Tokyo office and committed $40M to Japan, its third-largest market, with its enterprise base tripled in twelve months.
In May 2026, Runway released Aleph 2.0 (Weekly Roundup — May 25, 2026) — an upgraded video editing model that propagates a single-frame edit across the rest of a clip while preserving everything else. Multishot sequences up to 30 seconds at 1080p, edited across cuts in one pass instead of shot-by-shot. Available on all paid Runway plans on the desktop web app.
Pika 2.5 / Pika Agents — Pika Labs
Best for: Budget-conscious creators, rapid iteration, social media content; multi-model agent orchestration via Pika Agents
- Max resolution: 1080p (Pika 2.5); 480p real-time (PikaStream 1.0)
- Max duration: 42 seconds (clip); persistent for live (PikaStream)
- Audio: Supported (Pika Video native; ElevenLabs / MiniMax / OpenAI Whisper via Pika Agents)
- Key feature: Pikaswaps, Pikaffects, fast batch generation (Pika 2.5); PikaStream 1.0 for live agent video; Pika Agents for multi-model orchestration over Kling, Veo, Seedance, MiniMax, and Sora
- Access: From $8/mo (lowest entry price among major models); Pika Agents available at pika.me and across 17+ platform surfaces
- API: Available
The most accessible entry point to AI video generation. Pika’s strength is speed and volume — generate 20-30 variations of a concept in minutes, then refine. Features like Pikaswaps (face/object replacement) and Pikaffects (style transfer) add creative flexibility at a price point that undercuts every competitor.
In April 2026, Pika launched PikaStream 1.0 — a real-time AI video engine for live agent meetings at 24fps/480p with ~1.5s speech-to-video latency and persistent identity across calls.
In late April 2026, Pika reintroduced its product line as Pika Agents, a multi-modal AI creative partner that orchestrates other companies’ video models from a conversational interface. The video roster includes Pika’s own model alongside ByteDance’s Seedance 2.0, Kuaishou’s Kling, MiniMax, Google’s Veo 3, and OpenAI’s Sora. On audio: ElevenLabs, MiniMax Music and Voice, OpenAI Whisper. On images: Gemini, ChatGPT Images 2, Seedream. The agents run inside Slack, Telegram, WhatsApp, Discord, X, Notion, GitHub, Figma, and other surfaces (17+ in total) with persistent memory and personality across sessions.
Sora 2 — OpenAI
Status: Discontinued March 24, 2026; consumer app shutdown executed April 26, 2026. RCTV analysis →
OpenAI announced Sora’s discontinuation on March 24, 2026, covering the app, the API, and the Disney licensing deal announced in December 2025. The stated reason was compute reallocation toward “world simulation for robotics.” The numbers tell the fuller story: estimated $15M/day peak inference cost against $2.1M in total lifetime in-app revenue, and a 66% download decline from its November 2025 peak to February 2026. Sora is removed from active tracking. See Weekly Roundup — March 27, 2026 for the full breakdown.
Shutdown timeline: The Sora consumer app and web interface went dark on April 26, 2026 (Weekly Roundup — April 27, 2026) — the export window closed at that time. The Sora API remains accessible through September 24, 2026, giving developers time to migrate integrations before the model line fully retires.
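For pipelines that still route through the Sora API, the dates above translate into a simple guard. This is a minimal sketch: the constant and function names are hypothetical, and it reads "through September 24, 2026" as inclusive of that day.

```python
from datetime import date

SORA_APP_SHUTDOWN = date(2026, 4, 26)  # consumer app and web interface went dark
SORA_API_CUTOFF = date(2026, 9, 24)    # last day of API access, read as inclusive


def sora_api_available(today: date) -> bool:
    """True while the Sora API is still within its stated access window."""
    return today <= SORA_API_CUTOFF


def days_until_cutoff(today: date) -> int:
    """Days remaining before the cutoff date; never negative."""
    return max((SORA_API_CUTOFF - today).days, 0)


if __name__ == "__main__":
    check = date(2026, 6, 11)  # this page's last-updated date
    print(sora_api_available(check), days_until_cutoff(check))
```

A router can call `sora_api_available` before dispatching and fall back to another model once the window closes.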
The Agentic Layer: Orchestration on Top of the Models
The model is no longer the product. The five agents in this section don’t generate video — they decide which model generates video, and on what schedule, across what surfaces. Think of the commercial models above as engines: Veo 3.1, Kling 3.0, Seedance 2.0. The agents are steering wheels. The operator-relevant question has shifted up a layer: not “which model is best?” but “which agent puts the right model on the right shot, in the right workflow, with the right context carried forward?” That question didn’t exist twelve months ago. Five companies have already shipped an answer.
| Agent | Vendor | Orchestrates | Single/Multi-model | Surfaces | What it automates | Access |
|---|---|---|---|---|---|---|
| Luma Agents | Luma AI | Ray 3.14, Veo 3, Nano Banana Pro, Seedream, Seedance 2.0, ElevenLabs | Multi | Luma platform, enterprise | Multi-shot composition; sustained character and style across shots; full campaign packages | Enterprise; Luma platform subscription |
| Pika Agents | Pika Labs | Pika Video, Kling, Veo 3, Seedance 2.0, MiniMax, Sora API; audio via ElevenLabs, MiniMax, Whisper; image via Gemini, ChatGPT Images 2, Seedream | Multi | Slack, Telegram, WhatsApp, Discord, X, Notion, GitHub, Figma, and others (17+ surfaces in total) | Conversational prompting; cross-model orchestration; persistent memory and personality across sessions | Pika subscription tiers (pika.me) |
| Runway Agent | Runway | Runway Gen-4 Turbo, Aleph 2.0, Edit Studio | Single-vendor | Runway web app (app.runwayml.com) | Concept ideation, multi-shot generation, sound design, editing — end to end from a single conversation | All paid Runway plans ($12/mo+) |
| Higgsfield Supercomputer | Higgsfield AI | Higgsfield video stack + Claude Opus 4.8, GPT-5.5 Pro, Gemini 3.1 Pro (Claude backbone upgraded to Opus 4.8, May 29) | Multi | Browser, Telegram + 30 integrations (Slack, Google Drive, Notion, Figma, Gmail) | Marketing, production, and creative-direction workflows; research-to-document conversion; scheduled content tasks | Higgsfield subscriber plans (higgsfield.ai) |
| Adobe Firefly AI Assistant | Adobe | Photoshop, Premiere Pro, Lightroom, Illustrator, Express, Firefly — full Creative Cloud stack | Single-vendor | Standalone Firefly web app + embedded in each Creative Cloud app; Premiere Pro with project-metadata access | Multi-step CC workflows via Creative Skills; multi-app handoff; format conversion; Frame.io feedback integration | Creative Cloud subscription; public beta since April 27, 2026 |
Luma Agents
Luma AI shipped the first production-grade conversational agent in this category in March 2026. By March 10, deployments were live at Publicis Groupe and Serviceplan Group — no beta waiting period. The agent works from a Luma Uni-1 reasoning layer that plans and coordinates across video, image, audio, and text before generating anything. It calls Luma’s own Ray 3.14 for video, Google’s Veo 3 for photorealistic shots, Nano Banana Pro, ByteDance’s Seedream, and ElevenLabs for voice. Seedance 2.0 joined the roster in May 2026. The Mazda commercial deliverable in April 2026 — Johannesburg agency Boundless produced Mazda’s first AI-generated commercial using Luma Agents in under two weeks — is the most credible production-deployment signal any AI video agent has produced.
The critical technical feature: persistent context across the full asset suite. Luma Agents remembers what was generated in earlier steps and can revise upstream elements when downstream evaluation surfaces a problem. That is what makes it an orchestrator rather than a fancy prompt box.
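The revise-upstream behavior described above can be illustrated with a small loop. This is a toy sketch of the general pattern, not Luma's implementation: `run_pipeline`, the step functions, and the `review` check are all hypothetical, and the "assets" are just strings.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, str]
Step = Tuple[str, Callable[[Context, int], str]]


def run_pipeline(
    steps: List[Step],
    evaluate: Callable[[Context], Optional[str]],
    max_revisions: int = 3,
) -> Context:
    """Run steps in order, re-running from any step the evaluator flags."""
    names = [name for name, _ in steps]
    context: Context = {}
    attempts = {name: 0 for name in names}
    start = 0
    for _ in range(max_revisions + 1):
        # (Re)generate from `start`; assets from earlier steps stay in context.
        for name, produce in steps[start:]:
            attempts[name] += 1
            context[name] = produce(context, attempts[name])
        flagged = evaluate(context)
        if flagged is None:
            return context
        start = names.index(flagged)
    raise RuntimeError("evaluation still failing after max_revisions")


def script(ctx: Context, attempt: int) -> str:
    return "hero walks at night" if attempt == 1 else "hero walks at dawn"


def video(ctx: Context, attempt: int) -> str:
    return f"video[{ctx['script']}]"


def voice(ctx: Context, attempt: int) -> str:
    return f"voiceover[{ctx['script']}]"


def review(ctx: Context) -> Optional[str]:
    # Hypothetical downstream check: night footage is rejected, and the
    # fix belongs upstream in the script rather than in the video step.
    return "script" if "night" in ctx["video"] else None


if __name__ == "__main__":
    result = run_pipeline([("script", script), ("video", video), ("voice", voice)], review)
    print(result)
```

The point of the pattern is that a problem detected in the video sends the loop back to the script step, and everything downstream of the revised step is regenerated against the new context.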
Pika Agents
Pika’s April 28, 2026 launch was the moment the agentic-orchestration pattern became industry news rather than one lab’s experiment. Pika Agents orchestrates a broader model roster than any competitor — Kling, Veo 3, Seedance 2.0, MiniMax, Sora’s API, and Pika’s own model on the video side; ElevenLabs, MiniMax, and Whisper on audio — all from a conversational interface that runs inside 17 surfaces where creators already work. RCTV covered this as the R#2 lede because the framing was explicit: “the prompt is no longer the product.” That line has since become the editorial spine for this entire category.
PikaStream 1.0 (April 2, 2026): the real-time video engine that runs inside Pika Agents as its live-avatar capability — 24fps at 480p, ~1.5-second speech-to-video latency, persistent identity across calls. It is a streaming runtime, not a standalone orchestrator; the agent layer above it is Pika Agents.
Runway Agent
Runway shipped its agent in May 2026 — single-vendor, full Runway stack. The positioning is end-to-end: Runway Agent handles concept ideation, generation via Gen-4 Turbo or Aleph 2.0, sound design, and editing inside the same conversation thread. Runway’s “single-vendor” constraint is a deliberate product posture, not a technical limitation — they own the generation and editing stack, so the agent never needs to leave it. Whether that narrows or focuses the use case depends on whether the operator’s workflow already lives in Runway. Available on all paid plans, starting at $12/month.
Higgsfield Supercomputer
Higgsfield’s Supercomputer, launched in mid-May 2026, is the most enterprise-positioned agent in the set. “Supercomputer” is the framing: orchestrate Claude, GPT-5.5, and Gemini models alongside Higgsfield’s video stack to plan and execute full content campaigns end to end. On May 19, 2026 — the same day as Google I/O — Higgsfield updated the orchestration layer to Gemini, describing the swap as “8× cheaper, 3× faster.” On May 29, Higgsfield upgraded its Claude backbone to Opus 4.8, and on May 30 shipped Higgsfield Reframe — an MCP-native aspect-ratio reframing tool available inside Claude. The multi-model roster is the broadest LLM coverage in this category. Distribution extends via browser, Telegram, and 30+ third-party integrations. In a 12-day sprint through early June 2026 (Weekly Roundup — June 15, 2026), Higgsfield added five external surfaces — Claude MCP (May 28), Adobe Premiere/After Effects plugins (May 29), Figma (June 4), a Minecraft mod (June 5), and a DaVinci Resolve plugin (June 8) — and folded Grok Imagine 1.5 into its own platform, the clearest expression yet of the routing bet: a competitor’s model generating video inside Higgsfield’s surface. Higgsfield’s Supercomputer page carries the full capability description.
Adobe Firefly AI Assistant
Adobe is the incumbent here, and the only one that arrived via acquisition of creative-workflow context rather than from a blank slate. Firefly AI Assistant — previewed as Project Moonlight at MAX, public beta April 27, 2026 — orchestrates the Creative Cloud stack conversationally: Premiere Pro with full project metadata access, Photoshop, Lightroom, Illustrator, Express. Creative Skills are the execution layer — predefined multi-step workflows that fire from a single natural-language instruction. The operative claim is multi-app handoff without context loss; the operator verdict is still forming in the public beta.
The AI-native vs. incumbent frame is worth naming: Luma, Pika, Runway, and Higgsfield are all building agents on top of AI video first, while Adobe is extending an agent over a suite it already controls. The two approaches answer different operator questions, and which approach wins the relationship depends on whether the operator’s workflow is already Premiere-centric or starting from scratch.
What to watch
Three axes will decide how this layer develops. First: single-vendor vs. multi-vendor convergence. Runway’s Runway-only posture contrasts with Pika’s eight-model roster; whether Runway opens to third-party models is the specific question. Second: AI-native startup agents vs. legacy-incumbent agents. Adobe’s Creative Skills framework is embedded in the world’s most-used professional NLE; Pika Agents runs in Slack. Neither is clearly winning the operator relationship yet. Third: agent-to-agent interoperability. None of the five currently calls another vendor’s agent — they call models. The day Pika Agents calls Runway Agent, the competitive dynamics of this category change entirely.
Open-Source & Local Generation
Open-source AI video models now run locally on consumer hardware (LTX-2.3 targets GPUs with 12GB+ VRAM), making local generation a viable option for privacy-conscious creators and developers.
LTX-2.3 — Lightricks
Best for: Local/desktop generation, consumer GPU workflows, high-frame-rate output
- Max resolution: 4K native (true 4K, not upscaled)
- Max duration: 20 seconds
- Frame rate: Up to 50fps (24/48fps options also available)
- Audio: Native synchronized audio (improved HiFi-GAN vocoder)
- Portrait mode: Yes (9:16, up to 1080×1920)
- Hardware: Runs on GPUs with 12GB+ VRAM; optimized for RTX 50 Series (2.5× faster via NVFP4)
- Integration: ComfyUI native; standalone desktop video editor (shipped March 2026)
- License: Apache 2.0 (free for companies under $10M revenue; commercial license required above that threshold)
A comprehensive rebuild released in March 2026 (Weekly Roundup — March 20, 2026): a new VAE for sharper detail, a 4× larger text connector for better prompt understanding, and an improved HiFi-GAN vocoder for cleaner native audio. The model ships alongside a dedicated desktop video editor, making the entire local pipeline accessible without a ComfyUI node graph.
Key capabilities: native portrait mode (9:16 up to 1080×1920), last-frame interpolation for seamless clip chaining, and 24/48fps output options. At GDC 2026, NVIDIA announced 2.5× performance gains on RTX 50 Series via NVFP4 quantization, 60% lower VRAM usage, and RTX Video Super Resolution for ComfyUI delivering 4K upscaling 30× faster than competing local alternatives. The ComfyUI App View strips the node-graph interface into a simplified prompt-in/video-out UI for non-technical users.
Wan 2.7 — Alibaba (Tongyi Lab)
Best for: Multi-task video generation with Thinking Mode, open-source flexibility
- Max resolution: 1080p
- Max duration: 2–15 seconds
- Task types: T2V, I2V, video continuation, reference-to-video (up to 5 persons), video editing
- Key feature: Thinking Mode (chain-of-thought reasoning before generation)
- Integration: ComfyUI 0.18.5+, Alibaba Cloud Model Studio, wan.video
- License: Open source
Alibaba’s Wan 2.7, released April 3, 2026 (Weekly Roundup — April 17, 2026), is a major upgrade from the 2.2 line. The headline feature is Thinking Mode — a chain-of-thought reasoning approach where the model analyzes the prompt, plans composition, then generates. This produces noticeably more coherent output with fewer artifacts than single-pass generation.
Wan 2.7 Video unifies five task types in a single model: text-to-video, image-to-video (first-frame, first-and-last-frame, audio-driven), video continuation with text guidance, reference-to-video with up to five real-person inputs, and video editing via text, reference images, or style transfer. ComfyUI added support the same day in version 0.18.5 with workflow templates for all five task types.
HappyHorse-1.0 — Alibaba ATH AI Innovation Unit
Best for: Top-ranked benchmark quality (T2V + I2V); commercial API access with joint audio-video and seven-language native lip-sync
- Max resolution: 1080p
- Audio: Joint audio-video generation in a single forward pass; native synced output
- Lip-sync languages: 7 — English, Mandarin, Cantonese, Japanese, Korean, German, French
- Architecture: 15B-parameter unified 40-layer self-attention Transformer
- Inference speed: ~38 seconds for 1080p on a single NVIDIA H100
- Benchmark position: #1 on Artificial Analysis T2V no-audio (Elo 1,293; Seedance 2.0 Pro second at 1,274, as of June 2026; Elo scores update continuously). On the audio-included T2V board the order flips: Seedance 2.0 first (1,215), HappyHorse #2 (1,122), a 93-point gap that reopened after a June arena recalibration. On I2V with-audio it now sits #3 (Elo 1,094), behind Seedance 2.0 and Grok Imagine 1.5
- Access: API live via fal.ai ($0.14/sec 720p, $0.28/sec 1080p) and Alibaba Cloud Bailian (enterprise from April 27, 2026); open weights still pending despite ATH’s marketing claim
- API: ✓ via fal.ai (4 endpoints) and Alibaba Cloud Bailian (enterprise tier)
HappyHorse-1.0 debuted anonymously on Artificial Analysis on April 7, 2026 (Weekly Roundup — April 11, 2026) and immediately ranked #1 in both text-to-video and image-to-video blind testing, surpassing Seedance 2.0. Alibaba disclosed on April 10 that the model belongs to its ATH AI Innovation Unit. The 15-billion-parameter model uses a unified 40-layer self-attention Transformer that generates audio and video jointly in a single forward pass, with no cross-attention modules and no separate audio post-processing.
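To put the arena gaps in perspective, the standard Elo expected-score formula converts a rating difference into an approximate head-to-head preference rate. The sketch below assumes the Artificial Analysis ratings behave like classic Elo on a 400-point logistic scale, which this page does not confirm; `expected_win_rate` is an illustrative name, not part of any arena API.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Classic Elo expected score of A against B.

    Assumption: a 400-point logistic scale, as in standard Elo.
    The arena's actual rating model may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


# Ratings quoted in the spec card above (June 2026 snapshot).
no_audio = expected_win_rate(1293, 1274)    # HappyHorse-1.0 vs. Seedance 2.0 Pro
with_audio = expected_win_rate(1215, 1122)  # Seedance 2.0 vs. HappyHorse-1.0

print(f"T2V no-audio:   {no_audio:.1%}")
print(f"T2V with-audio: {with_audio:.1%}")
```

Under that assumption, the 19-point no-audio lead works out to roughly a 53% preference rate, while the with-audio gap is closer to 63%. The first is near a coin flip; the second is not.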
On April 27, 2026, fal launched HappyHorse-1.0 as official API partner with four endpoints, priced at $0.14 per second for 720p output and $0.28 per second for 1080p (pay-per-second, no minimums). Alibaba Cloud Bailian opened enterprise-grade access the same day.
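Because other entries on this page quote prices per minute, it can help to convert the per-second rates into both a per-clip cost and a per-minute equivalent. The sketch below is a minimal calculator using only the two fal.ai rates quoted above; it assumes billing is strictly linear in output seconds with no rounding, which the source does not spell out, and the function names are hypothetical.

```python
from decimal import Decimal

# Per-second rates quoted above for HappyHorse-1.0 on fal.ai (USD).
RATES_PER_SECOND = {
    "720p": Decimal("0.14"),
    "1080p": Decimal("0.28"),
}


def clip_cost(seconds: float, resolution: str) -> Decimal:
    """Cost of one clip, assuming linear pay-per-second billing (no minimums)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return RATES_PER_SECOND[resolution] * Decimal(str(seconds))


def per_minute(resolution: str) -> Decimal:
    """Per-minute equivalent of the per-second rate, for cross-model comparison."""
    return RATES_PER_SECOND[resolution] * 60


print(clip_cost(10, "1080p"))  # a 10-second 1080p clip
print(per_minute("720p"))      # 720p rate expressed per minute
```

`Decimal` is used instead of floats so that currency arithmetic stays exact; a 10-second 1080p clip comes to $2.80 and the 720p rate to $8.40 per generated minute under these assumptions.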
The open-weights story is messier. ATH’s happyhorse.me/open-source landing page describes HappyHorse-1.0 as “fully open-sourced,” but independent verification finds a public GitHub repo with no model weights, no inference code, and no license file; the Hugging Face profile remains “coming soon.” Alibaba has effectively separated commercial API access (live) from open-weight distribution (still unscheduled). Until weights ship, treat HappyHorse-1.0 as a commercial model with an open-source promise — the API is the actual access surface.
Other Notable Open-Source Models
- SkyReels V4 (Skywork AI) — Released April 3, 2026 (Weekly Roundup — April 17, 2026). First open-source model to co-generate video and synchronized audio in a single forward pass. Dual-stream Multimodal Diffusion Transformer (MMDiT) architecture; 1080p at 32 FPS, clips up to 15 seconds. Accepts text, images, video clips, masks, and audio references. Ranked among the top models on the Artificial Analysis T2V with-audio leaderboard (Elo ~1,135). Free tier: 70 monthly credits on skyreels.dev; open-source weights available for local deployment
- Mochi 1 — High-fidelity short video with strong prompt alignment
- HunyuanVideo / HY-World 2.0 (Tencent) — HunyuanVideo offers solid image-to-video with coherent motion. In April 2026, Tencent’s Hunyuan team released HY-World 2.0 — a multi-modal world model that generates editable 3D scenes (meshes plus Gaussian Splattings) from text prompts or single reference images, with WorldMirror 2.0 inference code and weights open-sourced (github.com/Tencent-Hunyuan/HY-World-2.0). The combination of editable 3D geometry and open weights makes HY-World 2.0 the more pipeline-friendly counterpart to Alibaba’s still-gated Happy Oyster
- Happy Oyster (Alibaba ATH) — Released April 16, 2026 (Weekly Roundup — April 17, 2026). World model that generates interactive, physics-aware 3D environments from text prompts; targets gaming, film, and VR. Directing and Wandering modes are designed for real-time exploration but don’t expose the underlying 3D representation in a standards-friendly way (unlike Tencent’s HY-World 2.0 above). Live demo accessible via Artificial Analysis arena; weights gated
- MAGI-1 — Long-form video synthesis capabilities
- Helios (Peking University / ByteDance / Canva) — 14B autoregressive diffusion model; 19.5fps real-time generation on a single NVIDIA H100; capable of minute-scale video; Apache 2.0 license. Released March 2026. Notable for real-time throughput on a single accelerator
- NVIDIA Cosmos 3 (NVIDIA) — Released June 1, 2026 at Computex (Weekly Roundup — June 8, 2026). An open-weights omnimodel for physical AI that generates text, images, video, ambient sound, and actions in a unified architecture; shipped as Cosmos 3 Super and Cosmos 3 Nano on Hugging Face under the OpenMDW-1.1 license, corroborated by a 291-author arXiv paper. NVIDIA claimed top open-source rank on Artificial Analysis for text-to-image and image-to-video at launch; as of June 2026 the live board confirms Cosmos3-Super leading open-weight image-to-video at Elo 1,251 (Weekly Roundup — June 15, 2026) — the claim now borne out on the I2V side. Positioned as physical-AI / world-model infrastructure rather than a creator generation endpoint, but the open weights and unified video generation make it a tracked open-source entrant
How to Choose: A Routing Framework
The right model depends on the shot, not the project. Here’s a practical decision framework:
Which AI video model is best for broadcast-ready 4K? Kling 3.0 or Veo 3.1. Kling hits 4K at 60fps with multi-cut storyboards. Veo 3.1 leads on photorealism.
Which AI video model wins on benchmark quality with commercial API access? HappyHorse-1.0 via fal.ai ($0.14/sec 720p, $0.28/sec 1080p) or Alibaba Cloud Bailian — #1 on Artificial Analysis T2V and I2V no-audio leaderboards; joint audio-video; seven-language native lip-sync.
What’s the best free AI video model to start with? Veo 3.1 via Google Vids (10 free clips/month, any Google account).
Which AI video model is free inside an app you already use? Gemini Omni Flash via YouTube Shorts and YouTube Create (available since May 2026, any Google account).
Which AI video model accepts multimodal input (image + audio + video → video)? Gemini Omni.
Which AI video model wins character consistency across shots? Seedance 2.0 Pro via CapCut (US available since April 2026, with real-face restrictions) or Luma Ray 3.14.
Which AI video model is best for stylized and VFX work? Runway Gen-4 Turbo.
Which AI video model propagates a single-frame edit across a 30-second multishot? Runway Aleph 2.0 in Edit Studio.
Which AI video model handles professional production volume at scale? Luma Ray 3.14 (4× faster, 3× cheaper than previous Ray).
What’s the best low-cost AI video model for volume work? Pika 2.5.
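What’s the cheapest AI video API in 2026? Among production-grade models, Grok Imagine ($4.20/min generated). Agnes-Video-V2.0 lists lower at $0.30/min but debuts near the bottom of the Artificial Analysis board (Elo 905).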
Which AI video model is best for local generation and privacy? LTX-2.3 via ComfyUI or desktop editor.
Which AI video API is best for real-time interactive avatars? Runway Characters (GWM-1).
What’s the best real-time AI video for live agent meetings? PikaStream 1.0 (24fps/480p, ~1.5s latency).
Which AI video model wins multi-shot narrative? Seedance 2.0 Pro via CapCut (US, with restrictions), Luma Ray 3.14, or Kling 3.0 Omni.
Which AI video models work inside Adobe Creative Cloud? Adobe Firefly multi-model hub (Veo 3.1, Kling 3.0/Omni, Runway Gen-4.5, Luma, plus 30+ others).
Which AI video orchestration agent runs Kling + Veo + Seedance + MiniMax from one chat? Pika Agents (April 28, 2026; Slack/Telegram/Discord/X/Notion/Figma, persistent memory) or Higgsfield Supercomputer (mid-May 2026; orchestrates Seedance 2.0, Gemini, GPT-5.5 on web and Telegram). Runway Agent (May 13, 2026) covers the single-vendor end-to-end case. Luma Agents for enterprise campaign production (Publicis, Adidas, Mazda deployments). Adobe Firefly AI Assistant for teams already in Creative Cloud (public beta, CC subscription). See The Agentic Layer for the full comparison table.
Which AI model is best for editable 3D world generation? Tencent HY-World 2.0 (open weights) or Alibaba Happy Oyster (gated early access).
Which AI video model has the largest built-in distribution? Grok Imagine (500M+ X users).
Most professional workflows use 2–3 models per project, routing different shots to different engines based on the specific requirements of each scene.
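The per-shot routing idea can be expressed as a small lookup table. The sketch below is illustrative only: the requirement keys (`broadcast_4k`, `stylized_vfx`, and so on) and the `Shot`, `route`, and `plan` names are hypothetical, while the model names are the first-listed answers from the framework above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    description: str
    need: str  # dominant requirement; must be a key in ROUTES


# Hypothetical requirement keys mapped to the framework's answers, in listed order.
ROUTES: dict[str, list[str]] = {
    "broadcast_4k": ["Kling 3.0", "Veo 3.1"],
    "character_consistency": ["Seedance 2.0 Pro", "Luma Ray 3.14"],
    "stylized_vfx": ["Runway Gen-4 Turbo"],
    "edit_existing": ["Runway Aleph 2.0"],
    "local_private": ["LTX-2.3"],
    "low_cost_volume": ["Pika 2.5"],
}


def route(shot: Shot) -> str:
    """Return the first-listed model for the shot's dominant requirement."""
    try:
        return ROUTES[shot.need][0]
    except KeyError:
        raise ValueError(f"no route for requirement {shot.need!r}") from None


def plan(shots: list[Shot]) -> dict[str, list[str]]:
    """Group shot descriptions by the model each one routes to."""
    grouped: dict[str, list[str]] = {}
    for shot in shots:
        grouped.setdefault(route(shot), []).append(shot.description)
    return grouped


project = [
    Shot("opening aerial", "broadcast_4k"),
    Shot("hero dialogue", "character_consistency"),
    Shot("closing aerial", "broadcast_4k"),
]
print(plan(project))
```

Here a three-shot project lands on two models, which matches the pattern described above. A real router would also weigh cost, access restrictions, and fallbacks, all of which this sketch leaves out.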
What’s Coming
- Reactor — real-time world models as infrastructure — Reactor emerged from stealth in May 2026 with $59M (Lightspeed led; Jeffrey Katzenberg’s WndrCo, Amplify Partners, Sky9 Capital, FPV Ventures also in). Co-founders are former Apple Vision Pro technical leads. The pitch: a unified SDK and API that makes real-time world models available to developers “with a few lines of code,” targeting media and entertainment, physical AI, and robotics. This is a different category from the commercial models and agents tracked above — it is infrastructure for interactive AI worlds, not a generation endpoint. No Stack row warranted yet; worth watching as the definition of “AI video” expands toward interactive, real-time, and physics-driven output.
- MCP as a distribution layer for AI video — On May 27, 2026, Runway launched Runway MCP, connecting its model roster (Gen-4.5, Seedance 2.0, GPT Images 2.0, Kling) to Claude, ChatGPT, Cursor, Replit, and any MCP-compatible client. The following day, Pika and Higgsfield both shipped their own MCP skills: Pika’s Founder Starter Kit (four Claude skills: Build-a-Brand, App Screens, Product Sizzle, Founder Video) and Higgsfield Supercomputer as a Claude skill. Three Tier-1 AI video labs shipped MCP integrations inside 24 hours; Higgsfield also launched five Adobe Premiere Pro and After Effects plugins in the same window. MCP is becoming the standard distribution channel for AI video capability into developer and coding-agent workflows, a second distribution layer running alongside the model APIs.
- Dreamina Octo — ByteDance’s “next chapter” beyond Seedance 2.0. Revealed at AI on the Lot (Culver City, May 27, 2026) under the framing “From Generation to Emergence” and “when the prompt isn’t the point.” Early access survey live; product confirmed as “arriving soon,” not yet shipped. If Octo ships as a conversational orchestrator rather than a generation model, it belongs in the agentic section; if it ships as a new Seedance-tier model, it belongs in the Big Eight. Watch @dreamina_ai for the actual launch.
- Gemini Omni Pro shipping window — Google announced Omni Pro at I/O 2026 alongside Omni Flash with no shipping date attached. “Soon” is the operative word. Independent benchmarks for Omni Flash against Veo 3.1, Kling 3.0, and HappyHorse-1.0 are the test that converts the launch keynote’s world-model framing from rhetoric to evidence — or not (RCTV flagship analysis →)
- Agentic-orchestration layer — Five conversational agents now sit on top of the model layer: Luma Agents (March 5, 2026), Pika Agents (April 28, 2026), Runway Agent (May 13, 2026), Higgsfield Supercomputer (mid-May 2026), and Adobe Firefly AI Assistant (public beta April 27, 2026). See The Agentic Layer for the full comparison
- Grok Imagine Pro (1080p) — Slipped past Musk’s late-April commitment with no new public timeline as of June 2026. Until it ships, Grok’s per-minute price advantage matters less and the 720p resolution gap matters more. Track xAI release notes for the actual ship date
- HappyHorse-1.0 open-source weights — Now roughly six weeks past the April 27, 2026 commercial API launch on fal, and the weights still haven’t shipped: the public GitHub repo remains empty (no weights, no inference code, no license file) and the Hugging Face profile still reads “coming soon.” Artificial Analysis now lists HappyHorse as an API-only product. ATH’s happyhorse.me/open-source “fully open-sourced” claim hasn’t softened, hasn’t acknowledged the gap, and hasn’t put a date on the artifact. Independent verification by WaveSpeedAI remains the canonical source on the marketing-vs-artifact gap
- TAKE IT DOWN Act enforcement first action — The FTC opened formal enforcement after the May 19, 2026 deadline (FTC press release, business-guidance framework). The agency stood up TakeItDown.ftc.gov as the victim-facing intake surface and on the same day sent a second-wave warning letter to twelve “nudify” tool sites — on top of the fifteen platforms named May 13, 2026. First enforcement target — platform, tool, or generator — sets the operative precedent for the rest of the year. Civil penalties up to $53,088 per violation; 48-hour removal SLA; Section 5 enforcement (Section 230 is no shield)
- AI executive order — signed June 2, 2026, light-touch — Trump signed Promoting Advanced Artificial Intelligence Innovation and Security on June 2, 2026. The version that landed is voluntary, not the FDA-style framework early drafts floated: developers of “covered frontier models” may give federal agencies a 30-day pre-release look, and the text explicitly bars any “mandatory governmental licensing, preclearance, or permitting requirement.” OpenAI, Anthropic, and Google all welcomed it. For AI video the exposure runs through the frontier-model layer, and the DOJ criminal-AI priority stacks onto TAKE IT DOWN. The month-long regulatory overhang resolved in industry’s favor (Weekly Roundup — June 8, 2026)
- Runway Cosmos Coalition — world models as an industry alliance — On June 1, 2026, Runway, NVIDIA, and “leading AI labs” announced the Cosmos Coalition to build and open-source frontier world models; Runway joins as a founding member. As announced, it’s a mission statement: no specs, no timeline, no architecture. It landed the same day as NVIDIA’s actual Cosmos 3 release (now in Open-Source above) and Luma’s Open Physical AI Lab, making “world model” three different things in 24 hours (Weekly Roundup — June 8, 2026)
- Agnes-Video — a new API price floor — Singapore’s Sapiens AI entered the Artificial Analysis arena with Agnes-Video-V2.0 at $0.30/min — the cheapest price on the board, roughly a tenth of the next commercial tier. The catch is quality: it debuts near the bottom (Elo 905, #24 on audio T2V). A price-floor entrant, not a quality threat — yet (Weekly Roundup — June 8, 2026)
- Varya — India’s government-backed open-weight entrant — Bangalore’s Avataar.ai launched Varya (June 11, 2026), an open-weight video model distilled from Alibaba’s Wan 2.2 (4 steps vs. 50; 5-second 720p in 45s on an H200), weights on India’s AI Kosh repository under the $1.2B IndiaAI Mission. The hook is price: a vendor-reported ₹0.48/sec (~$0.005), roughly 20× below the global leaders. Treated as a watch entrant, not a tracked open-source row: the 14B parameter figure is Indian-outlet-only (not confirmed by TechCrunch), Artificial Analysis hasn’t benchmarked it, and Avataar’s own model page was unreachable at writing. Real as a price-floor and sovereign-AI signal; specs unverified (Weekly Roundup — June 15, 2026)
- Black Forest Labs enters the video conversation — Martin Scorsese joined BFL as an adviser (June 2, 2026), publicly using FLUX to storyboard his next film; FLUX.2 shipped a multi-reference feature the same day. BFL is primarily an image-model company — a video model is in development — but it sits on $300M in Series B funding ($3.25B valuation, December 2025; a16z, NVIDIA, Salesforce Ventures) and is already a Firefly multi-model partner (Weekly Roundup — June 8, 2026)
- State deepfake legislation map — Federal preemption of state AI law isn’t happening in 2026. Connecticut HB 5312 (May 2026) establishes a private right of action against creators of non-consensual AI-generated intimate imagery. Vermont’s election-deepfake bill, Iowa’s chatbot-safety law, and Utah’s nine AI bills moved in the same period. Together with Tennessee’s ELVIS Act, California’s AB 2655, and New York’s election-deepfake law, the state map is denser than the federal one. AI video labs allowing image-to-video from real-face inputs need to model state civil exposure alongside federal compliance
- State synthetic-media provenance mandates — A second state-law track, distinct from the NCII-liability bills above: provenance-embedding requirements that reach the generators directly rather than punishing misuse. Connecticut SB 5 passed both chambers in May 2026 and awaits Gov. Lamont’s signature; it requires providers with >1M monthly users to embed C2PA-aligned, tamper-resistant provenance data in generated audio, image, and video (provenance obligation effective Oct 1, 2026; detectability standard Oct 1, 2027). Arizona SB 1786 passed the state Senate and awaits a House vote (reconciliation stalled as of late May 2026). California SB 1000 advanced through the state Senate 33–1 with an urgency clause in May 2026 — urgency designation triggers immediate effect on signature rather than the standard January 1 timeline. Hawaii HB 2137 has sat on Governor Green’s desk without a signature; status verified against the Transparency Coalition tracker. Unlike removal mandates, provenance embedding is a model-build requirement — it changes what the model ships, not just what a platform takes down (Weekly Roundup — May 25, 2026)
- Runway Gen-4.5 — Accessible via Adobe Firefly’s multi-model hub since April 2026; standalone Gen-4.5 launch on runwayml.com still pending. Previewed on NVIDIA Vera Rubin hardware at GTC (March 2026)
- NVIDIA Vera Rubin cloud deployment — AWS, Google Cloud, Microsoft Azure, and OCI all confirmed H2 2026 availability. Vera Rubin delivers 10× lower inference token cost versus Blackwell — the number that will reshape per-second AI video pricing across all major cloud platforms
- DLSS 5 — NVIDIA’s neural rendering technology, launching Fall 2026. Explicitly positioned for filmmaking and VFX beyond gaming; uses generative AI to infuse photoreal lighting and materials anchored to source 3D geometry
- Blackburn draft AI bill — GOP Senate draft (March 19, 2026) declares AI training on copyrighted works not fair use; targets deepfakes and Section 230. Not yet introduced as legislation; path to passage uncertain
- White House AI framework vs. CLEAR Act — White House (March 2026) takes the opposite position from Blackburn: AI training is not infringement; courts should decide. Bipartisan CLEAR Act (Schiff/Curtis) proposes mandatory training data disclosure without resolving fair use. Three irreconcilable positions now active in Washington simultaneously
- Seedance 2.0 copyright litigation — US CapCut access available since April 2026 with real-face and IP restrictions, but the underlying copyright dispute with Disney, Paramount, Warner Bros., and Netflix remains unresolved. The restrictions are a negotiating posture, not a settlement
- OpenAI robotics / world simulation — OpenAI redirected Sora’s compute toward “world simulation for robotics” after shutting the product down. The consumer app went dark on April 26, 2026 as scheduled; the Sora API remains accessible until September 24, 2026
- Adobe Firefly multi-model expansion — Firefly’s video hub now hosts 30+ third-party AI models including Kling 3.0/Omni, Veo 3.1, Runway Gen-4.5, ElevenLabs Multilingual v2, Luma AI, Black Forest Labs, and Topaz Labs. Firefly AI Assistant orchestrates multi-step workflows across Photoshop, Premiere, Lightroom, Express, and Illustrator
- Tencent vs. Alibaba 3D world model race — Two of China’s largest AI labs shipped 3D world models on the same day, April 16, 2026 (Alibaba’s Happy Oyster, gated; Tencent’s HY-World 2.0, open weights). Western labs have nothing comparable in production; the 6-to-12 month head start is real if world simulation matters as much as OpenAI’s Sora-shutdown framing implied
- Google Vids / Workspace expansion — YouTube export is live; paid creative tiers (Pro/Ultra) include Lyria 3 music generation and AI avatars. Further Workspace AI integration expected throughout 2026
- EU AI Act Article 50 — a two-step deadline, not one cliff: the disclosure and deepfake-labeling obligations take effect August 2, 2026, but the May 2026 AI Omnibus agreement granted a four-month grace period — to December 2, 2026 — on the harder machine-readable watermarking requirement (Art. 50(2)) for generative systems already on the market before August 2. The Code of Practice defining the technical standard is still in draft (Weekly Roundup — June 15, 2026)
- Unlimited-length AI video — EPFL’s drift elimination breakthrough (presenting at ICLR 2026) could remove the duration ceiling entirely
- xAI targeting 30-minute video — Announced goal for late 2026, with full-length films targeted for 2027
This page is maintained by RCTV as a public reference. For weekly updates on model releases and industry shifts, see our Weekly Roundup.
Have a correction or update? Contact us at rctv.oxncw@simplelogin.com
Changelog
Showing the four most recent updates. Full changelog archive →
June 14, 2026
- Last updated date: Advanced from June 11 to June 14, 2026. lastmod set to 6/14 deliberately so the June 15 Weekly Roundup (R#8) leads as the headline article rather than tying with this Stack update (lead-by-one rule). Ships Sunday per the standard Stack-paired-with-Roundup cadence.
- The Agentic Layer — Higgsfield Supercomputer: Added the early-June distribution sprint: five external surfaces in 12 days (Claude MCP May 28, Adobe plugins May 29, Figma June 4, Minecraft mod June 5, DaVinci Resolve plugin June 8) plus the Grok Imagine 1.5 integration into Higgsfield’s own platform, the clearest routing-bet expression yet (a competitor’s model inside Higgsfield’s surface).
- Other Notable Open-Source — NVIDIA Cosmos 3: Updated the launch-time “claim not independently confirmed” to live-board confirmation — Cosmos3-Super leads open-weight image-to-video at Elo 1,251 (June 2026).
- What’s Coming — two updates: Added Varya (Avataar.ai, June 11 — India IndiaAI-Mission open-weight Wan-2.2 distillation, vendor-reported ~$0.005/sec; watch entrant, specs unverified, not a tracked row). Rewrote the EU AI Act Article 50 item from a flat August cliff to the two-step deadline — disclosure August 2, machine-readable watermarking December 2 for in-market systems (May 2026 AI Omnibus grace period; surfaced by the R#8 Cowork exit-test).
- Considered and excluded: Varya as a counted open-source row (specs Indian-outlet-only, AA-unbenchmarked, vendor page down — watch entrant until verified); the Cowork “HappyHorse weights flipped to published” claim (our render still showed “Coming Soon” — held out as unverified, HappyHorse open-weights item unchanged).
- Paired update outside this file: params.toml ogImage → roundup-2026-06-15.png; [landscape] counts unchanged (commercial 8 / open_source 10 / agentic 5; Varya is a watch entrant, not a counted row). Changelog trim: the May 25 block moves to the archive on the next pass to restore the four-most-recent display.
- Process note: Stack deltas drafted blind of the Cowork comparison (read only after R#8 was built, exit-test discipline); the comparison’s one verified catch, the EU Art. 50(2) grace period, is reflected here and in the roundup’s What-to-Watch.
June 11, 2026
- Page restructure (reference-card format): Every model and agent entry now leads with its spec card — the canonical location for volatile facts (pricing, resolution, arena positions, access) — followed by current-state analysis. Relative-time references removed page-wide (11 instances); duplicated stats reconciled to single canonical values (HappyHorse-1.0 T2V no-audio Elo confirmed at 1,293 against the live Artificial Analysis board). No facts removed: dated event history lives in this changelog.
- Changelog split: This page now carries the four most recent update blocks; the full history lives in the changelog archive.
- Navigation anchors: Stable anchors added to every model and agent entry; Quick Verdict bullets and Routing Framework answers now deep-link to entries.
June 7, 2026
- Last updated date: Advanced from May 31 to June 7, 2026. lastmod set to 6/7 deliberately so the June 8 Weekly Roundup (R#7) leads as the headline article rather than tying with this Stack update (lead-by-one rule). Ships Sunday per the standard Stack-paired-with-Roundup cadence.
- Tags: Added cosmos, black-forest-labs, flux, agnes-video. NVIDIA Cosmos 3 enters open-source tracking; BFL/FLUX and Agnes-Video surface as What’s-Coming entrants.
- Grok Imagine — xAI: Added Grok Imagine Video 1.5 (June 3), xAI’s first native image-to-video API ($0.08/sec, 60 RPM, audio output, alias grok-imagine-video-1.5-2026-05-30); opens #2 on the Artificial Analysis I2V with-audio arena (Elo 1,110), behind Seedance 2.0 and ahead of HappyHorse-1.0. xAI’s #1-debut claim not borne out by the live board. New model paragraph + Arena / Max-resolution bullets; Quick Verdict + Quick Reference Grok row updated.
- Arena standings refresh (verified against live Artificial Analysis boards, June 7): T2V no-audio — HappyHorse-1.0 #1 (1,293), Seedance 2.0 #2 (1,274), Grok #5 (1,235). T2V with-audio — Seedance 2.0 #1 (1,215), HappyHorse #2 (1,122): a 93-point gap reopened after a June recalibration. I2V with-audio — Seedance 2.0 #1 (1,194), Grok 1.5 #2 (1,110), HappyHorse #3 (1,094). Corrected a stale board-conflation in the prior Quick Verdict / HappyHorse standings. Updated Quick Verdict (HappyHorse + Grok), Quick Reference (Grok + HappyHorse rows), and the Seedance / HappyHorse / Grok model-entry benchmark bullets.
- Other Notable Open-Source — added NVIDIA Cosmos 3: Open-weights omnimodel (Cosmos 3 Super + Nano on Hugging Face, OpenMDW-1.1, June 1; 291-author arXiv paper). NVIDIA’s “top open-source T2I/I2V” claim attributed, not independently confirmed.
- The Agentic Layer — Higgsfield Supercomputer: Backbone upgraded to Claude Opus 4.8 (May 29); Higgsfield Reframe (May 30, MCP-native) noted. Table cell + paragraph.
- What’s Coming: Rewrote the “AI executive order” item from postponement to signed June 2 (voluntary 30-day frontier review, no-licensing language, cybersecurity package; OpenAI/Anthropic/Google welcomed) — resolves the May 21 pull. Added “Runway Cosmos Coalition” (June 1 alliance, base model unspecified), “Agnes-Video — a new API price floor” ($0.30/min, Elo 905 #24), and “Black Forest Labs enters the video conversation” (Scorsese adviser, FLUX.2, $300M Series B). Updated the HappyHorse open-weights item to ~6 weeks, AA-API-only.
- Considered and excluded: Agnes-Video as a Big Eight row (Elo 905, price-floor not production-grade — What’s-Coming watch); NVIDIA Cosmos 3 as a Big Eight row (open-weights physical-AI omnimodel — Open-Source Notable); Luma Human After All / Open Physical AI Lab (partnership + research lab, no shipped model — roundup material, not Stack).
- Paired update outside this file: params.toml [landscape] open_source 9 → 10 (Cosmos 3 added); commercial (8) and agentic (5) unchanged. R#7 (ai-video-weekly-roundup-2026-06-08.md) reads the Stack’s live .Lastmod for its up_next card, so no date sync is needed (template change shipped 5/31).
- Process note: Phase 1 pre-flight surfaced the 11-item change list (all standings pre-verified against live AA boards) before this Phase-2 atomic Edit sequence; no full-file rewrite (transcription-risk avoidance, ~660-line page). Stack deltas drafted blind of the June-5 Cowork comparison drop, read only after R#7 was built (exit-test discipline); the comparison’s verified catches (EO signing, Agnes, the arena recalibration) are reflected here.
May 31, 2026
- Last updated date: Advanced from May 25 to May 31, 2026. lastmod set to 5/31 deliberately so the June 1 Weekly Roundup (R#6) leads as the headline article rather than tying with this Stack update (lead-by-one rule); R#6 up_next.date synced to 5/31 in the same pass. Ships today (Sunday) per the standard Stack-paired-with-Roundup cadence.
- NEW SECTION — The Agentic Layer: Orchestration on Top of the Models {#agentic}: Dedicated H2 placed after The Big Eight, before Open-Source. Resolves the open structural question raised 2026-05-18 (agentic layer had no home in the Stack). Build derived from drafts/2026-05-26-agentic-orchestration-brief.md (Carlos-approved); decision-to-build pulled forward from the 6/2 gate to today. Five confirmed agents in a 7-column orchestration table + per-agent paragraphs: Luma Agents, Pika Agents (PikaStream 1.0 as a sub-bullet: runtime, not orchestrator), Runway Agent, Higgsfield Supercomputer, Adobe Firefly AI Assistant. Lead paragraph carries the “model is the engine, agent is the steering wheel” thesis; what-to-watch footer names three axes (single/multi-vendor convergence, AI-native vs incumbent, agent-to-agent interop). Archive callbacks to R#2 (Pika Agents lede) and R#4 (Mazda/Boundless) threaded.
- Census frontier closed: Krea AI and LTX Studio both evaluated against the agent-vs-aggregator test and excluded. Krea is a menu-driven model picker; LTX Studio is a timeline/studio tool (multi-model but menu-driven, no persistent cross-session context). Both fit the deferred “Tools” framing, not the agentic section.
- Arena updates (routine refresh, through 5/29): HappyHorse-1.0 repositioned to #2 on Artificial Analysis T2V no-audio (Elo 1,212; Seedance 2.0 now #1 at 1,215 — 3-Elo gap, inside the noise band); #1 on I2V no-audio unchanged. Grok Imagine added at #5 T2V no-audio (Elo 1,234 — effectively tied with Kling 3.0 Omni at #4). Updated in the Quick Verdict, Quick Reference table, and the HappyHorse + Grok Imagine model entries; new Quick Verdict bullet “Best multi-model orchestration agent — Pika Agents.”
- What’s Coming (three new items): Reactor (real-time world models as infrastructure: $59M stealth exit May 28, Lightspeed-led, Katzenberg board observer); MCP as a distribution layer for AI video (Runway/Pika/Higgsfield MCP cluster May 27–28 + Higgsfield Adobe Premiere/After Effects plugins); Dreamina Octo (pre-launch, “Vibe Create,” early-access survey live). Existing agentic-orchestration item updated from four to five agents with #agentic anchor links and “now has its own dedicated section.”
- Homepage band, third tier: params.toml [landscape] agentic = 5 added.
- Paired updates outside this file: .claude/commands/rctv-landscape-update.md; knowledge/design-system-brief.md; content/posts/ai-video-weekly-roundup-2026-06-01.md.
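The “inside the noise band” call on the 3-Elo gap in the arena update above can be sanity-checked with a short sketch. It assumes the Artificial Analysis board follows the conventional logistic Elo model on a 400-point scale, which this page does not confirm; the function name is illustrative only. Under that assumption, a 3-point lead works out to roughly a 50.4% expected head-to-head preference, which is close to a coin flip.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A versus B under the standard logistic Elo model.

    Assumption: the 400-point scale of conventional Elo; the arena's exact
    rating method is not stated in this changelog.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the arena update in this entry (T2V no-audio, through 5/29).
seedance_elo = 1215    # Seedance 2.0, #1
happyhorse_elo = 1212  # HappyHorse-1.0, #2

p = elo_expected_score(seedance_elo, happyhorse_elo)
print(f"Expected preference for Seedance 2.0 over HappyHorse-1.0: {p:.1%}")
```

Equal ratings give exactly 50% under this model, so a 3-point lead moves the expected preference by well under one percentage point.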
May 25, 2026
- Last updated date: Advanced from May 24 to May 25, 2026. Off-cycle mid-week edit (the standard cadence is Stack-paired-with-Roundup on Mondays). Operator-approved out-of-band.
- Quick Verdict section added: New “## Quick Verdict: Best AI Video Model by Use Case (2026)” block inserted between the intro paragraph and the Quick Reference table. Nine declarative “Best for X” verdicts, each naming a model and carrying a one-line judgment, anchored at #quick-verdict.
- How to Choose routing framework restructured for SEO efficiency: All 19 bullets reworded from need-form (Need X? → Y) to declarative-question form anchored on head keywords (Which AI video model wins X? Y.). Targets long-tail question-shape queries.
- Sora 2 (OpenAI) entry moved to end of Big Eight section: Sora 2 is discontinued and previously sat in the first position, an editorial tension between the section intro’s dominance claim and a tombstone-as-first-entry. Move-to-end resolves it: active models lead the section, Sora 2 closes as the historical reference.