AI Video Weekly Roundup

Two stories bracketed the week. At the policy layer, the regulatory overhang finally lifted — Trump signed the long-delayed AI executive order, and it landed lighter than the industry feared. At the model layer, three organizations reached for the same phrase — “world model” — on the same day to describe three very different things. Below both, Grok launched into the I2V arena’s second slot, and Martin Scorsese put his name and his money behind an AI image lab.

Models covered: Grok Imagine · NVIDIA Cosmos · FLUX

🏛️ Trump Signs the AI Executive Order — and It’s Light-Touch

The regulatory overhang that sat over the field for a month is gone. President Trump signed Promoting Advanced Artificial Intelligence Innovation and Security on June 2 — the order whose public signing got pulled on May 21. The version that landed is meaningfully narrower than the FDA-style framework the early drafts floated.

The mechanism is voluntary. Developers of “covered frontier models” may give federal agencies a 30-day look before wider release, and a classified benchmarking process will define which models cross the threshold. The order goes out of its way to foreclose the industry’s central fear, barring any “mandatory governmental licensing, preclearance, or permitting requirement.” The rest is a cybersecurity and national-security package. OpenAI, Anthropic, and Google all welcomed it; Sam Altman called it “an appropriate balance.”

For AI video, the exposure runs through the model layer. The order regulates frontier models, and among video players xAI’s Grok is the one most likely to meet the “covered” bar, which would put one of the medium’s most capable models on a voluntary federal track for the first time. The new DOJ criminal-AI priority stacks onto the TAKE IT DOWN Act cases already moving, and the August 2 EU Article 50 labeling deadline still sits underneath all of it.

Why it matters: for a month, the unsigned order was the single largest structural unknown hanging over the field. It’s resolved, and it resolved light-touch — no licensing, no pre-clearance, no mandatory review. The industry can stop planning for the FDA-style regime that looked likely in early May.

🌐 “World Model” Has Lost the Plot

Three announcements, one day. They are not the same thing.

NVIDIA Cosmos 3 is a real, downloadable artifact. The company released Cosmos 3 Super and Cosmos 3 Nano on Hugging Face under an open license, corroborated by a 291-author arXiv paper submitted June 1. The architecture handles text, images, video, ambient sound, and actions in a unified system. NVIDIA claims Cosmos 3 ranks first among open models on Artificial Analysis for text-to-image and image-to-video — NVIDIA’s framing, not an independent finding.

The Runway Cosmos Coalition is a press release with intentions attached. Runway co-CEO Anastasis Germanidis: “The future of AI will be built on world models.” The coalition’s actual commitment is “a base model codeveloped by Runway and NVIDIA”: no specs, no timeline, no architecture. NVIDIA’s newsroom names Runway as one of six founding members. So far, the coalition has produced a mission statement.

Luma’s Open Physical AI Lab is a hiring page and a research agenda. Luma’s thesis — that the generalization problem in robotics requires the same multimodal data approach it has used for video — is coherent. But the lab has shipped nothing: no code, no weights, no datasets. The announcement describes intentions and invites partners.

Three rungs: shipped artifact, credibility-signal alliance, strategic-direction announcement. One clarification: NVIDIA Cosmos 3 is NVIDIA’s own open model. It is not the coalition’s “base model codeveloped by Runway and NVIDIA,” which remains unspecified.

We adjudicated this term at Google I/O three weeks ago. Now it’s industry vocabulary, and the dilution is the story.

Why it matters: when the same technical term describes an open-weights model, an industry alliance, and a research hiring push — in 24 hours — it stops functioning as information. “World model” on a vendor roadmap now tells you nothing. Ask what shipped.

🎬 Grok Lands at Number Two in the I2V Arena

xAI launched Grok Imagine Video 1.5 as a public API on June 3 — four days after we published Grok Imagine vs. Seedance 2.0, which framed the two as opposite bets converging on the same competitive middle. Grok arrived in that middle and went straight to the second slot, directly behind Seedance.

This is xAI’s first native video generation product offered through a public API. Per xAI’s docs: image-to-video, $0.08 per second, a 60-requests-per-minute cap, model alias grok-imagine-video-1.5-2026-05-30. Secondary reports describe a higher-resolution tier and bundled audio pricing; the docs list a single rate, so treat the tiered figures circulating elsewhere as unconfirmed by xAI.
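To turn the documented rate and cap into budget figures, here is a minimal Python sketch. It assumes flat billing per second of generated video with no minimum charge or rounding, and one clip per request; the docs summary above confirms neither. The function names and the 8-second clip length are illustrative, not xAI's.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.08")   # USD, from xAI's docs as quoted above
REQUESTS_PER_MINUTE = 60             # rate cap, from xAI's docs as quoted above


def clip_cost(seconds: int) -> Decimal:
    """Cost of one clip, ASSUMING flat per-second billing with no minimum."""
    return PRICE_PER_SECOND * seconds


def max_hourly_spend(seconds_per_clip: int) -> Decimal:
    """Ceiling on hourly spend if every request slot under the cap is used.

    ASSUMES one clip per request; real throughput will usually be lower.
    """
    return clip_cost(seconds_per_clip) * REQUESTS_PER_MINUTE * 60


if __name__ == "__main__":
    print(clip_cost(8))          # hypothetical 8-second clip
    print(max_hourly_spend(8))   # ceiling at the documented rate cap
```

Under those assumptions, an 8-second clip costs $0.64, and a pipeline that saturated the rate cap for an hour would top out at $2,304. That figure is a ceiling for planning, not a forecast.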

On the Artificial Analysis Image-to-Video arena (the with-audio board, which confirms Grok generates audio), the live standing as of this writing is Seedance 2.0 720p first (Elo 1194), Grok Imagine 1.5 second (1110), and Alibaba’s HappyHorse-1.0 third (1094). xAI’s launch materials claimed a #1 debut; the board doesn’t bear that out, and launch-day Elo rides thin vote volume regardless. Second on arrival, behind a model that’s held the top for weeks, is still a real debut, and it puts the two subjects of our Grok Imagine vs. Seedance 2.0 comparison at one and two.
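To read those Elo gaps as head-to-head odds, here is a short Python sketch. It uses the conventional Elo logistic curve with a 400-point scale, which is an assumption: this roundup does not confirm how Artificial Analysis computes or scales its arena scores. The function name is illustrative.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Chance that A is preferred over B under the standard Elo logistic curve.

    ASSUMES the conventional 400-point scale; the arena's actual method
    is not confirmed here.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))


# Arena scores as quoted above
SEEDANCE, GROK, HAPPYHORSE = 1194, 1110, 1094

if __name__ == "__main__":
    print(f"Seedance over Grok:   {expected_win_rate(SEEDANCE, GROK):.1%}")
    print(f"Grok over HappyHorse: {expected_win_rate(GROK, HAPPYHORSE):.1%}")
```

On that assumption, the 84-point gap means voters would pick Seedance over Grok roughly 62% of the time, while the 16-point gap between Grok and HappyHorse is close to a coin flip at about 52%. With thin launch-day vote volume, the second gap in particular could easily move.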

Elon Musk shared a Troy/Iliad trailer produced with the model on June 4 — marketing, not a benchmark, but a public quality reference. The I2V race is now explicitly four-way: Grok, Kling, Seedance, Veo. A model xAI didn’t have six months ago opened in second place.

Why it matters: the frontier reshuffled this week, and it reshuffled into the exact pairing we wrote up four days earlier. An operator picking an I2V model today weighs Seedance against a Grok that’s transparent on price and a half-step behind on quality votes.

🎥 Scorsese Is Using AI to Storyboard. He’s Also Backing the Company.

Martin Scorsese joined Black Forest Labs as an adviser. Per Variety and Rolling Stone, he is publicly using FLUX to storyboard scenes from his next film, What Happens at Night, starring Leonardo DiCaprio and Jennifer Lawrence. The announcement came June 2. His manager Rick Yorn brokered the deal through BroadLight Capital — Yorn’s fund, which was already a BFL investor from the company’s $300M Series B (December 2025, $3.25B valuation, a16z / NVIDIA / Salesforce Ventures among backers).

That structure — manager’s fund already invested, manager brokers the advisory deal — is worth naming. Not a scandal; a structure. FLUX.2 shipped simultaneously with a multi-reference feature, accepting multiple reference images to guide a single output. That’s the concrete product news wrapped inside the headline.

Calibration: Scorsese is among Hollywood’s most vocal defenders of traditional filmmaking. His use of an AI image tool for pre-production storyboarding is a genuine industry signal — someone with seven decades of hand-drawn storyboards finds AI image generation useful for communicating shots to his crew. That tells you something real about where the tool has landed. It does not tell you AI is rewriting cinema. He’s using an image tool to sketch shots, on a film that hasn’t started shooting, in pre-production.

“Scorsese endorses AI filmmaking” is the headline many outlets will write. “Scorsese uses an image tool to replace his hand-drawn storyboard sketches” is what happened. The second is more useful to the reader.

BFL is primarily an image model company; a video model is in development. This belongs in an AI video roundup because storyboarding sits in the generative video production pipeline, and the Hollywood-AI arc it extends — Hell Grind last week, AI on the Lot at Culver City the week before — is this publication’s beat.

Why it matters: the most prominent traditionalist endorsement AI image generation has received is paired with a financial stake. The Hollywood-AI relationship is no longer just about who is suing whom.

📈 By the Numbers

30-day review — the voluntary frontier-model access window in Trump’s AI executive order, signed June 2; the text bars any “mandatory governmental licensing, preclearance, or permitting requirement.”
3 — “world model” announcements on June 1: Cosmos Coalition, NVIDIA Cosmos 3, Luma Physical AI Lab. One shipped a model. One shipped a coalition. One shipped a hiring page.
291 authors — the arXiv paper behind Cosmos 3, submitted June 1; Cosmos 3 is the one open-weights release among the day’s three announcements.
$0.08/sec — Grok Imagine Video 1.5, xAI’s first native I2V API; it opens second on the Artificial Analysis I2V arena, behind Seedance 2.0 and ahead of HappyHorse-1.0.
$300M / $3.25B — Black Forest Labs Series B, December 2025. The Scorsese adviser announcement followed six months later, on June 2, 2026.
$0.30/min — Agnes-Video-V2.0 from Singapore’s Sapiens AI, the cheapest API price on the Artificial Analysis board — a volume tier; it debuts near the bottom on quality.
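The two API prices in this list are quoted in different units, so a quick normalization helps. This is a rough Python sketch: it takes both list prices at face value and assumes they buy comparable output, which the roundup does not establish (resolution, audio, and billing rules may differ). The variable names are illustrative.

```python
from decimal import Decimal

# Prices as quoted in this roundup, in their published units
GROK_PER_SECOND = Decimal("0.08")    # USD per second
AGNES_PER_MINUTE = Decimal("0.30")   # USD per minute

# Convert each price into the other's unit
grok_per_minute = GROK_PER_SECOND * 60
agnes_per_second = AGNES_PER_MINUTE / 60

# How many times more expensive Grok is per minute of video
price_ratio = grok_per_minute / AGNES_PER_MINUTE

if __name__ == "__main__":
    print(f"Grok:  ${grok_per_minute}/min")
    print(f"Agnes: ${agnes_per_second}/sec")
    print(f"Ratio: {price_ratio}x")
```

On list price alone, Grok works out to $4.80 per minute against $0.30 for Agnes, a 16x gap, which is the trade an operator makes for the difference in arena standing.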

🔮 What to Watch Next Week

The EO’s implementation clocks. The agency-guidance and classified-benchmark deadlines start running; watch which video-model providers opt into the voluntary frontier-model track first.
Whether the Cosmos Coalition’s “base model codeveloped by Runway and NVIDIA” gets specs. NVIDIA Cosmos 3 is not it. The gap between the announcement and the artifact is the watch.
The I2V arena after Grok 1.5’s first full week. Launch-day Elo rides thin vote volume. Whether Grok closes the gap on Seedance 2.0, holds second, or slides as votes accumulate is the number to watch.
Luma’s Human After All. Luma’s other June move — Ray3.14 doing real-time talent transformation at a new Paris studio, with Google and ElevenLabs — is a quiet signal of professional creative adoption outside the US.

For full specs, pricing, and access details on every model covered this week, see the AI Video Stack 2026 reference page — updated every Monday.

AI Video Weekly Roundup — June 8, 2026