Alibaba’s HappyHorse-1.0 went commercial on fal Monday. A day later, Pika Labs reintroduced itself as an agent that orchestrates Kling, Veo, MiniMax, and Sora, with its own video model just one option in the lineup. Two launches, one theme: the labs gaining ground are the ones building above the model layer, not winning the model layer.
Models covered: HappyHorse · Pika · Kling · Veo · Seedance · MiniMax · Sora
🐎 HappyHorse-1.0 Goes Commercial on fal
Alibaba’s HappyHorse-1.0 — the model that’s held #1 on Artificial Analysis’s text-to-video leaderboard since its April 7 reveal — went live on fal as an official API partner on April 27. Four endpoints — text-to-video, image-to-video, reference-to-video, and video-edit — at $0.14 per second for 720p output and $0.28 per second for 1080p. Pay-per-second, no minimums.
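For orientation, the sketch below shows what pay-per-second access looks like through fal's standard Python client. The endpoint ID, argument names, and response shape are our assumptions, since fal hasn't published a HappyHorse schema we can quote; only the per-second rates come from the announcement.

```python
import fal_client  # fal's official Python client: pip install fal-client

# Published fal rates; pay-per-second, no minimums.
PRICE_PER_SECOND = {"720p": 0.14, "1080p": 0.28}

def estimate_cost(duration_s: float, resolution: str = "1080p") -> float:
    """Cost is just duration times the per-second rate."""
    return duration_s * PRICE_PER_SECOND[resolution]

print(f"10s of 1080p: ${estimate_cost(10):.2f}")  # $2.80

# Hypothetical endpoint ID and argument names; fal's real schema may differ.
result = fal_client.subscribe(
    "fal-ai/happyhorse/text-to-video",
    arguments={
        "prompt": "wide shot of a horse galloping across a beach at dusk",
        "resolution": "1080p",
        "duration": 10,
    },
)
print(result["video"]["url"])  # assumed response shape
```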
The technical specs back the leaderboard position. HappyHorse runs a unified 15-billion-parameter, 40-layer self-attention Transformer that generates audio and video jointly in a single forward pass — no cross-attention modules, no separate audio post-processing, no second-stage Foley layer. Native lip-sync works in seven languages: English, Mandarin, Cantonese, Japanese, Korean, German, and French. Inference clocks roughly 38 seconds for a 1080p clip on a single H100.
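To make the "no cross-attention" claim concrete, here is a schematic PyTorch sketch of the pattern being described: audio and video tokens concatenated into one sequence and pushed through a shared self-attention stack. This is not HappyHorse's code, and every dimension is a toy value; only the joint-sequence idea comes from the published spec.

```python
import torch
import torch.nn as nn

# Schematic only: joint audio+video generation in ONE self-attention stack,
# as opposed to separate decoders bridged by cross-attention. All sizes here
# are toy values (the real model is 40 layers, ~15B parameters).
D_MODEL, N_LAYERS, N_HEADS = 512, 4, 8

class JointAVTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.video_embed = nn.Linear(256, D_MODEL)  # video patch tokens
        self.audio_embed = nn.Linear(64, D_MODEL)   # audio frame tokens
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, N_LAYERS)  # self-attn only
        self.video_head = nn.Linear(D_MODEL, 256)
        self.audio_head = nn.Linear(D_MODEL, 64)

    def forward(self, video_tokens, audio_tokens):
        # One sequence, one forward pass: every audio token attends directly
        # to every video token, which is why lip-sync can be a native property
        # of generation rather than a second-stage Foley or dubbing step.
        x = torch.cat([self.video_embed(video_tokens),
                       self.audio_embed(audio_tokens)], dim=1)
        x = self.backbone(x)
        n_v = video_tokens.shape[1]
        return self.video_head(x[:, :n_v]), self.audio_head(x[:, n_v:])

video_out, audio_out = JointAVTransformer()(
    torch.randn(1, 120, 256), torch.randn(1, 200, 64))
```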
The Artificial Analysis scoring tells the rest of the story. As of May 3, HappyHorse holds Elo 1,354 on text-to-video — 84 points ahead of Dreamina Seedance 2.0 in second place, and 102 points ahead of Kling 3.0 1080p Pro in third. The reveal-time score was 1,333; the model has gained ground as more arena votes accumulated. Image-to-video sits at Elo 1,392, an even larger lead.
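For readers who don't think in Elo, the standard expected-score formula (the generic 400-scale Elo math, not anything Artificial Analysis publishes about its own methodology) translates those gaps into head-to-head win rates:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: P(A beats B), conventional 400 scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(f"{elo_win_prob(1354, 1270):.0%}")  # 84-pt gap vs Seedance -> ~62%
print(f"{elo_win_prob(1354, 1252):.0%}")  # 102-pt gap vs Kling  -> ~64%
```

An 84-point lead means roughly six wins in every ten blind pairwise votes against #2: comfortable, but not a blowout.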
The leaderboard split between audio-included and audio-excluded arenas is the production-routing signal worth flagging. On the text-to-video-with-audio leaderboard, HappyHorse drops to #2 at Elo 1,218 — two points behind Dreamina Seedance 2.0 at 1,220. Essentially tied, but a clean inversion of the no-audio result. The gap reflects a less mature joint audio synthesis pipeline, not weakness in the visual output itself. For shots where audio is part of the brief, Seedance still leads on the live leaderboard. For silent or music-overlaid work, HappyHorse holds the ceiling.
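In pipeline terms, the May 3 snapshot reduces to a one-branch routing rule. The model identifiers below are placeholders for whatever names your provider uses:

```python
# Routing rule implied by the split leaderboards: Seedance leads the
# with-audio arena (1,220 vs 1,218); HappyHorse leads everywhere else.
def pick_model(needs_audio: bool) -> str:
    return "seedance-2.0" if needs_audio else "happyhorse-1.0"

assert pick_model(needs_audio=True) == "seedance-2.0"     # dialogue, sync sound
assert pick_model(needs_audio=False) == "happyhorse-1.0"  # silent or music-overlaid
```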
Alibaba Cloud Bailian opened enterprise-grade access to its own customers on April 27, with full commercialization queued for May. Between fal and Bailian, the global API market gets two parallel commercial pipelines for a model that currently has no realistic competitor at the top of the leaderboard.
The open-weights story is messier. ATH’s happyhorse.me/open-source landing page describes HappyHorse-1.0 as “fully open-sourced.” Independent verification by WaveSpeedAI found a public GitHub repo with no model weights, no inference code, and no license file; the Hugging Face profile remains “coming soon.” The charitable read: Alibaba has separated commercial API access (live now) from open-weight distribution (still unscheduled), and the marketing language has gotten ahead of the engineering. The less charitable read: “fully open source” was the marketing claim that drove three weeks of leaderboard coverage, and the gap between that claim and the artifact is now a credibility test for the rest of the rollout. Anyone planning a local-deployment integration is still waiting.
Why it matters: The HappyHorse arc — pseudonymous reveal April 7, Alibaba unmasking April 10, API launch April 27 — closed in three weeks. Faster than any frontier-model commercialization we’ve tracked, and faster than the labs HappyHorse displaced have moved on roadmap items. fal’s role here is the second story. fal isn’t Alibaba; it’s an aggregator. The world’s #1 video model now reaches developers through a third-party API marketplace, not through Alibaba’s own developer portal — the same platform-layer pattern we flagged in last week’s Adobe Firefly piece, one rung deeper down the stack.
🤖 Pika Reintroduces Itself as a Multi-Model Agent
We flagged Pika’s silence as something to watch in last week’s roundup. On April 28, the company answered: Pika Agents, a multi-modal AI creative partner that orchestrates video models, its rivals’ included, from a conversational interface. Users describe what they want; the agent decides which models to call, applies stylistic preferences from prior conversations, and iterates from feedback like “make the lighting moodier.”
The model roster Pika Agents orchestrate is striking. On video: Pika’s own model, ByteDance’s Seedance 2.0, Kuaishou’s Kling, MiniMax, Google’s Veo 3, and OpenAI’s Sora. On audio: ElevenLabs, MiniMax Music and Voice, OpenAI Whisper. On images: Gemini, ChatGPT Images 2, SeedDream. Pika has built a meta-layer over the entire competitive AI media stack — including the model OpenAI just sunset on the consumer side.
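Pika hasn't published how the agent picks a model, so the sketch below is purely illustrative of the meta-layer pattern the launch describes: one conversational entry point, a dispatch table per modality, persistent style memory. Every identifier in it is a placeholder.

```python
from dataclasses import dataclass, field

# Illustrative only: the agent-over-models pattern. Pika's actual routing
# logic, scoring, and API surface are unpublished.
MODEL_POOL = {
    "video": ["pika-video", "seedance-2.0", "kling", "minimax", "veo-3", "sora"],
    "audio": ["elevenlabs", "minimax-voice", "minimax-music", "whisper"],
    "image": ["gemini", "chatgpt-images-2", "seeddream"],
}

@dataclass
class CreativeAgent:
    # Preferences persist across sessions, so "make the lighting moodier"
    # sticks to future video requests.
    style_memory: dict = field(default_factory=dict)

    def choose(self, modality: str) -> str:
        # A real agent would weigh cost, latency, and leaderboard fit here;
        # this sketch just takes the first entry in the pool.
        return MODEL_POOL[modality][0]

    def handle(self, request: str, modality: str) -> str:
        prompt = f"{request} | style: {self.style_memory.get(modality, 'default')}"
        return f"[{self.choose(modality)}] <- {prompt}"  # stand-in for a real call

agent = CreativeAgent()
agent.style_memory["video"] = "moody lighting"
print(agent.handle("30-second product teaser", "video"))
```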
The platform integrations are the second half of the bet. Pika Agents run inside Slack, Telegram, WhatsApp, Discord, Signal, iMessage, X, Instagram, LinkedIn, YouTube, Notion, GitHub, Dropbox, Figma, and Zoom — among others. The pitch is that the agent lives where the work already happens, with persistent memory and personality across sessions, on whichever surface the user is in at the moment.
The framing shift is the bigger story. Pika spent 2024 and 2025 as one of the loudest “we’re a video model” brands. As of this week, the company has effectively conceded that the model competition is being won elsewhere — and pivoted to selling the conductor instead of the orchestra. PikaStream 1.0 from April 2 makes more sense in retrospect; the live-avatar engine is one capability inside a broader agent product, not a standalone bet.
Why it matters: A year ago, Pika was raising at a video-model valuation. This week the company is re-introducing itself as an agentic platform. Two readings, both true. The optimistic one: Pika spotted the platform shift early and is moving before its model business gets fully commoditized. The cynical one: when a frontier video lab decides its best move is to wrap its rivals’ models, the model itself is no longer the differentiator. Either way, the architectural pattern — agent layer above model layer — is the one Adobe, OpenAI, and the next round of startups will try to copy.
🟡 Grok Imagine Pro Misses Its April Window
Musk’s “later this month” window for Grok Imagine Pro closed without a launch. SuperGrok subscribers are still capped at 720p video. The Pro tier — telegraphed for late April with an expected $30/month price point and 1080p output — has slipped into May with no new public timeline from xAI.
The competitive context is the part xAI now has to reckon with. Veo 3.1 Lite ships 1080p at $0.08 per second on the Gemini API. Kling 3.0 ships native 4K at the $8/month tier, with that 4K output reaching Adobe Firefly users last week without a separate Kuaishou subscription. Pika 2.5 ships 1080p at $8/month. The open-source LTX-2.3 outputs true 4K. Grok Imagine's $4.20-per-minute API ($0.07 per second, a cent under Veo 3.1 Lite) remains the cheapest commercial endpoint we track, but speed-and-price-only is a positioning that gets harder to defend each week the resolution gap holds.
Last week we framed xAI’s vertical bet — generation, video understanding, and X-platform distribution under one subscription — as the cleanest end-to-end stack of any lab. That argument depends on the Pro launch closing the resolution gap with the rest of the field. As of May 4, the gap is widening, not closing.
Why it matters: Pro is the upgrade that turns Grok Imagine from “the cheapest 720p generator with 500-million-user distribution” into a credible mid-tier alternative to Pika and Luma. Each week it slips, the per-minute price advantage matters less and the ceiling matters more. We don’t know what’s holding up the launch and won’t speculate; we’ll track the xAI release notes and report when the artifact ships.
🪦 One Week After Sora, the Aggregators Filled the Vacuum
Sora’s consumer app went dark seven days before this roundup published. The market response since has been clarifying. The biggest video-model news of the week arrived through fal’s marketplace, not Alibaba’s developer portal. The biggest new-product news was a former model lab repositioning itself as a meta-orchestrator. Even the week’s most visible failure — Grok Imagine Pro’s missed launch — is a single-model lab story; the platform layer kept marching.
Three weekly roundups in a row have landed on this theme without us forcing it: Adobe Firefly’s 30-model hub, then Sora’s consumer exit, now HappyHorse-via-fal and Pika Agents. A useful test for the next month: if the next Veo, Kling, or Seedance release lands first as an Adobe Firefly integration or a fal endpoint rather than as a standalone consumer product, the surface layer has won the distribution argument outright.
Why it matters: A year ago, AI video read as a model-quality race — better physics, longer clips, higher resolution. This week it reads as a distribution race, with model quality treated as table stakes. For creators, the practical implication is simpler than the news suggests: pick the platform that aggregates the right models, and let the routing layer decide which model runs the prompt. The labs still set the ceiling. The action is moving to the surface that picks among them.
📈 By the Numbers
- 1,354 — HappyHorse-1.0’s current Artificial Analysis Elo on text-to-video, holding #1 by 84 points over Dreamina Seedance 2.0 (live leaderboard, captured May 3)
- April 27 — fal launched HappyHorse-1.0 as official API partner with four endpoints
- 15 billion — Parameters in HappyHorse-1.0’s unified 40-layer Transformer; audio and video generated jointly in a single forward pass
- 7 — Languages with native lip-sync in HappyHorse: English, Mandarin, Cantonese, Japanese, Korean, German, and French
- 6 — Video models orchestrated by Pika Agents: Pika Video, Seedance 2.0, Kling, MiniMax, Veo 3, and Sora
- 17+ — Platform surfaces Pika Agents run on, from Slack and Telegram to Notion, GitHub, Figma, and Zoom
🔮 What to Watch Next Week
- Google I/O 2026 — May 19–20. Veo 4 is widely expected; Google has used I/O for major Veo announcements in both 2024 and 2025. We’re tracking; if it ships, it lands as a top-tier story in the May 25 roundup.
- TAKE IT DOWN Act compliance deadline — May 19, 15 days out. Every covered platform serving the public must have a notice-and-takedown system live with a 48-hour removal window for non-consensual intimate imagery and AI-generated deepfakes. The first federal criminal conviction in early April established the criminal precedent; the May 19 deadline opens the civil-enforcement phase, with FTC penalty exposure that operates outside Section 230. We’re watching for formal compliance pages from YouTube, TikTok, Meta, X, and the AI video labs themselves — none have published yet.
- HappyHorse open weights — ATH’s marketing now claims “fully open-sourced,” but the public GitHub repo is empty as of May 3. Whether weights actually ship in May, or “coming soon” remains the operational posture through summer, is the credibility test for the rest of the rollout.
For full specs, pricing, and access details on every model covered this week, see the AI Video Generation Tools 2026 reference page.