Alibaba's Happy Horse Just Took the AI Video Crown. China Now Owns Two Frontiers.
Alibaba opened public beta for HappyHorse 1.0 this morning. If you have not been tracking the video Arena, here is the short version: a Chinese model from Alibaba's ATH unit is now sitting #1 on the Artificial Analysis Video Arena leaderboard, beating Google's Veo 3, Runway's Gen-4, and ByteDance's Seedance 2.0. By a lot.
Add this to last Friday's DeepSeek V4 release and the picture gets harder to wave away. The open-weights frontier in two distinct modalities (general-purpose LLMs and AI video) is now Chinese. That has not been true at any prior point in this race.
What Shipped Today
The launch itself is incremental on paper. HappyHorse 1.0 entered the Arena anonymously in early April, dominated immediately, and Alibaba confirmed authorship on April 10. Today is when public creators can sign up at the Happy Horse website and start generating, with API testing through Alibaba Cloud's Bailian (Model Studio) rolling out at the same time. Full API general availability is slated for April 30.
The model is the headline. 15 billion parameters. A unified Transfusion architecture, meaning the same network handles autoregressive text-to-token planning and continuous diffusion of video frames in a single pass, with audio generated jointly rather than dubbed in afterwards. The output is 1080p with synchronized audio and multilingual lip-sync. That is a small model doing a hard thing.
It supports all four standard generation modes: text-to-video and image-to-video, each with or without audio. No 5-second duration cap that I can find documented yet. Resolutions of 480p, 720p, and 1080p are confirmed. Beta is free for now, with paid tiers expected when the API flips to GA.
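To make the mode matrix concrete, here is a minimal sketch of what a generation request might look like once Bailian API access opens. Alibaba has not published the HappyHorse API schema, so the endpoint, model id, and every parameter name below are placeholder guesses, not documented API surface.

```python
import requests

# Placeholder endpoint -- the real Bailian route is not public yet.
API_URL = "https://example.invalid/happyhorse/v1/generate"

def generate_clip(prompt: str, mode: str = "text-to-video",
                  resolution: str = "1080p", audio: bool = True,
                  image_url: str | None = None) -> dict:
    """Hypothetical call covering the four documented modes:
    text-to-video and image-to-video, each with or without audio."""
    payload = {
        "model": "happyhorse-1.0",  # placeholder model id
        "prompt": prompt,
        "mode": mode,               # "text-to-video" | "image-to-video"
        "resolution": resolution,   # "480p" | "720p" | "1080p" (confirmed tiers)
        "audio": audio,             # joint audio generation on/off
    }
    if mode == "image-to-video":
        payload["image_url"] = image_url  # source frame for i2v
    resp = requests.post(API_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()

# Example (will not run against the placeholder endpoint):
# clip = generate_clip("a horse galloping out of a starting gate at dawn")
```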
The Benchmark Gap Is Not Subtle
The Artificial Analysis Video Arena is the closest thing the video model space has to LM Arena: human pairwise preference voting on generated clips, aggregated into Elo. The current standings as of this morning:
| Model | Provider | Arena Elo | License |
|---|---|---|---|
| HappyHorse 1.0 | Alibaba ATH | 1389 | Apache 2.0 (weights pending) |
| Seedance 2.0 | ByteDance | 1274 | Closed |
| Veo 3 | Google DeepMind | 1252 | Closed |
| Runway Gen-4 | Runway | 1218 | Closed |
| Kling 2.5 | Kuaishou | 1206 | Closed |
A 115-point Elo gap is not noise. In LM Arena terms, that is roughly the spread between GPT-4o and the median open-source model from a year ago. On a leaderboard that has historically clustered the top four models within 30 to 50 Elo points, HappyHorse is sitting alone.
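For intuition on what 115 points means in practice: the Elo model converts a rating gap directly into an expected head-to-head win rate, so we can read the voting behavior straight off the table above.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: P(A beats B) from the rating gap."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# HappyHorse 1.0 (1389) vs Seedance 2.0 (1274), per the table above.
print(f"{elo_win_probability(1389, 1274):.1%}")  # ~66.0%
```

Roughly two out of three blind pairwise votes going to one model, on a leaderboard where the rest of the top five sit within 68 points of each other, is a rout.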
The gap is largest on prompts that involve native audio. Veo 3 was the first model to ship synchronized audio at this quality, but its release cadence has been quiet since the Cloud Next demo, and HappyHorse appears to have leapfrogged it on the very dimension where Google got there first.
The Sora-Shaped Hole
We covered the death of Sora last month. Sora burned $15M per day in compute and brought in $2.1M total lifetime revenue before OpenAI shut it down. The Disney deal that was supposed to subsidize the rest fell apart. OpenAI no longer has a flagship video model.
That left three serious Western video labs in the running: Google with Veo 3, Runway with Gen-4, and Luma with Ray 3. None of them are open. None of them ship audio plus video plus 1080p in a 15B-parameter model. And as of this morning, none of them is #1.
The AI video frontier was never as crowded as the LLM frontier, and what crowd existed just got pushed out of the top spot by a Chinese lab that intends to publish weights. The difference between losing the lead to OpenAI and losing the lead to a model you can download is a different conversation entirely.
Two Chinese Frontiers in One Week
DeepSeek V4 dropped Friday under MIT, with 1.6T total parameters, 1M context, and pricing that undercuts GPT-5.5 by an order of magnitude. HappyHorse goes into beta today under Apache 2.0, holding the top of the video leaderboard. Both labs are Chinese. Both ship weights (or have committed to). Both match or beat the closed Western state of the art in their respective categories.
| Modality | Frontier Model | Provider | Open? |
|---|---|---|---|
| General LLM | GPT-5.5 / DeepSeek V4 Pro | OpenAI / DeepSeek | No / Yes (MIT) |
| Reasoning | Claude Opus 4.7 | Anthropic | No |
| Long context | Gemini 3.1 Pro | Google DeepMind | No |
| Video | HappyHorse 1.0 | Alibaba ATH | Yes (Apache 2.0) |
| Image | Nano Banana 2 / Imagen 4 | Google DeepMind | No |
| Voice | Voxtral / Eleven v4 | Mistral / ElevenLabs | Mostly closed |
Two of those rows now belong to Chinese labs that publish weights. That is the change. The "open-source models close the gap, slowly, on Western frontier models" framing that we leaned on for most of 2025 is now a historical artifact. They are at the front in two categories.
Why The Architecture Matters
Transfusion is not new (Meta described the technique in 2024) but HappyHorse is the first deployed video model where it appears to actually pay off at scale. The pitch is that you do not need a separate text encoder, audio decoder, and video diffusion model stitched together. The single network handles all of it, with the autoregressive part planning the scene structure and the diffusion part filling in the pixels.
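To make the single-network claim concrete, here is a minimal sketch of the Transfusion pattern as Meta described it: one shared transformer trained with a next-token loss on discrete text and a denoising loss on continuous latents. This is a reconstruction of the published technique, not HappyHorse's code; the dimensions are invented and the attention masking details (causal over text, bidirectional within patches) are omitted.

```python
import torch
import torch.nn as nn

class TransfusionSketch(nn.Module):
    """One shared backbone, two objectives: autoregressive prediction over
    discrete text tokens, denoising over continuous video/audio latents.
    All dimensions are hypothetical."""

    def __init__(self, d_model: int = 1024, vocab: int = 32000, latent_dim: int = 64):
        super().__init__()
        self.token_embed = nn.Embedding(vocab, d_model)    # discrete text path
        self.latent_proj = nn.Linear(latent_dim, d_model)  # continuous AV path
        layer = nn.TransformerEncoderLayer(d_model, nhead=16, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)  # shared weights
        self.lm_head = nn.Linear(d_model, vocab)           # next-token planning loss
        self.denoise_head = nn.Linear(d_model, latent_dim) # diffusion denoising loss

    def forward(self, text_ids: torch.Tensor, noisy_latents: torch.Tensor):
        # Concatenate text tokens and noised video/audio patches into one
        # sequence, then run the single shared backbone over all of it.
        seq = torch.cat(
            [self.token_embed(text_ids), self.latent_proj(noisy_latents)], dim=1
        )
        h = self.backbone(seq)
        n_text = text_ids.shape[1]
        logits = self.lm_head(h[:, :n_text])         # text: plan the scene
        denoised = self.denoise_head(h[:, n_text:])  # latents: fill in the pixels
        return logits, denoised
```

The point of the pattern is the shared backbone: every parameter serves both the planning and the pixel objectives, which is where the efficiency discussed next comes from.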
The practical consequence is parameter efficiency. 15B parameters is small. Veo 3 and Sora are both believed to be in the 50B to 100B range based on inference cost reports. A 15B model running at 1080p with synchronized audio means the inference economics for video are about to look very different from what Sora's collapse suggested.
That matters because video generation is the modality where unit economics have killed the most companies. If HappyHorse genuinely runs cheap at inference and Apache-licensed weights drop in the next few weeks, the wave of video startups that have been quietly dying behind their VC bridge rounds suddenly has a path again: build product on top of weights you can self-host, and stop paying someone $0.50 per second.
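Back of the envelope, with loudly assumed numbers: hold the $0.50-per-second hosted price above against renting a GPU yourself. The hourly rate and throughput figures below are placeholder guesses, not benchmarks of HappyHorse.

```python
HOSTED_PRICE_PER_OUTPUT_SEC = 0.50  # the $0.50/second hosted figure above

# Self-hosting assumptions -- both numbers are guesses, not measurements:
GPU_HOURLY_RATE = 2.50         # assumed rented H100-class rate, $/hr
GPU_SEC_PER_OUTPUT_SEC = 20.0  # assumed 20s of GPU time per 1s of 1080p output

clip_seconds = 10
hosted = clip_seconds * HOSTED_PRICE_PER_OUTPUT_SEC
self_hosted = clip_seconds * GPU_SEC_PER_OUTPUT_SEC / 3600 * GPU_HOURLY_RATE

print(f"hosted:      ${hosted:.2f}")       # $5.00
print(f"self-hosted: ${self_hosted:.2f}")  # ~$0.14 under these assumptions
```

Even if the throughput guess is off by a factor of five, the gap does not close.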
What I'm Watching
Three things over the next two weeks.
First, the actual weights drop. Alibaba has committed to Apache 2.0, and DeepSeek delivered on the same commitment within 72 hours. If HappyHorse weights are not on Hugging Face by mid-May, the open-frontier story gets one big asterisk.
Second, what Google and Runway do at NAB and at the Cloud Next follow-on events. Veo 3 was the audio-plus-video first mover; losing the leaderboard within 60 days of release will force a response, and Google has the compute and the model team to ship one. Runway is in a tougher spot.
Third, the response from Washington. Two Chinese frontier wins in a week, both structured as open weight releases that put pressure on US compute exports, will not be ignored. Whether that response is more chip restrictions, model export controls, or a push for federal compute subsidies is the live question.
Our Take
Last Friday I wrote that DeepSeek V4 was the first open-source frontier LLM and that closed labs should be worried. Today I am writing the same thing about video. Two data points do not make a trend, but they are past being a coincidence.
For builders, the practical advice has not changed: assume open-weight Chinese models will be at or above the closed Western frontier in your modality within six months, architect your stack so self-hosted inference can be swapped in cleanly, and budget for the possibility that your video pipeline costs drop 80% in the next two quarters.
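On the swap-in point, the cheapest insurance is a thin provider-agnostic seam so a hosted API and a self-hosted deployment sit behind the same call site. A minimal sketch; the interface and class names here are mine, not any vendor's SDK:

```python
from typing import Protocol

class VideoBackend(Protocol):
    """Anything that can turn a prompt into a video file path."""
    def generate(self, prompt: str, seconds: int, resolution: str) -> str: ...

class HostedAPIBackend:
    """Calls a vendor API (Veo, Runway, HappyHorse...). Details omitted."""
    def generate(self, prompt: str, seconds: int, resolution: str) -> str:
        raise NotImplementedError("wire up your vendor client here")

class SelfHostedBackend:
    """Runs open weights on your own GPUs. Details omitted."""
    def generate(self, prompt: str, seconds: int, resolution: str) -> str:
        raise NotImplementedError("wire up your inference server here")

def render(backend: VideoBackend, prompt: str) -> str:
    # Product code only ever sees VideoBackend; swapping providers
    # (or moving to self-hosted weights) is a configuration change.
    return backend.generate(prompt, seconds=10, resolution="1080p")
```

The design choice that matters is that product code depends only on the interface: moving from a vendor API to Apache-licensed weights becomes a config change, not a rewrite.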
We are adding HappyHorse 1.0 to our models tracker this afternoon and will publish a side by side video output comparison with Veo 3, Seedance 2.0, and Runway Gen-4 once the API is fully open. If you want to test generations yourself, the public beta is at the Happy Horse website starting today, with Alibaba Cloud Bailian onboarding active for API test access.
The horse left the gate first. Now we find out who is fast enough to catch it.