Google's Gemma 4 AI models get 3x speed boost by predicting future tokens
Up to 3x the speed with no loss of quality—is it too good to be true?
Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google's take on edge AI could already be getting faster with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models use a form of speculative decoding to guess at future tokens, which can speed up generation compared to having the model produce every token on its own.

The latest Gemma models are built on the same underlying technology that powers Google's frontier Gemini AI, but they're tuned to run locally. Gemini is optimized to run on Google's custom TPU chips, which operate in enormous clusters with super-fast interconnects and memory. A single high-power AI accelerator can run the largest Gemma 4 model at full precision, and quantizing will let it run on a consumer GPU.

Gemma allows users to tinker with AI on their own hardware rather than sharing all their data with a cloud AI system from Google or someone else. Google also changed the license for Gemma 4 to Apache 2.0, which is much more permissive than the custom Gemma license Google employed for previous releases. However, there are inherent limitations in the hardware most people have to run local AI models. That's where MTP comes in.
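To make the idea of guessing at future tokens concrete, here is a minimal sketch of generic greedy speculative decoding. It is not Google's MTP implementation, whose details are not described here. The two "models" (`target_next` and `draft_next`) are hypothetical toy functions invented for illustration: a cheap drafter proposes several tokens, the large model checks them, and only the tokens the large model would have produced anyway are kept.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # context -> greedy next token


def target_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the large model (slow, authoritative)."""
    return (context[-1] * 3 + 1) % 10


def draft_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the drafter (cheap, usually right)."""
    if context[-1] == 3:
        return 9  # deliberate mistake so the rejection path is exercised
    return (context[-1] * 3 + 1) % 10


def greedy_generate(target: Model, prompt: List[Token],
                    num_new: int) -> Tuple[List[Token], int]:
    """Baseline: one large-model pass per generated token."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(target(out))
    return out, num_new


def speculative_generate(target: Model, draft: Model, prompt: List[Token],
                         num_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Draft k tokens, then verify them against the large model.

    Returns the generated sequence and the number of verification rounds.
    Assumption: a real system scores all k drafted positions in a single
    batched forward pass, so each round is counted as one large-model pass
    even though this toy calls `target` once per position.
    """
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < num_new:
        # 1. The drafter guesses k tokens ahead.
        proposal: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The large model verifies the guesses in order.
        target_passes += 1
        ctx = list(out)
        for tok in proposal:
            expected = target(ctx)
            if tok != expected:
                ctx.append(expected)  # keep the large model's token, stop
                break
            ctx.append(tok)
        else:
            # Every guess matched; the same pass yields one bonus token.
            ctx.append(target(ctx))
        out = ctx
    return out[: len(prompt) + num_new], target_passes


baseline, baseline_passes = greedy_generate(target_next, [1], 12)
fast, fast_passes = speculative_generate(target_next, draft_next, [1], 12)
print(fast == baseline, baseline_passes, fast_passes)
```

Because every kept token is one the large model would have chosen itself, the output matches plain greedy decoding, which is the sense in which this kind of scheme can be faster without changing quality. The actual speedup depends on how often the drafter guesses correctly and how cheap it is to run, neither of which this toy measures.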
Source: Ars Technica