Google's Gemma 4 AI models get 3x speed boost by predicting future tokens
Up to 3x the speed with no loss of quality—is it too good to be true?
Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google's take on edge AI could be getting even faster already with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models leverage a form of speculative decoding to take a guess at future tokens, which can speed up generation compared to the way models generate tokens on their own.

The latest Gemma models are built on the same underlying technology that powers Google's frontier Gemini AI, but they're tuned to run locally. Gemini is optimized to run on Google's custom TPU chips, which operate in enormous clusters with super-fast interconnects and memory. A single high-power AI accelerator can run the largest Gemma 4 model at full precision, and quantizing will let it run on a consumer GPU.

Gemma allows users to tinker with AI on their hardware rather than sharing all their data with a cloud AI system from Google or someone else. Google also changed the license for Gemma 4 to Apache 2.0, which is much more permissive than the custom Gemma license Google employed for previous releases. However, there are inherent limitations in the hardware most people have to run local AI models. That's where MTP comes in.
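To make the idea of guessing future tokens concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not Google's MTP implementation, which the article does not describe in detail. The two "models" are hypothetical toy functions (`target_model`, `draft_model`) that stand in for an expensive large model and a cheap drafter, and the pass counter assumes the large model can check a whole batch of drafted tokens in a single forward pass, as real systems do.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Hypothetical large model: a deterministic toy rule, not a real network."""
    return (ctx[-1] * 3 + len(ctx)) % 11


def draft_model(ctx: List[int]) -> int:
    """Hypothetical drafter: agrees with the target except when the
    context length is a multiple of 4 (an arbitrary, made-up error pattern)."""
    guess = target_model(ctx)
    return (guess + 1) % 11 if len(ctx) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft k tokens cheaply, then let the target verify them.

    Returns the generated sequence and the number of verification rounds,
    each of which would be one (batched) target forward pass in practice.
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The drafter proposes k tokens, conditioning on its own guesses.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks every drafted position (one pass in a real system).
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft was right, so the target's next token comes for free.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2]
    fast, passes = speculative_decode(target_model, draft_model, prompt, n_new=12)
    slow = greedy_decode(target_model, prompt, n_new=12)
    print("identical output:", fast == slow)
    print("target passes:", passes, "vs", 12, "for plain decoding")
```

Because a drafted token is kept only when it matches what the large model would have chosen anyway, the output is the same as plain greedy decoding; the drafter affects speed, not content. In this toy setup the saving comes from needing fewer verification rounds than tokens generated. How large the saving is on real hardware depends on how often the drafter is right and how cheap it is to run, neither of which this sketch models.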
Source: Ars Technica