News Grower

Independent coverage of AI, startups, and technology.

Ars Technica, May 6, 2026 at 15:44

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Up to 3x the speed with no loss of quality—is it too good to be true?


By Ryan Whitwam

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google's take on edge AI may already be getting faster with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models use a form of speculative decoding to guess future tokens, which can speed up generation compared with the standard approach, in which a model produces its output one token at a time.

The latest Gemma models are built on the same underlying technology that powers Google's frontier Gemini AI, but they're tuned to run locally. Gemini is optimized for Google's custom TPU chips, which operate in enormous clusters with super-fast interconnects and memory. By contrast, a single high-power AI accelerator can run the largest Gemma 4 model at full precision, and quantizing the model lets it run on a consumer GPU. Gemma allows users to tinker with AI on their own hardware rather than sharing all their data with a cloud AI system from Google or someone else. Google also changed the license for Gemma 4 to Apache 2.0, which is much more permissive than the custom Gemma license it used for previous releases.

However, the hardware most people have for running local AI models has inherent limitations. That's where MTP comes in.
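The article describes MTP drafters only as "a form of speculative decoding," so the sketch below illustrates the general technique rather than Google's implementation. It assumes greedy decoding and toy integer "models": a cheap drafter guesses up to `k` future tokens, the large target model checks them, and the longest matching prefix is kept along with one token chosen by the target. The names `speculative_decode`, `greedy_decode`, `toy_target`, `toy_drafter`, and the parameter `k` are all hypothetical. Real systems verify every drafted position in a single batched pass of the large model. Here the check is a plain loop for readability.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token prefix to the token it would
# pick greedily next. Real LLMs output probability distributions instead.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: the target model emits one token per sequential step."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken, drafter: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding (a sketch, not Gemma's MTP implementation).

    Returns the generated sequence and the number of verification rounds; in a
    real system each round would be one batched forward pass of the target.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1. The cheap drafter guesses up to k future tokens, one after another.
        draft: List[int] = []
        for _ in range(min(k, goal - len(out))):
            draft.append(drafter(out + draft))
        # 2. The target checks each drafted position (one pass in real systems).
        accepted: List[int] = []
        for tok in draft:
            expected = target(out + accepted)
            accepted.append(expected)
            if tok != expected:
                # 3. First mismatch: keep the target's token, drop the rest.
                break
        else:
            # Every draft matched, so the same pass yields one extra token.
            if len(out) + len(accepted) < goal:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out, rounds


def toy_target(prefix: List[int]) -> int:
    # Hypothetical "large" model: next token is the sum of the last two, mod 10.
    return (prefix[-1] + prefix[-2]) % 10


def toy_drafter(prefix: List[int]) -> int:
    # Hypothetical drafter: agrees with the target except right after a 5.
    guess = toy_target(prefix)
    return (guess + 1) % 10 if prefix[-1] == 5 else guess


fast, rounds = speculative_decode(toy_target, toy_drafter, [1, 1], n_new=12, k=4)
slow = greedy_decode(toy_target, [1, 1], n_new=12)
print(fast == slow, rounds)  # True 3
```

In this toy run the target is consulted in 3 rounds instead of 12 sequential steps, and the output is identical to plain greedy decoding. That identity is the intuition behind the "no loss of quality" framing: the drafter's accuracy affects only how many tokens are accepted per round, never which tokens end up in the output. A poor drafter simply falls back toward one token per round, which is one reason speedups from this kind of technique are quoted as "up to" a figure rather than guaranteed.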


