Ars Technica May 6, 2026 at 15:44 Big Tech

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Up to 3x the speed with no loss of quality—is it too good to be true?

By Ryan Whitwam

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google's take on edge AI may already be getting faster with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models use a form of speculative decoding to guess at future tokens, which can speed up generation compared with having the main model produce every token on its own.

The latest Gemma models are built on the same underlying technology that powers Google's frontier Gemini AI, but they're tuned to run locally. Gemini is optimized to run on Google's custom TPU chips, which operate in enormous clusters with super-fast interconnects and memory. By contrast, a single high-power AI accelerator can run the largest Gemma 4 model at full precision, and quantizing it will let it run on a consumer GPU.

Gemma lets users tinker with AI on their own hardware rather than sharing all their data with a cloud AI system from Google or someone else. Google also changed the license for Gemma 4 to Apache 2.0, which is much more permissive than the custom Gemma license Google used for previous releases.

However, the hardware most people have for running local AI models has inherent limitations. That's where MTP comes in.
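For readers who want a feel for how speculative decoding can be faster without changing the output, here is a minimal sketch of the general draft-and-verify idea in its simplest (greedy) form. It is not Google's MTP drafter code, which the excerpt does not describe in detail. The function names, the toy "models," and the draft length `k` are all hypothetical stand-ins chosen for illustration: a cheap drafter guesses several tokens ahead, and the large model keeps only the guesses it would have produced itself.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token],
    max_new: int, k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify. Returns (new tokens, target verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter guesses k future tokens in sequence.
        ctx, proposal = list(out), []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target checks the guesses. A real system would score all k
        #    positions in one batched forward pass; this toy version calls the
        #    target per position but counts the whole check as a single round.
        rounds += 1
        for guess in proposal:
            expected = target(out)
            out.append(expected)      # the target's own token is always kept
            if expected != guess:
                break                 # first mismatch: discard remaining guesses
        else:
            out.append(target(out))   # all guesses accepted: one bonus token

    return out[len(prompt):len(prompt) + max_new], rounds


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: the target alone, one token per step."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


# Toy stand-ins (assumptions, not real models): the target counts upward
# modulo 10, and the drafter agrees except right after a 6.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 6 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [0], max_new=12)
    print(tokens, "in", rounds, "target rounds")
```

Because every kept token is one the target would have chosen anyway, the result in this greedy variant matches target-only decoding exactly. Any speedup depends on how often the drafter guesses correctly and on how cheap it is to run relative to the main model.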
