Gemma 4 Secret MTP Discovery Sparks Developer Backlash
Why It Matters
The removal of performance-enhancing features suggests a growing trend of 'nerfing' open-weights models to maintain a gap between free and paid enterprise offerings. This impacts developers' ability to optimize on-device inference for mobile applications.
Key Points
- A developer discovered hidden Multi-Token Prediction (MTP) weights in Gemma 4 files via the LiteRT API.
- A Google employee allegedly confirmed MTP was disabled to ensure 'compatibility and broad usability' across different hardware.
- The community is exploring reverse-engineering the LiteRT compute graph to restore the disabled performance features.
Google has come under scrutiny after developers discovered that the Gemma 4 model architecture contains latent Multi-Token Prediction (MTP) weights that were disabled in the official release. The discovery was made by a developer using the LiteRT API on a Google Pixel 9 device, where tensor shape errors revealed the presence of MTP heads designed for speculative decoding. A Google representative reportedly confirmed that the feature was intentionally removed to ensure broad compatibility across various hardware environments. This revelation follows previous community disappointment regarding the unreleased Gemma 124B model. Technical experts are now discussing the possibility of reverse-engineering the LiteRT compute graph to reactivate these high-speed generation capabilities. Google has not yet issued a formal statement regarding whether a 'Pro' version of the weights with MTP enabled will be released to the public.
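For readers unfamiliar with the technique, the idea behind MTP-based speculative decoding can be sketched in a few lines. The code below is purely illustrative: the function names and the trivial token rule are invented for this example and do not reflect Gemma's actual internals or the LiteRT API. The principle is that a cheap draft head proposes several future tokens at once, and the full model only verifies them, accepting matches without paying a full generation pass per token.

```python
# Toy sketch of speculative decoding with a multi-token prediction (MTP) head.
# All names are hypothetical; this is not Gemma or LiteRT code.

def base_model_next_token(context):
    """Stand-in for a full, expensive forward pass: returns the next token."""
    # Trivial deterministic rule so the example is runnable.
    return f"tok{len(context)}"

def mtp_head_draft(context, k=3):
    """Stand-in for an MTP head: cheaply drafts k future tokens in one pass."""
    return [f"tok{len(context) + i}" for i in range(k)]

def speculative_decode(context, steps=6, k=3):
    """Accept drafted tokens only where they match the base model's output."""
    out = list(context)
    while len(out) - len(context) < steps:
        draft = mtp_head_draft(out, k)
        for tok in draft:
            target = base_model_next_token(out)  # one verification per token
            if tok == target:
                out.append(tok)        # draft accepted
            else:
                out.append(target)     # mismatch: fall back to the base model
                break
            if len(out) - len(context) >= steps:
                break
    return out[len(context):]

print(speculative_decode(["<s>"], steps=4))
# In this toy setup the draft always matches, so every token is accepted:
# ['tok1', 'tok2', 'tok3', 'tok4']
```

In a real system the verification step batches all k drafted tokens into a single forward pass, which is where the speedup comes from; disabling the MTP head forces the model back to one token per pass.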
A developer digging into Google’s new Gemma 4 model found some hidden 'secret sauce' that could make it run much faster. It turns out Google included the machinery for 'Multi-Token Prediction'—which lets the AI guess multiple words at once—but turned it off before releasing the model to the public. After the developer hit errors trying to load the model on a phone, a Google employee confirmed the feature was disabled on purpose to make sure the model works across a wider range of devices. Now, the AI community is frustrated because they feel they're getting a slower version of what Google actually built.
Sides
Critics
Discovered the hidden weights and expressed frustration that Google 'nerfed' the model's speed.
Argue that Google is gatekeeping performance and want full transparency regarding model capabilities.
Defenders
Maintain that removing the feature was a technical decision to ensure the model runs reliably on a wider range of consumer devices.
Forecast
Independent researchers will likely attempt to patch the Gemma 4 weights to re-enable MTP within the next few weeks. Google may face pressure to release an 'Experimental' or 'Turbo' branch of Gemma 4 that officially supports these faster inference methods.
Based on current signals. Events may develop differently.
Timeline
Google confirmation reported
The developer claims a Google employee confirmed the intentional removal of MTP for compatibility reasons on a Hugging Face discussion thread.
Hidden MTP weights discovered
Reddit user Electrical-Monitor27 reports finding MTP prediction heads while debugging LiteRT on a Pixel 9.