
Gemma 4 Secret MTP Discovery Sparks Developer Backlash

AI-Analyzed — Analysis generated by Gemini, reviewed editorially.

Why It Matters

The removal of performance-enhancing features suggests a growing trend of 'nerfing' open-weights models to maintain a gap between free and paid enterprise offerings. This impacts developers' ability to optimize on-device inference for mobile applications.

Key Points

  • A developer discovered hidden Multi-Token Prediction (MTP) weights in Gemma 4 files via the LiteRT API.
  • A Google employee allegedly confirmed MTP was disabled to ensure 'compatibility and broad usability' across different hardware.
  • The community is exploring reverse-engineering the LiteRT compute graph to restore the disabled performance features.

Google has come under scrutiny after developers discovered that the Gemma 4 model architecture contains latent Multi-Token Prediction (MTP) weights that were disabled in the official release. The discovery was made by a developer using the LiteRT API on a Google Pixel 9 device, where tensor shape errors revealed the presence of MTP heads designed for speculative decoding. A Google representative reportedly confirmed that the feature was intentionally removed to ensure broad compatibility across various hardware environments. This revelation follows previous community disappointment regarding the unreleased Gemma 124B model. Technical experts are now discussing the possibility of reverse-engineering the LiteRT compute graph to reactivate these high-speed generation capabilities. Google has not yet issued a formal statement regarding whether a 'Pro' version of the weights with MTP enabled will be released to the public.

A developer digging into Google’s new Gemma 4 model found hidden 'secret sauce' that would make it run much faster. It turns out Google shipped the weights for 'Multi-Token Prediction'—which lets the AI predict several words at once instead of one at a time—but disabled the feature before releasing the model to the public. When the developer’s app threw errors while trying to load those hidden weights, a Google employee confirmed they were disabled on purpose so the model runs reliably on older devices. Now the AI community is frustrated, feeling they got a slower version of what Google actually built.
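To make the speed claim concrete, here is a minimal toy sketch of how MTP-style speculative decoding works in principle: cheap extra heads draft several tokens at once, and the full model verifies them, keeping the longest correct prefix. Everything here is illustrative assumption — these function names and the toy "models" are invented for the example and are not the LiteRT or Gemma API.

```python
# Toy sketch of speculative decoding with multi-token prediction (MTP) heads.
# Both "models" are fake stand-ins (simple arithmetic rules), chosen so the
# accept/reject logic is easy to follow; a real MTP head is a learned layer.

def base_model_next(tokens):
    """Stand-in for the full model: 'predicts' the next token as last + 1."""
    return tokens[-1] + 1

def mtp_draft(tokens, k):
    """Stand-in for MTP heads: cheaply guess the next k tokens in one shot.
    This toy draft is correct for two tokens, then deliberately drifts."""
    draft = []
    last = tokens[-1]
    for i in range(k):
        last = last + 1 if i < 2 else last + 2  # diverges at position 2
        draft.append(last)
    return draft

def speculative_step(tokens, k=4):
    """Verify the k drafted tokens against the base model, keeping the
    longest matching prefix plus one corrected token (the usual accept rule).
    Several tokens can thus be emitted per verification pass."""
    draft = mtp_draft(tokens, k)
    accepted = []
    ctx = list(tokens)
    for guess in draft:
        expected = base_model_next(ctx)
        if guess == expected:
            accepted.append(guess)
            ctx.append(guess)
        else:
            accepted.append(expected)  # fall back to the base model's token
            break
    return accepted

print(speculative_step([10, 11]))  # → [12, 13, 14]
```

The output never differs from plain one-token-at-a-time decoding — the base model still checks every position — which is why disabling the heads costs speed, not quality.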

Sides

Critics

Electrical-Monitor27

Discovered the hidden weights and expressed frustration that Google 'nerfed' the model's speed.

AI Developer Community

Argues that Google is gatekeeping performance and wants full transparency regarding model capabilities.

Defenders

Google

Maintains that removing the feature was a technical decision to ensure the model runs reliably on a wider range of consumer devices.


Noise Level

Murmur — 38. Noise Score (0–100): how loud a controversy is. Composite of reach, engagement, star power, cross-platform spread, polarity, duration, and industry impact, with 7-day decay.
Decay: 98%
Reach: 38
Engagement: 78
Star Power: 15
Duration: 6
Cross-Platform: 20
Polarity: 50
Industry Impact: 50

Forecast

AI Analysis — Possible Scenarios

Independent researchers will likely attempt to patch the Gemma 4 weights to re-enable MTP within the next few weeks. Google may face pressure to release an 'Experimental' or 'Turbo' branch of Gemma 4 that officially supports these faster inference methods.

Based on current signals. Events may develop differently.

Timeline

Today

Reddit — u/Electrical-Monitor27

Turns out Gemma 4 had MTP (multi token prediction) all along

Hey Everyone, While I was trying to utilize Gemma 4 through the LiteRT api in my android app, I noticed that Gemma 4 was throwing errors when loading it on my Google Pixel 9 test device of the "mtp weights being an inco…


  1. Google confirmation reported

    The developer claims a Google employee confirmed the intentional removal of MTP for compatibility reasons on a Hugging Face discussion thread.

  2. Hidden MTP weights discovered

    Reddit user Electrical-Monitor27 reports finding MTP prediction heads while debugging LiteRT on a Pixel 9.