Efficiency Controversy Hits GLM 5.1 Over Excessive Chain-of-Thought Tokens
Why It Matters
The shift toward 'thinking' models raises questions about the economic viability and efficiency of open-weights models compared to proprietary giants. It highlights a potential trend where model performance is boosted through sheer token volume rather than architectural intelligence.
Key Points
- GLM 5.1 reportedly consumes an excessive number of Chain-of-Thought tokens, exceeding 150,000 on simple coding requests.
- Users report significant latency issues, with the model 'thinking' for 20 to 30 minutes before providing a final answer.
- Despite the high token overhead, output quality remains inconsistent, with reports of basic programming errors such as accessing protected class members (illustrated in the sketch after this list).
- The controversy highlights a shift in AI benchmarking where 'intelligence' may be tied to token volume rather than architectural efficiency.
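To make the reported failure mode concrete, here is a minimal, hypothetical C++ reproduction of the error class users describe. It shows what accessing a protected member looks like; it is not GLM 5.1's actual output.

```cpp
// Hypothetical illustration (not the model's actual output): accessing a
// protected member from outside the class hierarchy fails to compile.
#include <string>
#include <utility>

class Button {
protected:
    std::string label_;  // visible only to Button and its subclasses
public:
    explicit Button(std::string label) : label_(std::move(label)) {}
};

int main() {
    Button button("Submit");
    // The line below is the kind of mistake users report; uncommenting it
    // yields: error: 'label_' is a protected member of 'Button'
    // button.label_ = "Cancel";
    return 0;
}
```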
A controversy has emerged over the efficiency and cost-effectiveness of the recently released GLM 5.1 open-weights model. Users are reporting that the model's Chain-of-Thought (CoT) reasoning process consumes an excessive number of tokens for relatively simple tasks, such as basic UI programming. Reports indicate that the model can spend upwards of 30 minutes and 150,000 tokens on a single prompt, repeatedly cycling through internal 'corrections' before producing a final output. While the model is praised for its accessibility as an open-weights alternative to Claude and ChatGPT, critics argue that the high token consumption negates its price advantage. Preliminary user tests also suggest that despite the exhaustive 'thinking' phase, the resulting code still contains basic errors, such as illegal accesses to protected members, and requires human intervention, sparking a debate over whether state-of-the-art performance is being artificially inflated via brute-force token generation.
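The cost argument is simple arithmetic. A back-of-the-envelope comparison (using assumed, illustrative per-token prices, not any provider's actual rate card) shows how a low per-token rate can still lose to a pricier model that answers directly:

```cpp
// Back-of-the-envelope cost comparison. The per-token prices are
// assumptions for illustration, not quoted from any provider.
#include <cstdio>

int main() {
    const double cot_tokens    = 150'000;          // reported CoT usage on one prompt
    const double direct_tokens = 2'000;            // assumed direct-answer length
    const double cheap_rate    = 0.50 / 1'000'000; // assumed $0.50 per 1M output tokens
    const double pricey_rate   = 15.0 / 1'000'000; // assumed $15.00 per 1M output tokens

    // A cheap model that "thinks" for 150k tokens can cost more per
    // prompt than a model thirty times pricier that answers directly.
    std::printf("cheap model,  150k CoT tokens:  $%.4f\n", cot_tokens * cheap_rate);
    std::printf("pricey model, 2k direct tokens: $%.4f\n", direct_tokens * pricey_rate);
    return 0;
}
```

Under these assumed prices, the "cheap" model costs $0.075 per prompt against the "expensive" model's $0.03: the per-token discount is erased by sheer volume.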
People are starting to notice that the new GLM 5.1 model is like a student who writes a 50-page essay just to answer a multiple-choice question. While it is open-weights and powerful, it spends an absurd amount of time and money 'thinking' through simple coding tasks. One user spent 30 minutes and 150,000 tokens just trying to get the code for a simple button, only for the final result to still contain basic errors. This raises the question: is the model actually smarter, or is it just 'over-thinking' its way to an answer while burning through your wallet?
Sides
Critics
Argue that the excessive token usage makes the model 'price-unsmart': any per-token savings are wiped out by sheer volume, leaving it less cost-effective than Claude or ChatGPT.
Defenders
Argue that extensive reasoning is precisely what allows state-of-the-art open-weights models to match proprietary performance.
Neutral
Divided between appreciation for the power of open-weights models and concern over the hardware and cost requirements of running them.
Forecast
Model developers will likely introduce 'thinking' limits or more aggressive pruning of reasoning paths to manage costs. In the near term, expect new benchmarks to emerge that measure 'intelligence per token' to penalize models that use brute-force reasoning.
Based on current signals. Events may develop differently.
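As a rough sketch of what an 'intelligence per token' benchmark might compute, the hypothetical scoring function below divides accuracy by log-scaled token spend, so a model must earn each extra order of magnitude of thinking. The formula and names are assumptions, not an existing benchmark's definition.

```cpp
// Hypothetical "intelligence per token" score: benchmark accuracy
// divided by log-scaled token spend, penalizing brute-force reasoning.
// The formula is an assumption, not an existing benchmark's definition.
#include <cmath>
#include <cstdio>

double intelligence_per_token(double accuracy, double tokens_used) {
    // Log scale keeps huge token counts from dwarfing accuracy outright:
    // each extra order of magnitude of thinking must buy real accuracy.
    return accuracy / std::log10(tokens_used + 10.0);
}

int main() {
    // Similar accuracy, very different token budgets:
    std::printf("efficient model (2k tokens):     %.3f\n",
                intelligence_per_token(0.85, 2'000));
    std::printf("brute-force model (150k tokens): %.3f\n",
                intelligence_per_token(0.87, 150'000));
    return 0;
}
```

Under this scoring, the brute-force model's two extra points of accuracy (0.168 vs. 0.257) do not come close to offsetting its 75x token budget.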
Timeline
Model Quality Self-Correction
The model's internal reasoning admits it is 'overcomplicating' the task, leading to user skepticism about its state-of-the-art claims.
Token Usage Analysis
Reports surface that the model consumed over 100k tokens before producing functional code.
GLM 5.1 User Report Gains Traction
A developer documents a 30-minute 'thinking' loop for a simple C++ UI component request.