Efficiency Controversy Hits GLM 5.1 Over Excessive Chain-of-Thought Tokens
Why It Matters
The shift toward 'thinking' models raises questions about the economic viability and efficiency of open-weights models compared to proprietary giants. It highlights a potential trend where model performance is boosted through sheer token volume rather than architectural intelligence.
Key Points
- GLM 5.1 reportedly consumes an excessive number of Chain-of-Thought tokens, exceeding 150,000 on simple coding requests.
- Users report significant latency issues, with the model 'thinking' for 20 to 30 minutes before providing a final answer.
- Despite the high token overhead, output quality remains inconsistent, with reports of basic programming errors such as accessing protected class members (illustrated in the sketch after this list).
- The controversy highlights a shift in AI benchmarking where 'intelligence' may be tied to token volume rather than architectural efficiency.
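To make the reported failure mode concrete, here is a minimal, hypothetical C++ reproduction of the error class users describe. It shows what accessing a protected member looks like; it is not GLM 5.1's actual output.

```cpp
// Hypothetical illustration (not the model's actual output): accessing a
// protected member from outside the class hierarchy fails to compile.
#include <string>
#include <utility>

class Button {
protected:
    std::string label_;  // visible only to Button and its subclasses
public:
    explicit Button(std::string label) : label_(std::move(label)) {}
};

int main() {
    Button button("Submit");
    // The line below is the kind of mistake users report; uncommenting it
    // yields: error: 'label_' is a protected member of 'Button'
    // button.label_ = "Cancel";
    return 0;
}
```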
A controversy has emerged over the efficiency and cost-effectiveness of the recently released GLM 5.1 open-weights model. Users are reporting that the model's Chain-of-Thought (CoT) reasoning process consumes an excessive number of tokens for relatively simple tasks, such as basic UI programming. Reports indicate that the model can spend upwards of 30 minutes and 150,000 tokens on a single prompt, repeatedly cycling through internal 'corrections' before producing a final output. While the model is praised for its accessibility as an open-weights alternative to Claude and ChatGPT, critics argue that the high token consumption negates its price advantage. Preliminary user tests also suggest that despite the exhaustive 'thinking' phase, the resulting code still contains basic errors, such as illegal accesses to protected members, and requires human intervention, sparking a debate over whether state-of-the-art performance is being artificially inflated via brute-force token generation.
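The cost argument is simple arithmetic. A back-of-the-envelope comparison (using assumed, illustrative per-token prices, not any provider's actual rate card) shows how a low per-token rate can still lose to a pricier model that answers directly:

```cpp
// Back-of-the-envelope cost comparison. The per-token prices are
// assumptions for illustration, not quoted from any provider.
#include <cstdio>

int main() {
    const double cot_tokens    = 150'000;          // reported CoT usage on one prompt
    const double direct_tokens = 2'000;            // assumed direct-answer length
    const double cheap_rate    = 0.50 / 1'000'000; // assumed $0.50 per 1M output tokens
    const double pricey_rate   = 15.0 / 1'000'000; // assumed $15.00 per 1M output tokens

    // A cheap model that "thinks" for 150k tokens can cost more per
    // prompt than a model thirty times pricier that answers directly.
    std::printf("cheap model,  150k CoT tokens:  $%.4f\n", cot_tokens * cheap_rate);
    std::printf("pricey model, 2k direct tokens: $%.4f\n", direct_tokens * pricey_rate);
    return 0;
}
```

Under these assumed prices, the "cheap" model costs $0.075 per prompt against the "expensive" model's $0.03: the per-token discount is erased by sheer volume.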
People are starting to notice that the new GLM 5.1 model is like a student who writes a 50-page essay just to answer a multiple-choice question. While it is open-weights and powerful, it spends an absurd amount of time and money 'thinking' through simple coding tasks. One user spent 30 minutes and 150,000 tokens just trying to get the code for a simple button, only for the final result to still contain basic errors. This raises the question: is the model actually smarter, or is it just 'over-thinking' its way to an answer while burning through your wallet?
Sides
Critics
Argue that the excessive token usage makes the model 'price-unsmart': any per-token savings are wiped out by sheer volume, leaving it less cost-effective than Claude or ChatGPT.
Defenders
Argue that extensive reasoning is precisely what allows state-of-the-art open-weights models to match proprietary performance.
Neutral
Divided between appreciation for the power of open-weights models and concern over the hardware and cost requirements of running them.
Forecast
Model developers will likely introduce 'thinking' limits or more aggressive pruning of reasoning paths to manage costs. In the near term, expect new benchmarks to emerge that measure 'intelligence per token' to penalize models that use brute-force reasoning.
Based on current signals. Events may develop differently.
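As a rough sketch of what an 'intelligence per token' benchmark might compute, the hypothetical scoring function below divides accuracy by log-scaled token spend, so a model must earn each extra order of magnitude of thinking. The formula and names are assumptions, not an existing benchmark's definition.

```cpp
// Hypothetical "intelligence per token" score: benchmark accuracy
// divided by log-scaled token spend, penalizing brute-force reasoning.
// The formula is an assumption, not an existing benchmark's definition.
#include <cmath>
#include <cstdio>

double intelligence_per_token(double accuracy, double tokens_used) {
    // Log scale keeps huge token counts from dwarfing accuracy outright:
    // each extra order of magnitude of thinking must buy real accuracy.
    return accuracy / std::log10(tokens_used + 10.0);
}

int main() {
    // Similar accuracy, very different token budgets:
    std::printf("efficient model (2k tokens):     %.3f\n",
                intelligence_per_token(0.85, 2'000));
    std::printf("brute-force model (150k tokens): %.3f\n",
                intelligence_per_token(0.87, 150'000));
    return 0;
}
```

Under this scoring, the brute-force model's two extra points of accuracy (0.168 vs. 0.257) do not come close to offsetting its 75x token budget.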
Timeline
Model Quality Self-Correction
The model's internal reasoning admits it is 'overcomplicating' the task, leading to user skepticism about its state-of-the-art claims.
Token Usage Analysis
Reports surface that the model consumed over 100k tokens before producing functional code.
GLM 5.1 User Report Gains Traction
A developer documents a 30-minute 'thinking' loop for a simple C++ UI component request.