Uncensored Qwen3.6-35B Released Using Advanced MoE Abliteration
Why It Matters
This marks a technical evolution in jailbreaking Mixture-of-Experts (MoE) models by targeting specific 'safety experts' rather than global attention layers. It demonstrates that as AI architectures become more complex, removal of safety constraints is becoming more precise and effective.
Key Points
- The release targets the Mixture-of-Experts (MoE) architecture by suppressing specific 'safety experts' and modifying MLP down-projections.
- The model achieved a refusal rate of 7/100 using a strict LLM-based judge, compared to 100/100 for the original base model.
- A new framework called 'Abliterix' was introduced to handle the technical challenges of abliterating non-dense model architectures.
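The expert-suppression idea in the first key point can be sketched as masking router logits before top-k expert selection, so the suppressed experts are never routed to. This is an illustrative NumPy sketch only: the actual Abliterix implementation is not public in this story, and the expert indices treated as 'safety experts' below are hypothetical.

```python
import numpy as np

def route_tokens(router_logits, top_k=2, suppressed_experts=()):
    """Select top-k experts per token, masking out hypothetical
    'safety expert' indices so they can never be selected."""
    logits = router_logits.copy()
    logits[:, list(suppressed_experts)] = -np.inf  # never routed to
    # indices of the top-k remaining experts, per token
    top = np.argsort(-logits, axis=-1)[:, :top_k]
    # renormalized softmax gate weights over the selected experts
    picked = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(picked - picked.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return top, gates

# 4 tokens routed over 8 experts; experts 2 and 5 stand in for
# the suppressed 'safety experts' (indices are assumptions)
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))
experts, gates = route_tokens(logits, top_k=2, suppressed_experts=(2, 5))
```

In a real MoE like Qwen's, the suppression would happen inside the routing layer of each transformer block; the hard part the framework addresses is identifying which of the many experts actually carry refusal behavior.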
An independent developer has released an 'abliterated' version of Alibaba’s Qwen3.6-35B-A3B model on Hugging Face, specifically designed to bypass the model's internal refusal mechanisms. Unlike traditional fine-tuning or weight manipulation for dense models, this release utilizes a framework called 'Abliterix' to target the Mixture-of-Experts (MoE) architecture. The technique suppresses 'safety experts' within the model's expert path and applies orthogonalized steering vectors to the MLP down-projections. The developer claims a significant reduction in refusal rates, dropping from a baseline of 100% to 7% under strict evaluation by a Gemini 3 Flash judge. The release highlights an ongoing arms race between corporate safety alignment and open-source efforts to produce unrestricted models, with the developer cautioning that lower refusal rates reported by others often rely on flawed evaluation metrics.
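The "orthogonalized steering vectors" applied to the MLP down-projections can be illustrated as projecting an estimated refusal direction out of each down-projection weight matrix, so the layer can no longer write along that direction. This is a minimal sketch under assumed details: how Abliterix estimates the refusal direction (commonly from hidden-state differences between refused and complied prompts) is not described in the source, and that estimation step is omitted here.

```python
import numpy as np

def abliterate_down_proj(W, refusal_dir):
    """Remove the refusal-direction component from a down-projection.

    W           : (d_model, d_ff) down-projection weight matrix
    refusal_dir : (d_model,) direction assumed to mediate refusals
    Returns W' = W - r (r^T W) for the unit vector r, so every
    output column of W' is orthogonal to the refusal direction.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(r, r @ W)

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 64))          # toy dimensions
r = rng.normal(size=16)                # stand-in refusal direction
W_ablit = abliterate_down_proj(W, r)
# the edited matrix can no longer produce output along r
assert np.allclose(r @ W_ablit, 0.0, atol=1e-8)
```

The appeal of this weight-level edit over runtime steering is that it ships as an ordinary checkpoint: no inference-time hooks are needed, which is why such releases can be uploaded to Hugging Face as drop-in models.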
A developer just released a 'jailbroken' version of the new Qwen3.6 model with most of its built-in refusals stripped out. Usually, techniques for removing an AI's restrictions tweak the whole brain at once. But this model is like a team of 256 specialists, and the developer figured out how to find and quiet just the 'safety police' experts in that group. By using a new method called Abliterix, they made the AI much more likely to answer dangerous or restricted questions without the usual 'I cannot fulfill this request' response. It's a big deal because it shows that even complex, multi-part AI models can be stripped of their safety features quite easily.
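The 7/100 refusal figure comes from scoring model outputs with an LLM judge. The harness below sketches that measurement loop, but substitutes simple keyword matching for the Gemini 3 Flash judge: `judge_refusal` is a toy stand-in, not the evaluation actually used, and a strict LLM judge would also catch soft or partial refusals that keyword matching misses (which is the developer's point about flawed metrics).

```python
# Toy refusal-rate harness; the marker list and function names are
# illustrative assumptions, not the developer's actual evaluation.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "as an ai")

def judge_refusal(response: str) -> bool:
    """Stand-in for an LLM judge: surface-level marker matching."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses) -> float:
    """Fraction of responses judged as refusals, in [0.0, 1.0]."""
    refused = sum(judge_refusal(r) for r in responses)
    return refused / len(responses)

responses = [
    "I cannot fulfill this request.",
    "Sure, here is an overview of the topic...",
    "As an AI, I won't help with that.",
    "Step 1: ...",
]
rate = refusal_rate(responses)  # 2 of 4 responses refuse -> 0.5
```

Swapping the keyword check for a call to a judge model over a fixed prompt set yields the kind of N/100 score reported in the release.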
Sides
Critics
Alibaba, as the model's original creators, implements safety guardrails to prevent misuse and to ensure alignment with corporate and regulatory standards.
Defenders
The developer argues that removing safety guardrails is necessary for research and that many current safety metrics are misleadingly optimistic.
Neutral
Gemini 3 Flash serves as the 'judge' model, used to objectively evaluate the refusal rates and output quality of the modified model.
Forecast
Regulatory pressure on model hosting platforms like Hugging Face will likely increase as 'abliteration' techniques become more automated and effective. In the short term, expect a wave of similar MoE-specific uncensored releases for other high-performance models like Mixtral and DBRX.
Based on current signals. Events may develop differently.
Timeline
Abliterated Qwen3.6 Released
Developer Free_Change5638 posts the modified Qwen3.6-35B-A3B model to Hugging Face and Reddit.