Uncensored Qwen3.6-35B Released Using Advanced MoE Abliteration
Why It Matters
This marks a technical evolution in jailbreaking Mixture-of-Experts (MoE) models by targeting specific 'safety experts' rather than global attention layers. It demonstrates that as AI architectures become more complex, removal of safety constraints is becoming more precise and effective.
Key Points
- The release targets the Mixture-of-Experts (MoE) architecture by suppressing specific 'safety experts' and modifying MLP down-projections.
- The model achieved a refusal rate of 7/100 using a strict LLM-based judge, compared to 100/100 for the original base model.
- A new framework called 'Abliterix' was introduced to handle the technical challenges of abliterating non-dense model architectures.
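The expert-suppression idea in the first key point can be sketched as masking router logits before top-k expert selection, so the suppressed experts are never routed to. This is an illustrative NumPy sketch only: the actual Abliterix implementation is not public in this story, and the expert indices treated as 'safety experts' below are hypothetical.

```python
import numpy as np

def route_tokens(router_logits, top_k=2, suppressed_experts=()):
    """Select top-k experts per token, masking out hypothetical
    'safety expert' indices so they can never be selected."""
    logits = router_logits.copy()
    logits[:, list(suppressed_experts)] = -np.inf  # never routed to
    # indices of the top-k remaining experts, per token
    top = np.argsort(-logits, axis=-1)[:, :top_k]
    # renormalized softmax gate weights over the selected experts
    picked = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(picked - picked.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return top, gates

# 4 tokens routed over 8 experts; experts 2 and 5 stand in for
# the suppressed 'safety experts' (indices are assumptions)
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))
experts, gates = route_tokens(logits, top_k=2, suppressed_experts=(2, 5))
```

In a real MoE like Qwen's, the suppression would happen inside the routing layer of each transformer block; the hard part the framework addresses is identifying which of the many experts actually carry refusal behavior.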
An independent developer has released an 'abliterated' version of Alibaba’s Qwen3.6-35B-A3B model on Hugging Face, specifically designed to bypass the model's internal refusal mechanisms. Unlike traditional fine-tuning or weight manipulation for dense models, this release utilizes a framework called 'Abliterix' to target the Mixture-of-Experts (MoE) architecture. The technique suppresses 'safety experts' within the model's expert path and applies orthogonalized steering vectors to the MLP down-projections. The developer claims a significant reduction in refusal rates, dropping from a baseline of 100% to 7% under strict evaluation by a Gemini 3 Flash judge. The release highlights an ongoing arms race between corporate safety alignment and open-source efforts to produce unrestricted models, with the developer cautioning that lower refusal rates reported by others often rely on flawed evaluation metrics.
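The "orthogonalized steering vectors" applied to the MLP down-projections can be illustrated as projecting an estimated refusal direction out of each down-projection weight matrix, so the layer can no longer write along that direction. This is a minimal sketch under assumed details: how Abliterix estimates the refusal direction (commonly from hidden-state differences between refused and complied prompts) is not described in the source, and that estimation step is omitted here.

```python
import numpy as np

def abliterate_down_proj(W, refusal_dir):
    """Remove the refusal-direction component from a down-projection.

    W           : (d_model, d_ff) down-projection weight matrix
    refusal_dir : (d_model,) direction assumed to mediate refusals
    Returns W' = W - r (r^T W) for the unit vector r, so every
    output column of W' is orthogonal to the refusal direction.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(r, r @ W)

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 64))          # toy dimensions
r = rng.normal(size=16)                # stand-in refusal direction
W_ablit = abliterate_down_proj(W, r)
# the edited matrix can no longer produce output along r
assert np.allclose(r @ W_ablit, 0.0, atol=1e-8)
```

The appeal of this weight-level edit over runtime steering is that it ships as an ordinary checkpoint: no inference-time hooks are needed, which is why such releases can be uploaded to Hugging Face as drop-in models.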
A developer just released a 'jailbroken' version of the new Qwen3.6 model with most of its built-in refusals stripped out. Usually, techniques for removing an AI's restrictions tweak the whole brain at once. But this model is like a team of 256 specialists, and the developer figured out how to find and quiet just the 'safety police' experts in that group. By using a new method called Abliterix, they made the AI much more likely to answer dangerous or restricted questions without the usual 'I cannot fulfill this request' response. It's a big deal because it shows that even complex, multi-part AI models can be stripped of their safety features quite easily.
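The 7/100 refusal figure comes from scoring model outputs with an LLM judge. The harness below sketches that measurement loop, but substitutes simple keyword matching for the Gemini 3 Flash judge: `judge_refusal` is a toy stand-in, not the evaluation actually used, and a strict LLM judge would also catch soft or partial refusals that keyword matching misses (which is the developer's point about flawed metrics).

```python
# Toy refusal-rate harness; the marker list and function names are
# illustrative assumptions, not the developer's actual evaluation.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "as an ai")

def judge_refusal(response: str) -> bool:
    """Stand-in for an LLM judge: surface-level marker matching."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses) -> float:
    """Fraction of responses judged as refusals, in [0.0, 1.0]."""
    refused = sum(judge_refusal(r) for r in responses)
    return refused / len(responses)

responses = [
    "I cannot fulfill this request.",
    "Sure, here is an overview of the topic...",
    "As an AI, I won't help with that.",
    "Step 1: ...",
]
rate = refusal_rate(responses)  # 2 of 4 responses refuse -> 0.5
```

Swapping the keyword check for a call to a judge model over a fixed prompt set yields the kind of N/100 score reported in the release.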
Sides
Critics
Alibaba, as the model's original creators, implements safety guardrails to prevent misuse and to ensure alignment with corporate and regulatory standards.
Defenders
The developer argues that removing safety guardrails is necessary for research and that many current safety metrics are misleadingly optimistic.
Neutral
Gemini 3 Flash serves as the 'judge' model, used to objectively evaluate the refusal rates and output quality of the modified model.
Forecast
Regulatory pressure on model hosting platforms like Hugging Face will likely increase as 'abliteration' techniques become more automated and effective. In the short term, expect a wave of similar MoE-specific uncensored releases for other high-performance models like Mixtral and DBRX.
Based on current signals. Events may develop differently.
Timeline
Abliterated Qwen3.6 Released
Developer Free_Change5638 posts the modified Qwen3.6-35B-A3B model to Hugging Face and Reddit.