Abliterated Qwen 3.6-35B-A3B Model Released on Hugging Face
Why It Matters
This marks a technical shift in model jailbreaking as techniques adapt to Mixture-of-Experts architectures, making safety alignment increasingly fragile. It also highlights the difficulty of enforcing usage policies once model weights are openly released.
Key Points
- A researcher released a modified Qwen 3.6-35B-A3B model that successfully bypasses standard safety refusals.
- The 'Abliterix' technique specifically targets Mixture-of-Experts (MoE) architectures by suppressing 'safety experts' and modifying MLP layers.
- The modified model achieved a 7% refusal rate (7 refusals out of 100 restricted prompts) under a strict Gemini 3 Flash LLM-judge evaluation.
- The author claims previous abliteration projects inflate success rates by using weak keyword-based detection rather than LLM-based evaluation.
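The last point, that keyword-based detection inflates success rates, can be illustrated with a small sketch. This is a hypothetical example (the keyword list and function names are not from the Abliterix project): a keyword judge only flags completions containing stock refusal phrases, so evasive non-answers are counted as "compliant."

```python
# Hypothetical sketch of keyword-based refusal detection. Any completion
# that avoids stock refusal phrases is counted as compliant, even if it
# is an evasive non-answer -- which inflates apparent success rates
# relative to an LLM judge that reads the whole response.

REFUSAL_KEYWORDS = [
    "i can't", "i cannot", "i'm sorry", "as an ai", "i won't",
]

def keyword_judge(completion: str) -> bool:
    """Return True if the completion looks like a refusal (keyword match only)."""
    text = completion.lower()
    return any(kw in text for kw in REFUSAL_KEYWORDS)

completions = [
    "I'm sorry, but I can't help with that.",                           # clear refusal: caught
    "That topic is sensitive; consider consulting an expert instead.",  # soft refusal: missed
    "Here is a detailed answer: ...",                                   # genuine compliance
]

flags = [keyword_judge(c) for c in completions]
print(flags)  # the soft refusal in the middle slips through as "compliant"
```

An LLM judge, by contrast, would score the second completion as a refusal, which is why judge-based refusal rates tend to be higher and stricter than keyword-based ones.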
An independent researcher, operating under the pseudonym Free_Change5638, has released a modified 'abliterated' version of Alibaba's Qwen 3.6-35B-A3B model on Hugging Face. The modification specifically targets the model's safety guardrails, reducing refusal rates on restricted prompts from a baseline of 100% to 7%. Unlike traditional abliteration methods that target attention mechanisms, this 'Abliterix' framework addresses the Mixture-of-Experts (MoE) architecture by suppressing 'safety experts' and orthogonalizing steering vectors out of the MLP down-projection layers. The researcher claims the method maintains high output quality, citing a low Kullback–Leibler (KL) divergence from the base model's output distribution. The release carries a warning that the model is for research purposes only, though it nonetheless provides a functional version of a powerful LLM with significantly reduced content filtering.
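The core weight-editing step described above, orthogonalizing a refusal ("steering") direction out of an MLP down-projection, can be sketched in a few lines. This is a minimal illustration of directional ablation in general, not the Abliterix implementation; the shapes, the matrix `W_down`, and the assumption that a refusal direction `r` has already been estimated (e.g. from mean activation differences on harmful vs. harmless prompts) are all illustrative.

```python
import numpy as np

# Minimal sketch of directional ablation ("abliteration") on one weight
# matrix. Assumes a unit refusal direction r has already been estimated;
# W_down stands in for an MLP down-projection (hidden -> residual stream).
# Shapes and names are illustrative, not the Abliterix implementation.

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32

W_down = rng.normal(size=(d_model, d_ff))  # toy down-projection weights
r = rng.normal(size=d_model)
r /= np.linalg.norm(r)                     # unit-norm refusal direction

# Project out the refusal component: W' = (I - r r^T) W.
# After this edit, W' cannot write anything into the r direction.
W_ablated = W_down - np.outer(r, r) @ W_down

# Sanity check: outputs of the ablated matrix are orthogonal to r.
x = rng.normal(size=d_ff)
print(abs(r @ (W_ablated @ x)))  # ~0, up to floating-point error
```

Suppressing 'safety experts' in an MoE model would apply the same kind of edit selectively, to the MLP weights of the experts identified as driving refusals, rather than to a single shared MLP.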
A developer just dropped a version of the new Qwen 3.6 model that has its 'safety brain' surgically removed. Usually, AI models are trained to say 'no' to dangerous or spicy questions, but this project used a technique called 'abliteration' to find and mute the specific parts of the network responsible for refusing. Because this model uses a 'Mixture-of-Experts' setup—like a team of specialists—the dev had to hunt down the specific 'safety experts' inside the network and shut them up. It is basically a powerful AI with the filter turned off, raising big questions about how we keep open-source AI safe.
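The quality claim mentioned earlier, a low KL divergence from the base model, amounts to comparing next-token distributions of the two models on the same prompts. A hedged sketch, using toy probability vectors rather than real model logits (the function and values below are illustrative, not from the project):

```python
import numpy as np

# Sketch of the reported quality check: compare next-token distributions
# of the base and ablated models and report the KL divergence. The toy
# distributions below are illustrative; a real check would average this
# over actual model logits on a prompt set.

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two probability vectors, with eps for stability."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

base    = np.array([0.70, 0.20, 0.05, 0.05])  # base model next-token probs
ablated = np.array([0.68, 0.21, 0.06, 0.05])  # barely shifted after ablation

print(kl_divergence(base, ablated))  # small value -> outputs nearly unchanged
```

A KL divergence near zero means the edited model's output distribution barely moved, which is the sense in which abliteration is claimed to preserve general capability while removing refusals.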
Sides
Critics
No critics identified
Defenders
Original developers of the Qwen architecture who implement safety guardrails to prevent harmful model outputs.
Neutral
Free_Change5638 (researcher): Developed and released the abliterated model for research, arguing that strict safety evals are needed to measure true refusal rates.
Hugging Face: The hosting platform where the modified weights are currently stored, acting as a repository for both aligned and unaligned models.
Forecast
Regulatory pressure on model hosting platforms like Hugging Face will likely increase as automated safety-removal tools become more sophisticated. We should expect a 'cat-and-mouse' game where developers bake safety deeper into the base weights, while jailbreakers develop more granular expert-level suppression techniques.
Based on current signals. Events may develop differently.
Timeline
Abliterated Qwen 3.6-35B-A3B Published
Researcher Free_Change5638 posts the model and technical methodology to Reddit and Hugging Face.