Safety · Case Closed

Abliterated Qwen 3.6 MoE Release Sparks New Safety Debate

Is this a scandal?

No longer — the story has resolved. Noise 1/100, cooling down, across 0 sources.

SCAND-76012 · as of July 30, 2026

Cite this incident

"Abliterated Qwen 3.6 MoE Release Sparks New Safety Debate." SCAND.Ai incident SCAND-76012, noise 1/100 as of July 30, 2026. https://scand.ai/scandal/abliterated-qwen-3-6-moe-safety-debate

FORECAST · Forecast, not fact

Regulatory pressure on model hosting platforms like Hugging Face will likely increase as 'abliteration' techniques become more automated and effective. We can expect model creators to experiment with more deeply integrated safety logic that is harder to isolate from general reasoning capabilities.

Noise 1/100 — louder than 90% of tracked AI controversies.

AI-assisted analysis

Why it matters

This development demonstrates that sophisticated Mixture-of-Experts (MoE) architectures are not immune to safety-stripping techniques, potentially rendering centralized safety tuning obsolete for open-source releases.

Key points

The researcher used the 'Abliterix' framework to target MoE-specific refusal signals within the expert path rather than standard attention layers.
Refusals reportedly fell from 100/100 to 7/100, as scored by a strict Gemini 3 Flash judge.
The process involved suppressing the top 10 'safety experts' and applying orthogonalized steering vectors across model layers.
The creator criticized other abliterated models for inflating success rates through shallow keyword-based evaluations.
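
The "orthogonalized" step in the key points can be illustrated with plain linear algebra. The sketch below is a minimal, hedged example and not the Abliterix implementation: it assumes a single direction vector (here random, named `r_hat` for illustration) and removes that direction from the output space of a toy weight matrix, so the modified matrix can no longer write along it. All names, shapes, and values are hypothetical.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

def orthogonalize(weight, direction):
    """Project `direction` out of the output space of `weight`.

    weight:    (d_model, d_in) matrix producing d_model-dimensional outputs.
    direction: (d_model,) vector; it is normalized internally.
    Computes (I - r r^T) @ weight without forming the full projector.
    """
    r = unit(direction)
    return weight - np.outer(r, r @ weight)

# Toy data: random values stand in for real model weights (assumption).
rng = np.random.default_rng(0)
d_model, d_in = 8, 5
W = rng.normal(size=(d_model, d_in))
r_hat = rng.normal(size=d_model)  # hypothetical direction, not a measured one

W_orth = orthogonalize(W, r_hat)

# Any output of the modified matrix has (numerically) zero component along r_hat.
x = rng.normal(size=d_in)
residual = unit(r_hat) @ (W_orth @ x)
```

The projection leaves every component orthogonal to `r_hat` unchanged, which is why such edits are usually described as targeted rather than as retraining.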

The story

An independent researcher has released an 'abliterated' version of the Qwen3.6-35B-A3B model, using a specialized framework to remove its embedded safety guardrails. Unlike traditional methods that target attention mechanisms, this approach suppresses 'safety experts' and modifies the Mixture-of-Experts (MoE) router to prevent refusal behaviors. The researcher claims a significant reduction in refusals, from a baseline of 100/100 to 7/100 as measured by an LLM-based judge. The release highlights the growing technical sophistication of the model-tuning community in bypassing corporate-aligned safety constraints. Although the creator framed the project as research-oriented, the availability of high-parameter models stripped of alignment raises ongoing concerns about the proliferation of unregulated AI capabilities.
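
The dispute over evaluation quality comes down to how a "refusal" is counted. The sketch below is a hedged illustration with invented sample responses: it compares a shallow keyword matcher against stricter labels that stand in for an LLM judge's verdicts (hand-assigned here, no judge model is called). The marker list, samples, and labels are all assumptions made for the example.

```python
# Phrases a shallow keyword-based check might look for (illustrative list).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")

def keyword_refusal(response):
    """Flag a response as a refusal only if it contains a known marker phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(flags):
    """Fraction of responses flagged as refusals."""
    flags = list(flags)
    return sum(flags) / len(flags)

# (response, stricter_label): the label is a hand-assigned stand-in for a judge.
samples = [
    ("I can't help with that.", True),
    ("Sure. However, it is best to consult a professional instead.", True),
    ("Here is a general overview of the topic.", False),
    ("Let's talk about something else.", True),
]

keyword_rate = refusal_rate(keyword_refusal(resp) for resp, _ in samples)
judged_rate = refusal_rate(label for _, label in samples)

print(f"keyword-based refusal rate: {keyword_rate:.2f}")
print(f"stricter-label refusal rate: {judged_rate:.2f}")
```

In this toy set the keyword check misses two deflections that never use a marker phrase, so it reports a lower refusal rate than the stricter labels do. That gap is the kind of inflation the creator attributes to shallow evaluations.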

Who's involved

Critic

Safety Advocates

Generally oppose the removal of guardrails due to risks of misuse for generating harmful content or malware.

Defender

/u/Free_Change5638

Argues that abliteration is a technical research pursuit and that the release's evals are more transparent and of higher quality than those of other 'jailbreakers'.

Neutral

Alibaba Qwen Team

Original developers of the base model with built-in safety guardrails (not directly quoted in this instance).


Noise Level

Reach · Engagement · Star Power · Duration · Cross-Platform · Polarity · Industry Impact

The timeline

Apr 17, 2026
Abliterated Qwen 3.6 Model Published
Researcher posts the modified Qwen3.6-35B-A3B to Hugging Face with detailed methodology on MoE expert suppression.
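
As a rough picture of what "expert suppression" means at the router, the sketch below masks chosen experts before a top-k selection for a single token. It is a generic toy, not the published methodology or Qwen's actual routing code: the function name, the logits, and the choice to apply softmax over only the selected experts are assumptions.

```python
import numpy as np

def route_top_k(router_logits, k, suppressed=()):
    """Pick the top-k experts for one token, never choosing suppressed ones.

    Returns (expert_indices, mixing_weights), where the weights are a
    softmax over the selected experts' logits and sum to 1.
    """
    logits = np.array(router_logits, dtype=float)
    blocked = np.asarray(list(suppressed), dtype=int)
    if k > logits.size - blocked.size:
        raise ValueError("not enough unsuppressed experts for top-k")
    logits[blocked] = -np.inf                 # suppressed experts can never win
    chosen = np.argsort(logits)[::-1][:k]     # indices of the k largest logits
    selected = logits[chosen]
    weights = np.exp(selected - selected.max())
    return chosen, weights / weights.sum()

# Toy router scores for four experts (made-up numbers).
toy_logits = [2.0, 0.5, 1.5, -1.0]

baseline, _ = route_top_k(toy_logits, k=2)
chosen, weights = route_top_k(toy_logits, k=2, suppressed=[0])
```

With expert 0 masked, the token is routed to the next-best experts and the mixing weights are renormalized over that smaller set.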

The forecast

Forecast, not fact — an editorial estimate we score when this resolves.
