LLM Position Bias Benchmark Reveals Significant Primacy Effect
Why It Matters
This bias undermines the reliability of LLMs used for automated grading, legal document review, and ranking-based decision-making. If models consistently favor first-listed options, it introduces systemic unfairness in any evaluative pipeline.
Key Points
- Models select the first presented option 63.3% of the time on average.
- Choice consistency is low, with 44.8% of decisions flipping when option order is reversed.
- The GPT-5x family is identified as having significantly higher position bias than competitors like Opus 4.6.
- LLM 'primacy bias' is the inverse of the human 'recency bias' typically found in psychological studies.
The Mazur 2026 benchmark has identified a significant 'primacy bias' in large language models: when given two options, models select the first approximately 63.3% of the time. When the order of the options is reversed, the models' decisions flip in 44.8% of cases, indicating that the choice is often dictated by position rather than content. The study highlights the GPT-5x series as exhibiting particularly high levels of this bias compared to its peers. The researchers contrast this with human behavior, which typically shows a 'recency bias' when options are presented orally, driven by the limits of short-term memory. The findings suggest that current training methodologies, including RLHF, have not corrected this architectural tendency. The benchmark raises questions about the validity of using AI for objective ranking tasks without rigorous shuffling and normalization of input data.
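To make the two headline numbers concrete, here is a minimal sketch of how a primacy rate and a flip rate can be measured on a two-option choice task. The `ask_model` function and the prompt wording are placeholders for whatever client and template you use; they are not the benchmark's actual setup.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in your own client."""
    raise NotImplementedError

def measure_position_bias(pairs):
    """Estimate primacy rate and flip rate over (option_a, option_b) pairs.

    Each pair is presented twice, once in each order. We track:
      - primacy rate: how often the model picks whichever option came first
      - flip rate: how often the chosen *content* changes when order reverses
    """
    first_picks = 0
    flips = 0
    for a, b in pairs:
        choice_ab = ask_model(f"Pick the better option.\n1. {a}\n2. {b}\nAnswer with the option text only.")
        choice_ba = ask_model(f"Pick the better option.\n1. {b}\n2. {a}\nAnswer with the option text only.")
        first_picks += (choice_ab == a) + (choice_ba == b)
        flips += choice_ab != choice_ba
    return {
        "primacy_rate": first_picks / (2 * len(pairs)),  # ~0.633 reported by Mazur 2026
        "flip_rate": flips / len(pairs),                 # ~0.448 reported by Mazur 2026
    }
```

An unbiased model would land near a 50% primacy rate, and a flip rate near zero would indicate the decision tracks content rather than position.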
Imagine asking a friend to pick between 'Apples' or 'Bananas' and they pick Apples just because you said it first. That is exactly what is happening with AI models like GPT-5, but at a massive scale. A new study found that AI models pick the first choice over 60% of the time, regardless of what that choice actually is. Interestingly, humans usually do the opposite, picking the last thing they heard because it is fresher in their minds. This means if you use an AI to grade resumes or rank products, the order in which they appear might matter more than how good they actually are.
Sides
Critics
Conducted the 2026 study demonstrating that position bias is a systemic flaw in current LLM architectures.
Defenders
The developer of the models cited as having particularly egregious position bias, though they have not yet released a formal response.
Neutral
Shared the findings publicly and proposed that the bias may stem from how forward passes recompute activations for earlier tokens.
Forecast
Developers will likely implement mandatory 'shuffling' protocols for all ranking tasks to mitigate this effect (a simple version is sketched below). In the long term, we should expect new training objectives specifically designed to penalize positional dependency in evaluative prompts.
Based on current signals. Events may develop differently.
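One way to prototype such a shuffling protocol today is order-randomized voting: present the pair in a random order several times and keep only a content-level majority. A minimal sketch, reusing the hypothetical `ask_model` stub from above:

```python
import random

def debiased_choice(option_a, option_b, rounds=4):
    """Order-randomized voting: a simple shuffling mitigation sketch.

    Each round presents the two options in a random order, so any
    positional preference cancels out in expectation. Returns the
    majority winner, or None on a tie; a persistent tie suggests the
    model is deciding by position rather than content.
    """
    votes = {option_a: 0, option_b: 0}
    for _ in range(rounds):
        pair = [option_a, option_b]
        random.shuffle(pair)
        choice = ask_model(
            f"Pick the better option.\n1. {pair[0]}\n2. {pair[1]}\nAnswer with the option text only."
        )
        if choice in votes:
            votes[choice] += 1
    if votes[option_a] == votes[option_b]:
        return None
    return max(votes, key=votes.get)
```

The design choice here is to resolve ties as "no decision" rather than defaulting to either option, since a tie is itself evidence that position, not content, is driving the answer.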
Timeline
Benchmark results shared on Reddit
User COAGULOPATH summarizes the Mazur 2026 findings regarding LLM position bias.