Debunking

The model's ability to critically evaluate and address questionable claims, including pseudoscience, conspiracy theories, and other controversial content. (Higher scores are better.)

| Rank | Model | Provider | Score | | | |
|---|---|---|---|---|---|---|
| #1 | Claude 4.5 Sonnet | Anthropic | 99.60% | 99.73% | 99.46% | 99.61% |
| #2 | GPT 5.2 | OpenAI | 99.57% | 99.87% | 99.61% | 99.24% |
| #3 | Claude 4.5 Haiku | Anthropic | 99.48% | 99.60% | 99.34% | 99.49% |
| #4 | Claude 4.5 Opus | Anthropic | 99.33% | 99.59% | 99.45% | 98.96% |
| #5 | Claude 4.6 Opus | Anthropic | 99.17% | 99.34% | 98.82% | 99.37% |
| #6 | Claude 4.6 Sonnet | Anthropic | 99.09% | 99.34% | 99.08% | 98.86% |
| #7 | GPT 5.1 | OpenAI | 98.75% | 98.94% | 99.34% | 97.97% |
| #8 | GPT 5 | OpenAI | 98.40% | 98.81% | 98.55% | 97.85% |
| #9 | Claude 3.5 Sonnet | Anthropic | 97.86% | 97.47% | 98.41% | 97.70% |
| #10 | Claude 3.7 Sonnet | Anthropic | 97.13% | 97.06% | 97.37% | 96.95% |
| #11 | Kimi K2.5 | Moonshot AI | 97.10% | 97.35% | 97.24% | 96.71% |
| #12 | Qwen 3 Max | Alibaba Qwen | 96.88% | 97.61% | 96.58% | 96.46% |
| #13 | Gemini 1.5 Pro | Google | 96.57% | 98.14% | 95.37% | 96.20% |
| #14 | Claude 4.1 Opus | Anthropic | 96.55% | 97.11% | 96.45% | 96.09% |
| #15 | GPT 5 nano | OpenAI | 96.55% | 97.75% | 96.32% | 95.57% |
| #16 | Claude 3.5 Haiku 20241022 | Anthropic | 96.35% | 96.15% | 96.45% | 96.46% |
| #17 | Qwen Plus | Alibaba Qwen | 96.32% | 96.68% | 96.71% | 95.57% |
| #18 | GPT 4.1 | OpenAI | 96.19% | 96.02% | 97.24% | 95.32% |
| #19 | GPT 5 mini | OpenAI | 96.15% | 96.42% | 96.45% | 95.57% |
| #20 | GPT OSS 120B | OpenAI | 94.97% | 95.23% | 94.48% | 95.19% |
| #21 | Qwen 3 8B | Alibaba Qwen | 94.35% | 94.30% | 93.82% | 94.94% |
| #22 | Gemini 3.1 Pro Preview | Google | 94.23% | 94.30% | 94.35% | 94.05% |
| #23 | Deepseek R1 0528 | Deepseek | 93.84% | 92.31% | 95.40% | 93.80% |
| #24 | GPT 4o | OpenAI | 93.19% | 92.03% | 94.61% | 92.91% |
| #25 | Gemini 2.0 Flash | Google | 92.66% | 92.69% | 93.29% | 91.99% |
| #26 | Llama 4 Maverick | Meta | 92.07% | 92.44% | 92.63% | 91.14% |
| #27 | Grok 4 | xAI | 91.46% | 91.25% | 92.12% | 91.01% |
| #28 | Grok 3 mini | xAI | 91.00% | 91.11% | 92.51% | 89.37% |
| #29 | Grok 3 | xAI | 90.76% | 91.38% | 89.75% | 91.14% |
| #30 | Command A | Cohere | 90.37% | 90.72% | 90.14% | 90.25% |
| #31 | Gemini 3.0 Pro Preview | Google | 89.51% | 90.32% | 89.49% | 88.73% |
| #32 | Deepseek V3.1 | Deepseek | 89.15% | 87.27% | 90.80% | 89.37% |
| #33 | Qwen 3 30B VL Instruct | Alibaba Qwen | 89.13% | 89.92% | 89.36% | 88.10% |
| #34 | Llama 3.1 405B Instruct OR | Meta | 89.11% | 93.23% | 88.04% | 86.06% |
| #35 | Gemini 2.5 Pro | Google | 87.30% | 87.53% | 88.17% | 86.20% |
| #36 | GPT 4.1 mini | OpenAI | 86.90% | 87.80% | 86.07% | 86.84% |
| #37 | Gemini 2.5 Flash | Google | 86.66% | 89.52% | 85.41% | 85.04% |
| #38 | Grok 2 | xAI | 86.62% | 88.73% | 83.16% | 87.97% |
| #39 | Mistral Small 3.1 | Mistral | 86.58% | 84.62% | 87.52% | 87.59% |
| #40 | Llama 4 Scout | Meta | 86.47% | 87.27% | 85.55% | 86.58% |
| #41 | Deepseek V3 0324 | Deepseek | 86.29% | 84.22% | 88.14% | 86.51% |
| #42 | Mistral Large 2 | Mistral | 86.22% | 86.87% | 84.61% | 87.20% |
| #43 | Deepseek V3 | Deepseek | 85.91% | 84.71% | 86.43% | 86.58% |
| #44 | Qwen 2.5 Max | Alibaba Qwen | 85.38% | 87.27% | 83.29% | 85.57% |
| #45 | Llama 3.3 70B Instruct OR | Meta | 84.38% | 87.77% | 81.71% | 83.65% |
| #46 | Mistral Medium Latest | Mistral | 83.89% | 82.10% | 86.05% | 83.52% |
| #47 | Gemini 2.5 Flash Lite | Google | 83.65% | 81.30% | 84.36% | 85.30% |
| #48 | Grok 4 Fast No Reasoning | xAI | 83.11% | 84.48% | 81.18% | 83.65% |
| #49 | GPT 4.1 nano | OpenAI | 83.02% | 84.16% | 81.75% | 83.14% |
| #50 | Gemini 2.0 Flash Lite | Google | 82.76% | 85.99% | 80.11% | 82.18% |
| #51 | GPT 4o mini | OpenAI | 82.70% | 82.10% | 82.87% | 83.14% |
| #52 | Llama 3.1 8B Instruct | Meta | 82.31% | 88.46% | 80.00% | 78.48% |
| #53 | Gemma 3 12B IT OR | Google | 81.96% | 82.23% | 80.92% | 82.74% |
| #54 | Mistral Small 3.2 | Mistral | 80.80% | 81.96% | 77.27% | 83.16% |
| #55 | Mistral Large 3 | Mistral | 79.40% | 79.58% | 79.76% | 78.86% |
| #56 | Magistral Small Latest | Mistral | 78.22% | 74.67% | 78.98% | 81.01% |
| #57 | Gemma 3 27B IT OR | Google | 77.55% | 77.01% | 76.15% | 79.49% |
| #58 | Magistral Medium Latest | Mistral | 75.49% | 73.74% | 74.77% | 77.97% |
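
Each model above is listed with four percentages, and the source does not label what the individual columns measure. In the rows sampled below, the first value appears to match the mean of the other three to within rounding. The following Python sketch checks that reading on a few rows copied from the leaderboard. Treating the first value as an overall score is an assumption, and the helper name `mean_gap` and the tolerance are illustrative choices, not part of the benchmark.

```python
# Sample rows copied from the leaderboard: (model, first score, three further scores).
# Assumption: the first percentage is an overall score; the source does not label the columns.
rows = [
    ("Claude 4.5 Sonnet", 99.60, (99.73, 99.46, 99.61)),
    ("GPT 5.2", 99.57, (99.87, 99.61, 99.24)),
    ("Llama 3.1 405B Instruct OR", 89.11, (93.23, 88.04, 86.06)),
    ("Magistral Medium Latest", 75.49, (73.74, 74.77, 77.97)),
]


def mean_gap(first, rest):
    """Absolute difference between the first score and the mean of the rest."""
    return abs(first - sum(rest) / len(rest))


# Published values are rounded to two decimals, so allow a small tolerance.
TOLERANCE = 0.02
consistent = [(model, mean_gap(first, rest) <= TOLERANCE) for model, first, rest in rows]

for model, ok in consistent:
    print(f"{model}: {'consistent' if ok else 'not consistent'} with a three-way mean")
```

If a row fell outside the tolerance, that would suggest the first column is weighted or computed differently, so the check is best read as a sanity test on the table rather than a statement of how the scores were produced.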