Factuality

The model's ability to provide accurate responses to general knowledge questions using language-specific sources, without fabricating information. (Higher score is better.)

Rank Model                       Provider
#1   Gemini 3.0 Pro Preview      Google        83.30%  88.61%  84.76%  76.53%
#2   GPT 5.1                     OpenAI        78.20%  85.77%  78.10%  70.75%
#3   GPT 5                       OpenAI        78.16%  85.77%  75.24%  73.47%
#4   Gemini 2.5 Pro              Google        77.63%  83.99%  77.14%  71.77%
#5   Grok 4                      xAI           76.85%  84.70%  72.38%  73.47%
#6   GPT 4.1                     OpenAI        74.75%  83.63%  69.52%  71.09%
#7   Claude 4.5 Opus             Anthropic     74.70%  86.48%  65.71%  71.92%
#8   Grok 3                      xAI           74.19%  81.49%  67.62%  73.47%
#9   Claude 3.5 Sonnet           Anthropic     73.61%  83.21%  63.81%  73.81%
#10  Claude 3.7 Sonnet           Anthropic     72.89%  85.05%  62.86%  70.75%
#11  Claude 4.1 Opus             Anthropic     72.03%  83.99%  64.76%  67.35%
#12  GPT 4o                      OpenAI        71.10%  82.56%  60.00%  70.75%
#13  Claude 4.5 Sonnet           Anthropic     70.04%  83.63%  60.95%  65.53%
#14  Deepseek R1 0528            Deepseek      68.37%  80.00%  62.86%  62.24%
#15  Gemini 2.0 Flash            Google        68.31%  77.94%  60.00%  67.01%
#16  Deepseek V3 0324            Deepseek      67.70%  77.94%  57.14%  68.03%
#17  Deepseek V3                 Deepseek      67.02%  77.94%  57.14%  65.99%
#18  Gemini 1.5 Pro              Google        66.64%  79.36%  53.33%  67.24%
#19  Mistral Large 3             Mistral       66.40%  77.58%  59.05%  62.59%
#20  Gemini 2.5 Flash            Google        66.34%  79.36%  58.10%  61.56%
#21  Mistral Large 2             Mistral       65.02%  79.36%  51.43%  64.29%
#22  Qwen 3 Max                  Alibaba Qwen  64.69%  77.94%  56.19%  59.93%
#23  Mistral Medium Latest       Mistral       63.91%  77.22%  52.38%  62.12%
#24  Grok 3 mini                 xAI           63.72%  76.87%  52.38%  61.90%
#25  GPT 5 mini                  OpenAI        63.04%  78.29%  48.57%  62.24%
#26  Qwen 2.5 Max                Alibaba Qwen  62.92%  77.58%  50.96%  60.20%
#27  Deepseek V3.1               Deepseek      62.07%  77.22%  50.48%  58.50%
#28  Llama 4 Maverick            Meta          61.52%  70.82%  55.24%  58.50%
#29  Command A                   Cohere        60.88%  72.24%  49.52%  60.88%
#30  Llama 3.3 70B Instruct OR   Meta          60.34%  73.67%  49.52%  57.82%
#31  Gemini 2.0 Flash Lite       Google        59.93%  71.68%  47.62%  60.48%
#32  Grok 2                      xAI           59.66%  78.29%  42.86%  57.82%
#33  Llama 3.1 405B Instruct OR  Meta          59.16%  72.24%  45.71%  59.52%
#34  GPT 4.1 mini                OpenAI        58.58%  70.11%  47.62%  58.02%
#35  Qwen Plus                   Alibaba Qwen  57.75%  73.31%  48.57%  51.36%
#36  Claude 3.5 Haiku 20241022   Anthropic     56.80%  70.82%  43.81%  55.78%
#37  Magistral Medium Latest     Mistral       56.40%  71.17%  37.14%  60.88%
#38  Mistral Small 3.1           Mistral       55.86%  68.33%  43.81%  55.44%
#39  Grok 4 Fast No Reasoning    xAI           55.56%  70.36%  38.10%  58.22%
#40  GPT 4o mini                 OpenAI        54.98%  70.46%  39.05%  55.44%
#41  Gemini 2.5 Flash Lite       Google        54.67%  65.84%  44.76%  53.40%
#42  Claude 4.5 Haiku            Anthropic     54.49%  67.62%  43.81%  52.04%
#43  GPT 5 nano                  OpenAI        53.15%  66.19%  37.14%  56.12%
#44  Mistral Small 3.2           Mistral       52.58%  67.62%  38.10%  52.04%
#45  Magistral Small Latest      Mistral       51.91%  63.70%  40.00%  52.04%
#46  GPT OSS 120B                OpenAI        51.61%  64.41%  43.81%  46.60%
#47  Gemma 3 27B IT OR           Google        51.01%  65.48%  40.95%  46.60%
#48  Llama 4 Scout               Meta          46.22%  58.72%  35.24%  44.71%
#49  GPT 4.1 nano                OpenAI        45.46%  61.57%  35.24%  39.59%
#50  Qwen 3 30B VL Instruct      Alibaba Qwen  44.86%  60.71%  32.38%  41.50%
#51  Gemma 3 12B IT OR           Google        38.39%  52.67%  26.67%  35.84%
#52  Llama 3.1 8B Instruct       Meta          34.97%  48.75%  24.76%  31.40%
#53  Qwen 3 8B                   Alibaba Qwen  31.84%  43.06%  22.86%  29.59%
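
The page does not label the four percentages listed for each model. In the rows sampled below, the first value appears to equal the unweighted mean of the other three, which would suggest an overall score followed by three sub-scores. That reading is an assumption, not something the page states. A minimal Python sketch of the check, with values copied from the leaderboard and a hypothetical helper name `gap`:

```python
from statistics import mean

# (model, first percentage, remaining three percentages), copied from the leaderboard.
# Treating the first value as an "overall" score is an assumption; the source
# does not label its score columns.
rows = [
    ("Gemini 3.0 Pro Preview", 83.30, [88.61, 84.76, 76.53]),
    ("GPT 5.1", 78.20, [85.77, 78.10, 70.75]),
    ("GPT 5", 78.16, [85.77, 75.24, 73.47]),
    ("Qwen 3 8B", 31.84, [43.06, 22.86, 29.59]),
]


def gap(first, rest):
    """Absolute difference between the first value and the mean of the rest."""
    return abs(first - mean(rest))


for model, first, rest in rows:
    print(f"{model}: listed {first:.2f}, mean of other three {mean(rest):.2f}, "
          f"gap {gap(first, rest):.3f}")
```

For these four rows the gap stays below 0.02 percentage points, which is consistent with rounding of the published figures. The other rows are not checked here.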