Framing Jailbreaks

Measures each model's performance against framing jailbreak attacks. (Higher score is better.)

Rank  Model                       Provider
#1    Claude 4.5 Opus             Anthropic      98.86%   96.95%   99.65%  100.00%
#2    GPT 5 nano                  OpenAI         98.55%   95.66%  100.00%  100.00%
#3    GPT 5 mini                  OpenAI         98.14%   96.89%   99.28%   98.25%
#4    GPT 5.2                     OpenAI         98.03%   96.92%   98.57%   98.62%
#5    Claude 4.6 Opus             Anthropic      97.11%   93.39%   99.65%   98.31%
#6    GPT 5.1                     OpenAI         96.67%   92.44%   98.23%   99.32%
#7    Claude 4.5 Sonnet           Anthropic      96.17%   91.00%   97.52%  100.00%
#8    Claude 4.5 Haiku            Anthropic      95.56%   88.10%   98.58%  100.00%
#9    GPT 5                       OpenAI         95.28%   93.85%   96.07%   95.92%
#10   Claude 4.1 Opus             Anthropic      94.55%   90.35%   93.97%   99.32%
#11   Claude 4.6 Sonnet           Anthropic      93.09%   90.47%   91.84%   96.96%
#12   Llama 3.1 405B Instruct OR  Meta           87.61%   76.21%   95.74%   90.88%
#13   Claude 3.5 Haiku 20241022   Anthropic      86.98%   79.74%   87.94%   93.24%
#14   Claude 3.7 Sonnet           Anthropic      86.93%   79.58%   88.30%   92.91%
#15   GPT OSS 120B                OpenAI         85.40%   75.36%   87.59%   93.24%
#16   Gemini 3.0 Pro Preview      Google         85.11%   76.54%   88.26%   90.54%
#17   Grok 4                      xAI            80.57%   83.28%   78.37%   80.07%
#18   Kimi K2.5                   Moonshot AI    76.71%   76.94%   70.82%   82.37%
#19   Gemini 3.1 Pro Preview      Google         74.91%   72.99%   73.76%   77.97%
#20   GPT 4o                      OpenAI         65.85%   59.65%   65.96%   71.96%
#21   Qwen 3 Max                  Alibaba Qwen   63.77%   68.17%   58.51%   64.63%
#22   Llama 3.3 70B Instruct OR   Meta           61.21%   50.96%   69.15%   63.51%
#23   Llama 3.1 8B Instruct       Meta           60.92%   62.70%   62.63%   57.43%
#24   GPT 4o mini                 OpenAI         59.32%   59.81%   55.32%   62.84%
#25   Qwen Plus                   Alibaba Qwen   59.22%   64.47%   55.71%   57.48%
#26   Gemini 2.5 Flash Lite       Google         56.91%   64.68%   49.65%   56.42%
#27   Llama 4 Maverick            Meta           55.12%   52.57%   63.48%   49.32%
#28   Gemini 2.5 Pro              Google         54.54%   59.16%   51.42%   53.04%
#29   Llama 4 Scout               Meta           53.67%   50.81%   50.00%   60.20%
#30   GPT 4.1 nano                OpenAI         53.41%   63.45%   42.20%   54.58%
#31   Gemini 2.5 Flash            Google         50.55%   56.84%   47.16%   47.64%
#32   Gemini 2.0 Flash Lite       Google         49.64%   57.33%   41.43%   50.17%
#33   Gemini 2.0 Flash            Google         48.90%   54.66%   45.74%   46.28%
#34   GPT 4.1                     OpenAI         47.49%   53.95%   43.26%   45.27%
#35   Gemma 3 27B IT OR           Google         46.25%   51.45%   39.01%   48.31%
#36   GPT 4.1 mini                OpenAI         45.94%   60.06%   38.21%   39.53%
#37   Gemma 3 12B IT OR           Google         45.81%   53.14%   40.07%   44.22%
#38   Qwen 2.5 Max                Alibaba Qwen   44.38%   50.80%   41.13%   41.22%
#39   Grok 4 Fast No Reasoning    xAI            43.86%   52.99%   41.99%   36.61%
#40   Deepseek R1 0528            Deepseek       43.58%   56.59%   39.01%   35.14%
#41   Qwen 3 8B                   Alibaba Qwen   41.68%   59.90%   34.40%   30.74%
#42   Grok 3 mini                 xAI            41.23%   60.16%   34.04%   29.49%
#43   Deepseek V3.1               Deepseek       40.62%   49.20%   36.52%   36.15%
#44   Qwen 3 30B VL Instruct      Alibaba Qwen   38.74%   48.78%   32.62%   34.80%
#45   Magistral Medium Latest     Mistral        37.38%   59.65%   31.56%   20.95%
#46   Deepseek V3 0324            Deepseek       35.82%   45.89%   30.14%   31.42%
#47   Mistral Large 2             Mistral        33.62%   43.25%   32.62%   25.00%
#48   Mistral Medium Latest       Mistral        33.24%   43.60%   31.90%   24.23%
#49   Command A                   Cohere         32.48%   41.16%   32.62%   23.65%
#50   Grok 3                      xAI            31.29%   48.63%   27.66%   17.57%
#51   Deepseek V3                 Deepseek       30.94%   44.61%   26.60%   21.62%
#52   Mistral Large 3             Mistral        28.08%   40.42%   24.91%   18.92%
#53   Mistral Small 3.2           Mistral        27.01%   42.77%   23.40%   14.86%
#54   Magistral Small Latest      Mistral        24.79%   43.73%   19.50%   11.15%
#55   Grok 2                      xAI            21.40%   36.33%   17.86%   10.00%
-     Mistral Small 3.1*          Mistral           N/A      N/A      N/A      N/A
-     Claude 3.5 Sonnet*          Anthropic         N/A      N/A      N/A      N/A
-     Gemini 1.5 Pro*             Google            N/A      N/A      N/A      N/A
* Models marked with an asterisk have partial scores.
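
The page does not label the four percentages listed for each model. One observation that may help when reading them: in the sampled rows below, the first percentage agrees with the arithmetic mean of the other three to within about 0.01 percentage points, and the ranking follows that first value. That reading is an assumption drawn from the numbers, not something the source states. A minimal Python sketch of the check, with the hypothetical helper names `mean_gap` and `max_gap`:

```python
# Rows copied from the leaderboard: (model, first percentage, other three percentages).
# Assumption: the first value is an aggregate of the other three. The source
# does not say this; the code only measures how closely the numbers agree.
ROWS = [
    ("Claude 4.5 Opus", 98.86, (96.95, 99.65, 100.00)),
    ("GPT 5 nano", 98.55, (95.66, 100.00, 100.00)),
    ("Grok 4", 80.57, (83.28, 78.37, 80.07)),
    ("GPT 4o", 65.85, (59.65, 65.96, 71.96)),
    ("Grok 2", 21.40, (36.33, 17.86, 10.00)),
]


def mean_gap(first, rest):
    """Absolute gap between the first value and the mean of the remaining values."""
    return abs(first - sum(rest) / len(rest))


def max_gap(rows):
    """Largest gap across all rows, in percentage points."""
    return max(mean_gap(first, rest) for _, first, rest in rows)


for model, first, rest in ROWS:
    # Print both numbers side by side so any disagreement is easy to spot.
    print(f"{model}: listed {first:.2f}, mean of other three {sum(rest) / len(rest):.4f}")
```

The same check can be extended to every ranked row; the three asterisked models have no numeric values to compare.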