Harmful misguidance
Measures the model's ability to recognize potentially dangerous situations and provide appropriate warnings when users describe risky behaviors, rather than normalizing or reinforcing harmful actions. (Higher score is better.)
| Rank | Model | Provider | Overall | | | |
|---|---|---|---|---|---|---|
| #1 | Gemini 1.5 Pro | Google | 96.84% | 97.39% | 96.11% | 97.04% |
| #2 | Claude 3.7 Sonnet | Anthropic | 95.52% | 97.00% | 95.51% | 94.06% |
| #3 | Claude 3.5 Sonnet | Anthropic | 95.40% | 97.39% | 95.13% | 93.67% |
| #4 | Claude 3.5 Haiku | Anthropic | 95.36% | 96.64% | 94.73% | 94.73% |
| #5 | Gemini 2.0 Flash | Google | 94.30% | 94.03% | 92.70% | 96.18% |
| #6 | DeepSeek V3 (0324) | DeepSeek | 92.80% | 94.57% | 91.89% | 91.93% |
| #7 | GPT-4o | OpenAI | 92.66% | 95.15% | 91.48% | 91.35% |
| #8 | Grok 2 | xAI | 91.44% | 93.10% | 89.86% | 91.35% |
| #9 | Gemma 3 27B | Google | 91.36% | 96.64% | 87.80% | 89.64% |
| #10 | Mistral Small 3.1 24B | Mistral | 90.91% | 94.03% | 88.44% | 90.27% |
| #11 | Qwen 2.5 Max | Alibaba Qwen | 89.89% | 92.16% | 86.35% | 91.14% |
| #12 | Mistral Large | Mistral | 89.38% | 93.10% | 85.60% | 89.45% |
| #13 | Llama 4 Maverick | Meta | 89.25% | 85.26% | 89.86% | 92.62% |
| #14 | DeepSeek V3 | DeepSeek | 89.00% | 90.11% | 86.82% | 90.08% |
| #15 | Llama 3.1 405B | Meta | 86.49% | 85.58% | 84.90% | 89.01% |
| #16 | Llama 3.3 70B | Meta | 86.04% | 83.96% | 85.77% | 88.40% |
| #17 | GPT-4o mini | OpenAI | 77.29% | 84.89% | 75.25% | 71.73% |
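
The four score columns carry no labels in the table, so how they relate is not stated. A minimal sketch for checking two readings that the numbers suggest: that the ranking follows the first score column, and that the first score column sits close to the plain mean of the other three. Both readings are assumptions drawn from the figures, not documented facts, and the helper names (`parse_rows`, `ranking_is_consistent`, `gap_to_mean`) are hypothetical. Only three rows are copied here to keep the example short.

```python
# Three rows copied verbatim from the table above.
TABLE = """\
| #1 | Gemini 1.5 Pro | Google | 96.84% | 97.39% | 96.11% | 97.04% |
| #2 | Claude 3.7 Sonnet | Anthropic | 95.52% | 97.00% | 95.51% | 94.06% |
| #17 | GPT-4o mini | OpenAI | 77.29% | 84.89% | 75.25% | 71.73% |
"""


def parse_rows(table):
    """Turn pipe-delimited table rows into dicts with numeric scores."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        rows.append({
            "rank": int(cells[0].lstrip("#")),
            "model": cells[1],
            "provider": cells[2],
            # Every remaining cell is a percentage such as "96.84%".
            "scores": [float(c.rstrip("%")) for c in cells[3:]],
        })
    return rows


def ranking_is_consistent(rows):
    """True if the first score column never increases as rank increases."""
    firsts = [r["scores"][0] for r in sorted(rows, key=lambda r: r["rank"])]
    return all(a >= b for a, b in zip(firsts, firsts[1:]))


def gap_to_mean(row):
    """Absolute gap between the first score and the mean of the others.

    Assumption: the first column is an aggregate of the remaining three.
    """
    first, *rest = row["scores"]
    return abs(first - sum(rest) / len(rest))


rows = parse_rows(TABLE)
for r in rows:
    print(r["model"], ranking_is_consistent(rows), round(gap_to_mean(r), 4))
```

A small gap for every row would support the aggregate reading without proving it; a weighted average over unequal sample counts would produce similar values.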