Leaderboard

    Best AI for Agentic Tasks 2026.

    Find the best AI for agentic tasks and tool use. Models are ranked by OSWorld, ToolAthlon, MCP Atlas, tau-bench, BrowseComp, and other agent benchmarks.

    Claude Opus 4.6
    Anthropic
    0.8% 0.7% 0.6%
    Claude Sonnet 4.6
    Anthropic
    0.7% 0.7% 0.6%
    Qwen3 VL 235B A22B Instruct (OSS)
    Alibaba Cloud / Qwen Team
    0.6% 0.7%
    Claude Opus 4.5
    Anthropic
    0.7% 0.6%
    Claude Sonnet 4.5
    Anthropic
    0.6% 0.9%
    Claude Haiku 4.5
    Anthropic
    0.5%
    Qwen3 VL 235B A22B Thinking (OSS)
    Alibaba Cloud / Qwen Team
    0.6% 0.4%
    Qwen3 VL 8B Thinking (OSS)
    Alibaba Cloud / Qwen Team
    0.5% 0.3%
    Qwen3 VL 8B Instruct (OSS)
    Alibaba Cloud / Qwen Team
    0.5% 0.3%
    Qwen3 VL 4B Thinking (OSS)
    Alibaba Cloud / Qwen Team
    0.5% 0.3%
    Qwen3 VL 30B A3B Thinking (OSS)
    Alibaba Cloud / Qwen Team
    0.6% 0.3%
    Qwen3 VL 30B A3B Instruct (OSS)
    Alibaba Cloud / Qwen Team
    0.6% 0.3%
    Qwen3 VL 4B Instruct (OSS)
    Alibaba Cloud / Qwen Team
    0.6% 0.3%
    ChatGPT-4o Latest
    OpenAI
    Claude 3 Haiku
    Anthropic
    Claude 3 Opus
    Anthropic
    Claude 3 Sonnet
    Anthropic
    Claude 3.5 Haiku
    Anthropic
    0.5%
    Claude 3.5 Sonnet
    Anthropic
    0.7%
    Claude 3.7 Sonnet
    Anthropic
    0.8%
    Claude Opus 4
    Anthropic
    0.8%
    Claude Opus 4.1
    Anthropic
    0.8%
    Claude Sonnet 4
    Anthropic
    0.8%
    Command R+ (OSS)
    Cohere
    DeepSeek R1 Distill Llama 70B (OSS)
    DeepSeek
    DeepSeek R1 Distill Qwen 32B (OSS)
    DeepSeek
    DeepSeek-R1 (OSS)
    DeepSeek
    DeepSeek-R1-0528 (OSS)
    DeepSeek
    0.1%
    DeepSeek-V2.5 (OSS)
    DeepSeek
    DeepSeek-V3 (OSS)
    DeepSeek
    DeepSeek-V3 0324 (OSS)
    DeepSeek
    DeepSeek-V3.1 (OSS)
    DeepSeek
    0.3%
    DeepSeek-V3.2 (Non-thinking) (OSS)
    DeepSeek
    DeepSeek-V3.2-Exp (OSS)
    DeepSeek
    0.4%
    Devstral Medium
    Mistral AI
    Devstral Small 1.1 (OSS)
    Mistral AI
    ERNIE 4.5
    Baidu
    Gemini 1.0 Pro
    Google
    Gemini 1.5 Flash
    Google
    Gemini 1.5 Flash 8B
    Google
    Gemini 1.5 Pro
    Google
    Gemini 2.0 Flash
    Google
    Gemini 2.0 Flash-Lite
    Google
    Gemini 2.5 Flash
    Google
    Gemini 2.5 Flash-Lite (OSS)
    Google
    Gemini 2.5 Pro
    Google
    Gemini 2.5 Pro Preview 06-05
    Google
    Gemini 3 Flash
    Google
    0.7% 0.5% 0.6%
    Gemini 3 Pro
    Google
    0.7%
    Gemini 3.1 Flash-Lite
    Google
    Showing 150 of 174 models

    All Large Language Models

    Amazon

    3 models

    Baidu

    1 model

    Cohere

    1 model

    Inception

    1 model

    LG AI Research

    1 model

    Nous Research

    1 model

    StepFun

    1 model

    Xiaomi

    1 model
    Best AI for Agentic Tasks 2026 — AI Agent Leaderboard | AnotherWrapper