Resource · April 2026

LLM cheat sheet for Hermes + OpenClaw agents

April 2026 reference: tier colors, benchmark bars, and full copy for each model.

Multimodal routing guide

LLM model stack

18 models · 4 tiers · Hermes · OpenClaw · Claude Code · Codex — a multi-model routing guide for agent builders, start to finish.
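The four tiers above map naturally onto a simple task router. A minimal sketch in Python; the task categories and the choice of one default model per tier are illustrative assumptions, not part of this guide:

```python
# Tier-based router sketch. Model names come from this cheat sheet;
# the task→tier mapping below is purely illustrative.
TIER_DEFAULTS = {
    1: "Claude Opus 4.6",    # Frontier: complex reasoning, planning
    2: "MiniMax N2.7",       # Execution: tool calls, long task chains
    3: "Claude Sonnet 4.6",  # Balanced: day-to-day coding and research
    4: "Qwen 3.6-8B",        # Local/Micro: summaries, routing, $0 cost
}

TASK_TIER = {
    "plan": 1, "architecture": 1,
    "tool_call": 2, "pipeline": 2,
    "code": 3, "research": 3,
    "summarize": 4, "classify": 4,
}

def route(task_kind: str) -> str:
    """Return the default model for a task kind, falling back to Tier 3."""
    return TIER_DEFAULTS[TASK_TIER.get(task_kind, 3)]
```

Swapping a tier's default (say, GPT-5.4 for Tier 1) is a one-line change, which is the point of routing by tier rather than by model name.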

SystemPrompts · April 2026

Tier 1: Frontier · Tier 2: Execution · Tier 3: Balanced · Tier 4: Local
Tier 1 — Frontier: Complex reasoning · Strategy · Planning · External dev only

Claude Opus 4.6 · Anthropic · Feb 2026
Cost (1M in/out): $5 in · $25 out · Context: 1M
Key specs: Chain-of-thought, expert level
Top benchmarks: 93.4% · 84.6% · 53.0%
Best for: Complex terminal coding, multi-step planning
Why this model: Highest-tier reasoning for the hardest agent plans and code paths.

GPT-5.4 · OpenAI · Mar 2026
Cost (1M in/out): $2.50 in · $12.50 out · Context: 1.28M
Key specs: Dynamic MCT, superhuman desktop control
Top benchmarks: 92.7% · 75.6% · 62.8%
Best for: Autonomous execution, high-level agency
Why this model: Built for end-to-end autonomy with strong tool and desktop control.

GLM-5.1 · Zhipu AI · Apr 2026
Cost (1M in/out): $1.40 in · $4.40 out · Context: 200K
Key specs: 7b–60 total / 400 active MoE, Huawei chips
Top benchmarks: 91.4% · 85.3% · 68.7%
Best for: Long-horizon agentic routing
Why this model: MoE scale plus hardware-aware routing for sustained agent runs.
Tier 2 — Execution: Agent execution · Tool calls · Long task chains · Multi-step pipelines

MiniMax N2.7 · MiniMax
Cost (1M in/out): $0.30 in · $1.20 out · Context: 200K
Key specs: Self-evolving CoT, multi-agent loops
Top benchmarks: 88.2% · 85.1% · 57.0%
Best for: OpenClaw execution backbone
Why this model: Reliable execution layer for chained tools and agent loops.

Kimi K2.5 · Moonshot
Cost (1M in/out): $0.60 in · $3.00 out · Context: 256K
Key specs: 31 experts, 384 active, parallel agentic vision
Top benchmarks: 93.2% · 73.4% · 75.6%
Best for: Multi-source browsing
Why this model: Wide context and parallel vision for research-heavy agents.

Grok 4.20 · xAI
Cost (1M in/out): $2.00 in · $6.00 out · Context: 2M
Key specs: 8-agent parallel system, real-time X data
Top benchmarks: 90.0% · 82.0% · 70.1%
Best for: Real-time research
Why this model: Massive context plus live signal for attention-sensitive research.

DeepSeek V3.2 · DeepSeek
Cost (1M in/out): $0.27 in · $0.41 out · Context: 164K
Key specs: Multi-head latent attention, MLA optimized
Top benchmarks: 73.0% · 71.2%
Best for: Open-source power-user
Why this model: Efficient attention stack for heavy execution without frontier cost.
Tier 3 — Balanced: Context · Code · Research · Day-to-day tasks

Claude Sonnet 4.6 · Anthropic
Cost (1M in/out): $3 in · $15 out · Context: 1M
Key specs: Adaptive thinking, 40–60 active
Top benchmarks: 72.6% · 65.0% · 54.0%
Best for: Daily coding, content automation
Why this model: Default “always on” balance of quality, speed, and cost.

GPT-5.4 mini · OpenAI
Cost (1M in/out): $0.15 in · $4.50 out · Context: 400K
Key specs: Native vision, sub-agent optimized
Top benchmarks: 55.6% · 71.4% · 72.7%
Best for: High-speed chat, layer-2 chains
Why this model: Fast passes and sub-agents where full GPT-5.4 is overkill.

Gemini 1.1 Pro · Google
Cost (1M in/out): $2 in · $12 out · Context: 1M
Key specs: Native multimodal, video-audio-action
Top benchmarks: 72.3% · 84.3% · 81.2%
Best for: Multi-modal agents, video/audio analysis
Why this model: First-class media understanding for multimodal agent stacks.

Qwen 3.6 Plus · Alibaba · OpenRouter
Cost (1M in/out): $0 in · $0 out (via OpenRouter) · Context: 1M
Key specs: Hybrid MoE, 3.5 on steroids
Top benchmarks: 78.8%
Best for: Agent routing, Tier 3 tasking
Why this model: Free-tier routing workhorse with strong MoE throughput.

Llama 4 Maverick · Meta
Cost (1M in/out): $0.15 – $0.45 (provider-dependent) · Context: 1M
Key specs: 400B total, 1.2T parameters
Top benchmarks: 85.5% · 68.0%
Best for: Self-hosted Tier 3
Why this model: On-prem option that still feels like a mid-tier frontier model.

Mistral Small 4 · Mistral
Cost (1M in/out): $0.15 in · $0.60 out · Context: 256K
Key specs: Apache 2.0, reasoning-gated
Top benchmarks: 71.7% · 74.0% · 67.0%
Best for: Modern commerce, scaling
Why this model: License-friendly, low-latency scaling for product workloads.
Tier 4 — Local / Micro: Summaries · Routing · Classification · Always-on loops · $0 cost

Qwen 3.6-8B · Local
Cost (1M in/out): $0.00 (local) · Context: 252K
Key specs: Thinking toggle, multimodal
Top benchmarks: 81.7% · 73.3% · 57.7%
Best for: Summarization, routing
Why this model: Tiny footprint for 24/7 summarization and intent routing.

Qwen 3.6-27B · Local
Cost (1M in/out): $0.00 (local) · Context: 252K
Key specs: 32B dense, 255 languages
Top benchmarks: 83.6% · 77.1% · 72.5%
Best for: Local reasoning, micro-classification
Why this model: Step up in logic depth while staying entirely on-device.

Gemma 4 (31B) · Google · local
Cost (1M in/out): $0.00 (local) · Context: 256K
Key specs: 31B dense, Gemini 2.0 — QFT quantized
Top benchmarks: 83.2% · 85.0% · 76.9%
Best for: Local agentic sub-tasks
Why this model: Gemini-family behavior in a compact, quant-friendly package.

DeepSeek R1 Distill · DeepSeek · local
Cost (1M in/out): $0.00 (local) · Context: 128K
Key specs: 32B dense distilled from R1
Top benchmarks: 54.2% · 72.0% · 62.3%
Best for: Reasoning-heavy, logic-based tasks
Why this model: Distilled reasoning traces without calling the full R1 endpoint.

GLM-4.5-Air · Zhipu · SiliconFlow
Cost (1M in/out): Low (via SiliconFlow) · Context: 128K
Key specs: Multi-purpose, agent-focused
Top benchmarks: 71.0% · 68.0% · 55.0%
Best for: Lightweight agentic sub-tasks
Why this model: Near-free edge tier for browser helpers and micro-tools.
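With the full stack laid out, one common wiring is a cheapest-first escalation chain: try a Tier 4 local model, fall back to Tier 3, and reach for Tier 1 only when needed. A minimal sketch in Python, where `call_model` is a hypothetical stand-in for your actual client call and the three-model chain is just one plausible ordering:

```python
from typing import Callable, Optional

# Cheapest-first escalation order: Tier 4 → Tier 3 → Tier 1.
# Model names come from this sheet; the ordering is an example, not a rule.
ESCALATION_ORDER = ["Qwen 3.6-8B", "Claude Sonnet 4.6", "Claude Opus 4.6"]

def run_with_escalation(
    task: str,
    call_model: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the first usable answer, climbing tiers on failure (None)."""
    for model in ESCALATION_ORDER:
        answer = call_model(model, task)
        if answer is not None:
            return answer
    return None  # every tier failed; surface this to the caller
```

In practice the failure signal would be richer than `None` (timeouts, refusals, validation errors), but the escalation shape stays the same.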

Prices as of April 2026. Green = active in Hermes / OpenClaw. Red = Frontier / strategy. Blue = Balanced. Orange = Local.

Pricing and model IDs are illustrative; verify with providers before production routing.
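As a budgeting sanity check, per-call cost follows directly from the per-1M-token prices in the table. A small helper, with three prices copied from this sheet (illustrative, per the note above):

```python
# model: ($ per 1M input tokens, $ per 1M output tokens), from the table.
PRICES = {
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.4": (2.50, 12.50),
    "DeepSeek V3.2": (0.27, 0.41),
}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one call: tokens scaled against the per-1M prices."""
    price_in, price_out = PRICES[model]
    return tokens_in / 1_000_000 * price_in + tokens_out / 1_000_000 * price_out

# Example: 2,000 input + 500 output tokens on Claude Opus 4.6 → $0.0225.
```

The same arithmetic explains the tiering: the identical call on DeepSeek V3.2 costs well under a tenth of a cent, which is why routine execution rarely belongs on a frontier model.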