Resource · April 2026

LLM cheat sheet for Hermes + OpenClaw agents

April 2026 reference: tier colors, benchmark bars, and full copy for each model.

Multimodal routing guide

LLM model stack

18 models · 4 tiers · Hermes · OpenClaw · Claude Code · Codex — a multi-model routing guide for agent builders, start to finish.
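The four tiers above map naturally onto a simple task router. A minimal sketch in Python; the task categories and the choice of one default model per tier are illustrative assumptions, not part of this guide:

```python
# Tier-based router sketch. Model names come from this cheat sheet;
# the task→tier mapping below is purely illustrative.
TIER_DEFAULTS = {
    1: "Claude Opus 4.6",    # Frontier: complex reasoning, planning
    2: "MiniMax N2.7",       # Execution: tool calls, long task chains
    3: "Claude Sonnet 4.6",  # Balanced: day-to-day coding and research
    4: "Qwen 3.6-8B",        # Local/Micro: summaries, routing, $0 cost
}

TASK_TIER = {
    "plan": 1, "architecture": 1,
    "tool_call": 2, "pipeline": 2,
    "code": 3, "research": 3,
    "summarize": 4, "classify": 4,
}

def route(task_kind: str) -> str:
    """Return the default model for a task kind, falling back to Tier 3."""
    return TIER_DEFAULTS[TASK_TIER.get(task_kind, 3)]
```

Swapping a tier's default (say, GPT-5.4 for Tier 1) is a one-line change, which is the point of routing by tier rather than by model name.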

SystemPrompts · April 2026

Tier 1: Frontier · Tier 2: Execution · Tier 3: Balanced · Tier 4: Local
Tier 1 — Frontier: Complex reasoning · Strategy · Planning · External dev only

Claude Opus 4.6 · Anthropic · Feb 2026
Cost (1M in/out): $5 in · $25 out · Context: 1M
Key specs: Chain-of-thought, expert level
Top benchmarks: 93.4% · 84.6% · 53.0%
Best for: Complex terminal coding, multi-step planning
Why this model: Highest-tier reasoning for the hardest agent plans and code paths.

GPT-5.4 · OpenAI · Mar 2026
Cost (1M in/out): $2.50 in · $12.50 out · Context: 1.28M
Key specs: Dynamic MCT, superhuman desktop control
Top benchmarks: 92.7% · 75.6% · 62.8%
Best for: Autonomous execution, high-level agency
Why this model: Built for end-to-end autonomy with strong tool and desktop control.

GLM-5.1 · Zhipu AI · Apr 2026
Cost (1M in/out): $1.40 in · $4.40 out · Context: 200K
Key specs: 7b–60 total / 400 active MoE, Huawei chips
Top benchmarks: 91.4% · 85.3% · 68.7%
Best for: Long-horizon agentic routing
Why this model: MoE scale plus hardware-aware routing for sustained agent runs.
Tier 2 — Execution: Agent execution · Tool calls · Long task chains · Multi-step pipelines

MiniMax N2.7 · MiniMax
Cost (1M in/out): $0.30 in · $1.20 out · Context: 200K
Key specs: Self-evolving CoT, multi-agent loops
Top benchmarks: 88.2% · 85.1% · 57.0%
Best for: OpenClaw execution backbone
Why this model: Reliable execution layer for chained tools and agent loops.

Kimi K2.5 · Moonshot
Cost (1M in/out): $0.60 in · $3.00 out · Context: 256K
Key specs: 31 experts, 384 active, parallel agentic vision
Top benchmarks: 93.2% · 73.4% · 75.6%
Best for: Multi-source browsing
Why this model: Wide context and parallel vision for research-heavy agents.

Grok 4.20 · xAI
Cost (1M in/out): $2.00 in · $6.00 out · Context: 2M
Key specs: 8-agent parallel system, real-time X data
Top benchmarks: 90.0% · 82.0% · 70.1%
Best for: Real-time research
Why this model: Massive context plus live signal for attention-sensitive research.

DeepSeek V3.2 · DeepSeek
Cost (1M in/out): $0.27 in · $0.41 out · Context: 164K
Key specs: Multi-head latent attention, MLA optimized
Top benchmarks: 73.0% · 71.2%
Best for: Open-source power-user
Why this model: Efficient attention stack for heavy execution without frontier cost.
Tier 3 — Balanced: Context · Code · Research · Day-to-day tasks

Claude Sonnet 4.6 · Anthropic
Cost (1M in/out): $3 in · $15 out · Context: 1M
Key specs: Adaptive thinking, 40–60 active
Top benchmarks: 72.6% · 65.0% · 54.0%
Best for: Daily coding, content automation
Why this model: Default “always on” balance of quality, speed, and cost.

GPT-5.4 mini · OpenAI
Cost (1M in/out): $0.15 in · $4.50 out · Context: 400K
Key specs: Native vision, sub-agent optimized
Top benchmarks: 55.6% · 71.4% · 72.7%
Best for: High-speed chat, layer-2 chains
Why this model: Fast passes and sub-agents where full GPT-5.4 is overkill.

Gemini 1.1 Pro · Google
Cost (1M in/out): $2 in · $12 out · Context: 1M
Key specs: Native multimodal, video-audio-action
Top benchmarks: 72.3% · 84.3% · 81.2%
Best for: Multi-modal agents, video/audio analysis
Why this model: First-class media understanding for multimodal agent stacks.

Qwen 3.6 Plus · Alibaba · OpenRouter
Cost (1M in/out): $0 in · $0 out (via OpenRouter) · Context: 1M
Key specs: Hybrid MoE, 3.5 on steroids
Top benchmarks: 78.8%
Best for: Agent routing, Tier 3 tasking
Why this model: Free-tier routing workhorse with strong MoE throughput.

Llama 4 Maverick · Meta
Cost (1M in/out): $0.15 – $0.45 (provider-dependent) · Context: 1M
Key specs: 400B total, 1.2T parameters
Top benchmarks: 85.5% · 68.0%
Best for: Self-hosted Tier 3
Why this model: On-prem option that still feels like a mid-tier frontier model.

Mistral Small 4 · Mistral
Cost (1M in/out): $0.15 in · $0.60 out · Context: 256K
Key specs: Apache 2.0, reasoning-gated
Top benchmarks: 71.7% · 74.0% · 67.0%
Best for: Modern commerce, scaling
Why this model: License-friendly, low-latency scaling for product workloads.
Tier 4 — Local / Micro: Summaries · Routing · Classification · Always-on loops · $0 cost

Qwen 3.6-8B · Local
Cost (1M in/out): $0.00 (local) · Context: 252K
Key specs: Thinking toggle, multimodal
Top benchmarks: 81.7% · 73.3% · 57.7%
Best for: Summarization, routing
Why this model: Tiny footprint for 24/7 summarization and intent routing.

Qwen 3.6-27B · Local
Cost (1M in/out): $0.00 (local) · Context: 252K
Key specs: 32B dense, 255 languages
Top benchmarks: 83.6% · 77.1% · 72.5%
Best for: Local reasoning, micro-classification
Why this model: Step up in logic depth while staying entirely on-device.

Gemma 4 (31B) · Google · local
Cost (1M in/out): $0.00 (local) · Context: 256K
Key specs: 31B dense, Gemini 2.0 — QFT quantized
Top benchmarks: 83.2% · 85.0% · 76.9%
Best for: Local agentic sub-tasks
Why this model: Gemini-family behavior in a compact, quant-friendly package.

DeepSeek R1 Distill · DeepSeek · local
Cost (1M in/out): $0.00 (local) · Context: 128K
Key specs: 32B dense distilled from R1
Top benchmarks: 54.2% · 72.0% · 62.3%
Best for: Reasoning-heavy, logic-based tasks
Why this model: Distilled reasoning traces without calling the full R1 endpoint.

GLM-4.5-Air · Zhipu · SiliconFlow
Cost (1M in/out): Low (via SiliconFlow) · Context: 128K
Key specs: Multi-purpose, agent-focused
Top benchmarks: 71.0% · 68.0% · 55.0%
Best for: Lightweight agentic sub-tasks
Why this model: Near-free edge tier for browser helpers and micro-tools.
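With the full stack laid out, one common wiring is a cheapest-first escalation chain: try a Tier 4 local model, fall back to Tier 3, and reach for Tier 1 only when needed. A minimal sketch in Python, where `call_model` is a hypothetical stand-in for your actual client call and the three-model chain is just one plausible ordering:

```python
from typing import Callable, Optional

# Cheapest-first escalation order: Tier 4 → Tier 3 → Tier 1.
# Model names come from this sheet; the ordering is an example, not a rule.
ESCALATION_ORDER = ["Qwen 3.6-8B", "Claude Sonnet 4.6", "Claude Opus 4.6"]

def run_with_escalation(
    task: str,
    call_model: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the first usable answer, climbing tiers on failure (None)."""
    for model in ESCALATION_ORDER:
        answer = call_model(model, task)
        if answer is not None:
            return answer
    return None  # every tier failed; surface this to the caller
```

In practice the failure signal would be richer than `None` (timeouts, refusals, validation errors), but the escalation shape stays the same.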

Prices as of April 2026. Green = active in Hermes / OpenClaw. Red = Frontier / strategy. Blue = Balanced. Orange = Local.

Pricing and model IDs are illustrative; verify with providers before production routing.
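As a budgeting sanity check, per-call cost follows directly from the per-1M-token prices in the table. A small helper, with three prices copied from this sheet (illustrative, per the note above):

```python
# model: ($ per 1M input tokens, $ per 1M output tokens), from the table.
PRICES = {
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.4": (2.50, 12.50),
    "DeepSeek V3.2": (0.27, 0.41),
}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one call: tokens scaled against the per-1M prices."""
    price_in, price_out = PRICES[model]
    return tokens_in / 1_000_000 * price_in + tokens_out / 1_000_000 * price_out

# Example: 2,000 input + 500 output tokens on Claude Opus 4.6 → $0.0225.
```

The same arithmetic explains the tiering: the identical call on DeepSeek V3.2 costs well under a tenth of a cent, which is why routine execution rarely belongs on a frontier model.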