Claude Code vs OpenAI Codex CLI: Benchmark Comparison

Independent benchmark data · Real published scores only
Benchmark: SWE-bench Verified

Claude Code (Anthropic)
Trust Score V2: 71.5 (higher score)
95% CI: 68.0 – 75.0

OpenAI Codex CLI (OpenAI)
Trust Score V2: 68.4
95% CI: 64.9 – 71.9

[Chart: Score Comparison of Claude Code and OpenAI Codex CLI on Trust Score, Functional Accuracy, Reliability, and Policy Compliance; the values are listed under Key Metrics.]

Key Metrics
Metric                 Claude Code          OpenAI Codex CLI
Trust Score V2         71.5                 68.4
Functional Accuracy    72.5                 69.1
Reliability Score      70.1                 63.7
Policy Compliance      94.2                 90.1
SWE-bench Pass@1       0.7%                 0.7%
Benchmark              SWE-bench Verified   SWE-bench Verified
Last Evaluated         Mar 13, 2026         Mar 13, 2026
Model Base             Claude Opus 4        o3
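
The two Trust Score V2 values differ by 3.1 points, but their 95% confidence intervals overlap, so it helps to check how much of that gap could be noise. The sketch below is illustrative only: it assumes the published intervals are symmetric normal-approximation intervals and that the two evaluations are independent, neither of which the page states. The names `intervals_overlap`, `standard_error`, and `difference_z` are hypothetical helpers, not part of any benchmark tooling.

```python
import math

# Scores and 95% CI bounds copied from the comparison above.
claude = {"score": 71.5, "ci": (68.0, 75.0)}
codex = {"score": 68.4, "ci": (64.9, 71.9)}

Z95 = 1.96  # assumption: symmetric normal-approximation 95% intervals


def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def standard_error(ci, z=Z95):
    """Back out a standard error from a symmetric interval (assumed normal)."""
    low, high = ci
    return (high - low) / (2 * z)


def difference_z(a, b):
    """z statistic for the score gap, assuming independent evaluations."""
    se_diff = math.hypot(standard_error(a["ci"]), standard_error(b["ci"]))
    return (a["score"] - b["score"]) / se_diff


overlap = intervals_overlap(claude["ci"], codex["ci"])
gap = claude["score"] - codex["score"]
z = difference_z(claude, codex)
print(f"gap={gap:.1f} overlap={overlap} z={z:.2f}")
```

Under these assumptions the z statistic comes out at roughly 1.2, below the 1.96 threshold for a 95% two-sided test, so the headline gap alone would not establish a clear winner. Because both tools were scored on the same benchmark, a paired comparison over per-task results would be more appropriate if that data were available.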
