Hacker News: submissions from marginlab.ai
Codex Daily Benchmarks for Degradation Tracking (Marginlab.ai) (marginlab.ai)
1 point by wendgeabos 44 days ago | past
Claude Code daily benchmarks for degradation tracking (marginlab.ai)
760 points by qwesr123 44 days ago | past | 355 comments
No one is evaluating AI coding agents in the way they are used (marginlab.ai)
1 point by qwesr123 60 days ago | past
Claude Code Daily Degradation Tracker (marginlab.ai)
3 points by qwesr123 64 days ago | past | 3 comments
Anatomy of a Coding Agent: A step-by-step illustration (marginlab.ai)
3 points by qwesr123 82 days ago | past
How are coding assistants evaluated? SWE-Bench Pro Explorer (marginlab.ai)
2 points by qwesr123 84 days ago | past
SWE-Bench: The $500B Benchmark (marginlab.ai)
5 points by qwesr123 86 days ago | past
