---
title: "UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do — Stuff That Spins"
description: "In a study covering seven benchmarks, the UK's AI Security Institute shows that standard AI evaluations systematically underestimate agent capabilities by capping the compute budget. On software engineering tasks, success rates jumped about 25 percent when the token budget was increased tenfold. Ne…"
canonical: "https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do"
html: "https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do"
json: "https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do.json"
markdown: "https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do.md"
keywords: ["SpinGraph", "spin analysis", "GEO"]
date: "2026-07-03T16:14:44+00:00"
modified: "2026-07-03T17:03:15.554527+00:00"
json_ld: |
  {"@context":"https://schema.org","@graph":[{"@type":"NewsArticle","@id":"https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do#article","headline":"UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do","description":"In a study covering seven benchmarks, the UK's AI Security Institute shows that standard AI evaluations systematically underestimate agent capabilities by capping the compute budget. On software engineering tasks, success rates jumped about 25 percent when the token budget was increased tenfold. Ne…","datePublished":"2026-07-03T16:14:44+00:00","dateModified":"2026-07-03T17:03:15.554527+00:00","url":"https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do","mainEntityOfPage":{"@type":"WebPage","@id":"https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do"},"isAccessibleForFree":true,"inLanguage":"en-US","articleSection":"ai","author":{"@type":"Organization","name":"Stuff That Spins"},"publisher":{"@id":"https://stuffthatspins.com/#organization"},"citation":"https://the-decoder.com/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do/","about":[],"mentions":[]},{"@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"name":"Stuff That Spins","item":"https://stuffthatspins.com/"},{"@type":"ListItem","position":2,"name":"UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do","item":"https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do"}]}]}
---

# UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do

**Source:** The Decoder (the-decoder.com)  
**Published:** July 3, 2026  
**Original:** https://the-decoder.com/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do/  

---
*HTML version: https://stuffthatspins.com/spin/uks-ai-security-institute-finds-standard-benchmarks-systematically-underestimate-what-ai-agents-can-actually-do*