Two independent studies found that advanced AI cybersecurity models, including Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5, have exceeded previous benchmarks for autonomous cyberattack capability. Researchers from the UK AI Security Institute (AISI) and Palo Alto Networks said the models are now capable of chaining together complex multi-stage attack paths and identifying vulnerabilities at rates that significantly outpace earlier systems.
The UK AI Security Institute said Claude Mythos Preview and GPT-5.5 became the first models to fully complete a simulated enterprise intrusion scenario without human intervention. According to the findings, the models successfully executed tasks including credential theft, privilege escalation, lateral movement, persistence, and protected system access during controlled testing. Researchers said the models consistently outperformed previous-generation systems on autonomous cyber capability benchmarks designed to measure real-world offensive potential.
Separately, Palo Alto Networks said its internal testing showed advanced AI cyber models increased vulnerability discovery rates by more than seven times compared to traditional manual research workflows. Researchers said the models were particularly effective at identifying exploitable weaknesses in enterprise software, cloud configurations, and authentication systems, raising concerns that AI-assisted vulnerability discovery could dramatically accelerate exploit development timelines for both defenders and threat actors.
Josh Marpet, Senior Product Security Consultant, Finite State:
“Unfortunately, this is about as surprising as saying that the sun rises. Nobody was not expecting it. The question is not, can an AI find and run an exploit? We know they can. The question is, can an AI find vulnerable code in a device or application with very little instruction given, write or find the exploit for that vulnerability, and successfully prosecute the exploit through to completion? If the answer is yes, then we are having a bad day.
“The one interesting item is that the quality of the exploits, the discovery, the entire process, is still fairly dependent on the caliber of the person sitting behind the keyboard and directing that AI. For now.”
Damon Small, Board of Directors, Xcape, Inc.:
“The emergence of GPT-5.5 and Claude Mythos marks a paradigm shift where autonomous attack-path chaining moves from a theoretical lab risk to a quantifiable operational reality. When an AI can compress a twelve-hour expert reverse-engineering task into ten minutes for less than two dollars, the traditional economics of cyber defense collapse. This capability will inevitably commoditize the high-margin, bespoke manual testing currently sold by security consultancies, forcing a market pivot toward high-level strategy and remediation.
“While these models currently demonstrate low reliability, succeeding in only 20% to 30% of end-to-end attempts, that failure rate is irrelevant to a persistent attacker with near-zero marginal costs. Security leaders must move beyond patching individual vulnerabilities and focus on time-to-break-chain, assuming attackers will use these models to identify and exploit multi-stage paths at machine speed. The priority is no longer just preventing the initial foothold, but ensuring that every compromised node is a dead end through aggressive segmentation and just-in-time access.
- “Death of the Boutique Pen-Test: The automation of complex, multi-stage attack chains will rapidly drive down the cost and delivery time for offensive engagements, turning premium security services into a baseline commodity.
- “Asymmetry of Persistence: A 20% success rate is a failure for a human consultant but a triumph for an AI that never sleeps and costs pennies to restart, allowing attackers to “brute-force” complex architectural flaws.
- “Architectural Resilience vs. Patching: As vulnerability discovery outpaces human remediation capacity, the focus must shift from the “patching treadmill” to building environments that are structurally resistant to lateral movement.
“If your security posture relies on a $500-an-hour consultant to find the “bespoke” vulnerabilities that a $20-a-month chatbot just discovered in bulk, you aren’t paying for security; you are paying for an expensive PDF.”
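The persistence argument above comes down to simple arithmetic, which can be sketched as follows. This is a rough illustration, not something from either study: it assumes each attempt is independent and succeeds at the quoted 20% or 30% rate, and the function name and attempt counts are made up for the example.

```python
def chance_of_at_least_one_success(per_attempt_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumes every attempt is independent and has the same success rate,
    which is a simplification made for this sketch.
    """
    return 1.0 - (1.0 - per_attempt_rate) ** attempts


if __name__ == "__main__":
    # The 20% and 30% rates are the ones quoted above; the attempt counts are hypothetical.
    for rate in (0.20, 0.30):
        for attempts in (1, 5, 10, 20):
            p = chance_of_at_least_one_success(rate, attempts)
            print(f"per-attempt rate {rate:.0%}, {attempts:>2} attempts: {p:.1%}")
```

Under those assumptions, ten attempts at a 20% rate already give roughly an 89% chance of at least one full success, which is why a low per-attempt rate offers little comfort when retries are nearly free.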
Jacob Krell, Senior Director: Secure AI Solutions & Cybersecurity, Suzu Labs:
“Palo Alto’s advisory data puts real operational weight behind the AISI benchmarks. Going from fewer than five CVEs per month to 26 in a single advisory cycle, with the majority found by AI scanning, is a preview of what every software vendor will face once these models are widely deployed. The bottleneck has shifted from discovery to remediation, and most organizations are not built to patch at the rate AI finds vulnerabilities.
“Palo Alto estimates a three to five month window before AI driven exploits become the norm. That window is the planning figure security leaders should be working against. This capability is the new baseline, and because different models surface different vulnerability classes, the total volume of findings will only grow as more models reach this tier. Organizations running vulnerability management programs built for five CVEs a month need to start planning for a world where that number is measured in dozens.”
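To put the discovery-versus-remediation bottleneck in numbers, here is a minimal sketch of how a patch backlog grows when findings outpace fixes. The 26 and 5 figures echo the quote above, but treating 26 as a steady monthly rate and 5 as a fixed monthly patch capacity are both assumptions made purely for illustration.

```python
def backlog_by_month(months: int, found_per_month: int, fixed_per_month: int) -> list[int]:
    """Return the unpatched backlog at the end of each month.

    Hypothetical model: a constant number of new findings and a constant
    patch capacity every month, with the backlog never dropping below zero.
    """
    backlog = 0
    history = []
    for _ in range(months):
        backlog = max(0, backlog + found_per_month - fixed_per_month)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Assumed inputs: 26 findings a month against capacity sized for 5 a month.
    print(backlog_by_month(3, found_per_month=26, fixed_per_month=5))
```

With those assumed inputs the backlog grows by 21 every month, so a program sized for five CVEs a month falls further behind each cycle rather than catching up.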
Tom Yates, Product SME, Ridge Security Technology Inc.:
“These findings highlight the urgent and critical need for security companies to be at the leading edge of Gen AI technology. Security tooling must match the capabilities hackers use or your infrastructure will look like swiss cheese to the bad guys. But security buyers need to beware, an avalanche of AI-washing has already hit the market. Buyers must spend more time digging into product claims to ensure that AI is a first-class citizen of the solution, not a “bolt-on” to satisfy marketing needs.”
This is another example of AI welcoming us to the new reality of cybersecurity, where the time to get pwned has been reduced so much that humans are simply not even in the game. That should scare anyone on the defending side of the fence.
Trump’s AI oversight order exposes a gap: consumer social AI is flying under the radar
Posted in Commentary with tags AI, USA on May 21, 2026 by itnerd
As President Donald Trump moves to sign an executive order on AI oversight, the policy conversation is dominated by national security and enterprise risk, but consumer-facing AI platforms, where users are trusting AI with something as personal as their social lives and relationships, are barely part of the debate. The order raises a critical question: who sets the standard for emotional safety, transparency, and user consent in AI that mediates human connection?
Gidi Cohen, CEO & Co-founder, Bonfy.AI had this to say:
“The reported shift toward federal oversight of frontier AI models reflects something the security community has been watching develop for some time: the recognition that AI systems are no longer just productivity tools — they are infrastructure.
What’s notable about this moment isn’t the regulatory instinct. It’s what’s driving it. Reports of AI models autonomously discovering software exploits and scaling cyber operations aren’t abstract risks. They’re demonstrations of the same challenge we see playing out inside enterprises every day: AI systems that behave in ways their deployers didn’t anticipate, at speeds that outpace human review.
At Bonfy, we call this the “Shady AI” problem — not unauthorized AI, but sanctioned AI behaving in ways that violate policy or intent. The national security version of this problem is just the frontier model at civilizational scale.
The instinct to require pre-release government review of frontier models makes sense if you frame it the way Washington now appears to: as dual-use technology with offensive capability, not software. But a 90-day review window won’t solve the underlying challenge. The risk isn’t just in what a model can do before deployment — it’s in how it behaves when embedded in workflows, connected to tools and data, and operating semi-autonomously at machine speed.
That’s the architectural reality facing enterprise security teams today, and it’s why data security can no longer rely on perimeter controls and metadata. When AI agents are the actors, you need visibility into the data flowing through them — not just the permissions around them.
The government is arriving at a conclusion that security practitioners have been working through in parallel: that AI requires a different kind of oversight, one grounded in behavior and context, not just access configuration.”
For measures to be effective, they have to cover as many use cases as possible. This measure doesn’t do that, which means it may not have the intended effect at the end of the day.