Two independent studies found that advanced AI cybersecurity models, including Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5, have exceeded previous benchmarks for autonomous cyberattack capability. Researchers from the UK AI Security Institute (AISI) and Palo Alto Networks said the models are now capable of chaining together complex multi-stage attack paths and identifying vulnerabilities at rates that significantly outpace earlier systems.
The UK AI Security Institute said Claude Mythos Preview and GPT-5.5 became the first models to fully complete a simulated enterprise intrusion scenario without human intervention. According to the findings, the models successfully executed tasks including credential theft, privilege escalation, lateral movement, persistence, and protected system access during controlled testing. Researchers said the models consistently outperformed previous-generation systems on autonomous cyber capability benchmarks designed to measure real-world offensive potential.
Separately, Palo Alto Networks said its internal testing showed advanced AI cyber models increased vulnerability discovery rates by more than seven times compared to traditional manual research workflows. Researchers said the models were particularly effective at identifying exploitable weaknesses in enterprise software, cloud configurations, and authentication systems, raising concerns that AI-assisted vulnerability discovery could dramatically accelerate exploit development timelines for both defenders and threat actors.
Josh Marpet, Senior Product Security Consultant, Finite State:
“Unfortunately, this is about as surprising as saying that the sun rises. Nobody was not expecting it. The question is not, can an AI find and run an exploit? We know they can. The question is, can an AI find vulnerable code in a device or application with very little instruction given, write or find the exploit for that vulnerability, and successfully prosecute the exploit through to completion? If the answer is yes, then we are having a bad day.
“The one interesting item is that the quality of the exploits, the discovery, the entire process, is still fairly dependent on the caliber of the person sitting behind the keyboard and directing that AI. For now.”
Damon Small, Board of Directors, Xcape, Inc.:
“The emergence of GPT-5.5 and Claude Mythos marks a paradigm shift where autonomous attack-path chaining moves from a theoretical lab risk to a quantifiable operational reality. When an AI can compress a twelve-hour expert reverse-engineering task into ten minutes for less than two dollars, the traditional economics of cyber defense collapse. This capability will inevitably commoditize the high-margin, bespoke manual testing currently sold by security consultancies, forcing a market pivot toward high-level strategy and remediation.
“While these models currently demonstrate low reliability, succeeding in only 20% to 30% of end-to-end attempts, that failure rate is irrelevant to a persistent attacker with near-zero marginal costs. Security leaders must move beyond patching individual vulnerabilities and focus on time-to-break-chain, assuming attackers will use these models to identify and exploit multi-stage paths at machine speed. The priority is no longer just preventing the initial foothold, but ensuring that every compromised node is a dead end through aggressive segmentation and just-in-time access.
- “Death of the Boutique Pen-Test: The automation of complex, multi-stage attack chains will rapidly drive down the cost and delivery time for offensive engagements, turning premium security services into a baseline commodity.
- “Asymmetry of Persistence: A 20% success rate is a failure for a human consultant but a triumph for an AI that never sleeps and costs pennies to restart, allowing attackers to “brute-force” complex architectural flaws.
- “Architectural Resilience vs. Patching: As vulnerability discovery outpaces human remediation capacity, the focus must shift from the “patching treadmill” to building environments that are structurally resistant to lateral movement.
“If your security posture relies on a $500-an-hour consultant to find the “bespoke” vulnerabilities that a $20-a-month chatbot just discovered in bulk, you aren’t paying for security; you are paying for an expensive PDF.”
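To put Small's persistence argument in rough numbers, here is a minimal sketch in Python. It assumes each end-to-end attempt is independent and succeeds with the 20% to 30% probability quoted above. That independence is a simplifying assumption of this sketch, not something the studies claim, and the function names are illustrative only.

```python
import math


def p_at_least_one_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def attempts_for_confidence(p: float, target: float) -> int:
    """Smallest number of independent tries whose combined success
    probability is at least `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Per-attempt success rates taken from the quote; independence is assumed.
for p in (0.20, 0.30):
    print(
        f"per-attempt success {p:.0%}: "
        f"10 attempts -> {p_at_least_one_success(p, 10):.1%}, "
        f"attempts needed for 95% -> {attempts_for_confidence(p, 0.95)}"
    )
```

Under those assumptions, a 20% per-attempt rate reaches roughly 89% cumulative success after 10 tries and passes 95% at 14 tries, while a 30% rate passes 95% at 9 tries. Real attempts against the same target are unlikely to be fully independent, so treat these figures as an illustration of the asymmetry rather than a forecast.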
Jacob Krell, Senior Director: Secure AI Solutions & Cybersecurity, Suzu Labs:
“Palo Alto’s advisory data puts real operational weight behind the AISI benchmarks. Going from fewer than five CVEs per month to 26 in a single advisory cycle, with the majority found by AI scanning, is a preview of what every software vendor will face once these models are widely deployed. The bottleneck has shifted from discovery to remediation, and most organizations are not built to patch at the rate AI finds vulnerabilities.
“Palo Alto estimates a three to five month window before AI driven exploits become the norm. That window is the planning figure security leaders should be working against. This capability is the new baseline, and because different models surface different vulnerability classes, the total volume of findings will only grow as more models reach this tier. Organizations running vulnerability management programs built for five CVEs a month need to start planning for a world where that number is measured in dozens.”
Tom Yates, Product SME, Ridge Security Technology Inc.:
“These findings highlight the urgent and critical need for security companies to be at the leading edge of Gen AI technology. Security tooling must match the capabilities hackers use or your infrastructure will look like swiss cheese to the bad guys. But security buyers need to beware, an avalanche of AI-washing has already hit the market. Buyers must spend more time digging into product claims to ensure that AI is a first-class citizen of the solution, not a “bolt-on” to satisfy marketing needs.”
This is another example of AI welcoming us to the new reality of cybersecurity, where the time to get pwned has been reduced so much that humans are simply not even in the game. That should scare anyone on the defending side of the fence.
Finite State CEO Matt Wyckhouse to Lead Expert Panel on “Designing Connected Devices with Security Built In” at IoT Tech Expo North America
Posted in Commentary with tags Finite State on May 15, 2026 by itnerd

Finite State today announced that Founder and CEO Matt Wyckhouse will lead a panel on “Designing Connected Devices with Security Built In” at 2:35–3:20 p.m. PT, May 19, 2026, at IoT Tech Expo North America. The conference, one of the industry’s largest gatherings focused on IoT, AI, cybersecurity, edge computing, and digital transformation, will be held at the San Jose McEnery Convention Center in San Jose, California, on May 18–19, 2026.
Leaders from Boeing, SmartTech Research, and EVRaid will join Wyckhouse in examining the evolving security risks facing connected devices, industrial systems, and software-driven products as AI adoption and regulatory pressure accelerate across the IoT ecosystem. Key focus areas of the panel include:
Attendees will gain new insight into how manufacturers and technology providers can improve visibility into device software, manage vulnerabilities across product lines, and prepare for rapidly expanding global cybersecurity regulations impacting connected products.
Finite State Live Demonstrations
At Booth #47, the Finite State team will run live demonstrations of artifact-backed workflows that turn shipped software into audit-ready evidence:
Securing What Ships, at Portfolio Scale
The panel reflects a market reality: for connected-device manufacturers, securing what ships has become a portfolio-scale problem. Vendored components embedded in firmware rarely appear in source manifests, vulnerability volume outpaces manual triage, and the EU Cyber Resilience Act now enforces incident disclosure timelines as short as 24 hours.
The Finite State Product Security OS closes that gap by analyzing shipped firmware directly, prioritizing reachable risks over raw CVE counts, and maintaining one evidence trail per product version. Teams across medical, automotive, industrial, and consumer IoT use it to keep security and compliance current at release cadence. To discuss these capabilities directly, attendees interested in meeting with the Finite State team during the event can book a meeting on the Finite State IoT Tech Expo event page.