Why AI Testing Programs Stall - And What the 15% Who Scale Actually Do Differently

Last Updated: June 5th 2026

If you’re in QA or engineering leadership right now, you’ve probably said some version of this in the last twelve months: “We’re working on AI-powered testing.”

You’re not wrong. According to Katalon’s 2025 State of Software Quality Report, 72% of QA teams actively use AI for test generation or script optimisation. 82% say AI will be critical to their future. The intention is nearly universal.

But the World Quality Report 2025–26 tells a different story about what’s happening at scale: only 15% of organisations have operationalised AI across their QA function. The rest are stuck somewhere in between – running pilots, evaluating tools, or using AI in one corner of the process while everything else runs the same way it did three years ago.

That gap – between 72% experimenting and 15% scaled – is worth taking seriously. The teams stuck in it aren’t failing for lack of effort. They’re failing because the foundations weren’t built for what AI-powered QA actually requires.

The Real Reason Automation Programs Stall

Most people assume the barrier to scaling AI in QA is skill. Prompt engineering, ML expertise, people who understand LLMs. That’s part of it.

But when you talk to the engineering managers and QA leads still stuck in the gap, a different answer comes up far more often: maintenance.

Not test creation. Maintenance.

Tests break. A developer changes a button label. A designer refactors the nav. A backend team renames an API endpoint. Suddenly 40 tests are failing – not because anything is wrong with the product, but because the selectors the tests were built on no longer match reality.

Someone has to fix them. That someone is usually the same engineer who was supposed to be expanding test coverage, writing new automation, and evaluating AI tools. They’re now spending Tuesday and Wednesday keeping last quarter’s tests alive.

This is the maintenance trap. And it helps explain why 82% of testers still do manual testing daily even while automation budgets double. They’re not choosing manual over automated. They’re doing manual because the automated suite is broken again.

What CI/CD Velocity Does to This Problem

The maintenance problem isn’t new. But CI/CD has made it dramatically worse.

When teams shipped monthly, a broken test suite was an inconvenience. When teams ship weekly or daily – which is now the norm per DORA’s Accelerate State of DevOps 2024 – a brittle test suite is a blocker. Tests that break on every other commit don’t get fixed. They get disabled. Coverage quietly shrinks while the dashboard still shows green.
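To make the "green dashboard, shrinking coverage" effect concrete, here is a minimal Python sketch. The function name `suite_health` and the three result labels are assumptions chosen for illustration, not the report format of any particular test runner.

```python
from collections import Counter


def suite_health(results):
    """Summarise a run given an iterable of 'passed' | 'failed' | 'skipped'."""
    counts = Counter(results)
    executed = counts["passed"] + counts["failed"]
    total = executed + counts["skipped"]
    return {
        # What a typical dashboard shows: disabled tests are simply left out.
        "dashboard_pass_rate": counts["passed"] / executed if executed else 0.0,
        # What the suite actually verifies: disabled tests count against it.
        "effective_pass_rate": counts["passed"] / total if total else 0.0,
        "skipped_share": counts["skipped"] / total if total else 0.0,
    }
```

For a suite with 60 passing and 40 disabled tests, this reports a dashboard pass rate of 1.0 and an effective pass rate of 0.6. Tracking the second number alongside the first is one way to notice coverage eroding before a release does.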

33% of organisations plan to embed QA engineers directly into Agile teams over the next two years. That’s the right direction. But embedding QA into fast-moving teams only works if the QA infrastructure can run at that speed. A test suite built for weekly maintenance cycles will not survive daily deploys.

Why the Shift to Codeless Isn’t About No-Code

The codeless testing market is projected to reach $8.6B by 2034, growing at 20.5% CAGR. 39% of companies are actively evaluating codeless tools right now.

The narrative around codeless has always focused on accessibility – letting non-engineers write tests. That’s real, but it’s not the primary reason engineering teams are switching.

The main reason is self-healing.

Codeless platforms don’t just abstract away script syntax. The better ones use multiple locator strategies simultaneously – ID, class, XPath, text, position – so that when one locator breaks because a developer renamed a class, the test engine finds the element another way and updates the stored locator automatically. The test doesn’t fail. No one gets paged. No one spends Wednesday fixing selectors.
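The fallback idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any specific platform implements self-healing: `Element` and `HealingLocator` are hypothetical names, and the "page" is a plain list instead of a live browser DOM.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Toy stand-in for a DOM node; a real engine would query the browser."""
    attrs: dict


@dataclass
class HealingLocator:
    """Ordered (attribute, value) strategies for locating one element."""
    strategies: list
    healed_from: list = field(default_factory=list)

    def find(self, page):
        for index, (attr, value) in enumerate(self.strategies):
            matches = [el for el in page if el.attrs.get(attr) == value]
            if len(matches) == 1:  # accept only an unambiguous match
                if index > 0:
                    # Remember the stale strategies, then promote the one
                    # that worked so the next run tries it first.
                    self.healed_from = self.strategies[:index]
                    self.strategies = (
                        self.strategies[index:] + self.strategies[:index]
                    )
                return matches[0]
        raise LookupError("no strategy matched exactly one element")


# A developer renamed the id and the class; the visible text is unchanged.
page = [
    Element({"id": "checkout-btn-v2", "class": "btn primary", "text": "Checkout"}),
    Element({"id": "cancel", "class": "btn", "text": "Cancel"}),
]
locator = HealingLocator(
    [("id", "checkout-btn"), ("class", "btn-primary"), ("text", "Checkout")]
)
button = locator.find(page)  # found via the text strategy
```

Requiring exactly one match is a deliberate choice in this sketch: a fallback that silently picks the first of several candidates can "heal" a test onto the wrong element. Recording `healed_from` keeps the change reviewable instead of invisible.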

At CI/CD velocity, that’s the difference between a test suite that survives contact with your development team and one that doesn’t.

AI-Generated Code Is About to Make This Harder

There’s one more factor most QA teams haven’t fully priced in: a growing share of production code is now written with AI assistance. And AI-generated code has a specific failure profile.

It’s syntactically clean. It passes obvious tests. It fails at edge cases, boundary conditions, and integration points – because the model generating it doesn’t have full context of the application it’s being dropped into.

This changes what QA needs to prioritise. Boundary testing, contract testing, and security scanning all become more important when you can’t assume the code was written with complete awareness of the surrounding system. 94% of organisations already review production data to inform testing – but nearly half say they struggle to act on those insights fast enough.
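As one illustration of the boundary testing mentioned above, here is a small Python sketch. The helper names and the `is_valid_quantity` function under test are hypothetical; the range of 1 to 100 is an assumed spec, not something taken from a real product.

```python
def boundary_values(low, high):
    """Classic boundary-value set for an inclusive integer range [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def is_valid_quantity(quantity):
    """Hypothetical function under test: accepts 1..100 inclusive."""
    return 1 <= quantity <= 100


def check_boundaries(func, low, high):
    """Return the boundary inputs where func disagrees with the range spec."""
    return [
        value
        for value in boundary_values(low, high)
        if func(value) != (low <= value <= high)
    ]
```

Calling `check_boundaries(is_valid_quantity, 1, 100)` returns an empty list. An off-by-one variant such as `lambda q: 1 <= q < 100` returns `[100]`, which is exactly the kind of defect that looks fine on a happy-path test.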

The teams that scale AI successfully in 2026 won’t just be the ones using AI to generate tests. They’ll be the ones whose testing infrastructure was built to handle what AI-generated code actually breaks.

What the 15% Are Doing Differently

The teams that have moved from experimenting to operating at scale share a few things in common. They’re not using more tools or more advanced AI. They made different infrastructure decisions earlier.

  • They stopped treating test maintenance as a backlog item and started treating it as a reliability problem to architect around
  • They adopted platforms with self-healing locators so maintenance overhead doesn’t grow linearly with test coverage
  • They built tests as reusable components, not individual scripts – so a UI change means fixing one module, not forty tests
  • They integrated testing natively into CI/CD so tests run on every commit, not when someone remembers to trigger them
  • They gave non-engineers the ability to contribute to coverage, so QA isn’t the bottleneck for the teams shipping around them
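The reusable-component point in the list above can be sketched as follows. This is a simplified Python illustration, assuming a hypothetical `LoginComponent` and a fake `RecordingDriver` that records actions so the example runs without a browser; the selectors and flow names are invented for the example.

```python
class RecordingDriver:
    """Fake driver that records actions instead of driving a browser."""

    def __init__(self):
        self.actions = []

    def type(self, selector, text):
        self.actions.append(("type", selector, text))

    def click(self, selector):
        self.actions.append(("click", selector))


class LoginComponent:
    """Hypothetical reusable step: the only place that knows login selectors."""

    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "#sign-in"

    def __init__(self, driver):
        self.driver = driver

    def sign_in(self, user, password):
        self.driver.type(self.USERNAME, user)
        self.driver.type(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


def checkout_flow(driver):
    """One of many flows that reuse the login step instead of copying it."""
    LoginComponent(driver).sign_in("qa@example.com", "secret")
    driver.click("#checkout")


def profile_flow(driver):
    LoginComponent(driver).sign_in("qa@example.com", "secret")
    driver.click("#profile")
```

If the sign-in button's selector changes, only `LoginComponent.SUBMIT` needs editing and every flow that signs in picks up the fix. With the selectors copied into each script, the same UI change would mean one edit per test.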

None of this is exotic. It’s infrastructure hygiene that gets deprioritised when teams are busy shipping. The irony is that skipping it is exactly what makes shipping harder.

The Honest Summary

AI in QA is not overhyped. The potential is real and the adoption numbers prove the industry believes it. The problem is that AI tools require reliable, maintainable test infrastructure to run on top of. Most teams don’t have that yet – and they’re trying to scale AI on a foundation that was already cracking under manual maintenance overhead.

The gap between 72% experimenting and 15% scaled will close. But it won’t close by adding more AI. It’ll close when teams fix the maintenance problem that’s been there since before AI entered the conversation.

“The teams pulling ahead aren’t the ones with the most AI. They’re the ones who stopped fighting their test suite.”

Go Deeper

This article draws from CloudQA’s recent whitepaper, which synthesises findings from Katalon, the World Quality Report, DORA, and market research from MarketsandMarkets and Future Market Insights.

Sumant Mehta is co-founder of CloudQA, a codeless QA automation platform used by engineering teams to build, maintain, and scale test coverage without script overhead.
