Rethinking Quality in the World of AI

How agentic workflows shift quality engineering from script execution to shared understanding.

When people hear the word agent, they may still think of espionage, secrecy, or someone in a dark coat exchanging envelopes in a train station. That is not the kind of agent I am talking about here. I mean software agents: AI-assisted workflows that can inspect, reason, generate, execute tools, ask questions, and help move work forward across the development lifecycle.

These agents are still immature in many ways, and there is plenty of hype around them. But even with that caveat, they force us to look again at something software teams have often treated as settled: what quality work is actually for.

Software development has a habit of clinging to practices long after the environment around them has changed. Once something works well enough, it becomes familiar. Once it becomes familiar, it becomes part of the process. And once it becomes part of the process, it can be surprisingly difficult to question, even when the original reason for doing it has weakened.

Testing is no exception. We have already seen several major shifts in how teams think about quality: from waterfall test phases to agile teams, from manual checking to automation, from isolated QA departments to shared ownership, from late verification to continuous feedback. AI is not the first disruption to quality engineering. But it is a different kind of disruption, because it changes the speed and shape of development itself.

Volume Is Not Value

AI-assisted development does not merely help teams write more code. It helps them produce more artifacts of all kinds: code, tests, summaries, documentation, pull request descriptions, risk analyses, review comments, and implementation plans. The problem is that volume has never been the same thing as value. A team can create more tests and still understand the product less. It can generate more documentation and still make worse decisions. It can move faster and still drift further away from the real intent of the work.

Quality comes from the value we create, not the volume we create.

This is easy to say, but difficult to live by. Many teams still measure quality activity by the number of test cases written, the amount of regression coverage maintained, or the visible busyness of testing near the end of a sprint. Those measures are not meaningless, but they are incomplete. They tell us something about activity. They do not necessarily tell us whether the team understands the product, the risks, the user, or the consequences of a change.

Agentic workflows make this distinction harder to ignore.

[Figure: the agentic development loop in three stages]

01 Testing the Idea
- Idea Iteration X
- AI Refinement: AI Feasibility Analysis and Refinement, Idea Iteration X.Y
- Human Refinement
- Feedback: idea survives contact with questions

02 Agentic Coding
- Agentic Coding
- Agentic Testing: AI Risk Analysis, AI Explorative Testing, AI Result Review
- Agentic TA scripting
- Feedback: generated work becomes reviewable evidence

03 Implementation Review
- AI Change Summary and Clarifying notes
- Human Review
- Update Document / Spec
- Review loop: shared understanding returns upstream

Agentic development is useful when the loop increases shared understanding, not just output volume.

The Acceleration Trap

A common way to describe the current AI shift is to say that code generation has outpaced validation. That is partly true, but it does not describe the deeper problem accurately enough. Local validation is often not the hardest part anymore. AI can help produce code that compiles, satisfies static analysis, follows style rules, and passes tests. Tooling can generate unit tests, run checks repeatedly, and catch many obvious mistakes faster than a human reviewer could.

The harder question is whether the implementation is right in context. Does it match the intent? Does it solve the actual user problem? Does it fit the surrounding system? Does it preserve the right trade-offs? Does it introduce behavior that looks correct locally but becomes harmful across a wider flow?

In AI-assisted development, the most dangerous failure mode is not always broken code. It is cleanly implemented misunderstanding.

This is the acceleration trap. AI can accelerate implementation faster than organizations can maintain shared understanding. The debt that accumulates is not only technical debt or validation debt. It is also cognitive debt, specification debt, and decision debt. The system changes, but the team becomes less able to explain why it changed, what assumptions were made, what evidence supports the change, and which risks remain unresolved.

That is where modern quality engineering becomes more important, not less. Its role is not simply to check outputs after they are produced. It is to help generate understanding throughout the work: understanding of intent, risk, behavior, evidence, and impact.

Repetition Belongs To Machines

This also changes how we should think about regression testing. Traditional manual regression testing has always had an uncomfortable truth at its core: it often uses human intelligence as a poor substitute for machinery. Asking people to repeat the same scripts sprint after sprint is not a good way to use human attention. Humans are not good machines, and machines are not good humans. Repetition belongs to machines. Understanding belongs to people.

When teams depend too heavily on manual regression, they gradually consume the attention that should be used for curiosity, exploration, risk analysis, and deeper thinking. But the answer is not simply to automate everything and declare victory. Automated test suites can become their own burden. If they grow without design, produce unclear signals, require constant maintenance, and lose the trust of the team, they also stop creating value.

The point is not manual versus automated testing. The point is whether the work improves the team’s ability to understand product risk and make better decisions.

This is where agentic workflows become interesting. Their value is not just that they can generate tests faster. Their value is that they can help shift the center of quality work from script production to risk understanding.

Imagine a developer finishing a change. In a traditional workflow, the next steps may involve waiting for review, manually thinking through test cases, updating regression scripts, asking someone else what might be affected, and hoping the right risks are noticed in time. In a better AI-assisted workflow, the change can immediately trigger a broader quality conversation. The system can explain the diff, identify areas that may be affected, generate or update unit and integration tests, propose end-to-end flows, check accessibility concerns, look for missing requirements, and summarize the risks it believes the change introduces.

None of this means the AI is right by default. That would be the wrong lesson. The point is that the human now receives structured material to inspect, challenge, and improve. The workflow gives the developer and the quality engineer something to think with. It turns automation from a separate maintenance burden into a source of earlier feedback.

At its best, this is the first time test automation feels less like additional work and more like genuine assistance. It does not replace human judgment. It gives human judgment more context, earlier.

Seeing More Of The System

One of AI’s most useful strengths is its ability to absorb large amounts of context quickly and provide a natural interface back to that context. A human can read documentation, inspect tickets, review code, compare screens, study logs, and remember previous incidents, but doing all of that across a large system takes time and attention. An agent can be designed to gather much of that surrounding context and present it as questions, summaries, comparisons, or warnings.

That makes it useful for finding problems that are not obvious from a single test case. It can help surface cross-page inconsistencies, design decisions that conflict with a specification, accessibility issues, performance-impacting patterns, unclear acceptance criteria, missing constraints, and contradictions between different parts of a system.

One small example captures the value well. Three separate pages may each look correct when tested individually. Their fields are present, their validation works, and their happy paths pass. Yet together they may form a chain that traps a user in a narrow but real scenario. A human tester might find this through experience, suspicion, or luck. An agent with the right context, rules, and triggers can be made to inspect across those boundaries and raise the concern earlier.
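To make the three-page example concrete, here is a minimal sketch of the kind of cross-boundary check an agent could run. It assumes the user flow has already been extracted into a graph of states and transitions. The state names, the `flow` dictionary, and the `find_traps` helper are all hypothetical and only illustrate the idea: a state is a trap if a user can reach it from the start but can never get from it to an exit.

```python
from collections import deque


def reachable(graph, start):
    """Return every state reachable from `start` using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def find_traps(graph, start, exits):
    """Return states a user can reach from `start` but cannot leave toward any exit."""
    traps = set()
    for state in reachable(graph, start):
        if not (reachable(graph, state) & exits):
            traps.add(state)
    return traps


# Hypothetical flow across three pages (profile, address, payment).
# Each page passes its own tests, but one narrow path never reaches "confirm".
flow = {
    "profile": ["address"],
    "address": ["payment", "profile"],
    "payment": ["confirm", "verify_address"],
    "verify_address": ["payment_locked"],
    "payment_locked": ["verify_address"],
    "confirm": [],
}

print(sorted(find_traps(flow, "profile", {"confirm"})))
# prints ['payment_locked', 'verify_address']
```

The check itself is simple. The hard part, and the part where an agent with broad context can help, is building an accurate graph from pages that are owned and tested separately.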

The agent is not valuable because it is somehow wiser than the team. It is valuable because it can help the team see more of the system at once.

There is also a misconception that AI-assisted workflows simply execute instructions without question. Poorly designed workflows do exactly that, and they are risky. But the problem is not that machines cannot ask questions. They can be designed to ask very useful ones. An agent can identify ambiguity, request missing constraints, challenge incomplete specifications, surface conflicting assumptions, and explain which interpretation it is about to use before implementation begins.

The real problem is that many organizations have not designed their AI workflows to ask the right questions at the right time. That is a quality problem. If the workflow rewards speed over clarity, it will produce fast ambiguity. If it rewards output over understanding, it will produce polished uncertainty. If it allows AI to act without making assumptions visible, it will scale decisions the organization has not consciously made.

This means the quality of the loop matters. Human-to-AI clarity matters. AI-to-human explainability matters. Human-to-human alignment matters. The important question is not only whether the code passes checks, but also whether the whole workflow improves the quality of information moving between people, tools, and decisions.

Fears Worth Taking Seriously

The fears around AI in testing are understandable. One fear is that AI-generated tests will be flaky. This can happen. A poorly governed system may create brittle selectors, shallow assertions, or tests that look impressive but do not provide stable evidence. But this is not an argument against AI-assisted testing. It is an argument against treating generated output as automatically trustworthy.

Agents can also help reduce flakiness when they are used well. They can compare failing runs, identify unstable waits, propose better selectors, distinguish between product instability and test instability, and explain why a test is unreliable. The goal is not to accept AI-generated tests blindly. The goal is to use AI to make the test system more understandable and maintainable than it was before.
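As a rough illustration of comparing runs, the sketch below separates tests with mixed results on the same commit from tests that fail consistently. It is a heuristic under stated assumptions, not a definitive classifier: the `(test_name, commit, passed)` record shape, the test names, and the labels are all hypothetical, and mixed results on one commit can also come from genuine nondeterminism in the product, so a human still has to inspect the signal.

```python
from collections import defaultdict


def classify_failures(runs):
    """Group run results per test and commit, then label each test.

    `runs` is an iterable of (test_name, commit, passed) tuples.
    """
    outcomes = defaultdict(lambda: defaultdict(set))
    for test, commit, passed in runs:
        outcomes[test][commit].add(passed)

    report = {}
    for test, by_commit in outcomes.items():
        if any(results == {True, False} for results in by_commit.values()):
            # Same code, different outcomes: suspect the test or its environment first.
            report[test] = "likely flaky test"
        elif any(results == {False} for results in by_commit.values()):
            # Fails every time on at least one commit: suspect the product change first.
            report[test] = "consistent failure"
        else:
            report[test] = "stable"
    return report


# Hypothetical run history.
runs = [
    ("checkout_total", "abc123", True),
    ("checkout_total", "abc123", False),
    ("checkout_total", "abc123", True),
    ("login_redirect", "abc123", True),
    ("login_redirect", "def456", False),
    ("login_redirect", "def456", False),
    ("search_filter", "abc123", True),
    ("search_filter", "def456", True),
]

for name, label in classify_failures(runs).items():
    print(name, "->", label)
```

The labels are a starting point for a conversation about why a test is unreliable, not a verdict.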

Another fear is that AI will replace testers. I do not think that is the right framing. AI is more likely to remove the need for humans to behave like machines. It will put pressure on shallow testing rituals, because those are precisely the things machines can imitate or automate. But it increases the importance of the work that has always required judgment: understanding intent, asking better questions, assessing risk, noticing gaps, connecting system behavior to user value, and helping teams make decisions.

The identity crisis in quality engineering did not begin with AI. Manual test case authorship, regression test maintenance, and test planning as a documentation exercise have been under pressure for years. Agile, DevOps, CI/CD, automation, and shift-left practices already challenged the idea that quality work happens mostly after implementation.

AI extends that shift. The interesting change is not merely that old QA tasks are being automated. The more important change is that quality is moving into the design of the development pipeline itself.

Build Quality Into The Workflow

Previously, teams often spoke about building quality into the product. That remains true. But now quality also has to be built into the agentic workflow that helps produce the product. That includes the instructions agents receive, the tools they can use, the constraints they operate under, the evidence they collect, the points where humans review decisions, the quality signals that are trusted, and the way output variance is monitored.

In this world, quality engineers start to look more like systems designers, platform thinkers, and governance builders. Their task is not only to assure individual artifacts. It is to help engineer the conditions under which good artifacts are more likely to emerge.

This is also why we need to be careful when we talk about automated analysis. Deterministic automation and AI-assisted analysis are not the same thing. Static analysis, schema validation, fixed policy checks, and repeatable CI rules are consistent by design. LLM-based analysis is different. It can support consistency at scale, but only when it is constrained, governed, and evaluated appropriately.

The quality of AI-assisted analysis depends on instructions, context, model behavior, tool access, evaluation criteria, review requirements, and the amount of freedom the system has to improvise. An unmanaged LLM workflow can be as unreliable as unmanaged human judgment, with the added risk that it can produce confident-looking output very quickly.

So the goal is not to replace disciplined engineering with AI. The goal is to apply disciplined engineering to AI-assisted work.

Governance As Delivery Reality

This is where governance becomes practical rather than abstract. It is not enough to say that an organization needs AI governance or governance architecture. Those phrases are too vague unless they can be translated into delivery reality. In an agentic development workflow, governance should answer concrete questions: What is logged? What is controlled? What is isolated? What is reviewable? Which prompts and outputs are traceable? Which tools can agents use? Which environments can they access? Which actions require approval? What evidence exists for compliance, security, or later review?
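Several of these questions can be expressed as a small, reviewable policy. The sketch below is one possible shape, assuming a hypothetical `AgentPolicy` class with made-up tool, environment, and action names. It does not describe any particular agent framework. It answers three of the questions above: which tools and environments are allowed, which actions need approval, and what gets logged.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Hypothetical policy: what an agent may use, where, and what needs approval."""
    allowed_tools: frozenset
    allowed_environments: frozenset
    approval_required: frozenset
    audit_log: list = field(default_factory=list)

    def decide(self, tool, environment, action):
        """Return a decision for one requested action and record it in the audit log."""
        if tool not in self.allowed_tools:
            decision = "deny: tool not allowed"
        elif environment not in self.allowed_environments:
            decision = "deny: environment not allowed"
        elif action in self.approval_required:
            decision = "needs human approval"
        else:
            decision = "allow"
        # Every request is logged, including denied ones, so it stays reviewable later.
        self.audit_log.append(
            {"tool": tool, "environment": environment,
             "action": action, "decision": decision}
        )
        return decision


# Hypothetical configuration for a team that keeps agents out of production.
policy = AgentPolicy(
    allowed_tools=frozenset({"test_runner", "git"}),
    allowed_environments=frozenset({"ci", "staging"}),
    approval_required=frozenset({"open_pull_request"}),
)

print(policy.decide("test_runner", "ci", "run_tests"))        # prints allow
print(policy.decide("git", "staging", "open_pull_request"))   # prints needs human approval
print(policy.decide("test_runner", "production", "run_tests"))  # prints deny: environment not allowed
```

Keeping the policy as plain data means it can be versioned and reviewed like any other change, which is part of what makes governance a delivery concern rather than a document.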

These are not bureaucratic concerns. They are the mechanisms that allow a team to use AI without losing control of the development process. Good governance makes the workflow safer, but it also makes it more understandable.

That point is important because early quality is not only about catching defects earlier. It is also about making the development pipeline more legible. In AI-assisted development, implementation speed is no longer the main bottleneck it once was. Understanding becomes the bottleneck. Teams need to know what was intended, what was implemented, what assumptions were made, what risks were identified, what evidence exists, what remains uncertain, and what trade-offs were accepted.
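One way to make that list operational is to treat it as a record that travels with each change. The sketch below assumes a hypothetical `ChangeRecord` structure whose field names simply mirror the questions above. It treats an empty section as a question nobody has answered yet, so writing "none identified" counts as an explicit answer.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ChangeRecord:
    """Hypothetical record of what a team should be able to say about a change."""
    intent: str = ""
    implementation: str = ""
    assumptions: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)
    trade_offs: list = field(default_factory=list)


def missing_sections(record):
    """Return the names of sections that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical record for a change that has been implemented and tested,
# but whose assumptions and risks have not been written down yet.
record = ChangeRecord(
    intent="Let users edit a saved delivery address",
    implementation="New edit form on the address page",
    evidence=["unit tests pass", "manual check on staging"],
)

print(missing_sections(record))
# prints ['assumptions', 'risks', 'open_questions', 'trade_offs']
```

Whether a human or an agent fills in the record matters less than whether the gaps are visible before the change is merged.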

A good agentic workflow should make those things easier to see. It should not only produce output faster. It should help the team understand the output better.

Adoption Starts With Reality

This also affects how organizations should adopt AI. There is no single maturity curve that all customers or teams follow. For one organization, AI-assisted development may mean occasional prompt use in a coding assistant. For another, it may mean partially autonomous workflows that inspect repositories, run tests, open pull requests, and summarize risks. Some teams are experimenting deliberately. Others are being pushed toward AI because leadership expects visible adoption before the organization has developed a clear understanding of what AI can and cannot solve.

That uneven reality matters. It is not useful to sell or adopt AI as if every organization has the same problem. The first task is to map where the team actually is: what tools they use, where the pain is, what decisions are already AI-assisted, what risks are increasing, what quality signals are trusted, and where delivery speed has started to exceed shared understanding.

Only after that does it make sense to choose tools or design workflows.

This is why the value story should not be reduced to saving money with AI. That framing is too small. The more interesting value is that AI can make quality capability easier to scale safely. It can help existing expertise reach further across the delivery process. It can help teams handle more complexity and change without cost and coordination overhead growing linearly alongside them.

But that only works if the organization builds capability, not just tooling. Buying an AI tool is not the same as improving quality engineering. A tool does not understand the product, the risks, the delivery culture, or the user need by itself. The lasting value comes from learning how to use changing tools and models in a way that supports the organization’s own work.

In practice, this means starting from real needs, solving the highest-pain problems first, designing workflows the team can sustain, making quality signals visible, and helping people understand the process well enough to improve it themselves. Otherwise, AI becomes another layer of technology that produces activity without enough meaning.

The Rethink

The change agentic workflows bring to testing is deeper than faster test generation. They challenge the script mindset. They push quality work toward understanding, risk awareness, design clarity, explainability, evidence, and shared decision-making.

They also force us to be honest about what parts of our current testing practices create value and what parts merely create volume. More tests do not automatically mean more confidence. More automation does not automatically mean better quality. More AI output does not automatically mean more progress.

The question is whether the workflow improves the team’s ability to understand the product and make better decisions.

That is the rethink we need.

AI does not remove the need for testers, developers, product people, or human responsibility. It changes where human expertise is most valuable. It frees people from some machine-like work and makes the human parts of quality more visible: intent, judgment, risk, evidence, governance, alignment, and understanding.

Quality does not come from the volume we create. It comes from the value we create.

Agentic workflows are useful only when they help us create more of that value.
