AI in Software Delivery beyond Copilot: reimagining software delivery

From Typing to Value: A First Aha and Today’s Shifting Ground

Sarah contrasts early-career pride in “real programming” (C++ headers, pointers, and memory management) with a later ThoughtWorks epiphany: real programming is delivering value early to customers. She draws a parallel to the present, describing a second aha as AI reshapes software delivery and forces teams to rethink what matters. Setting expectations, she stresses how fast the landscape shifts—slides were updated the night before—and likens adoption to working in quicksand. This frames the talk’s theme: focus less on keystrokes and more on outcomes amid rapid AI-driven change.

Framing the Conversation: Three AI Loops and Why the Inner Loop Matters

Sarah introduces a simple structure to avoid “everything everywhere all at once” discussions: three loops of AI. She defines the inner loop (builder productivity in software development), the middle loop (business process optimization across functions like call centers and finance), and the outer loop (AI inside products: personalization, chatbots, agentic offerings). She narrows the talk to the inner loop to keep a clear focus on how AI changes day-to-day developer work. This framing anchors the rest of the presentation in practical, delivery-oriented impacts.

Why Copilot Isn’t Sticking: Adoption Frictions and the Human Change Curve

Sarah addresses a common leadership question—“We gave teams Copilot, why aren’t they using it?”—by unpacking barriers. Early-stage tool maturity, training-data gaps for niche stacks, developer experience, problem complexity, information overload, inconsistent outputs, and tolerance for errors all shape uptake. She pushes back on replacing collaboration by highlighting pair programming’s enduring value, especially for juniors. The segment pivots from tools to change management: leaders can accelerate individuals’ change curves by guiding, not just provisioning software.

Hype vs. Data: Productivity Claims and the Toolchain’s Evolution

Sarah examines bold productivity claims, noting current evidence is largely perceptual (e.g., positive signals in the DORA survey) and tied to early capabilities like “autocomplete on steroids.” Using a train-versus-faster-horse analogy, she argues we’re only a few iterations into realizing true gains. She traces the trajectory from autocomplete and chat toward agents and richer context providers, citing emerging IDEs (Cursor, Windsurf, Cline) and agents that can raise PRs. This context situates inner-loop gains as real but uneven, with more evolution to come.

Demo Deconstructed: Prompt-to-Code on the Mars Rover Kata

Sarah describes a prompt-to-code demo using Cline to solve ThoughtWorks’ classic “Mars Rover” interview exercise. With a brief problem statement and constraints (Java/JavaScript, Maven, tests), the agent planned the solution, scaffolded the project, and wrote code and tests with human approvals at each step. As a seasoned reviewer, she says the resulting solution would merit an immediate interview—her second big aha on AI’s coding potential. She tempers the excitement by noting repeatability gaps: a related task saved 97% of the work one day and failed completely the next, underscoring volatility.

Autonomous Agents and the Quality Question: Repeatability, Duplication, and Refactoring

Reporting on repeated autonomous agent trials, Sarah notes agents consistently produced working solutions yet missed important quality issues—like duplicated code—in most runs. She highlights broader signals from industry data: code volume surged after code assistants, while moved (refactored) code dropped to near zero. The takeaway is clear: AI accelerates adding code but not stewarding it, risking rising complexity and total cost of ownership. This segment raises the central inner-loop challenge—go faster without eroding maintainability.

Leading Through Quicksand: Guardrails, Rituals, and Measurement

Sarah outlines concrete leadership practices to harness AI responsibly: maintain human code reviews (don’t offload to peers), monitor code quality, shift-left on testing, and run “AI gone wrong” rituals to share failures and lessons. Psychological safety is essential so teams can experiment and learn in uncertain terrain. She recommends measuring both sentiment and flow with developer experience tools (blending qualitative surveys with repo and ticket data), including modules for AI productivity. The message aligns with the talk’s theme: disciplined engineering amplifies AI’s benefits.

Beyond Coding: Eliminating Waste and Extending AI Across the SDLC

Faster coding shifts the bottleneck to backlog, testing, and tech debt, so Sarah urges attacking system-wide waste—often reducing cycle time dramatically before AI even enters. She then broadens the aperture: apply AI across the entire software delivery lifecycle—planning, requirements, design, testing, deployment, and operations. Borrowing from education, she offers three questions to guide teams: what thinking is fundamental and must not be outsourced, what is mechanical and can be, and which AI tools fit best. This reframes adoption as end-to-end productivity, not just faster typing.

Team-Level Augmentation: From ChatGPT One-Offs to Haiven’s Shared Context

Sarah maps GenAI “superpowers” (translation, knowledge retrieval, brainstorming, summarization and clustering) to SDLC activities, noting the tool market skews heavily toward coding assistants. Because many teams default to individual ChatGPT usage, she introduces Haiven, an open-source prompt collection designed to bring shared context to team workflows. Haiven supports ideation, pessimistic scenario analysis, requirements breakdown, and threat modeling so groups can explore and decide together. This segment shows how to scale augmentation from individuals to teams, consistent with the inner-loop theme.

The Road Ahead: AI-Native Development, Legacy Modernization, and Phoenix Code

Looking forward, Sarah explores how AI can tackle hard engineering problems: modernization, understanding COBOL at scale via CodeConcise (ASTs, a knowledge graph, RAG), and even reverse-engineering black boxes from binaries. She envisions AI-native software development—AI at the center, seamless human–agent collaboration, responsible practices, and leadership literacy—already visible in startups using agentic tools like Cursor and Lovable. The near term keeps humans in the loop; longer term, end-to-end prompting could enable autonomous delivery and “Phoenix” code that self-heals and adapts to library upgrades. She closes with a call for empathetic leadership to build windmills, not walls, as teams cross the adoption hump.

Thank you, Andrea, and thank you room.

It's a pretty exciting time to be in tech right now, I've got to say. But do you remember your first 'aha' moment?

I'll tell you mine. I was working as a grad developer in an organization. We were doing C++ coding, and I was doing real programming. I was writing header files. I was dealing with pointers and references and memory management.

Now, we had other teams in our organization that did C-Sharp.

So my boss one day sent me across to a C-Sharp training course.

Oh, what did I find in that C-Sharp training course? I'll tell you. They weren't doing real programming.

They were tab, tab, tabity tab, and all of a sudden, their signature blocks appeared. They didn't need to think about references and pointers. They didn't even know about garbage collection.

They were not real programmers.

So I went back to my boss, and we were all pretty excited. Now, I'll tell you what, it wasn't until I joined ThoughtWorks and in the first eight weeks of working on a project, I delivered far more software - working software to our customer than I ever had in my three years working as a C++ developer.

And that was the first time I had an epiphany.

Maybe real programming wasn't about the typety, typety, typety. Maybe real programming was about getting value to the customers and doing that quickly and early so that they could actually make use of the software that we were producing.

And I'm having that same epiphany right now when we're thinking about the role that AI plays in software delivery.

What I'm seeing through the tool space and through lots of people experimenting, is some really interesting innovation going on in our industry right now. And I think it's the second time in my career that I've had such a huge 'aha' moment. So today I want to talk to you a little bit about what those movements are and what the future might look like.

Now, before I begin though, I'm going to tell you that everything that I say today is wrong. Well, not wrong, but it'll be outdated before I sit down. And I can guarantee that because in the time that it takes me to write these slides and write this deck, I have to continuously change it and update it all the time. In fact, as late as last night, I was adding new information to it. So everything that I say, please take it with a grain of salt and don't quote me in three weeks down the track when I'm wrong.

Because really, we're working in quicksand.

This is the thing that I'm feeling the most. We're working in quicksand.

We're trying to get a nice stable base for our teams to join this revolution, but we're fundamentally working on shifting sands. Every three months, a new breakthrough comes out, new tool sets, new ways to use this. And so not only are we trying to change hearts and minds within our organizations - what we're trying to teach them, how they need to go about it - that advice fundamentally changes too. So who feels like this at the moment, when every time a conversation about AI comes up, it's everything, everywhere, all at once? I get that.

And so the more I think about it and the more conversations that I have, I feel like we need to have just a little bit of structure around what we're talking about. Because if we don't, we kind of go off in lots of different directions. So here's the framework that I tend to use when I talk about this with organizations. There are three loops of AI. The first loop is Inner Loop, the builder productivity. That's how we build and develop software. The second loop is our Middle Loop. It's our business process optimization.

That's the role that AI plays in things like call centers or improving claims processing or working in finance teams or even working between business units and between different parts of our business. Then there's the third Outer Loop, which talks about the role that AI plays in our products.

So it might be through full personalization.

It might be through chat bots. Or it might be a new agentic ecosystem offering that you have.

Today, I want to focus in on the inner loop. So those other two are really interesting spaces to be in as well, and I can definitely talk a lot about them. But we're going to focus and zero in on this Inner Loop, this builder productivity.

When we think about this change curve that we're having, and when I talk to organizations, most organizations are really at this level of maturity or adoption at the moment when it comes to AI and software delivery.

There's either an awareness stage or a focus on AI-assisted code, where the emphasis is on the assistance itself and on driving adoption throughout the organization.

That's a really good place to start.

But when I speak to a lot of tech leaders who are in this part of the adoption cycle, the most common question that I'm getting is: "I've given my teams GitHub Copilot, why aren't they using it? Why aren't we getting better adoption?" So this is what I tell them. There are many factors affecting the adoption of GitHub Copilot in teams right now. Well, the first one is the LLMs and tools. We're really in the early stages, with lots of promise and maybe less maturity. It's still quite early in the tool development right now.

Some other factors are the prevalence of your tech stack in the training data. If you're working in a very proprietary language, I can almost guarantee that won't be present in the training data. If you're working in Java, JavaScript or Python, it's far better represented. The experience of your developers also matters. I've heard many stories about junior developers going for a lot longer with a code assistant than they would normally before they had to interrupt a senior developer. I'm going to pause right there, because I fundamentally disagree with that, being someone who has promoted pair programming for such a long time. If nothing else, that is the very reason why we have advocated pair programming for at least 18 years.

The complexity of the problem. It's just too big. It's not a boilerplate solution.

We haven't solved this before, and therefore the code assistants just aren't helping us.

There's so much information. It's just really difficult.

It's like drinking from a fire hose right now.

The repeatability of results: you run a prompt one day and you don't get the same answers the next. And often it comes down to our tolerance for errors, and also our tolerance for a slowdown as people learn to use these tools.

So these are just very wide and generalized factors that are limiting adoption. This is what I'm telling other tech leads about that. But more importantly, it's not about tools.

More importantly, it's about rewiring how we work and how we think. Now, we know about change management. We've gone through that when we brought Agile to the industry. You will go through a change curve, and it's not just about giving a tool and expecting someone to use it. Every individual goes through their own version of this change curve, and how they go through it is largely determined by how you as a tech leader help them navigate through it. If they're doing it by themselves, they will go through that at their own pace. If you do it together as an organization, you can help accelerate that change curve.

Now, let's face it, with all the hype that's out there, there's some pretty bold claims. But are they actually fact or fiction? And so this is the number one reason why I hear developers tell me that they're slow to adopt these, because they're hearing the hype and it's not playing out.

Let's unpack the most significant one that you've probably had hurled your way.

This is a stat that loves to do the rounds.

55% productivity gained through GitHub Copilot. Who believes that?

Let's look into it. All right, so the DORA report. I love the DORA report. It gives us a snapshot of how teams are feeling about different aspects.

Last year, one of the questions that they included in the survey was people's perception about AI. Still, we're working very much in a perception area right now rather than a measured result. But perception is a large part of what we do anyway. So we're seeing 75% of respondents reported positive productivity gains. So people who are using tools do actually seem to find them to be useful.

So that's good. And I wanted to take you back to the place where we were in the industry with the tool chain when that survey went out. Copilot was really all about autocomplete. It was a glorified tab, tab, tab. So it was helping on very small parts - the unit of assistance was really at the method level.

And it was really just a chat.

But that really just got us slightly faster horses. So what we're really doing right now is rethinking and trying to imagine what the new car is, or what the new train is. A side tangent on trains: the development of the train set out to be a better horse - a faster horse, more productive - but a human could walk faster than it in its first version. It wasn't until about the third or the fourth iteration of what the train was that you actually started to see the productivity improvements. And that's where we kind of are right now with the AI tool chain.

We can see the promise. We've still got to go through a couple of iterations before we can actually live up to productivity gains. We're on this trajectory, and we started with autocomplete on steroids and chat. Now we're working towards agents and context providers, through new IDEs like Cursor, Windsurf and Cline. And now, with the latest announcements, we're moving to a lot more autonomous agents like OpenAI Codex, which go and just create PRs for you. So this is where we are at the moment. So we've got 'prompt to code'. This should mean the video starts playing...

I want to just briefly look at what this might be. So this is a video of someone solving a problem with Cline, with Claude as the underlying model. Again, this was taken in the January-February timeframe. Things have moved on since then.

But this is when my second 'aha' moment came.

Because what this program is doing right now is solving one of ThoughtWorks' well-known interview questions.

So as part of our interview processes, we gave candidates a problem to solve, the Mars Rover problem.

I can say this with freedom now because it's been exposed on the internet so many times and we don't use it anymore. But it was a really neat exercise to show the level of quality that a programmer put into it. I have been on the reviewing end of many, many of these coding assignments, and it's very easy for me to assess them now. I can look at them and very quickly see how well written the code actually is. This was just prompted with a problem statement, that we want to use Maven, that we want Java or JavaScript, and that we want tests for it. And then Cline has taken over, created a plan, created the folder structure it was going to have, created the code snippets, and all the while just prompting to say, "do you accept this change, or do you need to alter what we're doing?" And in the end, the code it produced was so impressive that as a blind reviewer I would have said, "I want to see that person straight away." That is how impressive this thing is.
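
To make that concrete, here is a minimal sketch of the kind of rover model plus test that a kata solution typically contains. It is illustrative only: the demo itself used Java with Maven, and the rules assumed here (a rover on a grid facing N/E/S/W, taking L, R and M commands) are the commonly published version of the exercise, not necessarily the exact brief used in the demo.

```python
# Illustrative sketch only (the demo used Java/Maven): a minimal Mars Rover
# model and one test, assuming the commonly published kata rules.
DIRECTIONS = ["N", "E", "S", "W"]
MOVES = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}

class Rover:
    def __init__(self, x: int, y: int, facing: str):
        self.x, self.y, self.facing = x, y, facing

    def execute(self, commands: str) -> None:
        for c in commands:
            if c == "L":   # rotate 90 degrees left
                self.facing = DIRECTIONS[(DIRECTIONS.index(self.facing) - 1) % 4]
            elif c == "R":  # rotate 90 degrees right
                self.facing = DIRECTIONS[(DIRECTIONS.index(self.facing) + 1) % 4]
            elif c == "M":  # move one grid cell forward
                dx, dy = MOVES[self.facing]
                self.x, self.y = self.x + dx, self.y + dy

def test_rover_turns_and_moves():
    rover = Rover(1, 2, "N")
    rover.execute("LMLMLMLMM")
    assert (rover.x, rover.y, rover.facing) == (1, 3, "N")
```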

Now, as impressive as it is, the problem starts to come with repeatability, because another team has used Claude Code to port a tool that we have called CodeConcise into a different language. And they found that it saved us 97% of the work. Then they went to do it the next day, and it failed completely. You can read that report - there's a QR code on there. We've published this on martinfowler.com.

Oh, no, this one was published on thoughtworks.com, I think.

But this is one thing that we want to do as Thoughtworks - continuously publish what we're seeing and what we're finding, so you can follow our experiments along. So now we're getting to the autonomous agents - we wanted to test out some of these autonomous coding assistants. And so Birgitta, who heads up our AI FSD experiment team, had a look at a lot of the autonomous agents that are out there right now, but specifically at Codex. And she got it to do the same task over and over again, repeatedly running it to see what the results were like, to see how repeatable it was. The good news is that the agents came up with a working solution every time. So that's very good news. But unfortunately, across six runs, only twice did the respective agent find an existing piece of code it could reuse - in the other runs it created duplicated code. So, the bold claim about being productive: I think this is true. I think it's making us go faster. But now the question that we've got to answer is about the quality of the code.

Because this is another report that GitClear put out towards the end of last year. They measured code over a number of years.

They found that at the point in time when Copilot and other code assistants came about, the amount of code added to codebases increased.

However, at the same time, the amount of moved code decreased to nearly 0%. What does that tell us? It tells us refactoring is not taking place.

It tells us coding assistants are really great at adding new code, but because they're really fast at doing that, our codebases are growing at a rapid pace.

And we know what happens when code bases get too big and too unwieldy. We have a total cost of ownership problem.
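
As a made-up illustration of the pattern behind those numbers (all of the function names here are invented), this is what adding code without stewarding it tends to look like: a second, near-identical helper appears instead of the existing one being reused or refactored.

```python
# Invented example contrasting "add-only" output with refactored reuse.

# Existing helper, already in the codebase
def humanise_key(key: str) -> str:
    return key.replace("_", " ").capitalize()

# What an add-only assistant tends to produce elsewhere in the codebase:
# the same logic again, under a new name (duplicated code, no reuse)
def format_category_name(raw_name: str) -> str:
    return raw_name.replace("_", " ").capitalize()

# What stewardship (refactoring toward reuse) looks like instead:
def format_category_name_refactored(raw_name: str) -> str:
    return humanise_key(raw_name)
```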

It's really easy to see typical AI missteps where AI is not doing the right job at the commit level, because it just doesn't compile, the tests don't pass, or it doesn't solve the problem. It's a lot harder to see the overall complexity that it's introducing to your codebase, or the lack of reuse that you're getting through your system.

The verbose or redundant tests that it's creating.

So, as we go through the adoption curve and move through the adoption cycle, as tech leaders, this is something that you need to be the most acutely aware of.

Your teams will yell at you when the things aren't working.

They'll know how to do a change at an individual level.

You need to make sure certain practices are still taking place, like reviewing code. Don't offload that to other team members. If you're going to be using code assistants to generate the code, that's great, but you have to be the reviewer. Don't offload that to your colleagues. But at the team level, you need to really focus right now on monitoring your code quality, shifting left on testing, and doing 'AI gone wrong' rituals. So bring the teams around and say, this is where it worked and this is where it didn't. But most importantly, introduce psychological safety into your team. Let the teams fail and learn as they go through this. We're working in quicksand.

No one knows the solutions right now. We're all... someone described it to me as a bit like Emperor's New Clothes, and I can see that. GenAI is an indiscriminate amplifier. It's giving us gold or garbage. It cranks both to 11. And so here are some things that you can actually do within your teams.

We really like GetDX as a product. It's done by the same people that wrote the Accelerate book and came up with the DORA metrics initially, Dr. Nicole Forsgren.

They have a tool now which helps teams look at quality measures through their system. It looks at things like speed and throughput, effectiveness, quality, and impact, and does that through both qualitative survey answers and quantitative data, by actually looking at Jira and your Git repos as well. They've also introduced a new module that actually starts to track AI productivity.
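
As a rough, hypothetical illustration of the idea - not GetDX's actual implementation - blending a quantitative flow signal from a Git repository with a qualitative sentiment score could look something like this:

```python
# Hypothetical illustration only: pair a crude flow signal from `git log`
# with a team sentiment score from a survey. Not how GetDX works internally.
import subprocess
from statistics import mean

def commits_last_30_days(repo_path: str) -> int:
    """Count commits in the last 30 days using plain `git log`."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--since=30 days ago", "--oneline"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

def team_snapshot(repo_path: str, survey_scores: list[int]) -> dict:
    """Combine a throughput signal with 1-5 survey sentiment answers."""
    return {
        "commits_per_week": round(commits_last_30_days(repo_path) / (30 / 7), 1),
        "avg_sentiment_out_of_5": round(mean(survey_scores), 2),
    }

# Example: print(team_snapshot(".", survey_scores=[4, 3, 5, 4]))
```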

Keep an eye on quality in your team; don't just measure speed and productivity, because otherwise we're going to be building a problem for ourselves to clean up later on. And also, engineering practices still matter, if not more. So all of the engineering practices that made XP good - things like clean code, fast feedback, simplicity and repeatability - actually make AI even better for you.

And all of these other things like vertical slices, simple design, pair programming, test-driven development, these are all things that will help you on your AI journey.

All right, but once we start looking at faster coding, we're going to start pushing a lot of pressure onto the rest of the system. Because if we can code faster and have a higher throughput, how can we fill the backlog faster? And if we can code faster, how can we test faster? And how can we also make sure our technical debt is in check?

Because what we know is that even without AI, we have a huge amount of waste within our systems. Most organizations that I go to have only about 30% value-added delivery in their systems. And we've been working hard with organizations to get them to around 60% value-add, and if you can do that, you're actually introducing about a 30 to 50% increase in productivity.

So you can go from having a 26-day cycle time to a 16-to-20-day cycle time just by removing the waste within your systems. And this is the next stage that organizations go through on this adoption journey. So once you've got over the adoption hurdle, teams start to think about how they can accelerate productivity - not just for coders, but for the whole team. So now we're looking at the role that AI can play within the whole software delivery lifecycle: planning, requirements, design, other aspects of software engineering, testing, deployment, and then operations.
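
A quick back-of-the-envelope check shows how those claims hang together, using the assumed figures from the cycle-time slide later in the deck (actual development time going from roughly 30% to 42% of the cycle):

```python
# Back-of-the-envelope check of the waste-reduction figures quoted above.
current_cycle_days = 26
current_value_add = 0.30      # ~30% of the cycle is actual development time
future_value_add = 0.42       # ~42% after removing waste (slide figure)

dev_days = current_cycle_days * current_value_add     # ~7.8 days of real work
future_cycle_days = dev_days / future_value_add       # ~18.6 days

speedup = current_cycle_days / future_cycle_days - 1  # ~0.4, i.e. ~40% faster
print(f"{future_cycle_days:.1f} day cycle, ~{speedup:.0%} productivity gain")
# Prints roughly "18.6 day cycle, ~40% productivity gain" - inside the quoted
# 16-20 day and 30-50% ranges.
```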

I want to pause a second, because I think now's the time to introduce some really interesting questions that were not created by people in tech.

My sister is a teacher at a private girls' school in Brisbane. She has a PhD in archaeology and is now their ancient history teacher. And so she, of all people, is tasked with finding out the role of AI in education for their students - which I find very ironic, I will say. But because she is so academic, and because she has such a strong focus on understanding the past, she's come up with three questions that every subject is asked. The first one is: What thinking is fundamental to my subject and should not be outsourced?

What thinking is mechanical and can be outsourced to expedite learning?

And then: What is the best AI tool to use? And I love these questions because I think they're universal across all professions, including our own. So the first one: What is fundamental to software engineering or software delivery that should not be outsourced? What thinking is mechanical and can be outsourced to expedite software delivery? And: What is the best AI tool to use? I think these are really great framing questions that you can ask your teams, because I think everyone's struggling with the question of: Will my job disappear?

And I don't believe your jobs will disappear, but I do think that they will change. And so this is a great framing device that teams can use to help them through this adoption journey. So then we can have a look at: What are the superpowers of GenAI? We've got translation, finding knowledge, brainstorming and ideation, and summarization and clustering. And we can look at the parts of our software delivery lifecycle and work out which of our activities can be augmented with AI. It's really hard right now because - I mean, we're keeping track of all the tools that are out there that help with software delivery - and an overwhelming majority are code-based tools. So these are coding assistants, and there are far fewer tools that help other people on our team or other activities in our team. And so the tool landscape is pretty thin.

There are a couple out there - Figma helps with wireframing, Lovable helps with prototyping with GenAI - and lots of people in the startup game are using these to go and test their ideas, and it's really impressive to see the things that they can do. But most teams are actually reaching out to ChatGPT to get going with this. The problem with doing that is that it happens at an individual level and not really at a team level. So we've kind of got an answer for that right now. We've created a tool - well, it's not a tool. We've created a collection of prompts and stuck it on the internet. It's open source.

You can go grab a copy of Haiven. It aims to help teams have a collective context as they're working through and chatting and exploring things like ideation - so coming up with ideas for a new product, or pessimistic and realistic scenarios of what might go wrong. It can help you with requirements analysis, breaking down stories and epics so the majority of the story is written for you; and it can help with threat modelling, thinking through the different go-wrongs that could happen and coming up with scenarios for how you might treat them.
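
As a toy illustration of the idea of a shared prompt collection (this is not Haiven's actual format; the template and fields are invented), the point is that everyone on the team fills the same context into the same prompt, instead of each person improvising individually in ChatGPT:

```python
# Invented example of a shared, parameterised team prompt - not Haiven's real format.
EPIC_BREAKDOWN_PROMPT = """\
You are helping a software delivery team break down an epic into user stories.
Business context: {business_context}
Epic: {epic}
For each story, provide a title, a short narrative, and draft acceptance criteria.
Flag any assumptions the team still needs to validate."""

def render_prompt(business_context: str, epic: str) -> str:
    """Fill the shared template so every team member starts from the same context."""
    return EPIC_BREAKDOWN_PROMPT.format(business_context=business_context, epic=epic)

# Example:
# print(render_prompt("Retail EV charging network",
#                     "Let drivers reserve a charger ahead of arrival"))
```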

And so that is how some teams are getting through to the Accelerated stage.

Now, I want to touch really briefly on what the future could look like. So once we're working well with each of the tools, and once the tools are kind of caught up with what we're trying to do, this is where we start to get augmented. And this is where we can start asking the question: How can Gen AI help us solve those hard engineering problems we've been unable to tackle manually? Things like modernization.

Things like being able to understand COBOL codebases in a way that we can chat to them, with a ChatGPT-style interface on the front. So we've solved that problem. We wrote about it on martinfowler.com with a tool called CodeConcise, which takes COBOL abstract syntax trees, puts them into a knowledge graph, and puts an LLM with a RAG-based model on top of that. So you can actually start to inquire and have a look at the code. So now you can start to understand legacy codebases. We've also been thinking about things like reverse engineering a black box, because we've heard lots of clients - and one in particular - tell us that they've lost the code of something that's running in production. So they've got the binaries, but no code. So how can AI accelerate reverse engineering that black box? We've been running these experiments, and we've got some really interesting results - enough for us to want to move from a dummy project into actually working with a client. We believe it is possible.
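
Here is a deliberately simplified sketch of that shape of pipeline. Everything in it is illustrative - the node structure, the keyword retrieval, and the prompt are invented, and the real CodeConcise uses proper AST ingestion, a Neo4j knowledge graph, and an LLM on top - but it shows the flow from parsed legacy code to RAG context for a language model:

```python
# Illustrative sketch only: invented structures showing the flow from parsed
# legacy code, to a graph-like store, to RAG context handed to an LLM.
from dataclasses import dataclass, field

@dataclass
class CodeNode:
    name: str                                      # e.g. a COBOL paragraph or section
    source: str                                    # raw code text
    calls: list[str] = field(default_factory=list)

class CodeGraph:
    """Stand-in for the knowledge graph (the real system uses Neo4j)."""
    def __init__(self) -> None:
        self.nodes: dict[str, CodeNode] = {}

    def add(self, node: CodeNode) -> None:
        self.nodes[node.name] = node

    def retrieve(self, question: str) -> list[CodeNode]:
        """Naive keyword retrieval, standing in for real graph/RAG retrieval."""
        terms = question.lower().split()
        return [n for n in self.nodes.values()
                if any(t in (n.name + " " + n.source).lower() for t in terms)]

def build_prompt(question: str, graph: CodeGraph) -> str:
    """Assemble the context an LLM would be asked to answer from."""
    context = "\n\n".join(n.source for n in graph.retrieve(question))
    return f"Using the legacy code below, answer: {question}\n\n{context}"

# Toy usage:
graph = CodeGraph()
graph.add(CodeNode("VALIDATE-USER", "PERFORM CHECK-PASSWORD\n   ...",
                   calls=["CHECK-PASSWORD"]))
print(build_prompt("How is the password checked?", graph))
```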

But there are many other things that we can solve in this engineering space, and that's where our focus is right now. Because what we want to get to is a position where we've got AI-native software development - where it's not just an upgrade, it's a reinvention. A fundamental shift in the way we conceive, develop, and deliver software.

What might that look like?

Teams where AI is at the center of creation: AI integrated deeply into the core, seamless collaboration between humans and AI agents, ethical and responsible development of AI, and enhanced AI literacy across the leadership teams. We might be sitting there thinking that future is very far ahead of us, but it's not. And I know it's not, because I know this is what startups are doing out of the gate. They're using Cursor and Lovable and a whole bunch of other agentic tools that are around, and they're just getting on with the job. It's just us enterprises that, I feel, are a lot slower moving on this journey. So what that might look like is where you're just prompting through each of the stages. At the moment, it could be guarded outputs, human intervention, humans still in the loop looking through. But it's not too hard to imagine a world where, once we've solved all of the different problems along this track, we can actually go a lot faster end-to-end, quite autonomously - from requirements all the way through to deployment. If we get there, we might be able to get 'Phoenix' software: self-healing code.

Every time a new version comes out, you don't have to spend a whole bunch of time factoring in a library upgrade. It just automatically happens.

So that, we think, is what the future might look like.

We've just got to get there. We've got to get there with different layers in our organization, different thinking, and most importantly, through empathetic leadership. And I'm hoping that we can talk about that a little bit more on the panel afterwards.

Because the future is bright, the future is interesting, but we just have to get through this hump right now of adoption.

And lastly, and this is the last thing I'll leave you with, this lovely quote: "When the winds of change blow, some build walls, others build windmills." Build the windmill.

Warning: all information in this deck is wrong (outdated)

Everything, everywhere, all at once

Surreal collage illustrating overwhelm: a central person holds their head while surrounded by numerous people and everyday objects floating in midair. A professional video camera at the right records the scene. The background shows a sweeping sky and distant horizon, emphasizing chaos and motion.

The Three Loops of AI (Execution Model)

This model conceptualizes AI adoption in three concentric loops, each representing increasing impact, complexity, and business visibility.

  • Inner Loop (Builder Productivity)

    Fit: Automate low-value, high-volume engineering tasks to reclaim capacity for innovation and resiliency.

  • Middle Loop (Business Process Optimization)

    Fit: Automate & continuously improve cross-functional workflows to hit cost, resiliency, and compliance targets.

  • Outer Loop (CX and Product Experience)

    Fit: Deliver adaptive, AI-driven experiences that deepen engagement and unlock new revenue.

Reimagining Software Delivery for the Age of AI

The Next "Agile"

Change curve

  • Aware — AI aware: focus on some exploratory pilots, an awareness of tools in market, building AI literacy.
  • Assisted — AI-assisted code: focus on tool adoption, improving AI literacy.

Why isn’t the adoption higher?

Diagram labeled "Change curve" illustrates a curved progression divided into two segments: Aware and Assisted. Each segment has a callout summarizing its focus areas. A footer question asks why adoption is not higher.

Factors affecting adoption limitations

  • LLMs & Tools are still in early stages, with a lot of promise and less maturity
  • Prevalence of the tech stack in the training data
  • Experience of the developer(s)
  • Complexity of the problem
  • Information overload: so much information, it’s hard to keep up
  • Repeatability of results
  • Tolerance for errors and temporary slowdown

It’s not about the tools — it’s rewiring how we work

Change Curve: Emotional response to change

Axes: Morale and competence (vertical) vs time (horizontal).

  1. Shock — Surprise or shock at the event
  2. Denial — Disbelief; looking for evidence that it isn’t true
  3. Frustration — Recognition that things are different; sometimes angry
  4. Depression — Low mood; lacking energy
  5. Experiment — Initial engagement with the new situation
  6. Decision — Learning how to work in the new situation; feeling more positive
  7. Integration — Changes integrated; a renewed individual
Diagram of a change curve: a wavy line starts slightly up, dips to a low point, then rises to a higher endpoint with an arrow. Stages along the path, left to right: Shock, Denial, Frustration, Depression (lowest point), Experiment, Decision, Integration. The vertical axis represents morale and competence and the horizontal axis represents time.

There are some bold claims out there - are they fact or fiction?

developers who used GitHub Copilot completed the task significantly faster—55% faster than the developers who didn’t use GitHub Copilot

Research: Quantifying GitHub Copilot’s impact on code quality — GitHub Copilot

Perceptions of productivity due to AI

Seventy-five percent of respondents reported positive productivity gains from AI in the three months preceding our survey, which was fielded in early 2024.

DORA report 2024

Chart labels

Y-axis (Answer):

  • Extremely increased my productivity
  • Moderately increased my productivity
  • Slightly increased my productivity
  • No impact on my productivity
  • Slightly decreased my productivity
  • Moderately decreased my productivity
  • Extremely decreased my productivity

X-axis: Percentage of respondents

Error bar represents 89% uncertainty interval.

Figure 4: Respondents’ perceptions of AI’s impacts on their productivity.

Horizontal bar chart comparing respondents’ perceptions of AI’s impact on their productivity. The largest share reports “Slightly increased my productivity,” followed by “Moderately increased.” “No impact” is also a substantial portion. A smaller segment reports “Extremely increased,” and only small minorities report any level of decrease (slight, moderate, or extreme). Error bars are shown for each category to indicate an 89% uncertainty interval.

AI autocomplete: first wave of AI

  • AI fills in the details
  • Unit of AI assistance is the method
Screenshot of a code editor showing Python code for a class named PromptList and an autocomplete prompt that suggests “exclude files named ‘README.md’,” illustrating method-level code completion inside the __init__ function.
Historical black-and-white photograph showing an early motorized vehicle concept: a small self-propelled platform with four wheels and a driver at a steering wheel pulls a traditional horse-drawn carriage body carrying two passengers. The scene appears on a street with a wrought-iron fence and building in the background, illustrating a transitional design between carriage and automobile.

From autocomplete to agents

Autocomplete on steroids

  • GitHub Copilot et al.
  • Suggests next line or block based on limited local context
  • Pattern-driven
  • Surprisingly effective

Chat

  • IDE sidebar or standalone
  • Avoids context switching to browser, but integration is limited

Deeper IDE integration

  • “Fix using Copilot”
  • “Explain using Copilot”
  • Direct prompting or ‘command’ to the assistant, rather than comments or method signatures

Chat with the codebase

  • “Can you see if there are tests for … ?”
  • “Let’s introduce a test that …”
  • Automatic (ish) understanding of structure, dependencies, broader codebase

Agents and context providers

  • Semi-autonomous supervised agents
  • Agents pursue goals via multi-step workflows, tool use, and ‘reasoning’
  • Model Context Protocol (MCP) adds files, folders, docs, web, Git, API ecosystem…

Autonomous agents

  • Headless
  • Creates PR
  • e.g., OpenAI Codex, Google Jules, Devin

Unit of AI assistance: method

Unit of AI assistance: problem

Vibe coding!!

A horizontal, multi-segment arrow timeline depicts the evolution of AI coding tools from “Autocomplete on steroids” and “Chat,” through “Deeper IDE integration” and “Chat with the codebase,” to “Agents and context providers” and finally “Autonomous agents.” Brackets beneath label the shift in focus from “Unit of AI assistance: method” on the left to “Unit of AI assistance: problem” on the right. A starburst callout reads “Vibe coding!!”.

Prompt to Code: multi file editing

  • ✅ The AI wrote a fully functional solution
  • ✅ It generated unit tests
  • ✅ It passed a code review with flying colors

Screenshot of Visual Studio Code showing the editor welcome screen alongside an AI assistant/chat panel used for coding assistance and multi-file editing.

Prompt to code: Claude example

“Claude Code saved us 97% of the work — then failed utterly”
  • Goal: Speed up adding support for new programming languages in CodeConcise — requires an AST parser to be built and integrated.
  • Used Claude Code, a new terminal-based supervised coding agent from Anthropic, to build support for Python (and later, JavaScript) in CodeConcise.
  • Amazing first result: Adding Python support took just 3 minutes of agent time, with high quality output.
  • Total failure on other languages: Asking for JavaScript support used the wrong libraries, introduced broken assumptions and unverifiable code.
  • Conclusion: Claude Code can massively accelerate routine implementation — when the stars align.

Autonomous coding agents

Codex example

Autonomous background coding agents

Headless agents that you send off to work autonomously through a whole task. Code gets created in an environment spun up exclusively for that agent, and usually results in a pull request. Some of them also are runnable locally though.

Tool examples: OpenAI Codex, Google Jules, Cursor background agents, Devin, …

Solution quality

I ran the same prompt 3 times in OpenAI Codex, 1 time in Google’s Jules, 2 times locally in Claude Code (which is not fully autonomous though, I needed to manually say ‘yes’ to everything). Even though this was a relatively simple task and solution, turns out there were quality differences between the results.

Good news first, the agents came up with a working solution every time (leaving breaking regression tests aside, and to be honest I didn’t actually run every single one of the solutions to confirm). I think this task is a good example of the types and sizes of tasks that GenAI agents are already well positioned to work on by themselves. But there were two aspects that differed in terms of quality of the solution:

  • Discovery of existing code that could be reused: In the log here you’ll find that Codex found an existing component, the “dynamic data renderer”, that already had functionality for turning technical keys into human readable versions. In the 6 runs I did, only 2 times did the respective agent find this piece of code. In the other 4, the agents created a new file with a new function, which led to duplicated code.
  • Discovery of an additional place that should use this logic: The team is currently working on a new feature that also displays category names to the user, in a dropdown. In one of the 6 runs, the agent actually discovered that and suggested to also change that place to use the new functionality.

AI-assisted codebases are getting bigger, faster

  • Duplicate code ↑
  • Corrections within 2 weeks ↑
  • Refactoring ↓

Code Operations and Code Churn by Year

  • Added code increases
  • Churn increases
  • Moved code goes to 0%

Source: GitClear Code Quality Research 2025

Line chart titled “Code Operations and Code Churn by Year” showing relative distribution of code operations from 2020 to 2025, with 2025 marked as a projection. The trends highlighted on the chart: added code rises over time; churn rises sharply beginning around 2022; moved code declines to nearly zero by 2025. The legend indicates categories such as added, deleted, updated, moved, copy/pasted, find/replaced, and churn. A side bullet list uses arrows to indicate increases in duplicate code and in corrections within two weeks, and a decrease in refactoring.

Typical AI missteps and their impact radius

Mitigation

Individual level
  • Review, review, review
  • Recognise when to stop
  • Sunk cost fallacy
Team level
  • Code quality monitoring
  • Shift-left
  • "AI-gone-wrong" rituals
  • Psychological safety

Impact radius

  • Commit: Slows down time to commit, instead of boosting it.
    • No working code
    • Misdiagnosis of problems
  • Iteration: Creates friction for the team.
    • Too much upfront work
    • Misunderstood requirements
    • Brute force fixes instead of root cause analysis
    • Complicating the developer workflow
  • Codebase lifetime: Negatively impacts long term maintainability of the code.
    • Overly complex implementations
    • Lack of reuse
    • Verbose or redundant tests
Diagram with three concentric rings labeled, from center outward: Commit, Iteration, Codebase lifetime. Arrows call out effects at each radius:
  • Commit → slows down time to commit; results: no working code, misdiagnosis of problems.
  • Iteration → creates friction for the team; results: too much upfront work, misunderstood requirements, brute force fixes instead of root cause analysis, complicating the developer workflow.
  • Codebase lifetime → negatively impacts long‑term maintainability; results: overly complex implementations, lack of reuse, verbose or redundant tests.

GenAI is an indiscriminate amplifier — give it gold or garbage, and it cranks both to 11

Don’t just ship faster — ship better

Track quality as relentlessly as velocity

CORE 4

Speed
  • Key metric: Diffs per engineer (PRs or MRs) — not at individual level
  • Secondary metrics:
    • Lead time
    • Deployment frequency
    • Perceived rate of delivery
  • Data collection: Systems; Self-report
Effectiveness
  • Key metric: Developer Experience Index (DXI) — a predictive benchmark of developer experience, developed by DX
  • Secondary metrics:
    • Time to 10th PR
    • Ease of delivery
    • Regrettable attrition — only at organizational level
  • Data collection: Systems; Self-report; Experience sampling
Quality
  • Key metric: Change failure rate
  • Secondary metrics:
    • Failed deployment recovery time
    • Number of incidents per engineer
    • Security-related metrics
  • Data collection: Systems; Self-report
Impact
  • Key metric: % of time spent on new capabilities
  • Secondary metrics:
    • Initiative progress and ROI
    • Revenue per Engineer — only at organizational level
    • R&D as % of revenue — only at organizational level
  • Data collection: Systems; Self-report

Four-column framework titled “CORE 4” outlining delivery metrics categories: Speed, Effectiveness, Quality, and Impact. For each category, the table lists a key metric, supporting secondary metrics, and how data is collected.

Engineering practices still matter, if not more.

Good practices mitigate the GenAI risks, and help manage the quality of more code.

Core values

  • Clean code
  • Fast feedback
  • Simplicity
  • Repeatability

Practices by context

  • Web and Mobile Development
    • Vertical slices
    • Automate repetitive tasks
  • Data Engineering
    • Simple design
    • Collective code ownership
  • Cloud and DevOps
    • Pair programming
    • Refactoring
  • Microservice Architecture
    • Test driven design
    • Continuous integration
Circular diagram showing four core values at the center—Clean code, Fast feedback, Simplicity, and Repeatability—surrounded by eight supporting practices. The practices are grouped into four quadrants labeled Web and Mobile Development, Data Engineering, Cloud and DevOps, and Microservice Architecture.

Higher coding throughput is here — and putting pressure on the system.

  • If you can code faster, can you fill the backlog faster?
  • If you can code faster, can you review faster? Can you test faster? Can you ship faster?
  • If you can produce more code, can you also keep your technical debt in check?

Higher coding throughput

Conceptual flow diagram showing work items moving from a backlog (left) through a coding stage labeled “Higher coding throughput,” then into downstream stages for review, testing, and shipping (right). The center section is widened to indicate faster coding, which creates pressure both on backlog intake and on downstream capacities. A final segment highlights the need to control technical debt despite increased output.

Flow and productivity suffer from substantial waste, regardless of code assistants.

Team effectiveness

  • Current state: Overhead / Waste > 70%; Value-add delivery < 30%
  • Future state: Value-add delivery 60%
Sources of friction
  • Finding information
  • Slow feedback
  • Cognitive friction
  • DX friction
  • Operating model friction
Diagram comparing team effectiveness across two segmented bars labeled Current state and Future state. The current bar is mostly overhead/waste (>70%) with a smaller value‑add segment (<30%). The future bar increases value‑add to 60%. Segments correspond to friction categories: finding information, slow feedback, cognitive friction, DX friction, and operating model friction.

Waste reduction can deliver a potential 30-50% increased productivity

This translates into a potential opportunity of optimizing engineering cost by up to 20% annually.

Illustration of how reduction in waste leads to increased productivity*

Team productivity

Common Impediments to Flow & Their Cost

  • Current state: Overhead/waste ~70%; Actual Development Time ~30%; Cycle Time - 26 days.
  • After waste reduction: Overhead/waste ~58%; Actual Development Time ~42%; Cycle Time - 16-20 days.
Impediment categories (legend)
  • Manual Deployment
  • Performance Testing
  • Grooming Time
  • Dependency Time
  • Capability Issues
  • Defects
  • No Testing Automation
  • Ideal Development Time
  • Meeting Time

*based on experience with other clients

Comparison stacked-bar diagram showing two scenarios of team productivity. The top bar (current state) indicates about 70% overhead/waste and 30% actual development time with a 26-day cycle time. The bottom bar (after waste reduction) shows about 58% overhead/waste and 42% actual development time with a 16–20 day cycle time. A legend identifies categories contributing to overhead/waste and development time.

Reimagining Software Delivery for the Age of AI

The Next "Agile"

Change curve

  • Aware — AI aware: focus on some exploratory pilots, an awareness of tools in market, building AI literacy.
  • Assisted — AI-assisted code: focus on tool adoption, improving AI literacy.
  • Accelerated — accelerated productivity: focus on accelerated productivity across the whole lifecycle, on a team basis.

How does this transformation manifest in teams and processes?

Diagram of a three-stage change curve shown as an arc with labeled segments: Aware, Assisted, and Accelerated. Each segment has a callout describing the stage: exploratory pilots and literacy (Aware), tool adoption and improved literacy (Assisted), and team-based productivity gains across the lifecycle (Accelerated).

Which software delivery tasks can be boosted by the superpowers of GenAI?

AI Native Delivery

  • Planning
    • Lean Value Tree / Scenario Design*
  • Requirements & Design
    • Stories / Journeys / ACs*
  • Software Engineering
    • Architecture / Threat models*
    • Agents / RAG powered workflows*
  • Testing
    • Testing plan*
    • Test data*
  • Deployment
    • Prompt Powered Pipelines*
  • Operations
    • AI Operations and Support*

*Sample activities which are accelerated by using GenAI/Agents

Circular lifecycle diagram titled “AI Native Delivery.” Six stages—Planning, Requirements & Design, Software Engineering, Testing, Deployment, and Operations—are arranged around the center with a continuous flow indicator. Each stage has a callout listing example tasks accelerated by GenAI (e.g., scenario design, stories/ACs, threat models, agent/RAG workflows, testing plans and data, prompt-powered pipelines, and AI operations/support).
  1. What thinking is fundamental to my subject and should not be outsourced?
  2. What thinking is mechanical and can be outsourced to expedite learning?
  3. What is the best AI tool to use?

Rashna Taraporewalla

What are the superpowers of GenAI?

Translation

  • Requirements to code
  • Code to code
  • Language to queries
  • Standard to standard

Finding knowledge

  • Remembering details and learning
  • Understanding errors
  • Providing organisational context
  • Amplifying and socialising knowledge within a team

Brainstorming ideation

  • Product ideation
  • More comprehensive requirements
  • More comprehensive testing
  • Architecture
  • Exploratory testing

Summarisation and clustering

  • Change logs
  • Incident management: Run books
  • Research
  • Documentation
Diagram showing a four-column matrix of GenAI superpowers with category headers—Translation, Finding knowledge, Brainstorming ideation, and Summarisation and clustering—each accompanied by example use cases.

Why is the adoption focus on coding assistants?

Overall number of tools collected and tracked: 128

Number of tools relevant per task area

Stats from our internal tool tracking spreadsheet

Bar chart titled “Number of tools relevant per task area,” with categories: analysis, architecture, coding, data engineering, design, documentation, infrastructure, product, testing, research, security. The coding bar is by far the highest (about 75). Testing is the next largest (around 35–40). Data engineering and a few others are in the low teens; the remaining areas are in single digits. A banner above the chart states a total of 128 tools tracked. Source noted as an internal tool-tracking spreadsheet.

Design with GenAI

beyond wireframing

Design with GenAI beyond wireframing

Screenshot of a YouTube video demo showing an AI-powered design system tool generating mobile UI elements and screens, including a primary button, a sign-up form, a collection list with radio selections, and a comments screen.

Reaching out to ChatGPT

Photo of a laptop displaying a screenshot of the ChatGPT interface. The visible response outlines a product backlog for a fintech app, including a heading "Product Backlog (High-Level)" with "Epic 1: Onboarding and User Management" and several story items (e.g., sign-up process, user profile management, two-factor authentication).

Accelerate GenAI value beyond chat & coding with a team assistant

Thoughtworks Haiven™

Screenshot of the Thoughtworks Haiven team assistant interface on a laptop, showing a dashboard of prompt templates for software delivery activities such as user research, persona creation, design thinking exercises, creative matrix, scenario design, epic breakdown, user journey stories, and story refinement.

Ideation with GenAI

harnessing the divergent thinking

  • Non-linear
  • Finding unexpected connections
Screenshot of a scenario-generation interface: options to generate 5 scenarios on a 5-year horizon with Pessimistic and Realistic settings, an option to add details (signals, threats, opportunities), a text box containing the prompt “more and more elderly people are using electric vehicles, what does that mean for the design of our charging stations?” and a Send button.

Ideation with GenAI

harnessing divergent thinking

  • Non-linear
  • Finding unexpected connections

Screenshot of an AI scenario generation tool showing controls for number of scenarios and time horizon, options for pessimistic and realistic outlooks, a prompt about elderly people using electric vehicles and implications for charging station design, and a grid of generated scenario cards related to EV charging solutions for elderly drivers.

Ideation with GenAI

harnessing divergent thinking

  • Non-linear
  • Finding unexpected connections

Screenshot of a scenario-generation tool. The interface lets the user specify parameters (number of scenarios, time horizon, pessimistic/realistic) and submit a prompt: “more and more elderly people are using electric vehicles, what does that mean for the design of our charging stations?”. Below, a grid of generated scenario cards appears with titles such as “Electric vehicles adapted to the needs of elderly drivers,” “Charging stations introduced automated docking for effortless connection,” “Charging stations expanded rest areas with health facilities,” and “Smartphone apps enabled remote monitoring of charging status.”
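
To show what might sit behind a screen like this, a small sketch that assembles a scenario-generation prompt from the UI parameters (number of scenarios, time horizon, outlook, extra detail). The parameter names and prompt wording are assumptions for illustration, not the tool's actual implementation.

```python
# Illustrative only: how a scenario generator might turn its UI controls into a prompt.
# Parameter names and wording are assumptions, not the actual tool's implementation.
from dataclasses import dataclass

@dataclass
class ScenarioRequest:
    topic: str
    num_scenarios: int = 5
    horizon_years: int = 5
    outlooks: tuple = ("pessimistic", "realistic")
    include_details: bool = True  # signals, threats, opportunities

def build_prompt(req: ScenarioRequest) -> str:
    details = ("For each scenario, list signals, threats and opportunities."
               if req.include_details else "")
    return (
        f"Generate {req.num_scenarios} distinct future scenarios over a "
        f"{req.horizon_years}-year horizon, mixing {' and '.join(req.outlooks)} outlooks. "
        f"Topic: {req.topic} "
        f"Give each scenario a short title and a one-paragraph summary. {details}"
    )

print(build_prompt(ScenarioRequest(
    topic=("more and more elderly people are using electric vehicles, "
           "what does that mean for the design of our charging stations?"))))
```

A prompt like this would then be sent to an LLM and the response rendered as the scenario cards shown in the screenshot.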

Analysis with GenAI

requirements analysis

Screenshot of a web application page for validating a user story, with sections titled "Context" and "Instructions" containing bullet points and paragraphs. Several passages are highlighted, indicating selected or emphasized text.

Analysis with GenAI: threat modeling

Screenshot of a team-assistant web application showing a "Threat Modelling" page. A left sidebar lists workflow areas such as Research, Ideate, Analyse, Coding, Testing, and Architecture. The main panel contains a text description of a user scenario, a Contexts field set to "crm," and a Generate button.
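
By way of illustration, a sketch of the kind of prompt such a threat-modelling page might assemble, combining the user scenario with a stored organisational context (here “crm”). The STRIDE framing and the context lookup are assumptions, not the tool's actual behaviour.

```python
# Sketch of a threat-modelling prompt builder, as a team assistant might use one.
# The STRIDE framing and the hard-coded context store are illustrative assumptions.
CONTEXTS = {
    "crm": "Customer-facing CRM storing PII; integrates with billing via REST APIs.",
}

def threat_modelling_prompt(scenario: str, context_key: str) -> str:
    context = CONTEXTS.get(context_key, "")
    return (
        "Act as a security analyst. Using STRIDE, enumerate plausible threats "
        "for the scenario below, naming the affected assets and suggesting mitigations.\n"
        f"Organisational context: {context}\n"
        f"Scenario: {scenario}"
    )

print(threat_modelling_prompt(
    "Users reset passwords via an emailed magic link.", "crm"))
```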

Reimagining Software Delivery for the Age of AI

AI-Native Engineering: The Next "Agile"

Change curve

  • Aware — focus on exploratory pilots, awareness of tools in the market, and building AI literacy.
  • Assisted — AI-assisted code, with a focus on tool adoption and improving AI literacy.
  • Accelerated — accelerated productivity across the whole lifecycle, on a team basis.
  • Augmented — humans augmented with AI agents, shifting to less deterministic software and embracing the “fuzzy” or non-deterministic nature of AI outputs.

What might the future look like?

Semi-circular “change curve” diagram with four adjacent segments labeled, from left to right: Aware, Assisted, Accelerated, and Augmented. Each segment represents a stage of AI adoption in software delivery, progressing from initial awareness to fully augmented teams working with AI agents.

How can GenAI help us solve the hard engineering problems we’ve been unable to tackle manually?

Maintenance with GenAI: CodeConcise

“there is as much, if not more, value in understanding existing code”

Architecture diagram showing the workflow of a legacy modernization assistant. Ingestion and comprehension pipelines produce program graphs that feed a knowledge graph (Neo4J). A Legacy Assistant within a modernization workbench interfaces with both the knowledge graph and LLMs, producing modernization artifacts on demand. An SME poses a question (“How is authentication implemented?”) to the assistant, which draws on the LLMs and the knowledge graph to respond. A QR code points to additional resources.
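
The retrieve-then-ask flow implied by the diagram can be sketched as: query the knowledge graph for program summaries relevant to the question, then let an LLM compose an answer for the SME. A minimal sketch, assuming a Neo4j instance and an OpenAI-compatible model; the node labels, Cypher query, and model name are hypothetical, not CodeConcise's actual implementation.

```python
# Hedged sketch of the retrieve-then-ask flow: pull candidate code summaries
# from a Neo4j knowledge graph, then let an LLM compose an answer for the SME.
# Node labels, properties, and the Cypher query are hypothetical.
from neo4j import GraphDatabase
from openai import OpenAI

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
llm = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask_legacy_assistant(question: str, keyword: str) -> str:
    # Retrieve summaries of programs that mention the keyword.
    with driver.session() as session:
        records = session.run(
            "MATCH (m:Module)-[:CONTAINS]->(p:Program) "
            "WHERE p.summary CONTAINS $kw "
            "RETURN m.name AS module, p.name AS program, p.summary AS summary "
            "LIMIT 10",
            kw=keyword,
        )
        context = "\n".join(
            f"{r['module']}/{r['program']}: {r['summary']}" for r in records
        )
    # Ask the LLM to answer using only the retrieved summaries.
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would do
        messages=[
            {"role": "system",
             "content": "Answer the SME's question using only the code summaries provided."},
            {"role": "user", "content": f"Summaries:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(ask_legacy_assistant("How is authentication implemented?", "authentication"))
```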

Reverse engineering a black box

How might AI help to reverse engineer and rebuild an application without accessing the code?

Could be helpful for…

  • A business-critical application whose code has been lost (e.g., after an unfriendly split with a vendor)
  • A slice of an application with a chaotic, entangled codebase that is better rebuilt from a clean slate

Reimagining Software Delivery for the Age of AI

AI-Native Engineering: The Next "Agile"

Change curve

  1. Aware — exploratory pilots, awareness of tools in the market, and building AI literacy
  2. Assisted — AI-assisted code, with a focus on tool adoption and improving AI literacy
  3. Accelerated — accelerated productivity across the whole lifecycle, on a team basis
  4. Augmented — humans augmented with AI agents, shifting to less deterministic software and embracing the "fuzzy" or non-deterministic nature of AI outputs
  5. Native — AI-native delivery systems in which autonomous agents adjust their own path based on real-time feedback, like self-healing code

What might the future look like?

Diagram of a semi-circular change curve divided into five sequential segments—Aware, Assisted, Accelerated, Augmented, Native—with callouts summarizing each stage, illustrating progression toward AI-native delivery with autonomous agents.

AI Native delivery isn’t an upgrade. It’s a reinvention.

A fundamental shift in the way we conceive, develop, and deliver software.

Beyond automation, AI-Native reimagines workflows

Teams beginning their AI journey:

  • Focus on adding AI tools to existing processes
  • Treat AI as an afterthought, an optional enhancement
  • Primarily deliver incremental upgrades by incorporating AI features

AI-Native Teams:

  • Position AI at the center of the creation process, driving innovation
  • Integrate AI deeply into the core of their development workflow
  • Emphasize seamless collaboration between humans and AI agents as a fundamental principle
  • Prioritize the ethical and responsible development and deployment of AI solutions
  • Enhance AI literacy across leadership to align strategy and decisions with business impact

A central icon of an upward-trending arrow inside a circle sits between two bullet lists, symbolizing a progression from early AI adoption to AI-native teams.

Enabling an Agentic Future

Experience Layer

  • Individual Augmentation (Co-Pilots)
    • Augmenting productivity & capability
  • Workflow & Process Agents
    • End-to-end task execution (with or without a human in the loop)
  • Multi-Agent Systems
    • Replace/Reimagine products, platforms and services

Agent Platform

  • Orchestrator
    • Goal decomposition & processing
    • Assignment, state management, quality judgement
  • Specialized Agents
    • Search
    • Repos
    • Scripts
    • Daemon/Models
  • Memory & Evaluation
    • Short-/Long-term context stores
    • Guardrails (bias, policy, weights, etc.)
  • Tools & API Gateway
    • APIs
    • Business platforms
    • RBAC
    • Enterprise systems access

AI Platform

  • Foundation Models
    • Broad reasoning/planning
    • Code/text generation LLMs (GPTx, Claude x, Gemini x.x)
  • Specialized Models
    • Domain-specific LLMs
    • ML model continuous pre-/post-trained
  • Model Operations Fabric
    • Service
    • Observability
    • CI/CD
    • PaC

Foundation Data & Knowledge Platform

  • Operational data stores
  • Data products
  • Document stores
  • Real-time event streams
  • Vector/embedding indices (purpose-built, CSP-managed, RAG)
  • Governance, lineage, quality rules and security policies

Diagram of a layered stack showing four tiers: Foundation Data & Knowledge Platform at the base, above it the AI Platform, then the Agent Platform, and at the top the Experience Layer. Each tier is depicted with icons representing data stores and streams, model operations, agents/tools, and end-user applications, illustrating the flow from data to user-facing experiences.
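
To ground the Agent Platform layer, a minimal sketch of the orchestrator idea: decompose a goal into tasks, route them to specialised agents, keep results in a short-term memory store, and apply a simple quality check as a stand-in for guardrails. The agent names, the hard-coded plan, and the acceptance rule are assumptions for illustration, not a production design.

```python
# Minimal sketch of an orchestrator: goal decomposition, assignment to
# specialised agents, short-term memory, and a stand-in quality guardrail.
# Everything here is illustrative; a real orchestrator would plan with an LLM.
from typing import Callable, Dict, List, Tuple

class Orchestrator:
    def __init__(self, agents: Dict[str, Callable[[str], str]]):
        self.agents = agents
        self.memory: List[dict] = []  # short-term context store

    def decompose(self, goal: str) -> List[Tuple[str, str]]:
        # Hard-coded plan for illustration; real decomposition would be model-driven.
        return [("search", f"Find prior art for: {goal}"),
                ("repos", f"Locate code related to: {goal}"),
                ("scripts", f"Draft a change plan for: {goal}")]

    def acceptable(self, result: str) -> bool:
        # Stand-in for quality judgement / guardrails.
        return bool(result) and "TODO" not in result

    def run(self, goal: str) -> List[dict]:
        for agent_name, task in self.decompose(goal):
            result = self.agents[agent_name](task)
            self.memory.append({"agent": agent_name, "task": task,
                                "result": result, "accepted": self.acceptable(result)})
        return self.memory

# Specialised "agents" stubbed out as simple callables.
agents = {
    "search": lambda task: f"3 relevant documents for '{task}'",
    "repos": lambda task: f"2 candidate repositories for '{task}'",
    "scripts": lambda task: f"draft migration script for '{task}'",
}
print(Orchestrator(agents).run("add MFA to the login flow"))
```

In a real platform, the decomposition, assignment, and quality judgement would themselves be model-driven, with memory and guardrails provided by the platform rather than a single class.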

Reimagining Software Delivery for the Age of AI

AI-Native Engineering: The Next "Agile"

Change curve

  • Aware — exploratory pilots, awareness of tools in the market, and building AI literacy.
  • Assisted — AI-assisted code, with a focus on tool adoption and improving AI literacy.
  • Accelerated — accelerated productivity across the whole lifecycle, on a team basis.
  • Augmented — humans augmented with AI agents, shifting to less deterministic software and embracing the "fuzzy" or non-deterministic nature of AI outputs.
  • Native — AI-native delivery systems in which autonomous agents adjust their own path based on real-time feedback, like self-healing code.

What might the future look like?

Semicircular change-curve diagram divided into five labeled stages (Aware, Assisted, Accelerated, Augmented, Native) with callout notes describing each stage, illustrating progression from basic awareness to fully AI‑native delivery systems.

When the winds of change blow, some people build walls and others build windmills.

— Chinese proverb