Harness Engineering
February 19, 2026

It was very interesting to read OpenAI’s recent write-up on “harness engineering”, which describes how a team used “no manually typed code at all” as a forcing function to build a harness for maintaining a large application with AI agents. After five months, they’ve built a real product that’s now over 1 million lines of code.
The consensus among people who’ve been working with code generation tools for some time is that something happened in November and December of 2025. Certainly, a big part of that was the arrival of Opus 4.5 from Anthropic and OpenAI’s GPT-5.2 models.
But over the couple of months since then, we also saw the emergence of the system now known as OpenClaw, which itself relies on third-party models. It also relies on a brand-new desktop application for coding from OpenAI, as well as the integration of Claude Code into their desktop apps and into the systems those apps run on. Finally, there was the emergence of Claude Cowork.
The term harness has increasingly come to be applied to systems like this. They are tools that supply logic and more, tapping into the capability of large language models and, in doing so, improving on what those models can deliver alone.
And while improvements to the models themselves, particularly major step-function improvements, take months or sometimes even years to arrive, these harnesses can be improved nearly continuously, and can indeed improve themselves by installing new skills or the equivalent.
It seems highly likely that we won’t simply rely on third-party harnesses. Many organisations and individuals may well build their own, perhaps tailored to specific use cases. One I’m exploring right now is the equivalent of Claude Cowork, but for learning.
So I definitely think this is something developers should be paying attention to, and this piece from Birgitta Böckeler explores some of the implications, particularly for enterprise software engineering practice.
