There are two categories of vibe coding. One is when you delegate little tasks to a coding LM while keeping yourself as the human “real programmer” fully informed and in control.
The second type of vibe coding is what I am interested in. It is when you use a coding agent to build towers of complexity that go beyond what you have time to understand in any detail. I am interested in what it means to cede cognitive control to an AI. My friend David Maymudes has been building some serious software that way, and he compares the second type of vibe coding to managing a big software team. Like the stories you’ve heard of whole startups being created completely out of one person vibe coding.
The term "vibe coding" is often used disparagingly.
And debates or discussions about the practice feel very much centred on the capability of today's models, something that is changing incredibly rapidly. From personal experience, having worked with these models for more than three years now, the improvements in code generation over that time are almost unimaginable.
There are many concerns about simply prompting a code generation model to produce an output and letting it run: not being overly concerned, if concerned at all, about the quality of the code, and then using that output without checking it, without understanding it, and without even inspecting the code.
Here David Bau writes in praise of vibe coding, more or less in this sense, although he does suggest providing guard rails and comprehensive testing to ensure the quality of code.
Believe the Checkbook
Everyone’s heard the line: “AI will write all the code; engineering as you know it is finished.” Boards repeat it. CFOs love it. Some CTOs quietly use it to justify hiring freezes and stalled promotion paths.
The Bun acquisition blows a hole in that story.
Here’s a team whose project was open source, whose most active contributor was an AI agent, whose code Anthropic legally could have copied overnight. No negotiations. No equity. No retention packages. Anthropic still fought competitors for the right to buy that group.
Publicly, AI companies talk like engineering is being automated away. Privately, they deploy millions of dollars to acquire engineers who already work with AI at full tilt. That contradiction is not a PR mistake. It is a signal.
One thing I've heard repeatedly over the last year or two, when people are critical of code generation using large language models, is something along the lines of: "But writing the code is not the bottleneck when it comes to software engineering." And there's some validity to that. The question is: well, what is the bottleneck? People might say testing. People might say architectural decisions. Quality assurance. All of those are clearly choke points in delivering software. But here Robert Greiner observes that "The bottleneck isn’t code production, it is judgment."
I certainly think there's something to this, but sometimes we stop with an observation like that, or with the observation that code generation is not the bottleneck. What I think is really important here is to think through the next steps and the consequences. So if judgement is the bottleneck, not code generation, then what are the implications for engineering leaders, which Robert Greiner explores here? For software engineers themselves, whether junior, mid-career, or senior? For companies and organisations, and more broadly?
And is this true only of code or is it true of other outputs of generative AI?
My working hypothesis would be that it is, and so organisations and individuals should be developing and encouraging the development of judgement, what some people might call taste. Because it's that discernment, that judgement, that taste, which is certainly valuable in software development, but which I think will become increasingly valuable in other fields too, because the models will be able to, and are already able to, generate a lot of code, a lot of copy, a lot of images, a lot of legal advice.
A key question will be "what is the value of any particular generation from a model?"
That's where expertise comes in, that's where taste comes in, that's where discernment and judgement come in. So develop those, continue to develop those. What has long differentiated a person in terms of capability, in many respects, is not the ability to recite vast bodies of knowledge; it is the ability to know among all the vast knowledge what is the appropriate knowledge to deploy in a particular situation.
Your job is to deliver code you have proven to work
In all of the debates about the value of AI-assistance in software development there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.
This is rude, a waste of other people’s time, and is honestly a dereliction of duty as a software developer.
Your job is to deliver code you have proven to work.
As software engineers we don’t just crank out code—in fact these days you could argue that’s what the LLMs are for. We need to deliver code that works—and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.
There's not much more to add to this observation by Simon Willison. Software engineers have a responsibility to deliver tested, verified, quality-assured code. If we use code generation to YOLO it, then what we're doing is not software engineering.
There is a time and place for such code. I use it extensively myself because what matters is the job that it does. It doesn't necessarily have to be particularly secure or performant or even bug-free, because I'm using it internally within a sandbox environment to achieve a productivity gain.
But it's entirely another thing to create something that's public-facing, that people rely on, that manages people's details and YOLO that.
What happens when the coding becomes the least interesting part of the work
That judgment is the job of a senior engineer. As far as I can tell, nobody is replacing that job with a coding agent anytime soon. Or if they are, they’re not talking about it publicly. I think it’s the former, and one of the main reasons is that there is probably not that much spelled-out practical knowledge about how to do the job of a senior engineer in frontier models’ base knowledge, the stuff that they get from their primary training by ingesting the whole internet.
Thoughts by an experienced software engineer on working with large language models. It's an irony that as we become experienced software engineers, traditionally we've written less and less software.
This is a trend that is perhaps changing as large language models become increasingly capable of generating code.
Writing a full HTML5 parser is not a short one-shot problem. I have been working on this project for a couple of months on off-hours.

Tooling: I used plain VS Code with GitHub Copilot in Agent mode. I enabled automatic approval of all commands, and then added a blacklist of commands that I always wanted to approve manually. I wrote an agent instruction that told it to keep working, and don’t stop to ask questions. Worked well!

Here is the 17-step process it took to get here:
A few weeks back, Simon Willison coined the term "vibe engineering", trying to draw a distinction between using large language models to generate code that we simply run as-is, and using large language models as part of the software engineering process. This example he links to is an excellent example of vibe engineering.
Emil Stenström has written an HTML parser which, if you know anything about HTML, is a much more complex undertaking than it might initially appear. Here, Emil details his approach to working with large language models to produce a very complex piece of software. Emil is a software engineer, but he observes that:
Yes. JustHTML is about 3,000 lines of Python with 8,500+ tests passing. I couldn't have written it this quickly without the agent.
But "quickly" doesn't mean "without thinking." I spent a lot of time reviewing code, making design decisions, and steering the agent in the right direction. The agent did the typing; I did the thinking.
That's probably the right division of labor.
Junior developer—obsolete accessory or valuable investment? How does the genie change the analysis?
Folks are taking knee-jerk action around the advent of AI—slowing hiring, firing all the juniors, cancelling internship programs. Instead, let’s think about this a second.
The standard model says junior developers are expensive. You pay senior salaries for negative productivity while they learn. They ask questions. They break things. They need code review. In an augmented development world, the difference between juniors & seniors is just too large & the cost of the juniors just too high.

Wrongo. That’s backwards. Here’s why.
Kent Beck is renowned in the world of software engineering, the originator of XP (Extreme Programming) and very well known and highly regarded when it comes to design patterns.
Here he addresses the issue that has been of concern to many people: what impact will AI have on junior developers? Will they simply not exist anymore? And then, will we ever get senior developers if we haven't got any new junior developers? Kent has a different take, and I think it's well worth considering.
UX Is Your Moat (And You’re Ignoring It) – Eleganthack
If you’re building an AI product, your interface isn’t a nice-to-have. It’s your primary competitive advantage.
Here’s what that means in practice:
Make the first five minutes seamless. Users decide whether they’re staying or leaving almost immediately. If they have to think about where to click, you’ve already lost. Netflix auto-plays. TikTok starts scrolling. What does your product do the moment someone opens it?
Technologists often default to the idea that the best technology always wins. Over the years, we have seen endless debates about the technical specifications of a product and why they make that product better. But what we should have learned by now is that technology is only one part of why something becomes successful. Category defining. Dominant.
Here, Christina Wodtke brings her many years of experience to the question of what will make AI products successful, with lessons not just for the biggest technology companies, but any company, whether they use AI or not.
How to Run a 90-Minute AI Design Sprint (with prompts)
Most teams still run ideation sessions with a whiteboard, a problem statement, and a flurry of post-its. To be honest, I’ve always loved a good Design sprint, especially in person and I hope those don’t go away for anyone because they’re an awesome way to learn and connect together.
But with AI, the way we generate, evaluate, and shape ideas has fundamentally shifted. You can collapse days of thinking into a focused 90-minute sprint if you know how to structure it well.
This is the format designed to move fast without losing the depth. It blends design thinking, systems thinking, and agent-era AI capabilities into a repeatable flow you can run any time your team needs clarity.
Here’s the 90-minute AI Design Sprint, step by step with prompts you can copy, paste, and use today.
As we've observed elsewhere recently, while a lot of the focus on generative AI and LLMs is on customer-facing features or generated content (be that text, images, or video), there is one place where large language models can have a really valuable impact: on processes. Here M.C. Dean reimagines the design sprint, a staple of the design process, using large language models, with some suggested prompts that she uses.
What I learned building an opinionated and minimal coding agent
I’ve also built a bunch of agents over the years, of various complexity. For example, Sitegeist, my little browser-use agent, is essentially a coding agent that lives inside the browser. In all that work, I learned that context engineering is paramount. Exactly controlling what goes into the model’s context yields better outputs, especially when it’s writing code. Existing harnesses make this extremely hard or impossible by injecting stuff behind your back that isn’t even surfaced in the UI.
I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past year, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way.
One incredibly valuable use case for code generation, and a good way to explore, experiment, and develop intuitions and capabilities with these models, is building little utility tools for your own use, as Simon Willison has been doing for several years.
I too have been doing this. I've taken spreadsheets, Bash scripts, and little pieces of JavaScript that I had cobbled together over the years to help in the production of our sites, content, and even printing for our conferences, and built special-purpose mini web applications to solve the same problems much more efficiently and enjoyably.
So I highly recommend this as something to try for yourself if you're not doing it already. Here Simon lists a whole bunch of patterns that he has gleaned from his extensive development of such tools.
We propose adding a /llms.txt markdown file to websites to provide LLM-friendly content. This file offers brief background information, guidance, and links to detailed markdown files.

llms.txt markdown is human and LLM readable, but is also in a precise format allowing fixed processing methods (i.e. classical programming techniques such as parsers and regex).

We furthermore propose that pages on websites that have information that might be useful for LLMs to read provide a clean markdown version of those pages at the same URL as the original page, but with .md appended. (URLs without file names should append index.html.md instead.)
llms.txt is one of a number of proposals for how best to expose the content of a web page, site, or app to large language models.
llms.txt is a proposal initially from Jeremy Howard, well-known in the Python and AI communities and founder of fast.ai (and, earlier, FastMail).
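The "append .md" convention from the proposal is simple enough to sketch. Here's a minimal, illustrative Python snippet (my own, not part of the proposal) that derives the proposed Markdown URL for a page:

```python
from urllib.parse import urlparse


def markdown_twin(url: str) -> str:
    """Derive the Markdown URL proposed by llms.txt: append .md to the path,
    or index.html.md when the URL doesn't name a file."""
    parsed = urlparse(url)
    path = parsed.path or "/"
    if path.endswith("/"):
        path += "index.html.md"   # URLs without file names get index.html.md
    else:
        path += ".md"             # ordinary pages just gain a .md suffix
    return parsed._replace(path=path).geturl()


# https://example.com/docs/intro  ->  https://example.com/docs/intro.md
print(markdown_twin("https://example.com/docs/intro"))
# https://example.com/docs/       ->  https://example.com/docs/index.html.md
print(markdown_twin("https://example.com/docs/"))
```

Of course, fetching those URLs only works for sites that have actually adopted the proposal.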
AI and variables: Building more accessible design systems faster
When people talk about AI in design, they often picture flashy visuals or generative art. But my own lightbulb moment happened at a less glamorous place: in an effort to solve this accessibility challenge under pressure.
At UX Scotland earlier this year, I shared how AI helped me transform a messy, time-consuming process into something lean, structured, and scalable. Instead of spending weeks tweaking palettes and testing contrast, I had an accessible design system up and running in just a few days. In this article, I’ll explain how I did it and why it matters.
When it comes to AI, we over-index on output and user-facing features, and I think we're somewhat asleep on workflow and process. These can be made more efficient using large language models.
Here's a great case study from Tobi Olowu on how he and his team used LLMs to help streamline the process of improving the accessibility of an existing design system.
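To give a sense of the kind of check being automated in work like this, here's a small, self-contained sketch (mine, not Tobi's) of the WCAG contrast-ratio calculation that sits behind accessible colour tokens:

```python
def _linearise(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour like '#1a73e8'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearise(r) + 0.7152 * _linearise(g) + 0.0722 * _linearise(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio between two colours (from 1:1 up to 21:1)."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


# Normal text needs at least 4.5:1 to meet WCAG AA.
print(round(contrast_ratio("#000000", "#ffffff"), 2))  # 21.0
print(contrast_ratio("#777777", "#ffffff") >= 4.5)     # False: #777 on white narrowly fails AA
```

Running a check like this across every token pairing in a palette is exactly the sort of tedious, mechanical work that lends itself to automation.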
A ChatGPT prompt equals about 5.1 seconds of Netflix
In June 2025 Sam Altman claimed about ChatGPT that “the average query uses about 0.34 watt-hours”.

In March 2020 George Kamiya of the International Energy Agency estimated that “streaming a Netflix video in 2019 typically consumed 0.12-0.24kWh of electricity per hour” – that’s 240 watt-hours per Netflix hour at the higher end.
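Worked through, the quoted numbers do give roughly the headline figure: 240 Wh per Netflix hour is 240 ÷ 3600 ≈ 0.067 Wh per second, and 0.34 Wh ÷ 0.067 Wh/s ≈ 5.1 seconds of streaming per ChatGPT query.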
A much-quoted study found that 95% of all AI implementations had no ROI. I haven't really read the study, and I don't think many of the people who quote it have read it either.
We also see numbers bandied about regarding the amount of water used by large language models, at times for single queries, and similarly the amount of energy required for a single query.
And then yesterday I saw on a toilet that the average flush from that toilet used 3.4 litres of water.
It's good to see things like this from Simon Willison, where he tries to provide some broader context for the energy and environmental impact of large language models. It would be even better to see more solid figures from OpenAI, Google, and the other hyperscalers, but at least it's a start.
We’re at the same inflection point we saw with mobile and cloud, except AI is more sensitive to context quality. Loose contracts, missing examples, and ambiguous guardrails don’t just cause bugs. They cause agents to confidently explore the negative space of your system.
The companies that win this transition will be the ones that treat their specs as executable truth, ship golden paths that agents can copy verbatim, and prove zero-trust by default at every tool boundary.
Your tech stack doesn’t need to be rebuilt for AI. But your documentation, contracts, and boundaries? Those need to level up.
So my question is this: Why vibe code with a language that has human convenience and ergonomics in view? Or to put that another way: Wouldn’t a language designed for vibe coding naturally dispense with much of what is convenient and ergonomic for humans in favor of what is convenient and ergonomic for machines? Why not have it just write C? Or hell, why not x86 assembly?
It may seem like a facetious or ironic question, but why stop with vibe coding? If we're going to develop software with large language models, why not use C? Or, more to the point, why use any particular language at all? Here, Stephen Ramsey observes that programming languages are designed for human convenience, i.e., developer convenience. But if a large language model is generating the code, why generate it in a language that is essentially an intermediary humans are rarely, if ever, actually going to read?
This is the question that Bret Taylor asks in a podcast that we linked to a few months back. It's one that really interests me. Just the other day, Geoff Huntley, in another piece that we linked to, talked about working with rather than against the grain of large language models. So I think this fits into that way of thinking. If we are going to increasingly rely on large language models to do tasks for us, even if we restrict our focus to programming, it makes sense, it would seem, to find what they are best at, rather than trying to get them, as Geoff Huntley observes, to conform to approaches that humans have developed for our own convenience.
AI companies want a new internet — and they think they’ve found the key
Over the past 18 months, the largest AI companies in the world have quietly settled on an approach to building the next generation of apps and services — an approach that would allow AI agents from any company to easily access information and tools across the internet in a standardized way. It’s a key step toward building a usable ecosystem of AI agents that might actually pay off some of the enormous investments these companies have made, and it all starts with three letters: MCP.
In 12 months or so, MCP has gone from an internal project at Anthropic to being extremely widely used, and it has now found a home at the Linux Foundation alongside other related technologies such as goose.
This Verge story will give you an overview of the set of technologies and what's happening next.
However, it has several problems that make it less suitable to develop Dillo anymore. The most annoying problem is that the frontend barely works without JavaScript, so we cannot open issues, pull requests, source code or CI logs in Dillo itself, despite them being mostly plain HTML, which I don’t think is acceptable. In the past, it used to gracefully degrade without enforcing JavaScript, but now it doesn’t. Additionally, the page is very resource hungry, which I don’t think is needed to render mostly static text.
GitHub has been undertaking a long process of re-implementing their front end using React. This is not the only account I've read suggesting that may not have been the best decision. Many people have observed that with large repos it becomes unworkably slow, even on state-of-the-art MacBook Pros.
This was eminently predictable and is one of the many reasons why I found myself, of late, pessimistic about the future of front-end as a vibrant, dynamic ecosystem.
Before diving into the subject at hand – and having read a great deal about it in preparation – I want to start with a point of clarification. Everyone asks, “Is there a bubble in AI?” I think there’s ambiguity even in the question. I’ve concluded there are two different but interrelated bubble possibilities to think about: one in the behavior of companies within the industry, and the other in how investors are behaving with regard to the industry. I have absolutely no ability to judge whether the AI companies’ aggressive behavior is justified, so I’ll try to stick primarily to the question of whether there’s a bubble around AI in the financial world.
Not infrequently in my conversations with people, this issue of whether or not we are in a bubble comes up. Is there an AI bubble? Will we have an AI bubble? It's probably something we should think about, even if there's little, if anything, that we as individuals can do; perhaps we can make different decisions about how and where we invest, or about what might happen if we had a significant downturn of the kind we saw in the early 2000s or after the global financial crisis.
In future, I think I'll just point people to this. It's a very solid read: not only a thoughtful thesis, it draws on quite a range of historical experience.
How I Shipped 100k LOC in 2 Weeks with Coding Agents
When we onboard developers, we give them documentation, coding standards, proven workflows, and collaboration tools. When we “deploy” AI agents, we give them nothing. They start fresh every time. No project context, no memory of patterns, no proven workflows.
So I compiled AI Coding Infrastructure, the missing support layer that agents need. Five components:
Autonomous Execution (Ralph): Continuous loops for overnight autonomous development
Project Memory (AGENTS.md): Your tech stack, patterns, conventions that agents read automatically before every response
I think we're very much in the early stages of developing patterns, practices, and approaches to working with agentic systems. I think, too, that different systems will likely have at least somewhat different approaches that tend to get the best from them.
In the meantime, I'm finding it interesting to read about how various individuals and teams go about working with these systems. I hope you might find that valuable too.
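As a rough illustration of the "Ralph"-style continuous loop described above, here's a minimal sketch. It's my own, and the agent CLI name, flags, and the DONE convention are placeholders, not a real tool:

```python
import subprocess
import time

PROMPT = (
    "Read AGENTS.md and TODO.md, pick the highest-priority unfinished task, "
    "implement it with tests, and commit. Reply DONE when nothing is left."
)

# Hypothetical agent CLI; substitute whichever coding agent you actually use.
AGENT_CMD = ["my-coding-agent", "--non-interactive"]

while True:
    result = subprocess.run(
        AGENT_CMD,
        input=PROMPT,          # same prompt every iteration; AGENTS.md supplies project memory
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if "DONE" in result.stdout:  # the agent reports the backlog is empty
        break
    time.sleep(30)               # brief pause before the next iteration
```

The essence of the pattern is simply re-invoking the agent in a loop and letting the project memory files, rather than the loop itself, carry the state between runs.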
A couple of years back, Mark Pesce gave a fantastic keynote at our summit using the analogy of the history of steam power for trying to understand where we were at and what was happening when it came to large language models and generative AI.
While historical analogies can be misleading, they can also be useful in helping us get some sense of a transformation. Humans are really not intuitively great at understanding exponential change. I often quote a line from Hemingway where one character asks another how he went bankrupt, and the reply is, "Two ways: gradually, then suddenly." We saw during the initial outbreak of COVID that humans really weren't great at exponential reasoning, especially when we look at logarithmic graphs.
But what this piece tries to get at is how transformations, such as the shift from human and animal power to steam power that essentially drove the Industrial Revolution, take time. In the case of that transformation, it took a century or so, from the mid-18th to the mid-19th century. And for a lot of that time, if the growth is exponential, there's seemingly very little apparent change. But then some tipping point occurs and something happens: perhaps around 1820 in the UK, and between 1820 and 1850 we saw an enormous increase in the productive output of Britain's industrial capability.
So I really recommend reading this article. It's relatively short, very entertaining and engaging, and it will help you develop this intuition about how the growing capability of generative AI may impact various kinds of human endeavour.
10 Years of Let’s Encrypt Certificates – Let’s Encrypt
On September 14, 2015, our first publicly-trusted certificate went live. We were proud that we had issued a certificate that a significant majority of clients could accept, and had done it using automated software. Of course, in retrospect this was just the first of billions of certificates. Today, Let’s Encrypt is the largest certificate authority in the world in terms of certificates issued, the ACME protocol we helped create and standardize is integrated throughout the server ecosystem, and we’ve become a household name among system administrators. We’re closing in on protecting one billion web sites.
A decade ago, very few websites in the scheme of things used HTTPS. At that stage, I'd had websites for more than 20 years and never had a secure website in that way. Why was this the case? Well, it was typically expensive and, above all, technically really painful to provision certificates for a website. So unless you were very large or conducting commerce directly and required a secure connection, you almost certainly didn't implement it.
In the last decade, that's completely changed: you can now provision a certificate for a site at no cost, probably without even thinking about it. So ubiquitous are secure connections that when you occasionally visit an insecure site in a modern browser, it will provide copious warnings about that site's insecurity. And all this is thanks to Let's Encrypt, a project that made it much easier and, most importantly, free to enable HTTPS for any website. So happy anniversary, and if anything, I thought it had been longer.
Here’s what we’re building: two MCP servers that work together to handle all our social media promotion automatically.
MCP Server #1: Content Fetcher This one goes out and grabs all our content from:
YouTube videos
Blog posts
GitHub release notes
Then it compares everything to a last_seen.json file to figure out what’s actually new. If nothing is new it proceeds to check an evergreen.json file and randomly pick old content to socialize.
MCP Server #2: Sprout Social Integration Once we have new content, this server takes over and:
Generates captions for each platform
Uploads media (videos, images, or just links)
Creates draft posts in Sprout Social
The goal? Wake up to social posts ready to go, without lifting a finger. Well, almost, more on that later.
If, like me, you find the best way to learn how something works is to build it, then this tutorial from Ebony Louis at Block might be the best way for you to get up to speed with building your own MCP server.
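If you'd like to see the shape of an MCP server before diving into the tutorial, here's a minimal sketch assuming the official `mcp` Python SDK and its FastMCP helper. The tool itself is a placeholder stub; Ebony's content-fetcher server does considerably more:

```python
# A minimal sketch, assuming the official `mcp` Python SDK (pip install "mcp[cli]").
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("content-fetcher")


@mcp.tool()
def list_new_content(last_seen: str) -> list[str]:
    """Placeholder tool: return content published since `last_seen` (an ISO date).

    A real content-fetcher server would pull from YouTube, blog feeds, and GitHub
    release notes and diff against a last_seen.json file, as the tutorial describes.
    """
    return [f"Nothing fetched yet; stub called with last_seen={last_seen}"]


if __name__ == "__main__":
    # Runs over stdio so an MCP client (goose, Claude Desktop, etc.) can connect to it.
    mcp.run()
```

The point of the protocol is that once a tool like this is exposed, any MCP-capable agent can discover and call it without bespoke integration code.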
Verification Engineering is the next Context Engineering
AI can only reliably run as fast as we check their work. It’s almost like a complexity theory claim. But I believe it needs to be the case to ensure we can harvest the exponential warp speed of AI but also remain robust and competent, as these technologies ultimately serve human beings, and us human beings need technology to be reliable and accountable, as we humans are already flaky enough 😉
This brings out the topic of Verification Engineering. I believe this can be a big thing after Context Engineering (which is the big thing after Prompt Engineering). By cleverly rearranging tasks and using nice abstractions and frameworks, we can make verification of AI performed tasks easier and use AI to ship more solid products to the world. No more slop.
Interesting thoughts on the role of software engineers when building complex systems with large language models, introducing the idea of verification engineering.
The purpose of this curriculum is to help new Elicit employees learn background in machine learning, with a focus on language models. I’ve tried to strike a balance between papers that are relevant for deploying ML in production and techniques that matter for longer-term scalability.
Want to go deeper in your understanding of machine learning and large language models, but not quite sure where to start? Well, the folks at Elicit have a pretty comprehensive reading list that they give to their new hires.
A lot of these are lectures you can find on YouTube, so it's not all dense reading.
Has the cost of building software just dropped 90%?
So where does that leave us? Right now there is still enormous value in having a human ‘babysit’ the agent – checking its work, suggesting the approach and shortcutting bad approaches. Pure YOLO vibe coding ends up in a total mess very quickly, but with a human in the loop I think you can build incredibly good quality software, very quickly.
This then allows developers who really master this technology to be hugely effective at solving business problems. Their domain and industry knowledge becomes a huge lever – knowing the best architectural decisions for a project, knowing which framework to use and which libraries work best.
Layer on understanding of the business domain and it does genuinely feel like the mythical 10x engineer is here. Equally, the pairing of a business domain expert with a motivated developer and these tools becomes an incredibly powerful combination, and something I think we’ll see becoming quite common – instead of a ‘squad’ of a business specialist and a set of developers, we’ll see a far tighter pairing of a couple of people.
This combination allows you to iterate incredibly quickly, and software becomes almost disposable – if the direction is bad, then throw it away and start again, using those learnings. This takes a fairly large mindset shift, but the hard work is the conceptual thinking, not the typing.
I've made reference to the Yogi Berra quote: "Predictions are hard, particularly about the future," more than once in my career.
Why is predicting the future so challenging? It's not because of the first-order effects but because of the second-order effects, and in particular the economic impacts of change, which are extremely hard to envision.
If the cost of building software is dramatically reducing due to AI, and by a sizeable amount, a claim I'd be willing to back, then what happens when the price of producing software is massively less? Do software engineers no longer have a job, or does a lot more software get produced? History would suggest it's more likely to be the latter. And then what's our role? What's our opportunity? What's our challenge? What's the risk?
This is a really good essay that I think anyone who works in software engineering should read and take on board.
Prediction: AI will make formal verification go mainstream
Much has been said about the effects that AI will have on software development, but there is an angle I haven’t seen talked about: I believe that AI will bring formal verification, which for decades has been a bit of a fringe pursuit, into the software engineering mainstream.
When I studied computer science at university in the 1980s, formal verification was quite an active area of research and interest.
What doesn't really occur to most people is that computer science is, in many ways, a branch of mathematics. There are many mathematical approaches to programming and programming languages. There are famous theorems, like Turing's work on the halting problem, and there are questions like P vs NP.
None of this occurred to me either, as a naive teenager who'd done some programming in Pascal and Forth and, of course, BASIC, and I largely forgot about it over the last 30 or 40 years. But it turns out formal verification is making a comeback.
Formal verification is different from things like debugging. It's about proving mathematically the correctness of a piece of code against its specifications.
As Martin Kleppmann observes here, formal verification is really hard, time-intensive, and expensive, and there are only a handful of experts in the entire world. It's very valuable for systems that essentially cannot fail, but years of work can go into verifying a few hundred lines of code.
But it turns out large language models might be really good at this work. And we might see a renaissance of formal verification of software.
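To give a flavour of what "proving code correct against its specification" means in practice, here's a toy example in Lean 4 (mine, not Kleppmann's, and assuming a recent Lean with the built-in omega tactic): a definition, its specification stated as a theorem, and a machine-checked proof.

```lean
-- A trivial program: double a natural number.
def double (n : Nat) : Nat := n + n

-- Its specification: double n is always 2 * n.
-- The proof assistant checks this holds for every possible input.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega   -- a linear-arithmetic decision procedure discharges the goal
```

Real verification efforts apply the same idea to specifications that are vastly harder to state and prove, which is where the years of expert effort go, and where LLM assistance could matter most.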
MCPs for Developers Who Think They Don’t Need MCPs | goose
MCPs weren’t built just for developers. They’re not just for IDE copilots or code buddies. At Block, we use MCPs across everything, from finance to design to legal to engineering. I gave a whole talk on how different teams are using goose, an AI agent. The point is MCP is a protocol. What you build on top of it can serve all kinds of workflows. But I get it… let’s talk about the dev-specific ones that are worth your time.
AI coding tools were supposed to change everything. And they did! But maybe just not how we expected. The first wave was chaos. Vibe coding. Let the AI write whatever it wants and hope for the best. It worked well for prototypes, but fell apart for anything real.
So the community course-corrected. The answer was structure, in the form of spec-driven development. Generate requirements, then a design doc, then a task list, then let the agent execute. Tools like Kiro and spec-kit promised to keep agents on track with meticulous planning.

It sounded smart. It felt responsible. And it’s a trap.
Many, if not most, of today's software developers will never have heard of waterfall development, or might think it's something to do with performance tooling.
When I studied software engineering many, many years ago, waterfall was the state of the art, because prior to that it had simply been chaos.
The idea behind waterfall development was that there would be strict phases of software engineering, from requirements gathering through specification, coding, testing, delivery, and maintenance (something like that, from memory), and this was meant to ensure software quality.
Waterfall has long since been abandoned for agile methods. The Agile Manifesto was all about moving away from waterfall.
Something curious has happened in the last year or so when it comes to AI and software engineering: spec-driven development has gained real interest, and many consider it to be somewhat like waterfall. This particular piece treats spec-driven development as something of a straw man, but I think it's worth considering the point being made.
My view: compared to last year, AI is much more impressive but not proportionally more useful. They improved on some things they were explicitly optimised for (coding, vision, OCR, benchmarks), and did not hugely improve on everything else. Progress is thus (still!) consistent with current frontier training bringing more things in-distribution rather than generalising very far.
A rather comprehensive look at how frontier large language models have evolved and improved over the last twelve months or so, across a number of different aspects.
We are now one year in where a new category of companies has been founded whereby the majority of the software behind that company was code-generated.

From here on out I’m going to refer to these companies as model weight first. This category of companies can be defined as any company that is building with the data (“grain”) that has been baked into the large language models.
Model weight first companies do not require as much context engineering. They’re not stuffing the context window with rules to try to override and change the base models to fit a pre-existing corporate standard and conceptualisation of how software should be.
My instinct is that this will prove to be a seminal observation as we evolve the way we work with large language models as software engineers. As Geoff Huntley observes here, one approach is to bend the models to our approach to software engineering. That's largely what we've been doing for the last three years, whether it's begging them to output JSON or filling their context with AGENTS.md files. But I think Geoff is really onto something with his observation that there's a different approach, and that is to go with the flow of how an LLM wants to work rather than work against its instincts.
This brought to mind a great interview with Bret Taylor some months ago now at Latent Space, where he talked about the AI architect and how the role of software engineers will increasingly be less and less about writing the code and more and more about guiding the outcomes.
A Software Engineer’s Guide to Agentic Software Development
I’ve cracked the code on breaking the eternal cycle – features win, tech debt piles up, codebase becomes ‘legacy’, and an eventual rewrite. Using coding agents at GitHub, I now merge multiple tech debt PRs weekly while still delivering features. Tickets open for months get closed. ‘Too hard to change’ code actually improves. This is the story of the workflow.
One word that keeps cropping up when I talk with software engineers who build large language model (LLM)-based solutions is “evals”. They use evaluations to verify that LLM solutions work well enough because LLMs are non-deterministic, meaning there’s no guarantee they’ll provide the same answer to the same question twice. This makes it more complicated to verify that things work according to spec than it does with other software, for which automated tests are available.
Evals feel like they are becoming a core part of the AI engineering toolset. And because they are also becoming part of CI/CD pipelines, we, software engineers, should understand them better — especially because we might need to use them sooner rather than later! So, what do good evals look like, and how should this non-deterministic-testing space be approached?
Evals are a core part of debugging LLM-based systems, managing non-determinism, and ensuring quality of output. This is a really good introduction to the concept and some of the key ideas, based on real-world case studies.
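As a concrete, if simplified, picture of what an eval is, here's a sketch in Python. The call_model function is a stand-in for whatever LLM client you use, and the scoring here is a crude string check, where real evals typically use rubric-based or LLM-as-judge scoring:

```python
# A minimal eval harness sketch. `call_model` is a placeholder for your LLM client.
from dataclasses import dataclass


@dataclass
class Case:
    prompt: str
    must_contain: str  # a deliberately crude pass criterion


CASES = [
    Case("What HTTP status code means 'Not Found'?", "404"),
    Case("Name the Python keyword that defines a function.", "def"),
]


def call_model(prompt: str) -> str:
    raise NotImplementedError("Wire this up to your model API of choice.")


def run_eval(n_trials: int = 3) -> float:
    """Run each case several times (the model is non-deterministic) and report the pass rate."""
    passes = total = 0
    for case in CASES:
        for _ in range(n_trials):
            total += 1
            if case.must_contain in call_model(case.prompt):
                passes += 1
    return passes / total


if __name__ == "__main__":
    print(f"pass rate: {run_eval():.0%}")
```

Unlike a conventional unit test, the result is a rate rather than a binary pass/fail, which is exactly what makes evals a better fit for non-deterministic systems.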
And so, we find ourselves at this crossroads. Regardless of which path we choose, the future of computing will be hyper-personalized. The question is whether that personalization will be in service of keeping us passively glued to screens—wading around in the shallows, stripped of agency—or whether it will enable us to direct more attention to what matters.
In order to build the resonant technological future we want for ourselves, we will have to resist the seductive logic of hyper-scale, and challenge the business and cultural assumptions that hold it in place. We will have to make deliberate decisions that stand in the face of accepted best practices—rethinking the system architectures, design patterns, and business models that have undergirded the tech industry for decades.
It's no surprise that this manifesto on resonant computing resonates with me (sorry, not sorry). Two of its drafters, Maggie Appleton and Simon Willison, are past speakers at our conference and people whose thinking and work I very much admire.
Those of us drawn to the early web and its promise were drawn to principles like these, and to the hope that the web could connect us in positive, uplifting ways.
The last 20 years or so have gone rather differently, for all kinds of reasons we can get into elsewhere. That doesn't mean we can't take a deep breath, take stock, and commit to doing something better, as this manifesto challenges us to do. Many of the signatories have also spoken at our conferences. I invite you to join in signing it, too.
Architecture, Specification, Execution: A Paradigm for AI-Accelerated Development
In this post, I’m going to share a paradigm that’s been working for me. To be clear: I’m not advocating for any particular products – Copilot, Kiro, Cursor, they’re all amazing. What I’m offering is an approach that works regardless of which tools you choose, delivering the compounding returns vibe coding never reaches.
Here’s the core principle: carve the path for AI to follow, don’t walk it yourself.
Your job as the engineer is to set direction, establish constraints, and define success. AI’s job is to execute within those boundaries. Mix these roles and you’ll just muddy the waters.
This paradigm builds on spec-driven development, and it consists of three pillars:
Architecture – Document the decisions that shape your system
Specification – Define the features within those constraints
We've been collecting approaches to developing with AI and large language models as software engineers, not because we necessarily think a specific approach is the right one, but because we're at such an early stage, it's interesting to see these patterns emerge.
Here Anthony Martinović shares his approach.
Categories: How Are People Using LLMs?

Understanding the distribution of tasks that users perform with LLMs is central to assessing real-world demand and model–market fit. As described in the Data and Methodology section, we categorized billions of model interactions into high-level application categories. In the Open vs. Closed Source Models section, we focused on open source models to see community-driven usage. Here, we broaden the lens to all LLM usage on OpenRouter (both closed and open models) to get a comprehensive picture of what people use LLMs for in practice.
OpenRouter is a service that provides a unified API across different large language model providers. That gives them deep insight into which models are being used and how. In this pretty detailed paper, based on the traffic they see, they outline which models are being used, how, and to what extent.
One thing I think is well worth noticing here is that code generation, or use for software engineering, accounts for over 50% of all large language model use on a per-token basis. And, perhaps a little more surprising, though not so much if you work with these technologies, Anthropic's Claude models account for 60% of token usage in this category.
Other use cases fall away pretty quickly. Role play is one that got early traction but seems to be fading somewhat as an overall percentage of token use, although I imagine it is growing in absolute terms. And other areas that gained early traction, like marketing automation and legal applications, get quite a bit of attention but see far less use than, above all, the software engineering use case.
Design Systems for AI: Introducing the Context Engine
For years, design systems have served one primary purpose: humans. They document patterns, components, decisions, and principles, all presented in formats meant for designers and engineers to read, interpret, and translate into products.
However, the moment AI entered your workflow, one truth became painfully clear. Your tokens, guidelines, accessibility rules, and UX patterns don’t matter if the LLM consuming them can’t read them as structured, meaningful context.

This is why AI prototypes often fail: they feature off-brand UI, inconsistent layouts, vague flows, and content that doesn’t align with the intended personality. It’s not hallucination, it’s missing context.
As Diana Wolosin observes, design systems were created by humans for humans, which made a lot of sense until LLMs came along.
Here she asks: "What happens to design systems when AI becomes our new user?"
Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl
After looking over the usages of the term, and some of the tools that claim to be implementing SDD, it seems to me that in reality, there are multiple implementation levels to it:
Spec-first: A well thought-out spec is written first, and then used in the AI-assisted development workflow for the task at hand.
Spec-anchored: The spec is kept even after the task is complete, to continue using it for evolution and maintenance of the respective feature.
Spec-as-source: The spec is the main source file over time, and only the spec is edited by the human, the human never touches the code.
All SDD approaches and definitions I’ve found are spec-first, but not all strive to be spec-anchored or spec-as-source. And often it’s left vague or totally open what the spec maintenance strategy over time is meant to be.
Spec-Driven Development (SDD) revives the old idea of heavy documentation before coding — an echo of the Waterfall era. While it promises structure for AI-driven programming, it risks burying agility under layers of Markdown. This post explores why a more iterative, natural-language approach may better fit modern development.
Spec-driven development is an approach to developing software with large language models that has gained some traction in recent months. Here, François Zaninotto explores the why and how of this approach.
Introducing AI, the Firefox way: A look at what we’re working on and how you can help shape it
With AI becoming a more widely adopted interface to the web, the principles of transparency, accountability, and respect for user agency are critical to keeping it free, open, and accessible to all. As an independent browser, we are well positioned to uphold these principles.
While others are building AI experiences that keep you locked in a conversational loop, we see a different path — one where AI serves as a trusted companion, enhancing your browsing experience and guiding you outward to the broader web.

We believe standing still while technology moves forward doesn’t benefit the web or humanity. That’s why we see it as our responsibility to shape how AI integrates into the web — in ways that protect and give people more choice, not less.
AI in one form or another has been in our browsers for many years, with speech APIs that predate large language models. More recently, in Chrome, we've seen general and specific APIs being experimented with. Firefox, too, has started similar experiments. Here the Firefox AI team talk about their philosophy of why and how they are implementing AI in the browser.