davidbau.com Vibe Coding

    Two Kinds of Vibe Coding

    There are two categories of vibe coding. One is when you delegate little tasks to a coding LM while keeping yourself as the human “real programmer” fully informed and in control.

    The second type of vibe coding is what I am interested in. It is when you use a coding agent to build towers of complexity that go beyond what you have time to understand in any detail. I am interested in what it means to cede cognitive control to an AI. My friend David Maymudes has been building some serious software that way, and he compares the second type of vibe coding to managing a big software team. Like the stories you’ve heard of whole startups being created completely out of one person vibe coding.

    Source: davidbau.com Vibe Coding

    The term "vibe coding" is often used disparagingly. And debates or discussions about the practise feel very much centred on the capability of models today. Something which is changing incredibly rapidly. From personal experience having worked with these models for more than three years now, in terms of code generation, the improvements over that time are almost unimaginable. There are many concerns about simply prompting a code generation model to produce an output and letting it run, not being overly concerned or if concerned at all about the quality of code and then using that output without checking it, particularly. Without understanding it and without even inspecting the code. Here David Bau writes in praise of vibe coding, more or less in this sense, although he does suggest providing guard rails and comprehensive testing to ensure the quality of code.  

    Believe the Checkbook

    Black and white sketch of a seesaw with a stack of papers on one end and a megaphone on the other, illustrating the balance between written content and amplified speech.

    Everyone’s heard the line: “AI will write all the code; engineering as you know it is finished.” Boards repeat it. CFOs love it. Some CTOs quietly use it to justify hiring freezes and stalled promotion paths.

    The Bun acquisition blows a hole in that story.

    Here’s a team whose project was open source, whose most active contributor was an AI agent, whose code Anthropic legally could have copied overnight. No negotiations. No equity. No retention packages. Anthropic still fought competitors for the right to buy that group.

    Publicly, AI companies talk like engineering is being automated away. Privately, they deploy millions of dollars to acquire engineers who already work with AI at full tilt. That contradiction is not a PR mistake. It is a signal.

    Source: Believe the Checkbook | Robert Greiner

    One thing I've heard repeatedly over the last year or two, when people are critical of code generation using large language models, is something along the lines of: "But writing the code is not the bottleneck when it comes to software engineering." And there's some validity to that. The question is: well, what is the bottleneck? People might say testing. People might say architectural decisions. Quality assurance. All of those are clearly choke points in delivering software. But here Robert Greiner observes that "The bottleneck isn’t code production, it is judgment."

    I certainly think there's something to this, but sometimes we stop with an observation like that. What's really important is to think through the next steps and the consequences. If judgement, not code generation, is the bottleneck, then what are the implications for engineering leaders (which Greiner explores here)? For software engineers themselves, whether junior, mid-career, or senior? For companies and organisations, and more broadly? And is this true only of code, or is it true of other outputs of generative AI as well?

    My working hypothesis is that it is, and so organisations and individuals should be developing, and encouraging the development of, judgement, what some people might call taste. It's that discernment, that judgement, that taste which is certainly valuable in software development, but which I think will become increasingly valuable in other fields too, because the models are already able to generate a lot of code, a lot of copy, a lot of images, a lot of legal advice. A key question will be: "What is the value of any particular generation from a model?" That's where expertise comes in, where taste comes in, where discernment and judgement come in. So develop those, and continue to develop those. What has long differentiated a person in terms of capability is, in many respects, not the ability to recite vast bodies of knowledge; it is the ability to know, among all that vast knowledge, which is the appropriate knowledge to deploy in a particular situation.

    Your job is to deliver code you have proven to work

    AI, LLMs, software engineering

    In all of the debates about the value of AI-assistance in software development there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

    This is rude, a waste of other people’s time, and is honestly a dereliction of duty as a software developer.

    Your job is to deliver code you have proven to work.

    As software engineers we don’t just crank out code—in fact these days you could argue that’s what the LLMs are for. We need to deliver code that works—and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.

    Source: Your job is to deliver code you have proven to work

    There's not much more to add to this observation by Simon Willison. Software engineers have a responsibility to deliver tested, verified, quality-assured code. If we use code generation to YOLO it, then what we're doing is not software engineering. There is a time and a place for such code; I use it extensively myself, because what matters is the job that it does. It doesn't necessarily have to be particularly secure or performant or even bug-free, because I'm using it internally, within a sandbox environment, to achieve a productivity gain. But it's another thing entirely to create something public-facing, something people rely on, something that manages people's details, and YOLO that.

    What happens when the coding becomes the least interesting part of the work

    AI, LLMs, software engineering

    That judgment is the job of a senior engineer. As far as I can tell, nobody is replacing that job with a coding agent anytime soon. Or if they are, they’re not talking about it publicly. I think it’s the former, and one of the main reasons is that there is probably not that much spelled-out practical knowledge about how to do the job of a senior engineer in frontier models’ base knowledge, the stuff that they get from their primary training by ingesting the whole internet.

    Source: What happens when the coding becomes the least interesting part of the work | by Obie Fernandez | Dec, 2025 | Medium

    Thoughts from an experienced software engineer on working with large language models. It's an irony that, traditionally, as we became more experienced software engineers, we wrote less and less software; a trend that is perhaps changing as large language models become increasingly capable of generating code.

    How I wrote JustHTML using coding agents

    AI, LLMs, software engineering

    Writing a full HTML5 parser is not a short one-shot problem. I have been working on this project for a couple of months on off-hours. Tooling: I used plain VS Code with Github Copilot in Agent mode. I enabled automatic approval of all commands, and then added a blacklist of commands that I always wanted to approve manually. I wrote an agent instruction that told it to keep working, and don’t stop to ask questions. Worked well! Here is the 17-step process it took to get here:

    Source: How I wrote JustHTML using coding agents – Friendly Bit

    A few weeks back, Simon Willison coined the term "vibe engineering", trying to draw a distinction between using large language models to generate code that we simply run as-is, and using large language models as part of the software engineering process. This piece is an excellent example of vibe engineering. Emil Stenström has written an HTML parser, which, if you know anything about HTML, is much more complex than it might initially appear. Here, Emil details his approach to working with large language models to produce a very complex piece of software. Emil is a software engineer, but he observes that:
    Yes. JustHTML is about 3,000 lines of Python with 8,500+ tests passing. I couldn't have written it this quickly without the agent. But "quickly" doesn't mean "without thinking." I spent a lot of time reviewing code, making design decisions, and steering the agent in the right direction. The agent did the typing; I did the thinking. That's probably the right division of labor.

    The Bet On Juniors Just Got Better

    AI, LLMs, software engineering

    Hand-drawn graph labeled "PROFIT" showing three curves starting below the x-axis and rising; one blue curve rises steeply, one red curve rises moderately, and one orange curve levels off.

    Junior developer—obsolete accessory or valuable investment? How does the genie change the analysis?

    Folks are taking knee-jerk action around the advent of AI—slowing hiring, firing all the juniors, cancelling internship programs. Instead, let’s think about this a second.

    The standard model says junior developers are expensive. You pay senior salaries for negative productivity while they learn. They ask questions. They break things. They need code review. In an augmented development world, the difference between juniors & seniors is just too large & the cost of the juniors just too high. Wrongo. That’s backwards. Here’s why.

    Source: The Bet On Juniors Just Got Better – by Kent Beck

    Kent Beck is renowned in the world of software engineering: the originator of XP (Extreme Programming), and very well known and highly regarded when it comes to design patterns. Here he addresses an issue that has been of concern to many people: what impact will AI have on junior developers? Will they simply not exist anymore? And then, will we ever get senior developers if we haven't got any new junior developers? Kent has a different take, and I think it's well worth considering.

    UX Is Your Moat (And You’re Ignoring It) – Eleganthack

    AI, Design

    If you’re building an AI product, your interface isn’t a nice-to-have. It’s your primary competitive advantage.

    Here’s what that means in practice:

    Make the first five minutes seamless. Users decide whether they’re staying or leaving almost immediately. If they have to think about where to click, you’ve already lost. Netflix auto-plays. TikTok starts scrolling. What does your product do the moment someone opens it?

    Source: UX Is Your Moat (And You’re Ignoring It) – Eleganthack

    Technologists often default to the idea that the best technology always wins. Over the years, we see endless debates about the technical specifications of a product and why that makes that product better. But what we should have learned by now is that technology is only one part of why something becomes successful. Category defining. Dominant. Here, Christina Wodtke brings her many years of experience to the question of what will make AI products successful, with lessons not just for the biggest technology companies, but any company, whether they use AI or not.

    How to Run a 90-Minute AI Design Sprint (with prompts)

    AI, Design

    3D-rendered coral-like structure in gradient colors from yellow to blue, overlaid on a beige grid background with blue anchor points and outlines indicating selection or manipulation in a design interface.

    Most teams still run ideation sessions with a whiteboard, a problem statement, and a flurry of post-its. To be honest, I’ve always loved a good Design sprint, especially in person and I hope those don’t go away for anyone because they’re an awesome way to learn and connect together.

    But with AI, the way we generate, evaluate, and shape ideas has fundamentally shifted. You can collapse days of thinking into a focused 90-minute sprint if you know how to structure it well.

    This is the format designed to move fast without losing the depth. It blends design thinking, systems thinking, and agent-era AI capabilities into a repeatable flow you can run any time your team needs clarity.

    Here’s the 90-minute AI Design Sprint, step by step with prompts you can copy, paste, and use today.

    Source: How to Run a 90-Minute AI Design Sprint (with prompts)

    As we've recently observed elsewhere, while a lot of the focus on generative AI and LLMs is on customer-facing features or generated content (be that text, images, or video), there is one place where large language models can have a really valuable impact: on processes. Here M.C. Dean reimagines the design sprint, a staple of the design process, using large language models, with some suggested prompts that she uses.

    What I learned building an opinionated and minimal coding agent

    AI, coding agent, LLMs, software engineering

    Table displaying performance metrics for the agent "pi (claude-opus-4-5)" on the "terminal-bench" dataset, including 428 trials, 71 errors, a mean score of 0.479, reward distribution with 213 successes and 215 failures, and a breakdown of exception types and counts.

    I’ve also built a bunch of agents over the years, of various complexity. For example, Sitegeist, my little browser-use agent, is essentially a coding agent that lives inside the browser. In all that work, I learned that context engineering is paramount. Exactly controlling what goes into the model’s context yields better outputs, especially when it’s writing code. Existing harnesses make this extremely hard or impossible by injecting stuff behind your back that isn’t even surfaced in the UI.

    Source: What I learned building an opinionated and minimal coding agent

    Mario Zechner built his own minimal coding agent. Think of a lightweight version of Claude Code or OpenAI's Codex. You can follow along here.

    Useful patterns for building HTML tools

    AI, LLMs

    I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past year, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way.

    Source: Useful patterns for building HTML tools

    One incredibly valuable use case for code generation, and a good way to explore, experiment, and develop intuitions and capabilities with these tools, is building little utility tools for your own use, as Simon Willison has been doing for several years. I too have been doing this. I've taken spreadsheets, Bash scripts, and little pieces of JavaScript that I had cobbled together over the years to help in the production of our sites and content (and even printing for our conferences), and built special-purpose mini web applications that solve the same problems much more efficiently and enjoyably. I highly recommend trying it for yourself if you're not doing so already. Here Simon lists a whole bunch of patterns he has gleaned from his extensive development of such tools.

    The /llms.txt file – llms-txt

    AI, front end development, LLMs

    Markdown template with example syntax including a title (# Title), optional description in italic blockquote, placeholder text, section headers (## Section name, ## Optional), and link entries using markdown link format with optional details.

    We propose adding a /llms.txt markdown file to websites to provide LLM-friendly content. This file offers brief background information, guidance, and links to detailed markdown files. llms.txt markdown is human and LLM readable, but is also in a precise format allowing fixed processing methods (i.e. classical programming techniques such as parsers and regex). We furthermore propose that pages on websites that have information that might be useful for LLMs to read provide a clean markdown version of those pages at the same URL as the original page, but with .md appended. (URLs without file names should append index.html.md instead.)

    Source: The /llms.txt file – llms-txt

    llms.txt is one of a number of proposals for how best to expose the content of a web page, site, or app to large language models. It comes initially from Jeremy Howard, well known in the Python and AI communities as the founder of fast.ai (and, before that, FastMail).
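
    To give a feel for the format, here is the skeleton from the proposal (essentially the template described in the image above): a single H1 title, an optional blockquote summary, then H2 sections of links, with an "Optional" section for material that can be skipped when context is tight.

```markdown
# Title

> Optional description of the project

Optional details go here

## Section name

- [Link title](https://link_url): Optional link details

## Optional

- [Link title](https://link_url)
```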

    AI and variables: Building more accessible design systems faster

    AI, Design, Design Systems

    Website navigation interface with a dark maroon header featuring menu items: Foundation, Components, Patterns, Resources & tools, and a Search icon; below are colorful content cards labeled Patterns and Resources & Tools.

    When people talk about AI in design, they often picture flashy visuals or generative art. But my own lightbulb moment happened at a less glamorous place: in an effort to solve this accessibility challenge under pressure.

    At UX Scotland earlier this year, I shared how AI helped me transform a messy, time-consuming process into something lean, structured, and scalable. Instead of spending weeks tweaking palettes and testing contrast, I had an accessible design system up and running in just a few days. In this article, I’ll explain how I did it and why it matters.

    Source: AI and variables: Building more accessible design systems faster – zeroheight

    When it comes to AI, we over-index on output and user-facing features, and I think we're somewhat asleep on workflow and process, which can be made more efficient using large language models. Here's a great case study from Tobi Olowu on how he and his team used LLMs to help streamline the process of improving the accessibility of an existing design system.

    A ChatGPT prompt equals about 5.1 seconds of Netflix

    AI, environmental impact

    In June 2025 Sam Altman claimed about ChatGPT that “the average query uses about 0.34 watt-hours”. In March 2020 George Kamiya of the International Energy Agency estimated that “streaming a Netflix video in 2019 typically consumed 0.12-0.24kWh of electricity per hour” – that’s 240 watt-hours per Netflix hour at the higher end.

    Source: A ChatGPT prompt equals about 5.1 seconds of Netflix

    We've all seen the claim that 95% of all AI implementations had no ROI. I haven't really read that study, and I don't think many of the people who quoted it have read it either. We also see numbers bandied about regarding the amount of water used by large language models, at times for single queries, and similarly the amount of energy required for a single query. And then yesterday I saw on a toilet that the average flush from that toilet used 3.4 litres of water. It's good to see things like this from Simon Willison, where he tries to provide some broader context for the energy use and environmental impact of large language models. It would be even better to see more solid figures from OpenAI, Google, and the other hyperscalers, but at least it's a start.
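
    For what it's worth, the headline figure is just those two numbers divided: 0.34 watt-hours per prompt ÷ 240 watt-hours per streaming hour ≈ 0.0014 hours, or about 5.1 seconds of Netflix.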

    Is your tech stack AI ready?

    AI, architecture, LLMs, software engineering

    We’re at the same inflection point we saw with mobile and cloud, except AI is more sensitive to context quality. Loose contracts, missing examples, and ambiguous guardrails don’t just cause bugs. They cause agents to confidently explore the negative space of your system.

    The companies that win this transition will be the ones that treat their specs as executable truth, ship golden paths that agents can copy verbatim, and prove zero-trust by default at every tool boundary.

    Your tech stack doesn’t need to be rebuilt for AI. But your documentation, contracts, and boundaries? Those need to level up.

    Source: Is your tech stack AI ready? | Appear Blog

    Jakub Reidl, a speaker at our recent Engineering AI conference, looks at some of the key areas of your tech stack to get ready for AI.

    If You’re Going to Vibe Code, Why Not Do It in C?

    AI, LLMs, software engineering

    So my question is this: Why vibe code with a language that has human convenience and ergonomics in view? Or to put that another way: Wouldn’t a language designed for vibe coding naturally dispense with much of what is convenient and ergonomic for humans in favor of what is convenient and ergonomic for machines? Why not have it just write C? Or hell, why not x86 assembly?

    Source: If You’re Going to Vibe Code, Why Not Do It in C?

    It may seem like a facetious or ironic question: if we're going to develop software with large language models, why not use C? Or, more to the point, why privilege any particular language? Here, Stephen Ramsey observes that programming languages are designed for human convenience, i.e., developer convenience. But if a large language model is generating the code, why generate it in a language that is essentially an intermediary humans are rarely, if ever, actually going to read? This is a question Bret Taylor asked in a podcast we linked to a few months back, and it's one that really interests me. Just the other day, in another piece we linked to, Geoff Huntley talked about working with, rather than against, the grain of large language models, and I think this fits into that way of thinking. If we are going to increasingly rely on large language models to do tasks for us, even if we restrict our focus to programming, it would seem to make sense to find what they are best at, rather than trying to get them, as Huntley observes, to conform to approaches that humans have developed for our own convenience.

    AI companies want a new internet — and they think they’ve found the key

    AI, LLMs, MCP, open source

    3D illustration of a small robot with a digital face displaying zeros interacting with a laptop, with two speech bubbles between them, set against an orange background.

    Over the past 18 months, the largest AI companies in the world have quietly settled on an approach to building the next generation of apps and services — an approach that would allow AI agents from any company to easily access information and tools across the internet in a standardized way. It’s a key step toward building a usable ecosystem of AI agents that might actually pay off some of the enormous investments these companies have made, and it all starts with three letters: MCP.

    Source: AI companies want a new internet — and they think they’ve found the key | The Verge

    In twelve months or so, MCP has gone from an internal project at Anthropic to being extremely widely used, and it has now found a home at the Linux Foundation alongside other related technologies such as goose. This Verge story will give you an overview of the set of technologies and what's happening next.

    Migrating Dillo from GitHub

    front end development, performance, react

    However, it has several problems that make it less suitable to develop Dillo anymore. The most annoying problem is that the frontend barely works without JavaScript, so we cannot open issues, pull requests, source code or CI logs in Dillo itself, despite them being mostly plain HTML, which I don’t think is acceptable. In the past, it used to gracefully degrade without enforcing JavaScript, but now it doesn’t. Additionally, the page is very resource hungry, which I don’t think is needed to render mostly static text.

    Source: Migrating Dillo from GitHub

    GitHub has been undertaking a long process of re-implementing their front end using React. This is not the only story I've read suggesting that may not have been the best decision. Many people have observed that with large repos it becomes unworkably slow, even on state-of-the-art MacBook Pros. This was eminently predictable, and is one of the many reasons why I find myself, of late, pessimistic about the future of front end as a vibrant, dynamic ecosystem.

    Is It a Bubble?

    AI

    Table comparing the top TMT (technology, media, and telecom) stocks by S&P 500 weight and next twelve months price-to-earnings (NTM P/E) ratios in December 1999 and the current period, showing a median NTM P/E of 41x in 1999 versus 31x currently, with notable changes in company composition and overall valuations.

    Before diving into the subject at hand – and having read a great deal about it in preparation – I want to start with a point of clarification. Everyone asks, “Is there a bubble in AI?” I think there’s ambiguity even in the question. I’ve concluded there are two different but interrelated bubble possibilities to think about: one in the behavior of companies within the industry, and the other in how investors are behaving with regard to the industry. I have absolutely no ability to judge whether the AI companies’ aggressive behavior is justified, so I’ll try to stick primarily to the question of whether there’s a bubble around AI in the financial world.

    Source: Is It a Bubble?

    Not infrequently in my conversations with people does this issue of whether or not we are in a bubble come up. Is there an AI bubble? Will we have an AI bubble? It's probably something we should think about, even if there's little, if anything, we as individuals can do; perhaps we can make different decisions about how and where we invest, or consider what might happen if we had a significant downturn of the kind we saw in the early 2000s or after the global financial crisis. In future, I think I'll just point people to this. It's a very solid read: not only a thoughtful thesis, it draws on quite a range of historical experience.

    How I Shipped 100k LOC in 2 Weeks with Coding Agents

    AI, AI Native Dev, LLMs, software engineering

    When we onboard developers, we give them documentation, coding standards, proven workflows, and collaboration tools. When we “deploy” AI agents, we give them nothing. They start fresh every time. No project context, no memory of patterns, no proven workflows.

    So I compiled AI Coding Infrastructure, the missing support layer that agents need. Five components:

    Autonomous Execution (Ralph): Continuous loops for overnight autonomous development

    Project Memory (AGENTS.md): Your tech stack, patterns, conventions that agents read automatically before every response

    Proven Workflows (Skills): Battle-tested TDD, debugging, code review patterns agents MUST follow

    Specialization (Sub-Agents): 114+ domain experts working in parallel, not one generalist

    Planning Systems (ExecPlans): Self-contained living docs for complex features

    Source: How I Shipped 100k LOC in 2 Weeks with Coding Agents | Blog

    I think we're very much in the early stages of developing patterns, practises, and approaches to working with agentic systems, and that different systems will likely have at least somewhat different approaches that tend to get the best from them. In the meantime, I'm finding it interesting to read about how various individuals and teams go about working with these systems. I hope you might find that valuable too.
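
    Of the five components listed above, the "Ralph" loop is the easiest to picture in code. It's usually shown as a one-line shell loop; here's the same idea sketched in TypeScript (Node), with a hypothetical `agent` CLI and a passing test suite as the stopping condition — none of which comes from the post itself.

```typescript
// A minimal sketch of a "Ralph"-style autonomous loop. The `agent` CLI is
// hypothetical; substitute whichever coding agent you drive from the shell.
import { spawnSync } from "node:child_process";
import { readFileSync } from "node:fs";

const prompt = readFileSync("PROMPT.md", "utf8");

for (let i = 0; i < 25; i++) {
  // Re-invoke the agent with the same standing instructions each iteration.
  spawnSync("agent", ["--prompt", prompt], { stdio: "inherit" });

  // Stop once the test suite is green, rather than looping forever.
  const tests = spawnSync("npm", ["test"], { stdio: "inherit" });
  if (tests.status === 0) break;
}
```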

    Horses

    AI, LLMs

    Line graph titled "Anthropic technical Q&A" showing monthly counts of questions answered from mid-2024 to mid-2025; human-answered questions (pink line) gradually decline over time, while AI-answered questions (black line) rise sharply starting October 2024, surpassing human answers by early 2025.

    Engines, steam engines, were invented in 1700. And what followed was 200 years of steady improvement, with engines getting 20% better a decade.

    For the first 120 years of that steady improvement, horses didn’t notice at all.

    Then, between 1930 and 1950, 90% of the horses in the US disappeared.

    Progress in engines was steady. Equivalence to horses was sudden.

    Source: Horses

    A couple of years back, Mark Pesce gave a fantastic keynote at our summit using the analogy of the history of steam power to try to understand where we were at, and what was happening, when it came to large language models and generative AI. While historical analogies can be misleading, they can also be useful in helping us get some sense of a transformation.

    Humans are really not intuitively great at understanding exponential change. I often quote a line from Hemingway, where one character asks another how he went bankrupt, and the reply is: "Two ways. Gradually, then suddenly." We saw during the initial outbreak of COVID that humans really aren't great at exponential reasoning, especially when we're looking at logarithmic graphs.

    What this piece tries to get at is how transformations, such as the move from human and animal power to steam power that essentially drove the Industrial Revolution, take time. That transformation took a century or so, from the mid-18th to the mid-19th century, and for a lot of that time, if the growth is exponential, there's seemingly very little apparent change. But then some tipping point occurs, perhaps around 1820 in the UK, and between 1820 and 1850 we saw an enormous increase in the productive output of Britain's industrial capability. So I really recommend reading this article. It's relatively short, very entertaining and engaging, and helps develop this intuition about how the growing capability of generative AI may impact various kinds of human endeavour.
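
    The arithmetic underlines the point: improving 20% a decade compounds to roughly 1.2^20 ≈ 38× over 200 years, yet at every point along the way the decade-on-decade change looks unremarkable.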

    10 Years of Let’s Encrypt Certificates – Let’s Encrypt

    HTTPS, security, TLS

    Line chart showing the growth from 2016 to 2025 of active SSL certificates (dotted orange line), fully-qualified domains (solid dark blue line), and registered domains (dashed green line), with fully-qualified domains and certificates rising sharply from 2022 onward, while registered domains grow more gradually.

    On September 14, 2015, our first publicly-trusted certificate went live. We were proud that we had issued a certificate that a significant majority of clients could accept, and had done it using automated software. Of course, in retrospect this was just the first of billions of certificates. Today, Let’s Encrypt is the largest certificate authority in the world in terms of certificates issued, the ACME protocol we helped create and standardize is integrated throughout the server ecosystem, and we’ve become a household name among system administrators. We’re closing in on protecting one billion web sites.

    Source: 10 Years of Let’s Encrypt Certificates – Let’s Encrypt

    A decade ago, very few websites in the scheme of things used HTTPS. At that stage I'd had websites for more than 20 years, and never a secure website in that sense. Why was this the case? Well, it was typically expensive and, above all, technically really painful to provision certificates for a website. So unless you were very large, or conducting commerce directly and so required a secure connection, you almost certainly didn't implement it. In the last decade that's completely changed: you can now provision a certificate for a site at no cost, probably without even thinking about it. So ubiquitous are secure connections that when you occasionally visit a site without one, a modern browser will provide copious warnings about its insecurity. And all this is thanks to Let's Encrypt, a project that made it much easier, and most importantly free, to enable HTTPS for any web page. So happy anniversary, and if anything, I thought it had been longer.
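
    These days the whole thing is typically a single command with an ACME client; with certbot in front of nginx, for example, something like `sudo certbot --nginx -d example.com` will obtain a Let's Encrypt certificate and configure the server to use it.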

    Building a Social Media Agent | goose

    AI, LLMs, MCP

    Screenshot of a Bluesky social media post by user "goose @opensource.block.xyz" with the caption "vibe code with me test" and an image reading "how i used goose to migrate my codebase" on a black background with a white megaphone and colorful button.

    The Game Plan

    Here’s what we’re building: two MCP servers that work together to handle all our social media promotion automatically.

    MCP Server: Content Fetcher
    This one goes out and grabs all our content from:

    • YouTube videos
    • Blog posts
    • GitHub release notes

    Then it compares everything to a last_seen.json file to figure out what’s actually new. If nothing is new it proceeds to check an evergreen.json file and randomly pick old content to socialize.

    MCP Server: Sprout Social Integration
    Once we have new content, this server takes over and:

    • Generates captions for each platform
    • Uploads media (videos, images, or just links)
    • Creates draft posts in Sprout Social

    The goal? Wake up to social posts ready to go, without lifting a finger. Well, almost, more on that later.

    Source: Building a Social Media Agent | goose

    If, like me, you find the best way to learn how something works is to build it, then this tutorial from Ebony Louis at Block might be the best way for you to get up to speed with building your own MCP server.
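
    The heart of the content fetcher is the freshness check described above. Here's a minimal sketch of that logic in TypeScript — not Block's actual code — where `fetchAllContent` is a hypothetical stand-in for the YouTube, blog, and GitHub fetchers.

```typescript
import { readFileSync } from "node:fs";

interface ContentItem {
  id: string;
  title: string;
  url: string;
}

// Hypothetical helper standing in for the YouTube / blog / GitHub fetchers.
declare function fetchAllContent(): Promise<ContentItem[]>;

async function pickContentToPromote(): Promise<ContentItem[]> {
  // Anything not recorded in last_seen.json counts as new.
  const lastSeen: string[] = JSON.parse(readFileSync("last_seen.json", "utf8"));
  const fresh = (await fetchAllContent()).filter((item) => !lastSeen.includes(item.id));
  if (fresh.length > 0) return fresh;

  // Nothing new: fall back to one randomly chosen piece of evergreen content.
  const evergreen: ContentItem[] = JSON.parse(readFileSync("evergreen.json", "utf8"));
  return [evergreen[Math.floor(Math.random() * evergreen.length)]];
}
```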

    AI should only run as fast as we can catch up

    AI, AI Native Dev, LLMs, software engineering

    Verification Engineering is the next Context Engineering

    AI can only reliably run as fast as we check their work. It’s almost like a complexity theory claim. But I believe it needs to be the case to ensure we can harvest the exponential warp speed of AI but also remain robust and competent, as these technologies ultimately serve human beings, and us human beings need technology to be reliable and accountable, as we humans are already flaky enough 😉

    This brings out the topic of Verification Engineering. I believe this can be a big thing after Context Engineering (which is the big thing after Prompt Engineering). By cleverly rearranging tasks and using nice abstractions and frameworks, we can make verification of AI-performed tasks easier and use AI to ship more solid products to the world. No more slop.

    Source: AI should only run as fast as we can catch up · Higashi.blog

    Interesting thoughts on the role of software engineers when building complex systems with large language models, introducing the idea of verification engineering.

    Elicit Machine Learning Reading List

    AI, LLMs

    The purpose of this curriculum is to help new Elicit employees learn background in machine learning, with a focus on language models. I’ve tried to strike a balance between papers that are relevant for deploying ML in production and techniques that matter for longer-term scalability.

    Source: Elicit Machine Learning Reading List

    Want to go deeper in your understanding of machine learning and large language models, but not quite sure where to start? The folks at Elicit have a pretty comprehensive reading list that they give their new hires. Many of the entries are lectures you can find on YouTube, so it's not all dense reading.

    Has the cost of building software just dropped 90%?

    AI, LLMs, software engineering

    Line graph showing the cost of software over time from 2000 to 2025, with key milestones labeled as Open Source, Cloud, and Complexity along a gradually declining blue line, followed by a sharp drop in cost around 2025 labeled "AI Agents" in red.

    Domain knowledge is the only moat

    So where does that leave us? Right now there is still enormous value in having a human ‘babysit’ the agent – checking its work, suggesting the approach and shortcutting bad approaches. Pure YOLO vibe coding ends up in a total mess very quickly, but with a human in the loop I think you can build incredibly good quality software, very quickly.

    This then allows developers who really master this technology to be hugely effective at solving business problems. Their domain and industry knowledge becomes a huge lever – knowing the best architectural decisions for a project, knowing which framework to use and which libraries work best.

    Layer on understanding of the business domain and it does genuinely feel like the mythical 10x engineer is here. Equally, the pairing of a business domain expert with a motivated developer and these tools becomes an incredibly powerful combination, and something I think we’ll see becoming quite common – instead of a ‘squad’ of a business specialist and a set of developers, we’ll see a far tighter pairing of a couple of people.

    This combination allows you to iterate incredibly quickly, and software becomes almost disposable – if the direction is bad, then throw it away and start again, using those learnings. This takes a fairly large mindset shift, but the hard work is the conceptual thinking, not the typing.

    Source: Has the cost of building software just dropped 90%? – Martin Alderson

    I've made reference to the Yogi Berra quote "It's tough to make predictions, especially about the future" more than once in my career. Why is predicting the future so challenging? Because of not the first-order effects but the second-order effects, and in particular the economic impacts of change, which are extremely hard to envision. If the cost of building software is dramatically falling due to AI, and that's a reasonable thesis, one I'd be willing to back, then what happens when the price of producing software is massively less? Do software engineers no longer have a job, or does a lot more software get produced? History would suggest it's more likely to be the latter. And then what's our role? What's our opportunity? What's our challenge? What's the risk? This is a really good essay that I think anyone who works in software engineering should read and then take on board.

    Prediction: AI will make formal verification go mainstream

    AI, computer science, software engineering

    Source: Prediction: AI will make formal verification go mainstream

    When I studied computer science at university in the 1980s, formal verification was quite an area of research and interest. What doesn't really occur to most people is that computer science is, in many ways, a branch of mathematics. There are many mathematical approaches to programming and programming languages; there are famous results, like Turing's work on the halting problem, and questions like P vs NP. None of this had occurred to me either, as a naive teenager who'd done some programming in Pascal and Forth and, of course, BASIC, and I largely forgot about it for the next 30 or 40 years. But it turns out verification is making a comeback. Formal verification is different from things like testing and debugging: it's about mathematically proving the correctness of a piece of code against its specification. As Martin Kleppmann observes here, formal verification is really hard, time-intensive, and expensive, and there are only a handful of experts in the entire world. It's very valuable for systems that essentially cannot fail, but years of work can go into verifying a few hundred lines of code. It turns out large language models might be really good at this work, and we might see a renaissance of formal verification of software.
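
    To make the distinction from testing concrete, here's a toy example in Lean, one of the proof assistants used for this kind of work. It's nothing like the systems-scale verification Kleppmann describes, but it shows the shape of the activity: we don't run `double` on sample inputs, we prove a property of it for every natural number.

```lean
-- A trivial function and a proof that it meets its specification
-- for all inputs, not just the ones we happened to test.
def double (n : Nat) : Nat := n + n

theorem double_correct (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```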

    MCPs for Developers Who Think They Don’t Need MCPs | goose

    AI, LLMs, MCP, software engineering

    Illustration of a developer with braided hair working at a desk with a laptop and external monitor displaying code, next to a banner that reads "MCPs for devs who think they don't need em" on a colorful gradient background.

    MCPs weren’t built just for developers. They’re not just for IDE copilots or code buddies. At Block, we use MCPs across everything, from finance to design to legal to engineering. I gave a whole talk on how different teams are using goose, an AI agent. The point is MCP is a protocol. What you build on top of it can serve all kinds of workflows. But I get it… let’s talk about the dev-specific ones that are worth your time.

    Source: MCPs for Developers Who Think They Don’t Need MCPs | goose

    Angie Jones looks at the MCPs you might find valuable as a software engineer.

    Your Spec Driven Workflow Is Just Waterfall With Extra Steps

    AI, LLMs, software engineering, spec driven development

    Stacks of paper documents and folders with numerous yellow sticky notes on a desk in a fluorescent-lit office setting.

    AI coding tools were supposed to change everything. And they did! But maybe just not how we expected. The first wave was chaos. Vibe coding. Let the AI write whatever it wants and hope for the best. It worked well for prototypes, but fell apart for anything real.

    So the community course-corrected. The answer was structure, in the form of Spec-driven development. Generate requirements, then a design doc, then a task list, then let the agent execute. Tools like Kiro and spec-kit promised to keep agents on track with meticulous planning. It sounded smart. It felt responsible. And it’s a trap.

    Source: Your Spec Driven Workflow Is Just Waterfall With Extra Steps

    Many if not most of today's software developers will never have heard of waterfall development, or might think it's something to do with performance tooling. When I studied software engineering many, many years ago, waterfall was the state of the art, because prior to that it had simply been chaos. The idea behind waterfall development is that there are strict phases of software engineering, from requirements gathering through specification, coding, testing, delivery, and maintenance (something like that, from memory), and this was meant to ensure software quality. Waterfall has long since been abandoned for agile methods; the Agile Manifesto was all about moving away from waterfall. So something curious has happened in the last year or so when it comes to AI and software engineering: spec-driven development has gained real traction, and many consider it to be somewhat like waterfall. This particular piece arguably treats spec-driven development as a kind of straw man, but I think the point being made is worth considering.

    AI in 2025: gestalt — LessWrong

    AI, LLMs

    My view: compared to last year, AI is much more impressive but not proportionally more useful. They improved on some things they were explicitly optimised for (coding, vision, OCR, benchmarks), and did not hugely improve on everything else. Progress is thus (still!) consistent with current frontier training bringing more things in-distribution rather than generalising very far.

    Source: AI in 2025: gestalt — LessWrong

    A rather comprehensive look at how frontier large language models have evolved and improved over the last twelve months or so, across a number of different aspects.

    llm weights vs the papercuts of corporate

    AI, AI Engineering, AI Native Dev, LLMs, software engineering

    We are now one year in where a new category of companies has been founded whereby the majority of the software behind that company was code-generated. From here on out I’m going to refer to these companies as model weight first. This category of companies can be defined as any company that is building with the data (“grain”) that has been baked into the large language models.

    Model weight first companies do not require as much context engineering. They’re not stuffing the context window with rules to try to override and change the base models to fit a pre-existing corporate standard and conceptualisation of how software should be.

    Source: llm weights vs the papercuts of corporate

    My instinct is that this will prove a seminal observation as we evolve the way we work with large language models as software engineers. As Geoff Huntley observes here, one approach is to bend the models to our approach to software engineering. That's largely what we've been doing for the last three years, whether it's begging them to output JSON through to filling their context with AGENTS.md files. But I think Geoff is really onto something with his observation that there's a different approach: to go with the flow of how an LLM wants to work, rather than working against its instincts. This brought to mind a great interview with Bret Taylor, some months ago now, on Latent Space, where he talked about the AI architect, and how the role of software engineers will increasingly be less and less about writing the code, and more and more about guiding the outcomes.

    A Software Engineer’s Guide to Agentic Software Development

    AI, coding agent, LLMs, software engineering

    Daily schedule graphic with time blocks from 9 AM to 5 PM.

    I’ve cracked the code on breaking the eternal cycle – features win, tech debt piles up, codebase becomes ‘legacy’, and an eventual rewrite. Using coding agents at GitHub, I now merge multiple tech debt PRs weekly while still delivering features. Tickets open for months get closed. ‘Too hard to change’ code actually improves. This is the story of the workflow.

    Source: A Software Engineer’s Guide to Agentic Software Development

    Brittany Ellich, a software engineer at GitHub, shares how she works with agentic coding tools day in, day out in her job.

    A pragmatic guide to LLM evals for devs

    AI, evals, LLMs

    Illustration of three islands labeled Developer, Data, and LLM Pipeline, connected by bridges representing challenges: Gulf of Comprehension, Gulf of Specification, and Gulf of Generalization; each gulf includes notes on limitations in communication, data interpretation, and model behavior in the context of large language model development.

    One word that keeps cropping up when I talk with software engineers who build large language model (LLM)-based solutions is “evals”. They use evaluations to verify that LLM solutions work well enough because LLMs are non-deterministic, meaning there’s no guarantee they’ll provide the same answer to the same question twice. This makes it more complicated to verify that things work according to spec than it does with other software, for which automated tests are available.

    Evals feel like they are becoming a core part of the AI engineering toolset. And because they are also becoming part of CI/CD pipelines, we, software engineers, should understand them better — especially because we might need to use them sooner rather than later! So, what do good evals look like, and how should this non-deterministic-testing space be approached?

    Source: A pragmatic guide to LLM evals for devs

    Evals are a core part of debugging LLM-based systems, managing non-determinism, and ensuring quality of output. This is a really good introduction to the concept and some of the key ideas, based on real-world case studies.
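
    As a taste of what this looks like in practice, here's a minimal sketch of a property-style eval in TypeScript, with a hypothetical `callModel` wrapper around your LLM API of choice. Because outputs are non-deterministic, each case is run several times and scored as a pass rate, rather than as a single pass/fail assertion.

```typescript
// Hypothetical wrapper around whichever LLM API you use.
declare function callModel(prompt: string): Promise<string>;

interface EvalCase {
  prompt: string;
  // A deterministic property the output should satisfy.
  check: (output: string) => boolean;
}

async function runEvals(cases: EvalCase[], runs = 5): Promise<void> {
  for (const c of cases) {
    let passes = 0;
    for (let i = 0; i < runs; i++) {
      if (c.check(await callModel(c.prompt))) passes++;
    }
    console.log(`"${c.prompt}" pass rate: ${passes}/${runs}`);
  }
}

// Example: an answer about refunds should mention a window in days.
runEvals([
  { prompt: "What is our refund policy?", check: (o) => /\b\d+\s*days?\b/i.test(o) },
]);
```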

    Resonant Computing Manifesto

    Design, software engineering

    Pen and ink or charcoal-style drawing of people sitting at an outdoor European-style café with tables, chairs, and umbrellas, suggesting a casual, social atmosphere.

    And so, we find ourselves at this crossroads. Regardless of which path we choose, the future of computing will be hyper-personalized. The question is whether that personalization will be in service of keeping us passively glued to screens—wading around in the shallows, stripped of agency—or whether it will enable us to direct more attention to what matters.

    In order to build the resonant technological future we want for ourselves, we will have to resist the seductive logic of hyper-scale, and challenge the business and cultural assumptions that hold it in place. We will have to make deliberate decisions that stand in the face of accepted best practices—rethinking the system architectures, design patterns, and business models that have undergirded the tech industry for decades.

    Source: Resonant Computing Manifesto

    It's no surprise that this manifesto on resonant computing resonates with me (sorry, not sorry). Two of its drafters, Maggie Appleton and Simon Willison, are past speakers at our conferences, and people whose thinking and work I very much admire. Those of us, myself included, drawn to the early web and its promise were drawn by principles like these, and the hope that the web could connect us in positive, uplifting ways. The last 20 years or so have gone rather differently, for all kinds of reasons we can get into elsewhere. That doesn't mean we can't take a deep breath, take stock, and commit to doing something better, as this manifesto challenges us to do. Many of the signatories have also spoken at our conferences. I invite you to join in signing it, too.

    Architecture, Specification, Execution: A Paradigm for AI-Accelerated Development

    AI, LLMs, software engineering, spec driven development

    In this post, I’m going to share a paradigm that’s been working for me. To be clear: I’m not advocating for any particular products – Copilot, Kiro, Cursor, they’re all amazing. What I’m offering is an approach that works regardless of which tools you choose, delivering the compounding returns vibe coding never reaches.

    Here’s the core principle: carve the path for AI to follow, don’t walk it yourself.

    Your job as the engineer is to set direction, establish constraints, and define success. AI’s job is to execute within those boundaries. Mix these roles and you’ll just muddy the waters.

    This paradigm builds on spec-driven development, and it consists of three pillars:

    • Architecture – Document the decisions that shape your system
    • Specification – Define the features within those constraints
    • Execution – Prompt and let it run

    Source: Architecture, Specification, Execution: A Paradigm for AI-Accelerated Development

    We've been collecting approaches to developing with AI and large language models as software engineers, not because we necessarily think a specific approach is the right one, but because we're at such an early stage, it's interesting to see these patterns emerge. Here Anthony Martinović shares his approach.

    State of AI | OpenRouter

    AI, AI Engineering, LLMs

    Stacked bar chart titled "Dominant Categories Over Time" showing the percentage of total tokens used per category from May to November 2025, with programming increasing from 11% to around 50%, highlighted by a black arrow and label.

    Categories: How Are People Using LLMs? Understanding the distribution of tasks that users perform with LLMs is central to assessing real-world demand and model–market fit. As described in the Data and Methodology section, we categorized billions of model interactions into high-level application categories. In the Open vs. Closed Source Models section, we focused on open source models to see community-driven usage. Here, we broaden the lens to all LLM usage on OpenRouter (both closed and open models) to get a comprehensive picture of what people use LLMs for in practice.

    Source: State of AI | OpenRouter

    OpenRouter is a service that unifies APIs across different large language model providers. One thing that gives them is deep insight into which models are being used, and how. In this pretty detailed report they outline, based on the traffic they see, how different models are being used, and to what extent. One thing well worth noticing here is that code generation, or use for software engineering, accounts for over 50% of all large language model use on a per-token basis. And, perhaps a little more surprising, though not so much if you work with these technologies, Claude models account for 60% of token usage in this category. Other use cases fall away pretty quickly. Role play is one that got early traction but seems to be fading somewhat as an overall percentage of token use, though I imagine it is growing in absolute terms. Other areas that gained initial traction, like marketing automation and legal applications, get quite a bit of attention but quite a bit less use than, above all, the software engineering use case.

    Design Systems for AI: Introducing the Context Engine | by Diana Wolosin | Design Systems | Nov, 2025 | Design Systems Collective

    AI, Design Systems, LLMs

    Infographic titled "Stop Overthinking Every Design Decision" featuring 7 UX design frameworks: 1) OODA Loop, 2) First Principles, 3) 6 UX Thinking Modes, 4) 80/20 Impact Rule, 5) 5 WHY Framework, 6) Clean Design Rule, and 7) 5x5 UX Rule, each in a colored box with brief descriptions below.

    For years, design systems have served one primary purpose: humans. They document patterns, components, decisions, and principles, all presented in formats meant for designers and engineers to read, interpret, and translate into products.

    However, the moment AI entered your workflow, one truth became painfully clear. Your tokens, guidelines, accessibility rules, and UX patterns don’t matter if the LLM consuming them can’t read them as structured, meaningful context. This is why AI prototypes often fail: they feature off-brand UI, inconsistent layouts, vague flows, and content that doesn’t align with the intended personality. It’s not hallucination, it’s missing context.

    Source: Design Systems for AI: Introducing the Context Engine | by Diana Wolosin | Design Systems | Nov, 2025 | Design Systems Collective

    As Diana Wolosin observes, design systems were created by humans, for humans, which made a lot of sense until LLMs came along. Here she asks: "What happens to design systems when AI becomes our new user?"

    Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl

    AI, LLMs, software engineering, spec driven development

    After looking over the usages of the term, and some of the tools that claim to be implementing SDD, it seems to me that in reality, there are multiple implementation levels to it:

    • Spec-first: A well thought-out spec is written first, and then used in the AI-assisted development workflow for the task at hand.
    • Spec-anchored: The spec is kept even after the task is complete, to continue using it for evolution and maintenance of the respective feature.
    • Spec-as-source: The spec is the main source file over time, and only the spec is edited by the human, the human never touches the code.

    All SDD approaches and definitions I’ve found are spec-first, but not all strive to be spec-anchored or spec-as-source. And often it’s left vague or totally open what the spec maintenance strategy over time is meant to be.

    Source: Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl

    Birgitta Böckeler looks at "spec driven development" and what she identifies as three different flavours of this approach.

    Spec-Driven Development: The Waterfall Strikes Back

    AI, LLMs, software engineering, spec driven development

    Scene from Star Wars featuring AT-AT walkers and Rebel snowspeeders in battle on the snowy planet Hoth, with laser blasts and mountainous terrain in the background.

    Spec-Driven Development (SDD) revives the old idea of heavy documentation before coding — an echo of the Waterfall era. While it promises structure for AI-driven programming, it risks burying agility under layers of Markdown. This post explores why a more iterative, natural-language approach may better fit modern development.

    Source: Spec-Driven Development: The Waterfall Strikes Back

    Spec-driven development is an approach to developing software with large language models that has gained some traction in recent months. Here, François Zaninotto explores the why and how of this approach.

    Introducing AI, the Firefox way: A look at what we’re working on and how you can help shape it

    AI, LLMs

    Illustration of a stylized Firefox browser window with a dropdown menu showing three options: "Current Window," "AI Window" (highlighted with a cursor pointer), and "Private Window," set against a retro-futuristic grid background with sparkles and stars.

    With AI becoming a more widely adopted interface to the web, the principles of transparency, accountability, and respect for user agency are critical to keeping it free, open, and accessible to all. As an independent browser, we are well positioned to uphold these principles.

    While others are building AI experiences that keep you locked in a conversational loop, we see a different path — one where AI serves as a trusted companion, enhancing your browsing experience and guiding you outward to the broader web. We believe standing still while technology moves forward doesn’t benefit the web or humanity. That’s why we see it as our responsibility to shape how AI integrates into the web — in ways that protect and give people more choice, not less.

    Source: Introducing AI, the Firefox way: A look at what we’re working on and how you can help shape it

    AI in one form or another has been in our browsers for many years, with the speech APIs that predate large language models. More recently, with Chrome, we've seen general and specific APIs being experimented with. Firefox, too, has started similar experiments. Here the Firefox AI team talks about their philosophy of why and how they are implementing AI in the browser.

    Storage in the browser

    cookies, indexedDB, localstorage, offline, webstorage

    From cookies to IndexedDB and more, there are a growing number of ways to persist data in the browser. This is an excellent overview of these.
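
    If you just want the flavour of the two APIs you'll reach for most often, here's a quick browser-side sketch: localStorage for small key-value strings, IndexedDB for larger structured records.

```typescript
// localStorage: synchronous, strings only, a few megabytes at most.
localStorage.setItem("theme", "dark");
console.log(localStorage.getItem("theme")); // "dark"

// IndexedDB: asynchronous, structured records, much larger quotas.
const request = indexedDB.open("notes-db", 1);
request.onupgradeneeded = () => {
  // Runs on first open (or on a version bump): define the schema here.
  request.result.createObjectStore("notes", { keyPath: "id" });
};
request.onsuccess = () => {
  const db = request.result;
  const tx = db.transaction("notes", "readwrite");
  tx.objectStore("notes").put({ id: 1, text: "hello", saved: Date.now() });
  tx.oncomplete = () => db.close();
};
```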