12:30 pm

Fail fast, fix faster: Why faster AI models beat smarter ones

AJ Fisher Technologist & Writer ajfisher.me

The smartest model doesn't always win.

In agentic coding loops, a model that is 10x faster but only marginally competent can often fail its way to success before a frontier model finishes reasoning.

AJ Fisher breaks down the maths behind this counterintuitive result using diffusion models like Inception Labs' Mercury 2. Unlike autoregressive models that generate tokens sequentially, diffusion models refine outputs in parallel, removing a serial bottleneck that slows iterative agent loops.

If each attempt improves a solution by even 20%, dozens of iterations per minute quickly compound into faster convergence than slow, high-quality reasoning.
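
As a rough napkin-maths sketch of that claim (not taken from the talk itself): assume "improves by 20%" means each attempt closes 20% of the remaining gap to a working solution, and call the loop converged once 5% of the gap is left. The per-attempt timings (3 s and 30 s) and the slower model's 60% gain per attempt are made-up illustrative numbers; only the 10x speed ratio and the 20% figure come from the abstract.

```python
import math


def attempts_to_converge(gain_per_attempt: float, tolerance: float = 0.05) -> int:
    """Attempts needed until the remaining gap is at or below `tolerance`.

    Assumes every attempt closes a fixed fraction of whatever gap is left,
    so after n attempts the remaining gap is (1 - gain_per_attempt) ** n.
    """
    remaining_per_attempt = 1.0 - gain_per_attempt
    return math.ceil(math.log(tolerance) / math.log(remaining_per_attempt))


def time_to_converge(gain_per_attempt: float, seconds_per_attempt: float,
                     tolerance: float = 0.05) -> float:
    """Wall-clock seconds to converge, ignoring verification time."""
    return attempts_to_converge(gain_per_attempt, tolerance) * seconds_per_attempt


# Hypothetical numbers: a fast model at 3 s per attempt closing 20% of the gap,
# and a model 10x slower (30 s per attempt) closing 60% of the gap.
fast_seconds = time_to_converge(0.20, 3.0)
slow_seconds = time_to_converge(0.60, 30.0)

print(f"fast model: {attempts_to_converge(0.20)} attempts, {fast_seconds:.0f} s")
print(f"slow model: {attempts_to_converge(0.60)} attempts, {slow_seconds:.0f} s")
```

Under these assumptions the fast model needs more attempts but less total time. The sketch leaves out the cost of verifying each attempt, which is where the abstract expects the real bottleneck to move.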

With live code examples and a bit of napkin maths, this talk shows why loop velocity is becoming the dominant factor in AI-assisted engineering, and why verification, not model intelligence, will become the real bottleneck.

The key question isn't "how smart is your model?" It's "how fast is your loop?"

12:50 pm

Building Frameworks Building Systems

Ally Macdonald Staff Builder Stile Education

CI infrastructure is ripe for the vibing — so why don’t we? I have been.

Our company must deliver 200 interactive science and maths games for millions of students this year. Only a few years ago we were making 30, by hand, for significantly fewer students. Rather than handcrafting 200 interactive games, though, what if we could make something do it for us? What if we could build a system that builds the system for us?

CI is the perfect place for all our guardrails and all of our systems to come hang out. Without the right grounding, without the right guardrails… the pipelines drop off a cliff. Without the right observability stack, you can’t see what’s going on when things do go wrong — or when things go right. And then…you can’t scale it.

So I vibed it all.

And it delivered.

1:10 pm

Why AI coding tools might not make the slightest difference

Jason Cornwall Head of Engineering Enablement SEEK

Most “AI gives you 10x productivity” stories assume coding is the bottleneck. For large, mature companies this is almost never the case: you roll out AI coding tools and people feel faster, but delivery metrics barely move. In this talk I’ll show how we used a Theory of Constraints approach at SEEK to find the actual bottlenecks holding back throughput, and how that changed our AI productivity strategy, resulting in a measured productivity gain of $1.5m per year.

You’ll leave with a practical playbook: what to measure, how to run experiments, what interventions usually unlock the next step, and how to get investment that optimises for both humans and coding agents. We'll cover where AI can genuinely help across the functions involved in software delivery, and the enabling changes you might need to make to roles and responsibilities, collaboration practices, and platform capabilities.

1:30 pm

Constitutional Prompting: Making AI Coding Agents Reliable Without the Iteration Tax

Prem Pillai Sr. AI Engineer Block Inc

Every engineering team trying to automate developer workflows with AI agents hits the same wall: the iteration tax. You ask an agent to review a PR, scaffold a feature, or audit code quality — it does something almost right, you correct it, it overcorrects, you add guardrails, it gets confused. Four round-trips later you have acceptable output and a prompt that's fragile, opaque, and impossible to hand off to another engineer on your team.

Constitutional prompting is a pattern that eliminates this loop for developer tooling. Instead of iterating toward correctness at runtime, you encode your team's engineering standards, workflow constraints, and output contracts directly into structured agent specifications - upfront, before the agent ever runs. Think of it as writing a constitution for how an agent should behave within your development workflow: parameter schemas, numbered workflow phases, anti-loop directives that prevent the agent from second-guessing itself, and typed JSON output contracts that make success or failure unambiguous.
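
To make the "typed JSON output contract" idea concrete, here is a minimal sketch of what checking an agent's reply against such a contract could look like. It is an illustration only, not rp1's actual format: the `AgentResult` type, its field names (`status`, `summary`, `findings`) and the allowed status values are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical contract: these field names and status values are assumptions
# made for illustration, not rp1's real schema.
ALLOWED_STATUS = {"success", "failure"}
EXPECTED_FIELDS = {"status": str, "summary": str, "findings": list}


@dataclass(frozen=True)
class AgentResult:
    status: str
    summary: str
    findings: list


def parse_agent_output(raw: str) -> AgentResult:
    """Parse an agent's reply and reject anything that breaks the contract."""
    data = json.loads(raw)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    if set(data) != set(EXPECTED_FIELDS):
        raise ValueError(f"expected exactly the fields {sorted(EXPECTED_FIELDS)}")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} must be of type {expected_type.__name__}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUS)}")
    return AgentResult(**data)


result = parse_agent_output(
    '{"status": "success", "summary": "2 issues found", "findings": ["a", "b"]}'
)
print(result.status, len(result.findings))
```

Because the reply either parses into the typed result or raises, the calling workflow gets an unambiguous pass or fail without a human reading free-form text.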

I've used this pattern to build rp1, an open-source framework with 36 specialised agents that automate real developer workflows - review, feature development, code auditing, autonomous research - executing in a single pass without human intervention mid-workflow. Hundreds of engineers at Block use these agents daily, and the results have been striking: fewer iteration cycles, higher first-pass code quality, and agents that consistently follow team conventions without drift.

This talk breaks down the anatomy of a constitutional agent for developer workflows, walks through the failure modes it prevents, and shares hard numbers on iteration reduction and first-pass success rates. You'll leave with a concrete framework you can apply to your own engineering automation immediately.

1:50 pm

From Zero to Production: How 15 Engineers Shipped a Production LLM Product with AI Coding Tools

Michael Zhang Principal ML Engineer MYOB

This talk covers how a team of fewer than 15 engineers at MYOB took an AI-powered chat experience from zero to production, embedded directly inside the product and serving real small business owners and accountants. The team leaned heavily on AI-assisted coding throughout the entire development lifecycle, using tools like Cursor and Claude Code as genuine force multipliers. The talk covers the AI engineering challenges of productionising LLM features at scale (evaluation, guardrails, latency, hallucination management) and what it actually looks like when a small team uses AI coding tools to ship faster than anyone expected.

2:10 pm

Multi-Armed Bandits: The Scientific Shotgun for Evals

Ron Au Senior Software Engineer Canva (Leonardo.Ai)

A/B testing is too rigid a tool for AI systems. You're stuck serving worse results for the duration of the experiment and getting billed for slower models while three providers release SOTA updates this week.

Steal a trick from data science instead and use multi-armed bandits to organically surface the best models, prompt choices, and harnesses. You want your evals to be more than scores: make them an exploration in minimising regret.
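
For readers new to the idea, here is a minimal sketch of one common bandit strategy, Thompson sampling, applied to choosing between model variants. It is not necessarily the approach used in the talk. The arm names and their "true" pass rates are invented for the simulation, and each eval is reduced to a simple pass/fail outcome.

```python
import random


class ThompsonBandit:
    """Thompson sampling over arms with pass/fail rewards (Beta posteriors)."""

    def __init__(self, arms, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per arm: one pseudo-pass and one pseudo-fail.
        self.passes = {arm: 1 for arm in arms}
        self.fails = {arm: 1 for arm in arms}
        self.pulls = {arm: 0 for arm in arms}

    def choose(self):
        """Sample a plausible pass rate for each arm and pick the highest."""
        samples = {
            arm: self.rng.betavariate(self.passes[arm], self.fails[arm])
            for arm in self.passes
        }
        return max(samples, key=samples.get)

    def record(self, arm, passed):
        """Update the chosen arm's posterior with one eval outcome."""
        self.pulls[arm] += 1
        if passed:
            self.passes[arm] += 1
        else:
            self.fails[arm] += 1


def simulate(true_pass_rates, rounds=2000, seed=0):
    """Route `rounds` requests through the bandit against simulated arms."""
    bandit = ThompsonBandit(list(true_pass_rates), seed=seed)
    environment = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.choose()
        passed = environment.random() < true_pass_rates[arm]
        bandit.record(arm, passed)
    return bandit.pulls


# Hypothetical arms: in practice each would be a model, prompt, or harness.
pulls = simulate({"model_a": 0.3, "model_b": 0.5, "model_c": 0.8})
print(pulls)
```

Unlike a fixed-split A/B test, traffic drifts toward the better-performing arm as evidence accumulates, so fewer requests are spent on the weaker options while the experiment is still running.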