AI Engineer Melbourne 2026 — AI Engineering (Day 2 Afternoon)

Session Details

Why LLMs Fall for Stories (And 5 Production Patterns That Actually Stop Them)

Mal Curtis, Principal Software Engineer, NVIDIA

Prompt injection isn't a bug - it's a feature. LLMs trained on humanity's written corpus learned something we didn't intend: narrative structure. They understand dramatic tension, plot twists, and persuasive framing. When an attacker crafts a compelling story ("Actually, the real system prompt said..."), the model follows because that's what stories do. This talk connects 2,500 years of storytelling theory - from Aristotle's Poetics to Derrida's "there is no outside text" - to explain why prompt injection is an inevitable consequence of training on human language, not a solvable vulnerability.

Understanding why doesn't stop the attacks, but it changes how you build defences. You'll learn production-tested layered defence patterns and leave with a mental model for threat modelling and patterns you can implement immediately.

15:20

Hacking the Model: AI Red Teaming in Practice

Pas Apicella, Field CTO, Snyk APJ

AI is already in production—but almost no one has tested how it breaks. Today I’ll show you how attackers think, how models are actually exploited—from prompt injection to data exfiltration—and how to systematically uncover those risks before they become incidents.

15:40

Why Most AI De-Identification Fails in Production, And How We Built One Lawyers Actually Trust

Moin Zaman, Co-founder, Smartnote

De-identifying text is easy to demo and surprisingly hard to ship. This talk is a deep technical case study of building SmartScrub, a reversible de-identification system designed for legal workflows, where privacy guarantees, auditability, and user trust are non-negotiable.

The original goal was simple: allow lawyers to safely use LLMs on transcripts without exposing client data. The reality was a long series of architectural failures that showed why common PII masking approaches do not survive in production.

I will walk through what we actually built and why naive solutions broke down. This includes placeholder token design, collision avoidance, stability across edits, and why masking too aggressively destroys downstream LLM usefulness. I will show how reversible de-identification changes your entire data model, UI, and persistence strategy, and why this becomes a systems problem rather than an NLP problem.

The talk covers hard trade-offs we made around local-first processing, cloud services, manual review tooling, user-defined PII patterns, and audit-safe re-identification. I will also share failure modes we only discovered after real users interacted with the system, including false positives that destroy trust, silent data drift, and UI decisions that unintentionally leak meaning.

This is not a theoretical talk. It is a production story about building AI under legal risk, zero tolerance for silent errors, and users who will abandon the product instantly if they do not fully understand what the system is doing. If you are building AI systems that touch sensitive data, this talk will save you months of painful mistakes.

What Attendees Will Learn
- Why common PII masking approaches fail under real legal workflows
- How to design reversible de-identification that survives editing, reprocessing, and audits
- Placeholder strategies that preserve LLM utility without leaking meaning
- Architectural patterns for isolating raw data while still enabling AI pipelines
- UI and data model decisions that directly impact user trust
- Failure modes you will not catch until real professionals use your system

Technical Topics Covered
- Reversible de-identification architectures
- Placeholder token stability and mapping persistence
- Manual scrub tooling and override precedence
- User-defined PII pattern overlays
- Auditability and re-identification guarantees
- Local-first vs cloud processing trade-offs
- Why this problem is systems engineering, not just NLP

Plus, a short live walkthrough showing how a legal transcript is de-identified, reviewed, edited, and safely re-identified, including examples of failure cases and how the system prevents them.
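
For readers unfamiliar with the idea, reversible de-identification with stable placeholders can be sketched roughly as follows. This is an illustrative toy only, not SmartScrub's implementation: the `Scrubber` class, the `[KIND_N]` token format, and the caller-supplied entity dictionary are all assumptions made for this example, and real systems also need entity detection, collision handling for text that already contains token-like strings, and persistent, audited mappings.

```python
import re


class Scrubber:
    """Hypothetical toy de-identifier: literal strings <-> stable placeholders."""

    def __init__(self):
        self.to_placeholder = {}  # original text -> placeholder token
        self.to_original = {}     # placeholder token -> original text

    def _placeholder_for(self, value, kind):
        # Reuse an existing token so the same value always maps to the same
        # placeholder, even when the text is scrubbed again after an edit.
        if value not in self.to_placeholder:
            token = f"[{kind}_{len(self.to_placeholder) + 1}]"
            self.to_placeholder[value] = token
            self.to_original[token] = value
        return self.to_placeholder[value]

    def scrub(self, text, entities):
        # entities maps a literal string to its kind, e.g. {"Jane Doe": "PERSON"}.
        # Longer strings are replaced first so "Jane Doe" wins over "Jane".
        for value in sorted(entities, key=len, reverse=True):
            text = text.replace(value, self._placeholder_for(value, entities[value]))
        return text

    def restore(self, text):
        # Swap known tokens back; unknown token-like strings are left untouched.
        return re.sub(
            r"\[[A-Z]+_\d+\]",
            lambda m: self.to_original.get(m.group(0), m.group(0)),
            text,
        )


scrubber = Scrubber()
original = "Jane Doe met Acme. Jane Doe left."
masked = scrubber.scrub(original, {"Jane Doe": "PERSON", "Acme": "ORG"})
restored = scrubber.restore(masked)
```

Keeping the mapping in both directions is what makes the process reversible. Reusing tokens for repeated values is one simple way to keep placeholders stable, so the masked text stays coherent when it is passed to an LLM.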

16:00

Are Your AI Agents Secure? Defending the Privileged Agent

Daizen Ikehara, Principal Developer Advocate, Auth0

Are the AI agents you're developing truly secure?

AI agents that execute actions autonomously offer unprecedented value. But what about the "privileges" granted to them to act "on behalf of the user"?

Improper privilege management for agents is no longer a theoretical problem—it's a clear and present danger. An exploited AI agent with excessive privileges can lead to significant financial losses and devastating data breaches.

This session dives deep into the biggest pitfall in AI agent development: privilege and authorization. I will demystify the latest risks, such as Excessive Agency and Identity Abuse, and discuss defensive measures you can take to protect your AI agents from malicious actors. This is the critical security state that every development organization must understand before deploying AI agents into production.

16:20

Your Agents Pass Every Benchmark—Then Memory Breaks Them in Production

Ananya Roy, AI Architect, Databricks

You add memory to your agent, it works great in testing, and you ship it. A few weeks later, outputs start getting worse and nobody can figure out why. The agent is pulling in old information that's no longer true, retrieving context that's loosely related but clutters its reasoning, and sometimes carrying forward bad data that quietly corrupts every response after it. Standard evals won't catch any of this because they test single turns, not how memory behaves over hundreds of sessions. In this talk, we will walk through practical design principles and evaluation patterns you can implement to detect memory degradation before your users notice it. You'll walk away knowing how to design and evaluate memory-enabled agents so that memory actually makes your agent more reliable instead of silently breaking it.

16:40

AI Agents Are Distributed Systems

Lovee Jain, Senior Software Engineer | Google Developer Expert | AWS Community Builder

AI agents aren’t magic. They’re distributed systems — with better marketing.

Behind every impressive demo is a messy reality: multiple tools, remote services, auth boundaries, latency, retries, side effects, and deployment trade-offs. When I took a seemingly simple multi-tool agent built with MCP and Gemini ADK and pushed it into production, I stopped thinking about prompts — and started thinking about architecture.

In this talk, I’ll share what changed when the agent left localhost.

We’ll explore what happens when tools become independently deployed services, when stdio orchestration meets HTTP in the real world, and when generating an image, storing it, and emailing it turns into a reliability problem — not just a feature.

You’ll see how treating the agent as a control plane — and exposing it as a service — transforms it from a demo into infrastructure.

This isn’t a code walkthrough. It’s a systems story.

If you’re building AI agents meant to survive outside a notebook, this talk is about the parts no one shows in the demo.
