12:00

Deploying AI at the Edge: Model Compression and Hardware-Aware Optimization

Shivay Lamba Senior AI/ML Engineer Qualcomm

Large AI models often struggle to meet the latency, memory, and power constraints required for real-world edge deployments. This talk explores practical techniques for making modern AI models efficient enough to run on-device using model distillation, quantization, and hardware-aware optimization strategies. Attendees will learn how to reduce model size and inference costs while maintaining accuracy, covering approaches such as post-training quantization and efficient runtime optimization across modern AI frameworks and accelerators. The session will also highlight real-world tradeoffs between performance, memory footprint, and power efficiency when deploying AI applications on edge devices.
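As a rough illustration of the post-training quantization mentioned in this abstract, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not taken from the talk: the function names and the sample weights are hypothetical, and real toolchains add calibration data, per-channel scales, and hardware-specific kernels.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four, and the reconstruction error per weight stays within half of `scale`. That is the size-versus-accuracy tradeoff in its simplest form.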

12:20

When a Small Language Model Beat Our LLM in Production

Avni Bhatt Sr Enterprise Architect

Large language models are often the default choice for production AI systems, even when the task does not require broad reasoning or generative depth. In this talk, I will share a real production case where an LLM-based solution underperformed on latency, cost, and reliability and was ultimately replaced, in part, by a small language model.

The system in question supported a high-volume enterprise workflow involving structured extraction, classification, and validation. While the initial LLM implementation performed well in early prototypes, production usage exposed several issues: inconsistent outputs, escalating inference costs, and difficulty enforcing deterministic behaviour. These problems became more pronounced under scale.

I will walk through the decision process that led us to introduce an SLM, the architectural changes required, and the criteria we used to evaluate success. The talk will cover where the SLM outperformed the LLM, where it clearly did not, and how we designed a hybrid pattern that escalates to an LLM only when necessary.

The session includes a live demo showing the before-and-after behaviour of the system, along with production metrics such as latency, cost per request, and error rates. I will also discuss failure modes we encountered, trade-offs we accepted, and the signals that helped us decide early whether an SLM was a viable replacement.

My aim is not to advocate for SLMs over LLMs in general, but to share the signals, metrics, and decision criteria that helped us choose the right tool for the job. I believe this perspective is timely as more teams move beyond experimentation into sustained production usage.
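The hybrid pattern described in this abstract, where requests go to an LLM only when necessary, might be sketched as a confidence-gated router. This is an assumption about the general shape of such a system, not the speaker's implementation: `route`, the toy models, and the 0.8 threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Result:
    label: str
    source: str  # "slm" or "llm", recorded so cost and error rates can be tracked per path


def route(
    text: str,
    small_model: Callable[[str], Tuple[str, float]],
    large_model: Callable[[str], str],
    threshold: float = 0.8,  # hypothetical cut-off; would be tuned on production data
) -> Result:
    """Try the small model first; escalate only when its confidence is too low."""
    label, confidence = small_model(text)
    if confidence >= threshold:
        return Result(label, "slm")
    return Result(large_model(text), "llm")


# Stand-in models so the sketch runs without any external service.
def toy_small_model(text: str) -> Tuple[str, float]:
    if "invoice" in text.lower():
        return "invoice", 0.95
    return "unknown", 0.30


def toy_large_model(text: str) -> str:
    return "general_correspondence"


easy = route("Invoice #123 attached", toy_small_model, toy_large_model)
hard = route("Please see my note", toy_small_model, toy_large_model)
```

Recording which path served each request is what makes it possible to compare latency, cost per request, and error rates between the two models.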

12:40

Multi-Model Collaboration with Claude Code: How to Measure What Actually Works

Jack Rudenko CTO MadAppGang

We built Claudish, a free open-source proxy that lets Claude Code work with any AI model. 15+ providers directly - Google, OpenAI, xAI, Kimi, MiniMax, and more. OpenRouter for even wider access. Or fully offline with Ollama. That was just the starting point. What came next was way more interesting.

When you can run any model through the same interface, you start asking real questions. Which model works best for which task? Does mixing models actually help or is it just expensive complexity? How do you find the right combination for your team? And the hardest one - how do you measure any of this when LLM output is non-deterministic? You can't run the same prompt twice and get the same result. I'll share what we learned running multi-model setups across 100+ projects with a 70-engineer team. How we approach measurement, what surprised us, and a practical framework for engineers who need to evaluate AI tooling with something more than "it feels faster."

13:00

Edge AI with Direct Device Control

Jeremy Kelaher AI Enablement Architect SBS

Despite all the hype and promise, we are in the Timeshare Mainframe moment of AI. Even our devices rely on the cloud for most inference. As AI moves beyond the cloud and into the physical world, the real opportunity lies at the edge. It’s where local intelligence meets local data and action. In this talk, we explore how AI systems can move from cloud agents to direct device control, reducing round-trip latency, preserving user privacy, and enabling real-time responsiveness without constant cloud dependency.

Drawing on experiments using platforms such as NVIDIA Orin Nano, ESP32, and Axera edge AI SoCs, we’ll examine how to architect low-power systems that combine local data and action with inference. This includes running compact speech-to-text and video models on-device and using USB and Bluetooth HID interfaces to translate AI outputs directly into keyboard, mouse, and other human interface device control signals. Attendees will gain insight into tools such as PlatformIO and ready-made modules like those from M5Stack that accelerate edge development.

13:20

COBOL and AI: Building a Self-Serve Knowledge Layer for 2,000 Batch Jobs

Matthew Gillard Principal V2 AI

Modernization planning stalls when the business rules are locked inside decades of COBOL code. This talk shares a practical, production‑tested playbook I used to extract those rules, make them explainable, and serve them to teams in a usable form. It’s not economical to have humans extract this level of operational knowledge from COBOL at scale. The outcome of this work is an agent that saves hours for operational staff by surfacing what a batch job does, which input files it consumes, and which outputs it produces.

I’ll walk through the end‑to‑end pipeline: how we used AI to parse COBOL into control‑ and data‑flow structures, generate diagrams that make execution paths and data dependencies visible, and assemble structured knowledge about each job (purpose, inputs, outputs, key rules). The emphasis is on trade‑offs: what we automated vs. where we needed human review, which COBOL constructs are most error‑prone, and how we scaled the approach across a legacy estate of ~2,000 COBOL jobs. Converting specific modules to Python is shown as one possible downstream outcome, but the core goal is understanding and planning.

I will demo a self‑serve knowledge agent we built for developers and business analysts. It exposes the original code repositories plus the derived diagrams and extracted rules, so teams can ask questions like “where is premium eligibility calculated?” and get grounded answers with traceable sources. The live demo uses a public COBOL repository so the workflow is reproducible without proprietary code.

13:40

Legacy Software + Agentic Discovery

Chris Rickard Founder & CEO Userdoc

Legacy software powers the world - from banking to utilities and government. The hardest part isn’t the code - we have the code. It’s when the old guy with the beard leaves, and the knowledge walks out with him: what the code really means, the system truth, the business rules, and the original intent.

To modernise safely you need more than technical understanding - you need functional understanding. A legacy codebase is a crime scene: you have to retrace the steps, gather evidence, and reconstruct the story in plain language everyone can work with.

In this session I’ll share learnings from building a software reverse-engineering platform, including the real trade-offs between quality, cost, and speed, plus case studies showing how teams have de-risked modernisation by turning 12M lines of legacy code into living requirements - in weeks, not years, and for a fraction of the cost. You’ll leave with practical patterns for agentic discovery, an understanding of where it breaks down, and ways to keep it honest.