A Gentle Intro to Running a Local LLM
February 11, 2025
But there is an overarching story across the field: LLMs are getting smarter and more efficient. And while we continually hear about LLMs getting smarter, before the DeepSeek kerfuffle we didn’t hear so much about improvements in model efficiency. But models have been getting steadily more efficient, for years now. Those who keep tabs on these smaller models know that DeepSeek wasn’t a step-change anomaly, but an incremental step in an ongoing narrative.
These open models are now good enough that you – yes, you – can run a useful, private model for free on your own computer. And I’ll walk you through it.
Source: A Gentle Intro to Running a Local LLM | Drew Breunig
I believe large language models are a transformative new paradigm of computing.
Can they do all the things they are hyped to do well? No. Will they ever be able to? That is an open, and in many ways unimportant, question, since they can already do many things incredibly well.
If your career involves making things that people interact with on computers, and you aren’t actively exploring the impact of these technologies on the work you do, I share Geoffrey Huntley’s view that you may quickly find yourself vastly less productive than you would otherwise be (and than your peers who do explore these technologies will have become).
Most of our work with these models until now has been via some sort of cloud service, whether a foundation model company like OpenAI or Anthropic, a cloud computing service like Azure, AWS, or Google Cloud Platform, or a host of open models like Hugging Face.
But it is becoming increasingly feasible to run models on your own consumer-grade hardware, as Drew Breunig examines here (and even in the browser, as we’ll explore with our online conference Inference later in the year).