Introduction to Next Generation Application Development

Alex Danilo speaks about the evolution of application development, urging developers to look beyond traditional web app development given the advanced capabilities of modern browsers, and introduces the concept of Local First Software.

Local First Software Movement

Danilo introduces the Local First Software movement, emphasizing the importance of data privacy and the drawbacks of cloud dependency. He also highlights the potential risks associated with relying solely on cloud services.

Evolution of Progressive Web Apps

Focusing on the evolution of web apps, Danilo argues against building for backward compatibility and instead suggests focusing on modern browser capabilities to create web platform apps.

Building a Revolutionary App

Danilo shares a personal vision for an application that enhances real-time document editing and collaboration, which leverages modern web technologies like WebAssembly and WebRTC.

Utilizing Web Tech for Application Development

Discussion on how contemporary browser features like WebAssembly, WebRTC, and the File System Access API can be utilized to realize complex applications directly within the browser.

Overcoming Development Challenges

Addressing the major challenge of layout in application development, Danilo discusses the potential of WASM and Canvas to achieve consistent rendering across browsers.

Architecture of a Web-based Word Processor

Exploring the architecture behind a hypothetical WebAssembly-based word processor, highlighting how different aspects of word processing can be handled efficiently through the browser.

Demonstration of 'Pagez'

Danilo demonstrates 'Pagez,' a web app capable of handling complex word processing tasks, showcasing document editing and layout capabilities.

Handling Image Formats within Documents

Discussion on the challenges of managing different image formats in documents, particularly proprietary formats, and how leveraging browser capabilities can alleviate these issues.

Real-time Collaboration Features

Exploring potential for real-time collaboration using WebRTC's data channel, Danilo discusses the technical possibilities and benefits of incorporating seamless, collaborative features into applications.

Introduction to WebRTC and Document Sharing

This part of the talk focuses on using WebRTC data channels to share documents directly between devices. It highlights the use of progressive web apps and QR codes for easier document sharing and collaboration.

Enhancing User Experience in Application Development

The discussion shifts to improving the user experience: handling large documents and keeping the UI responsive by offloading work to Web Workers, with Comlink handling the asynchronous communication. It also covers using OffscreenCanvas to paint in the background so the interface stays responsive.

Collaboration and Avoiding the Fishbowl Effect

Introduces concerns regarding the traditional collaboration model as exemplified by Google Docs, which could cause anxiety among users. It moves on to discuss the transition from Operational Transforms to Conflict-Free Replicated Data Types (CRDTs) for a more decentralized form of document editing, akin to the functionality of Git.

Motivation Behind Developing Client-Side Applications

The speaker explains the rationale for developing complex, client-side applications, emphasizing the longevity and user control over data, cost-effective scaling, compatibility across devices, and the non-requirement for installing software for one-off uses.

Encouragement to Embrace Web Technologies

The conclusion encourages application developers to view web technologies not as secondary to native developments but as robust and mature options for building applications. It underscores that technologies like WebAssembly, WebRTC, and Canvas are well-established and should be utilized to create powerful web applications.

Hi everyone.

About 40 years ago, a writer called Tracy Kidder wrote this book called The Soul of a New Machine, and it tracked a project where a group of engineers were racing to build a new piece of hardware.

Basically, racing against time to bring something new to life.

So I want to talk today about a similar approach but for an application.

So how you would build a new next generation application, or how you would approach application development as such.

So the first message I want to bring across is to forget the web.

And what I mean by that is, when we're building applications, there was this line of, you build a native app, you build a web app, and that kind of made sense five, maybe ten years ago.

But today, browsers are so capable that we need to shed that whole mindset. You can use any language you like.

You can write in Rust if you want and compile to WebAssembly.

You can write in TypeScript.

You can use any language you like.

So just think of it more as I want to build an application.

How am I going to build it?

Of course you build it on the platform and the platform is the browser.

So I'm dropping the word web here because the browser is a platform that has become so incredibly capable in the last five years, or more.

It's just constantly improving.

New APIs are coming in.

And so you come along and think, I want to build a kind of a next generation application.

How would I go about building it?

And what would I build?

So I want to talk very briefly as an intro about this concept that is now a bit of a movement called Local First Software.

I don't know, has anyone heard of Local First Software?

Yes, a few people.

Local First Software is this movement where we take control of our data, we take control of our compute, and we stop relying on cloud services.

For example, something like Figma.

If Figma went out of business, what happens to all your designs?

Gone, lost in the cloud.

Or if you base your business on a cloud based service and that cloud based service gets turned down or sunsetted, you've lost your data.

So the Local First Software approach is to build applications that run anywhere, but primarily in the browser as PWAs: you load them, they get cached by your service worker, and they let you work on your files on your machine, so nothing actually goes to the cloud.
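
As a rough illustration of the caching idea, here is a minimal cache-first sketch in TypeScript. The names `CacheLike` and `cacheFirst` are made up for this example, and the cache and network are passed in as parameters so the logic can run outside a browser. In a real service worker the cache would come from `caches.open(...)` and the function would be called from a `fetch` event handler via `event.respondWith(...)`.

```typescript
// Hypothetical minimal shape standing in for the browser's Cache object;
// the real Cache.match() and Cache.put() follow the same call pattern.
interface CacheLike<Req, Res> {
  match(request: Req): Promise<Res | undefined>;
  put(request: Req, response: Res): Promise<void>;
}

// Cache-first: serve the stored copy when there is one, otherwise go to the
// network once and remember the answer for next time.
async function cacheFirst<Req, Res>(
  request: Req,
  cache: CacheLike<Req, Res>,
  fetchFromNetwork: (request: Req) => Promise<Res>,
): Promise<Res> {
  const cached = await cache.match(request);
  if (cached !== undefined) return cached;
  const fresh = await fetchFromNetwork(request);
  // With real Response objects you would store response.clone() here,
  // because a Response body can only be read once.
  await cache.put(request, fresh);
  return fresh;
}
```

Once the app shell and the WebAssembly blob have been fetched a single time, every later load is answered from the local cache, which is what lets this kind of app keep working with no network at all.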

So it's a client control kind of concept.

And why would you want to do this?

Because they're watching you.

Did you know that?

The other day I had to put together a document in Google Docs: my son had to provide a list of the countries he'd traveled to in the last few years, so I put it all in, including Fiji and the dates we went.

And then, next thing I picked up my Android phone and I've got ads for Fiji and I've never looked up Fiji in maybe five to ten years.

Because, of course, once you lose the privacy by sending things into the cloud, you get served ads.

You may not think you are, but has anyone here read the EULA for Google Docs or OneDrive?

Bet you haven't.

As soon as you share a doc, it's on six continents and it can be scanned.

The Local First movement is all about maintaining your privacy.

You have your data on your computer, you work on it in your app, and you save it back to your computer.

And if you want to share it with someone, you can.

And of course, in Europe, there's the GDPR, with all the privacy regulations that we have to keep in mind.

Another thing: Alex Russell will be on at the end of the day, so this is a day that's bookended by two Alexes.

And he's a big proponent of progressive web apps, and we've all heard this bandied around as a term for the last five years, and the way I look at progressive web apps, it's about progressive enhancement, which also means that it's supposed to degrade gracefully to older browsers.

And today, what I'm trying to push to you is forget progressive web app.

Swap the W and the P and think about the web platform itself.

So you're building a web platform app.

Assume your user has a good modern browser.

Because there's no point building for the past.

Now one of the beauties of Windows is that you can bring something from 1980 and it'll run in a 2023 Windows machine.

So forward compatibility is a thing, but you don't have to build for the 1980s machine.

How would I go about building this?

I said I was building this whole new application.

I had this like decade old vision.

It's a bit rusty of an idea, but it's something that I came up with around a decade ago.

And, in fact, here it is.

This is to solve a real world problem.

My brother-in-law is a barrister, and the use case is basically this: he would go to court and do his pitch with the judge and all that kind of stuff.

And there'd be constantly things that had to be updated in the documents that they were working with.

So he would get on the phone and ring the secretary, and she'd be in the office on Word, working away, taking dictation and changing the documents.

And I thought, what if we could build an app that instrumented Word, so you could beam the document across the network into an iPad app and edit it, all on the fly? And that's basically what this is.

And if you look at my beautiful messy handwriting, it's instrumented, send the data via Skype style peer to peer proxy, reflect into the thing, save it, make sure you don't break any rules with Microsoft's EULA.

And the bottom point is: use encryption for all the traffic. I actually wrote this in 2011, so this is a 12-year-old idea.

So I thought, why don't we try to build something like this in web technology?

What browser features do we have that can realize this kind of vision?

Now we have WebAssembly.

Now, has anybody used WebAssembly at all in the room?

A few people.

It's a really powerful primitive that's in the browsers today.

In fact, your browser has it in it.

Figma, things like that are built on WebAssembly.

Adobe brought Photoshop to the web, AutoCAD runs on the web, there's a whole lot of things that are using WebAssembly, a whole lot of new applications, so this is a potential client.

WebRTC.

Now in that vision that I had 12 years ago, it was going to use Skype and peer to peer connections using Skype or some kind of protocol like that.

Every browser has WebRTC built into it.

We use it pretty much every day when you do a Zoom call or a Teams call or a Google Meet call, it's using WebRTC.

We also have the File System Access API, which lets you read and write files.

So it's possible to write an app that just goes, Oh, I'll browse my hard disk, read stuff, work on it, save it.
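
A minimal sketch of that read, work, save loop might look like the following. `FileHandleLike`, `WritableLike` and `editInPlace` are hypothetical names for this example. They mirror only the parts of the real `FileSystemFileHandle` that are used here (`getFile()` and `createWritable()`), so a handle obtained from a file picker in a supporting browser should fit the same shape.

```typescript
// Minimal structural types matching the parts of FileSystemFileHandle used
// here: getFile() for reading and createWritable() for saving.
interface WritableLike {
  write(data: string): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<{ text(): Promise<string> }>;
  createWritable(): Promise<WritableLike>;
}

// Read the file, apply an edit, and write the result back to the same file.
async function editInPlace(
  handle: FileHandleLike,
  edit: (contents: string) => string,
): Promise<string> {
  const file = await handle.getFile();
  const updated = edit(await file.text());
  const writable = await handle.createWritable();
  await writable.write(updated);
  await writable.close(); // the new contents are committed when the stream closes
  return updated;
}
```

Nothing in this flow involves a server: the file comes off the local disk, is changed in memory, and goes back to the same place.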

Now imagine if the app that was doing that was actually running on the browser platform.

So there are lots of features like this that would work.

So think about building something like this, something that would edit documents reproducibly.

What do you think is the number one developer pain point?

I think it's right here.

Layout.

So if you were trying to build something that was going to edit this, edit a Word document for example, layout is a nightmare.

Layout has always been a nightmare, and it's not just cross-browser layout when we think about problems with reproducibility.

I worked on the Chrome team for a long time, and we had problems between versions of Chrome.

One time the Commonwealth Bank literally rang me and sent a message: NetBank had broken because layout changed in the new Chrome release.

And literally it was a small change in CSS that caused the bank website to break.

So it's not just simple applications.

WASM and Canvas come to the rescue in this application.

Does anyone know how Google Docs and Google Sheets work?

They actually have a giant canvas.

So the page is the canvas.

In the early days it was DOM, but they actually do it all on Canvas now for reproducibility and that's the reason.

Since we have WASM, of course, we can use it to compile whatever code we like, talk to the Canvas, and produce 100 percent identical results on any browser without risk.
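
To make the WASM side concrete, here is a small sketch of loading a WebAssembly module and calling into it from TypeScript. The module is a hand-assembled stand-in that exports a single `add` function, not anything from the app in the talk, and the names `addModuleBytes` and `wasmAdd` are made up for the example. A real application would more likely fetch a compiled `.wasm` file and use the asynchronous instantiation APIs.

```typescript
// Hand-assembled WebAssembly binary exporting add(a: i32, b: i32) -> i32.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function, using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add
]);

// Synchronous compilation is fine for a module this tiny.
const wasmModule = new WebAssembly.Module(addModuleBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// Exports are untyped on the JavaScript side, so the signature is asserted here.
const wasmAdd = wasmInstance.exports.add as (a: number, b: number) => number;
```

The arithmetic inside the module is defined by the WebAssembly specification rather than by any one browser, which is the property that makes it attractive for layout code that has to give the same answer everywhere.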

And so today I wanted to give a shout out to Microsoft Word.

It's its 40th birthday in 2023, in case nobody knew.

The original problem was working with Word.

Peer to peer, somehow.

Which brings me to, what the hell is inside a Word doc?

Imagine if you wanted to try and build a Word document, a Word doc editor, in your browser.

This is what's inside a Word document.

Basically, they're a zip archive, and in there is a bunch of files.

There's a document.xml, which is your actual file that has the text that you type and all that stuff.

There's a styles tree.

So instead of CSS, it actually has styles specified as XML.

There's a relationships tree. When the document says, hey, I want to draw this JPEG image here, it dives into the relationships tree, which tells it where to find the image.
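
As a sketch of that lookup: in a .docx package the relationships for the main document live in `word/_rels/document.xml.rels`, as a list of `Relationship` elements with `Id` and `Target` attributes. The function below, with the hypothetical name `parseRelationships`, pulls those pairs out with regular expressions. That is an assumption made to keep the example short; a real implementation would unzip the package and use a proper XML parser.

```typescript
// Build a map from relationship Id (e.g. "rId5") to Target (e.g. "media/image1.jpeg").
function parseRelationships(relsXml: string): Map<string, string> {
  const targets = new Map<string, string>();
  // \b stops this from matching the enclosing <Relationships> root element.
  const elementPattern = /<Relationship\b([^>]*)>/g;
  let match: RegExpExecArray | null;
  while ((match = elementPattern.exec(relsXml)) !== null) {
    const attrs = match[1] ?? "";
    // Attributes can appear in any order, so each one is looked up separately.
    const id = /\bId="([^"]*)"/.exec(attrs)?.[1];
    const target = /\bTarget="([^"]*)"/.exec(attrs)?.[1];
    if (id !== undefined && target !== undefined) targets.set(id, target);
  }
  return targets;
}
```

When the layout code meets an image reference such as `rId5` in document.xml, a lookup in this map says which file inside the archive holds the image bytes.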

So there's all these pieces.

And from a kind of architectural point of view, that really looks like a lot of DOMs to me.

You can imagine, you could probably take that document dot XML, generate DOM nodes in your browser, and somehow work out how you can render the thing.

The difference here is that a Word doc actually has a bunch of these trees. So conceptually, could you make DOM document fragments and then try to manipulate them to do some kind of Word layout?

Would that work?

Or, you could bring your own DOM.

This is basically what I've done in this approach: rolled my own DOM.

And there are DOMs out there. There are open source ones, Apache licensed ones, and you can just go and find one if you want to try something like this yourselves, and compile it into WebAssembly.

This is an example of how you might structure this application.

We stick to a normal shell, JavaScript user interface.

The advantage of doing that is it's good for accessibility, so all of your menus, all your buttons, all that works, easy to make accessible.

And then you have the page where you're going to display whatever document it is that you're working on.

Bring your DOM and your business logic into a compiled WebAssembly blob and then just draw to Canvas.
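
One way to picture the split between the logic and the canvas is a display list: the layout code produces a list of draw commands, and a thin painter replays them onto the canvas. The sketch below is an illustration under heavy assumptions, not the app's actual design. The layout is written in TypeScript rather than compiled to WASM, every character is given the same fixed width, and the names `DrawText`, `TextSurface`, `layoutParagraph` and `paint` are invented for the example.

```typescript
// A draw command produced by the layout engine (imagined here as the WASM side).
interface DrawText { text: string; x: number; y: number }

// The only part of CanvasRenderingContext2D that the painter needs.
interface TextSurface { fillText(text: string, x: number, y: number): void }

// Greedy line breaking with a fixed advance per character, so the result never
// depends on the browser's own text measurement. A real engine would carry its
// own font metrics instead of this fixed-width assumption.
function layoutParagraph(
  text: string, maxWidth: number, charWidth: number, lineHeight: number,
): DrawText[] {
  const commands: DrawText[] = [];
  let line = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    const candidate = line === "" ? word : line + " " + word;
    if (line !== "" && candidate.length * charWidth > maxWidth) {
      // The word does not fit: emit the current line and start a new one.
      commands.push({ text: line, x: 0, y: (commands.length + 1) * lineHeight });
      line = word;
    } else {
      line = candidate;
    }
  }
  if (line !== "") commands.push({ text: line, x: 0, y: (commands.length + 1) * lineHeight });
  return commands;
}

// Replay the commands onto anything with a fillText method, such as a 2D canvas context.
function paint(commands: DrawText[], surface: TextSurface): void {
  for (const command of commands) surface.fillText(command.text, command.x, command.y);
}
```

Because every line break is decided before the canvas is touched, the browser is only asked to put glyphs at given coordinates, and the layout decisions stay the same from one browser to the next.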

See, nice and simple.

We can build this, can't we?

I reckon every person in this room could build this.

So I'll do a quick demo. Don't worry, this is fresh off the plates.

I built this thing called Pagez.

Pagez is a web app.

So let's just see.

Okay, so here's a little document from the Europass.

So this is, it's got columns, tables, all that kind of thing.

Pagination. Or you can load... Nope, don't want to save it.

Not yet.

So for example, here's like another document, so a book example.

And so here you have, it's like a 30 page book and it's running, basically drawing to a canvas, running in WebAssembly, inside the browser.

So it's parsed that zip file, created all the trees, mapped everything out, and worked out what it's supposed to do. If you're lucky you can even type in it and it reflows.

So that's what this thing is about.

It is possible.

So it is actually possible to build what I'm talking about in the browser.

But of course, that's all well and good because text is easy to draw on the canvas.

But what about image formats?

So inside a Word document, you have all these image formats and you have your usual suspects, JPEG, PNG, GIF, etc.

But they have these other things.

There's WMF, which is Windows Metafile, a proprietary Windows operating system format, and Vector Markup Language (VML), which almost became SVG but didn't make the cut.

And DrawingML, which is yet another one of their proprietary languages that lives in there.

How do you deal with all these?

Architecturally, if you're gonna build something like this, the smartest way is to leverage the browser.

What I mean by that is, I could compile libjpeg into the WebAssembly module to draw my JPEGs to the canvas, but that's just stupid, because the browser already does that.

So what you do is build a hybrid: you offload as much work as you can onto the browser itself, and write as little code as possible.

Here I've drawn drawImage. If I was trying to draw a JPEG or a PNG or a GIF, I would just call drawImage to draw that image on the canvas. The image itself is living inside the Word doc, of course, so you have to pluck it out inside your WASM module and then send it across to draw on the canvas, and off it goes, and it draws.

For other formats, like the WMF I spoke about, you could take a JavaScript renderer.

You could actually use a JS renderer to do this.
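
The routing decision can be sketched as a small function that looks at the first few bytes of the embedded image and picks a painter. The names `Painter` and `choosePainter` are hypothetical. The signatures used are the standard file headers for JPEG, PNG and GIF, plus the header of "placeable" WMF files. WMF files without that placeable header are not detected by this sketch and fall through as unknown.

```typescript
type Painter = "browser-drawImage" | "js-renderer" | "unknown";

// True when the data begins with the given signature bytes.
function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  return signature.every((value, index) => bytes[index] === value);
}

// Formats the browser can decode go to drawImage; WMF goes to a JavaScript renderer.
function choosePainter(bytes: Uint8Array): Painter {
  if (startsWith(bytes, [0xff, 0xd8, 0xff])) return "browser-drawImage"; // JPEG
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47])) return "browser-drawImage"; // PNG
  if (startsWith(bytes, [0x47, 0x49, 0x46, 0x38])) return "browser-drawImage"; // GIF ("GIF8")
  if (startsWith(bytes, [0xd7, 0xcd, 0xc6, 0x9a])) return "js-renderer"; // placeable WMF
  return "unknown";
}
```

For the browser path, the bytes can be wrapped in a `Blob`, decoded with `createImageBitmap`, and handed to the canvas context's `drawImage`, so no decoder has to be shipped inside the WASM module.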

I'll just show you how that could possibly work.

We'll just scroll back to the top here.

Here is an example of a document that actually has WMF content in it.

You see the little shield up the top there?

The text actually flows around that shield.

They do have the concept of flowing around shapes and text.

And the WMFs are there and like you've got multi column, flowing around the text and all this kind of stuff.

Oh, what have I clicked?

And so the way that I've implemented WMF for this case, and it's being redrawn on every scroll, by the way, is in JavaScript.

So there is a library called WMF.JS, which is put out by some people called SheetJS, and they allegedly have their code running in Office 365, but they're nice people, and they open sourced it.

So it's great, I'll use it.

So let's just get back to the …. So where do we go from there?

The original concept was, we want to collaborate.

I want to get my document and send it to you and say here.

Yeah, let's work together on this same document because it'd be really nice if we could communicate.

And so WebRTC is the ideal vehicle for this.

Now we're all familiar with it because of video, but does anybody know about the WebRTC data channel?

Yes.

Some people do.

Excellent.

So the beauty of WebRTC is there's a video channel, an audio channel, and you can have a data channel.

It's just a bunch of pipes like the internet, right?

And you can actually set up a lot of data channels.

So if I connect to you over WebRTC, I could just send you something and what I can send is obviously a Word doc.

So I can load it up off my laptop and go, here you go.

You can look at it and we can work on it together.

Anyway, I can beam it directly across using a data channel, when I've done my WebRTC code properly, so blame that on me.
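
A sketch of the sending side might split the document into chunks and push them through the channel one at a time. Everything named here is an assumption for illustration: `ChannelLike` stands in for an open `RTCDataChannel` (whose real `send` method also accepts binary data), the 16 KiB chunk size is a conservative guess rather than a limit taken from the talk, and signalling, connection setup and flow control are left out entirely.

```typescript
// Stand-in for an open RTCDataChannel; only send() is needed here.
interface ChannelLike { send(data: Uint8Array): void }

const CHUNK_BYTES = 16 * 1024; // assumed conservative message size

// Send the document as a sequence of chunks and report how many were sent.
function sendDocument(doc: Uint8Array, channel: ChannelLike, chunkBytes = CHUNK_BYTES): number {
  let sent = 0;
  for (let offset = 0; offset < doc.byteLength; offset += chunkBytes) {
    // slice() copies the bytes, so each message is independent of the source buffer.
    channel.send(doc.slice(offset, offset + chunkBytes));
    sent++;
  }
  return sent;
}

// Receiving side: stitch the chunks back together in arrival order.
function reassemble(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const doc = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    doc.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return doc;
}
```

This relies on the channel delivering messages in order, which is the default behaviour for a data channel. A production version would also need to tell the receiver how many chunks to expect and watch the channel's buffered amount before sending more.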

The other thing you can do with sharing is Web Share.

So when you actually build a progressive web app, you can actually play with the manifest and set it up so that you can share the document easily.

Browsers these days will generate a QR code.

So it's completely viable to set up your doc in a room, send the QR code, the person you're trying to talk to scans it, and then you start working on it together.

Okay.

So, moving forward, the direction for this kind of project, or approach to building applications, is thinking about how you can enhance the user experience.

One of the problems is dealing with documents that are maybe a thousand pages long. A Word doc can be half a gigabyte.

They're limited to, I think, 32 meg of actual text, but they can have lots of images, et cetera.

So with that original architecture, if you take that centerpiece, which has the WebAssembly blob and all the logic, you can just go and shove it into a Web Worker.

So that way, when you're doing your thousand page scrolling of a giant document, the UI is still responsive because the heavy work is sitting in a Web Worker.

Now a nice way to actually offload that code into the WebWorker is using this thing called Comlink.

So Surma's this really great engineer who's over in Europe, and he wrote this thing, Comlink. It basically lets you define a class and a bunch of methods on the class, and it takes care of all the asynchronous communication with the thing that you've shoved in the worker.
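
To show the kind of plumbing a library like Comlink hides, here is a hand-rolled sketch of a request and reply protocol. This is not Comlink's API or implementation. The names `CallMessage`, `ReplyMessage`, `handleCall` and `createCaller` are invented for the example, and the transport is passed in as a plain function so the logic can run without a real worker.

```typescript
interface CallMessage { id: number; method: string; args: unknown[] }
interface ReplyMessage { id: number; result: unknown }
type Methods = Record<string, (...args: any[]) => unknown>;

// Worker side: look up the requested method on the exposed object and run it.
function handleCall(target: Methods, message: CallMessage): ReplyMessage {
  const method = target[message.method];
  if (typeof method !== "function") throw new Error("No such method: " + message.method);
  return { id: message.id, result: method(...message.args) };
}

// Main-thread side: turn each call into a message and a pending promise.
function createCaller(post: (message: CallMessage) => void) {
  let nextId = 1;
  const pending = new Map<number, (result: unknown) => void>();
  return {
    call(method: string, ...args: unknown[]): Promise<unknown> {
      const id = nextId++;
      return new Promise<unknown>((resolve) => {
        pending.set(id, resolve);
        post({ id, method, args });
      });
    },
    // Feed replies from the worker back in; the matching promise resolves.
    receive(reply: ReplyMessage): void {
      const resolve = pending.get(reply.id);
      if (resolve === undefined) return;
      pending.delete(reply.id);
      resolve(reply.result);
    },
  };
}
```

With a real worker, `post` would be the worker's `postMessage`, the worker's message handler would call `handleCall` and post the reply back, and the main thread's message handler would pass that reply to `receive`. A library such as Comlink wraps this pattern so that calling into the worker reads like an ordinary asynchronous method call.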

And of course, once you've gone and put it into the worker, then you have to work out how you're going to paint.

Because the model we had was running WebAssembly, painting onto a canvas, and then displaying it to the user.

To do that, Web Workers support this thing called OffscreenCanvas.

OffscreenCanvas lets you do all your painting into something that's not visible.

And then you basically flip the canvas onto the main thread and it shows to the users.
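
One way to wire this up, sketched under assumptions, is to hand control of a visible canvas over to the worker. `CanvasLike`, `WorkerLike` and `moveRenderingToWorker` are hypothetical names for the example. The two calls they mirror, `transferControlToOffscreen()` on a canvas element and `postMessage` with a transfer list on a worker, are real browser APIs.

```typescript
// Structural stand-ins for HTMLCanvasElement and Worker, reduced to what is used.
interface CanvasLike<T> { transferControlToOffscreen(): T }
interface WorkerLike<T> { postMessage(message: { canvas: T }, transfer: T[]): void }

// Hand the canvas's drawing surface to the worker so painting happens off the main thread.
function moveRenderingToWorker<T>(canvas: CanvasLike<T>, worker: WorkerLike<T>): void {
  const offscreen = canvas.transferControlToOffscreen();
  // Listing the canvas in the transfer list moves it to the worker instead of copying it.
  worker.postMessage({ canvas: offscreen }, [offscreen]);
}
```

Inside the worker, the received object can be given a drawing context and painted on as usual, and what is drawn there shows up on the visible canvas. An alternative, closer to the "flip" described above, is to paint into a standalone OffscreenCanvas and pass finished frames to the main thread as image bitmaps.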

This is really powerful and it's hardware accelerated these days as well.

So I want to have a little segue into the concept of collaboration.

Now, the whole idea of this document model thing that I was trying to build is letting multiple people work together on a document, whether it's two people or ten people or a hundred people, so we have that collaboration.

People come and say, Oh, we've already got that collaboration, but they haven't considered one problem.

And this is what's called the fishbowl effect.

So there's a lot of user research going around that people are freaked out by the kind of Google Docs-y experience.

Where you can see everybody's cursor.

So you know, you're editing the document together and you can see how bad my typing is.

And you can see, oh, I'm backspacing.

Oh, you can't even spell. And people do get freaked out; this is a real thing.

It's an anxiety thing.

So the technology that underpins that collaboration is a thing called Operational Transforms or OT.

And cutting edge computer science research is now moving away from OT as a model.

And so OT is dead.

And so I bring you a thing called the conflict-free replicated data type, the CRDT.

You can think of it like this: people used to use revision control systems like Subversion or CVS. It's a single threaded model: you do check-ins and check-outs, and there's a repository over there that is the single source of truth.

And that's what OT is.

Everybody's doing the one sequential edit that is interleaved or whatever, but it's basically one line.

Whereas we moved to Git, and so the Git model is that we all get our own copy, we have our own repository, we can check in locally, we don't have to go to the main repo, I can check in on my laptop and do changes and check in.

And this is what CRDTs enable.

You can basically take the document, then cut off the WebRTC connection or the network connection and go off and work on your own.

Imagine journalists who want to put out a front page tomorrow.

Here's the master document, that's your column, that's your column, that's your column, that's your column, go away, do your work.

And then they all come back the next day and connect up and the entire document syncs without any conflicts.
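
The newspaper scenario can be sketched with one of the simplest CRDTs, a last-writer-wins map keyed by section. This is an illustration only: the talk does not say which CRDT the app uses, a real text editor would need a sequence CRDT that tracks individual characters, and the names `Entry`, `Doc`, `wins` and `merge` are invented for the example.

```typescript
// One edit to one section of the document.
interface Entry { text: string; timestamp: number; replica: string }
type Doc = Map<string, Entry>; // section name -> latest known edit

// True when `a` should win over `b`: newer timestamp, with replica id as tiebreak.
function wins(a: Entry, b: Entry): boolean {
  return a.timestamp !== b.timestamp ? a.timestamp > b.timestamp : a.replica > b.replica;
}

// Merge two copies of the document. The result does not depend on the order
// of the arguments, and merging a copy with itself changes nothing.
function merge(left: Doc, right: Doc): Doc {
  const merged: Doc = new Map(left);
  right.forEach((entry, section) => {
    const existing = merged.get(section);
    if (existing === undefined || wins(entry, existing)) merged.set(section, entry);
  });
  return merged;
}
```

Each journalist edits a local copy while offline. When they reconnect, every pair of copies is merged, and because the merge gives the same answer in any order, everyone ends up with the same document without a central server deciding who goes first.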

And this is the future of collaborative apps, I believe, anyway.

I guess you're probably asking yourself, why would you go and put yourself through all this kind of pain to build this?

This is a lot of work to try and unpack a stupid Word document in a fully client side thing.

The answer is simple.

It's a standard file format that is going to be around forever.

It predates the web.

There are so many millions of people in the world that use it.

If my application dies and gets thrown in the bin, people can still use their data.

Their data is not lost in some cloud service or some walled garden.

Because all the compute is client side as well, like for people that are running these businesses, running cloud services, it's super cheap to scale.

Because all you're doing is downloading this blob. The WebAssembly blob that's running that demo, that does all of that, is about 660k, so it's very small.

It's less than one typical image.

And every device on the planet can run it.

You can run this on your phone, Canvas is hardware accelerated everywhere, and you don't have to install it if you don't want to.

If you really make it an installable PWA, people will install it.

But if there's a simple scenario where I just want to interact one time with this person, here you go, beam, doc, let's work together.

I probably have to make another point.

I'm trying to convince all of you to work with these APIs, and you're probably thinking, this is all cutting edge stuff, nobody has this in their browser, surely this is bleeding edge. Nothing that I've shown you today is even remotely bleeding edge.

WebAssembly's been around seven years now.

It's mature as anything.

WebRTC, which we use for all our video conferencing and can now use for data channels, is 12 years old, and Canvas is even older.

So this is all easy stuff to build.

My message here today, really, is for you application developers: when you're building things, think about the web not as a thing that's distinct from a native application or a native operating system. It's not a question of, should I build on iOS, should I build on Android, should I build for the web?

The biggest platform in the world, the widest platform, is the browser, so I encourage you to use it.

That's all I've got.

Thank you.

The Soul Of A New Application

Alex Danilo - Projectadoc
Abstract smoke patterns over a dark background with a stylized flame-like logo in the center.

Forget the Web

Just an Application

Background image of a spider's web

The Browser Is the Platform

A subway train stationary at the platform with open doors and passengers inside. Several people are walking on the platform, which has green exit signs and an electronic information board.

Local First Software

Skip The Cloud

An indoor marketplace with various stalls displaying fresh produce and signs
A collection of various stylized illustrations of eyes with different expressions and makeup. There is a large central pair of eyes with prominent eyelashes and colorful eye shadow, surrounded by smaller sets of eyes in different sizes and orientations.

Ads

Privacy

GDPR

A man looking surprised while looking through binoculars with the words 'Ads', 'Privacy', and 'GDPR' overlaid in large blue text.

Web Platform App

An outdoor scene featuring a large glass pyramid structure with water surrounding its base, in front of a classical museum facade with a partly cloudy blue sky in the background. There are several people around the pyramid, indicating a public space.

A Decade old Vision

An abandoned car with faded and peeling paint sits in a dry grassy field with a backdrop of hills under a clear sky.

Handwritten notes with diagrams showing connections between iPad app, srv, firewall, and PC app server.

Browser Features to Realise The Vision

  • Web Assembly (WASM)
  • Web Real Time Comms (WebRTC)
  • File System Access API
  • And lots more...
Background image of a closeup of a human eye with a reflection of a wind turbine on the eye surface suggesting vision or foresight related to technology.

Reproducible Output

#1 Pain Point - layout
A stressed individual with head in hand sitting at a desk with an open notebook and a laptop, suggesting frustration or a challenge with work.

WASM + <canvas> to the rescue

Abstract blue and green textured background

Happy 40th Birthday Word!

A festive birthday cake with "Happy 40th Birthday Word!" written on it in blue icing, surrounded by colorful confetti, with candles and forks on a purple background.

Anatomy of a Word doc

(it's ZIP and lots of XML trees)

Slide showing a diagram of the internal structure of a Word document. It depicts a large ZIP file icon on the left, arrows pointing from it to a series of branching tree structures labeled 'document.xml' at the top and 'styles.xml' at the bottom, simulating XML trees. To the right, there are file icons for different image formats: JPEG, PNG, GIF, and WMF.

Isn't that just a lot of DOMs that live together?

An image showing a close-up of a champagne bottle with a label that is partially visible and glowing bokeh lights in the background.

BYOD

A person wearing a cap is shown from behind, carrying a red bicycle up a set of stairs, with the acronym BYOD in bold text overlayed on the image.

Example Architecture

Diagram depicting a web architecture with three components: a gray outline of a "JS User Interface" representing a web browser window on the left, a central black block labeled "BYOD + logic" above "WASM blob," and a cyan rectangle labeled "<canvas>" on the right. Arrows connect the components to indicate interaction.

Logo of Pagez consisting of a stylized red tulip-like shape with a flame-shaped cutout at the top, accompanied by the word 'Pagez' in black, sans-serif font.

Lots of image formats

  • JPEG
  • PNG
  • GIF
  • WMF
  • VML
  • DrawingML
Open cans of paint with various colors as a visual representation of image formats.

Leverage The Browser

Illustration showing the components of a web application leveraging the browser: a web browser window to the left labeled 'JS User Interface', a central black box labeled 'BYOD + logic' with incoming arrows labeled 'WASM blob' and outgoing arrows 'drawImage()' and 'JS renderer' to a right blue rectangle labeled '<canvas>'. Arrows show the flow of data between components.
A Word-like UI in the browser.

Sharing

Hello WebRTC

Two young children sharing a toy on a bed, with one child handing it to the other.

WebRTC Data Channel

A background of complex industrial piping with a dark hue.

WebShare

Two silhouetted individuals on opposite sides of a frame seem to exchange a glowing, transparent block with intricate details, symbolizing sharing or transferring data or assets.

Enhanced Features for better UX

A person reclined in surgery with a syringe being inserted into their forehead.

DOM in worker

(kind of, using BYOD)

Comlink

(thanks to Surma)

Offscreen <canvas>

A classical sculpture depicting multiple figures with dramatic expressions and gestures, overlayed with semi-transparent presentation text.

Collaboration

A group of five diverse individuals engaged in a meeting around a table with laptops and mugs in front of them. One is gesturing actively during a discussion, while another is standing by a cork board that has numerous sticky notes attached to it.

The Fish Bowl Effect

Two people side by side, one with a clear sphere enclosing their head, resembling a fishbowl, reflecting light on one side.

'OT' is dead, Long Live the 'CRDT'

"Never show half finished work"

- R. Buckminster Fuller
Construction site at dusk with an unfinished building structure

Why this kind of WPA

  • Offline first, works always - no cloud outages
  • Leverages a standard file format so content lasts
  • Full control of privacy, your data is yours
  • Compute is all client side so scaling is easy
  • Every device on the planet can run it
  • No install required
A smartphone and a book bound with chains and a padlock, suggesting data privacy or security concept.

Built with Mature APIs

  • WASM - 7 years old
  • WebRTC - 12 years old
  • <canvas> - 14 years old
A close-up background image of a textured, rusted metal surface with text overlay listing the ages of various web APIs.

The World’s Widest platform is the Web (Browser)

Use it

A view of Earth from space showing city lights and the geographical outline of continents against a dark backdrop, symbolizing global connectivity.

Thank You

alex@projectadoc.com
X: @alexanderdanilo