The Mediated Web

Introduction and Overview

Rupert Manfredi begins by setting the stage for a discussion on how the interaction with the web will drastically change in the next five to ten years. He introduces his background in AI and web interface design at companies like Adept and Mozilla's Innovation Studio.

What Makes the Web Special?

Manfredi discusses the unique aspects of the web, emphasizing user control and the ability to inspect and modify web content directly through browsers, which differentiates it from other forms of software.

The User Agent Concept

Explains the concept of the user agent in web architecture, which acts as an intermediary between the user and the network, facilitating user actions like browsing, editing, or automated tasks without direct user involvement.

User Control and Accessibility

Discusses the various ways users can manipulate their web experience for accessibility, such as changing font sizes or using text-based browsers, highlighting the intrinsic user empowerment built into the web's design.

Evolution and Challenges of the Modern Web

Reflects on how advancements in technology have made the web less open and more complex, leading to challenges in maintaining user agency against the commercial and technical structures that increasingly control web experiences.

Introduction to Mediated Interfaces

Rupert discusses the concept of mediated interfaces, illustrating how technological advancements have allowed for the development of systems that perform complex tasks on behalf of users, such as navigating a car or browsing the web more intuitively.

The Concept of the Mediated Web

Explores the idea of a mediated web where interaction is not direct but through layers of software that enhance user agency by simplifying actions and tailoring outputs to user intent.

Advancing AI's Role in Web Interaction

Details current and potential uses of AI to improve how users interact with web information, making the process more efficient by reducing the steps between intent and outcome.

Demonstrating a Mediated Web Experience

Shares a personal project, showcasing how AI can personalize web interactions based on user-specific data, suggesting restaurants and answering queries by integrating comprehensive web insights.

Future Implications for the Web

Discusses the implications of these technologies on the structure of the web, considering how they might change the economic models underlying current web systems and potentially lead to a more segmented and personalized web experience.

Emerging Trends in Web Monetization and Access

Discusses recent moves by platforms like Reddit to restrict data crawling, signaling a potential shift towards a web divided into freely accessible and premium, paywalled segments. This section also explores the implications of these changes for web user agents and the broader digital ecosystem.

Consolidation and Control of Web Content

Explores how content consolidation into major platforms could lead to a more restricted web environment, limiting smaller creators' ability to remain independent and potentially undermining net neutrality principles.

Future of Web Search and Interaction

Speculates on the transformation of search functionalities from interactive web pages to service-based interactions controlled by large corporations, emphasizing the strategic shifts in content access and the potential loss of an open web.

Potential Solutions and Web Payment Models

Proposes a web payments layer as a solution to maintain an open web, where user agents handle microtransactions to access content. This model suggests a balance between free access and compensating content creators through innovative financial mechanisms.

Concluding Thoughts on the Future Web

Summarizes the challenges and opportunities the web faces, urging the audience to anticipate changes, engage in discussions, and explore new models to preserve the web's open, user-driven nature. Calls for proactive steps to ensure the web remains a beneficial platform for future generations.

Morning.

So I want to make the case to you today that the way we interact with the web, so as an open, navigable network of documents and apps displayed through this browser window, is going to change really drastically over maybe the next five to ten years.

Before I quite get started on that journey, a little about me. John gave a bit of an intro.

So I'm a designer and programmer. I'm working at Adept, which is a company over in San Francisco that's looking at how computers can be augmented by AI and some of these emerging models, multimodal models, large language models. What I'm particularly interested in is what the interface of that looks like, how we actually interact with computers in this future.

Before that I was at Mozilla's Innovation Studio, which is the origin story of this talk.

I gave a lightning talk similar to this internally, around how AI will affect the web, along with some early experimentation with products there.

And then prior to that, Google Creative Lab.

I wanted to start today by considering this question of what makes the web special.

I'm gonna make the assumption that it is special, especially because we're at Web Directions. We're not at .NET Directions or, I don't know, whatever other sort of software conference we might be at. There's something special about the web.

And I don't mean JavaScript, although it is very unique in its own way. Let's look at something deeper about the web that's important to us.

And something that I contend is really special about the web, as separate from other forms of software, is that the user has a lot of control.

And you can see that, it's like right in the DNA.

I can go to the Web Directions website and pop open an inspector and see this application running in real time.

And I can even go in and edit.

By the way, very impeccably commented code, I've got to say there.

But yeah, I can go in and edit, I can change styles, I have control over this application as it runs.

That sets it apart from a lot of other software.

I can't do that with most things running natively on my Mac.

Many of you probably experienced this in the early days of MySpace.

This is how a lot of my friends got interested in programming and had this first taste of control over the web.

I believe this is actually a recreation of MySpace, unfortunately, because I'd love to show you my profile.

This is probably for the best by the way, I think.

But, it also highlights another risk of the web, which is that there is this separation of the server and the user.

We'll talk about that a bit more in a bit.

But yeah, this is actually part of the DNA of the web, and it's been there since the very beginning.

This is a quote from an early Tim Berners-Lee paper about the web. There's this divide between the information sources and the consumption of information, and this idea that there should be an interface between them. Neither really needs to know that much about the other, but they can communicate through this interface of protocols. You might have sources past, present, and future that are all compatible with consumers of this information that are architected completely differently, past, present, and future, which is really cool.

And the terminology around that, down at the bottom of this RFC, which is basically a foundational text of the web, is user agent.

I think that's a really cool term because it carries this extra metaphorical baggage.

I've been trying to investigate exactly where it came from, but there was this idea of software agents, even as far back as like the seventies and eighties, and it comes from that terminology, but there's this idea of a user agent.

It's this software that represents the user and takes actions on the user's behalf.

So we've got this thing out there.

The World Wide Web, this sort of network of servers and computers all interconnected.

And then we have the human on the other side, and the gateway there is this user agent.

The user agent presents a human computer interface to the user, and then interacts with the network to make requests, get information back, and present it to the user.

This is the architecture.

There are two big divides there.

There's this human computer interface, and there's the network interface, and the user agent is like the bridge between these two divides.

Which means that, yes, I can browse a website like this, or I can take control and change the font size, as a really basic example.

I can do that, because I have a web browser that allows me to do that, and will interpret the HTML and CSS that it's getting from the server in the way that I want to interpret it.

I can also do cool things like this.

This is a small clip of the Arc browser.

And they have these things called boosts where you can literally just pick out colors and manipulate the page in real time.

There's like a little graphic design suggestion.

And I can do this too.

This is the Lynx text-based web browser.

I can go to a website and interpret it with this very different-looking browser that turns the entire page into text, and you know what?

It works.

This might not work for Google Maps, but it does work for a lot of webpages.
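As a rough illustration of a user agent choosing its own presentation, here is a minimal sketch of a program that interprets HTML as plain text with numbered link references. It is not how Lynx is implemented; the class name, the `render_as_text` helper, and the sample page are all made up for this example, and the tag handling is deliberately simplistic.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Toy user agent: presents HTML as plain text (illustrative only)."""

    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "br"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []       # pieces of visible text, in document order
        self.links = []       # hrefs, numbered in order of appearance
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._open_link = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                self._open_link = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a" and self._open_link:
            # Mark the link with its reference number, e.g. "docs[1]".
            self.parts.append(f"[{len(self.links)}]")
            self._open_link = False

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def render_as_text(html):
    """Return the page as text lines followed by a numbered list of links."""
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    lines = [" ".join(line.split()) for line in "".join(renderer.parts).split("\n")]
    body = "\n".join(line for line in lines if line)
    refs = "\n".join(f"[{i}] {href}" for i, href in enumerate(renderer.links, 1))
    return body + ("\n\n" + refs if refs else "")


# Hypothetical page; the server neither knows nor cares how it gets presented.
PAGE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Hello</h1>"
    "<p>Read the <a href='https://example.org/docs'>docs</a> today.</p>"
    "</body></html>"
)

print(render_as_text(PAGE))
```

The server sends the same bytes either way; the decision to drop the stylesheet and show links as footnotes is made entirely on the user's side of the network interface.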

And super importantly, and again, this is another founding principle of the internet, is that this allows any website to be accessible.

Again, as a user, I have the agency on how I want to present it, so I can have it presented as Braille, or I can have a screen reader read it out to me.

Another interesting property of this user agent, and you can take a look if you read this, is that these are often browsers and editors, but they could also be spiders, web traversing robots, or other end user tools.

We don't have to have the human there all the time.

Because, again, everything on this side of the network interface doesn't really need to know what's going on.

So that allows us to have things like search.

You can have web crawlers that go and interact with websites without a user, without the intent of a user seeing it, but actually like crawling it for information to cache and then put into a search network.
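A user agent with no human behind it can be sketched in a few lines. The following is a toy crawler, assuming a hypothetical in-memory `FAKE_WEB` dictionary in place of real network requests and using crude regular expressions instead of a real HTML parser; it follows links breadth-first and builds a small word-to-page index of the kind a search engine might cache.

```python
import re
from collections import defaultdict, deque

# Hypothetical stand-in for the network: URL -> HTML. No real requests are made.
FAKE_WEB = {
    "https://a.example/": '<p>Swamp photos</p><a href="https://b.example/">next</a>',
    "https://b.example/": '<p>Idyllic swamp walks</p><a href="https://a.example/">back</a>',
}


def fetch(url):
    """Pretend to fetch a page; unknown URLs come back empty."""
    return FAKE_WEB.get(url, "")


def crawl(start_url, max_pages=10):
    """Breadth-first crawl that returns an index of word -> set of URLs."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        # Crude tag stripping, good enough for this sketch only.
        text = re.sub(r"<[^>]+>", " ", html)
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(url)
        # Queue every link we have not seen before.
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("https://a.example/")
print(sorted(index["swamp"]))
```

Nothing on the server side has to change for this to work, which is the point being made above: the sources do not need to know whether a person or a program is consuming them.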

And you can also have things like Google Reader, which crawled for RSS feeds. RIP.

And this is a really cool quote from Richard Stallman, one of the founders of the open source, free software movement, on a public message board. To browse the web, he has this technique where he will email a program running on a computer, which will then fetch the page automatically and email it back to him.

And you can just do that, because no one's gonna stop you.

Again, the user's in control of how they interact with the internet.

So it's this fundamental and really interesting property.

And my question is, do you feel empowered by this?

If you're like me, the answer is probably: not particularly. Because yes, I can make a little slider and change some colors on a website, and that's great, but in general, at the end of the day, every browser interface looks like a window.

And right now that window is overlooking a swamp, which is the present internet in many ways.

This is the only AI generated image I have in this slide show, mainly because when I Googled pictures of swamps, they were all very idyllic.

So that didn't quite support my point.

So I had to generate a very bad one.

This is a picture of the very first web browser. It's called WorldWideWeb, running on an old NeXT system. Pages in this early web were very much simple documents with text and links, very parsable, very easy to use.

And of course the modern web is much more complicated.

Actually, the modern web looks a bit more like this.

This is me doing some research for this talk.

I've already bought socks.

Like I, I literally just already bought socks.

I don't need socks and yet here we are.

So yeah, the modern web is not the idyllic swamp.

It is turning rapidly into the AI generated scary swamp.

And Maggie Appleton is going to do a great talk later today that will dive into this a bit further.

So what happened here?

I'd say that this doesn't give a lot of agency to the user.

What's happening to make the internet a place of less user agency?

One thing that happened is technology advanced.

So the web started hosting applications.

Things became more complicated, and less inspectable.

If you're running, say, Google Maps or Figma in a browser, you can't just read the source and change things in the application.

A lot of the source code is hidden.

And as a result of this, browsers also needed to ramp up their efforts in being more consistent and displaying this consistent view of the web, because that was going to be important for compatibility across all these websites doing very cutting edge stuff.

So deviating from this norm, of the browser as a window onto an exact replication of what the author intended, became impossible, or at least unfeasible.

In the Web 2.0 era, there was also this huge shift away from a RESTful architecture, where state wasn't really stored anywhere, or if anywhere was stored in the browser, on the client, towards state being stored with the server. These servers grew and encompassed user information and started collecting data, and you get things like Cambridge Analytica. Again, this is a shift of power and agency away from the user and towards the servers, the creators, and the platforms.

The other thing that happened was the web got a business model. That's a big thing that has both resulted in a lot of flourishing on the web and is turning into a lot of negative effects as well: all of these ads and tracking, and people trying to manipulate search results, and things like that.

And also, we started using the web for everything. We're using it for work, we're completing a lot of tasks on it. We're not just browsing some research documents like in the early days. A great chunk of our lives and livelihoods is being put onto the web and interacted with there.

And yet, we still have this like window into the web, but that window is preserving less and less user agency and acting less and less like a user agent.

What if we were to rethink a little bit how this worked?

I'm going to pause, and we're going to start to get closer to this definition of what I mean when I'm talking about the mediated web.

But first I need to talk about a mediated interface.

I put it up here and made it look all official.

I just want to let you know I made this up.

I found it a really useful term, so use it if you'd like.

And that is, it's a human computer interface that uses an underlying system to operate another human computer interface on the user's behalf.

Sounds a bit complicated, but actually you've all probably used many of these before.

One example: here I am on a normal day driving my car to work, and I have this problem where on the highway I have to really concentrate on manipulating my accelerator pedal in just the right way to stay at 110, so that I don't get booked or go too slowly.

But we actually have a solution to this problem of manipulating this small human computer interface, or human machine interface rather, and Ford will gladly mansplain it to you with this ad: the idea of a cruise control system. Instead of manipulating the accelerator pedal, I manipulate this dial, or lever, and I dial in the speed that I want to go, and then that system manipulates the accelerator pedal on my behalf to keep to my high level goal.

So this is, I would term, a mediated interface.

And what mediated interfaces allow you to do is to climb the ladder of abstraction.

So instead of operating the accelerator pedal, I'm operating the cruise control lever.

And instead of my intent being move a little bit faster or move a little bit slower, my intent is just go 110 kilometers an hour.

Of course my actual goal isn't to go 110 kilometers an hour.

At least, not usually.

My goal is to get to my destination safely and in a timely way.

And so if we climb even higher, way higher on the ladder of abstraction, I might just want to set this goal of go to my destination.

So I had the chance to ride in a Waymo on my last trip to San Francisco, which was pretty exciting and strange.

Actually, the strangest thing was that it really didn't feel that weird.

Like it just felt normal, which is very bizarre to me.

But something I want to call out with this is that the interface is actually not even in the car.

So you have this mediated interface where you set the location you want to go to on an app beforehand, and the car arrives and you get in.

And then the car interfaces with what I would call the human interface of the car, which is steering, accelerator, brake.

Also signals from the outside world: signs, people. This is all part of the human machine interface of a car, and it's all taken care of, based on my intent.

I have to mention that I think Cruise was just banned from San Francisco, so how well this is going remains to be seen.

But as you climb this ladder of abstraction, the computing necessary to do it gets more and more complex, as you might imagine. The accelerator pedal is like a piece of metal connected to the oxygen hole. That's how it works.

And as we get higher up we get the cruise control lever, which might just be some simple bit of code, like almost you could imagine an Arduino doing this with a sensor input.
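To show how small that "simple bit of code" could be, here is a minimal sketch of a proportional controller: the driver states a target speed, and the controller operates the pedal. The gain and the toy car physics in `simulate` are assumptions invented for this illustration, not anything from a real vehicle.

```python
def cruise_control_step(target_kmh, measured_kmh, gain=0.05):
    """Map the speed error to a pedal position between 0.0 and 1.0."""
    error = target_kmh - measured_kmh
    return min(1.0, max(0.0, gain * error))


def simulate(target_kmh, start_kmh, steps=200, dt=0.1):
    """Run the controller against a made-up car model and return the final speed."""
    speed = start_kmh
    for _ in range(steps):
        pedal = cruise_control_step(target_kmh, speed)
        # Toy physics (an assumption): the pedal adds speed, and drag
        # removes some in proportion to the current speed.
        speed += (pedal * 40.0 - 0.1 * speed) * dt
    return speed


print(round(simulate(target_kmh=110, start_kmh=90), 1))
```

A proportional-only controller like this settles slightly below the target, because it needs a remaining error to keep the pedal pressed; practical controllers usually add an integral term to remove that offset. Either way, the user's interface is just the target number, and the pedal is handled on their behalf.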

And higher still, we have to compute all of this information about the human world, which is really vague. Sometimes a stop sign looks red; sometimes under a light it might look different.

People are hard to tell from backgrounds. There's a lot of complicated stuff to solve, and that means you need to go higher and higher in the technology level, to these large neural networks.

I'm assuming there are multiple of them and other systems as well.

So that is a mediated interface.

And I should also mention that in the ideal world, we'd just change the roads.

It'd be almost like a slot car thing, with magnetic strips, and all traffic lights would have APIs. But no, that's never going to happen.

So we have to build these complicated systems to navigate the human world that we've built, in order for this to be possible.

So now we come to this idea of the mediated web, and again, made it up, but if it's useful, take it and run with it.

Which is this potential future of the web, whereby it's accessed not directly by users, but rather via some sort of mediation with software.

And you could argue that a web browser is that, to an extent, but I'm talking about that mediation being more and more in the middle. While the browser will negotiate protocols, what happens if we give more agency, more of a role in mediation, to the browser?

And this ladder of abstraction analogy works here too.

Right now, I'm interacting with things on the web, trying to get stuff done, and I'm doing simple things in imperative workflows, like clicking the accept or reject cookies banner just to start doing stuff on the web.

What comes at the top of this ladder of abstraction?

What's closer to my actual goal of like why I'm interacting with the web in the first place?

So this might be the way we interact with the web now, again, like very imperative steps.

And this might be a mediated web where you get closer to this intent.

The outcome is close to the intent. If I want to do something, there are fewer small steps for me to actually make that happen.

And likewise, the output can be represented more in what I intend to see.

So we can both have actions being undertaken on the web and also have the web being interpreted in a way that closely matches what I'm trying to get out of it.

This is a very like basic sketch for what that might look like.

You have a user interface, which is interacting with some sort of system, and that system is doing the talking to the servers and then interpreting the web content it gets back from various sources.

As with mediated interfaces, I think we already have a form of this existing right now. At one level, that might be the bit at the top of a search engine that tries to give you an answer.

That is the search engine browsing the web, basically, on your behalf, and extracting an answer from a page for you.

It's getting close to the intent.

I can ask a question and get the answer directly.

The more cool, hip way to do this is to have some sort of AI search engine thing.

Perplexity is an example of that.

It's a great product, and that will allow me to, again, ask a question, it will go and browse the web on my behalf.

I don't think that's where this stops.

I don't think we just go, cool, we've made search engines better and they tell us the answer.

I think there's a lot more that can be done here and a lot of exciting stuff as well.

So I'm just gonna wave some hands and give some hints of that, based on some stuff that I've worked on and things that I'm seeing emerge.

This is a small video of a project I'm working on called Familiar, and I've been using it myself.

This is at first, a very basic text interface, but let me just play a little bit and you can take a look.

So we've got this metaphor of a magic piece of paper.

This is a very janky demo.

But one of the cool things about this is that it knows who Isabella is. Isabella's my wife. It knows she's Italian, and it makes a custom suggestion for me based on some stuff that it's reading on the web.

So as this conversation plays out, the system is both reading up on research, interpreting webpages for me, doing particular searches and asking particular questions of the web.

And then it's also integrating its personal knowledge of me, which is much more than any like search engine would do.

It suggests a restaurant to me, which is actually just a very short walk from me and aligns with some of my preferences. I've actually been there before.

And it tries to answer questions by reading the restaurant's website.

So this is an example of a next stage of a mediated web experience. As I mentioned, what's going on behind the scenes here is that this interface is collecting information about me locally, on the client, and starting to understand and put these sort of cards together based on what it thinks my preferences are, and my relationships and things I like, upcoming events, stuff like that.

It will then combine that with an API I'm working on which basically destructures the web and tries to pull information out of the web.

And that combination of a destructured web plus what the client knows about me is then put into an answer with a language model.
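One way to picture that combination is as prompt assembly: local facts about the user and snippets extracted from web pages are merged into a single piece of context for a language model. The sketch below is purely hypothetical. The `Card` and `Snippet` structures, the `build_prompt` function, and the sample data are invented for illustration and do not describe how Familiar or its API actually work; the model call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """A locally stored fact about the user (hypothetical structure)."""
    topic: str
    fact: str


@dataclass
class Snippet:
    """A piece of information pulled out of a web page (hypothetical structure)."""
    source_url: str
    text: str


def build_prompt(question, cards, snippets):
    """Combine personal context and web context into one prompt string."""
    lines = ["Answer the question using the context below.", "", "About the user:"]
    lines += [f"- {card.topic}: {card.fact}" for card in cards]
    lines += ["", "From the web:"]
    lines += [f"- {s.text} (source: {s.source_url})" for s in snippets]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Made-up sample data for the sketch.
cards = [Card("relationships", "Isabella is the user's wife and is Italian")]
snippets = [Snippet("https://restaurant.example/menu", "Open for dinner until 10pm")]

print(build_prompt("Where should we go for dinner tonight?", cards, snippets))
```

The design point is that the personal cards never have to leave the client until they are needed for a specific question, while the web snippets come from whatever extraction layer sits between the agent and the sites.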

And it's really handy, like I've been using it, I've been using it when I travel, and it's been recommending me places.

I had a great dinner once because of it.

Obviously there are different use cases, but it's very much a demo at this stage.

I want to put this here because it reminds me a little bit of this vision of the semantic web.

This was an idea proposed by Tim Berners-Lee, for authors of websites to put in this structural information.

And I think a reason that we don't have it yet today is that it relies on people actually putting this in manually on their sites and making relationships to make the web more computer understandable.

But with some of these models, and I'll talk a bit more about some of the models, it's possible for the web to be understandable by machines, because they can extract these insights and destructure and structure information from the unstructured web.

This is an example, released by Adept recently, of a model that can do this sort of stuff.

So it's a multi modal architecture for AI agents.

Basically, it can take both text and images from a website as input.

And it's particularly trained on screenshots from around the web. It will understand, if I'm looking to do something on a particular website, where I might want to click, and draw bounding boxes around that. It will also be able to answer questions based on spatial reasoning about what we actually see on the web when we're browsing it.

So in this case, we've got a question about relative location of two objects and it can understand what's north and what's south.

So we're building the technology to make the web more legible to computers and building the ability for computers to interact more fluidly with the web with more reliability.

One area that this can really help in is when we're trying to do tasks on the web. As I mentioned, more and more of our work is happening on the web and more and more tasks are being undertaken on the web, and even a simple task like finding accommodation somewhere for a conference can turn into a lot of tiny imperative steps that we have to do.

At work I often use the example of finding a daycare as well, because I've been looking for a daycare for my daughter.

There are so many factors, and so many different websites that you have to find information on, that, similar to accommodation, it can be a really tricky task.

So this example is a sketch, some stuff that we've been working on to guide our thinking internally.

This isn't product per se, but it's just some thinking we've been doing, which I'm allowed to share with you.

If we take a website like Airbnb, I can look at it on the Airbnb website.

But maybe I want to have a custom view into this that is tailored specifically for what I want to get out of it.

So you could imagine that I might be able to set some of these columns up and say, yeah, I actually want to know how long it will take to walk to this place that I want to visit.

And then for every place that I'm looking at, it can extract that information, make some calculations, even pull in information from third parties, and present an interface that I want to view it in.

Not the one that it's been created for.

And I could also have a few actions here, so I could automate very quick sequences of actions, like I just want to reserve this, or I want to send a particular message that I always send to places to ask about them.

I can also search in the reviews for a particular phrase I'm interested in, like what the Wi-Fi is like.

And that's great and maybe I can browse through a few things like this, but maybe on a higher level I want to zoom out a little bit more and compare a bunch of places.

On the web now, that involves having lots of tabs and doing a lot of manual comparison or, if you're really into the task, copying and pasting things into a spreadsheet.

I know there are some of you out there.

So what if we could just do that for you?

And have an interface that pulls out the information I want from a list of items.

I could get some search results, then pipe them directly into this sort of living table, where I can basically type in at the top, oh, this is the information I want to get, and have that slowly fill in over time.

And then just create actions in bulk right there on the page.
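The table idea can be sketched as user-defined columns, each a small function that is applied to every listing. In the scenario described in the talk, the underlying facts would be extracted from web pages by a model; here the `Listing` records are hand-written placeholder data, and the names `walk_minutes`, `mentions`, `build_table`, and the assumed walking speed are all invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Placeholder for data a mediating system might extract from a listing page."""
    title: str
    price_per_night: float
    distance_km: float  # distance to the place the user wants to visit
    reviews: list = field(default_factory=list)


WALKING_SPEED_KMH = 5.0  # assumed average walking speed


def walk_minutes(listing):
    """Column: estimated walking time to the user's destination."""
    return round(listing.distance_km / WALKING_SPEED_KMH * 60)


def mentions(phrase):
    """Column factory: how many reviews mention the given phrase."""
    def column(listing):
        return sum(phrase.lower() in review.lower() for review in listing.reviews)
    return column


def build_table(listings, columns):
    """Apply every user-chosen column to every listing."""
    return [{name: fn(listing) for name, fn in columns.items()} for listing in listings]


listings = [
    Listing("Harbour studio", 180.0, 1.5, ["Great wifi", "Quiet street"]),
    Listing("City loft", 150.0, 0.5, ["Wifi was slow", "Fast WiFi in the lobby"]),
]

# The user decides what the columns are, not the site.
columns = {
    "title": lambda listing: listing.title,
    "walk (min)": walk_minutes,
    "wifi mentions": mentions("wifi"),
}

for row in build_table(listings, columns):
    print(row)
```

Adding a column is just adding an entry to `columns`, which is what makes the view the user's own rather than the one the site was designed around.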

So this again is like a sketch, some early thinking, but an example of how, like day to day tasks could be much easier.

And you can get way quicker from intent to the outcome you want.

And I think this points towards a future of malleable computing on the web.

This is a term that's been around for a while, but it's been floating around more and more recently.

And I think this is an exciting future for users because it points to a browser that isn't necessarily a window into the web any more, but rather many different windows into the web, or a new way of interpreting the web based on what you're trying to get out of it: way more user agency, user control. And there are already some projects looking in this direction, of a canvas based UI for exploring the web.

And pulling in different insights like that.

And of course, that then allows humans to have more agency in their computing.

So instead of relying on a company somewhere across the world that knows nothing about you and your use case, you can take control over the computing that you want.

I think this is why many people still use spreadsheets: because we can actually have control of how the computing works and how we lay things out. And wouldn't it be nice if more computing was like that, and more interactions with the web?

So that's great, but you might be wondering, okay, there are some issues here. People in the tech industry don't like to admit this much, but things do have consequences, especially in an ecosystem like the web.

There will be consequences for this sort of approach, even if it's well intentioned and can have many good effects.

Smash cut back to my favorite website here.

Now I'm not commenting on the merit of ads.

It is a fact, however, that the ad driven economy on the web is also a huge factor in why the web is still a very open place.

On the one hand, there are many creators that use ads in a very respectful way, as a means to support their work in a way that isn't user hostile.

And we do want people to be able to do this, and continue to do this, on the web.

And there are also many platforms that exist and can only exist because they depend on advertising for revenue.

Again, this will depend on what you think about platforms. One might argue that ideally the entire web is the platform. But right now this is how a lot of the web works, and a lot of communities form around these platforms that depend on advertising revenue, and on what they get from attention, to remain open.

So if, instead of looking at the web in the way that the creators of the websites intended, we can just take the bits of the web that we care about and leave the rest, what does that mean for the ad supported web?

There are two scenarios here.

One is that this doesn't really become a primary way of interfacing with the web.

I think it probably will, but I might be wrong.

And in this scenario, maybe you've got like more of these sort of search engines that are using web content and giving you an answer.

And maybe that affects things a little bit.

But on balance, the web can keep ticking over as it has and it's not a huge deal.

The other scenario is that it does take off and that, as my hunch is suggesting, we actually start to interact with the web very commonly through different interfaces.

Not that we won't look at websites at all.

I think we always will because it'll be important to see how people present themselves and see a full picture of a business or of someone's site.

But a lot of the use of the web might happen through other interfaces and things that start to pull out stuff from existing sites.

So we can look at the recent history of the web to get some clues as to what might happen.

This is entering conjecture, but this is one picture of what might happen in the future as far as I see it.

And if we look at the business model of sites in the past that depended on advertising, at a certain level a lot of them have started to close down. You're seeing sites like, I mean, I'm picking on the New York Times here because of the irony of this picture, but many sites do this: they start to lock down and depend on subscriptions as the real source of their revenue.

It's understandable, but it is a knock against the open web that we love, because ideally we want this information to be navigated freely.

Importantly most websites that are doing this still allow search engines to come and index the pages because they want to drive traffic.

But that might start to change too.

This is already happening.

This is relatively recent, from a week ago.

Reddit is looking into blocking crawling altogether, because a lot of language models are using the very nicely formatted question-and-answer data that is Reddit to form datasets.

And again, understandably, they think they should get a cut of that.

But again, this could have huge consequences for the web, especially if any sort of agent that is trying to engage with web content gets blocked or needs to sign in.
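One mechanism sites already use to ask crawlers to stay away is robots.txt. It is a request rather than enforcement, but it shows how a site can admit one client and turn away the rest. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt contents, the agent names and the URL are all made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named search crawler is allowed,
# every other client (including a user's own agent) is asked to stay out.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object we can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)
url = "https://forum.example/r/askanything/some-thread"  # hypothetical URL

search_allowed = policy.can_fetch("ExampleSearchBot", url)
agent_allowed = policy.can_fetch("ExampleUserAgent", url)
print(search_allowed, agent_allowed)
```

Under this policy the search crawler keeps its access while an agent acting for a user does not, which is the kind of split described above.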

So how might that play out?

We might see a bifurcation, or I guess a further bifurcation of the web into a free web and a paid web, an open web and a closed web.

Again, this is already in effect, but it could get much more pronounced.

And the paid web would block user agent access, unless some conditions were met.

Here's a recipe for a more locked-down web. One ingredient would be restricting open web access more, that is, having more things protected by paywalls, and more dedicated apps where large publishers can fully control the experience, advertise, and control access.

Another piece of this puzzle would be consolidating content into platforms.

This would especially be the case for smaller content creators that wouldn't have the market power to convince people to get an account with them or buy their app.

And this is a picture of Apple News.

It could be like any app that becomes a platform for content.

I think this is already turning into a bit of a mini web, but it's not the web.

And the way a lot of this content comes on to a platform like Apple News is through deals with privileged clients for access.

Apple News would be a privileged client that can get access to various publishers based on deals.

And I think this is the worst-case scenario, where we get to a future of the web where net neutrality is out the window, and we see something that mirrors what streaming is like right now.

But it's the web.

This is terrifying.

This is not a good thing.

Where, basically, large market players who are creating browsers, or other ways to interact with the web, are striking deals to publish the content, and that's how the web works.

Also, it's good to note: what happens to search in this world?

Because we wouldn't be navigating to a search engine to perform a search. Rather, one thing that might happen is that we're not using search engine GUIs anymore; instead, our clients are engaging with search as a service, like an API.

Who knows what happens to Google in this world, but search becomes part of the internet's infrastructure instead of the front page.
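To make "search as a service" concrete, here is a minimal sketch in which the user agent talks to a search interface programmatically instead of a person visiting a search page. Every name here (`SearchService`, `InMemorySearch`, `answer_intent`) is hypothetical, and the in-memory index stands in for whatever remote API a real client would call.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SearchResult:
    url: str
    snippet: str


class SearchService(Protocol):
    """Anything the user agent can query; no GUI involved (hypothetical interface)."""

    def search(self, query: str, limit: int) -> list[SearchResult]: ...


class InMemorySearch:
    """Stand-in for a remote search API, backed by a tiny fixed index."""

    def __init__(self, index: dict[str, str]) -> None:
        self._index = index

    def search(self, query: str, limit: int) -> list[SearchResult]:
        terms = query.lower().split()
        hits = [
            SearchResult(url, text)
            for url, text in self._index.items()
            if all(term in text.lower() for term in terms)
        ]
        return hits[:limit]


def answer_intent(intent: str, service: SearchService) -> list[str]:
    """The user agent queries the service and keeps only the URLs it would go on to read."""
    return [result.url for result in service.search(intent, limit=3)]


index = {
    "https://a.example/tiny-cars": "Why people drive tiny cars in cities",
    "https://b.example/big-trucks": "The biggest trucks that exist",
}
urls = answer_intent("tiny cars", InMemorySearch(index))
print(urls)
```

Because the agent depends only on the `SearchService` shape, any provider could sit behind it, which is what makes search look like infrastructure in this picture.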

Another world is one where folks like Google start making their own clients that incorporate more of this agency.

And they start to make deals and pay companies to be able to crawl their sites, like paying Reddit to be able to crawl, et cetera.

Again, this would really cement a lot of market inequalities in the search world.

Some of that already exists, and smaller startup search engines wouldn't be able to pay as much or have as much coverage.

And that would hugely deepen the inequalities in the world of search that are already very pronounced.

So I'm going to highlight this quote from Robin Berjon, in a post called "Building the Next Web".

"We're not far from not having a web left. It's been so captured by the pathologies of command and control."

I think the reason I put this up is really just to say that we mustn't presume that the web will be here no matter what we do, and we need to protect the open web.

It's not something we should take for granted, especially when huge shifts like the one I'm talking about might happen.

We really don't want to have a world where we look back on the open web as this sort of beautiful blip on the radar of a more techno-optimistic vision of the future that was never destined to come to pass.

I think it's something that we really need to fight fiercely to protect.

Doom and gloom session over. It does seem perilous indeed, but to recap, this is actually something that makes the web special in the first place and is part of its DNA.

This idea that we have a user agent that ought to be able to interpret the web however it wants is fundamental.

But we've started to stray far from this. The folks creating websites have come to expect that their website will be taken exactly as it's intended, and at the same time we have a business model that now depends on that, depends on being able to serve ads, and takes this lack of agency as a bit of a given.

So how do we make this all work?

Can we have our cake and eat it too here?

I don't have the answers to this, but I think it's something we've got to start thinking about.

And I have one small sketch at a potential way this could be solved.

I think there are many that need to come to the table.

That small sketch is this idea of a web payments layer.

It's actually been kicking around for a while, but you could see a future where the user agent can pay to access web content on the user's behalf, no matter where that web content comes from.

So again, there's this payment interface where the user and the server don't actually have to know all the details, and you could easily produce a micropayment to access some content.

That micropayment might still come from advertising on the client side, so that the web can stay free.
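As a rough illustration of how a user agent might fund such micropayments, here is a minimal sketch of a wallet that spends credit earned from client-side advertising before touching the user's own money. The `AgentWallet` class, its fields and the per-item cap are all assumptions made for this example; nothing like it is specified in the talk.

```python
from dataclasses import dataclass


@dataclass
class AgentWallet:
    """Hypothetical wallet held by the user agent; all amounts are in cents."""

    ad_credit_cents: int     # assumed: earned by showing ads on the client side
    user_funds_cents: int    # assumed: topped up by the user
    per_item_cap_cents: int  # the most the agent may spend on one item without asking

    def pay(self, price_cents: int) -> bool:
        """Try to pay; ad credit is used first so content can stay free to the user."""
        if price_cents <= 0 or price_cents > self.per_item_cap_cents:
            return False
        if price_cents > self.ad_credit_cents + self.user_funds_cents:
            return False
        from_ads = min(price_cents, self.ad_credit_cents)
        self.ad_credit_cents -= from_ads
        self.user_funds_cents -= price_cents - from_ads
        return True


wallet = AgentWallet(ad_credit_cents=3, user_funds_cents=10, per_item_cap_cents=25)
paid = wallet.pay(5)        # 3 cents from ad credit, 2 cents from the user's funds
too_pricey = wallet.pay(30)  # above the per-item cap, so the agent declines
print(paid, too_pricey, wallet)
```

The cap is there so the agent only pays small amounts on its own and leaves anything larger for the user to decide.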

That's a potential direction this could go.

And it could even result in a much better web than the one we have now, which is full of paywalls and subscriptions and content that you just can't access because you'd have to subscribe.

And you don't want to pay 40 a month for one article.

This has already been thought about for a long time.

This is a document from 1995 talking about this idea of a web micropayments protocol.

I think this could do with some renewed attention and focus.

We also have this HTTP status code 402, Payment Required, which has existed for a long time and is basically a response from a server.

It's experimental, it's not really used in practice, but it's out there, and I think the intention is that a server could respond and be like, "Oh, I need five cents for access to this article".

And then a payment could be negotiated, and then the website is displayed.
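The exchange just described can be sketched as a small simulation. HTTP defines the 402 status code but no payment protocol to go with it, so the header names (`X-Price-Cents`, `X-Payment-Receipt`), the `ArticleServer` class and the receipt scheme below are invented for illustration. The server is simulated in-process rather than over a network.

```python
from dataclasses import dataclass, field
from http import HTTPStatus
from typing import Optional

PRICE_HEADER = "X-Price-Cents"        # hypothetical header carrying the price
RECEIPT_HEADER = "X-Payment-Receipt"  # hypothetical header proving payment


@dataclass
class Response:
    status: HTTPStatus
    headers: dict[str, str] = field(default_factory=dict)
    body: str = ""


class ArticleServer:
    """Simulated server that answers 402 until it is shown a valid receipt."""

    def __init__(self, price_cents: int, article: str) -> None:
        self.price_cents = price_cents
        self.article = article
        self.valid_receipts: set[str] = set()

    def accept_payment(self, amount_cents: int) -> Optional[str]:
        """Stand-in for the negotiated payment step; returns a receipt on success."""
        if amount_cents < self.price_cents:
            return None
        receipt = f"receipt-{len(self.valid_receipts) + 1}"
        self.valid_receipts.add(receipt)
        return receipt

    def get(self, headers: dict[str, str]) -> Response:
        if headers.get(RECEIPT_HEADER) in self.valid_receipts:
            return Response(HTTPStatus.OK, body=self.article)
        return Response(
            HTTPStatus.PAYMENT_REQUIRED, {PRICE_HEADER: str(self.price_cents)}
        )


def fetch_with_payment(server: ArticleServer, max_price_cents: int) -> Response:
    """User agent side: request, pay if the price is acceptable, then retry."""
    first = server.get({})
    if first.status != HTTPStatus.PAYMENT_REQUIRED:
        return first
    price = int(first.headers[PRICE_HEADER])
    if price > max_price_cents:
        return first  # too expensive: hand the 402 back to the user
    receipt = server.accept_payment(price)
    if receipt is None:
        return first
    return server.get({RECEIPT_HEADER: receipt})


server = ArticleServer(price_cents=5, article="Full text of the article")
result = fetch_with_payment(server, max_price_cents=10)
print(result.status, result.body)
```

A real design would also need to settle who brokers the payment and how receipts are verified, which this sketch deliberately leaves out.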

And that could be a great way for these agents to actually fairly compensate the data holders, and the writers and creators, for their content.

This might also apply for machine learning scraping.

So I guess my point is that the web has always been changing, but it's based on this solid set of principles, openness and user agency, that we want to remain the same.

That's one way we could retain that user agency.

I think there are many other areas to explore.

But I'm starting to wrap up now, so I want to say I hope I've made at least a reasonable case that the web might be changing drastically in the upcoming 5 to 10 years.

And that this is both a huge benefit to users and a new way of interacting with the web, and also something that carries risks because of the ecosystems that currently exist on the web and how they're structured and so ingrained.

And it's either going to turn into a great story, or it's going to turn into a big battle over the web and who owns it.

A big part of how this plays out in the next 5 to 10 years will be anticipating it, and that's where I think everyone in this room comes in.

There are three things that we need to start doing and talking about.

One, I think, is anticipating this possibility: fleshing out what it might look like (it might look different to what I've sketched out here), discussing it, having conversations around it, thinking about what we want it to be, and starting to dream about it.

The second thing, I think, is starting to do explorations and looking at the best version of this for users and for the web.

And then, finally, working towards some foundations, whether it's protocols or other underlying technologies or regulations, that can keep the open web ecosystem alive throughout this transition.

And that's all I have for you today. Thank you. If you want to stay in touch, I'm going to put some references for this talk up on my website; the URL is down the bottom. You can sign up to the newsletter, or just check in a few days and I'll have that stuff up there. Enjoy the rest of Web Directions. Thanks.

Hello, I'm Rupert.

Designer & programmer.

NOW: Building interfaces at Adept.

PREV: Mozilla Innovation Studio, Google Creative Lab.

MISSION: Make computers more humane & accessible.

A pixelated image of a person's face, resembling an 8-bit video game character.

What makes the web so special?

Obscure but valid JavaScript. A black dog with a reflective collar, looking to the side, with a confused or puzzled expression overlaid with the text "WAT".
Cursor icon in a pixelated hand pointer style, typically used to represent a mouse cursor in computing but as a clenched fist.

Screenshot of a Web Directions website with a navigation bar, including links and categories such as Blog, About, AI, Code, and Summit. A browser developer tool window is open at the bottom showing HTML and CSS code.

Myspace loses all content uploaded before 2016

Screenshot of headline

Information Management: A Proposal

Tim Berners-Lee, CERN

March 1989, May 1990

Therefore, an important phase in the design of the system is to define this interface. After that, the development of various forms of display program and of database server can proceed in parallel. This will have been done well if many different information sources, past, present and future, can be mapped onto the definition, and if many different human interface programs can be written over the years to take advantage of new technology and standards.

Screenshot of an excerpt about information management, with additional focus on the importance of interface design and technological adaptation over time.
Screenshot of RFC 2616, HTTP/1.1.
WORLD WIDE WEB
A stylized line drawing of a globe with a grid overlay, representing the concept of the World Wide Web.
A schematic diagram showing the relationship between the World Wide Web, depicted as a stylized globe with grid lines, connected to a box labeled 'USER AGENT', which in turn is connected by a dotted line to a symbol representing a 'HUMAN'.
Diagram illustrating the relationship between the World Wide Web, network interface, user agent, human-computer interface, and human. On the left, a stylized globe represents the World Wide Web, linked by a straight line to a central box labeled 'User Agent'. Further to the right, a dashed line connects to an abstract human figure, indicating interaction.
Screenshot of a blog page
Same page shown as plain text in reader mode in Firefox
Short video showing the Arc browser feature Rupert describes.
The same web page shown in an entirely text-based browser.
A black laptop computer with a braille display attachment sits on a white surface. A pair of over-ear headphones rests to the left of the laptop.
Screenshot of RFC 2616, HTTP/1.1.
Diagram showing the World Wide Web represented as a sphere connected to a box labeled 'USER AGENT', which is in turn connected to an icon representing a 'HUMAN'. Two vertical lines labeled 'NETWORK INTERFACE' and 'HUMAN-COMPUTER INTERFACE' separate the sphere and the user agent icon from the human icon respectively.
Screenshot of search results
Screenshot of Google Reader interface

From:

Richard Stallman rms-AT-gnu.org

To:

"Edd Barrett" exvot01-AT-gmail.com

Subject:

Re: Real men don't attack straw men

Date:

Sat, 15 Dec 2007 16:37:06 -0500

Message-ID:

<E1J3eh0-00057I-LM@fencepost.gnu.org>

Cc:

misc-AT-openbsd.org

For personal reasons, I do not browse the web from my computer. (I also have not net connection much of the time.) To look at page I send mail to a demon which runs wget and mails the page back to me. It is very efficient use of my time, but it is slow in real time.
Screenshot of email

Do you feel empowered?

WEB BROWSER
A line drawing of a web browser interface with tabs and an address bar at the top.
WEB BROWSER
A graphical representation of a web browser with minimalistic design, displaying an image of a textured surface resembling algae-covered water
Illustration of a star-filled space background centred around a colorful 'Space Jam' logo, surrounded by themed icons representing various webpage sections such as 'Planet B-Ball', 'Lunar Tunes', and 'Stellar Souvenirs'.
A screenshot of a web browser displaying an article titled "Who is Ben Horowitz, Silicon Valley's Billionaire Rainmaker?" In the foreground, a video playback window is partially visible with the play button icon centered. Ads for socks are displayed.

#1 Technology advanced

#2 The web got a business model

#3 We started using the web for everything

WEB BROWSER
Illustration of a generic web browser interface with tabs and an address bar at the top.

Mediated interface (n.)

A human-computer interface that uses an underlying system to operate another human-computer interface on the user's behalf.

A newspaper ad for Ford cruise control.

LADDER OF ABSTRACTION

Diagram of a ladder of abstraction showing a continuum between accelerator pedal and cruise control lever, with an arrow indicating direction towards the cruise control lever.
Diagram depicting 'Ladder of Abstraction' with three levels labeled 'Accelerator Pedal,' 'Cruise Control Lever,' and 'Destination Entry' from bottom to top, with an arrow pointing upwards alongside the ladder.
A nighttime scene of a parked white self-driving car with the logo "WAYMO" on the side, in an urban parking area with other parked cars and a lit multi-story building in the background.
A diagram illustrating the 'LADDER OF ABSTRACTION' with levels of increasing abstraction from bottom to top. The bottom level is labeled "ACCELERATOR PEDAL" with a description "(PIECE OF METAL CONNECTED TO OXYGEN HOLE)," the middle level is labeled "CRUISE CONTROL LEVER" with description "(CODE & SENSOR INPUTS)," and the top level is labeled "DESTINATION ENTRY" with description "(LARGE NEURAL NETWORKS)." An arrow points upwards along the ladder indicating a progression of abstraction from more concrete to more abstract concepts.

Mediated web (n.)

A potential future for the web, whereby it is accessed not directly by users, but rather via mediated interfaces.

Hand-drawn image of a person with a speech bubble that reads "I WANT A THING..." followed by a written list of steps including using a browser, searching, clicking links, spamming sites, and cross-referencing tabs.

I WANT A THING...

Hand-drawn illustration of a person looking at a browser window with "THING" written on it, accompanied by a thought bubble saying "I want a thing..."
A simple horizontal dotted line with the word 'INTENT' at the starting point and the word 'OUTCOME' at the end point, representing a continuum or process flow from intent to outcome.
A hand-drawn diagram showing a flow from servers to web content, including elements labeled 'Catcha' and 'AI SYSTEM', leading to a user interface denoted by 'Intent' and ending with 'User'.
Screenshot of page of search results for "why do people drive tiny cars?"
Screenshot of page of Perplexity answer to the question "What's the biggest car that exists?"

Some hints at the future...

A video plays as Rupert narrates. Text appears on the screen as the agent suggests results and a human interacts
Diagram featuring interconnected elements including a stylized webpage layout on the left, central boxes labeled "Values & Taste", "Relationships", and "Upcoming Events", and a JSON code snippet on the right. A globe icon is shown at the bottom right corner.
Screenshot of article by Tim Berners-Lee titled "Semantic Web Road map".

Fuyu-8B: A Multimodal Architecture for AI Agents

October 17, 2023 — Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşrılar

We're open-sourcing Fuyu-8B - a small version of the multimodal model that powers our product.

Question: "Is La Taqueria north of the 24th St Mission Bart station?"
Fuyu's answer: "no"
A screenshot of a map from an online mapping service with marked locations and navigation routes. On the left an excerpt from an announcement that Adept is "open-sourcing Fuyu-8B - a small version of the multimodal model that powers our product".

Finding accommodation

  • Querying search engine
  • Checking each result is within walking distance
  • Look at reviews to see if the wifi is good
  • Send a message to the owner asking a question
  • Reviewing images
Screenshot of an Airbnb listing for a large, beautiful flat in Cow Hollow, displayed on the Airbnb website within a presentation slide. This listing includes various images of the flat's interior, such as the kitchen and dining area, and displays pricing and rating information.

Collage of photos depicting the exterior and interior of a flat, including the facade of the building, a kitchen with modern appliances, a dining room with a bay window, and a bedroom with a large bed and window.

Towards a future of malleable computing on the web.

A simplified user interface wireframe showing a webpage layout with a title bar and varying content blocks including text lines and placeholders for images or graphics.

Things have consequences.

Repeat of earlier slide of website with Socks ads.

Screenshot of a Daring Fireball web page showing a single tasteful ad

Screenshot of a Reddit page

What happens to the ad-supported web if this takes off?

SCENARIO #1:

This doesn't really take off.

SCENARIO #2: This does take off.

Opinion

OP-ED CONTRIBUTOR

Keep the Internet Open

By Vinton G. Cerf

The New York Times
Screenshot of a New York Times article on a browser with subscription offer banner at the bottom.
A screenshot of a news article headline discussing Reddit's potential response to search engine indexing, titled "Reddit can survive without search".
PAID WEB
A diagram showing a stylized globe with a padlock symbol labeled 'PAID WEB' connected by an 'X' to a rectangular box labeled 'USER AGENT,' which is connected by a dotted line to a right-facing arrow symbol representing 'HUMAN.'

RECIPE FOR A LOCKED-DOWN WEB:

Restrict open-web access (more logins & dedicated apps)

A screenshot of a Medium article on a desktop browser and the Medium mobile app interface showing an article titled "How to Discover a Planet".

Consolidate content into platforms

A smartphone displaying a news app interface with various news categories and an article about Coronavirus cases rising in Florida, Mississippi, and other states, viewed on a July 15 date.

Wait, what happens to search?

The image shows a screenshot of the Google search homepage with a search box and two buttons labeled "Google Search" and "I'm Feeling Lucky".

We're not far from not having a Web left. It's been so captured by pathologies of command and control that it's falling apart at the seams and life is draining from it. Some of our load-bearing planks are quite distinctly stinking of rot and our ship is taking on water. It's time to scrape off some barnacles and get to work.

Robin Berjon, "Building the Next Web"

What makes the web so special?

A diagram illustrating the concept of the World Wide Web, User Agent, and Human interaction. On the left, a sphere representing the World Wide Web; in the center, a rectangle labeled 'USER AGENT' connects the World Wide Web to a human icon on the right through lines labeled as 'NETWORK INTERFACE' and 'HUMAN-COMPUTER INTERFACE' respectively.

Q. How do we make this all work?

A. I don't know, but we better start thinking about it.

A diagram showing two labeled boxes, "WEB SERVER" and "USER AGENT," connected by a line with the text "PAYMENT INTERFACE" positioned above the line.

Micro Payment Transfer Protocol (MPTP) Version 0.1

W3C Working Draft 22-Nov-95
Status of this document

This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than "work in progress". A list of current W3C Working Drafts can be found at: https://www.w3.org/pub/WWW/TR

Note: since working drafts are subject to frequent change, you are advised to refer to the URLs for working drafts themselves.

Abstract

A protocol for transfer of payments through the services of a common broker is described. This protocol makes it practical for small payment amounts. The latency makes it unsuitable for some applications. The scheme thus satisfies the two key criteria for a micropayments scheme as laid out in the variation of the Pay-Word proposal of Rivest and Shamir [RivestSh95]. It is also compatible with the iKP proposal by Bellare et. al. [BellareE

Screenshots for "Micro Payment Transfer Protocol (MPTP) Version 0.1" specification and "402 Payment Required" MDN pages.
A diagram of a slide with a generic layout, including a title bar, two content boxes side by side on the left, and a text content area on the right with bullet points, as well as a smaller content box beneath.

THANK YOU.