Agents and Auto-GPT
Introduction and Acknowledgements
Mark opens his talk by acknowledging the Gadigal land and thanking previous speakers, Kaz and Paul, for setting the context of his presentation on autonomous agents and AI.
Concept of Autonomous Agents
He introduces the concept of autonomous agents, tracing its origins back to Apple's 1987 Knowledge Navigator. Mark draws parallels between past visions of AI and current developments.
Generative Agents and Interactive AI
Mark discusses the evolution of generative agents, highlighting recent experiments with 25 ChatGPT-based agents in a virtual environment, showcasing organic interactions and decision-making.
Creating Synthetic Personalities
He shares his experience of creating synthetic personalities using generative agents, emphasizing the importance and complexity of prompt engineering in AI.
Classical Autonomous Agents: AutoGPT
Mark introduces AutoGPT, an application that allows users to input commands and observe AI-driven actions. He explains the goal-oriented nature of AutoGPT and its iterative process.
Potential and Challenges of AutoGPT
He discusses the capabilities and limitations of AutoGPT, noting its potential for both positive applications and misuse, like generating disinformation campaigns.
Generative Worlds and Syntheverse
Mark talks about the concept of 'Generative Worlds' and 'Syntheverse', envisioning a future where autonomous agents create dynamic and interactive virtual environments.
LangChain and Coding for Autonomous Agents
He mentions LangChain, a tool for coding autonomous agents, and provides resources for further exploration in this field.
Future of Autonomous Agents: Sensors and Effectors
Mark concludes his talk by highlighting the need for sensors and effectors to enhance the capabilities of autonomous agents, linking this need to the field of robotics.
Thank you, John.
Good afternoon, everyone.
And I would like to acknowledge that I, too, am talking to you on Gadigal land.
Sovereignty never ceded.
I want to thank Kaz and I want to thank Paul, because a lot of the hand-waving I was going to have to do on these slides, I no longer have to do, because you've already gotten explanations for what's going on.
The first thing I need to do, I am required by court order to tell you that I may produce inaccurate information about people, places, or facts.
Oh no, wait, that's not me.
That's what ChatGPT says at the bottom of its screen.
This field is moving very fast.
I am very new to this field.
I am probably in exactly the wrong place on that Dunning-Kruger curve.
So forgive me if I get some of this wrong.
But the first thing I want to tell you is autonomous agents may not be something that you've heard of before.
But let me tell you, everything old is new again, because in 1987, Apple created a demo.
And 'demo' in the same way that Elon created a demo of full self-driving back in 2016. It was a demo of something they called the Knowledge Navigator.
Have a listen.
[Knowledge Navigator] Today, you have a faculty lunch at 12 o'clock.
You need to take Kathy to the airport by two.
You have a lecture at 4.15 on deforestation in the Amazon rainforest.
[Human] Let me see the lecture notes from last semester.
No, that's not enough.
I need to review more recent literature.
Pull up all the new articles I haven't read yet.
[KN] Journal articles only?
[Human] Fine.
[KN] Your friend Jill Gilbert has published an article about deforestation in the Amazon and its effects on rainfall in the sub-Sahara.
It also covers drought's effect on food production in Africa and increasing imports of food.
[Human] Contact Jill.
[KN] I'm sorry, she's not available right now.
I left a message that you had called.
[Mark Pesce] All right, so that's 1987.
And it inspired an entire generation, because Siri, and Google Assistant, and Alexa are all trying to be that.
And none of them work, because as Kaz pointed out, none of it worked.
The difference is that it works now, and that difference is what I want to break down for you.
Alright, there are already two major areas in this field, and I want to take each of them separately to show where autonomous agents are going.
The first one that I want to talk about is what we call generative agents.
These are basically people being created in the computer.
Alright, what do I mean when I say that?
This work got its modern rebirth in a paper that was published in early April this year.
So there were researchers working at Google and Stanford who created 25 of what they called generative agents.
What they basically did is they described, using text prompts, the characteristics and behaviors of these agents, ran 25 separate ChatGPT instances, and allowed them all to interact organically with one another in a software environment that they called Smallville, alright?
And so they inhabited this town, they described the town, they described the people's behavior.
What they did was this: Valentine's Day was coming up, and they said, okay, we're going to ask one of the agents to throw a Valentine's Day party.
So one of the agents gets the idea to throw the Valentine's Day party, starts to talk about it.
Another agent spontaneously says, I will help you decorate the cafe for that party, because I'm chasing this guy and I want to invite him.
All right?
25 people live in Smallville.
Eight invitations go out to the party.
Three invitations are rejected because people have other things to do.
All of this happened organically.
It emerged out of the interaction of the behaviors of the agents in that system.
Alright?
So this is generative agents.
This is one area that is now happening.
If you think of every stupid NPC you've ever had in a video game, now every one of those is going to be intelligent.
If you think of being able to test something around product or human interaction, you're now going to have synthetic versions of people for that, although they probably won't be as good as real people.
There is already an open source program you can download called GPTeam that allows you to start to work with this.
Over the weekend, with John's permission, I created a synthetic version of this day: I created personas in a JSON file for John, for myself, and for a person that I'm creating called Pat.
Pat is here because they don't know enough about AI, and they're getting worried about their job as an engineer. So we created a situation where we put ourselves out in the networking area, having a conversation where we're trying to teach Pat about AI. You spark it up, you run it, and it works okay.
I clearly need to tune these stories more.
There's a lot to learn about prompting these characters, right?
As there is with prompting LLMs generally.
But what's happening behind the scenes here is that each of these characters is being fed into ChatGPT, which creates realistic responses that are then shared with one another.
So the characters are observing the other characters, they're thinking about how they respond, and then they're acting.
And they're in that loop of: observing, thinking and acting.
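To make that observe-think-act loop concrete, here's a minimal sketch in Python. It is not GPTeam's or the Smallville paper's actual code; the persona text, the simple turn-taking, and the use of the 2023-era OpenAI chat-completion call are all simplifying assumptions.

```python
# Minimal observe-think-act loop for chat-based personas (illustrative sketch only).
# Assumes the 2023-era OpenAI Python client; the personas are invented, not GPTeam's schema.
import openai  # expects OPENAI_API_KEY in the environment

personas = {
    "John": "You are John, the conference organiser. You are upbeat and practical.",
    "Mark": "You are Mark, a speaker fascinated by autonomous agents.",
    "Pat":  "You are Pat, an engineer worried that AI is coming for your job.",
}

def think_and_act(persona: str, observations: list[str]) -> str:
    """Think: feed what the character just observed into the model and get their next line."""
    messages = [
        {"role": "system", "content": persona},
        {"role": "user", "content": "You just observed:\n" + "\n".join(observations) +
                                    "\nRespond in character with what you say or do next."},
    ]
    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return reply["choices"][0]["message"]["content"]

# Shared world state: everything said so far is observable by every character.
transcript = ["Scene: the networking area at an AI conference."]
for turn in range(3):
    for name, persona in personas.items():
        action = think_and_act(persona, transcript[-5:])  # observe the most recent events
        transcript.append(f"{name}: {action}")            # act: add to the shared world
        print(transcript[-1])
```

The real systems layer a long-term memory stream, reflection, and planning on top of this bare loop; that is a lot of what the Smallville paper is about.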
Okay, now we get to what we think of as the classical autonomous agents.
How many of you have used AutoGPT?
Okay, so some of you.
Rest of you, fasten your seatbelts.
AutoGPT was released, I think, around March of this year, basically as soon as GPT-4 was available for programmers to use.
You install it on your PC, or your Macintosh, or wherever you like, and you fire it up.
And it runs through its normal setup, and then it prompts you with "I want AutoGPT to…", asking you what you want it to do. And of course, as I always do with any system that asks me what I want it to do, I say: okay, I want you to open the pod bay doors.
Now what I'm going to do is I'm going to break down for you what happens when I hit return.
All right, so yes, my prompt is open the pod bay doors, that's an English prompt.
All of a sudden this has to be translated into a set of things that will happen.
The first thing that has to happen is it has to become a goal.
So it gets translated into a goal.
How does that happen?
And this is where I'm so happy that Kaz and Paul did the heavy lifting for me.
All right, because essentially what happens is that this translation process happens through some prompts, plus some memory, plus a whole system called ReAct, which is not the JavaScript framework, if you're familiar with that, but reasoning and acting.
Taking a look at what you're doing, figuring out whether what you're doing is working, and iterating on that until you've locked in on what it should be doing.
So you're asking the LLM to create a possible action.
You are then doing that possible action.
You are then testing the result of that possible action.
When that possible action looks like it's succeeding, you can go on forward to the next action.
So you have to think of it as a loop and it's a feedback loop because you're feeding in the results of your previous activities.
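Here's a stripped-down sketch of that feedback loop, just to show its shape. It is not AutoGPT's actual implementation; ask_llm() and execute() are hypothetical stand-ins for whatever model call and tooling the agent really has.

```python
# Stripped-down ReAct-style feedback loop: propose an action, try it, feed the result back.
# Not AutoGPT's real code; ask_llm() and execute() are hypothetical stand-ins.

def react_loop(goal: str, ask_llm, execute, max_steps: int = 10):
    memory = []  # results of previous actions are fed back into the next prompt
    for _ in range(max_steps):
        action = ask_llm(
            f"Goal: {goal}\n"
            f"Previous actions and their results: {memory}\n"
            "Reflect on progress so far and propose the single next action."
        )
        result = execute(action)           # act on the world: search, run code, call an API
        memory.append((action, result))    # feedback: remember what happened
        done = ask_llm(
            f"Goal: {goal}\nLatest action: {action}\nResult: {result}\n"
            "Has the goal been achieved? Answer YES or NO."
        )
        if done.strip().upper().startswith("YES"):
            break
    return memory
```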
Alright, once you have a goal, you can break a goal down to a series of steps.
And actually, it turns out you don't even need an LLM for this.
We've had 70 years of operations research which shows us how to break things down.
And so there's a whole bunch of different strategies here for doing this well.
But in this case, we're going to go off and we're going to ask ChatGPT, how do we do this?
So: look, why don't you start by trying to have a conversation with the pod bay door opener?
HAL's not letting you in.
Let's see if we can talk to it ourselves.
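In code, that decomposition step can be as simple as one more prompt. This is a sketch under the same assumptions as above: ask_llm() again stands in for the model call, and the example steps it returns are made up.

```python
# Sketch: ask the model to break a goal into steps and hand back something parseable.
import json

def plan_steps(goal: str, ask_llm) -> list[str]:
    raw = ask_llm(
        f"Break this goal into a short, ordered list of concrete steps: {goal}\n"
        "Return only a JSON array of strings."
    )
    return json.loads(raw)  # assumes the model actually returns valid JSON

# plan_steps("Open the pod bay doors", ask_llm) might come back with something like:
# ["Find documentation for the pod bay door controller",
#  "Identify its communication protocol or API",
#  "Send the command to open the doors"]
```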
Alright, so we're going to break that then down into a set of actions.
Again, by thinking about it.
Okay, if I have to communicate with a pod bay door, how do I do that?
What are the set of actions that will allow me to do that?
The first thing I need to do is go find the docs.
And so my agent is going to go do a Google search to find the docs, and it's going to come up with a likely list of docs.
When it finds that likely list of docs, it's going to feed them into ChatGPT to ask: does this actually have a communication protocol in it, or is this a sales manual for pod bay door openers?
Eventually, we're going to find something that looks like it has the technical docs that will have the API in it.
When we think we've hit something likely, we can then hand that off to another process which is going to write a Python module and unit tests so that I might be able to talk to the pod bay door openers.
And that would be the first step in a longer process to be able to get the pod bay door open.
And it will go through the rest of the steps this way until it completes.
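Here's a rough sketch of that search-filter-generate chain, just to make it concrete. It's not AutoGPT's actual internals; web_search() and ask_llm() are hypothetical stand-ins for whatever tools the agent really has, and the prompts are illustrative.

```python
# Sketch of the "find the docs, check them, then write the code" chain.
# web_search() and ask_llm() are hypothetical stand-ins for the agent's real tools.

def build_door_client(ask_llm, web_search):
    candidates = web_search("pod bay door opener API documentation")
    for doc in candidates:
        verdict = ask_llm(
            "Does the following document describe a communication protocol or API, "
            "rather than marketing material? Answer YES or NO.\n\n" + doc
        )
        if verdict.strip().upper().startswith("YES"):
            # Hand the promising doc to another prompt that writes the code and its tests.
            return ask_llm(
                "Using the API described below, write a Python module that sends an "
                "'open' command to the pod bay doors, plus unit tests for it.\n\n" + doc
            )
    return None  # nothing usable found; the agent would have to replan
```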
That's how AutoGPT works in theory.
Doesn't always work perfectly.
Can get trapped in weird loops.
Can misunderstand things.
Can end up doing the wrong thing.
These systems seem to work, but often do require a fair degree of monitoring.
Sometimes they work perfectly.
We're only at the beginning of this.
These systems will continually improve.
Now, what can you do with this sort of thing?
This was only two months ago, but it feels like two years ago: someone went and did the bad thing.
And they said, I want to create a disinformation campaign for the 2024 US presidential election.
He fires it in.
All of a sudden AutoGPT is creating fake news sites and fake Facebook accounts and interlinking them, so it's creating an entire ecosystem of fake news that's going to be fed into the campaign.
He indicates that he pulled the plug on it after a couple of hours.
And we have to believe that he did, because he says he did; obviously he wouldn't lie to us, and obviously there's no state actor anywhere online using AutoGPT for anything like this.
This is a very powerful tool, and it's now a powerful tool that's pretty much available to almost anyone who can download it and get it up and running.
My goals for Autonomous Agents, and the reason I'm so fascinated by them, is a bit different.
Because I see Autonomous Agents as being a fundamental building block to a new thing.
So we've talked about Generative Agents.
We haven't really talked about Generative Worlds.
But Generative Worlds takes the work that's been done with diffusers, and brings it from 2D into 3D.
So you have Google's DreamFusion, which allows you to create 3D objects based on text prompts, and then you have another project called Text2Room, which turns text prompts into spaces.
So you can go from text prompts to little mini metaverses.
So if you have worlds with people in them, being created and orchestrated by an autonomous agent, you have something that I'm calling the syntheverse.
Let me give you three examples.
Create a moot court.
If you've been to law school, you know what a moot court is.
It's where you get to practice.
Create a moot court.
Generate a tort.
Argue for the plaintiff.
I will argue for the defendant.
Client brief for a new apartment block with blueprints.
Build a walkthrough model and create a site plan.
Create a simulated hospital emergency ward.
Staff it with nurses and a queue of patients to be treated.
I'm the doctor.
Queensland Health just did something like this to help doctors and nurses understand how to deal with cranky people in an emergency room, which is a situation they regularly find themselves in.
They spent hundreds of thousands of dollars and they have one of them.
What we would like to see is something that would allow people to be able to generate as many as needed on demand, inexpensively, to help people train for all of the situations they would encounter.
So that's why I'm fascinated by all of this.
Now, most of this work, if you are a coder, is being done in something called LangChain.
And you may hear more about that today; if not, there you go.
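For flavour, here's roughly what a minimal ReAct-style agent looked like in LangChain around mid-2023. The library moves quickly, so treat the imports and the initialize_agent call as a sketch of that era's API rather than current usage, and the PodBayDoor tool is obviously a made-up stand-in.

```python
# Roughly what a minimal ReAct-style agent looked like in LangChain, circa mid-2023.
# The API changes quickly, so treat this as a period sketch rather than current usage.
from langchain.llms import OpenAI
from langchain.agents import initialize_agent, Tool, AgentType

def pod_bay_door(command: str) -> str:
    """Made-up effector standing in for whatever would really drive the doors."""
    return f"Door controller acknowledged: {command}"

tools = [
    Tool(
        name="PodBayDoor",
        func=pod_bay_door,
        description="Send a command to the pod bay door controller.",
    )
]

llm = OpenAI(temperature=0)  # expects OPENAI_API_KEY in the environment
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Open the pod bay doors.")
```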
The QR code that's up there points to an amazing page that John flipped to me earlier this week.
All about LLM-powered autonomous agents.
It takes you through in detail all of the stuff that I've glossed over because I only had 15 minutes with you.
But it's really helpful.
So that'll get you started with the coding.
That'll get you started with the thinking.
Alright, you got it now?
What do we need?
There's still a lot.
This is very early days.
There are at least two things that we need a lot more of here to really help autonomous agents take off.
The first thing we need is sensors, because these autonomous agents need to be able to read the temperature, they need to be able to read where they are, they need to be able to read the dials on something, and there's not a lot of that yet.
We also need effectors.
You need to be able to control a motor to open the pod bay doors, you need to be able to change system settings, you need to be able to ring a bell.
And these are not new things, because in fact the entire discipline of robotics, which is about, what, 70 years old now, is based on this science of sensors and effectors.
So there's an enormous amount of work that's already been done here.
Robots are real-time.
Autonomous agents, maybe not, but we can still build on that work.
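In agent terms, a sensor is just a tool that reads something and an effector is just a tool that changes something. Here is a tiny sketch of wrapping both so an agent loop can call them by name; the function bodies are placeholders, not real hardware drivers.

```python
# Sketch: a sensor (reads the world) and an effector (changes the world) exposed as named
# tools an agent loop can call. The bodies are placeholders, not real hardware drivers.

def read_temperature_c() -> float:
    """Sensor: in a real system this would poll a thermometer, a dial, or an API."""
    return 21.5

def open_pod_bay_doors() -> str:
    """Effector: in a real system this would drive a motor or call a building-control API."""
    return "pod bay doors opening"

TOOLS = {
    "read_temperature": read_temperature_c,
    "open_pod_bay_doors": open_pod_bay_doors,
}

def run_tool(name: str):
    """The agent names a tool; the loop executes it and feeds the result back as an observation."""
    return TOOLS[name]()
```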
But here's the thing, if I've learned anything from all of my work in the last couple of months on autonomous agents, it's that we can probably ask AutoGPT to build a lot of this for us.
Thank you.