Strength in weakness: JavaScript memory management and weak references
Hello, and welcome to my talk "strength in weakness: JavaScript memory management, and weak references".
In this talk, I will explain how JavaScript automatically reclaims unused memory using a mechanism called garbage collection.
I will also describe new JavaScript APIs that enable us to better interface with the garbage collection mechanism and to avoid memory leaks.
First, though, a few words about myself.
My name is Dan Shappir, and I've been doing Web development for a really long time.
I'm currently the performance tech lead at Next Insurance, an InsurTech unicorn aiming to revolutionize the way small businesses get insurance by making it quick and easy to do online.
My role at Next is to make sure our websites and Web apps load and execute as quickly as possible.
I'm also a host and panelist on the popular JavaScript Jabber podcast.
And I would love for you to listen to our show.
And I'm an invited expert to the W3C Web Performance Working Group where new performance APIs are created.
If you want to connect with me about Web development or JavaScript or Web performance, the best place to do it is on Twitter where my handle is @DanShappir.
Okay then let's start by talking about how JavaScript manages memory.
JavaScript uses an automatic memory management mechanism called garbage collection, or GC for short, to reclaim unused memory.
This means that while you do need to explicitly create new JavaScript objects, using the `new` operator or just as object literals, you don't need to delete them when they're no longer in use.
The JavaScript engine does this automatically for you.
It's important to understand that GC only reclaims memory that can no longer be used, not memory that is no longer needed.
I'll get into the details of what this distinction means later on in this presentation.
It's also important to note that the way in which GC works in each JavaScript engine is an implementation detail that is not specified in the language standard.
And finally, JavaScript doesn't provide APIs to force or prevent GC.
It happens if, and when the JavaScript engine decides.
That's it.
That said, the language specification has been extended to provide new APIs for interfacing with the GC, and that's what we'll cover in this talk.
In order to understand these APIs and their purpose, we'll start with a realistic example.
Let's say we're implementing a system for managing employees within an organization.
To that end, we use an employee library that provides employee type objects.
In this code snippet, we're instantiating a single employee instance.
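The snippet itself isn't reproduced in this transcript. A minimal sketch of it might look like the following, where the `Employee` class is a hypothetical stand-in for whatever the employee library actually exports:

```javascript
// Hypothetical stand-in for the type exported by the employee library.
class Employee {
  constructor(name, id) {
    this.name = name;
    this.id = id;
  }
}

// Instantiate a single employee.
const employee = new Employee("Alice", 1);
console.log(employee.name); // Alice
```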
Now let's say that our system needs to manage additional entities that are associated with employees, such as phones, computers, and cars.
We have additional libraries that we can import for these entities as well.
But how do we associate each such entity with an employee that uses it?
How do I tell the system, "this is my phone"?
The most obvious solution is to extend the employee library and add methods for associating these items with the relevant employees.
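As an illustration only, extending the library might mean baking item-specific methods into the employee class itself. The `setPhone`/`getPhone` methods and the `Phone` class below are hypothetical:

```javascript
// Hypothetical type from a separate "phones" library.
class Phone {
  constructor(number) {
    this.number = number;
  }
}

// Hypothetical modified employee library: it now has to know about phones.
class Employee {
  constructor(name) {
    this.name = name;
    this._phone = null; // item-specific state baked into the library
  }

  setPhone(phone) {
    this._phone = phone;
  }

  getPhone() {
    return this._phone;
  }

  // ...and setComputer/getComputer, setCar/getCar, and so on, for every entity type.
}

const alice = new Employee("Alice");
alice.setPhone(new Phone("555-0100"));
console.log(alice.getPhone().number); // 555-0100
```

Every new entity type means another pair of methods in the employee library, which is exactly the coupling problem discussed next.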
But it turns out that there are some problems with this obvious approach.
First, can we even extend this library?
Maybe it's a third-party library that we're using.
Do we really want to fork it and support that fork going forward just in order to add these methods?
Also, by adding item-specific methods to that library, we've created tight coupling between it and the various entity libraries.
Whenever we add or remove or replace an entity library, we will potentially need to modify the employee library as well.
So let's try another approach and see if it's better.
Since we're using JavaScript, we can just tack new properties onto the employee objects without needing to modify the library's implementation at all.
After all, JavaScript lets us modify the structure of any object, however it was created. Dynamic programming for the win!
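A minimal sketch of this approach, again with hypothetical `Employee` and `Phone` classes standing in for the real libraries. Note that in TypeScript the assignment to `employee.phone` would be a compile error, because `phone` isn't declared on `Employee`:

```javascript
// Hypothetical library classes that we don't control.
class Employee {
  constructor(name) {
    this.name = name;
  }
}

class Phone {
  constructor(number) {
    this.number = number;
  }
}

const employee = new Employee("Alice");

// JavaScript lets us add a property to any extensible object at runtime.
employee.phone = new Phone("555-0100");

// Any code can now read or overwrite `phone`, and a future version of the
// library that defines its own `phone` property would collide with ours.
console.log(employee.phone.number); // 555-0100
```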
But it turns out that there are downsides to this approach as well.
Adding properties to objects willy-nilly can easily cause collisions.
For example, what happens if a new version of this library comes out that also implements a phone property?
That version will likely conflict with our own implementation.
Also, putting data on properties in this way breaks encapsulation, since any part of the code can do anything with them: read them, modify them, whatever.
That is obviously a dangerous thing to do.
And this approach is antagonistic to TypeScript, which doesn't like such a fast and loose approach to object structure, precisely because it wants to save us from ourselves.
Okay, then let's try a different method.
Instead of modifying the employee object to reference a phone object somehow, let's create an independent map collection to, well, map one to the other.
This way we can get from an employee to their phone without needing to modify either one of these entities.
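A minimal sketch of the independent-map approach, using hypothetical `Employee` and `Phone` classes. Neither class knows the `phones` map exists:

```javascript
// Hypothetical library classes, left completely untouched.
class Employee {
  constructor(name) {
    this.name = name;
  }
}

class Phone {
  constructor(number) {
    this.number = number;
  }
}

// An independent collection that maps each employee to their phone.
const phones = new Map();

const employee = new Employee("Alice");
const phone = new Phone("555-0100");
phones.set(employee, phone);

// Look up an employee's phone without modifying either object.
console.log(phones.get(employee).number); // 555-0100
```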
Now we haven't created any coupling, we don't care if either library changes, and TypeScript likes us.
It's a perfect solution.
Or is it?
Turns out that there is a problem with this solution as well.
Say we have an employee object that is referenced by an `employee` variable, and a phone object that is referenced by a `phone` variable.
We also have a `phones` map, which uses employees as keys and phones as values.
And we've added this employee and phone to the map.
Now let's say that the employee has left the organization and has taken their phone with them.
So we set both variables to null, and since neither object is needed anymore, we expect the GC to reclaim their memory automatically.
But remember what we said about the GC: it only reclaims memory that can no longer be used, and these objects can technically still be used.
That's because we can still get to them through the map object itself.
That is, from the `phones` variable to the map, and from the map to both the employee instance and the phone instance.
As a result, unless we also explicitly remove these objects from the map, the GC cannot reclaim them and we get a memory leak.
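A sketch of the leak scenario with the same hypothetical classes. A `Map` holds strong references to its keys and values, so after clearing both variables the objects are still reachable, and we can even get them back out by iterating the map:

```javascript
class Employee {
  constructor(name) {
    this.name = name;
  }
}

class Phone {
  constructor(number) {
    this.number = number;
  }
}

const phones = new Map();

let employee = new Employee("Alice");
let phone = new Phone("555-0100");
phones.set(employee, phone);

// The employee leaves and takes the phone with them.
employee = null;
phone = null;

// Both objects are still reachable through the map, so the GC can't reclaim them.
for (const [emp, ph] of phones) {
  console.log(emp.name, ph.number); // Alice 555-0100
}
console.log(phones.size); // 1
```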
Precisely the problem that GC was intended to solve.
Okay then, let's fix this by explicitly removing them from the map.
For that, we will need to implement a global cleanup function that removes the unneeded employee object from all the maps in which it might be used as a key.
But that means that instead of relying on the GC for automatic memory reclamation, we now need to manage employee object lifetime explicitly by calling that cleanup function, whenever we no longer need an employee instance.
We've effectively eliminated a main benefit of using GC.
The fact that it's automatic.
It's become manual.
Also, this cleanup function must be aware of every such map in the application.
Whenever we add or remove such a map anywhere in the code, we will need to update this function accordingly.
In other words, it's tightly coupled with everything.
This is definitely not a desirable solution.
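For illustration, a sketch of what such a cleanup function might look like. The `registerEmployeeMap` and `cleanupEmployee` helpers are hypothetical names; the point is that every employee-keyed map must be known to one central function, and that someone has to remember to call it:

```javascript
class Employee {
  constructor(name) {
    this.name = name;
  }
}

// Hypothetical central registry: every employee-keyed Map in the app must be listed here.
const employeeMaps = [];

function registerEmployeeMap(map) {
  employeeMaps.push(map);
  return map;
}

// Must be called manually whenever an employee is no longer needed.
function cleanupEmployee(employee) {
  for (const map of employeeMaps) {
    map.delete(employee);
  }
}

const phones = registerEmployeeMap(new Map());
const computers = registerEmployeeMap(new Map());

let employee = new Employee("Alice");
phones.set(employee, { number: "555-0100" });
computers.set(employee, { serial: "C-42" });

// Manual lifetime management: forget this call and the entries leak.
cleanupEmployee(employee);
employee = null;

console.log(phones.size, computers.size); // 0 0
```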
So is there anything better that we can do instead?
Turns out that there is something better, something that can finally solve all our problems.
In fact, all we need to do is change the map to a WeakMap.
That's it.
Now we don't need that cleanup function anymore.
Now GC just works as intended.
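A sketch of the fix, with the same hypothetical classes. The only change from the `Map` version is the constructor; when the now-unreachable objects are actually reclaimed is still up to the engine:

```javascript
class Employee {
  constructor(name) {
    this.name = name;
  }
}

class Phone {
  constructor(number) {
    this.number = number;
  }
}

// The only change: WeakMap instead of Map.
const phones = new WeakMap();

let employee = new Employee("Alice");
let phone = new Phone("555-0100");
phones.set(employee, phone);

const lookedUp = phones.get(employee).number;
console.log(lookedUp); // 555-0100

// Once these variables were the last references, both objects become
// eligible for GC. No cleanup function is needed.
employee = null;
phone = null;
```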
But how does this magic happen?
For that we need to understand how WeakMap works.
We create a WeakMap just like we'd create a Map, only using the WeakMap constructor instead.
But when we add a key/value pair to it, instead of referencing both of them, the WeakMap (conceptually, at least; engines are free to implement this differently) creates a hidden field on the key object that directly references the value object.
And this field is hidden in such a way that only that specific WeakMap can see it.
This means that, in this model, when we use the WeakMap to look up a phone for an employee, it doesn't consult some internal dictionary or other data structure; it simply reads the value of that hidden field.
If we remove our reference to the phone instance, it still cannot be GC'd, because the employee instance references it through that hidden field.
This is exactly what we want because the employee still uses that phone, even if we don't hold a reference to that phone.
But when we remove the reference to the employee instance, then there are no references to it anymore, because the WeakMap itself does not hold a reference to its keys.
So it can in fact, be reclaimed by the GC.
And when the employee object is gone, there are now no references to the phone object either.
And it can be reclaimed by the GC as well.
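To make this mental model concrete, here is a toy `HiddenFieldMap` (a hypothetical name) that stores each value on its key under a per-instance `Symbol`. It is only an illustration: real engines don't implement WeakMap this way, a symbol-keyed property can still be discovered with `Object.getOwnPropertySymbols`, and it fails on frozen keys. But it shows the two properties that matter here: the key references the value, and the map itself references neither.

```javascript
// Toy illustration of the "hidden field" model. NOT how engines implement WeakMap.
class HiddenFieldMap {
  // Each instance gets its own unique symbol, so only it "knows" the field name.
  #field = Symbol("hidden field");

  set(key, value) {
    // The key object now references the value directly; the map stores nothing.
    Object.defineProperty(key, this.#field, {
      value,
      writable: true,
      configurable: true,
      enumerable: false,
    });
    return this;
  }

  has(key) {
    return Object.prototype.hasOwnProperty.call(key, this.#field);
  }

  get(key) {
    return this.has(key) ? key[this.#field] : undefined;
  }

  delete(key) {
    return this.has(key) && delete key[this.#field];
  }
}

const phones = new HiddenFieldMap();
const employee = { name: "Alice" };
phones.set(employee, { number: "555-0100" });

console.log(phones.get(employee).number); // 555-0100
console.log(JSON.stringify(employee)); // {"name":"Alice"}
```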
You could say that we are back to the option of tacking a custom property onto the employee object, but without any of the downsides we saw when we examined that approach.
Bottom line, no memory leak and no need for a cleanup function, everything just happens automatically.
Or you might say 'automagically' exactly how we want the GC to work.
It's important to understand that a WeakMap isn't really a collection, in the sense that you can't enumerate its members, either the keys or the values, from the WeakMap instance.
That's because as I explained it doesn't reference either.
Instead, it's just a pipe directly from the keys to the values using hidden fields.
For that reason, it doesn't have methods like `forEach` or `keys`, or even a `size` property.
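A quick check of the WeakMap API surface: only `get`, `set`, `has`, and `delete` exist, and keys must be objects (newer engines also accept non-registered symbols), so a primitive key throws a `TypeError`:

```javascript
const wm = new WeakMap();
const key = {};
wm.set(key, "value");

// The lookup methods exist...
console.log(typeof wm.get, typeof wm.has, typeof wm.delete); // function function function

// ...but there is nothing to enumerate or count.
console.log(typeof wm.keys, typeof wm.forEach, wm.size); // undefined undefined undefined

// Primitive keys are rejected.
let threwTypeError = false;
try {
  wm.set("employee-1", "value");
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
console.log(threwTypeError); // true
```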
WeakMap is available in all modern browsers.
It's worth mentioning that in addition to WeakMap, there's also WeakSet, which is also available in all modern browsers.
You could say that WeakSet is to Set as WeakMap is to Map.
For a WeakSet, the hidden field on each key is, conceptually, a boolean value rather than an object reference.
It indicates that the object is a member of that WeakSet.
This implementation makes WeakSet lightweight and very efficient when appropriate.
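One common use is marking objects without keeping them alive. In this sketch, the order objects and the `isValidOrder` function are hypothetical; a WeakSet remembers which orders have already passed validation, and membership disappears along with the order once nothing else references it:

```javascript
// Orders we've already validated. Membership doesn't keep an order alive.
const validated = new WeakSet();
let checks = 0; // counts how many times the (pretend) expensive check ran

function isValidOrder(order) {
  if (validated.has(order)) return true; // already checked, skip the work
  checks++;
  const ok = Number.isInteger(order.id) && order.total >= 0;
  if (ok) validated.add(order);
  return ok;
}

const order = { id: 7, total: 42 };
isValidOrder(order);
isValidOrder(order);
console.log(checks); // 1
```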
And again, no concerns about memory leaks.
Both WeakMap and WeakSet have been part of the JavaScript standard for a while now.
Now let's look at a much more recent addition to JavaScript.
A real weak reference.
First, we create a new object instance and the ref variable is a regular reference to it.
Obviously the GC cannot reclaim this object while this reference exists.
Now we use the new `WeakRef` constructor to create a weak reference to the same object.
We can use the `deref` method to get at the object that's weakly referenced.
If we now clear the ref variable so that it no longer references that object, the GC is free to reclaim it, even though the weak reference still exists.
If we wait a while, so that the GC has a chance to kick in, then we may find that the object has indeed been reclaimed.
And if that happens, then deref will return `undefined`, instead of that object reference.
If the GC doesn't happen, then deref will continue to return that reference.
By the way, this is a nice use case for the optional chaining operator (`?.`).
As I said, WeakRef is a new addition to JavaScript and was only added to the standard in 2021 (ES2021).
Despite this, it's already supported by effectively all modern browsers.
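A minimal sketch of the steps just described. The object's `name` property is arbitrary, and whether the timer callback sees the object or `undefined` depends entirely on whether the engine has collected it by then:

```javascript
let ref = { name: "Alice" }; // strong reference
const weak = new WeakRef(ref); // weak reference to the same object

// While a strong reference exists, deref() returns the object.
console.log(weak.deref() === ref); // true

// Optional chaining covers the case where deref() returns undefined.
const nameWhileAlive = weak.deref()?.name;
console.log(nameWhileAlive); // Alice

// Drop the strong reference: the object is now eligible for GC.
ref = null;

// Later, deref() returns either the object or undefined, at the engine's discretion.
setTimeout(() => {
  console.log(weak.deref()?.name ?? "collected");
}, 1000);
```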
Here's a simple use case for it.
Say we want to repeatedly update a DOM element.
For example, to display the current time.
We can simply hold a reference to this element, but that will prevent it from getting garbage collected if the display has changed and this element is removed from the browser's DOM.
Another approach is to search for it each and every time, say, using `document.querySelector`, and only update it if we find it.
But that adds search overhead, especially if the DOM is large, and in this example that search would run on every clock tick.
A weak reference solves this problem by enabling us to retain a connection to that element without preventing it from being garbage collected once it's removed from the DOM.
We will create a weak reference to that DOM element, and for every clock tick, we will use `deref` to get it.
If deref returns an element, then we'll update it with a new time.
If deref returns undefined, well, then we can stop.
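A minimal sketch of the clock, assuming an element with the hypothetical id `clock` and a one-second interval. The per-tick logic is split into its own `tick` function, which only needs the element to have a `textContent` property:

```javascript
// Update the element if it still exists; return whether to keep ticking.
function tick(elementRef, now = new Date()) {
  const element = elementRef.deref();
  if (element === undefined) {
    return false; // the element was garbage collected: stop
  }
  element.textContent = now.toLocaleTimeString();
  return true;
}

// Browser usage: hold the element weakly and stop the timer once it's gone.
function startClock(selector = "#clock") {
  const element = document.querySelector(selector);
  if (!element) return;
  const elementRef = new WeakRef(element);
  const timerId = setInterval(() => {
    if (!tick(elementRef)) clearInterval(timerId);
  }, 1000);
}
```

Until a GC actually runs, `deref()` may keep returning an element that has already been removed from the DOM, so the clock keeps updating a detached node for a while. A variation that also checks `element.isConnected` would stop sooner; that check is an addition, not part of the approach described above.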
It's important to note that MDN cautions against using WeakRef where it can be avoided, because of its indeterminate nature.
If and when a GC runs, and which objects it collects, are implementation dependent, as I described before.
I also want to briefly mention another API, called `FinalizationRegistry`, which was added to the JavaScript standard alongside WeakRef.
It provides a mechanism for performing extra cleanup after an object is garbage collected, by registering a callback to be invoked at some point after the object is reclaimed.
For example, say we have an object that contains a WebSocket connection or a database handle, and we want these to be released when the object is no longer in use.
We can use FinalizationRegistry as a fallback safety mechanism in case they aren't released explicitly by the user of that object.
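A minimal sketch of that fallback pattern. The `DbConnection` class and `openHandle` function are hypothetical; the handle stands in for a real WebSocket or database handle. Explicit `close()` is the primary path, and the registry only releases handles for connections that were collected without being closed:

```javascript
// Fallback cleanup: runs at some point after an unclosed DbConnection is
// garbage collected, if it runs at all. Timing is not guaranteed.
const registry = new FinalizationRegistry((handle) => {
  console.warn(`Connection ${handle.id} was not closed explicitly; releasing it.`);
  handle.release();
});

let nextId = 1;

// Hypothetical low-level resource. It must NOT reference the DbConnection,
// otherwise the connection could never become unreachable.
function openHandle() {
  const handle = {
    id: nextId++,
    open: true,
    release() {
      handle.open = false;
    },
  };
  return handle;
}

// Hypothetical wrapper that users are expected to close() explicitly.
class DbConnection {
  #handle = openHandle();

  constructor() {
    // Target: this connection. Held value: the handle. Unregister token: this connection.
    registry.register(this, this.#handle, this);
  }

  get isOpen() {
    return this.#handle.open;
  }

  close() {
    // The explicit path: release now and cancel the fallback.
    this.#handle.release();
    registry.unregister(this);
  }
}

const conn = new DbConnection();
conn.close();
console.log(conn.isOpen); // false
```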
As with WeakRef, the behavior of FinalizationRegistry is somewhat indeterminate and implementation dependent, because it depends on the garbage collector.
That's why FinalizationRegistry is not a replacement for explicit cleanup of scarce resources, such as database handles.
It's just a fallback, or fail-safe, mechanism that does the cleanup in case a bug in the code prevents the explicit cleanup from happening.
That is all.
So to summarize: as we saw, WeakMap is a great mechanism for avoiding memory leaks while also avoiding coupling between libraries and different parts of an application.
In other words, it can be a great replacement for map in many, many cases.
WeakRef enables you to retain access to an object for as long as it exists, without preventing it from being garbage collected, and FinalizationRegistry enables hooking into the GC for performing extra cleanup as a fallback mechanism.
But as I've said before both WeakRef and FinalizationRegistry should be used with caution because of their indeterminate nature.
So this concludes my presentation on JavaScript memory management and weak references.
If you want to contact me to discuss these JavaScript language features or any other aspect of Web development, feel free to connect with me over Twitter.
Thank you for joining me and goodbye.