Strength in weakness: JavaScript memory management and weak references

Dan Shappir at Global Scope 2022

Transcript
Slides

Hello, and welcome to my talk "strength in weakness: JavaScript memory management, and weak references".

In this talk, I will explain how JavaScript automatically reclaims unused memory using a mechanism called garbage collection.

I will also describe new JavaScript APIs that enable us to better interface with the garbage collection mechanism and to avoid memory leaks.

First, though, a few words about myself.

My name is Dan Shappir, and I've been doing Web development for a really long time.

I'm currently the performance tech lead at Next Insurance, an InsurTech unicorn aiming to revolutionize the way in which small businesses get insurance by doing it quickly and easily online.

My role at Next is to make sure our websites and Web apps load and execute as quickly as possible.

I'm also a host and panelist on the popular JavaScript Jabber podcast.

And I would love for you to listen to our show.

And I'm an invited expert to the W3C Web Performance Working Group where new performance APIs are created.

If you want to connect with me about Web development or JavaScript or Web performance, the best place to do it is on Twitter where my handle is @DanShappir.

Okay then let's start by talking about how JavaScript manages memory.

JavaScript uses an automatic memory management mechanism called garbage collection, or GC for short, to reclaim unused memory.

This means that while you do need to explicitly create new JavaScript objects using the `new` operator, or just as object literals, you don't need to delete them when they're no longer in use.

The JavaScript engine does this automatically for you.

It's important to understand that GC only reclaims memory that can no longer be used, not memory that is no longer needed.

I'll get into the details of what this distinction means later on in this presentation.

It's also important to note that the way in which GC works in each JavaScript engine is an implementation detail that is not specified in the language standard.

And finally, JavaScript doesn't provide APIs to force or prevent GC.

It happens if and when the JavaScript engine decides.

That's it.

That said, the language specification has been extended to provide new APIs for interfacing with the GC, and that's what we'll cover in this talk.

In order to understand these APIs and their purpose we'll start with a realistic example.

Let's say we're implementing a system for managing employees within an organization.

To that end, we use an employee library that provides employee type objects.

In this code snippet, we're instantiating a single employee instance.

Now let's say that our system needs to manage additional entities that are associated with employees, such as phones, computers, and cars.

We have additional libraries that we can import for these entities as well.

But how do we associate each such entity with an employee that uses it?

How do I tell the system "this is my phone"?

The most obvious solution is to extend the employee library and add methods for associating these items with the relevant employees.

But it turns out that there are some problems with this obvious approach.

First, can we even extend this library?

Maybe it's a third-party library that we're using.

Do we really want to fork it and support that fork going forward just in order to add these methods?

Also by adding item specific methods to that library, we've created tight coupling between it and the various entity libraries.

Whenever we add or remove or replace an entity library, we will potentially need to modify the employee library as well.

So let's try another approach and see if it's better.

Since we're using JavaScript, we can just tack on new properties on the employee objects without needing to modify the implementation at all.

After all, JavaScript lets us modify the structure of any object, whichever way it's created. Dynamic programming for the win.

But it turns out that there are downsides to this approach as well.

Adding properties to objects willy-nilly can easily cause collisions.

For example, what happens if a new version of this library comes out that also implements a phone property?

That version will likely conflict with our own implementation.

Also putting data on properties in this way breaks encapsulation, since any part of the code can do anything with them, can read them, modify them, whatever.

That is obviously a dangerous thing to do.

And this approach is antagonistic to TypeScript, which doesn't like such a fast and loose approach to object structure, precisely because it wants to save us from ourselves.

Okay, then let's try a different method.

Instead of modifying the employee object to reference a phone object somehow, let's create an independent map collection to, well, map one to the other.

This way we can get from an employee to their phone without needing to modify either one of these entities.

Now we haven't created any coupling, we don't care if either library changes, and TypeScript likes us.

It's a perfect solution.

Or is it?

Turns out that there is a problem with this solution as well.

Say we have an employee object that is referenced by an employee variable, and a phone object that is referenced by a phone variable.

And a 'phones' map, which uses employees as keys and phones as values.

And we've added this employee and phone into the map.

Now let's say that the employee has left the organization and has taken their phone with them.

So we set both variables to null, and since neither object is needed anymore, we expect the GC to reclaim their memory automatically.

But remember what we said about the GC: it only reclaims memory that can no longer be used, and these objects can, technically, still be used.

That's because we can still get to them through the map object itself.

That is, from the `phones` variable to the map, and from the map to the employee instance and to the phone instance.

As a result, unless we also explicitly remove these objects from the map, the GC cannot reclaim them and we get a memory leak.

Precisely the problem that GC was intended to solve.
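
As a rough sketch of that leak, assuming simplified stand-ins for the `Employee` and `Phone` classes (hypothetical placeholders, not the real libraries), the following shows that both objects remain reachable through the Map even after our own variables are cleared:

```typescript
// Hypothetical stand-ins for the employee and phone libraries.
class Employee {
  name: string;
  constructor(name: string) { this.name = name; }
}
class Phone {
  number: string;
  constructor(number: string) { this.number = number; }
}

const phones = new Map<Employee, Phone>();

let employee: Employee | null = new Employee("Dan");
let phone: Phone | null = new Phone("555-0100");
phones.set(employee, phone);

// The employee leaves, so we drop our own references...
employee = null;
phone = null;

// ...but the Map still holds strong references to both objects,
// so they are still reachable and the GC cannot reclaim them.
const stillReachable: string[] = [];
phones.forEach((p, e) => stillReachable.push(`${e.name}: ${p.number}`));
console.log(phones.size, stillReachable);
```

The entry only goes away if someone remembers to call `phones.delete(...)` with the right key, which is exactly the manual bookkeeping that GC is supposed to spare us.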

Okay then, let's fix this by explicitly removing them from the map.

For that, we will need to implement a global cleanup function that removes the unneeded employee object from all the maps in which it might be used as a key.

But that means that instead of relying on the GC for automatic memory reclamation, we now need to manage employee object lifetime explicitly by calling that cleanup function, whenever we no longer need an employee instance.

We've effectively eliminated a main benefit of using GC.

The fact that it's automatic.

It's become manual.

Also, this cleanup function must be aware of every such map in the application.

Whenever we add or remove such a map anywhere in the code, we will need to update this function accordingly.

In other words, it's tightly coupled with everything.

This is definitely not a desirable solution.

So is there anything better that we can do instead?

Turns out that there is something better, something that can finally solve all our problems.

In fact, all we need to do is change the map to a WeakMap.

That's it.

Now we don't need that cleanup function anymore.

Now GC just works as intended.

But how does this magic happen?

For that we need to understand how WeakMap works.

We create a WeakMap, just like creating a map only using the WeakMap constructor instead.

But when we add a key/value pair into it, instead of referencing both of them, the WeakMap conceptually creates a hidden field on the key object that directly references the value object.

And this field is hidden in such a way that only that specific WeakMap can see it.

This means that when we use the WeakMap to look up a phone for an employee, instead of using some internal dictionary or some other data structure, it simply uses the value of that hidden field.

If you remove the reference to the phone instance, it cannot be GC'd, because it is still referenced by the employee instance, using that hidden field.

This is exactly what we want because the employee still uses that phone, even if we don't hold a reference to that phone.

But when we remove the reference to the employee instance, then there are no references to it anymore, because the WeakMap itself does not hold a reference to its keys.

So it can in fact, be reclaimed by the GC.

And when the employee object is gone, there are now no references to the phone object either.

And it can be reclaimed by the GC as well.

You can say that we are back to the option of tacking a custom property onto the employee object, but without any of the downsides that we saw when we examined that approach.

Bottom line, no memory leak and no need for a cleanup function, everything just happens automatically.

Or, you might say, 'automagically', exactly how we want the GC to work.
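
A minimal sketch of this approach, again assuming hypothetical stand-in `Employee` and `Phone` classes; the helper names `assignPhone` and `phoneOf` are illustrative only. It shows that the association lives entirely outside the employee object, so nothing is tacked onto it:

```typescript
// Hypothetical stand-ins for the employee and phone libraries.
class Employee {
  name: string;
  constructor(name: string) { this.name = name; }
}
class Phone {
  number: string;
  constructor(number: string) { this.number = number; }
}

// Only code that can see `phones` can read or change the association.
const phones = new WeakMap<Employee, Phone>();

function assignPhone(employee: Employee, phone: Phone): void {
  phones.set(employee, phone);
}

function phoneOf(employee: Employee): Phone | undefined {
  return phones.get(employee);
}

const dan = new Employee("Dan");
assignPhone(dan, new Phone("555-0100"));

const danPhone = phoneOf(dan);
// The Employee object itself is unchanged: no extra property was added to it.
const employeeKeys = Object.keys(dan);
// An employee that was never registered simply has no phone.
const strangerPhone = phoneOf(new Employee("Someone else"));
console.log(danPhone, employeeKeys, strangerPhone);
```

Once nothing else references `dan`, both the employee and its phone become eligible for collection, with no cleanup function involved. When that actually happens is up to the engine.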

It's important to understand that the WeakMap isn't really a collection in that you can't get its members, either the keys or the values, from the WeakMap instance.

That's because, as I explained, it doesn't reference either.

Instead, it's just a pipe directly from the keys to the values using hidden fields.

For that reason, it doesn't have methods like 'forEach' or 'keys' or others.
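
A short sketch of what that means in practice. The variable names are illustrative. It checks that the enumeration members of Map are absent from WeakMap, and that a primitive such as a string is rejected as a key:

```typescript
const scores = new WeakMap<object, number>();

// None of Map's enumeration members exist on WeakMap.
const enumerationMembers = ["forEach", "keys", "values", "entries", "size"];
const presentOnWeakMap = enumerationMembers.filter((m) => m in WeakMap.prototype);

// Only lookups by a key we already hold are possible.
const key = {};
scores.set(key, 1);
const found = scores.has(key);

// A string is not an object, so it is rejected at runtime.
// (The cast is only there to get past the TypeScript compiler.)
let rejectedPrimitive = false;
try {
  scores.set("not an object" as unknown as object, 2);
} catch (err) {
  rejectedPrimitive = err instanceof TypeError;
}
console.log(presentOnWeakMap, found, rejectedPrimitive);
```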

WeakMap is available in all modern browsers.

It's worth mentioning that in addition to WeakMap, there's also WeakSet, which is also available in all modern browsers.

You can say that WeakSet is to Set as WeakMap is to Map.

For a WeakSet the hidden field on the keys is a boolean value, rather than an object reference.

It indicates that that object is a part of that WeakSet.

This implementation makes WeakSet lightweight and very efficient when appropriate.

And again, no concerns about memory leaks.
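
As an illustration, here is a small sketch that uses a WeakSet to remember which objects have already been handled. The `handleOnce` function and the request object are hypothetical names made up for this example:

```typescript
// Tracks objects we have already processed without keeping them alive.
const handled = new WeakSet<object>();

// Returns true the first time an object is seen, false afterwards.
function handleOnce(request: object): boolean {
  if (handled.has(request)) return false; // already processed, skip
  handled.add(request);
  return true;
}

const incoming = { path: "/employees" };
const first = handleOnce(incoming);
const second = handleOnce(incoming);
console.log(first, second);
```

Because membership does not keep the object alive, there is nothing to remove from the set when a request object is no longer referenced anywhere else.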

Both WeakMap and WeakSet have been part of the JavaScript standard for a while now.

Now let's look at a much more recent addition to JavaScript.

A real weak reference.

First, we create a new object instance and the ref variable is a regular reference to it.

Obviously the GC cannot reclaim this object while this reference exists.

Now we use the new WeakRef constructor to create a weak reference to the same object.

We can use the `deref` method to get at the object that's weakly referenced.

If we now clear the ref variable so that it no longer references that object, the GC is free to reclaim it, even though the weak reference still exists.

If we wait a while, so that the GC has a chance to kick in, then we may find that the object has indeed been reclaimed.

And if that happens, then deref will return `undefined`, instead of that object reference.

If the GC doesn't happen, then deref will continue to return that reference.

By the way, you can see this is a nice use case for the optional chaining operator.

As I said, WeakRef is a new addition to JavaScript and was only added to the standard in 2021.

Despite this, it's already supported by effectively all modern browsers.

Here's a simple use case for it.

Say we want to repeatedly update a DOM element.

For example, to display the current time.

We can simply hold a reference to this element, but that will prevent it from getting garbage collected if the display has changed and this element is removed from the browser's DOM.

Another approach is to search for it each and every time, say, using `document.querySelector`, and only update it if we find it.

But that adds the search overhead, especially if the DOM is large and that search will be executed for every clock tick in this example.

A weak reference solves this problem by enabling us to retain a connection to that element without preventing it from getting GC'd when it's removed from the DOM.

We will create a weak reference to that DOM element, and for every clock tick, we will use `deref` to get it.

If deref returns an element, then we'll update it with a new time.

If deref returns undefined, well, then we can stop.
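
The clock example could be sketched roughly as follows. To keep it runnable outside a browser, it assumes a minimal `TextTarget` interface (a hypothetical stand-in for a DOM element; anything with a `textContent` property fits), and the function names are illustrative:

```typescript
// In a browser this would be a DOM element; here any object with textContent works.
interface TextTarget {
  textContent: string | null;
}

// Returns a tick function that reports whether the target was still alive.
function makeClockTick(target: TextTarget): () => boolean {
  const weak = new WeakRef(target); // does not keep `target` alive
  // The returned closure only captures `weak`, never `target` itself.
  return () => {
    const el = weak.deref();
    if (el === undefined) return false; // collected: nothing left to update
    el.textContent = new Date().toLocaleTimeString();
    return true;
  };
}

// Runs the tick once per second and stops as soon as the target is gone.
function startClock(target: TextTarget): void {
  const tick = makeClockTick(target);
  const timer = setInterval(() => {
    if (!tick()) clearInterval(timer);
  }, 1000);
}

// Demo with a plain object standing in for the element.
const display: TextTarget = { textContent: null };
const tickOnce = makeClockTick(display);
const stillAlive = tickOnce();
console.log(stillAlive, display.textContent);
```

In a page you would pass the result of `document.querySelector(...)` to `startClock`. How soon the timer stops after the element is removed depends on when the engine's GC runs, if it runs at all.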

It's important to note that MDN cautions about the use of WeakRef because of its indeterminate nature, and because if and when the GC runs, and which objects it collects, are implementation dependent, as I described before.

I also want to briefly mention another API, which is called `FinalizationRegistry`, which was added alongside WeakRef to the JavaScript standard.

It provides a mechanism for performing extra cleanup during GC, by registering a callback to be invoked when an object is reclaimed.

For example, say we have an object that contains a WebSocket connection or a database handle, and we want these to be released when the object is no longer in use.

We can use FinalizationRegistry as a fallback safety mechanism in case they aren't released explicitly by the user of that object.

As with WeakRef, the behavior of FinalizationRegistry is somewhat indeterminate and implementation dependent, because it depends on the garbage collector.

That's why FinalizationRegistry is not a replacement for explicit cleanup of scarce resources, such as database handles.

It's just a fallback or fail-safe mechanism to do the cleanup in case there's a bug in the code so that the explicit cleanup does not happen.
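
A minimal sketch of that fallback pattern, assuming a hypothetical `Handle` interface for the scarce resource and a hypothetical `Connection` wrapper class. Explicit `close()` remains the primary path, and the registry only covers the case where it was forgotten:

```typescript
// Hypothetical scarce resource, e.g. a wrapper around a socket or database handle.
interface Handle {
  release(): void;
}

// Fallback only: the callback runs some time after a Connection is collected,
// or possibly never.
const leakedHandles = new FinalizationRegistry<Handle>((handle) => {
  handle.release();
});

class Connection {
  private handle: Handle;
  private closed = false;

  constructor(handle: Handle) {
    this.handle = handle;
    // The held value is the handle, not `this`: holding `this` would keep
    // the Connection alive and the callback would never fire.
    leakedHandles.register(this, handle, this);
  }

  // The explicit cleanup path that callers are expected to use.
  close(): void {
    if (this.closed) return;
    this.closed = true;
    leakedHandles.unregister(this); // the fallback is no longer needed
    this.handle.release();
  }
}

let releases = 0;
const conn = new Connection({ release: () => { releases += 1; } });
conn.close();
conn.close(); // the second call is a no-op
console.log(releases);
```

Unregistering on `close()` means the handle is not released a second time by the registry callback later on.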

That is all.

So to summarize, as we saw, WeakMap is a great mechanism for avoiding resource leaks while preventing coupling between libraries and different parts of an application.

In other words, it can be a great replacement for Map in many, many cases.

WeakRef enables you to retain access to an object for as long as it exists, without preventing it from being garbage collected, and FinalizationRegistry enables hooking into the GC for performing extra cleanup as a fallback mechanism.

But as I've said before both WeakRef and FinalizationRegistry should be used with caution because of their indeterminate nature.

So this concludes my presentation on JavaScript memory management and weak references.

If you want to contact me to discuss these JavaScript language features or any other aspect of Web development, feel free to connect with me over Twitter.

Thank you for joining me and goodbye.

Strength in Weakness

JavaScript memory management and weak references

Performance Tech Lead at Next Insurance
Host on the JavaScript Jabber podcast
Invited Expert to the W3C Web Performance Working Group
@DanShappir

JavaScript Memory Management

JavaScript uses Garbage Collection for memory management
Automatically reclaims memory that can no longer be used - not memory that is no longer needed
Exact method of operation is implementation dependent (Not specified by ECMAScript standard)
No methods provided to force / prevent collection

New APIs now available for interfacing with the Garbage Collector

Scenario: App for Managing Organizations

Using an employee library in a project

import { Employee } from "employee"
// ...
const employee = new Employee('Dan', 'Shappir', 'speaker')

Requirement: Add Entities

Such as: phone, computer, car

import { Employee } from "employee"
import { Phone } from "phone"
// ...
const employee = new Employee('Dan', 'Shappir', 'speaker')
const phone = new Phone(972, 545404209)
// How to associate employee with phone?

Option #1: Extend the Library

Add methods as required

import { Employee } from "employee"
import { Phone } from "phone"
// ...
const employee = new Employee('Dan', 'Shappir', 'speaker')
const phone = new Phone(972, 545404209)
employee.setPhone(phone) // Add methods

Cons

What if employee is a 3rd-party app?
Creates tight coupling

Option #2: Tack on Properties

Add properties to objects

import { Employee } from "employee"
import { Phone } from "phone"
// ...
const employee = new Employee('Dan', 'Shappir', 'speaker')
const phone = new Phone(972, 545404209)
employee.phone = phone // Extend object (it’s JavaScript 😄)

Cons

Collisions
Breaks encapsulation
Antagonistic to TypeScript

Option #3: Use Map

External to objects → no coupling

import { Employee } from "employee"
import { Phone } from "phone"
// ...
const employee = new Employee('Dan', 'Shappir', 'speaker')
const phone = new Phone(972, 545404209)
const phones = new Map<Employee, Phone>()
phones.set(employee, phone)
// ...
const p = phones.get(employee)

The perfect solution ... or is it?

Cons

Rectangle labelled "employee" with an arrow pointing at an oval labelled "‘Dan’, ..."
Rectangle labelled "phones" with an arrow pointing at diamond labelled "Map"
Rectangle labelled "phone" with an arrow pointing to an oval labelled "972, ..."
An arrow points from "Map" to "‘Dan’, ..." and "972, ..."
In large text the word Leak

Explicit Cleanup Required

Highly coupled function which must be called

export function remove(employee: Employee){
   phones.delete(employee)
   computers.delete(employee)
   cars.delete(employee)
}

Map → WeakMap

Trivial change

import { Employee } from "employee"
import { Phone } from "phone"
// ...
const employee = new Employee('Dan', 'Shappir', 'speaker') 
const phone = new Phone(972, 545404209) 
const phones = new WeakMap<Employee, Phone>()
phones.set(employee, phone)
// ...
const p = phones.get(employee)

How WeakMap Works

Created like Map

Rectangle labelled "employee" with an arrow pointing to an oval labelled "‘Dan’, ..."
Rectangle labelled "phones" with an arrow pointing to a diamond labelled "WeakMap"
Rectangle labelled "phone" with an arrow pointing to an oval labelled "972, ..."

let employee = new Employee('Dan', 'Shappir', 'speaker')
let phone = new Phone(972, 545404209)
const phones = new WeakMap<Employee, Phone>()

How WeakMap Works

Hidden reference from key to value

Diagram as in the previous slide, with a dotted-line arrow pointing from "'Dan', ..." to "972, ..." across the diamond labelled "WeakMap".

let employee = new Employee('Dan', 'Shappir', 'speaker')
let phone = new Phone(972, 545404209)
const phones = new WeakMap<Employee, Phone>()
phones.set(employee, phone)

How WeakMap Works

Keeps value alive

Diagram as in the previous slide, with a red cross over the line from "phone" to "972, ..."

let employee = new Employee('Dan', 'Shappir', 'speaker')
let phone = new Phone(972, 545404209)
const phones = new WeakMap<Employee, Phone>()
phones.set(employee, phone)
phone = null

How WeakMap Works

Until neither can be reached

Diagram as in the previous slide, with a red cross also over the line from "employee" to "'Dan', ..."

let employee = new Employee('Dan', 'Shappir', 'speaker')
let phone = new Phone(972, 545404209)
const phones = new WeakMap<Employee, Phone>()
phones.set(employee, phone)
phone = null
employee = null

How WeakMap Works

Now both can be GC-ed

Similar illustration to before. The arrows from employee and phone are no longer there. A red cross is now over "'Dan', ..." and "972, ..."

let employee = new Employee('Dan', 'Shappir', 'speaker')
let phone = new Phone(972, 545404209)
const phones = new WeakMap<Employee, Phone>()
phones.set(employee, phone)
phone = null
employee = null

No memory leak and no need for remove()

WeakMap Isn’t Really a Collection

Can’t get members from WeakMap instance
These methods don’t exist:
- forEach(),keys(),values(),entries()
It’s a pipe from keys to values
Keys can only be objects because of hidden field

Also WeakSet

WeakSet is to WeakMap as Set is to Map
Like WeakMap, WeakSet isn’t really a collection
- The hidden field is just a Boolean flag
Lighter-weight and more efficient if you don’t need to iterate over members

WeakRef - A real weak reference!

After GC .deref() may return undefined

let ref = {value: 42}
const wr = new WeakRef(ref)
console.log(wr.deref() === ref) // true
// ...
ref = null
// ...
console.log(wr.deref() !== undefined) // maybe true, maybe false
console.log(wr.deref()?.value) // use-case for optional chaining

Using WeakRef

Part of ES2021; supported by modern browsers
Sample use-case:
- Keep reference to DOM element without preventing it from being removed and GC-ed
- Also instead of searching for it again and again

MDN on WeakRef: avoid if possible (indeterminate and implementation dependent)

Bonus: FinalizationRegistry

Hook into the GC to perform extra cleanup
Part of ES2021; supported by modern browsers
May never be invoked
Together with WeakRef makes it possible for library authors to mitigate impact when users forget to release / clear / unsubscribe / remove / unregister

Summary

WeakMap → Great for avoiding coupling. Reduces risk of resource leaks
WeakRef → Provides access to objects for as long as they exist. Indeterminate and implementation dependent. Use with discretion
FinalizationRegistry → Enables hooking into the GC. Indeterminate and implementation dependent. Use with discretion