Increasing Array’s immutable interface

Ashley Claymore at Global Scope 2022

Transcript
Accessible Slides

Hello, welcome to my talk.

Let's start with some basics.

Arrays in JavaScript.

They're mutable, at least by default.

This means that they have methods like sort, and those methods are perfectly fine to actually mutate the original.

So even though you get back a sorted array.

You're getting back the same array, you're not getting back a new array.

If you want to sort an array without actually modifying the original, you've gotta create a copy first and then work with that copy.

So you're still mutating beyond mutating a copy.

There's a JavaScript proposal called 'records and tuples' [prounces it 'tupples'] or 'records and tuples' [pronounces it 'two-ples'] depending on your tuple/tuple preference.

And tuples look a lot like arrays, except they have a hash at the front, and this means that they are immutable.

And this means that in the design of them, it would be a little bit odd to have a method, like 'sort', because 'sort' looks and sounds like array sort, but it couldn't be mutating because tuples are completely immutable.

So instead, and you're still, sorting is still a useful thing, so it has the sorting method, but it has a different name-it's called 'toSorted'.

And just like sort, you get back a sorted sorted touple, just like you'd get back a sorted array, but this is a copy.

And the original one is exactly as it was before.

I remember back when I was looking at the records and tuple proposal a while back, back before I was working at Bloomberg and I was looking at it and I actually thought that seeing those extra kind of methods at the time, it had sorted and popped and pushed like, oh, some of those actually look like they'd be quite useful on arrays.

And it turns out after I joined Bloomberg and talking to people that work on this the records on tuple proposal that they had thought the exact same thing, and they'd also thought it's quite useful to have those methods on array.

So I was then lucky enough to start working on this proposal and feel really fortunate that something that I thought would be interesting way back, you know, a while back I'm now getting to work on which makes me rather happy.

And the reason why it would be quite nice to have those methods on tuple is because if you don't do that, it means that the tuple prototype has a large overlap of array.

But then it has this kind of a few extra methods that kind of hang off over to the side where they only exist on tuple, and they're not on array.

If you actually take those methods and then add them arrays, then you expand the array prototype to kind of subsume the tuple prototype, meaning that everything you have on tuple is also on array.

In other words, the tuple prototype is a subset of the array prototype.

And to be clear, I'm talking about the, kind of the names of the methods.

The actual functions on the tuple prototype are different functions designed to work on tuples, not arrays, but conceptually, they are-one is a subset of the other.

To kind of further explain why this is a good idea, is, that if you don't do this, if you wanna write a function that wants to work on arrays and tuples, you know, it's kind of a useful utility and you want people to be able to pass in arrays and tuples and not have to worry too much.

You'd have to explicitly add code to make sure you handle the fact that these are slightly different.

Whereas instead, where you're doing this kind of subsetting, it means if you stick to that subset-so if you treat the inputs as if they're tuples-it means that in a lot of cases, that it would also work just as well if someone passes in an array and you don't have to worry too much about it.

So here I can say, list toSorted and it doesn't matter if that list is a tuple or array.

Either way, it's, it's gonna work.

Going off on a little important detail.

I wanna talk about Web compatibility and kind of JavaScript's motto to not break the Web.

And this impacts proposals to the JavaScript language, you know, the things that TC39 are working on.

Most recently, this impacted a proposal called array grouping.

That proposal was adding a method called groupBy.

It's kind of a cool method, you should go check it out.

And when this was kind of designed and worked on, and then when it got to a stage of being implemented in browsers and started to be shipped out, the some tests detected that a bunch of websites they broke, right?

They saw bits of content disappear and there was some investigation and they worked out that those websites were using an old version of something called sugarJS and sugarJS, it turns out, also adds a method to array called groupBy.

But crucially two things.

One it's groupBy is a bit different to the standardized groupBy it kind of accepted had a few more overloads.

There were some extra things you could do with it.

And, importantly, it also in these old versions, it would only add these methods if they weren't already there, which while it may seem like a good idea, or if it's already there, leave it alone.

That's actually not good because it means that when the browsers implemented this method, this sugarJS version would see the method was already there and not add it even though actually it's a different method.

It has different behavior.

So the rest of the code that was relying on the sugarJS version, and that previously worked fine, that suddenly was getting different behavior and their websites broke.

So the proposal had to kind of roll back and not be shipped.

What they did, the proposal champions kind of went back to their committee and looked at and looked at different, just coming with a different name for this method.

There were just too many websites that were impacted by this.

The websites can't be updated.

So instead the proposal changes and they're now going for kind of the same behavior, but they're gonna call it "group" instead.

The theory being, hopefully that is Web compatible.

Worth mentioning that in newer versions of sugarJS, they've actually removed that check and they just always override the method.

So you get a kind of, much more consistent, predictable behavior, but there was, yeah, these websites were using old diversions of sugarJS that had that kind of issue of only installing it, if it wasn't already there.

So rather than take the things on tuple and put, and like put them on array to kind of overlap them, that's a bit risky because it may take a bit of time to work on record and tuple and if you then afterwards, add things to array, it then may turn out that they're not compatible.

Like maybe that these things already exist on certain websites.

And then they, they break in the same type of way as the sugarJS groupBy problem.

So actually it's much safer to put these things on array first, design them on array, make sure they work on array, make sure they're compatible on array.

Really get them solid there first.

And then proceed of record and tuple and then just follow suit, because tuple doesn't have this Web compatibility issue, cuz it doesn't exist.

You know, it can kind of call the methods, whatever they want pretty much.

So that kind of idea of focusing just on what expanding the array's immutable interface, that's poured off and was worked on as a separate proposal.

And as I said earlier, I got to join and work as a co-champion on that proposal.

So while we're working on this my co-champion Robin, there was quite a lot we still had to do, even though this proposal was much smaller than records and tuple there's, you know, it was still quite a bit to work out like what methods we adding to array and what should their names be.

And then other things kind of subtle up things that weren't obviously at the beginning, but kind of cropped up in conversations.

Like what should we do with handling symbol dot species?

And what do we do about when you have holes in arrays?

So that kind of the design of this, we, first of all, let's like take everything that's currently on array and also on typed arrays.

And really let's just focus on the things that have a mutating behavior, because all the others, they should just directly port over to tuple as they are.

And when you do that, you know, there's like about 10 methods and adding copies like slightly different versions of all these 10, that seems like quite a lot of things to add to array.

You know, it could be quite bloated.

So we wanted to reduce that down.

One of the things we did was looked at kind of a corpus of GitHub repositories, about a million GitHub repositories that have some JavaScript in them.

And then look at how common these methods turn up.

And that kind of shows that some of like some like sorting very, very common, almost half.

Other things like using fill and copyWithin much rarer.

And then using that bit of research and also the kind of observation that some of these methods like shifting, popping and pushing, you can actually already do those in a non mutating way, if you just use some of the existing array methods like slicing and concat.

That left the reverse method, sorting and splicing.

And technically it also left unshift as a way of pushing kind of something onto the front.

But actually once you have the non mutating splice method, that gives you a way of doing unshift as a non mutating method.

So we kind of took doing all of this, this, let us narrow things down to just three methods.

And actually the proposal added a fourth and it has a method `with`.

This isn't based on a mutating method like the rest instead, this is the kind of method, the non mutating method form of array[index] assignment.

So we've got our kind of four things.

But we have to work out what names should they have and.

And, originally, when these were in the records and tuple proposal, they were kind of just the past tense.

So like sorted and popped and pushed and reversed.

And the feedback we got on that is two.

One, actually, when you're looking at other languages like Swift and Java and Python, the past tense is good because they also have that, this pattern of sort-mutate, sorted-it doesn't mutate.

So, okay.

That, that past tense is good, but we've got feedback that it could still be a bit confusing when talking about these things, if you only have that past tense and adding a prefix may really help make it much clearer, this kind of distinguishment between the two.

So we were looking at prefixes and along with what we decided to go with with the 'to' prefix we also looked at 'with' about that did not get much love.

Somewhat of a joke in the JavaScript community is that array has slice and splice and which one's, which, and it, you could think that, oh, you're adding a third one "toSpliced"-isn't this just making things even more complicated?

And actually I think perhaps not, because when you're editing and I type in splice, when you get your kind of auto complete popping up, if you have this third option come up, that is maybe a little subconscious hint that "Hey, I've got splice and toSpliced".

That's a little hint that splice is mutating and slice isn't mutating.

So slice is "I'm getting a copy.

I'm taking a slice out of an array as a copy" and splice is mutating or splicing stuff out or into the actual original.

And that's why you have toSpliced.

So I actually think this helps a little bit.

Naming kind of the setter methods decided, and the names of the methods decided we know have to actually kind of write this specification text of what exactly should the algorithm be of these methods.

And initially you could think, surely that's just, you take what splice and doing splice reverse would do-splice would just give you a copy, then you can reverse that copy.

But actually when we're talking about this.

No, we, we did stuff a little bit differently.

So if we look at the kind of rough pseudo ish code of what the specification says we actually do, there are a few things that we spend a bit of time discussing.

One of them was what should you do about what you, what you should do about Symbol.species?

And this is about handling subclass constructors.

And this has been in the language since about year six.

But one of the issues with it has been with people implementing JavaScript that makes these these methods really hard to optimize because they, they create a lot of dynamic behavior.

So there's been kind of a steer to move away from this symbol dot species behavior.

And there wasn't precedent for doing this.

So.

this was quite something we actually had to discuss with TC39 with the rest of the committee to go, are we actually, oh, we've been talking about this, but are we actually gonna go in this new direction with new proposals and the consensus was yes.

So we actually always create a new array.

We don't create these subclasses.

And that means in the loop underneath engines can make more assumptions because they know that they're always working on their built in standard arrays and not kind of exotic subclasses.

Another thing is not checking for holes.

And this does have precedent like newer array methods, like dot includes also don't check for holes.

For those who don't know holes are when, even though you're kind of accessing an array, within the kind of range between zero and length, it doesn't actually have a property on the object.

It's kind of this kind of odd case that you can end up with with arrays.

And again, this makes working qith arrays just a little bit more awkward.

And newer methods, they ignore that kind of the fact that race can have these holes.

So dot includes will just search through the array ignoring if certain places are undefined, because they're actually storing undefined or they're undefined, because there's nothing there at all its a whole.

But there wasn't precedent of whether this should happen in methods that return copies.

dot includes just returns true or false, its not returning a copy.

So there's some discussion on whether we should continue this precedent of not checking for holes, even when you are returning copies.

And what does this actually look like?

So if I have a subclass of array, you know, MyArray extends array and I create it.

I put a few things in, but then I'm gonna push up its length to six, which means I'm gonna have a few holes in it.

If I slice and reverse that what I get back is a new array, but it's still a subclass of MyArray.

Also, it's still gonna keep, it's still gonna be a little bit odd, like the original, and it's still gonna have those holes in it.

Whereas when you do use this new toReverse method, the holes just get treated as if they're, they are properties with undefined and you then still get the reverse behavior.

And also you are always getting back a standardized array.

You're not getting back, you're not getting back a subclass still.

One other thing that came up after it actually got to stage three.

When V8 were implementing this, they spotted something where the way we'd written the text, made it again a bit hard for them to optimize.

And this was for typed arrays.

Because with typed arrays when you are adding something in because typed arrays can only store numbers or bigInt, depending on the type of array, it means that they take that value and then they try and convert it into a number or a bigInt, and if that value is an object, it can have kind of special-you can kind of intercept it, doing that, where if you implement the kind of valueOf, or symbol, to primitive hooks, and you add those onto your object.

And what that meant was that as V8 would be looping over the typed array, at some point in that loop, you could suddenly inject a bit of code and that makes it hard for them to optimize that loop.

So instead they suggested we change the specification slightly and we just pump that up and we do that that kind of dynamic behavior immediately.

So we still do it.

We kind of get it over and done with, before they start the loop.

Then while they're in that loop, they can be much more sure about the fact that nothing else is changing.

Cause we're never gonna go back into running other code.

We're only gonna be running kind of V8's kind of loop code.

So, you know, that's something I know that I wasn't really aware of, and it's, I, I think this just shows Why kind of so much effort goes into these proposals because just small changes in specification, you know, do have kind of big impacts on setting precedent, following precedent, the optimizer how optimized these methods can be.

So that's what I have today and I would love it if people go check out the proposal, you can read the specification kind of read kind of some of our designs.

There's also polyfills in core JS and ES-shims.

So you can install those and use those Igalia,they also implemented this in JavaScript core and Firefox sponsored by Bloomberg.

And because it's in JavaScript core, it means that it's in bun and it's also currently in Safari Tech Preview.

In Firefox, it's behind a compile-time flag, so you can use it in Firefox, but you are going to have to build Firefox yourself.

And that's all I have for you today.

Thank you so much for listening.

Talk you soon.

Increasing Array's Immutable Interface

Web Directions Global Scope 2022 July 22, 2022

Ashley Claymore
Software Engineer
JavaScript Infrastructure & Tooling

TechAtBloomberg.com

WHOAMI

Ashley Claymore
Software Engineer at Bloomberg
@acutmore
TC39 Delegate

Mutating Arrays

https://doesitmutate.xyz

const names = ["Bob", "Alice"];
Object.isFrozen(names); // false
names.sort(); // ["Alice", "Bob"]
names[0]; // "Alice"

Mutating Arrays

https://doesitmutate.xyz

const names = ["Bob", "Alice"];
Object.isFrozen(names); // false
[...names].sort(); // ["Alice", "Bob"]
names[0]; // "Bob"

Tuples

https://github.com/tc39/proposal-record-tuple

const names = #["Bob", "Alice"];
Object.isFrozen(names); // true
names.sort(); // TypeError! .sort is not a function

Tuples

https://github.com/tc39/proposal-record-tuple

const names = #["Bob", "Alice"];
Object.isFrozen(names); // true
names.toSorted(); // #["Alice", "Bob"]
names[0]; // "Bob"

Tuple.prototype.toSorted

Array.prototype.toSorted

Array.prototype.* 
Tuple.prototype.*

Venn diagram shows Tuple.prototype is largely a subset of Array.prototype

Array.prototype.* 
Tuple.prototype.*

Venn diagram shows Tuple.prototype is a subset of Array.prototype

Venn diagram from previous slide. In the area solely representing Array.tuple are "push, sort, unshift, unshift (shift?), pop". In the subset representing Tuple.prototype are "slice, with, map, toSorted, flatMap, filter "

// Before
function sortByAge(list) {
    const comparator = (a, b) => a.age - b.age;
    if (Tuple.isTuple(list)) {
        return list.toSorted(comparator);
    } else {
        return [...list].sort(comparator);
    }
}

// After
function sortByAge(list) {
    return list.toSorted((a, b) => a.age - b.age);
}

A tale on Web Compatibility

“don’t break the web”

Screenshot of Internet Explorer 3 https://www.webdesignmuseum.org/old-software/web-browsers/internet-explorer-3-0

A tale on Web Compatibility

“don’t break the web”

https://github.com/tc39/proposal-array-grouping

Array.prototype.groupBy

// sugar.js v1.3.9
if (!extendee[name]) {
    defineProperty(extendee, name, method);
}

A tale on Web Compatibility

“don’t break the web”

https://github.com/tc39/proposal-array-grouping

Array.prototype.groupBygroup

// sugar.js v1.3.9
if (!extendee[name]) {
    defineProperty(extendee, name, method);
}

A tale on Web Compatibility

“don’t break the web”

https://github.com/tc39/proposal-array-grouping

Array.prototype.groupBygroup

// sugar.js v1.4.0
if (!extendee[name]) {
    defineProperty(extendee, name, method);
}

Tuple.prototype.sorted?
Tuple.prototype.toSorted
Tuple.prototype.withSort?

Array.prototype.sorted?
Array.prototype.toSorted?
Array.prototype.withSort?

https://github.com/tc39/proposal-change-array-by-copy/

Change Array by Copy

Which methods to add?
What names?
Lookup @@species?
What to do about holes?

Venn digram of Array and TypedArray. Array only methods include

.pop
.push
.shift
.splice
.unshift

.concat
.flat
.flatMap

TypedArray specific methods are

.set
.subarray

Common methods are

.copyWithin
.fill
.reverse
.sort

.forEach
.includes
.indexOf
.join
.keys
.lastIndexOf
.map
.reduce

.at
.entries
.every
.filter
.find
.findIndex

.reduceRight
.slice
.some
.values

Diagram shows they methods of each that can mutate. Array methods

.pop
.push
.shift
.splice
.unshift

TypeArray method .set. and shared methods

.copyWithin
.fill
.reverse
.sort

Table showing frequency of use of common methods

Method	GitHub Repos (out of ~1M JS repos)
.push	69%
.sort	45%
.splice	45%
.pop	42%
.reverse	34%
.fill	12%
.copyWithin	0.5%

Mutating methods
copyWithin
fill
reverse
set
shift
sort
splice
pop
push
unshift

Mutating methods	Existing replacement
~~copyWithin~~	“high-performance method”
~~fill~~	.map*
reverse
~~set~~	TypedArray only
~~shift~~	.slice(1)
sort
splice
~~pop~~	.slice(-1)
push	.concat([v])
~~unshift~~

* `map` and `fill` treat ‘holes’ differently

Mutating methods	Existing replacement	Proposed methods
~~copyWithin~~	“high-performance method”
fill	.map
reverse		.toReversed
~~set~~	TypedArray only
~~shift~~	.slice(1)
sort		.toSorted
splice		.toSpliced
~~pop~~	.slice(-1)
~~push~~	.concat([v])
~~unshift~~	.toSpliced(0, 0, v)

Final Set

.toReversed()
.toSorted(fn?)
.toSpliced(index, deleteCount, ...items)
.with(index, val) // non-mutating // arr[index] = val

Alternative names considered

.reversed
.sorted
.spliced
.withReversed
.withSorted
.withSpliced
.withAt

Screenshot shows typeahead in VSCode of arr.slice (shows options slice, splice and toSpliced)

The subtle specification details

addMethods(Array.prototype, {
     toReversed() {
         return this.slice().reverse();
     }
});

The subtle specification details

addMethods(Array.prototype, {
     toReversed() {
        const o = toObject(this);
        const len = lengthOfArrayLike(o);
        const a = new Array();
        for (let i = 0; i < len; i++) {
           a[i] = o[len - i - 1];
        }
		return a; 
	}
});

The subtle specification details

addMethods(Array.prototype, {
     toReversed() {
        const o = toObject(this);
        const len = lengthOfArrayLike(o);
		const a = new Array(); // no Symbol.species lookup on ‘this’
        for (let i = 0; i < len; i++) {
           a[i] = o[len - i - 1];
        }
		return a; 
	}
});

Screenshot shows console with entries as follows

const a = new MyArray(1, 2, 3); a. length = 6; a;
MyArray(6) [1, 2, 3, empty X 3]
a. slice ().reverse()
MyArray(6) [empty × 3, 3, 2, 1]
a. toReversed()
(6) [undefined, undefined, undefined, 3, 2, 1]

The subtle specification details

addMethods(TypedArray.prototype, {
    with(index, value) {
		...
++      value = toNumberOrBigInt(this, value);
        const a = new TypedArrayCreate(this);
        for (let i = 0; i < length; i++) {
        	const v = i === index ? value : this[i];
        	a[i] = v;
        }
		return a; 
	}
});