Class features implementation in JSC
Hello, everyone.
In this session, I'm going to talk about the implementation of class features in JSC.
I'm going to share with you the experience that I had implementing those features and also optimizing them.
This work was done in partnership with Bloomberg, so thank you very much, Bloomberg, for sponsoring Igalia to do this work.
I also worked on this with two more colleagues, Kaitlin and Cheung.
So thank you very much for your contributions there as well.
So what are class features?
Class features are a set of proposals that extend the kinds of class members we can declare within classes in JavaScript.
In JavaScript classes right now, you're only able to declare methods and accessors, in both static and non-static versions.
But if you come from a language like Java or C++, for example, you know that it's possible to also declare other class members, like public or private fields, static or non-static, and private methods, and the idea of class features is to bring those new kinds of class members to JS as well.
So we are bringing here public and private fields and private methods as well.
So let's take a look at how we can actually declare and use those new private members and also public members.
So we have here a class being declared - a class C.
And after that, we have a declaration of two fields.
First we have 'publicField' that is being initialized with the value zero.
And then we have a declaration of a private field.
To declare a private field, we need to start the name of this private field with the hash character.
This is going to tell the compiler: "OK, this field is a private one, so please treat it as a special kind of field".
I'm going to detail the differences between private fields and public fields in the later slides as well.
Then we have a declaration of a method, and after that we have a declaration of a private method, and the private method follows what we have for private fields here.
We just need to use the hash character at the beginning of the identifier, and this is going to say that this method is private, and all the rules of encapsulation for private fields apply to these members as well.
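As a rough reconstruction of the kind of class the slide describes, the sketch below declares one member of each kind. Only `C` and `publicField` (initialized to zero) come from the talk; the names `#privateField`, `method` and `#privateMethod`, and the value 42, are assumptions made for illustration.

```javascript
class C {
  // Public field: installed on each instance, initialized to 0.
  publicField = 0;

  // Private field: the leading '#' is part of the name (hypothetical name and value).
  #privateField = 42;

  // Ordinary public method.
  method() {
    return this.#privateMethod() + this.publicField;
  }

  // Private method: callable only from inside the body of C.
  #privateMethod() {
    return this.#privateField;
  }
}

const c = new C();
console.log(c.publicField);    // 0
console.log(c.method());       // 42
console.log(Object.keys(c));   // [ 'publicField' ]
```

Only the public field shows up as an own enumerable property; the private members are not visible through reflection such as `Object.keys`.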
So I've been talking to you a little bit about what class fields are, and about encapsulation, et cetera.
What does this actually mean?
With public fields, the semantics are essentially those of a property; the only difference, as a mental model, is that public fields are installed into the object before the constructor starts to execute.
There are some cases where they are actually installed after super, but if you're not using super, they're going to be installed before the constructor code is executed.
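To make the installation order concrete, here is a small sketch that records when each field initializer runs relative to the constructor bodies. The class names `Base` and `Derived`, the field names and the `log` array are all made up for this illustration.

```javascript
const log = [];

class Base {
  // Initializer runs before the body of Base's constructor.
  baseField = log.push("Base field");

  constructor() {
    log.push("Base constructor body");
  }
}

class Derived extends Base {
  // In a derived class, the initializer runs right after super() returns.
  derivedField = log.push("Derived field");

  constructor() {
    log.push("Derived before super");
    super();
    log.push("Derived after super");
  }
}

new Derived();
console.log(log);
// [ 'Derived before super', 'Base field', 'Base constructor body',
//   'Derived field', 'Derived after super' ]
```

In the base class the field is already there when the constructor body starts; in the derived class it appears as soon as `super()` returns.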
This is also valid for private fields, but there is a noticeable difference between public fields on one side and private fields and private methods on the other, because the private ones introduce encapsulation.
So in the case of private fields, it's not possible to access them outside the class scope.
Also, another class cannot access a private member from this class because, well, they should be private.
They should not be visible to anyone else outside the class.
Also, it's not possible to use private members without declaring them beforehand.
This is actually going to generate a syntax error.
And this is different from public fields where, for classes in JavaScript, if you assign to a field that was not declared before, it's just going to create a property with this name within the object.
That's not the case for private fields.
Also, they're unique, as I mentioned to you, and I'm going to show you some examples of that.
And of course, those hash names are limited to being used only inside the class curly braces; outside of that, it's not possible to use them at all, okay?
So let's see here some examples of private field encapsulation.
Here we have a class 'C', and this class 'C' declares a static private field called '#field'.
And of course there is nothing wrong here.
So the syntax is correct.
Then we declare a static private method, which is also correct, but then, within this method, we are using a private field '#foo', and if we look into the class body, there is no '#foo' declared.
So this is actually not a valid program.
And it's going to throw a syntax error because we are trying to use a private field that was not declared.
The same is valid for all the statements that follow.
We are trying to access a private field from 'C' outside the class body, and this is an access violation, and we know that this is a syntax error before even executing the code.
The same goes for private methods as well - private methods have the same encapsulation rules as private fields.
So they don't diverge at all here.
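Because these are parse-time errors, a file containing them would not run at all, so the sketch below parses each snippet from a string and reports the error name. The helper `parseError` and the exact snippets are assumptions for illustration, not the slide's code.

```javascript
// Parses `source` as a function body without running it.
// Returns the name of the error raised by the parser, or null if it parsed.
function parseError(source) {
  try {
    new Function(source);
    return null;
  } catch (error) {
    return error.name;
  }
}

// Valid: #field is declared in the class body that uses it.
const valid = "class C { static #field = 1; static #m() { return C.#field; } }";

// Invalid: #foo is used but never declared in the class body.
const undeclared = "class C { static #field = 1; static #m() { return C.#foo; } }";

// Invalid: #field is referenced outside the class curly braces.
const outside = "class C { static #field = 1; } C.#field;";

console.log(parseError(valid));       // null
console.log(parseError(undeclared));  // SyntaxError
console.log(parseError(outside));     // SyntaxError
```

Neither invalid snippet ever executes: the error is reported while the source is being parsed.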
Also, as I mentioned to you before, a private field from one class cannot be accessed in another class, and vice versa.
So here we can see an example where we have a class 'C' that declares a private field called '#field', and also a method that accesses '#field' from an object that is passed in as a parameter.
And then we have a class 'D' that also declares a private field called '#field'.
So in the line below, we are passing an object from 'D' as a parameter and trying to access on it a private field that actually belongs to 'C', and this is going to generate a 'TypeError' at runtime, because even though those private fields have the same identifier, they have the same name, they're actually different.
Because, well, the private field from 'C' is different from the private field from 'D', and if we allowed access from one to the other, we would be violating the encapsulation.
So we can see as a mental model here that, even though they look the same based on their identifier, at runtime they are actually different names.
The name in the source code doesn't matter for the access.
This is also the case for private methods as well, so there is no difference here.
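A minimal version of this example could look like the following. The method name `access` and the field values are assumptions; the talk only names the classes 'C' and 'D' and the field '#field'.

```javascript
class C {
  #field = "from C";

  // Reads C's #field from whatever object is passed in.
  access(object) {
    return object.#field;
  }
}

class D {
  // Same spelling as C's private field, but a different private name.
  #field = "from D";
}

const c = new C();
console.log(c.access(new C()));  // from C

let errorName = null;
try {
  c.access(new D());             // D instances do not have C's #field
} catch (error) {
  errorName = error.name;
}
console.log(errorName);          // TypeError
```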
All right.
I think we already have all the background necessary to understand the semantics of class features.
And let's talk a little bit about the implementation here.
So when we started designing the feature, we had a couple of goals in mind, and the first one was that class fields should be as fast as common properties.
The idea here is that we wouldn't like the performance of class features to be so bad that people don't use them at all.
So this was the first goal.
And without that, the patch was not going to be upstreamed at all, not going into production.
The second goal was that private fields and private methods should be better than the state of the art.
Right now, there are some ways to kind of emulate private members in JS, but the idea is to make the private fields implementation actually way, way faster than what we had so far.
And also to bring you the ergonomics that private fields can give you as well.
So let's take a look at this, at what we call the 'state of the art'.
You can see a lot of JS code, and a lot of people, actually using WeakMaps and closures to kind of emulate private members.
In this slide, we have a function closing over both a WeakMap of private fields and a class.
And the idea here is to store the private fields within the WeakMap.
That WeakMap is only accessible within the closure where both the class and the WeakMap are stored, and of course setting and retrieving values of these private fields is done by set and get operations on the WeakMap.
So you see that, of course, accessing a WeakMap is potentially much, much slower than actually accessing a property, right?
So this kind of solution is less optimized than what we could have with common property access.
And in addition to that, it's not quite ergonomic.
While using private fields and private methods with the new features is essentially just naming the fields with the hash at the beginning of the name, here you need to understand some details.
You need to be kind of a power user of JS to understand what is going on.
So, of course, even though this is the state of the art, we understand that it's not ideal if you would like to have some encapsulation in your program, right?
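For reference, the WeakMap-plus-closure pattern being described usually looks something like this sketch. The class `Counter` and its members are invented for the example; the slide's actual code is not in the transcript.

```javascript
const Counter = (function () {
  // Only code inside this closure can reach the WeakMap,
  // so only the class below can read or write the "private" state.
  const privateFields = new WeakMap();

  class Counter {
    constructor() {
      privateFields.set(this, { count: 0 });
    }

    increment() {
      // Every access is a WeakMap lookup rather than a plain property access.
      const state = privateFields.get(this);
      state.count += 1;
      return state.count;
    }
  }

  return Counter;
})();

const counter = new Counter();
counter.increment();
console.log(counter.increment());          // 2
console.log(Object.keys(counter).length);  // 0
```

The state is hidden, but at the cost of the extra closure, the WeakMap bookkeeping, and a lookup on every access.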
I'm talking here about property access, and I think it's important for you to understand that property access in JS VMs is actually very highly optimized.
The goal of the VMs is to access properties pretty much as fast as Java programs do, for example, even though there are some limitations and restrictions that the JavaScript semantics impose on that.
So let's take a little look at how JS objects are actually represented and how we can optimize access to properties.
This is important, of course, to understand how class fields can be as performant as a property access.
Right?
So let's take a look here into the JS Object Model.
So some of you might know that JS objects are quite dynamic, and this means that essentially we can insert, delete and change a property at any time, at any program point.
So this means that it's almost impossible, actually impossible sometimes to know at compile time what's the shape of an object, which kind of properties those objects would have and how we can actually lay out those in memory.
So in this case, the only general solution would be to represent those objects as a 'dictionary-like' data structure; let's call it a map.
So let's suppose that every JS object is represented as a single map, and we would have something like this.
For the object `obj` that we have on the slide here, with the property foo with the value 'tests' and the property bar with the value 100, you would have a map that has the property names as the keys and the values as the payload of each entry, right?
So this is kind of a good way.
And this would essentially mean that we can model pretty much all the semantics of JS objects in general.
It's also quite common to see a lot of programs around the web using objects as maps, which was even more common back in the days before Map was introduced in JS.
So, in this case, you can understand that accesses to JS objects are mostly translated to map operations.
For a property get, it would be a get within the map, so it's a map lookup; for an insert, it would be an insert within the map, et cetera.
This kind of thing would be much, much more expensive than doing a common property access in languages like C++ or Java, where you just know the memory layout at compile time and the compiled program doesn't even know the concept of an object itself.
All of those accesses are translated to a couple of instructions in assembly where you just apply an offset to the pointer of the object.
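The naive model above can be written down directly. This is only a mental-model sketch, not how JSC stores objects; the function names `getProperty`, `putProperty` and `deleteProperty` are hypothetical.

```javascript
// Mental model only: the object { foo: "tests", bar: 100 } as a plain map.
const obj = new Map([
  ["foo", "tests"],
  ["bar", 100],
]);

// Each property operation becomes a map operation, that is, a lookup.
function getProperty(object, name) {
  return object.get(name);
}

function putProperty(object, name, value) {
  object.set(name, value);
}

function deleteProperty(object, name) {
  return object.delete(name);
}

console.log(getProperty(obj, "foo"));  // tests
putProperty(obj, "baz", true);         // properties can be added at any point
deleteProperty(obj, "bar");            // ...and removed at any point
console.log([...obj.keys()]);          // [ 'foo', 'baz' ]
```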
So how could we achieve such kind of performance on JS programs?
Well, there is an optimization called Inline Cache that tries to solve exactly this problem.
So the idea is: why do we need to keep doing those lookups?
What if we actually cache all the information about a given access?
Then, if we have a similar access afterwards, instead of doing another lookup, we already know how to do it and just return the value from where we already know it's stored, right?
So this is essentially what Inline Caches are.
So suppose, for example, that we are writing a property access operation for a given VM; for the access `obj.foo` we would have, of course, the object we are trying to access the property from and the property we are trying to access.
And then the metadata, which is essentially a pointer to the cache.
It contains all the cache information that we would like to verify.
We could then implement the get operation.
Something like what we have here.
So we first check the cache.
We check if the cached object is the object that we are trying to access and also if the property we are trying to access is the one that is in the cache.
And if that's the case, well, we already know how to do this access.
We just need to know the pointer to the storage of the object and the offset we need to apply to access this property.
So we get the address of this property and return the value that is stored there.
Otherwise we need to do a lookup and, of course, hopefully we can fill the cache, if possible, during this lookup operation.
So that's essentially what Inline Caches are.
And we apply those in pretty much every single property operation that we have in JSC, and other VMs do that as well.
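The get operation described here can be sketched in plain JavaScript as a simulation. Everything below is an assumption made for illustration (the object layout with `index` and `storage`, the cache fields, the hit and miss counters); it is not JSC's actual code, which does this in generated machine code.

```javascript
// Simulated object: a lookup table from name to offset, plus flat storage.
function makeObject(entries) {
  const index = new Map();
  const storage = [];
  for (const [name, value] of entries) {
    index.set(name, storage.length);
    storage.push(value);
  }
  return { index, storage };
}

// One cache per access site (the "metadata" of that access).
function makeCache() {
  return { object: null, property: null, offset: -1, hits: 0, misses: 0 };
}

function get(object, property, cache) {
  // Fast path: same object and same property as last time.
  if (cache.object === object && cache.property === property) {
    cache.hits += 1;
    return object.storage[cache.offset];
  }
  // Slow path: do the full lookup, then fill the cache.
  cache.misses += 1;
  const offset = object.index.get(property);
  if (offset === undefined) return undefined;
  cache.object = object;
  cache.property = property;
  cache.offset = offset;
  return object.storage[offset];
}

const obj = makeObject([["foo", "tests"], ["bar", 100]]);
const cache = makeCache();
get(obj, "foo", cache);               // miss: lookup, then cache is filled
console.log(get(obj, "foo", cache));  // tests (served from the cache)
console.log(cache.hits, cache.misses); // 1 1
```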
So coming back to the JS object model, representing objects as a map sounds like a waste of memory at some level.
So imagine that we have a loop, and this loop is creating objects with the properties foo and bar, like a thousand times; having a map to represent every single object is kind of costly, right?
Because the map needs to have all the internal data structures that it maintains to do the insert, delete and get operations in a fast manner.
So we would have not only the values stored there, but also the overhead that the map brings with it.
So what if we could share this redundant information, so that for every thousand objects we need only a single map, and each object just holds a kind of reference to it?
That way we know how to do stuff.
So this is the kind of optimization that JSC does.
Instead of storing all this map information within the object itself, the object actually becomes pretty close to what we have for Java and C++, that is, an array of properties.
And then all the information about where the keys are stored is transferred to the Structure.
So the structure that we have contains all the information we need to actually access a given property.
So supposing that we have two objects here.
So we have object o1 and object o2, and o1 has a property foo with the value 'tests' and a property bar with the value 100, and o2 has a property foo with the value 'blah' and a property bar with the value 200.
We would have the following objects represented in memory.
We have the structure ID as the first element, and then we just have the string 'tests' and the value 100, and the same follows for o2 as well.
So there is the structure ID and then all their properties.
And the structure ID is actually a way to retrieve the structure that we would like to have, and the structure itself stores the information on how we can access each property.
So the structure would then be a map, but instead of storing a key and the value, we store a key and the offset that we need to apply to the object to access this property.
In this case here, we have foo, and the offset we need to apply to the object is 0.
And for the property bar, the offset we need to apply is 1.
This way, we need only a single structure that is going to have all this map overhead, et cetera.
If we have a thousand objects, or even a million objects, this map is going to be constant, this structure will be constant, and we are not going to waste memory in this case.
And I think we can take a step further here: instead of caching for a given object itself, we can cache for the structure, because then all the objects that have the same structure would give us a cache hit, since we know how to access those properties, right?
So at this point, we could think that structures can be seen as the types of objects, but just internally for JSC as a VM.
And we actually do that.
So we kind of characterize them as types.
In the optimizing compiler phases, we actually use those types to do some optimizations more aggressively, of course while respecting the semantics of JS.
So in this case the Inline Cache will change a little bit: instead of checking the object against the cache, we check if the structure of the object that we are trying to access is the same as the cached one.
And if that's the case, and the property is also the same one that we cached, we just access the object storage, applying the offset to access this given property.
And just to give you an order of magnitude here, this operation is potentially 10 times faster than doing a lookup into a map.
And if we don't have a cache hit, well, we need to do the lookup, et cetera.
And hopefully we can fill the cache after this operation.
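Putting the shared structure and the structure-keyed cache together, a simulation might look like the sketch below. The registry `structures`, the helper names and the cache layout are assumptions for illustration; real structures in JSC are internal and far more elaborate.

```javascript
// Shared structures: one per ordered list of property names.
const structures = new Map();
let nextStructureId = 1;

function structureFor(names) {
  const key = names.join(",");
  let structure = structures.get(key);
  if (!structure) {
    // The structure maps each property name to an offset, not to a value.
    const offsets = new Map(names.map((name, offset) => [name, offset]));
    structure = { id: nextStructureId++, offsets };
    structures.set(key, structure);
  }
  return structure;
}

// An object is just a structure reference plus an array of values.
function makeObject(entries) {
  return {
    structure: structureFor(entries.map(([name]) => name)),
    storage: entries.map(([, value]) => value),
  };
}

function get(object, property, cache) {
  // Fast path: any object with the cached structure is a hit.
  if (cache.structure === object.structure && cache.property === property) {
    cache.hits += 1;
    return object.storage[cache.offset];
  }
  // Slow path: look the offset up in the structure, then fill the cache.
  cache.misses += 1;
  const offset = object.structure.offsets.get(property);
  if (offset === undefined) return undefined;
  cache.structure = object.structure;
  cache.property = property;
  cache.offset = offset;
  return object.storage[offset];
}

const o1 = makeObject([["foo", "tests"], ["bar", 100]]);
const o2 = makeObject([["foo", "blah"], ["bar", 200]]);
const cache = { structure: null, property: null, offset: -1, hits: 0, misses: 0 };

get(o1, "foo", cache);               // miss: fills the cache for this structure
console.log(get(o2, "foo", cache));  // blah (hit, although o2 is a different object)
console.log(o1.structure === o2.structure, structures.size);  // true 1
```

However many objects are created with foo then bar, `structures` holds a single entry for them, and one cache entry serves all of them.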
I think it's important here to highlight that the ordering of properties matters.
So, even though objects have the same set of properties, if they're not in the same order, this is going to generate different structures entirely.
Let's see the example here, where we have o1 with foo then bar, and o2 with bar then foo; even though they have the same properties as before, since the ordering here is changed, they're going to generate different structures.
And this essentially matters for the Inline Cache on the structure.
If it's cached for the structure 10 and we then see a 32, we need to go to the slow path and do the lookup, and the same the other way around.
So ordering here matters a lot.
It's also the case if you have only one property but not the other; this would also generate a different structure because, well, essentially they are different types of objects.
Right?
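Structures are internal to the engine and cannot be inspected from JavaScript, so the sketch below uses a stand-in: the ordered list of own property names, which is the part of a structure this argument depends on. The helper `shapeKey` is hypothetical, and the real structure IDs (such as the 10 and 32 mentioned above) are not modeled.

```javascript
// Hypothetical stand-in for a structure: own property names, in insertion order.
function shapeKey(object) {
  return Object.keys(object).join(",");
}

const o1 = { foo: "tests", bar: 100 };  // foo first, then bar
const o2 = { bar: 200, foo: "blah" };   // same properties, different order
const o3 = { foo: "only foo" };         // a subset of the properties

console.log(shapeKey(o1));                  // foo,bar
console.log(shapeKey(o2));                  // bar,foo
console.log(shapeKey(o1) === shapeKey(o2)); // false
console.log(shapeKey(o1) === shapeKey(o3)); // false
```

An access site that cached the layout of `o1` would miss on both `o2` and `o3` under this model.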
Okay.
Now that you know a little bit about which kinds of optimizations JSC does to optimize property access, let's go back to class features.
As I mentioned to you in the beginning, the goal was to have class features as fast as common properties.
Since public fields are pretty close to common properties, it's quite trivial to actually implement them as fast as common property access.
And that's what happened to us.
We just needed to do some tweaks, but implementing public fields was way, way, way easier than implementing private fields and private methods.
The challenge that we had for private fields and private methods was: how can we do those kinds of operations while respecting the encapsulation rules as well?
In some cases, well, it's not possible to make them as fast as possible, but we were still able to rely on the IC and the structures that I presented to you before to make those accesses quite fast.
And in a lot of scenarios, we are actually able to optimize out the encapsulation checks and have the access to private fields and private methods as fast as common properties as well.
So there is no difference between them in some cases.
Well, let's take a look to see how we can solve that for private fields.
So luckily for us, there is kind of a concept called private symbols within JSC.
And those private symbols are just like symbols that we have for JS programs, but they're not available to users.
So there is no way that users can create Private Symbols or reserve any Private Symbol at all.
And the way that we implemented private fields was on top of those Private Symbols.
So here we can see that we create a class scope for a class, and we store a symbol for each private field.
We create a new Private Symbol for each private field and store it within this class lexical scope, and whenever we see a private field access, what we do essentially is retrieve from the class scope the Private Symbol that represents this private field and then access the field using a computed access.
The major difference here between a common property access and a private field access is that it is a computed one.
The computed property access, luckily for us, also has all the support for the Inline Cache and structures as well.
Of course, the actual implementation is a little bit different, and I'm just using JavaScript here to make the concept a little bit easier to understand, but by doing this kind of thing in the bytecode, we are essentially able to have access to private fields much, much faster than, for example, using WeakMaps and closures.
So yeah, that's the way we implemented private fields.
Private methods were implemented a little bit differently, I would say, but there are some similarities here and I'm not going into the details of them.
We can see private methods as pretty close to private fields, but instead they are fields storing a kind of a method, et cetera.
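The idea can be approximated in user-level JavaScript, with one important caveat: ordinary `Symbol` values stand in for JSC's Private Symbols here, and unlike the real ones they can be discovered through `Object.getOwnPropertySymbols`. The helper `getPrivate`, the closures and the class bodies are assumptions for illustration.

```javascript
// Shared helper: computed access with a check that the field was installed.
function getPrivate(object, symbol) {
  if (!Object.prototype.hasOwnProperty.call(object, symbol)) {
    throw new TypeError("object does not have this private field");
  }
  return object[symbol];
}

const C = (function () {
  // "Class scope": one fresh symbol per private field declared in C.
  const fieldSymbol = Symbol("#field");

  class C {
    constructor(value) {
      this[fieldSymbol] = value;  // install the field under C's symbol
    }
    static read(object) {
      // obj.#field becomes a computed access with the symbol from the scope.
      return getPrivate(object, fieldSymbol);
    }
  }
  return C;
})();

const D = (function () {
  // Same description, but a different symbol: D's #field is not C's #field.
  const fieldSymbol = Symbol("#field");

  class D {
    constructor(value) {
      this[fieldSymbol] = value;
    }
  }
  return D;
})();

console.log(C.read(new C(1)));  // 1

let errorName = null;
try {
  C.read(new D(2));
} catch (error) {
  errorName = error.name;
}
console.log(errorName);         // TypeError
```

Because the access is an ordinary computed property access keyed by a symbol, it can go through the same structure and Inline Cache machinery as any other property.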
Okay.
So what are the main benefits that class features are bringing?
As I mentioned to you before, class features are kind of a declarative way of writing classes.
So it documents a little bit better how we are declaring those classes, because just by reading the code, we know the set of fields that this class should have.
So this helps a little bit with the documentation of the classes.
Private members, in turn, give us encapsulation, and give us encapsulation in a way that can be optimized quite well.
I would like to mention here just-in-time compilation.
At the time that we are optimizing the program, we are able to not have this lookup path in there at all.
So suppose that this field access method here is called multiple times.
The value for the private field name would be constant because, well, there is nothing that is changing it.
The compiler is then able to optimize this access and constant-fold this value.
Of course, it creates some speculation checks to see if it is ever going to change.
But then the amount of assembly instructions that we generate for this given access to a private field is going to be pretty close to the amount of assembly instructions that we would generate for a public field access or a common property access.
So it is important to know here that, at the optimizing tiers of JSC's compilers, private fields are actually going to be quite fast in most cases, and this brings us to one of the major benefits of class features being implemented natively: we don't need all the boilerplate that comes with using WeakMaps plus closures.
So the source code is way shorter, and it's also likely that the version using native class features is going to be much, much faster than the one using WeakMaps, because, well, you can try to optimize and implement some caching rules for the WeakMap access, but this would increase your code even more.
Meanwhile, all that machinery for the Inline Cache, et cetera, is already built into the VM itself.
And I think one kind of minor advantage of using class fields is that they kind of avoid conditionally added properties.
So what are conditionally added properties?
Well, if we compare a program using class fields with one not using class fields, I can try to show you what I mean here.
So suppose that we have a class C and, within the constructor of this class, we have two parameters as input, foo and bar.
And then we create properties based on the values of foo and bar, a property foo and a property bar, but we only create the property bar if bar is a truthy value.
And well, based on what we saw about structures before, you can see that even this construction here can generate at least two structures, right?
So one that contains only foo, whenever the value of bar is false, zero or something like that, and another structure with foo and bar.
So we can see here that in a lot of places the Inline Cache is going to have some misses, depending on the values that we pass for foo and bar when we construct a new object of C.
Right?
However, this kind of situation doesn't happen with class fields.
As I mentioned to you, the fields are going to be installed before the constructor ever executes.
So at this point here, both this.foo and this.bar are actually going to be in the same structure all the time.
And well, this matters because, compared with the previous program, this program is going to have way more Inline Cache hits than the previous version.
So it's potentially going to be faster in some cases, of course not considering that you may still have different structures for whatever reason, but the way to think about it is that class fields give you these kind of stable structures for objects.
It's the kind of thing that, even if you don't know the internals of a VM, makes it possible to achieve this.
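The two versions of the class can be compared side by side. The class names `WithoutFields` and `WithFields` and the helper `layout` are assumptions; `layout` lists own property names in order as a stand-in for the engine's internal structure, which cannot be observed directly.

```javascript
// Without class fields: bar exists only when it is truthy.
class WithoutFields {
  constructor(foo, bar) {
    this.foo = foo;
    if (bar) this.bar = bar;
  }
}

// With class fields: foo and bar are installed before the constructor body runs.
class WithFields {
  foo;
  bar;
  constructor(foo, bar) {
    this.foo = foo;
    if (bar) this.bar = bar;
  }
}

// Hypothetical stand-in for a structure: own property names, in order.
const layout = (object) => Object.keys(object).join(",");

console.log(layout(new WithoutFields(1, 0)));  // foo
console.log(layout(new WithoutFields(1, 2)));  // foo,bar
console.log(layout(new WithFields(1, 0)));     // foo,bar
console.log(layout(new WithFields(1, 2)));     // foo,bar
```

With fields declared, every instance ends up with the same set of properties in the same order, whatever arguments the constructor receives.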
And yeah, I think that's pretty much it; hopefully this was a useful use of your time, and thank you very much for paying attention to this presentation. Now there is time to answer some questions.