The truth behind Virtual DOM
(upbeat electronic music) (audience applauding) - Hello, everyone.
Welcome to the talk and thank you for coming here. So the truth behind virtual DOM.
So a bit about me.
I'm an associate developer or intern at Lexvoco doing machine learning work, mainly.
To give you guys a bit of structure about what we are going to do today, we start off with a bit of background, why I want to talk about virtual DOM and how I know about it.
Then we will go through the process by which the browser renders, so that you guys can have a bit of an idea of where everything sits together in the whole IT world.
Next is a detailed explanation of virtual DOM and APIs that support virtual DOM.
Technologies other than virtual DOM will also be mentioned and the speed of virtual DOM in comparison to those technologies.
Essentially, one of the tasks I received at school was to study an algorithm, research it, and try to implement it myself.
I was interested in how performant people think React is, so I decided I'd try and research what makes this possible. The algorithm I came across is the reconciliation algorithm. It's just another fancy name for a diffing algorithm; sure, it is more than that, but that's the idea. And in React, I'm sorry, in virtual DOM, it diffs the virtual DOM.
So I set out to learn as much about it as I could, and so I researched it.
By the way, I'm a bit biassed towards React, so just bear with me.
If you know what is under the hood of something, you will know how to use it effortlessly.
The same thing applies here.
In order to know how virtual DOM works, we need to know why people would rather use virtual DOM instead of the DOM, even though they are completely different things. In the browser, we always start off with HTML. And the HTML goes through the parsing process, where every broken piece of HTML gets fixed. For example, an opening tag without a closing tag, just because we're so lazy.
And a DOM tree will be built.
For the CSS, the style sheets will be parsed into the style rules.
The next step is an attachment process, where the DOM tree and the style rules get combined together into the render tree. Technically speaking, there is more than one tree, but for the sake of the story we just say, oh, there's only one tree, which is the render tree. After that is the layout process, where the exact coordinates of each node get calculated.
The penultimate step is the painting step, where the nodes get painted and displayed.
And the very last step is to get annoyed when going through the whole process was too slow.
Now, it raises a question for us: what happens if we change something in the HTML? Obviously, the browser has to reparse the HTML, and it has to remove the child with the ID of divId, and it's gonna update the DOM with the new value. It's gonna traverse the render tree, re-layout, and paint it to the display, or, in other words, it's a reflow.
As we can see here, in those five steps, parsing the HTML is pretty fast, as they're all just string-based.
The time complexity of parsing is only O(n), I think. On properly-nested input, it typically scans the input document in sequence.
And steps two and three are fast as well, since the DOM tree and the render tree are just tree structures, so mutating the trees is, indeed, very fast.
Nonetheless, the last two steps, layout and paint, are extremely slow because they have to do lots and lots of calculations. Normally, the browser has to calculate every position of the text, and then every appearance of the text, and so on and so forth.
This video is an illustration of how the browser does the painting and the layout.
We'll just quickly go through the video.
We can see here, the browser has to render all of the children first before doing the parents, and imagine how many parent-child relationships we have.
I will pause it right here.
And we can see this website is pretty simple, just a nav and a bunch of divs, a list maybe, but it's pretty slow already.
This is a comprehensive list of what causes a reflow here. I will tell you about the link later, but the bottom line is everything we do will cause a reflow, which is not a way of life. So how does virtual DOM help? Literally nothing, unfortunately.
So, what is virtual DOM? The virtual DOM is an in-memory representation of the real DOM elements.
That's it, that's all that virtual DOM is doing. Actually more, if I miss it.
I've seen an implementation of virtual DOM in Python before, contained inside a Jupyter notebook.
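To make "an in-memory representation" concrete, here is a minimal sketch in TypeScript. The names `VNode`, `h`, and `renderToString` are made up for this illustration and are not taken from React or any other library; the point is only that a virtual node is a plain object that describes an element.

```typescript
// A virtual node: a plain object describing an element, not a real DOM node.
// VNode and h are illustrative names, not any library's API.
interface VNode {
  tag: string;
  props: Record<string, string>;
  children: Array<VNode | string>;
}

// Helper that builds a virtual node from a tag, attributes, and children.
function h(
  tag: string,
  props: Record<string, string> = {},
  ...children: Array<VNode | string>
): VNode {
  return { tag, props, children };
}

// Serialise a virtual tree to an HTML string, to show that the plain objects
// carry everything needed to produce real markup later.
// (Attribute values are not escaped; this is a sketch only.)
function renderToString(node: VNode | string): string {
  if (typeof node === "string") return node;
  const attrs = Object.keys(node.props)
    .map((name) => ` ${name}="${node.props[name]}"`)
    .join("");
  const inner = node.children.map(renderToString).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

const tree = h("ul", { id: "list" }, h("li", {}, "one"), h("li", {}, "two"));
const html = renderToString(tree);
// html is '<ul id="list"><li>one</li><li>two</li></ul>'
```

Building `tree` touches no browser API at all, which is why creating and comparing such objects is cheap next to layout and paint.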
That's just weird.
There are just heaps of it, so I wouldn't be doing it today. So, what makes this better? As I mentioned before, well, virtual DOM is obviously pretty useless. So why would you use virtual DOM when you are already having the DOM? Isn't it just a waste of computational power? No, virtual DOM introduces a new way of developing the UI. Treating UI as having one or more states.
By that, it'll be easier for you to use virtual DOM in conjunction with other APIs or algorithms, such as a diffing algorithm or batched updates. And those are two of the most popular ones.
As seen in Vue.js, as well as React.
So we have a component. The component will render a bunch of divs, with the key and the value set to the value of each element in items, and then we have this in the underlying UI. And then, right after the first render, we re-render it, and we are just removing all the even numbers from the items array.
Now, we are just getting three divs.
So this is a visual representation of before and after of the underlying UI.
We, as humans, know that we're just removing the second and the fourth items.
But, under the hood, without diffing, there could be eight updates on the DOM tree in total. When re-rendering, it has to remove every single item on the left, which takes five operations on the DOM tree, and add three more nodes on the right, which are three more operations.
If we are diffing here, the virtual DOM will be smart enough to know: well, we're just keeping the first, third, and fifth items, and we just want to delete the second and the fourth items. So it tells the DOM to just remove the second and the fourth items.
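The eight-versus-two count can be sketched as follows. This is a simplified keyed diff over sibling keys only, written for this example; the `Op` shape and the function names are assumptions, and it is not how React's reconciler is implemented.

```typescript
// One DOM mutation. The shape of Op is an assumption made for this sketch.
type Op =
  | { type: "remove"; key: number }
  | { type: "insert"; key: number; index: number };

// Naive re-render: throw away every old child, then create every new child.
function naiveOps(oldKeys: number[], newKeys: number[]): Op[] {
  const removes = oldKeys.map((key): Op => ({ type: "remove", key }));
  const inserts = newKeys.map((key, index): Op => ({ type: "insert", key, index }));
  return removes.concat(inserts);
}

// Build a key -> true lookup so membership checks are constant time.
function toLookup(keys: number[]): Record<number, boolean> {
  const lookup: Record<number, boolean> = {};
  for (const key of keys) lookup[key] = true;
  return lookup;
}

// Keyed diff among siblings of the same parent: keep children whose key
// survives, remove the ones that vanished, insert the ones that are new.
// Reordering of surviving keys is ignored to keep the sketch short.
function keyedOps(oldKeys: number[], newKeys: number[]): Op[] {
  const inOld = toLookup(oldKeys);
  const inNew = toLookup(newKeys);
  const ops: Op[] = [];
  for (const key of oldKeys) {
    if (!inNew[key]) ops.push({ type: "remove", key });
  }
  newKeys.forEach((key, index) => {
    if (!inOld[key]) ops.push({ type: "insert", key, index });
  });
  return ops;
}

const oldItems = [1, 2, 3, 4, 5];
const newItems = oldItems.filter((n) => n % 2 !== 0); // [1, 3, 5]

const naive = naiveOps(oldItems, newItems); // 5 removes + 3 inserts = 8 operations
const keyed = keyedOps(oldItems, newItems); // 2 operations: remove key 2, remove key 4
```

The keys are what let the diff recognise that items 1, 3, and 5 are the same nodes before and after, so only the two vanished keys turn into DOM mutations.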
So this is based on the assumption that they have the same parent, and when they are not under the same parent, it will be much more expensive, as it has to update the whole sub-tree or the tree. And besides the diffing algorithm, we also have batched updates.
So normally, virtual DOM uses the diffing algorithm to execute all the updates in one event loop, thus causing the real DOM to update only once, hence avoiding unnecessary reflows.
If any further updates take place, they have to wait until the event loop is over, that is, until the next event loop.
So what is it? For instance, in React, in a clickHandler we have three setState calls, like so: number 10, number 15, number 20.
I'm doing clickHandler here because, right now, in React only updates inside event handlers are batched by default.
I think Dan Abramov did mention at some point that in future versions of React all updates in the application will be batched by default. I'm not sure about it, but let's see.
Anyway, off the topic.
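The click handler with three setState calls can be imitated with a small toy store. This is a sketch of the batching idea only; `createBatchedStore`, `flush`, and the `number` field are hypothetical names for this example and are not React's API. In a real library the flush would be scheduled automatically once the handler returns.

```typescript
// A toy store that queues state changes and renders once per flush.
// createBatchedStore and flush are hypothetical names for this sketch.
function createBatchedStore<S extends object>(initial: S, render: (state: S) => void) {
  let state = initial;
  let pending: Array<Partial<S>> = [];

  return {
    // Queue the change instead of rendering straight away.
    setState(partial: Partial<S>): void {
      pending.push(partial);
    },
    // Merge every queued change, then render a single time.
    flush(): void {
      if (pending.length === 0) return;
      for (const partial of pending) {
        state = { ...state, ...partial };
      }
      pending = [];
      render(state);
    },
    getState(): S {
      return state;
    },
  };
}

// Record what each render saw, standing in for a real DOM update.
const rendered: number[] = [];
const store = createBatchedStore({ number: 0 }, (s) => {
  rendered.push(s.number);
});

function clickHandler(): void {
  store.setState({ number: 10 });
  store.setState({ number: 15 });
  store.setState({ number: 20 });
}

clickHandler();
store.flush(); // one render for three setState calls; rendered is [20]
```

Without the queue, each of the three calls would trigger its own render and its own potential reflow.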
When the clickHandler gets invoked, changes will not get updated immediately.
React will diff all the changes and append them to something called the diffQueue, which contains all of the changes in the application in a certain amount of time in the event loop.
And then, at 60 frames per second, we do requestAnimationFrame.
So, in requestAnimationFrame we have to batch all of the diffs in the diffQueue. Batch will use a set of global heuristics to produce a smaller, more optimal set of patches to apply to the DOM tree from the diffQueue. It sometimes does other useful things, such as reordering mutations to avoid unnecessary reflows. So this is great if your application has large spikes of state changes that you may want to condense into a smaller, more optimal set of DOM mutations. And then next is patch.
Patch will take a real DOM element and apply the DOM mutations in order.
This is the part where it does the expensive work of mutating the DOM.
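The diffQueue, batch, and patch steps can be sketched like this. Everything here is an assumption made for illustration: the `Patch` shape, the `FakeDom` stand-in for real elements, and especially the batching rule, which is a simple "last write per node and property wins" rather than the heuristics a real library would use. In a browser, the last two lines would run inside a requestAnimationFrame callback.

```typescript
// One queued mutation, addressed by a node id. The shape is an assumption.
interface Patch {
  nodeId: string;
  prop: string;
  value: string;
}

// Stand-in for the real DOM: node id -> property name -> value.
type FakeDom = Record<string, Record<string, string>>;

// Identify the target a patch writes to.
function slot(p: Patch): string {
  return `${p.nodeId}/${p.prop}`;
}

// batch: collapse the queue so only the last write to each slot survives.
function batch(queue: Patch[]): Patch[] {
  const lastIndex: Record<string, number> = {};
  queue.forEach((p, i) => {
    lastIndex[slot(p)] = i;
  });
  return queue.filter((p, i) => lastIndex[slot(p)] === i);
}

// patch: apply the mutations in order and report how many writes happened.
function patch(dom: FakeDom, patches: Patch[]): number {
  let writes = 0;
  for (const p of patches) {
    const node = dom[p.nodeId] || (dom[p.nodeId] = {});
    node[p.prop] = p.value;
    writes += 1;
  }
  return writes;
}

const diffQueue: Patch[] = [
  { nodeId: "counter", prop: "textContent", value: "10" },
  { nodeId: "counter", prop: "textContent", value: "15" },
  { nodeId: "title", prop: "className", value: "active" },
  { nodeId: "counter", prop: "textContent", value: "20" },
];

const fakeDom: FakeDom = {};
const writes = patch(fakeDom, batch(diffQueue)); // 2 writes instead of 4
```

Four queued diffs shrink to two mutations, so the expensive part, touching the DOM, happens as few times as possible.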
As we've seen, with virtual DOM the DOM phase is supposed to be as efficient as possible, at the cost of extra work done in the JS phase.
This extra work is relative to the manual updates that you may write by hand, so another name for it would be the overhead.
Virtual DOM is by definition slower than carefully-crafted manual updates, but it gives us much more convenient APIs for developing the UI.
In fact, if you know exactly what you want to change in the application, using JS or jQuery to do the updates could be way, way faster than doing it with virtual DOM, because virtual DOM has the overhead of calculating the diffs and batching updates, et cetera. And virtual DOM is simply making DOM operations quicker in some cases when you're not doing it the right way.
So on the React landing page, there's no mention of virtual DOM, just declarative, component-based, and learn once, write anywhere. And on the Vue.js one, virtual DOM is mentioned, but it gets compared with other virtual DOM libraries.
So, other options, and they all have the same cause as virtual DOM: making our life easier and avoiding unnecessary reflows.
We have key-value observation in Ember as well as dirty checking in Angular.
I'm only comparing virtual DOM and KVO here 'cause dirty checking in Angular is sort of in the middle of KVO and virtual DOM. And memory is just as vital as CPU 'cause we care about mobile web users.
And all will be measured in big O notation, where V is the size of the view and M is the size of your model. The view is what actually gets rendered, and the model could be the data that you download from the server, and the view is always smaller than the model.
If we break it down in big O notation, we will see that KVO can, of course, update in constant time because all data are observable.
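A rough sketch of why an observable update is constant time: the write goes straight to the listeners bound to that one value, with no tree to walk. The `observable` helper below is written for this example and is not Ember's actual API.

```typescript
type Listener<T> = (value: T) => void;

// An observable value: writing it notifies exactly the listeners bound to it.
// This helper is hypothetical and only illustrates the idea of KVO.
function observable<T>(initial: T) {
  let value = initial;
  const listeners: Array<Listener<T>> = [];
  return {
    get(): T {
      return value;
    },
    set(next: T): void {
      value = next;
      for (const listener of listeners) listener(next);
    },
    subscribe(listener: Listener<T>): void {
      listeners.push(listener);
    },
  };
}

// Every field of the model is wrapped, whether or not it is shown on screen.
const model = {
  title: observable("hello"),
  unreadCount: observable(0),
};

// The view binds only to the field it displays.
const view = { titleText: model.title.get() };
model.title.subscribe((text) => {
  view.titleText = text;
});

model.title.set("world"); // updates view.titleText directly, no diffing involved
```

The price is that every model field carries its own wrapper and listener list, even the ones the view never shows.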
Virtual DOM updates in linear time, but it still depends on what diffing algorithm you are using.
The state of the art of the diffing algorithm right now is in cubic time, but since virtual DOM in React uses a set of global heuristics,
it can reduce it to only linear time.
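To put the two bounds side by side, take n as the number of nodes being compared (n is a symbol introduced here for illustration, not one used in the talk). For a tree of a thousand nodes the gap looks like this:

```latex
\[
  \underbrace{O(n^{3})}_{\text{general tree diff}}
  \;\longrightarrow\;
  \underbrace{O(n)}_{\text{heuristic diff}},
  \qquad
  n = 10^{3}:\quad n^{3} = 10^{9}
  \quad\text{versus}\quad
  n = 10^{3}.
\]
```

So the heuristics trade a guaranteed minimal set of changes for roughly a million times fewer comparisons at this size.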
One of the huge differences is the memory usage. Keeping all of those computed observables is really, really expensive, whereas in virtual DOM we just keep what we render.
Say we have a list of 10 items and we want to render three; then memory will just store those three items. I'm gonna do a bit of a performance here.
If we go to Preact website.
And I'll go to that.
Differences, Preact and virtual DOM differ.
This is an outdated version of virtual DOM diffing, but just gonna quickly copy the.
I just got out of this a few minutes ago.
There we go.
Here's the most up-to-date version, anyway. A repaint challenge gonna compare with Vanilla.js. And React.
Preact, as well.
Where's React? Yeah, here we go.
And in here we can see the red one is optimised implementation.
If we push all of the mutations through 100%, then we can see here React can.
We have to go down through the represent list. All right, we can see for Vanilla.js, it can do about 60 repaints a sec each.
Whereas in Preact it can do about 60, scroll.
Oh, yeah, roughly 60, whereas it consumes 40 megabytes of memory.
And in Vanilla, it consumes, is it, yeah, roughly 40, as well, at the rate of 60 repaints a sec. Whereas in React it can only do 40 repaints a sec, consuming 50 megabytes of memory.
Anyway, the truth.
Here we go.
So the truth behind virtual DOM is not about performance, so what is it really? Is it a reason? Well, it's the same thing in my year.
(chuckles) No, it's about simplicity, it's about developer convenience. That's all.
(audience applauds) (upbeat electronic music)