To build benchmarks, it was decided to teach the app enough about sports to be able to take a news article and identify what sport was being referred to.
In this process, profiling the heap and the CPU showed that learning time increased proportional to the size of a document, which seemed to make sense.
It became clear that the heap size was also growing proportional to document size, which still seemed logical. The expectation was that this would taper off as the application learned words, stored the learning and would not have to learn those words again.
Not only did that not happen, but the heap was growing to triple the size of the document being learned. Something else was happening.
Profiling the actual classifier that was written, it became clear that the classifier’s memory footprint was growing with vocabulary size but proportionally less so as the documents increased in size. So it wasn’t the classifier that was creating the problem.
Chrome’s developer tools timeline showed that garbage collection operations were running very frequently. In fact, 30% of all processing time was being spent on garbage collection.
All values are stored on the heap. A value can have one or more retaining paths, and a value is removed by cuttings its retaining path.
The compiler guarantees that any live record is reachable from the root. Garbage is defined as heap-allocated records that are not reachable by any pointers or chain of pointers.
Garbage collection is reclaiming the memory occupied by garbage and making it available for use when allocating new records. The more objects there are, the longer it takes to collect the garbage.
While the Weak Generational Hypothesis states that most objects die young, V8 splits values between young and old generations.
Since the young generation is small, allocation is fast and collections are fast and frequent, while the old generation is larger and although allocations are still fast, collection can be slower. The two generations use different algorithms to go through and identify and collect garbage.
The young generation uses a semi-space stop and copy scavenger algorithm in which objects are allocated, eventually room runs out and a collection process is triggered to free up space.
When that process is repeated, the retained live objects, having survived a garbage collection cycle, are candidates for promotion to the old generation. The cost of the garbage collection is proportional to the number of live objects retained.
The old generation uses a mark-sweep-compact algorithm. The amount of memory that’s allocated for the old generation is governed by a set of heuristics around the rates of object allocation and how much memory is being used.
As with the young generation, space is allocated to allow for growth – except this time it’s a single space – as objects are added. When there isn’t enough space for an added object, garbage collection is triggered.
Old generation garbage collection involves marking the objects to be retained, then sweeping through the heap and making the memory from non-marked objects available for use.
After a while that process will lead to fragmentation, so a third step compacts the space previously held by the non-marked objects, pushes the live objects together and leaving available memory at the end of the heap.
Old generation uses all the committed heap that’s available. The marking phase takes time proportional to the number of nodes to be marked (the live objects), and the sweep and compact phases can take time proportional to the size of the heap.
Note that the mark-and-sweep and compact processes can be made to be more incremental, concurrent, and in some cases, even run in parallel.
The V8 team is working on a project called Orinoco to do exactly that, have some of the old generation garbage collection phases run in parallel, have less jank and higher throughput.
Sweeping is performed concurrently, and compaction is in parallel, leading to an overall reduction in the time your application freezes for garbage collection.
An additional improvement was being able to schedule garbage collection to take place at idle times.
Remember that every time a new object is allocated, that’s one step closer to triggering a garbage collection, which will pause your application and introduce latency in most cases. The larger the heap, the longer collection is going to take.
A further optimisation was to read in the document in chunks, rather than the whole 85Mb in one go, using Streams.
That had a big impact, causing a 96% reduction in heap size, learning time increasing from 69% to 94% and garbage collection time decreasing from 30% to 1.87%.
Total time to complete improved by about 22%.
More optimisations were introduced, such as using web workers, but it’s clear that understanding memory management and focusing on improving it was what made the big difference.