Hi, everyone!

I'm really excited to talk about the Core Web Vitals today.

Specifically, I wanted to talk more about how we designed the metrics - our guiding principles, and how we applied them.

So first, I'm going to go over that background, and then I'm going to dive into each metric and explain it more deeply.

Finally, I'm going to cover a bit more of how we're thinking about what's beyond Core Web Vitals.

So first, what are Core Web Vitals?

They're metrics for web sites designed to bring a focus to the user experience.

We intend to keep these to a very small set, so that we don't divide developers' attention between too many things, and they all follow the same guiding principles.

First off, each metric should apply to all web pages.

It's really important to us to have broadly applicable metrics that can be shown in all sorts of tools and dashboards.

Second, each of the metrics directly measures a key aspect of the user experience.

Third, every Core Web Vital is measurable in the field so you can get the ground truth of what users experience on your site.

I want to dive more deeply into the second two principles.

First off, let's talk more about direct measures of user experience.

What does that mean?

There's a bit of subtlety here, but I think it's important to understanding what we decided to include.

A direct measure of user experience measures something the user can actually see or experience: things like how long it took for content to display on the page, or how long it took to respond to a user input.

They're different from indirect measures, which are often used as proxies for user experience.

I've listed some examples of these proxy metrics on the right.

Sometimes the proxies are points in time, leading up to something the user can see, like Time to First Byte.

Sometimes they're measures of best practices, like counts of unused bytes or unnecessary blocking requests.

We get a lot of questions about these proxy metrics - why not include them in Core Web Vitals?

Well, first there are a lot of them and each one might matter more or less for any particular web site.

For example, measures of image compression matter a lot more on image-heavy sites than they do on text-based sites.

Second, most of these metrics check whether a web site has employed certain best practices.

And while following best practices is great in general, a site can have a great user experience even if it doesn't follow best practices.

And best practices change over time as browsers and frameworks add new features, but good user experience usually doesn't.

We want to measure how the site is actually experienced by end users, regardless of these underlying technology choices.

So should you monitor proxy metrics?

We believe that you should dig into which ones are most applicable to your site's overall user experience and monitor those.

Recording things like bytes of JavaScript added on every commit can be super helpful for preventing regressions.

But when measuring success, we recommend tracking direct measures of user experience.

So that's where the Core Web Vitals are focused.

The second guiding principle I want to talk about is measuring in the field.

First, let's go over what measuring in the field is.

We use the term 'field data' to mean data that captures what real users experience as they browse your website.

This is also called Real-User Monitoring, or RUM for short.

Some examples of products that show field data are the Chrome User Experience Report and Google Analytics.

Field data is different from lab data, which generally comes from debugging tools.

Lab data can come from an actual device lab, like you might be tapping into when you use WebPageTest, or from a device you're using locally like the phone being tested in the picture.

One thing that's important to remember when looking at field data is that your site will have a lot of page loads, and that means there will be a lot of field data.

It will form a distribution, usually shaped with a long tail like this one.

On the left, you can see that a few users had a lightning fast experience - a few hundred milliseconds!

While on the right, a few users had a very slow experience, waiting several seconds.

But most of the page loads were around 1-2 seconds.

When we have a distribution like this, we usually report it as a single number, generally, a percentile.

For Core Web Vitals, we've chosen to look at the 75th percentile.

Why?

As you can see on the chart, it covers most of the page loads.

If the 75th percentile is good, most user experiences on the page are good.

And you can also see on the chart, most sites have a very long tail of slow experiences.

For sites that have less traffic, higher percentiles in this long tail can be really noisy, swinging up and down wildly from day to day.

We wanted to provide more stable and representative data for these sites, so we chose the 75th percentile to cover what most users experience on the site without too much variance.

If your site has enough traffic that higher percentiles are stable, we definitely recommend looking at them too, like the 95th or even the 99th.

You want as many users as possible to have a great experience on your site!
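
If you're collecting your own field data, a percentile like this is straightforward to compute. Here's a minimal sketch in TypeScript, using a made-up `lcpSamplesMs` array standing in for LCP values collected from real page loads.

```ts
// Minimal sketch: nearest-rank percentile over a set of field samples.
// `lcpSamplesMs` is a made-up example, not real data.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

const lcpSamplesMs = [800, 1200, 1900, 2300, 2600, 4100, 7600];
console.log(percentile(lcpSamplesMs, 75)); // 4100 with this nearest-rank definition
```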

Going back to lab tools, when you run a lab test in a tool like Lighthouse or WebPageTest, it's just going to give you one point on the curve.

And because each user's device and network are different, there's no way to predict where on the real-user curve that lab run will fall.

So lab tools can't perfectly predict what your 75th percentile will be.

But they do give you lots of deep debugging information and opportunities for improvement you won't be able to get from field data.

And you can set up lab tools to run in your continuous integration tests, which is one of the best ways to catch performance regressions early.

So while lab tools are super helpful, we recommend using them in conjunction with your field data to better understand where to focus.

So that sums up the guiding principles behind Core Web Vitals.

They're direct measures of user experience, measured in the field so that we can bring the focus on what real users are experiencing as much as possible.

So which user experience do we measure, exactly?

I'll walk through them one by one.

The most broadly used measure of web page experience is page load time.

People load a lot of pages on the web, and there are lots of great studies showing they want those pages to load quickly.

So our first metric is a measure of page load time.

Largest Contentful Paint.

But how did we arrive at that metric specifically?

Let's start with an example filmstrip of a web page loading.

First it's blank, and then a skeleton loads in, and then the text and images.

We think that if we're going to focus on the user experience of this page, we really want to measure the fourth entry in the filmstrip, which shows when the main content is visible.

Let's talk through some existing metrics and where they show up on the filmstrip.

One really popular metric in web performance is First Contentful Paint.

The First Contentful Paint is the time at which the first text or image is painted to the screen during page load.

In this example, it corresponds to the second entry in this filmstrip.

It's a great metric because it shows when the page load starts to become visible to the user.

But for Core Web Vitals, we really wanted to start with just one page load metric, and have that metric focus on when the main content is loaded.

And First Contentful Paint occurs much too early for that.

Another loading metric is Speed Index, the average time at which visible parts of the page are displayed.

We love Speed Index - it corresponds really well to when the main content is loaded.

But all Core Web Vitals need to be measurable in the field.

It's not really feasible to implement Speed Index in a browser in a way that doesn't slow down page loads - that's why tools generally calculate it from a video instead.

So we wanted to find something similar, that would give us a good estimate of when the main content on the page was shown, but was fast enough to calculate in real time as every page loads.

Our biggest building block was the ability to know when each image and block of text in the viewport is painted to the screen.

So we took that and tried lots of different combinations - largest image paint?

The last text paint?

We built a tool that loaded several thousand web pages, and recorded filmstrips showing exactly where each variation of largest or last image or text paint occurred.

We looked at the differences between each metric variant, and the pages that were outliers, with really small or really large metric values.

We spot checked many sites and after a lot of analysis, we came to Largest Contentful Paint - the time when the largest image or text block on the page is painted, excluding background images.

We implemented it in Chrome and did a large-scale analysis, which showed that it worked well in practice.

We also did a correlation study with Speed Index, showing that it produces similar results for most web sites.

Here's how Largest Contentful Paint does on our filmstrip - it identifies the headline image as the largest, which lines up with when the main content was visible to the user.

Overall, it generally does well at identifying when the main content on the page is visible.
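
In the field, you can observe Largest Contentful Paint yourself with a PerformanceObserver. The sketch below is a minimal example; in practice a RUM library like web-vitals handles the edge cases for you.

```ts
// Minimal sketch: observing Largest Contentful Paint candidates.
const lcpObserver = new PerformanceObserver((list) => {
  const entries = list.getEntries();
  // The browser can emit several candidates as larger content is painted;
  // the most recent entry is the current LCP candidate.
  const lastEntry = entries[entries.length - 1];
  console.log('LCP candidate (ms):', lastEntry.startTime);
});
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });
```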

So we were really happy that we found a good metric to encourage developers to display content to the user quickly.

But one problem we see when sites hyper-optimize showing content quickly is that they often do it at the expense of interactivity.

They push all the scripts to a giant block at the end, and then there can be a huge amount of work on the main thread from loading them all at once.

This makes the page slow to interact with.

We wanted a metric that would capture this, so that developers ensure their page is both fast and interactive.

The first attempt at an interactivity metric was Time to Interactive.

You can see on this diagram, as a page loads it has network requests (at the top) and JavaScript tasks running on the main thread.

Some of these are quick (in green) and some of them take longer (in orange).

Time to Interactive was intended to be an all-in-one metric that measured both the main content loading and the page becoming interactive.

It measures the start of the first 'quiet window' of 5 seconds with no long tasks and no more than two in-flight network requests.

But time to interactive isn't reliable in the field.

First, some pages never have a quiet window.

They continue to have background tasks right up until they're unloaded.

Second, the quiet window cannot be computed if the user taps or scrolls before it occurs.

These problems mean that about half of all page loads in the field don't have Time to Interactive reported at all.

But we said that the problem we want to look at is main thread tasks blocking interactivity during load, right?

So why not just look at a metric that reports main thread time directly, like the longest task or the total blocking time?

The reason is that if a long task happened but didn't interrupt the user, how do we know that's a bad thing?

Maybe the web page purposely scheduled those tasks for a time when the user wasn't interacting.

So we want to go back to the idea of directly measuring a user experience.

And that user experience is when the user clicks, taps, or presses a key, and the main thread is blocked by lots of scripts running.

So that's what First Input Delay measures - the time until the main thread is unblocked, and the browser can start to process the interaction.
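
Here's a minimal sketch of how that can be observed with the 'first-input' entry from the Event Timing API; again, in practice a library like web-vitals wraps this up for you.

```ts
// Minimal sketch: First Input Delay from the 'first-input' entry.
// The delay is the gap between the user's interaction and the moment
// the browser could start running its event handlers.
const fidObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as PerformanceEventTiming[]) {
    console.log('First Input Delay (ms):', entry.processingStart - entry.startTime);
  }
});
fidObserver.observe({ type: 'first-input', buffered: true });
```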

Next, another user experience problem that we wanted to tackle.

A really frustrating part of the web today is when content shifts around, when you're trying to read it.

Or even worse, right when you're about to click on something.

So we worked to implement a metric that can surface when this happens.

It's called Cumulative Layout Shift.

I've gotten a lot of requests for a deep dive on how the metric works, so I'm going to cover it in more detail here.

If you don't find all the specifics I'm about to go over super interesting, don't worry.

There are tons of great tools to help you find and fix layout shifts on your site and you don't have to deeply understand how they're computed in order to improve the visual stability of your site.

To understand how it works, let's start by looking at what a layout shift is.

Here's an example of a single layout shift.

The blue image loads, and the text at the bottom of the page is shifted downward.

Each time an element on a page is shifted, we calculate a layout shift score.

The score looks at two things: how much of the page moved, and how far it moved.

Let's work through an example.

First, how much of the page moved?

We take the area of the screen where the element used to be, and the area of the screen where it is now, and combine them.

That's the impact region.

Then we divide that area by the area of the viewport, to get the impact fraction.

In this example, the impact fraction is about 0.4, or 40% of the viewport impacted.

Next, we look at how far the element moved.

Since the element moved downward, we look at the distance it moved down divided by the viewport height.

In this example, we get a distance fraction of about 0.25, meaning it shifted down about 25% of the viewport.

Now that we know how much of the page moved, and how far it moved, we compute a score.

The score is just these two fractions multiplied together.

So if one of them is small, the score will be lower.

That's because a big region only shifting a little bit, or a small region shifting a lot, isn't as bad a user experience as a big region shifting a lot.

In this case, the score is 0.1, which is our threshold for what can be considered "good".
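
Put as a tiny calculation, here's the arithmetic from this example as a sketch:

```ts
// The numbers from the example above.
const impactFraction = 0.4;    // impact region area / viewport area
const distanceFraction = 0.25; // move distance / viewport height
const layoutShiftScore = impactFraction * distanceFraction; // 0.1
```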

And the metric is called Cumulative Layout Shift because if there are multiple layout shifts on the page, the scores of each are summed together into a cumulative score for the page load.

And one thing that's really important to us is that we capture the frustration users have when the content shifts around after the page is already loaded.

When they're trying to scroll through an article and the text shifts out of view, or when they're trying to click a link and it moves out from under their mouse.

So Cumulative Layout Shift measures the layout shifts throughout the whole lifetime of the page.

But for pages that are open a really long time, the layout shifts can add up a bit too much.

Here's an example that we saw pretty frequently when we did an analysis of pages that had high CLS scores and were open for a long time.

This page has an infinite scroller, and there's a footer at the bottom.

As the user scrolls, the footer shifts down every time new content comes in.

It's a little annoying, but if the user has a page open for several minutes, and the score continues to increase and increase, it's not infinitely annoying.

We thought there should be a cap on the score for experiences like this - they're not perfect, but the user frustration is captured in the first couple of shifts.

So we dove into a lot of different ways to address this in the metric.

And we came up with what is called the session window.

Here's an example of a timeline of a page.

Over time, you see various layout shifts happen.

Those are the blue bars, and their heights correspond to the score for each shift.

You'll notice that they're split into groups.

This is really common on web pages; you'll often get a burst of shifts when loading or scrolling or doing a single-page app transition.

Session windows split these bursts into separate groups.

The splitting is pretty simple.

It makes a new session window every time there is a 1 second gap with no layout shifts.

But what about an infinite scroller with a constant stream of tiny layout shifts?

For that case, if there are no gaps, we cap the session at 5 seconds.

After we've split the layout shifts into sessions, we just take the session with the highest score.
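
To make the windowing concrete, here's a rough sketch of that logic applied to Layout Instability API entries - a simplified version, not the exact browser implementation.

```ts
// Rough sketch of session windows: a shift within 1 second of the previous
// shift and within 5 seconds of the window's start belongs to the same
// window; the largest window sum is reported.
interface LayoutShiftEntry extends PerformanceEntry {
  value: number;
  hadRecentInput: boolean;
}

let reportedCls = 0;   // largest session-window sum seen so far
let windowValue = 0;   // running sum for the current window
let windowStart = 0;   // timestamp of the first shift in the current window
let lastShift = 0;     // timestamp of the most recent shift

new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as LayoutShiftEntry[]) {
    if (entry.hadRecentInput) continue; // shifts right after input are expected
    const continuesWindow =
      windowValue > 0 &&
      entry.startTime - lastShift < 1000 &&  // no 1 second gap
      entry.startTime - windowStart < 5000;  // window capped at 5 seconds
    if (!continuesWindow) {
      windowValue = 0;
      windowStart = entry.startTime;
    }
    windowValue += entry.value;
    lastShift = entry.startTime;
    reportedCls = Math.max(reportedCls, windowValue);
  }
}).observe({ type: 'layout-shift', buffered: true });
```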

You can read a lot more about how we decided on these numbers in the blog post I've linked in the slides.

But here are a couple examples to clarify how the windows work on web pages.

First, let's go back to our infinite scroller.

Every time that tiny bar at the bottom of the page gets pushed down for new content, there's a layout shift with score 0.01.

This happens once every 0.9 seconds, which means we'll never have a 1 second gap.

So the session window grows to its maximum size of 5 seconds during which there are 5 layout shifts.

These sum up to a score of 0.05.

That's about half the threshold we consider good, so if this is the only type of layout shift on the page, the page still meets the "good" threshold.

Here's a different example.

This page has a small layout shift when it's loaded, as that little red promo bar at the top shifts the content down.

Then the user scrolls down until they get an unsized image, and there's a big shift as the image loads in and the text below shifts down.

Since these shifts are more than one second apart, they're split into separate windows.

The score of the larger window is reported - in this case that's the score for the big shift during scrolling.

And that's how CLS is defined.

So to recap, Cumulative Layout Shift is the sum of the layout shifts during the worst period of shifts throughout the page lifetime.

And that's all three of the Core Web Vitals metrics.

In addition to the metrics, we also have recommended thresholds.

I wanted to talk a bit about these.

The idea behind the thresholds is to identify the best content on the web.

We base these thresholds on two things: user experience research about ideal metric values, and the achievability of good scores in the real world.

For Largest Contentful Paint, user experience research says that ideally a page would load in as little as 1 second.

But looking at the web today, even the very fastest web content doesn't consistently load in one second.

We find that the best content loads in about 2.5 seconds or less.

So that's where we set the threshold - a value attainable by the best web content, that is as close to what the user experience research recommends as possible.

For First Input Delay, a lot of user research says that a response to user input should happen in 100 milliseconds or less.

On mobile, about 75% of sites do meet this threshold for first input delay.

So it's a bit lenient.

But we didn't want to set a threshold lower than what we saw in existing research.

So we kept the 100 millisecond threshold even though most sites meet it pretty easily.

We plan to extend the responsiveness metric in the future to cover more of the user interactions in the page, which should make the threshold a bit harder to meet, but also cover more of the user experience.

For Cumulative Layout Shift, ideally there would be no layout shifts on the page, so the ideal score is zero.

But when we looked at what the best content on the web achieves today, we found that 0.1 is a more achievable threshold.

So that about covers the Core Web Vitals.

But I wanted to cover another question we get a lot.

Are three metrics really all you need to measure everything on your website?

And the answer is no - you know your product best, and a lot of the things you want to measure are product-specific.

We thought a lot about that when designing the APIs for the metrics, and we worked to make them flexible to other use cases.

The Element Timing API, which powers Largest Contentful Paint, can show you when any element on your page was painted, and it can be useful for product-specific metrics or single-page app measurements.
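
For example, here's a minimal sketch that times one specific element; the `hero-image` identifier is just a made-up name you'd put in your own markup.

```ts
// Minimal sketch: Element Timing for one annotated element.
// Assumes markup like: <img src="hero.jpg" elementtiming="hero-image">
interface ElementTimingEntry extends PerformanceEntry {
  identifier: string;
  renderTime: number;
  loadTime: number;
}

new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as ElementTimingEntry[]) {
    if (entry.identifier === 'hero-image') {
      // renderTime can be 0 for cross-origin images without Timing-Allow-Origin;
      // fall back to loadTime in that case.
      console.log('Hero image painted at (ms):', entry.renderTime || entry.loadTime);
    }
  }
}).observe({ type: 'element', buffered: true });
```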

The Event Timing API, which powers First Input Delay, gives you timings for every event throughout the page's lifetime, so you can build metrics that expand far beyond load-time interactivity.
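
A minimal sketch of watching interactions beyond the first input might look like this; the 16 millisecond threshold is just an illustrative value.

```ts
// Minimal sketch: log slow interactions throughout the page lifetime.
const interactionObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as PerformanceEventTiming[]) {
    const inputDelay = entry.processingStart - entry.startTime;
    console.log(`${entry.name}: delay ${inputDelay}ms, total duration ${entry.duration}ms`);
  }
});
// durationThreshold filters out events shorter than the given duration.
interactionObserver.observe({ type: 'event', durationThreshold: 16, buffered: true });
```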

And the Layout Instability API reports individual layout shifts that you can aggregate any way you like; you could report them only for scrolls, or break them down by single-page app transitions.
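
For instance, here's a rough sketch of breaking layout shifts down per single-page app route; `onRouteChange` is a hypothetical hook your router would call.

```ts
// Rough sketch: accumulate layout shift scores per SPA route.
const shiftsByRoute = new Map<string, number>();
let currentRoute = location.pathname;

new PerformanceObserver((list) => {
  const entries = list.getEntries() as Array<
    PerformanceEntry & { value: number; hadRecentInput: boolean }
  >;
  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // ignore shifts right after user input
    shiftsByRoute.set(currentRoute, (shiftsByRoute.get(currentRoute) ?? 0) + entry.value);
  }
}).observe({ type: 'layout-shift', buffered: true });

// Hypothetical hook: call this from your router on navigation.
function onRouteChange(newRoute: string) {
  currentRoute = newRoute;
}
```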

If you want to learn more about the API design or the standardization process, please check out Nic and Nicolas' talk tomorrow, "What's new at the W3C Performance Working Group?"

And that's what I wanted to cover on Core Web Vitals today!

To recap, our guiding principles for the metrics are that they apply to all web pages, directly measure user experience, and are measurable in the field.

The metrics are Largest Contentful Paint, which reports when the main content was visible to the user; First Input Delay, which measures interactivity during load; and Cumulative Layout Shift, which measures unnecessary content shifts.

As we've worked on the metrics, we've worked hard on the underlying APIs that power them, to give building blocks to developers looking to write custom metrics for their own products.

Thanks everybody so much for listening.

Header image of the Core Web Vitals logo which features a screen showing data tracking metrics overlaid with three icons representing performance, responsiveness, and stability.

Overview of the Core Web Vitals Metrics

Annie Sullivan (@anniesullie)
Lazy Load

Agenda

Presentation agenda timeline delineated into three sections labelled: Overview, The Metrics, and Beyond Core Web Vitals. Each of the three sections contains further details of each part of the talk as follows:

Overview
Guiding Principles

Applies to all pages
Direct measure of UX
Measurable in the field

The Metrics
Overview of Metrics

Largest Contentful Paint
First Input Delay
Cumulative Layout Shift

Beyond Core Web Vitals
Web Perf APIs

What else can you measure?

What are the Core Web Vitals?

Metrics designed to bring a focus to user experience.

Three images with labels and text underneath as follows (left to right):

Image of the Earth's surface as seen from outer space

Apply to all web pages

Surfaced across all our tooling

Image of a hand using a touchscreen tablet to view images. A laptop computer is in the backdrop

Directly measure user experience

Small number of key, user-centric outcomes

Image of a data management software interface showing a range of metrics in graph and number formats

Measurable in the Field

Ground truth of what real users see

Direct Measures of User Experience

Direct Measures

  • When content was displayed to the user
  • Time to respond to a user action

Proxies for User Experience

  • Time to first byte
  • Number of render-blocking resources
  • Bytes of JavaScript loaded
  • Total time the main thread was blocked
  • Percent unused CSS

Why not just measure proxies?

Different proxies correspond more or less to good user experience on different sites.

Some great experiences don't really rate well on audits of proxy metrics.

For broad applicability, we need to measure the user experience directly.

Image of a question mark

Our guidance

Monitor proxy metrics that are important to your site's performance, but measure success with direct user experience metrics.

Measurable in the Field

Understanding field vs lab

Image of a person using a laptop sitting on top of a wall overlooking a city skyline at sunset

Image of the Earth's surface as seen from outer space

Field

Real-user monitoring (RUM)

Popular field tools:

  • Chrome User Experience Report
  • Analytics providers

Image of a hand using a smartphone. A laptop computer is in the backdrop on an office-style desk with a cup of coffee and a pair of eyeglasses on the desk

Lab

Local debugging
Continuous integration testing

Popular lab tools:

  • Lighthouse
  • WebPageTest

Image of a distributed line/bar graph showing page load times on a scale from 100 milliseconds to 7600 milliseconds, measured in increments of 500 milliseconds. The graph follows a downward curve, with the lines climbing rapidly at the beginning (coloured green, representing fast), then tapering gradually down to yellow (representing slowing) around the 2600ms mark, then red (representing slow) at the 4100ms mark, with the red lines continuing to shrink along the x-axis, forming a long tail on the graph.

An image of a smartphone screen showing a highway is overlaid at the front (tall) end of the graph and is labelled: Super fast!

An image of a smartphone screen showing a garden snail crawling over a leaf is overlaid at the far (short) end of the graph and is labelled: Very slow

Distribution

Image of the same distribution graph as prior slide with the 75th percentile mark highlighted. This is located between the 2100ms and 2600ms markers, as the green bars have begun to taper down toward the yellow zone

Image of the same distribution graph as prior two slides with an image of a smartphone screen showing a person using a laptop overlaid and pointing to the 3600ms marker on the graph. This is located where the yellow bars are beginning to taper down toward red (slow). The image is labelled: Lab is just one data point

Image of the same distribution graph as prior two slides with an image of a smartphone screen showing a person using a laptop overlaid, this time labelled: And you can't be sure where on your field distribution it will fall. A bidirectional arrow is overlaid on the distribution graph and encompassed within the scope of the arrow span is an image of a software lab interface showing a range of data clusters

Image of the same graph as prior two slides with two overlaid images. The first image is of a smartphone screen showing a woman with a child in her lap using a laptop. The image is labelled: So we want to keep focused on real users. The second image is of a smartphone screen showing a different person using a laptop. This image is labelled: And use lab as a tool to help

Direct measures of user experience
+
Measured in the field

Brings our focus to real users, and their experience on the page.

Image of two women laughing whilst watching the same laptop screen. They are outdoors and have notepads with them

Which user experiences?

How we decided on the metrics.

When is the main content loaded?

Largest Contentful Paint

When is the main content loaded?

Filmstrip images of the sequence of a web page loading. In the first frame the screen is blank save for box and section element outlines. In the next, some smaller images and partial text have loaded. In the last, all the images and text are loaded and clearly visible

When is the main content loaded?

Same filmstrip reel as the prior slide with the second snapshot (the partially loaded screen) highlighted and labelled: First Contentful Paint

When is the main content loaded?

Same filmstrip reel as the prior slide with the third snapshot (the fully loaded screen) highlighted and labelled: Speed Index

Ideas

Largest Image Paint?
Largest Text Paint?
Last Image Paint?
Last Text Paint?

Image of a Post-it note with a lightbulb drawn on it pinned to a corkboard

Largest Contentful Paint

Time when the largest image or text block on the page is painted, excluding background images.

When is the main content loaded?

Same filmstrip reel as the earlier slide showing intervals in a page loading, here with a portion of the third snapshot (the fully loaded screen) highlighted - the highlighted section features the largest block on the page and labelled: Largest Contentful Paint

Loading interactivity

First Input Delay

Interactivity during load

Why not Time to Interactive (TTI)?

In the field, half of pages do not report TTI.

Why?
  • TTI cannot be computed if interrupted by input/scroll
  • Many pages have continuing long tasks until unloaded.

Image of the Time To Interactive lab metric interface, which attempts to measure the ideal conditions for interactivity by monitoring JavaScript main thread activity and the network activity from the start of page navigation. In the diagram, the network requests are represented by horizontal grey bars of varying lengths at the top of the screen. The main thread JS tasks are represented by green (for quick, defined here as tasks shorter than 50ms) and orange (for slower, defined here as a task longer than 50ms) bars in the middle of the screen. A vertical Time To Interactive line in the centre of the diagram indicates where a page becomes interactive.

Interactivity during load

Why not a measure of long tasks?

If a long task happens, and it didn't interrupt the user, are we sure it's a problem?

Repeat image of the prior slide diagram measuring Time To Interactive with two overlaid arrows asking why metrics identifying main thread tasks blocking interactivity during load can't be used. The first pair of arrows point to two of the longer JS tasks (represented by orange bars) that represent Total Blocking Time. The second arrow points to the longest bar on the main thread that represents the Longest task

Interactivity During Load

Main Thread Blocking

What happens when the user tries to tap the page during load?

Image graphic of a hand touching a smartphone screen showing a wireframe interface display. As the finger taps, a red star and a warning bar appear, reading: Main Thread Blocked

Interactivity During Load

Main Thread Blocking

Time from when the user clicks, taps, or presses a key, until that event can be processed.

Image graphic of a hand touching a smartphone screen showing a wireframe interface display. The point they are tapping is returning a green circle and a descriptive bar reading: Main Thread Free

Visual Stability

Cumulative Layout Shift

Cumulative Layout Shift:
Example

Example

An unexpected image (blue) is loaded, pushing the text downwards

Animated graphic demonstration showing grey and blue blocks representing text and image elements on a smartphone screen. In the animation, the blue box element appears after the grey box elements and forces the grey boxes to jump down and shift position as the image element is loaded

Individual Layout Shifts:

CALCULATION

Impact fraction: 0.4
impact region / viewport area

Distance fraction: 0.25
move distance / viewport height

Score: 0.4 * 0.25 = 0.1
impact fraction * distance fraction

Animated graphic of the layout shift demonstration from the prior slide beside a static image showing a red outline overlaid on the smartphone screen. The outline encapsulates the area of the inserted blue image element in the demonstration and the grey text elements that moved as it loaded. The vertical axis of the outline (which correlates with the area of the screen where the original text used to be as well as where it is now) is labelled Move distance, and the horizontal axis of the outline (which correlates with the viewport width of the screen the contents appear on) is labelled Impact region

Multiple Layout Shifts:
Cumulative

EXAMPLE

Each individual layout shift is summed to a total score.

This page has two layout shifts, which are added together for a cumulative score.

Animated demonstration of Multiple Layout Shifts on a smartphone screen interface. Text elements (represented by grey horizontal bars) move around as different sized image elements (represented by blue and pink blocks) load. First the screen shows only text blocks. Then a pink (smaller) image element appears which shifts the text layout. Then a blue (larger) image element appears which shifts the text a second time

But what about pages open a really long time?

EXAMPLE

This page uses an infinite scroll pattern. As the user scrolls through the page, more blue items are added.

The footer is shifted down, producing a small layout shift.

The layout shift is a little annoying, but if the user scrolls for several minutes should the score continually increase?

Animated demonstration of an infinite scroll pattern on a smartphone screen interface. At the start the page has a black header bar and two columns of blue boxes representing image elements. As the page is scrolled, more blue elements are added under the existing ones. Grey text beneath the boxes is also continuously moved and anchored to the bottom of the most recent boxes

Capping score with window

Maximum gap between layout shifts in same window = 1 second

Maximum window size = 5 seconds

Each window's score is the sum of its layout shifts.

The window with the largest score is reported.

Animated demonstration of how a session window metric can be applied to layout shift. The animation shows an x-axis measuring time and a y-axis measuring layout shift amount. The layout shifts are represented by blue bars of varying heights, (as plotted across the axes by individual scores calculated by multiplying how much and how far the page moved). The shifts are then grouped into three session windows. A new session window starts each time there is a one second gap with no layout shifts.

Details: https://web.dev/better-layout-shift-metric/

Capping infinite scroll

EXAMPLE

The scroller has layout shifts with score 0.01 every 0.9 seconds.

So there is never a 1 second gap, and the window maxes out at 5 seconds.

So there are 5 layout shifts.

Score =
0.01 + 0.01 + 0.01 + 0.01 + 0.01 = 0.05

Animated demonstration of an infinite scroll pattern on a smartphone screen interface. Unlike the prior animation, which showed how infinite scroll worked, the score is now capped by the session window: each layout shift scores 0.01 and occurs every 0.9 seconds, so the window maxes out at five seconds. At the start the page has a black header bar and two columns of blue boxes representing image elements. As the page is scrolled, more blue elements are added under the existing ones. Grey text beneath the boxes is also continuously moved and anchored to the bottom of the most recent boxes, but the window stops counting after five layout shifts.

Separating distinct shift events

EXAMPLE

A small shift of 0.02 occurs at the page load.

A few seconds later, as the user scrolls, an image pops in and shifts down the text, resulting in a layout shift of 0.08.

The shifts occur in separate windows, and the page score is the maximum window, 0.08.

Animated demonstration of a scrolling page on a smartphone screen interface, with the layout shifts split into separate session windows. On loading, a red header bar appears above a pink box representing an image and several rows of grey bars underneath representing text, producing a small (0.02) layout shift. As the page is scrolled, a blue image element appears and shifts the text beneath it downward

Cumulative Layout Shift

Sum of the layout shifts during the worst period of shifts throughout the page lifetime

Thresholds

Thresholds aim to identify the best web content. They are based on:

User experience research and achievability

Thresholds

(Loading)

LCP

Largest Contentful Paint

(Interactivity)

FID

First Input Delay

(Visual Stability)

CLS

Cumulative Layout Shift

Three visual representations of the Core Web Vitals scoring bars demonstrating the range of respective threshold performance metrics for Largest Contentful Paint, First Input Delay, and Cumulative Layout Shift. Horizontal performance indicator bars under each metric are delineated into the categories of Good (marked in green), Needs Improvement (marked in orange), and Poor (marked in red). The associated threshold measures are indicated by black markers on each respective indicator

For LCP, the typical delineation between Poor and Needs Improvement is 4.0 seconds, and the delineation between Needs Improvement and Good is 2.5 seconds

For FID, the typical delineation between Poor and Needs Improvement is 300 ms, and the delineation between Needs Improvement and Good is 100 ms

For CLS, the typical delineation between Poor and Needs Improvement is a score of 0.25 and the delineation between Needs Improvement and Good is a score of 0.1

What about other metrics?

Core Web Vitals aim to cover key experiences users encounter on all web sites.

But most websites have user journeys beyond just loading and content shifting! We want you to be able to create metrics for product-specific user experiences, too.

We took this into account in API design.

Web Performance APIs powering Core Web Vitals (and beyond)

Element Timing API

Measure the time to paint any element on your page.

Can use as base for:
• More fine-grained page load metrics
• SPA transitions

Event Timing API

Measure delay, processing time, and time to next paint for all events.

Can measure responsiveness throughout page lifetime.

Layout Instability API

Measure layout shifts for each rendered frame.

Can compute CLS for individual SPA routes.

Can report details on shifted elements.

Recap

Timeline with three delineated sections labelled: Overview, The Metrics, and Beyond Core Web Vitals

Overview
Guiding Principles

Applies to all pages
Direct measure of UX
Measurable in the field

The Metrics
Overview of Metrics

Largest Contentful Paint
First Input Delay
Cumulative Layout Shift

Beyond Core Web Vitals
Web Perf APIs

What else can you measure?

Thanks!

Annie Sullivan
@anniesullie