It’s time to hit record: an introduction to the Media Recorder API

Jessica Edwards at Code 2019

Transcript

(upbeat electronic music) (crowd clapping) - Hi, so yeah, Media Recorder API, so we saw it a bit earlier, just sort of in short.

So what's the problem that Media Recorder API is trying to solve? I think it's best sort of captured, a few years ago I was a junior, to be fair it was my first job, and I was working in ad-tech.

Yeah, I know.

Does anyone remember that movie Warcraft? It was really bad. - Oh my God.

- Yeah, that's the right response.

Oh my God, yeah.

I didn't watch it, but I watched the trailer about 300 times because I was building a sort of, it was basically, the idea was "Warcraft Yourself". Yeah, it was an interesting title.

And the idea is that you would put your face, you'd take a selfie, and you'd put your face in a video, in the trailer.

Not anyone cool, just a random person, because it was much easier that way.

The idea was that you were an individual in the trailer, that's all.

You're on screen for a few seconds.

It was an interesting project, but the big thing was I remember my producer was like, "Oh, cool, and can I just save that video that it's produced?" I was like, "Oh God no, no, no." And it seemed so odd, there's these pixels on the screen, their face is on there, it was in a canvas laid on top of a video.

But to an end user, it seems like it's a piece of media that is consumed.

Essentially what we would have to do to get that is we'd have to have an FFmpeg running on a server, we'd have to send it a bunch of frames using WebSockets, for example, stream it over.

And this was built in a couple of weeks, so I was not gonna build a server using FFmpeg and build a pipeline.

So yeah, so at the end of the day we just had to call it quits.

I did find at the time, a few years ago, the Media Recorder API which was perfect.

That's exactly what I wanted.

It would just save, canvas, done.

And the browser support was abysmal.

It just was non-existent, it was just pie in the sky thinking.

And now I work at Canva as a Creative Technologist. So I'm working on the video team, and so the idea is that we're actually trying to, Canva is a primarily web-based design piece of software, so you can create designs using HTML web technologies. Our markup is mostly just stock standard, divs and all that. Yeah, and now we wanna get video happening and so I actually got to revisit the Media Recorder API for the first time in a while, and it was like, oh, things have actually progressed, which is very exciting.

This is just a canvas where we have a bunch of random circles rendered using Math.random and I have a thousand frames or so.

So there's a thousand circles there, and that's done.

And this is cool and all, but then the sad thing is that it only exists at that moment in time. What you saw is probably not gonna happen again. It uses requestAnimationFrame, for example. So on someone else's computer, which might not be a work computer and only have four gigs of RAM, then it would be a lot choppier, the colour profile could be different, it would be rendered in a different browser so it would render completely differently.

So the beauty of the web is you can use it in a lot of different ways but unfortunately, a piece is lost.

If only we could record it.

We could record something that happened, rather than just saying if I share this, you'll get a similar experience but not the same. And there is something sort of valuable in keeping a copy, a master copy.

Something that actually, concretely sort of happened. So no surprise, the Media Recorder API helps us do that. So we can actually have a canvas or, basically we need a media stream.

This is the simplest declaration of the Media Recorder, you just pass in a stream.

What is a stream? Specifically the Media Recorder API, the official spec is the MediaStream Recording API. So it takes media streams, and there's a few different ways to get it, and I think the most sort of obvious, or the newest way to get it is using media devices. So this is input from yourself, from your hardware. So let's say, this is display media, so this actually you have to ask for permission, and then you get access to your screen.

So this is, I like how trippy it is, like.

(makes whooshing noises) Anyway.

This is a stream, right? So I can't seek or anything, but this is, basically yeah, this is a live output, this media stream. So I've got this in a video, the video is consuming the media stream.

In the same way a Media Recorder can consume a media stream. We also have the user media API.

So this is, again, so you'd often see this used in WebRTC where streaming makes sense, it's peer to peer, it's a bit more intuitive. We have a good idea that these are streams, but specifically media streams.
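As a rough sketch of what those two demos are doing (not the code from the slides): both calls live on `navigator.mediaDevices`, both return a promise, and the resulting stream can be handed straight to a video element through `srcObject`. The name `previewStream` and its `mediaDevices` parameter are hypothetical; the parameter is there only so the function can be tried with a stand-in outside a browser.

```javascript
// Hypothetical helper: ask for a screen or camera stream and show it live.
// `mediaDevices` defaults to the browser's navigator.mediaDevices.
async function previewStream(videoElement, source, mediaDevices = navigator.mediaDevices) {
  const stream = source === 'screen'
    ? await mediaDevices.getDisplayMedia({ video: true }) // browser asks for permission
    : await mediaDevices.getUserMedia({ video: true, audio: true });

  // A <video> consumes the live MediaStream directly; there is nothing to seek.
  videoElement.srcObject = stream;
  return stream;
}
```

In a page this might be called as `previewStream(document.querySelector('video'), 'screen')`, ideally from a click handler, since browsers generally want a user gesture before showing the screen picker.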

So what we can do as well, to get started, we can record, hello.

And yeah, now we actually have something that, this is playing in a video, we can seek, this is something that's captured. Without the Media Recorder, you would need to save a frame every second or so.

Grab the pixels, create an image, create an object URL, blah, blah, blah, and then manage to convert those images into a video somewhere else on the server.

And this is all done right there, right now. Yeah, it's super exciting.

And the API is quite simple.

To actually get a stream, in this case this is getting the user media, we have to await because it could fail for whatever reason. So it's promise based to get the initial stream, but then once we get a Media Recorder we can instantiate it with our media stream. Then we are required to start it, but there's a bit more to it.

It's very much an event driven API.

This data available, this is really where the magic happens. So you get an event with data, and on the data object will be a blob, or nothing, or an empty blob, but one of those things.

Based on whatever you're recording, that's what the blob type will be.

So this would be a video in this case, because I was requesting a video from the display media and user media.

On stop, that's when we conclude recording. And that's when we know that we're done collecting all our blobs.
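Put together, the flow described here might look roughly like this. It is a sketch rather than the code from the slides: `startRecording`, `recordCameraFor` and the `RecorderClass` parameter are made-up names, and the parameter only exists so the function can be tried with a stand-in where `MediaRecorder` is not available.

```javascript
// Hypothetical helper: record `stream` and hand the finished Blob to `onDone`.
function startRecording(stream, onDone, RecorderClass = globalThis.MediaRecorder) {
  const recorder = new RecorderClass(stream);
  const chunks = [];

  // Each dataavailable event carries a Blob in event.data; it may be empty.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };

  // Once stop has fired no more blobs arrive, so join what was collected.
  recorder.onstop = () => {
    onDone(new Blob(chunks, { type: recorder.mimeType }));
  };

  recorder.start();
  return recorder; // the caller finishes the recording with recorder.stop()
}

// In a browser, getting the stream is the promise-based part.
async function recordCameraFor(milliseconds, onDone) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const recorder = startRecording(stream, onDone);
  setTimeout(() => recorder.stop(), milliseconds);
}
```

The Blob passed to `onDone` can then be turned into something a video element can play, for example with `URL.createObjectURL(blob)`.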

And we can also do the same with media elements. This is less known, I think, but videos as well can have streams.

So this is served from a file that I have, and this video is here, but then I'm also streaming it over here.

So we can pause this stream on the right.

So this continues.

So I can pause this and this is going, but you see that the time is still increasing. The stream is still reading the original video. And yeah, we keep going.

And the API's quite similar.

What really annoys me is this one is synchronous. So we have some promise based, some synchronous, a bit all over the place, but you know, web standards.

I am really thankful for web standards, but yeah. Basically, very similar API, there's not much to it. And then we can actually start.

In this case, with start, when you start recording, without passing in any parameters, there's a start event fired and at the conclusion of the video the Media Recorder, seeing that the video has ended, it'll just stop by itself.

I don't have to explicitly call stop in this case because there's no stream left to consume.
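A sketch of that element-based variant, with hypothetical names (`recordElement`, and an injectable `RecorderClass` so it can be run against a stand-in). The two things it is meant to show are that `captureStream()` returns the stream synchronously, and that nothing in the code calls `stop()`.

```javascript
// Hypothetical helper: record whatever a <video> or <audio> element plays.
function recordElement(mediaElement, onFinished, RecorderClass = globalThis.MediaRecorder) {
  // No await here: captureStream() hands back the MediaStream synchronously.
  const stream = mediaElement.captureStream();
  const recorder = new RecorderClass(stream);
  const chunks = [];

  // With no timeslice passed to start(), expect one blob when the stream runs out.
  recorder.ondataavailable = (event) => chunks.push(event.data);

  // stop() is never called in this function; the stop event fires on its own
  // once the element's stream has ended.
  recorder.onstop = () => onFinished(chunks);

  recorder.start();
  return recorder;
}
```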

And there's one blob that we got, and that was at the end of the video.

So that's generally the default behaviour, in that the browser will go, okay you didn't ask me for blobs very often, so we'll just give it to you at the end.

Which kind of works, but you also are recording something in memory.

So this was a 16 second video, but what if you're going for minutes and hours? You might not necessarily wanna be recording that in memory. So the browser would have to store it somewhere. So what we can do is actually, we can provide start with, this is a time slice. So this says we actually want, every three seconds, we want you to fire the data available event. So that's requested, so then that means that say if I am using WebSockets or something, I can send that information at a pace that works for me. Alternatively, there's request data.

We can actually request it at any point in time. And then if we request the data, then a data available event will occur.

So we have quite a bit of control about when we receive these events.
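Both options can be sketched in a few lines. The names `streamChunks` and `sendChunk` are hypothetical, and how the chunk is actually sent (a WebSocket, an upload, something else) is left to the caller.

```javascript
// Hypothetical helper: forward chunks as they arrive instead of holding them in memory.
function streamChunks(recorder, sendChunk, timesliceMs = 3000) {
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) sendChunk(event.data); // e.g. socket.send(event.data)
  };

  // Ask for a dataavailable event roughly every `timesliceMs` milliseconds.
  recorder.start(timesliceMs);

  // requestData() triggers an extra dataavailable event on demand.
  return () => recorder.requestData();
}
```

The returned function is just a convenience for flushing at a moment of your choosing, for example `const flush = streamChunks(recorder, (blob) => socket.send(blob)); flush();`.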

So you can pause and play a regular video.

You can do very much the same with a recording. So in this case we pause, we can play, that's our regular video, but if we record and we pause this video, you can see that we're actually still recording. So we keep playing.

Oh no, that's not good.

That is incorrect.

Let's try that again.

Media Recorder maybe not ready for prime time. So if we pause, we're actually pausing the recording even though the video's still going.

We can resume it, and then once we stop, the file that we get, you can see that there'll be a jump, there'll be a pause and then there'll be a random jump. So we can see that.

So we can have these streams sort of operate independently.
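A minimal sketch of the pause button from that demo, assuming a recorder that has already been started. `toggleRecorderPause` is a hypothetical name; `state`, `pause()` and `resume()` are the recorder's own members.

```javascript
// Hypothetical helper: flip a recorder between recording and paused.
function toggleRecorderPause(recorder) {
  if (recorder.state === 'recording') {
    recorder.pause(); // the source video keeps playing; this stretch is skipped
  } else if (recorder.state === 'paused') {
    recorder.resume();
  }
  // 'inactive' means start() has not been called yet, or stop() already ran.
  return recorder.state;
}
```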

Quite interesting is the canvas media stream.

This is because you generally, in the case of, say, get user media, or get display media, just capturing that is probably enough, you probably just wanna record your screen.

But in the case of say recording a piece of video content, serving from a file, you already have that file. You probably don't want to save that file again. You don't need to record that file that you already have. But you might want to change it a little.

So there's capture streams, so similar API to the HTML media element.

So this is a video on the right, a canvas on the left. So similar to what I had before, we're firing 1000 frames, it's random, and now we can actually save it, we actually have sort of a record of what actually happened. Not this one.

And then, capture stream is actually a bit different for canvas, in that we can actually pass it a frame rate.

So this is saying every second I want you to take one frame. The default behaviour is that every time that a canvas updates, the stream will update, essentially.

So we actually have a bit more fine grained control over the frame rate when it's a canvas in particular. What's interesting as well is that you can do zero. So you can barely see that, there.

And on the right it had one, basically, yeah, at the start I've taken one frame, that's all, because I've said I want a frame rate of zero. On the face of it that probably doesn't make much sense, but you can actually request a frame explicitly, and this tells the stream to update itself. So rather than waiting for the events themselves you would do the requests.

This is basically saying that you have this really fine grained control at the frame level saying, I only wanna update when something else outside of frame rates or time is concerned. I can explicitly request that.
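The three modes (default, fixed rate, and zero with explicit requests) might be wrapped up like this. `captureCanvas` is a hypothetical helper, not the demo's code; `captureStream`, `getVideoTracks` and the track's `requestFrame` are the real calls it leans on.

```javascript
// Hypothetical helper: capture a canvas at a fixed rate or frame by frame.
function captureCanvas(canvas, frameRate) {
  // No argument: a new frame is captured whenever the canvas changes.
  // A number: at most that many frames per second. Zero: only on request.
  const stream = frameRate === undefined
    ? canvas.captureStream()
    : canvas.captureStream(frameRate);

  // requestFrame() lives on the canvas capture video track, not on the stream.
  const [videoTrack] = stream.getVideoTracks();

  return {
    stream,
    // Mainly useful with a frame rate of 0: push the current canvas contents.
    requestFrame: () => videoTrack.requestFrame(),
  };
}
```

With `const capture = captureCanvas(canvas, 0)`, the stream only advances when `capture.requestFrame()` is called, for instance after each draw you actually want in the recording.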

So if I record and request random frames, you can see that there'll be a random frame that comes up.

We can see the jump.

You'll notice that the time, the time continues to elapse regardless of when my frame is.

So the recording continues but the video track won't actually sort of send any more information, any more frames.

And so there is actually also the options, which it looks a bit like this.

Important to note that the mime type, it can take the container.

So if you had a, like as Phil had, he had audio/webm, that mime type refers to the container.

The mime type also takes a codecs, which is basically incredibly important when you're encoding and recording, because this is what's gonna vary the most between your browsers.

So codecs.

So VP8 is the default in Chrome and Firefox for recording video.

Opus is the default, so that's an audio codec, so if you have a video with audio and video tracks, the video would use VP8 and the audio would use Opus. But, yes, that varies quite a lot as you'll get into it. And Safari as well, they use the webm container, but then the codec is H264, which is just weird.

Yeah, it's exciting.

If you see video/webm, they all do webm, but all the browsers produce pretty wildly different videos. You can also pass in audio bits per second, video bits per second, so they're pretty self explanatory. But the video bits per second, I think it's 2 megabits per second if you don't pass it in as sort of the default recording, whereas audio will be based on the audio.

It's adaptive to the actual audio file.
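Since the supported codecs vary by browser, one cautious way to build those options is to probe with `MediaRecorder.isTypeSupported` before committing to a mime type. This is a sketch under assumptions: `buildRecorderOptions` and `preferredTypes` are hypothetical names, the bitrate numbers are only illustrative, and the `isTypeSupported` parameter is injectable purely so the function can be run outside a browser.

```javascript
// Hypothetical helper: pick the first container/codec string the browser accepts.
function buildRecorderOptions(
  candidates,
  isTypeSupported = (type) => MediaRecorder.isTypeSupported(type)
) {
  const options = {
    videoBitsPerSecond: 2500000, // illustrative value, not a recommendation
    audioBitsPerSecond: 128000, // illustrative value
  };
  const mimeType = candidates.find((type) => isTypeSupported(type));
  if (mimeType) options.mimeType = mimeType; // otherwise keep the browser default
  return options;
}

// Container first, then the codecs parameter (video codec, audio codec).
const preferredTypes = [
  'video/webm;codecs=vp9,opus',
  'video/webm;codecs=vp8,opus',
  'video/webm',
];

// In a browser: new MediaRecorder(stream, buildRecorderOptions(preferredTypes));
```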

And bits per second is if you wanna set both at once.

Yeah, so issues.

There's a few.

Support.

It's actually pretty promising in the sense that Safari Technology Preview, they've got this behind a flag, and you can play around with it.

It's all right.

It's getting there, but there's quite a few missing pieces.

Like there's no time slice in Safari, there's still a few, the support's not there, and also yes, it is quite a different webm that's produced. It is H264, and Apple have long been championing their H264.

So that means that you probably won't get the same record experience across all the browsers.

Bugs, there's quite a few bugs.

So Chrome will just give you a black screen if you want something too big.

It won't fail or anything, it'll just give you black if it's above a certain resolution.

There's a few different interesting ones with mime types, as well.

If you don't pass in the correct one, it sets a default mime type in Chrome which it shouldn't do according to the spec. A lot of the times I'll have a data available event that just appears when I don't expect it to appear, and it has a malformed blob that I can't actually do anything with.

What an odd sentence to say.

And webm, webm is an issue, yeah.

And this is the point that it makes it hard to sort of ship this, I guess.

This has been a real problem for us, is that I can play a webm, because I'm a nerd that has VLC, of course, right? But most of my clients don't know what a webm is. If you download a webm on macOS it's gonna be like, what is this? On iOS it doesn't know what to do with it.

Yeah, so that's a really big hurdle.

So it generally still means that if you wanna download it and ship this to your customers, you probably still need a server.

Actually yeah, another big Chrome bug is it just does not render, give you any metadata that's useful.

So the concept of duration doesn't exist in the videos. So then you also have to give it to FFmpeg so then it can calculate all the frames.

Yeah, yeah.

But I am still really hopeful.

There's also work on the encoding front to support AV1 which is a quite upcoming codec, so that's supported in, I think, yeah, I think it's shipped in Chrome now, that you can encode, you can record with AV1, and everyone's sort of getting behind it so I'm hopeful that we can move past our webm days and be happy and merry and share and download our videos. Maybe it's not quite yet time to hit record. So as I said to you, we're just sort of there, but it's definitely, if you have limited use, okay, so say if you just wanna play in browser, you can work with webm, say, if your API supports the webm container.