Intl we meet again

(whimsical music) - Okay, I'm here today to talk to you about Interna. (mumbles) Interna. Local, supporting users who live where you don't. Okay, I'm talking about internationalisation. Internationalisation is hard to say, hard to spell, no one can agree whether it's an S or a Z. Everyone ends up just shortening it to "i18n". And the reason is there's eighteen letters between the I and the N. Localization similarly becomes "l10n". You'll see these a lot. Fun little fact, the process of shortening a word like this is known as a numeronym. Or as I prefer an "n7m". (audience laughs) Okay, but what is internationalisation and localization? Simply put it's displaying text and numbers in a way that makes sense to as many people as possible around the world. In computing it's always based on the concept of a locale. You'll see these a lot in various examples. But at the basic level a locale is a combination of a language code and a country code. But a locale is not a language and a locale is not a country. Also a language is not a country, which is why you shouldn't ever use flags to represent languages. But a locale just represents a region that shares a group of conventions. Language is one of those conventions, but there are others just like how do you write dates. Canada is a great example of this, because English Canadian formatting is different from French Canadian formatting which in turn differ from the formatting in England and France. Okay, so getting to the point. Imagine you have a web based product or service as I imagine most of us do. And you support a single market, but you want to expand and go to different countries. First question that comes up is what about translations? Okay yes, translations of text are very important. They're also a huge can of worms and I'm not actually gonna cover that in this talk. It's gonna be a whole talk unto itself. But even if you've got all the text translations what about all the other stuff like dates and times, and currencies. You're going to need to know how to display that for the different people. In order to do that you're going to need lots and lots of data. And if your server rendering your application getting all that data is not a problem. But in this brave new world of shoving megabytes of JavaScript cruft into everyone's browsers, that data has a cost. And even once you have the data you then need to work out how you're going to use the data. So we turn to myriad open source libraries in order to do this. But, now you have another problem. Because the wider you cast your support of languages and locales, the more likely it is that you are supporting users who are not running fantastic devices that can handle all of this data. To steal the phrase from Bruce Lawson We work on the world-wide web, and not the wealthy western web. There are many many people around the world who cannot afford the high powered devices that we use in development. So now you have a balancing act between perfect international support on one side and small JavaScript bundle sizes on the other. Now when you set your operating system to use a particular language or region, and the native applications that are built on top of those. They get their data from a veritable alphabet soup of standardised providers and standards. So you've got the, International Organisation for Standardisation. The Internet Assigned Names Authority. The Internet Engineering Task Force. The Unicode Common Locale Data Repository. The International Components for Unicode. And the World Translation Foundation. Except not that last one, cause I just made it up. (audience laughs) But, when you set your system to a region it looks up data from all of these to know how to display things. Wouldn't it be great if you could do that in the browser. That's where the ECMAScript Internationalisation API has come in. It's actually a separate specification from JavaScript It's ECMA-402. It's actually released in 2012. They don't like spelling internationalisation either. So all of the APIs are available under an I-N-T-L object. Everything can be broken down into one of two main categories. You have formatting and everything else. But before I go into how to use them, just another quick note on when and why you would use them. It's a very simple question, do you have numbers in your application? That's when you'd use them. The formatting APIs are the ones your going to use the most but don't take my word for that. Before I go any further though, I do have to admit something, big disclaimer I've played around with these APIs in local projects, I've never actually used them in a shipped production system so take everything I say with a grain of salt. And taking inspiration from the legion of financial services that are available at the moment and all their disclaimers, here is my great big whopping disclaimer. This presentation's advice is general in nature and may not fit your specific circumstances. Talk to your front-end advisor before committing to these APIs. Past performance is not a reliable indicator of future performance. (audience laughs) Right, so the formatting APIs. They all follow the same general pattern, you create a new I-N-T-L dot something format, you pass in a locale and options, and once you have that object you can call it multiple times, passing your inputs you get a string representing that formatted number or date or whatever it is. They also have format to parts which gives you a list of tokens. So if you want to do further processing it will say here is the thing that it represents a month, here is the thing that represents a separator, and here's another thing that represents the day. Now we'll start off with numbers. So the Intl.NumberFormat API. But I mean there's nothing really special about numbers right? You wanna neaten it up, you add some commas, you pad the decimals and you're done. But anyone whose been to Europe knows that most European countries actually have those separator characters the other way around. And then you've got countries like India that put the separators in different places. You've got different numeral systems like Arabic or Japanese. And then you've got currencies. So just different regions have lots of different formatting specifications for even the same currency. So here I'm formatting the same number for the Euro in Australia, Britain, France, Germany, Austria and Switzerland. And the last three all use the German language and they all have different ways of displaying it. Okay, but hang on you might say in a hypothetical conversation the we are apparently now having. I've already used the JavaScript library to do all this. And it's like 8.2 femtobytes, so it doesn't add anything to the JavaScript bundle. And yeah you're right there are actually libraries that do this. Numeral and d3-format are some of the common ones. There's also larger purpose libraries such as FormatJS and Globalise, that handle multiple of these APIs. But everyone of these has a file size cost, and file size cost is a file size cost. If you're then also adding in the locale data on top you need to work out which locales you are supporting. The instant you load more than one locale's data at once, you're automatically sending code to the browser that isn't needed by some users. And even if you do have it all correct and you've got a lazy loading strategy that is brilliant and therefore complicated. How do you know that the data is actually correct? Because these open source libraries generally get their data from the wider community, they invite contributions. Most of them don't use the same standard data providers that your operating system does. So it's kind of like using Wikipedia as a reference, yeah okay the data is probably correct, but you can't really be sure. If you use Globalise or FormatJS they do use the same standard providers but they are large. The I-N-T-L APIs guarantee more consistent data and don't come with a file size cost. All right, so numbers got a little bit away from me. So, we'll do something now that no one has ever got wrong in the history of computing. Dates and times. All right, so maybe there's a little more to it. I mean after all you've got some choices to make. Do you go day first, or month first? Names for days and months, or numbers or both? In which order with what separator characters and is it two or four digits for the year, in twelve or twenty-four hour time, for which timezone in what language and would you like fries with that? And some of these choices are yours. You can just choose how you want this done. But, some of these choices are dependent on the locale and are you absolutely sure which is which? Now DateTimeFormat has a dizzying list of options which I will not go into here, because I only have twenty minutes. And I prefer the audience to still be awake. This is one area where there is absolutely no shortage of open source libraries, MomentJS being the most well known. The problem with MomentJS that often comes up is that it's quite large. Especially if you use it with Webpack, where by default it bundles all of the locale data. Date F-N's is a much smaller library, because unlike Moment, which isn't all in one library, gives you formatting, calculation, parsing. Date F-N's you can break down and just include only the function you want. Same with Fetcha and DayJS, but they still also need their custom proprietary data to be loaded on top. Newer libraries like Luxon actually build on top of the I-N-T-L APIs. Which means that they take care of the parsing, the calculations, and then they delegate out the formatting to the standard providers. So that's the best of both worlds. But these other libraries like Moment and Date F-N's, they still have the big glaring problem that for the formatting they make you decide what the exact format is. And then they just fill in the slots. But can you guarantee that the format you've chosen is appropriate for all the locales you support. Just to really hammer on this point, different regions have different expectations on what the formatting should be. All right so moving on, but not too far because it's still within the domain of dates and times. The RelativeTimeFormat API is relatively newer, and is useful for all those times when you've got, like you want to say a post or a comment with "posted yesterday" or "three days ago" or "that was like so last year whatevs". But the problem with this one is that, because it's new it's still also quite small, it comes with caveats. It understandably, it's much easier to start with a small API and specification, and make it simple and then add bits in as as more user cases come up. Cause the alternative is to start with a really big API and then realise that half of it's wrong, you can't remove it, you can't break the web, and you're stuck with it and that's how we have the current Date. But, the RelativeTimeFormat can be useful, but I think it needs a bit more work. You can currently only parse in a single number that you have to precalculate yourself and a single time unit. So you can say something happened two months ago but you can't say two months and five days ago. You also can't pass in a date object and have it calculate the difference from now. You'll have to do that yourself. So again we have some open source libraries, we've got Time-a-Go, Humanize-Duration and again Date F-N's, because it's modular. And honestly, with the current limitations I think you're probably still better off using one of these libraries. It is though good to know what's coming down the pipeline and what is available in browsers and what will be getting better. All right so that's it for the main formatting APIs, well actually no that's a lie because there is another one that is even newer than RelativeTimeFormat, it's only supported by Chrome, and the spec isn't even finalised yet. So I'm going to leave that for some future talk. That's actually it for the formatting. So I said there where two main categories, there's formatting and everything else. All right so everything else actually only contains two APIs. So I'm going to ignore another one just to make it easier. So PluralRules is a very useful API, but it is far more closely tied in with full text translations. So I'm just going to say that's out of scope as well. So the last one I'm going to talk about, it's probably best to illustrate it with an example, I've had to do things like this many times, where we have a list of items, it could be a grid, or it could just be a single list of words. And you want to sort them alphabetically, and you also want to allow the user to type to quickly filter the list. But how do you implement the sorting and the filtering? So the very simple way to do this is to take the text that they've entered and make it lowercase and then compare it against a lowercase list of the words you're matching against. Just so it's case insensitive. I've done code like this plenty of times. Likewise for sorting, well you just call .sort on an array. But I very naively assumed that was enough, until I learnt more about it. Because of course things are never simple. What happens when your users don't actually use the same Latin character set that you do. What happens when they have different ideas of how things should be sorted. German even has two different standardised sort orders, one of which is known as phonebook sorting, which swaps some things around depending on I actually I don't even know the rules. And they in turn differ from JavaScript's defaults sort implementation. But even forgetting languages what if you just have numbers in your strings and you want to sort them based on numeric order and not text order. Each locale has huge complicated rules for how characters with accents match against base characters, how things should be sorted. Those rules are known as collation. And so we have the Collator API unsurprisingly. You create a new Intl.Collator object, again passing locale and options. You can then call the compare method on two different strings. It will tell you whether the first one comes first, or whether they're equal, or whether the first one is actually sorted afterwards. The reason it returns numbers like this is so that it can then be passed directly into array.sort as a custom sort implementation. There is a catch though, because those who followed me on Twitter, I know someone mentioned it to me today, when I was researching this I thought that's great, I can implement text matching, turns out I'd misread the documentation, you can't actually do partial text matches with this one. So I might actually end up proposing an addition to this, so we can say does one string contain another, given all of these rules. But that's it for the overview of the APIs. Look this is not in-depth, it's just to give you a taste of what they can do as a launching pad for future research. But then the inevitable question is, can we actually use them? Don't ask me how long it took me to make this. (laughs) In most cases we actually have some pretty decent browser support. So modern browsers all support NumberFormat, DateTimeFormat and Collator. RelativeTimeFormat, PluralRules, ListFormat they're all bit newer so support's a bit more patchy. Node.js supports these as well, but with a catch that Node often only supports a small number of locales by default, so to get every locale data you might actually end up having to recompile Node. So take that as you will. But note that I said modern browsers, and haven't quite yet mentioned what everyone ends up asking about which is the elephant in the room. Or the I-E elephant as I prefer. So you might still need to handle Internet Explorer 11, because Microsoft still does, and Microsoft still will for a very long time. Because IE 11 and the death of IE 11 is tied in with the death of Windows 10, and for those who use Windows you'll know that Windows 10 is still the active operating system, and it's getting six monthly patches, and so IE 11 support will continue for the foreseeable future. It is the browser that refuses to die. Speaking of the undead today is actually Halloween, so perhaps a pumpkin is more appropriate. But here's the surprising thing, because the first ECMA-402 spec came out in 2012, that's seven years ago. IE 11 actually implemented version one. You get NumberFormat, most of DateTimeFormat, and Collator. If you want to then use some of these new ones, you'll have to use a polyfill. So FormatJS, which I mentioned earlier, actually provides a lot of the polyfills. It's up to you as to whether you want to then take on that cost because we go back to square one, if you're using a polyfill you're suddenly loading data, cause those sizes are without the actual locale data, that's just the polyfill itself. But there are a couple of advantages to using a polyfill, instead of one of the other open source libraries. One, the data comes from the exact same source as the browser APIs. So you're guaranteed consistency there. And two, the API is exactly the same. So once all your supported browsers support that API, you can drop the polyfill, no other changes to your code. Ultimately though I can't tell you what to do with your own code base. The choice is yours, because like so much in web development, it always depends. Now I'm not saying that you should always use these APIs at every opportunity, they might be complete overkill. The product I work on is legally only set up to service Australian citizens. These APIs would be overkill for us, we know that there is only one region we are servicing. And that's a perfectly valid choice, you can look at this and go no we don't need it. I'm also not denigrating all of these open source libraries they do a good job, they do a lot more than the APIs that I'm talking about here in a lot of cases, because they where designed for a lot more use cases. And you might look at your usage of these libraries and weigh it up and go actually no we still need that. And that's a perfectly valid choice. But just as equally you might look at adding yet another library and go, maybe we don't actually need that, maybe we can just make a few tweaks and use what the browser already gives us. The choice is yours. Thank you. (clapping) (whimsical music)