beyond-stringly-typed

Ever gone to a website only to click a link and have it turn out like the talk title? Or written code that introduced this bug? Can types prevent this? Absolutely! “Stringly typed” is a term that describes most people’s usage of types today, thanks to a horrible introduction through Java at uni.

Through real-world examples that nearly every developer will have hit in their career, we can start to move beyond stringly typed code and make impossible states impossible before they become bugs for our users, or even before we run the code or tests. Along the way you will learn how terms like ‘Parametricity’ have real-world applications, are not just the work of academics, and will actually help you write bug-proof code that is easier to reason about and understand.

WDS18 Day 2

People tend to associate types with C and Java, which is a shame. Jared works on one of the oldest and largest codebases on the planet: Photoshop. It has millions of lines of code, some of it persisting from the first version. Yet it only has six unit tests, essentially no tests. It just has types.

Let’s define what types are. Most people think this is roughly what types give you:

  • String
  • Bool
  • Int

…and it’s tempting to say you don’t need types because you can already work out what’s a string or number. But what is a Bool, really? It’s an enum of true and false.

(Jared will use Flow from here on.)

While most programmers use those three types, they usually combine them into classes.

It’s risky to use unmarked Bool values – will you remember that true is for is_admin, or are we at risk of accidentally creating admin users because we copied and pasted?

Another problematic pattern is that people write localised type checks inside functions, which get copied into other functions; and eventually they get out of sync. Or an error is thrown way outside a useful context.

We can avoid a lot of troubles with enums. Instead of ruling out the huge variation of bad values you can’t accept, you just define the specifics of the values you want – like 1 | 2 | 3 | 4 | 5 for a star rating.

Types and tests are best of friends. Types give you feedback quickly; and they reduce the cases you have to test for by preventing bad values getting passed in.

Typed code can be optimised much more efficiently.

Problem case: URLs in strings. They break so much!

With types (specific example using phantom types) you can define absolute and relative paths, so your code won’t try to combine two absolute paths – this is how you get the broken link www.google.comlinkyoujustclicked.com

Keep your untyped code as small as possible, use types to prevent impossible states from occurring in the first place. Write tests to cover user input you don’t control.

(upbeat music) – Yes, so the talk title is a bit weird.

If you don’t hang around people that talk about types regularly, it’s sort of a thing that exists inside the Haskell community.

It’s a play on words on strongly typed.

It’s the habit that developers tend to get into where they will just shove everything and everything into a string.

Even if it’s not a string, if it’s a date or if it’s JSON, it goes into a string. And this causes problems, because if you’re trying to concatenate two strings together that contain JSON, you’re not going to get valid JSON back.
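To make the JSON point concrete, here is a minimal TypeScript sketch. The helper name `isValidJson` and the sample objects are assumptions made for illustration and are not from the talk.

```typescript
// Two values that were serialised to strings too early.
const first = JSON.stringify({ id: 1 });
const second = JSON.stringify({ id: 2 });

// Gluing the strings together gives '{"id":1}{"id":2}', which is not JSON.
const glued = first + second;

// Hypothetical helper: reports whether a string parses as JSON.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Combining the parsed values, then serialising once, stays valid.
const merged = JSON.stringify([JSON.parse(first), JSON.parse(second)]);
```

Here `isValidJson(glued)` is false while `isValidJson(merged)` is true, because the second version combines the data rather than the text.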

Yeah.

So, I think it’s, it’s a bit unfortunate that I am standing in front of you today as a C++ developer because I think C++ is responsible for more JavaScript developers than anything else.

It’s because this is one error.

It’ll stop eventually.

It’s one error from two lines of code.

And this is not uncommon in C++.

I mean, this is an extreme example, but we get very big esoteric errors whenever we try and do anything advanced with the type system. And it’s unfortunate because a lot of people did start their career or they started their education in programming with a language like C++ and in their mind they equate this with types. Or they started out with Java.

(audience laughs) I didn’t make these up.

These are actual examples from Spring.

And Java developers are so attached to their IDEs that I once worked on a project at Atlassian that you could not build outside of a specific IDE. I think it’s the first and only case of coupling to an IDE that I’ve ever seen.

Hopefully the last.

And again, people sort of equate types with this because the type system within Java is quite simplistic. It’s not state of the art by any means.

And so you end up putting every single design pattern into one class to get around all the limitations of the type system.

Again, people equate that with types.

I think this is a bit of a shame.

Because I work on arguably one of the oldest and biggest code bases in the world.

We have many many many many many millions of lines of code.

And probably a few million lines of code that don’t run anymore.

It’s nearly as old as me.

I regularly fix TODO statements that were put into the code base when I was in primary school.

The Photoshop code base has survived this long, there is actually a meeting room in San Jose that has Photoshop 1.0 source code printed on the wall, I was standing there and I was like, that function, that variable name is still in the code today.

And it’s done this, it’s survived 30 years and we have about six unit tests.

(audience laughs) Someone attempted to put them in and just never went anywhere.

But we’ve managed to not only port it to new platforms, we’ve survived Apple ripping entire programming libraries out from under us.

It’s gone from MacApp, to Carbon, to Cocoa, and now to whatever the hell they call the iPad-y thingy. But it’s survived all this without the need for tests. And when I sort of think about it and I think about my time as a JavaScript developer, I think I’ve rewritten more JavaScript in my life than I have Photoshop code.

Because we can refactor safely without stress because we have these types.

And the code that makes better use of types is much easier to work with than the code that doesn’t.

And it’s often the harder code, like the way that the iPad stuff was done was easier to work with than the old Photoshop code.

So this talk is not going to teach you the ins and outs of type systems and stuff it’s just trying to make you excited about what you can do with types.

So to do that I need to define what types are. These three types represent what most programmers think type systems can do for you.

They can tell you if something is a string, a bool, or an int.

Now, this is actually a quote that someone once told me, “I don’t need a type system, I already know what a string is, I already know what...”

That’s yeah, they’re much more than that.

But what is a bool? A bool can take two values.

What are those two values? Anyone? True and false.

Which is sort of true.

The rest of my talk I’m going to be talking about Flow, I’m going to be using syntax that works in Flow. All but one part of this will run in TypeScript as well.

So to define bool in Flow we’d use what is called an enum.

And it’s just two values, true or false.

In Flow you have to use strings to represent enum values, for some reason.

In other languages you’d write it something like this. Now, because we are only allowed to represent bool as two values, we’ve already fixed one of the biggest things that annoys me about JavaScript and that is this.

Anything is bool if you try hard enough.
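As a rough sketch of the idea in TypeScript rather than Flow, a two-value type can be written as a union of string literals. The names `Bool` and `not`, and the sample values, are assumptions made for illustration.

```typescript
// A Bool with exactly two inhabitants, written as a union of string literals.
type Bool = "True" | "False";

// Only the two declared values can reach this function.
function not(value: Bool): Bool {
  return value === "True" ? "False" : "True";
}

// For contrast, plain JavaScript truthiness lets unrelated values act as booleans.
const truthyValues: unknown[] = ["0", [], {}];
const falsyValues: unknown[] = [0, "", null, undefined, NaN];

const allTruthy = truthyValues.every((value) => Boolean(value));
const allFalsy = falsyValues.every((value) => !value);
```

Passing anything other than `"True"` or `"False"` to `not` is rejected by the type checker, whereas an `if` statement in JavaScript accepts every value in the two arrays.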

So, again an int.

An int is similar.

We can define it as zero, one, two, three, four, five, all the way up to int max.

Which is depending on how many words your computer is, you know, a few million up to a few billion. But it also goes the other way.

Int min. So we can define it something like this.

Yeah, you get the point.

And again, by defining an integer as only allowed to inhabit these values, we do away with this.

Every number inside IEEE 754 can also be not a number, it can be negative infinity, it can be infinity. So by defining an integer we suddenly don’t have to worry about this anymore.
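TypeScript has no separate integer type, so the closest illustration is a runtime check. This is only a sketch under the assumption of a 32-bit signed range; the names `INT_MIN`, `INT_MAX` and `isInt32` are made up for the example.

```typescript
// Assumed bounds of a 32-bit signed integer.
const INT_MIN = -2147483648;
const INT_MAX = 2147483647;

// True only for whole numbers inside the assumed range.
// NaN fails the Math.floor comparison; the infinities fail the range check.
function isInt32(n: number): boolean {
  return Math.floor(n) === n && n >= INT_MIN && n <= INT_MAX;
}

// Every one of these is a perfectly legal `number`.
const legalNumbers: number[] = [5, 1.5, NaN, Infinity, -Infinity];
const integersOnly = legalNumbers.filter(isInt32);
```

Only `5` survives the filter. A real integer type gives the same guarantee without the check having to run.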

And a string is pretty simple.

It’s a to zed and various things.

Now I say that programmers tend to use these three types. It’s not technically true.

We do like to group them into things called classes. And who here has written a class that looks something very similar to this? I think every hand in this room needs to come up. And on the surface this looks perfectly fine, and I actually see three problems here and we’re going to address this by introducing new types into our system.

The first one is bool is admin.

This reads okay, it’s pretty self describing. Until we go to use that.

What does true mean? When you read this code in five days time you’re gonna remember that that’s creating an admin user? Probably not.

And I’ve seen plenty of errors introduced like this. A common pattern that I’ve seen throughout this is someone puts a comment next to it.

Now, I don’t know about you, but I copy and paste more code than I probably write. So if I copy and paste this code I could easily accidentally introduce a security problem because I copied and pasted, I haven’t bothered to read what the function’s doing and I’ve created an admin user.

And in my head I now have to carry around that extra weight because I have to remember that true equals admin.

But, we can let the type system do this for us. So in flow we can define a new user type of admin and normal.

Put it in our class and now suddenly our class reads much better. Now I don’t have to mentally map true to admin. It’s already done for me.

We get another benefit, which is, if someone tries to call my API with something that we’re not prepared to deal with, the compiler’s gonna tell us that.

Flow errors are quite horribly verbose without adding the information you need, but basically all this is saying is, we cannot use that super value with user type because it’s not something that we know about. So instantly, we’re getting feedback on how to use our API while increasing readability.
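A TypeScript approximation of the user type described above might look like the following. The class shape, the field names and the sample user are assumptions, since the slide itself is not part of the transcript.

```typescript
// The only two kinds of user the system knows about.
type UserType = "Admin" | "Normal";

// Hypothetical class; the talk's real fields are not shown in the transcript.
class User {
  readonly name: string;
  readonly userType: UserType;

  constructor(name: string, userType: UserType) {
    this.name = name;
    this.userType = userType;
  }
}

// The call site now says what it means, with no boolean to decode.
const admin = new User("Ada", "Admin");

// new User("Eve", "Super");
// Rejected by the type checker: "Super" is not assignable to UserType.
```

Compared with `new User("Ada", true)`, the reader no longer has to remember which meaning `true` carries.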

Next one, int.

Again, an integer is a very large number of values. A few billion values.

Who’s written something similar to this? Where we take something that’s an int and turn it into something else? Fairly common.

And I think everyone here would agree that this is probably a way that we could write this function.

We check whether it’s null, false, or you know, any number of values that it could be.

We check whether it’s an integer between one and five. So instantly this is telling me we’re using a value that’s way too big for what we need it to be.

Because I’ve got four billion possible values that can go into this function, but I only need five of them.

Something’s wrong there and then we switch and we do that.
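The function being described is on a slide, so the following is a guess at its shape rather than the speaker’s code. The name `createStarRatingLoose` and the asterisk output are assumptions.

```typescript
// Accepts anything, so it has to defend itself before doing real work.
function createStarRatingLoose(rating: unknown): string | Error {
  // Rule out null, undefined, false, strings and every other non-number.
  if (typeof rating !== "number") {
    return new Error("rating must be a number");
  }
  // Rule out NaN, fractions and anything outside one to five.
  if (Math.floor(rating) !== rating || rating < 1 || rating > 5) {
    return new Error("rating must be an integer from 1 to 5");
  }
  switch (rating) {
    case 1: return "*";
    case 2: return "**";
    case 3: return "***";
    case 4: return "****";
    default: return "*****";
  }
}
```

Most of the body is validation. Only the `switch` does the job the function is named for, and every caller has to handle the `Error` case.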

Now, this code by itself is fairly innocuous. It’s not too bad.

But what I have seen time and time and time again is this happening.

Where that error check just gets thrown throughout our system, like everywhere.

And that’s fine.

I mean, if you’re a good developer, you probably notice that, refactor it down to a function and then put that function in, but then something happens.

Someone comes along and they forget part of the check or they forget to check that value with something that we’re allowed to deal with.

Now, I’m guilty of this.

This is from java, but you get the point.

That value that’s being created all the way up the top here, but our code is all the way down here.

So the problem isn’t that null’s coming into this function and we return an error, but the problem is that it was created with null in the first place.

So this means that the locality of our error is way outside of this function.

This creates problems with debugging, we just sort of shift the problem and we practise what I call coincidence driven development. Where it’s just a coincidence that returning, checking null and returning an error here doesn’t blow up the system here, but you know, in five days time it might cause a problem.

So what we could do is instead, we could check it at the source where we create that user and return an error there and then.

And then we rewrite our function in terms of person. But this now creates coupling to our person, which means that if we ever want to create a star rating for a car or anything, any other entity inside our system, we have to rewrite create star rating. That’s not ideal.

Instead we can use enums again.

And we just tell it, I only ever want to deal with one, two, three, four, or five.

They’re the only values that I know how to deal with so only allow those to come in.
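In TypeScript the same restriction can be sketched with a union of number literals. The names `StarRating` and `createStarRating`, and the asterisk output, are assumptions made for illustration.

```typescript
// The five values the function knows how to handle, and nothing else.
type StarRating = 1 | 2 | 3 | 4 | 5;

// No null checks and no range checks: invalid values cannot be passed in.
function createStarRating(rating: StarRating): string {
  switch (rating) {
    case 1: return "*";
    case 2: return "**";
    case 3: return "***";
    case 4: return "****";
    case 5: return "*****";
  }
}

const fourStars = createStarRating(4);

// createStarRating(7);    // rejected by the type checker
// createStarRating(null); // rejected by the type checker
```

The function now always returns a string, and it no longer depends on a person or any other entity, so it can be reused for a car rating without changes.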

This gives me a great chance to talk about something that always comes up on social media. And that is the argument of types versus tests. People seem to think that types and tests are in constant conflict.

I don’t know why the raccoon is fighting with a dog, but I like raccoons and I like dogs, so you get a raccoon and dog gif.

This isn’t the case, like, in reality, types and tests are best of friends.

Types can give you feedback quicker.

For tests to run your app often has to be running. I have gone weeks with Photoshop not even compiling. And if I’m not getting those compiler errors, I’m not getting feedback, which means that I’m potentially introducing more and more bugs into my system because I’m not able to test that code.

The other thing is that, by reducing the amount of values that are allowed into our function, we can start to delete tests.

So on the left we have the code that is checking, you know is it between one and five.

And on the right we have the test that tests to make sure that that code does what it is meant to. But, I’ve told the compiler I only want one, two, three, four, or five, so those tests can just disappear.

And again, I’ve told the compiler I only want one, two, three, four, or five so there’s no null, there’s no undefined, there’s no false, there’s no empty string, there’s no other type coming in.

So now, I lost half my code.

But you get the point.

Suddenly my tests are much more clear.

There’s less of them.

Tests take time to run.

They take time to maintain.

They take time to update.

And we also get another interesting benefit with this. Because create star rating had two cases where it was possible to return an error or something that wasn’t a string.

The way that V8 works is, it watches your functions and every time you call a function in JavaScript it says what was the type it was called with and then if you call it again what was the type it was called with and it keeps going and going.

And if you answer that question enough times, with the same type and in this case it’s actually what V8 will call a small integer, it’ll say, okay cool, I now know this is only ever gonna be called with small integers so I can optimise this.

So it writes assembly that is highly optimised to only ever deal with small integers.

But, if you ever invalidate V8’s trust, and return something else, so in our case if it hit that error case, suddenly V8 goes, I don’t know how to deal with this. It then switches back to the de-optimized version, the slower version and you have to pay the cost for the recompilation again.

So, by introducing a type, we’ve now given V8 code that it’s able to optimise much much better because we’re not violating its trust. Now we’ve got one more thing that I wanted to talk about.

And this is that string url.

I think without exaggeration, I have fixed bugs about a billion times that involved strings and urls or strings and file system paths.

So this one is near and dear to my heart.

So, there is a concept in type systems, this is the part of the talk where TypeScript isn’t really relevant.

I apologise for that.

But, it is what it is.

Type systems come in two types and the way that TypeScript decided to do things doesn’t really support this, but anyway.

So if we think about our types, if we think about paths, we’ve got two different types.

We’ve got a relative path.

That is home slash something.

And we’ve got absolute paths, which is usually things that start with like a slash or in Windows C: or a url that starts with https or http.

So using a concept called phantom types, which the Haskell wiki helpfully defines as a parametrized type where the type doesn’t appear in the data members.

It doesn’t make sense to anyone, it doesn’t. And I think this is a problem that Haskell and the other communities have, in that they use academic terms that aren’t really approachable.

They are very simple concepts and I hope that this will do that justice.

But, if we think about it, like I said, we have two different values.

We have absolute values and relative values. So we can declare a new class that has this strange little thing here called T.

Now the reason we call it a phantom type is it only appears on the left hand side.

It doesn’t appear anywhere else inside our class declaration.

Or we could declare the same thing for our url. Cause it’s two things that are basically the same. Now what this allows us to do, it allows us to start to write functions based around what things are.

So in our case, when we want to join two strings together, we can join absolute and relative ones, that gives us back an absolute path.

Or we can join relative paths so it gives us back a relative path.

Might be a bit confusing about this, but if we were to write this at runtime, all this is doing is just saying, the compiler is just saying, what is that T thing that you’ve given me does it match with what is on the right hand side there.
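The talk shows this in Flow. As the speaker notes, TypeScript does not support the pattern directly, because it compares types by structure and would ignore an unused type parameter. The sketch below works around that with an optional marker field that is never assigned. The names `Path`, `join`, `_kind` and the example values are all assumptions.

```typescript
type Absolute = "absolute";
type Relative = "relative";

// `Kind` appears only in the marker field `_kind`, which is never assigned.
// Without the marker, Path<Absolute> and Path<Relative> would be interchangeable.
class Path<Kind extends Absolute | Relative> {
  readonly _kind?: Kind;
  readonly value: string;

  constructor(value: string) {
    this.value = value;
  }
}

// The base may be absolute or relative; the appended part must be relative.
// The result has the same kind as the base.
function join<Kind extends Absolute | Relative>(
  base: Path<Kind>,
  rest: Path<Relative>
): Path<Kind> {
  const left = base.value.replace(/\/+$/, "");
  const right = rest.value.replace(/^\/+/, "");
  return new Path<Kind>(`${left}/${right}`);
}

const site = new Path<Absolute>("https://example.com/");
const page = new Path<Relative>("docs/types");
const full = join(site, page); // a Path<Absolute>

// join(site, site);
// Rejected by the type checker: Path<Absolute> is not a Path<Relative>.
```

The interesting part is the call that does not type-check. Joining two absolute paths has no overload or instantiation that accepts it, so the broken link cannot be built.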

So this gives us an interesting thing.

What is interesting about this is not what is actually defined, but what’s not defined.

So I have an absolute url and then I try and join two absolute urls together.

Who has ever clicked on that link and had it go like, http slash google.com slash https the link I just clicked on the whole time, yeah.

At least once a week, and that infuriates me because I know exactly what the cause is and it’s because you use a string to represent a url. Stop it.

So, in this case, I’m trying to add two absolute urls together, which doesn’t make sense because it’s just a state that cannot exist in our system.

And flow helpfully says you can’t do that.

It says absolute is incompatible with relative. Because I defined the function, absolute to relative.

So it’s said well I’ve got an absolute in the first position so I’m going to look at the second position.

Oh okay, that’s not relative, so it’ll throw an error. So we’ve just now, by simply introducing a new type into our system we’ve now just completely eliminated an entire class of bugs from our system at compile time.

Again, if you want to see what it looks like here. But it also increases readability.

Because now, how many times have you read an API that takes a file and had to go looking does it take a relative one or does it take an absolute one? Now it’s extremely clear.

You can write functions in terms of what they need to take.

So I can say I only want to deal with absolute paths. Or you can reference different coordinate systems in space.

So browsers have relative and absolute positions, but you don’t wanna add those two together, you wanna add them together in terms of what they actually are.

Another thing you can do is, when we create a person instead of returning just a person, we could return back an admin or we could return back a normal user and then we can start to write functions in terms of the type of user that they are.

Again, making sure that non-admin users can’t call into functions that an admin needs to use. So, again, start to think about your systems and you can start to use your type system to literally make impossible states impossible. Now web developers, we have to deal with a lot of stuff that comes out of like, various networks, it comes off user input, it comes like this, my recommendation is to draw a line down your system, this side of the code is all typed, this side is my user input.

Nothing unvalidated, nothing coming from that user. Nothing unvalidated can come across here, so if you take a path from a user do all your dynamic and runtime checking to make sure it’s an absolute path and then turn it into a typed absolute path and then that’s allowed across your type boundary. So try and keep that untyped part of your code as small as possible and then write your functions in terms of types.
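One way to sketch that boundary in TypeScript is a branded string that can only be produced by a validating function. The names `AbsoluteUrl`, `parseAbsoluteUrl` and `describeRequest` are assumptions, and the regular expression is a deliberately simple stand-in for real URL validation.

```typescript
// A string that has been checked. The brand exists only at compile time.
type AbsoluteUrl = string & { readonly __brand: "AbsoluteUrl" };

// The untyped side of the line: all runtime checking happens here.
function parseAbsoluteUrl(input: string): AbsoluteUrl | null {
  const looksAbsolute = /^https?:\/\/[^\s/]+/i.test(input);
  return looksAbsolute ? (input as AbsoluteUrl) : null;
}

// The typed side of the line: no checks needed, only validated values get in.
function describeRequest(url: AbsoluteUrl): string {
  return `GET ${url}`;
}

const fromUser = parseAbsoluteUrl("https://example.com/docs");
const request = fromUser === null ? "rejected" : describeRequest(fromUser);

// describeRequest("docs/page");
// Rejected by the type checker: a plain string is not an AbsoluteUrl.
```

The cast inside `parseAbsoluteUrl` is the only unchecked step, which keeps the untyped part of the code down to one small function that tests can target.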

Hopefully you see it increases readability. We can suddenly say instead of this thing can potentially take four billion integers, I only want five.

So this helps people understand your system, it helps people read your code, and reading code is one of the things that we do most.

And by using types we can delete tests, we can delete code, which means that it cost less to maintain that code, it’s easier to maintain that code, and yeah, fun gif.

Yeah, that’s the end of my talk.

So hopefully you’re now as excited about types as I am and that dog.

(upbeat music)
