Welcome to our lecture on JSON. Next, we're going to talk about Postgres. But for now, I want to talk a little bit about JSON and give you a bit of historical context. The URL for these lecture notes is https://www.pg4e.com/lectures/06-JSON. One of the things that JSON is very much about is data serialization. Data serialization is the problem we face when we are transferring data structures between programs in different languages, or just between programs in the same language exchanging data across the network. The two most common data structures that we use in programming are linear structures and key-value structures. In Python, the list is the linear structure, just 0, 1, 2, 3, and the dictionary is the key-value structure. If you look at languages like JavaScript or PHP or Java, they all have these. And while we have more sophisticated data structures, in general you can model them as either key-value pairs or linear lists. The only other data structure is a tree of information, and we'll talk about that in a second. So here is the problem, and what the word serialization means: if you have a dictionary in Python and you need to send it to PHP, or to JavaScript where it needs to be an object, you need a format that both Python and JavaScript can agree on as the interchange format. We call it serialization because it was what was sent across the wire, back when we had networks that were made of wires. It was a serial set of bytes, a set of characters that were sent across the wire, and if you cut the wire in the middle and watched what went back and forth between the two systems, you'd say that's our wire protocol, and it goes by serially. And you'd ask, okay, we're sending a dictionary to an object, but what's going across the wire? The answer is the serialization format. Another word for this is marshalling and unmarshalling.
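To make the idea of a wire format concrete, here is a minimal sketch in Python. It assumes JSON as the interchange format and UTF-8 as the byte encoding, and the record itself is made up for illustration; nothing here actually opens a network connection.

```python
import json

# A hypothetical record; the field names are invented for this example.
record = {"name": "Chuck", "courses": ["pg4e", "py4e"], "active": True}

# Sending side: dictionary -> JSON text -> bytes that could go across the wire.
wire_bytes = json.dumps(record).encode("utf-8")

# Receiving side: bytes -> JSON text -> dictionary again.
received = json.loads(wire_bytes.decode("utf-8"))

print(wire_bytes)          # the serial stream of bytes "on the wire"
print(received == record)  # the structure survives the round trip
```

The bytes in the middle are the only thing the two programs have to agree on. The receiver could just as well be PHP or JavaScript turning those same bytes into its own array or object.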
So you marshall the data to prepare to send it and then you unmarshall it when you receive it. They're basically identical terms. Now, in the early days, in the 1990s when HTML came out, we looked at this less-than, greater-than HTML format and thought, we can use this to represent data. So the world came up with a thing called XML, which stands for Extensible Markup Language. It's basically a tree of tags, like array and slash array, with some entries in between. Names like array and entry don't mean anything special; you just put stuff in. But we were using this format to serialize, or to marshall and unmarshall, data between systems. You would also tend to serialize data when you put it in a database, or when you wrote it to a file and then read it back. Think of editing in Microsoft Word, for example: as you move stuff around, Word is updating its internal structure, but then it has to serialize that data into a docx file. And in that case it uses XML. So XML was the format of choice for serialization and we used it a lot. There were all kinds of libraries, frameworks, and strategies for XML that were very, very popular. But then things changed as we moved from servers talking to servers on networks to servers in the back end talking to browsers and to the JavaScript running in them. Originally, we just had request/response cycles where you would go to a page, it would paint, and then you'd click on an href, an anchor tag, and you would go to the next page. But as we wanted to add more interactivity, we would write more of our application code in JavaScript. And sometimes we would talk to the server, pull data back, and then update the document object model without actually refreshing the whole page.
That's how something like Facebook has a little red thing with the number of messages that either updates or just shows up when something happens. It's not changing the whole page, it's just changing a little corner of the page. When this pattern first came out, and we still call it AJAX, the idea was basically: let's take XML and have the JavaScript in the browser read the XML from a server. Now, that server often didn't have the data in XML. It just took whatever dictionary or list it had and turned it into XML, and then in the browser you'd take the XML and turn it back into an array or list or whatever it was. And that was how we started. The problem was that XML is a hierarchy, a tree. It's really good at representing things that are trees, and it's a bit self-documenting, because unlike HTML, in XML you choose the names of the tags. So people could look at the XML and say, I kind of have a guess as to what that is. Although lots of XML is hard to read, this particular one says, I'm an array and I've got entries, and each entry has a key and a value. I kind of know what that means. This kind of looks like an array of objects or a list of dictionaries. But the problem is that, in general, it was complex to go through that tree and reconstruct the arrays when you actually just wanted to send an array or a list or a dictionary or an object. And so Douglas Crockford, and I really encourage you to watch the video interview that I did with Douglas Crockford at Yahoo a number of years ago, said, you know what, there is this format that we use to specify object and array constants in JavaScript. Why don't we just use that as the serialization format? So that's why it's called JavaScript Object Notation.
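Here is a rough sketch in Python of the difference being described. The XML layout with array, entry, key, and value tags follows the example talked about above, but the exact document and the sample values are assumptions made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML in the array/entry/key/value shape described above.
xml_text = """
<array>
  <entry><key>name</key><value>Chuck</value></entry>
  <entry><key>phone</key><value>303-4456</value></entry>
</array>
"""

# With XML we walk the tree and rebuild the list of dictionaries by hand.
root = ET.fromstring(xml_text.strip())
from_xml = []
for entry in root.findall("entry"):
    from_xml.append({
        "key": entry.findtext("key"),
        "value": entry.findtext("value"),
    })

# The same data as JSON already has the shape of a list of dictionaries.
json_text = '[{"key": "name", "value": "Chuck"}, {"key": "phone", "value": "303-4456"}]'
from_json = json.loads(json_text)

print(from_xml == from_json)
```

Both paths end up with the same list of dictionaries. The difference is that the XML side needs code that knows what the tags mean, while the JSON side is a single call.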
It should really be JavaScript Object and Array Notation, which would make it JSOAN, but it's not. It's JSON, J-S-O-N. And so the idea was that you have this way in JavaScript of just writing out a constant, and in Python we do the same thing. We can say x equals curly brace blah colon blah and you can make yourself a dictionary. This is like making an object in JavaScript, which is the same as making a dictionary in Python. And they can be nested, so you can have an object that has a key whose value is a list or whatever. And so Douglas Crockford said, let's just use this as our serialization format. It's nice. It would be nice to pretend that in 2000 Python was part of the reason he picked it, but he picked it because it was JavaScript. It just so happens that Python looks pretty much identical to it. JSON is a little more restrictive than either Python dictionaries or JavaScript code. JavaScript is actually looser. So what he did was come up with a subset of the JavaScript syntax, the constant syntax, that was a little easier for serialization and deserialization, because he knew that we were all going to have to build a bunch of libraries for Java and for PHP. The JavaScript one kind of already existed, although by now we've built real parsing libraries there too, instead of just executing the text as JavaScript, which was kind of dangerous. The earliest parsers just ran the code, which is a little scary actually, but now we actually parse it. So he knew that we had all these back-end languages like PHP and Java and ASP, and that we had to teach them JSON and make really cool JSON libraries. So in Python, you just say import json and then, poof, you have your loads and dumps doing deserialization and serialization, and it's awesome. But he basically said we'd better not make it as flexible as JavaScript, because JavaScript allowed constants and variables and all kinds of substitution.
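As a small illustration of what "more restrictive" means in practice, here is a sketch using Python's json module. The helper is_valid_json and the sample strings are hypothetical, made up for this example. The first three candidates are fine as Python or JavaScript constants, but they are not JSON.

```python
import json

# loads turns JSON text into Python objects; nesting works as you would expect.
text = '{"name": "Chuck", "tags": ["json", "postgres"], "admin": false, "phone": null}'
data = json.loads(text)
print(data["tags"][1])               # a list nested inside the dictionary
print(data["admin"], data["phone"])  # JSON false/null become Python False/None

# dumps goes the other way, from Python objects back to JSON text.
print(json.dumps(data))


def is_valid_json(candidate):
    """Hypothetical helper: True if the json module accepts the text."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


# Looser syntax is rejected; only the last candidate is JSON.
candidates = {
    "single quotes": "{'name': 'Chuck'}",
    "unquoted key": '{name: "Chuck"}',
    "trailing comma": '{"name": "Chuck",}',
    "double quotes": '{"name": "Chuck"}',
}
results = {label: is_valid_json(c) for label, c in candidates.items()}
print(results)
```

Being this strict is part of what makes it practical for libraries in different languages to agree on exactly the same format.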
And so he said, look, it's all got to be strings, etc., etc. So he put up a website called www.json.org. It's kind of fun to look at. It's crude, it's simple, but it's precise, and it gave people a way to develop carefully. It's very well thought out. Everything Doug Crockford does is really smart. And so it allowed people to start building libraries and testing those libraries. And if two libraries, like the PHP library and the Java library, disagreed with one another, then you could at least go to json.org. So it really brought some order to this general notion of using JavaScript constants as a serialization format. And like I said, we can't claim that Python was particularly influential. But it's really nice that Python and JavaScript are probably going to be the two most widely known languages, and they look the same. And JSON looks like Python, it looks like JavaScript, and it looks like JSON, so that's kind of nice. So ultimately, JSON became very dominant. Later, things like JavaScript on the server with NodeJS made JSON even more natural there, and then databases that are based on JSON, like MongoDB, started to emerge. So if you had JavaScript in the client, Node in the server, and MongoDB in the back end, you were basically doing JavaScript everywhere and JSON everywhere. And it's really, really pretty. It turns out that it's still difficult to write applications in that environment. But it was nice because you were using one language and one serialization format all the way back to the database. And so we'll talk a little bit about the emergence of these NoSQL databases, which, for all intents and purposes, was in many situations a code word for JSON databases.
But then what happened is that as these NoSQL databases became popular, Postgres, MySQL, and other databases said, you know, we could just add a JSON column and kind of do what you fancy brand-new databases do, the ones that are nowhere near as mature as us. And so in an upcoming discussion, we will talk about some of what's going on between the NoSQL and the SQL movements, and how you can somewhat get the best of both worlds in a current-generation relational database like Postgres.