as well. So basically, this particular project is an interesting one. Let me just say first of all, you can look at another video I did on this earlier, but we'll go through it again. This is a CloudWatch timer that runs at a periodic interval. When it fires, it reads from a DynamoDB database, which is a key-value database, and then puts messages into SQS. SQS is a place where you can put messages and essentially never overwhelm it, because it can absorb an effectively unlimited number of messages. Then I have a Lambda that's triggered so that there's a one-to-one relationship: every single time a message lands in SQS, it fires off a Lambda, which executes my Python code. You could even, I think, tell it to grab messages in batches, say ten at a time, but conceptually it's one-to-one. And then from here you can do anything you want: you can call Comprehend, you could do the natural language processing yourself, or call whatever service you want, and then I put the results into a bucket. This general architecture is a really basic serverless architecture that's fairly straightforward to set up. So what I'm going to do is go through here and set this up step by step. The first thing I would do is go to Cloud9 and get an environment working. I think I set one up earlier, so I'm just going to use that project because it probably already has my SSH keys. Okay, so while that's spinning up, what I'm going to do is look at this notebook. So that's the architecture, but let me open up another one. If we look at this notebook, the main thing I'm going to do is create a Lambda function in the console. So go back to the console, go through and create a new Lambda function, and we'll call this one producer919 or something like that. Then I'll change the runtime to Python, although in this case I want Python 3.6, because that's what Lambda supports here. And then this is the part that's tricky: if the Lambda needs to do anything, like talk to another service, it also has to have a role, just like Elastic Beanstalk or EC2 or CodeBuild. I already created one earlier, so I'll use an existing role and select this one right here: an admin role that I use for prototyping. If you wanted it to be very secure, you would only give it the permissions it needs to execute what you're building, but for prototyping I like to give it an admin role. Okay, so I've got this sample project here. What I'm going to do is go to this code here, where I think I have some producer code, and I'm basically going to need to populate SQS. So I'll copy this in and paste it inside, and let's walk through what it actually does. I import boto3, and I use json just to do JSON work. Then this DynamoDB resource allows me to talk to a key-value database, which I'm going to create in a second, and it's going to talk to a table called fang, for Facebook, Amazon, Netflix, Google. Then I'm going to have a queue called producer, and I'm also going to make an instance of that. Because boto3 is an easy way to talk to AWS, I can make a handle to either one in a single line of code.
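For reference, the producer file being pasted in is along these lines. This is a sketch reconstructed from the walkthrough rather than the exact source; in particular, the item attribute name and the logger wiring are assumptions.

```python
import json
import logging

import boto3
from pythonjsonlogger import jsonlogger  # third-party package that becomes important later

# One line each to get handles on the DynamoDB table and the SQS queue
DYNAMODB = boto3.resource("dynamodb")
TABLE = DYNAMODB.Table("fang")                        # key-value table holding the company names
SQS = boto3.resource("sqs")
QUEUE = SQS.get_queue_by_name(QueueName="producer")   # queue the producer writes into

# Structured JSON logging (the exact wiring here is an assumption)
LOG = logging.getLogger(__name__)
LOG.setLevel(logging.INFO)
_handler = logging.StreamHandler()
_handler.setFormatter(jsonlogger.JsonFormatter())
LOG.addHandler(_handler)


def scan_table(table):
    """Lazily grab every item in the table: 'give me everything'."""
    return table.scan().get("Items", [])


def send_message(queue, body):
    """Put a single message into the SQS queue."""
    return queue.send_message(MessageBody=body)


def lambda_handler(event, context):
    """Entry point: ignores the event, copies each table item into the queue."""
    items = scan_table(TABLE)
    for item in items:
        # assumes each fang item has a "name" attribute holding the company name
        send_message(QUEUE, item["name"])
        LOG.info(json.dumps({"queued": item["name"]}))
    return {"queued": len(items)}
```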
And then I set up some logging, and with serverless it's always a good idea to create some logging so that you know what's going on. From here we scan the table, and all that does is go back to this DynamoDB table and grab everything inside of it. It's a lazy way to retrieve the data: just give me everything that's in this table. Once I've got that, the other thing I need to do is send messages to a queue that will hold everything I pull out of the database, and that's what the next line of code does; it just sends a message to the queue. Then, for everything that's in the DynamoDB table, I put it one by one into the queue. Now, I could have done additional things here. Because I'm pulling from DynamoDB, I could add extra information or delete certain things; I can do anything I want. But to keep the explanation simple, I'm just taking items from the table and putting them into the queue. And then the thing that does all the work is the entry point: it accepts an event but does nothing with it. All it does is pull data from the table and put it into the queue. That's it, and that's really all I need. The only thing I still need to verify is: do I have a table in DynamoDB, and do I have a queue in SQS? So let's go check that out. Let's go first to DynamoDB and take a look. From here we can look at the tables that exist, and there is actually a table called fang already, but let me just show you how to create a table. It's almost like a Google Sheet; there's not a lot to it, so you don't need to be overwhelmed by it. You can just call it demo, and then the partition key just has to be a globally unique ID, like the key of a Python dictionary, so I could say guid, for globally unique ID, and then say create. So again, it's almost as simple as creating a spreadsheet. It takes maybe 20 seconds to create, and it's actually already done. So now I can go through here and say create item, and I can add, for example, apple, then append something else like banana, and keep going. I can put whatever I want inside of it, even nested values; it is literally just a dictionary data structure. I can also delete it easily by going to demo and saying delete table; it makes me enter the word delete, and I'll be lazy and paste it in rather than type it. Okay. So now, this fang table just has names inside of it, names of companies: Amazon, Google, Microsoft, Netflix. I can put whatever I want in here, Oracle or whatever other company. And that's it: I've got the DynamoDB table set up, and it's very easy with boto3 to read and write to DynamoDB. Again, think of it like a spreadsheet.
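As an aside, those same console steps (create the demo table, add an item, read everything back, delete the table) map onto a handful of boto3 calls. Here is a sketch of the equivalent, with the key named guid as in the demo; the on-demand billing mode is an assumption to keep table creation to a single call.

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Create a table with a single partition key called "guid"
table = dynamodb.create_table(
    TableName="demo",
    KeySchema=[{"AttributeName": "guid", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "guid", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()  # takes on the order of seconds

# "Create item" in the console is just put_item; extra attributes can be anything, even nested
table.put_item(Item={"guid": "apple", "more": {"fruit": "banana"}})

# The lazy "give me everything" read that the producer uses
print(table.scan()["Items"])

# Deleting the table is one call (the console makes you type "delete" to confirm)
table.delete()
```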
Now I'm going to go to SQS. I need a queue called producer, which I don't have set up yet, but let me first show you how to create a queue; it's pretty easy. There are two kinds: FIFO and standard. FIFO is probably not what you want here, because it's first in, first out, and really a specialized kind of queue. More likely, what you want is a standard queue, where you can just throw a fire hose of stuff at it. Standard is sloppier, I guess, and FIFO is more precise: if you really needed first-in-first-out delivery, like payroll transactions or credit card transactions, then maybe you would use FIFO. But what's great about a standard queue is that you literally can't blow it up. There's no way you can send more information than it can accept; it takes something like hundreds of thousands or millions of messages per second, which is absolutely ridiculous and a level you'll more than likely never run into. So inside of here I would just name the queue producer, leave everything else at the defaults, say create, and that's it. Once you have the queue created, to use it you just say send and receive messages, type something like hello, and send a message. Then it's pretty straightforward: in this queue here, if I want to look inside, I can say poll for messages, and look, you can see there's a message, and the message body is hello. So it's a fairly straightforward process, but the main idea is that it's almost like an email inbox: it's just collecting a bunch of information. I'm going to delete that message here, and we can say stop polling because there's nothing left inside the queue. You can even purge the queue as well; all that means is that if there were, say, 1,000 messages inside the queue, you could delete them all at once, which is really only something you'd use in development. Okay, so we've got the queue set up and DynamoDB set up, so pretty much we can get this Lambda cooking, because we know the Lambda is all of the work, and pulling all the data from DynamoDB is just a table scan. That's it.
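The queue operations from that demo (create, send, poll, delete, purge) also map directly onto boto3. A quick sketch of the same steps:

```python
import boto3

sqs = boto3.resource("sqs")

# A standard (non-FIFO) queue with default settings
queue = sqs.create_queue(QueueName="producer")

# "Send and receive messages" in the console is just send_message
queue.send_message(MessageBody="hello")

# Polling: ask for up to one message, waiting a couple of seconds
for message in queue.receive_messages(MaxNumberOfMessages=1, WaitTimeSeconds=2):
    print(message.body)  # -> "hello"
    message.delete()     # delete after handling so it isn't processed again

# Purging wipes whatever is left; really only something you do in development
queue.purge()
```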
So I'm going to go ahead and save this Lambda. Then to run it, I can either add a trigger or run it manually inside of here with a test. Let's go ahead and run it that way first: I'll go through here and create a test event, which will just be an empty payload, because the function doesn't accept anything; we'll call it fireQueue. Format that, and then this will actually run and scan that table. It should, but: No module named pythonjsonlogger. So we ran into one problem, which is irritating: I use a third-party library, and that means it didn't get installed. We can fix that, though, by importing this function into the Cloud9 environment and fixing it there, and I'll show you how I typically do that. This is a little bit clumsy, and it's irritating that it works this way; there may be a newer way to do it, but this is the way I install packages. If we go back to Cloud9, I can close all this other stuff; I don't need any of it anymore, it's from some other project I was working on. I can go to this AWS Resources tab, go to whatever we called this thing, which was producer919, and refresh, and it should populate over here. Where is this thing? Refresh this, more. There we go. So this one, and if I right-click on it, I can say Import and literally import it into my project, which is pretty nice actually. And once I've imported it, notice that the problem is this library. So how do we fix this?

Well, the way I do this is to install the package in a certain way so that it's bundled inside the deployment. Again, I hope they fix this so there's an easier way, but that's the way I do it. So I would cd into multi-cloud, no, not that one, I would cd into producer. Then, inside of here, we can look at this template. There are a couple of ways we can solve this, so let's see. This is the template, and I guess they've changed it a little, because it's going to require the package one level above it. So let's see if this works. This one doesn't have a virtual environment either, which is kind of irritating. One way this might work is to create a requirements.txt, and inside requirements you put the package it needs, python-json-logger. I think that's the name of it; let's double-check. Yeah, python-json-logger. Okay, so that's the library, and let's see if this works. I'll create a virtual environment, so python3 -m venv, and I'll put it above the project because I don't want to check the virtual environment in; I'll call it ~/.producer919 and source that environment. And this is the tricky part, which is kind of weird, and again there may be an easier way to do this: I don't actually install into the virtual environment. I just use the Python packaging tool from the virtual environment and install one directory above, which should be somewhere the function can pick up. So let me see if this works: pip install -r requirements.txt, and then --target pointing one directory above; let me just double-check I have that flag right. Yeah, --target. The reason is that the function needs to look a directory above to find its library, and I think it will pull all of those package files in up there, which, again, is a little bit weird that it grabs all of that in the directory above, but okay. And now if I go back here, I can actually deploy it, and it should grab that directory above, so say deploy. That was easy, and let's see if it works. If I refresh this, yeah, that was the old run; let's see if this works here. It did get the new file, so let's see what it does. Did that work? It's still having the same problem, and usually that works if it's in a sub-package. This is the thing that's irritating about the Lambda environment, so what I'm going to do to fix this is do it a different way. I'm going to grab a copy of the code I have in here and just recreate the function from Cloud9, which is a little bit easier. So instead of this, I'm going to deactivate this environment and make one from the wizard. With this wizard here, I can click this button, and what it does is create the package structure that makes it easier to bundle the dependencies with the function. Again, this is a little bit clumsy and weird, but let's just call this one producercloud9. There we go. I'll go through here, pick Python 3.6, and for the trigger, for now, I'll say none; for the role I'll use the existing admin role, and then finish. So this is a little trickier initially than I expected it to be, but we can easily fix it, and this has given us an empty function.
I'm going to put that same source code in there, and the only other thing I need to do is the same thing I did before. So I'm going to cd up here and then cd into this producercloud9 directory, and notice that the directory structure is a little different, because this is where the Lambda lives. What I'm going to do is use the structure they set up for me here and then repeat the same thing I just did; I think the problem before was that there wasn't the right upper-level directory, and that was the issue. So I'll source venv/bin/activate, cd into the function directory, and then do that same trick from before: in producercloud9, the requirements file gets python-json-logger. Okay, so I've got that in place, and then I'll run the same command; I'll run history and be lazy and copy it, but let me just double-check where I'm at. Right, I want to run it one level up, because the requirements file is one level up as well: install the requirements from the directory above me, and also put the packages in the directory above me. So if we look in here, it put all of those packages into that outer directory, so we have an outer folder that holds all the packages, and inside we've got our Lambda. Now I should be able to deploy it. A little bit confusing, but we found a fix: right-click, deploy, and it should take a little longer because it's copying a bunch of files. Now, one thing we can do while it's deploying is test this locally, because if I go here and say run local, and remember, it doesn't take a payload, it's just an empty payload, it will call out from the Cloud9 instance to DynamoDB, pull the items from the table, and put them into SQS. So let's just go ahead and run it here. There we go: it did all of that and sent everything over there. And how do we know? Let's go back to SQS, and in SQS, if I say send and receive messages and then poll, there should be something like five or ten messages in here. And there we go, there's a bunch of messages, and what are they? They're the names of those companies. So I can even test this locally just from inside this environment. So we've got the first part set up, and we figured out why you need this certain directory structure, which is pretty irritating, but for now that's how they want you to do it, unless they've fixed it. So the only other thing I'm going to need, if I go back to the diagram again: we've set up the producer Lambda, the DynamoDB table, and the queue. We set those three pieces up, but what we haven't set up is telling it to run at some interval. One minute is probably too much in a real company, but for demoing it's good, because it will just keep cranking things out so I can see it in action. So I'm going to go back to this function here and add a trigger. The trigger is pretty easy: I just go to CloudWatch Events, or EventBridge as it's now called, and select a rule; I already have a one-minute timer set up, and you can see it's just a rate expression, rate(1 minute). I would probably say once a day is better for something like this, but just so we can see it working, I'm going to say add, and it's going to pump the queue full of messages over and over and over again.
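For reference, that console trigger is just a scheduled rule pointed at the function. Here an existing one-minute rule is reused, but creating one from code would look roughly like this; the rule name below is made up, and the function name should be whatever the deployed producer is actually called in your account.

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

FUNCTION_NAME = "producercloud9"  # replace with the deployed producer's actual name

# A rule that fires every minute (once a day would be rate(1 day))
rule_arn = events.put_rule(
    Name="producer-every-minute",
    ScheduleExpression="rate(1 minute)",
    State="ENABLED",
)["RuleArn"]

# Allow CloudWatch Events / EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId="producer-every-minute-invoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn,
)

# Point the rule at the function
function_arn = lambda_client.get_function(FunctionName=FUNCTION_NAME)["Configuration"]["FunctionArn"]
events.put_targets(
    Rule="producer-every-minute",
    Targets=[{"Id": "producer-lambda", "Arn": function_arn}],
)
```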
Okay, so while that's running, we need to build the second piece now. We've got the timer, the producer, the table, and the queue working; now we need to consume the messages inside this queue, run them through some kind of natural language processing API, and then put the results into an S3 bucket. So I need to do step two, and step two is pretty easy, actually. I would just go to the example source and find, I think, this one; let's look at it. Yeah, this looks like the code, and I'll explain it in a second. I'm going to grab it and copy it, and then I'll create one more function inside of this area here. So again, I just go to this plus icon; I can close all these other tabs, I don't need them anymore. Then I can say, okay, make another Lambda, and we'll call this one consumercloud9. Next, the runtime will be Python 3.6, no trigger, and for the role we'll use the existing admin role and say finish. Now that this is created, I'm going to just paste the code in, and first let me explain what it does. For this code, we're going to need to install boto3, wikipedia, pandas, and that python-json-logger package; those are the pieces we'll need. Then there are some utility functions that I use to talk to the queue. We make a connection to the SQS queue; in fact, I think I can even get rid of this one, I don't think I use it, which is a bad software engineering practice, and I don't know why I left it in there. But this one I do need, which deletes a message from the queue: after you're done grabbing a message from the queue, you should delete it so you don't rerun the job again. Then what this code does is take the names, grab the corresponding pages from Wikipedia, and put the results into a DataFrame. Then we call out to the AWS Comprehend service, do the natural language processing, and apply that to the DataFrame, and then I write that information to an S3 bucket. And here is the lambda handler, just like before, and it says: make a Pandas DataFrame with Wikipedia snippets, perform sentiment analysis, and then write it out. So basically it's going to call Wikipedia based on the name of the company that appeared in the SQS message, grab the page information, put that into a DataFrame, perform the sentiment analysis, and then write the result out to an S3 bucket called fangsentiment. So that looks pretty good, but the biggest question mark is that I'm going to need to do the same thing I did before, which is get those packages installed, which, again, is a little bit irritating; it's the most irritating part of all this. So I'm going to go back here and go into consumercloud9, and again we see it gives us a virtual environment, which at least they do for us. So I can source venv/bin/activate and then cd into consumercloud9. Actually, first let me put the right requirements in here: we know it's going to need python-json-logger, it's also going to need pandas, it's going to need boto3, which I think we already have installed, and then it's also going to need wikipedia. So it looks like those are the four main libraries. Then I can do the same thing as before: I can just run history and be lazy and copy the command, so I cd into the function directory and run pip install with the requirements file one level above me, putting the packages one level above me as well.
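Putting that description together, the consumer looks roughly like the sketch below. It is not the exact file from the video: the S3 key format, the text truncation, and the per-record loop are assumptions, and instead of deleting queue messages explicitly the sketch relies on the SQS trigger (added in the next step) removing messages after a successful invocation.

```python
import boto3
import pandas as pd
import wikipedia

BUCKET = "fangsentiment"  # result bucket from the walkthrough

comprehend = boto3.client("comprehend")
s3 = boto3.resource("s3")


def sentiment_for(text):
    """Run AWS Comprehend sentiment analysis on a snippet (truncated to stay under the API limit)."""
    result = comprehend.detect_sentiment(Text=text[:4500], LanguageCode="en")
    return result["Sentiment"]


def lambda_handler(event, context):
    """Triggered by SQS: each record body is a company name sent by the producer."""
    rows = []
    for record in event.get("Records", []):
        name = record["body"]
        snippet = wikipedia.summary(name)  # grab the Wikipedia page summary for the company
        rows.append(
            {
                "name": name,
                "wikipedia_snippet": snippet,
                "sentiment": sentiment_for(snippet),
            }
        )
    df = pd.DataFrame(rows)
    for _, row in df.iterrows():
        # one CSV per company, e.g. google_sentiment.csv (key format is an assumption)
        key = "{}_sentiment.csv".format(row["name"].lower())
        body = df[df["name"] == row["name"]].to_csv(index=False).encode("utf-8")
        s3.Object(BUCKET, key).put(Body=body)
    return {"processed": len(rows)}
```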
Now, this one I can't test locally the same way, because it only gets triggered based on messages that appear in SQS. So what I will do is just deploy it, assuming that it works: I'll go to the consumer, right-click on it, and say deploy, and this should take a second. Now, how do I get this thing to trigger on that queue? By the way, before we do that, let me go back to the queue; there should be a lot of messages in it by now, because the producer has been running for a while. And there are: messages available, 30. So there's a bunch of messages appearing inside of here because the timer is constantly pumping them into the queue. What I need to do is go to the right function here, not the producer but the consumer, and add a trigger that says every time there's a message in the queue, which will be the producer queue, grab a message. In this case, for the maximum number of messages, we'll just do one at a time: grab it and then put it through all the processing we set up. So we'll go ahead and say add, and it should be pretty much instantaneous; it should immediately start draining the queue, and in fact it should be so quick that there will be nothing left here very quickly. Let's see, though: if we say poll for messages, unless there's a problem, it should really quickly drain everything in the queue. Let's double-check, although it does take a second for the trigger to get set up; yeah, it's still creating, that's why. But as soon as it's done creating, it should process all of the messages in the queue. There we go, it's done. So if I go back here and poll for messages, in theory there should be nothing available, and I don't see anything, so it looks like it already got rid of all of them, which is what I would expect. They're gone; it just processed everything in the queue.
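Under the hood, that console trigger is an event source mapping between the queue and the function. The programmatic equivalent is roughly the following; the function name should be whatever the deployed consumer is actually called, and the batch size of one matches what was picked in the console.

```python
import boto3

sqs = boto3.client("sqs")
lambda_client = boto3.client("lambda")

FUNCTION_NAME = "consumercloud9"  # replace with the deployed consumer's actual name

# Look up the ARN of the producer queue
queue_url = sqs.get_queue_url(QueueName="producer")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Wire the queue to the function, one message per invocation
lambda_client.create_event_source_mapping(
    EventSourceArn=queue_arn,
    FunctionName=FUNCTION_NAME,
    BatchSize=1,
)
```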
Then the next thing we'd do is look at the logs. For this consumer we can go to monitoring, and it's always very important to know about monitoring: we can look at the logs in CloudWatch and see exactly what the function did, and we notice that it is actually processing the events. There we go, that looks good; it's able to do all of this and is writing messages to the bucket. So now we can just check S3; that would be the next thing to check. I'll go to Amazon S3 and look in that bucket called fangsentiment, and there we go: we can sort by Last modified, see a Google sentiment file, and see that it's actually putting objects in there. Now, to make things a little more interesting, I could go back to my Cloud9 environment, because I'm lazy, and just pull these files down there. So I'll clear out this other stuff and literally run aws s3 cp -r against the fangsentiment bucket. Does that work? Actually, before I do that, let me make sure I'm in the directory I want. And no: unknown option, so that didn't do anything. Let's look up the documentation with aws s3 cp help. Okay, here's an example of cp, and that looks right: for s3 it's not -r, it's --recursive. So let's try this and see if we can get all those files here with --recursive. Will that work? Okay, there we go. So we're able to copy those files locally, and then I can just look at them inside of here. I don't know exactly when these were from, but if we go back to the console and look at the latest timestamps, Google was the newest, so we can come back here and look at the Google file. We can see that it's in CSV format, and it has the snippet of text that I got from Wikipedia along with the sentiment associated with it. So we have a full end-to-end data pipeline, and we can work back and forth with the data in S3.

So what did we build? We built this: a timer that constantly fires and triggers the producer Lambda. The producer Lambda reads the data, which is the company names, from DynamoDB, and when it grabs them, it puts them into SQS. Then it's an event-based workflow: every time a message appears, it immediately and automatically invokes the consumer Lambda, which grabs the message from SQS, does whatever we told it to do with it, in this case a Comprehend sentiment analysis, and then writes the result out as a file in S3. All of these pieces could be swapped out for different ones. The one-minute timer is obviously probably too much, and once a day is probably fine; and it doesn't have to be Comprehend, it could be your own natural language processing system, it could be video, it could be anything you want. So what I'm going to do now, and this is very important to do, is clean this up: I'm going to go back to everything and delete it all, because otherwise I'll get charged for these services. I need to go to Lambda and find the producer. First I'll delete the old producer that we never got working, then the producer that has the trigger every minute; so first I'll go here and disable that trigger so we can delete it, and then I'll also delete the whole function. Actually, I think I can delete the whole application here: select the consumer and delete it, so we can delete the whole stack, that's one way to do it. And then I can also go to the producer and delete its stack as well, because it has infrastructure-as-code automation associated with it too. Once that's done, I should also go back to SQS, delete that queue to clean it up, and delete the producer queue as well, just to be safe. It's a best practice, once you've gone through and created a bunch of stuff, to clean everything up. And that's it; that's the final project.
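If you would rather tear things down from code than click through the console, the cleanup amounts to a handful of calls like the ones below. This is a sketch with assumed names: the Lambda functions deployed from Cloud9 usually carry a generated prefix, and the rule name is the one made up in the earlier scheduling sketch, so check the console for the real names. Deleting the CloudFormation stacks that Cloud9 created, as mentioned above, is another way to remove the deployed functions.

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")
sqs = boto3.client("sqs")

RULE = "producer-every-minute"  # assumed rule name from the scheduling sketch
FUNCTIONS = ["producer919", "producercloud9", "consumercloud9"]  # adjust to the deployed names
QUEUES = ["producer"]

# Stop the timer first so nothing keeps refilling the queue
events.remove_targets(Rule=RULE, Ids=["producer-lambda"])
events.delete_rule(Name=RULE)

# Delete the Lambda functions
for function_name in FUNCTIONS:
    lambda_client.delete_function(FunctionName=function_name)

# Delete the queue(s)
for queue_name in QUEUES:
    queue_url = sqs.get_queue_url(QueueName=queue_name)["QueueUrl"]
    sqs.delete_queue(QueueUrl=queue_url)
```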