- [Raf] Depending on the use case, you may not need to ingest data via streaming. A typical use case for batch data ingestion is when data has already been produced and is sitting somewhere, such as in an on-premises data center or on external hard drives. You may also want to use batch ingestion if you're producing large amounts of data and producing insights from it is not time sensitive, for example, ingesting data periodically and running reports once in a while. There are many AWS services, and services from AWS partners, that can be used to ingest your data in batches. In this video, I'm going to talk about AWS Transfer Family. AWS Transfer Family is a serverless AWS service that gives you a file transfer endpoint in AWS without the need to run any server infrastructure. Let me show you how it works in a demo.

In this demo, I am going to create an S3 bucket in my AWS account and an SSH key pair on my laptop, with a private key and a public key. Then I will go to the AWS Transfer Family Management Console and create a publicly accessible SFTP server, configured to allow file transfer over SSH, which is what SFTP is. Next, I am going to create an SFTP user called raf, associate that user with my public key, and use an IAM role to grant permissions to the S3 bucket I created. Once everything is configured on AWS, I will open my SFTP client and test everything out. I'm going to be using FileZilla as the client. So let's go.

The first step is opening my terminal and running the command aws s3 mb s3://sftp-test-raf, where I give the AWS command line an S3 endpoint and mb stands for make bucket. This is an AWS CLI (command line interface) command that creates the bucket for me. If we go to the AWS Management Console and click on S3, we will see that the bucket has been created in my account.

Now, back in my terminal, I will go to the folder /tmp, a temporary folder on my computer, create a directory called ftp, and inside this directory create the SSH key pair, the public key and the private key. I can do that with the command ssh-keygen -t rsa, where I specify the path of the file I want to create, id_rsa, and in this case I use no passphrase. So the public key and the private key have been generated in /tmp/ftp: the private key is called id_rsa, and the public key is called id_rsa.pub. If you look at the contents of those two files, id_rsa contains the OpenSSH private key, and id_rsa.pub contains the OpenSSH public key. The public key is the one we'll be interested in copying and pasting into AWS Transfer Family when we create the user raf.

Now let's go back to Google Chrome and open AWS Transfer Family. I'm going to click here on Create server, and here I can choose which protocols I want to enable for my AWS Transfer Family server. It says server, but AWS manages that server for you under the hood; that's why we call this a serverless service, even though there is a server running behind the scenes. In this case, I will choose SFTP, which is file transfer over Secure Shell. Then I'm just going to click Next. Here is where I could choose a custom identity provider. In this case, I'm going to have AWS Transfer Family manage the users and permissions for me, but I could use API Gateway with a custom identity provider.
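Before we continue in the console, here is a minimal sketch recapping the two terminal steps from the demo. The bucket name sftp-test-raf and the key path /tmp/ftp/id_rsa come from the video; the -N "" flag is an assumption that simply encodes the empty passphrase chosen interactively.

```bash
# Create the demo bucket (mb = make bucket).
aws s3 mb s3://sftp-test-raf

# Generate the RSA key pair in /tmp/ftp with an empty passphrase.
mkdir -p /tmp/ftp
ssh-keygen -t rsa -f /tmp/ftp/id_rsa -N ""

# id_rsa.pub is the public key we will paste into AWS Transfer Family for the user raf.
cat /tmp/ftp/id_rsa.pub
```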
Here is where I choose whether I want my server to be publicly accessible, or whether I want my SFTP endpoint to be accessible only from within an AWS VPC, which is a virtual private cloud. In this case, I want it publicly accessible, and I will be relying on key authentication. Here is where I could provide a custom hostname. If you already have an FTP server in your organization and you want to keep using its custom domain name, you can use Route 53, AWS's DNS service, to point the domain of your preference here. So if you have, for example, ftp.yourcompany.com, you can use a Route 53 DNS record here. In this case, I'm going to choose none and use the endpoint provided by AWS Transfer Family. I'm going to hit Next.

I'm going to pass over these options, keeping everything at the default. And here it asks for a role for logging. Remember that AWS Transfer Family can log everything to CloudWatch Logs, such as connections that have been opened, connections that have been closed, and PutObject and GetObject activity, or put file and get file in FTP nomenclature. It writes that to CloudWatch Logs using an IAM role. You can choose an existing one, but in this case, I'm going to ask the service to create one for me. Tags are important to keep your infrastructure organized, so let me add some tags here, such as Creator, Cost Center, and other things that could help me build billing reports based on tags. I'm going to hit Next. This screen gives me the option of reviewing everything I've chosen so far and creating the server.

The server creation can take a couple of minutes. Meanwhile, we can add users to that server. We click here on the server ID, scroll down to Users, and hit Add user. I'm going to create a user called raf, and here I must choose a role that I want AWS Transfer Family to assume on behalf of this user whenever an SFTP connection from this user comes in. I have previously created the IAM role that allows that, called Allow-SFTP-User-To-Specific-S3-Bucket. In AWS, we talk a lot about the least privilege principle, which means granting users only the permissions they need to perform the specific actions they want. After I create this user, while the server is being created, I'll go to the IAM console to show you the anatomy of this role. So here I pick the role I created previously, and here I choose what will be the SFTP home directory for this user, which is the bucket I created.

Now, for the SSH public key, I can go back to my terminal, copy the public key, and paste it here. This public key has my private key as its corresponding key, so whenever we authenticate to that SFTP server, we will be using the private key, because it corresponds to the public key we are giving to the service. Then we hit Add. Now I have the SFTP server being created, it is still starting, and it already has a user associated with it.

Let's go to IAM so you can see how that role is built. If you go to IAM and click here on Roles, you're going to see that I have a role called Allow-SFTP-User-To-Specific-S3-Bucket, which is the role I chose for the user raf. Every IAM role needs to have a trust relationship, which defines who can assume that role, and permissions, which are the powers of that role. In this case, the trust relationship of this role is transfer.amazonaws.com, which is the service principal AWS Transfer Family uses with IAM.
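The console steps above can also be reproduced with the AWS CLI. Here is a hedged sketch under the demo's settings (SFTP protocol, service-managed users, public endpoint); the logging role ARN, server ID, account ID, and tag values are placeholders, not values from the video.

```bash
# Create a public, service-managed SFTP server.
# LOGGING_ROLE_ARN is a placeholder for the CloudWatch logging role.
aws transfer create-server \
  --protocols SFTP \
  --identity-provider-type SERVICE_MANAGED \
  --endpoint-type PUBLIC \
  --logging-role "$LOGGING_ROLE_ARN" \
  --tags Key=Creator,Value=raf Key=CostCenter,Value=demo

# Create the user raf on that server, with the bucket as the home directory.
# SERVER_ID and the account ID in the role ARN are placeholders.
aws transfer create-user \
  --server-id "$SERVER_ID" \
  --user-name raf \
  --role "arn:aws:iam::123456789012:role/Allow-SFTP-User-To-Specific-S3-Bucket" \
  --home-directory /sftp-test-raf \
  --ssh-public-key-body "$(cat /tmp/ftp/id_rsa.pub)"
```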
The permissions allow that user access only to a specific bucket. I don't want to give that role access to any bucket, because we want to reinforce the least privilege principle here. So I created an inline policy and attached that inline policy to this role. If we expand the JSON document for that role, we see that this role is allowed to do ListBucket and GetBucketLocation only on that specific bucket, and that only on everything inside that specific bucket I allow PutObject, GetObject, DeleteObjectVersion, DeleteObject, and GetObjectVersion. This reinforces the least privilege principle, because the user raf on AWS Transfer Family will have only that permission set on that specific bucket. A CLI sketch reproducing this role appears at the end of this section.

Now, let me go back to AWS Transfer Family and see whether my server has been created. It says the server is online, so now it's time to open FileZilla and test everything. If you click here, you also have the option to stop and start the server. And if you click here on View logs, you are redirected to CloudWatch Logs in the AWS Management Console, where you can see which users connected and which files have been uploaded and downloaded via SFTP. Before opening FileZilla, let's copy the endpoint we should point FileZilla to.

Now let's open FileZilla. I'm going to click here and create a new connection pointing to that specific hostname. I'm going to be using a key file, which I already have populated here in my FileZilla; /tmp/ftp/id_rsa is the very same file as this one. Once I click Connect, I click OK to accept the key negotiation handshake. It is using the username raf, authenticated by that key. And you can see that the remote directory is my home directory, which is sftp-test-raf.

Now, if I go back to my terminal, I can create a file, a simple text file, and save it. If I go back to FileZilla, the file will be there: file.txt in /tmp/ftp, the file we just created. And if I drag and drop this file from the local directory to the remote directory, what happens under the hood is an SFTP put to a server that I don't need to manage, and that server is assuming a role in AWS and performing a PutObject operation on Amazon S3 using that role. If we go back to the AWS Management Console, go to S3, and open our bucket, we are going to see that file.txt is inside our bucket sftp-test-raf. This file is not public, because the bucket is not public by default, and private buckets are what you want for your data lake assets.

See how simple it is? There are other features you can use, such as custom domain names, custom identity providers for user federation, and much more. And AWS Transfer Family logs all the activity in CloudWatch Logs for further auditing, if needed.
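For reference, here is a minimal sketch of how the role described above could be created from the CLI. The role name, bucket name, trust principal, and the list of S3 actions come from the demo; the policy name sftp-s3-access and the temporary file paths are assumptions for illustration.

```bash
# Trust policy: lets AWS Transfer Family (transfer.amazonaws.com) assume the role.
cat > /tmp/trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "transfer.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role \
  --role-name Allow-SFTP-User-To-Specific-S3-Bucket \
  --assume-role-policy-document file:///tmp/trust.json

# Inline permissions policy: least privilege, scoped to the demo bucket only.
cat > /tmp/policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::sftp-test-raf"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::sftp-test-raf/*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name Allow-SFTP-User-To-Specific-S3-Bucket \
  --policy-name sftp-s3-access \
  --policy-document file:///tmp/policy.json
```

Once the server is online, the same transfer shown in FileZilla can also be tested with the OpenSSH sftp client; the hostname below follows the standard Transfer Family endpoint format, with a placeholder server ID and region:

```bash
sftp -i /tmp/ftp/id_rsa raf@s-1234567890abcdef0.server.transfer.us-east-1.amazonaws.com
```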