Hi, everyone. I'm Mo Yuan from the Alibaba Cloud Kubernetes team. Today my topic is Saving Cost with Autoscaling on ACK. Here's the outline of this talk: first we will talk about what autoscaling is and why we need it, then I will introduce how to do autoscaling on ACK and walk through some typical scenarios. Finally, I will show an HPA demo based on external metrics.

In a recent survey of over 900 tech leaders, we found that migration from IDC to public and private cloud is still accelerating. Adoption of cloud native projects is more and more common, and Kubernetes adoption is growing fast: it is expected to jump from 20 percent to 30 percent by 2020. More than 56 percent of tech leaders care about how to improve resource utilization. Kubernetes is the standard cloud native interface for accelerating IT infrastructure transformation, and saving cost is the top factor to consider. So how do we save cost on Kubernetes?

Here's a chart comparing capacity planning and burst resource usage. The yellow area is the actual usage of your workload, and the red line is the planned capacity. When the red line is above the yellow area, you have enough resources to run the workload, but the larger the gap, the more you waste. When the yellow area breaks through the red line, you don't have enough resources, and even the workloads already running may suffer from the shortage. Is there any way to resolve this conflict? The answer is yes: autoscaling. On the right side are two real customer utilization charts from ACK. In the first chart, about 40 percent of the resources are reserved, and wasted, to guarantee availability. It's not good enough, but it's the reality in a lot of companies. The second chart shows resource utilization with autoscaling on ACK: the gap between planned capacity and burst usage tracks the actual demand and stays at a low level, which saves about 30 percent of the cost. Autoscaling is a game between capacity planning and burst resource usage; the better your autoscaling, the more cost you save.

In the next section, I will introduce how to do autoscaling on ACK. Here's the architecture of autoscaling on ACK. We have two categories: application autoscaling and resource autoscaling. HPA is the Horizontal Pod Autoscaler; you can use it to autoscale pods based on metrics. VPA is the Vertical Pod Autoscaler; it's not that popular, and we will not talk about it in detail today. CronHPA is another pod scaling method: you can autoscale pods based on crontab-style rules, which is useful when you know the resource usage trend in advance. In resource autoscaling, we can autoscale plain VM instances, EBM (bare metal) instances, GPU instances, and Spot instances. We also have a unique resource autoscaling method called Virtual Node. You can regard a Virtual Node as a giant node with nearly unlimited resources. When a pod is scheduled to the Virtual Node, it is ultimately created as a serverless pod without any real node.

HPA can autoscale pods based on three kinds of metrics. CPU and memory are the most popular resource metrics. Custom metrics are user-defined metrics, such as the number of online users of your application. Typical external metrics come from cloud services, such as the QPS of a load balancer. The richness of the metrics determines the strength of the elastic scaling. In ACK, we provide all kinds of metrics based on a multi-dimensional monitoring architecture.
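To make the HPA side concrete, here is a minimal sketch of a plain resource-metric HPA using the autoscaling/v2beta2 API that the demo later relies on. It scales a hypothetical nginx Deployment on average CPU utilization; the target name, replica bounds, and threshold are illustrative assumptions, not values from the talk.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-cpu-hpa
spec:
  scaleTargetRef:            # the workload whose replica count HPA adjusts
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource         # resource metrics: CPU or memory
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # keep average CPU around 50%
```

Custom and external metrics plug into the same API by changing the metric type, which is exactly what the QPS-based demo at the end of this talk does.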
We have four kinds of monitoring cloud services for different application types. SLS is the Log Service; it collects application logs, control plane logs, and gateway logs. ARMS is an APM cloud service for application performance monitoring; you can use ARMS to visualize the status of the JVM and deep dive into your application's runtime performance. AHAS is an architecture awareness cloud service; when you use microservices to structure your system, AHAS can help you visualize the network flow between the different microservices. CloudMonitor is the traditional resource monitoring service. We also have a managed Prometheus service as the bridge to the open source world. Based on this monitoring architecture, we transform all kinds of monitoring metrics into three categories. In resource metrics we provide CPU, memory, network, and so on. In custom metrics, we not only provide the standard metrics interface, but also technical solutions for how to use custom metrics with your application. In external metrics, we can use all kinds of metrics we collect, such as logs, APM, microservice metrics, and others, as scaling factors. A comprehensive monitoring architecture makes HPA shine.

HPA is great, but not perfect. It may not scale up in time because of the delay in metrics scraping. If you know the resource utilization trend in advance, CronHPA is a great supplement to plain HPA, and CronHPA can also work together with HPA. You can define a time-based scaling plan with cron-like rules. CronHPA is a standard CRD implementation, so it's easy to use and to integrate with other systems (a sketch is shown at the end of this section).

Node autoscaling is a kind of resource autoscaling. The cluster-autoscaler watches pod changes from the API server. When pods are unschedulable because of insufficient resources, the cluster-autoscaler chooses a scaling group to provision new nodes for the cluster. When the new nodes join the cluster, the pending pods are rescheduled onto them. In Alibaba Cloud, 500 Kubernetes nodes can be scaled up in 90 seconds.

Virtual Node is a very unique resource autoscaling method. It is serverless autoscaling rather than node-based autoscaling. You don't need to care about what the nodes are or how many resources you need; all you need to do is bind the pod to the Virtual Node, which you can do with a node selector and tolerations (see the second sketch at the end of this section). When the pod is scheduled to the Virtual Node, it is ultimately created as a serverless ECI pod. This kind of autoscaling is very useful for CI/CD, big data, and batch jobs. It's fast, simple, and has a low learning cost.

In the next section, I will introduce some typical scenarios. AI training with GPU autoscaling is very common on ACK: you create a TF job with a GPU request, and if the cluster cannot schedule the pod, the cluster-autoscaler chooses the best scaling group based on a scheduling simulation and the lowest cost. The GPU scaling group then provisions a new GPU node with the NVIDIA container runtime installed. Binding one GPU card to a single pod is the most common way, but sometimes it's not the best way because of low resource utilization. We support GPU sharing for both scheduling and autoscaling, and you can use the GPU sharing feature to save cost: multiple pods share the memory of one GPU card, and isolation between pods is also supported.

Now it is demo time. In this showcase, I will introduce a real user scenario. Ingress is the most popular gateway in Kubernetes. For most online services, CPU is not always the best metric to drive HPA; QPS may be a better one.
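Since CronHPA is a standard CRD, a time-based scaling plan is just a YAML object. Here is a minimal sketch based on the open-source kubernetes-cronhpa-controller; the Deployment name, schedules, and target sizes are illustrative assumptions.

```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: nginx-cronhpa
  namespace: default
spec:
  scaleTargetRef:                 # the workload to scale on a schedule
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  jobs:
    - name: scale-up-before-morning-peak
      schedule: "0 0 8 * * *"     # crontab-style rule (seconds field first)
      targetSize: 10
    - name: scale-down-at-night
      schedule: "0 0 22 * * *"
      targetSize: 2
```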
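The talk says a pod is bound to the Virtual Node through a node selector and tolerations. Below is a rough sketch of such a pod spec; the label and toleration key depend on how the virtual node (virtual-kubelet) is deployed in your cluster, so treat them as assumptions to verify.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: job-on-virtual-node
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "echo running as an ECI pod && sleep 3600"]
  nodeSelector:
    type: virtual-kubelet                  # assumed label carried by the virtual node
  tolerations:
    - key: virtual-kubelet.io/provider     # assumed taint used by the virtual node
      operator: Exists
```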
So how do you monitor the QPS of your workload at the ingress gateway, and how do you autoscale based on QPS? In the previous section, I mentioned that the Log Service can collect the logs of the ingress gateway. It can also analyze the logs, visualize the network flow, and, with the help of the alibaba-cloud-metrics-adapter, expose them as metrics. Finally, HPA can calculate the replicas of the workload based on QPS.

Let's start the demo. First of all, you need to create a Kubernetes cluster. We offer several cluster types; the standard managed cluster is the most popular one. This type of cluster fully manages your master nodes and helps you save computing resources: you only need to create worker nodes to run your business. Fill in your cluster name, then select the region, Kubernetes version, container runtime, and the VSwitches of the VPC. Select the network plug-in, pod CIDR, and service CIDR. Be careful: if you want to manage your cluster from your own laptop, you need to check Expose API Server with EIP on the public side. If your cluster is very large, you need to create an advanced security group. Then choose the instance type of your worker nodes and select the system disk size and the operating system. If you want to use Ingress in your cluster, check the Install Ingress Controller box and choose the correct SLB network type. Typically we choose Public Network, because if you choose Internal Network you cannot access the ingress controller from the Internet. If you want to use ingress metrics as external metrics for HPA, check the Enable Log Service and Create Ingress Dashboard boxes. Then confirm the order, check the terms box, and create the cluster.

Once your cluster is running, you can jump into the Manage tab to get some basic information: the API server public endpoint, the API server internal endpoint, the access log project, the NGINX Ingress SLB, and, most importantly, the kubeconfig. There is a public access kubeconfig and an internal access kubeconfig; they are different. If you want to manage your cluster from your own laptop over the Internet, copy the public kubeconfig content, create a file called config in the ~/.kube folder, and paste the content into it. Then you can manage your cluster from your own laptop.

Now let's create a deployment: choose Create from Template, select the correct namespace, use the default deployment template, and click Create; you get a deployment with two replicas. Next, create a service: name it nginx, select ClusterIP as the service type, select the backend we just created, name the port, and click Create; we get a service called nginx. Then create an ingress: name it nginx, select a domain, fill in the path, pick the nginx service, choose the port, and click Create (the equivalent manifests are sketched at the end of this section). Then let's visit the website. Okay, it works. You may notice there is a link here called Ingress Overview. What is it? Let's click it. This is the ingress dashboard: you can select a time range, set auto refresh, and view the PV, UV, success rate, error rate, latency, and other useful information on this page.

Now let's create an HPA based on QPS. Before we create the HPA, we need to install the Alibaba Cloud metrics adapter. You can install it from the application catalog: click ack-alibaba-cloud-metrics-adapter and check the parameters. Typically you don't need to change any parameters here, just click Create. You will then get a deployment called ack-alibaba-cloud-metrics-adapter in the kube-system namespace, and now let's create the HPA.
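For reference, here is a rough sketch of what the console steps above produce: an nginx Deployment with two replicas, a ClusterIP Service, and an Ingress routing a demo host to it. The image, host name, and Ingress apiVersion are illustrative assumptions for clusters of that era, not values shown in the talk.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.17              # illustrative image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  selector:
    app: nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
---
apiVersion: networking.k8s.io/v1beta1    # apiVersion depends on the cluster's Kubernetes version
kind: Ingress
metadata:
  name: nginx
spec:
  rules:
    - host: nginx.demo.example.com       # illustrative domain
      http:
        paths:
          - path: /
            backend:
              serviceName: nginx
              servicePort: 80
```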
First let's look at the HPA YAML. You must use HPA version autoscaling/v2beta2 and metrics of the External type, and fill in the required parameters sls.project, sls.logstore, and sls.ingress.route. Here sls.project is the log project of the cluster (k8s-log-<cluster ID>), sls.logstore is nginx-ingress, and sls.ingress.route is the namespace, service name, and service port. The target value is 10, which means that if each pod receives more than 10 QPS, the HPA controller will scale up new pods. Run kubectl apply -f hpa.yaml and watch the HPA. We can use ab to simulate a stress test: set the number of requests to a very large value and the concurrency to 10, copy the ingress endpoint, and start the stress test. We can view the ingress dashboard to check the load; here we can see the PV is increasing. Okay, let's check the HPA again. We can see the HPA is working: the current replica count is four and the desired count is five. If the stress test is stopped, the HPA controller will scale down later; after a few minutes, we will find the deployment has scaled back down to two replicas. Thank you for watching this topic.
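For reference, here is a sketch of an external-metrics HPA of the kind described above, driven by the sls_ingress_qps metric exposed by the alibaba-cloud-metrics-adapter. The project name, route, and replica bounds are placeholders that must be adjusted to match your own cluster.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-ingress-qps-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: sls_ingress_qps                    # metric exposed by the metrics adapter
          selector:
            matchLabels:
              sls.project: "k8s-log-<cluster ID>"  # log project of the cluster
              sls.logstore: "nginx-ingress"
              sls.ingress.route: "default-nginx-80"  # <namespace>-<service>-<port>
        target:
          type: AverageValue
          averageValue: 10                         # scale up when average QPS per pod exceeds 10
```

As in the demo, you would apply this with kubectl apply -f hpa.yaml and then watch it with kubectl get hpa -w while the ab stress test is running.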