Hello everyone, I'm LiShan, a senior solution architect leading cloud and cloud-native solutions. Today we'll talk about migrating applications to Alibaba Cloud. If the applications are containerized, it is easy to lift and shift them. But when we migrate applications, we also need to plan the Kubernetes cluster and the cloud resources: how many worker nodes we need, how much resource capacity we need, and which network and storage fit our requirements. So this topic will cover a migration steps overview, cluster required resources planning, network planning guidance, storage planning guidance, and a checklist before production goes live.

First of all, let's take a look at the migration steps. From our experience, migrating an application includes five steps. Step 1: creating and configuring the cluster's required resources. We should plan which type of Kubernetes cluster we will adopt, how many worker nodes we need, and in which region to provision the nodes and the cluster. Step 2: data migration, which includes Docker image migration, database migration, and storage migration. For each part we also provide tools to help you migrate easily. Step 3: application migration. Either rely on your CI/CD platform to continuously deploy to the cloud, or rely on the tools provided by Alibaba Cloud to help your applications migrate; detailed information is available in the tools' Git repositories. Step 4: testing the application. Configure tests to make sure it can be accessed, and test the other features of the application; make sure the major features work, such as connecting to the database, file storage, etc. Check that application logs can be collected, and check that cluster and application metrics can be monitored and alerted on. Step 5: switching traffic. Update the DNS configuration to switch the traffic, and if your application code includes hard-coded URLs, don't forget to update the code or configuration to make sure the URLs are right.

Next, I will briefly introduce Docker image migration, database migration, and storage migration, and then I will dive into cluster required resources planning.

For Docker image migration, Alibaba Cloud provides a tool named image-syncer. It is open source, and you can get it from the Git repository shown on screen. This tool is able to synchronize Docker images from Harbor, Docker Hub, AWS ECR, and GCR to Alibaba Cloud Container Registry. If your application runs on a VM, you first need to containerize it. This can be done by capturing the VM image and converting it to a Docker image; if you want to change the Docker image, you can update the Dockerfile, build a new image, and push it to the container registry.

For data backup and restore to a persistent volume, you back up the volume, restore it on the cloud, and generate a PV (persistent volume).

If your application provides multiple access entries, such as /list for a list page and /detail to display a detail page, you can define a VirtualService with Istio, or configure the NGINX Ingress, to make sure each URL path can be accessed in the Kubernetes cluster; a minimal Ingress sketch is shown below.
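As a rough illustration of the Ingress option, here is a minimal sketch. The host name and the backend service names (list-svc, detail-svc) are placeholders I've assumed for illustration, not names from the talk; an Istio VirtualService could express the same routing.

```yaml
# Minimal NGINX Ingress sketch routing two URL paths after migration.
# Host and service names are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app
spec:
  ingressClassName: nginx
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /list
        pathType: Prefix
        backend:
          service:
            name: list-svc        # serves the list page
            port:
              number: 80
      - path: /detail
        pathType: Prefix
        backend:
          service:
            name: detail-svc      # serves the detail page
            port:
              number: 80
```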
For database migration, Alibaba Cloud provides more than 20 services and tools, covering OLTP, OLAP, NoSQL, and database management. This is one of the solutions: if your database depends strongly on Oracle features, you can migrate to PolarDB for Oracle; if your database depends only weakly on Oracle features, you can migrate to MySQL or PostgreSQL. Some tables can also be archived to OSS, and the data on OSS can then be analyzed through the OSS interface. For OLAP, we provide large-scale storage with high performance and high scalability.

For storage migration, we provide both offline and online data migration to Alibaba Cloud. Offline migration is for the scenario where the local IDC cannot connect to the Internet or the bandwidth is limited; the Data Transport device provided by Alibaba Cloud can move local data to the cloud, and the supported data sources include NAS, HDFS, and other file storage. For online migration, we provide a data migration service that moves data over the public network or a leased line to Alibaba Cloud OSS or NAS; the supported data sources are object storage and file storage.

Above, I briefly introduced Docker image migration, database migration, and storage migration. Next, let's spend some time on Kubernetes cluster required resources planning.

Firstly, you should determine whether all applications will migrate to the cloud or you will adopt a hybrid cloud strategy. If you adopt a hybrid cloud strategy, which type will you use? For example: part of the applications run on the cloud and part run in the IDC, with the CI/CD platform deploying applications to multiple clusters and GTM managing the global traffic; or all applications migrate to the cloud but the databases keep running in the IDC; or all applications keep running in the IDC but scale up to the cloud.

Next, select the cluster type: a managed, dedicated, or serverless cluster, or a GPU, bare metal, Windows, or edge computing cluster.

Then calculate the resource capacity. Our best practice is to use stress testing to calculate how much capacity is constant and how much capacity will be covered by auto scaling. Stress test each microservice: through stress testing we will know the microservice's maximum workload and which metric fits auto scaling, such as QPS, CPU, or latency. We also need to stress test the business module. First, let me explain what a business module means: for example, on an e-commerce platform, browsing commodities, adding to the shopping cart, fee calculation, order preview, payment, and order assignment, etc., together make up the purchase business module. Through stress testing the business module we will know its maximum scalability; because we know the normal QPS and how many modules we have, we can calculate how many constant nodes we need and how many nodes auto scaling will cover. We will also know which kind of scaler is better: the cluster autoscaler or the virtual node. The cluster autoscaler adds more nodes to the cluster, while the virtual node means the scaled pods run on ECI, Elastic Container Instance.

For example, if we have 100 microservices, each microservice needs 5 instances, each instance is 2 cores and 4 GiB, and the redundancy is 10%, then the total capacity is 1100 cores and 2200 GiB, as checked in the short calculation below.
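As a quick check of that arithmetic, assuming the figures are indeed 100 microservices with 5 pods of 2 cores / 4 GiB each and 10% redundancy:

$$100 \times 5 \times 2\ \text{cores} \times 1.10 = 1100\ \text{cores}$$
$$100 \times 5 \times 4\ \text{GiB} \times 1.10 = 2200\ \text{GiB}$$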
How do we select the nodes? There are four steps for reference. First, choose a suitable instance type according to the application's characteristics; Alibaba Cloud provides extensive instance types for your choice, such as general purpose, compute optimized, memory optimized, big data (local SSD), ECS Bare Metal, etc.

Second, if you are creating a dedicated cluster, size the master nodes according to how many worker nodes will run. For example, if you have 1 to 5 worker nodes, the master nodes should be 4 cores and 8 GiB; if you have 6 to 20 worker nodes, the master nodes should be 4 cores and 16 GiB; if you have 21 to 100 worker nodes, the master nodes should be 8 cores and 32 GiB.

Third, for the worker nodes, we can calculate the capacity with the method mentioned previously: remember, from stress testing we get the constant capacity, multiplied by how many microservices we have. If the calculated capacity is more than 1000 cores, we suggest using ECS Bare Metal instances as worker nodes; if it is less than 1000 cores, choose a single-instance capacity as high as you can. That will reduce the number of worker nodes, and therefore increase the master's scheduling performance, increase resource utilization because fewer resource fragments remain, and also reduce operation costs. As a redundancy suggestion, the remaining usable capacity should be able to run the pods drifting from any ECS that goes down; for example, if there are ten pods running on an ECS, the redundancy should be able to run those ten pods in case that ECS goes down.

Last, don't forget to attach a data disk to each worker node, because Docker images and system log data will consume disk space. If there is no data disk, those files will be stored on the system disk, which may break the system disk by filling it up; if you attach a data disk, ACK will automatically mount it to the specific folders that store this data, which protects the system disk. For a production environment, we suggest the system disk be at least 100 GiB.

Next, let's see the network planning guidance. Alibaba Cloud Container Service provides two types of network plugins: Flannel and Terway. Under the Terway network we also provide two modes. On the left side of the Terway diagram is the virtual IP (shared ENI) mode, which is achieved with IPVLAN; on the right side is the exclusive ENI mode, which means the pod connects directly to the network and does not have any performance loss. So how do we choose between them? There is some experience you can reference: if your application will use network policies, or needs high network performance, or will use some special features such as a shared VPC, we suggest you choose the Terway network; but if you just want an equivalent network inside the Kubernetes cluster and do not have any specific demands, we suggest you choose the Flannel network, because it is easy to use.

Let's see the performance test data of each network mode in detail. The orange bar is the ENI exclusive mode, the yellow bar is the IPVLAN mode, and the black bar is the Flannel mode. We can see that the ENI exclusive mode and the IPVLAN mode of the Terway network have a good result on UDP PPS. That means if your application is gaming or high performance computing, you should try the Terway network.

Let's see how to configure the network in detail. For the Flannel network, when you create the Kubernetes cluster you define the pod CIDR and the service CIDR, as you can see in the image. The CIDR netmask determines how many pods can be hosted in the cluster, and the ECS instance type affects how many pods can run on each node, according to the resources the pods require. For example, if the ECS instance type has 4 CPU cores and each of your pods requires 1 CPU core, that means the ECS can run about 4 pods, or 3 once system components are accounted for. And how do we select the number of pods (IP addresses) per node, highlighted in red? It is relevant to the ECS instance's resource capacity: if you select a large number of IP addresses per node but your ECS can only schedule a few pods, those IP addresses are wasted. A small sketch of the per-pod requests that drive this calculation is shown below.
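As a minimal sketch of those per-pod requests (the Deployment name and image are hypothetical placeholders): with a request of 1 CPU core per pod, a 4-core ECS worker can schedule roughly three to four of these pods, since system components also reserve some capacity. The same requests and limits are what the go-live checklist later asks you to set on every pod.

```yaml
# Hypothetical Deployment sketch: each pod requests 1 core / 2 GiB,
# so a 4-core worker node fits about 3-4 pods of this kind.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-service                          # placeholder name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo-service
  template:
    metadata:
      labels:
        app: demo-service
    spec:
      containers:
      - name: app
        image: registry.example.com/demo:v1   # placeholder image
        resources:
          requests:
            cpu: "1"
            memory: 2Gi
          limits:
            cpu: "2"
            memory: 4Gi
```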
In the Terway network, the node and pod address ranges are defined by vSwitches. That means you should create two vSwitches: one for the nodes and another one for the pods. How many IPs can be assigned to pods is determined by the ENIs; the exclusive mode and the virtual IP mode assign different IP amounts as well, because each instance type can bind a different number of ENIs. When you select an ECS instance type, the table will show how many ENIs can be bound to that ECS.

Next, the storage planning guidance. ACK supports all kinds of storage as Kubernetes persistent volumes, and each storage is fitting for different scenarios. For example, cloud disk is suitable for stateful applications and databases; NAS is fitting for log sharing and for media and genomics data; CPFS is suitable for HPC or deep learning; local disk is good for databases or Hadoop. So choose the fitting storage for your applications. The red boxes highlight the notices and important features you should know. For example, cloud disk is zone aware; that means the ECS and the cloud disk cannot be in different zones. Let's imagine a situation: you import a YAML file that defines a PV. If the PV is in zone A, the pod will be scheduled to zone A; but if zone A does not have enough capacity, the pod scheduling will fail. Another notice: NAS supports subdirectory mounts. That means ACK will automatically create a subdirectory on the NAS and automatically mount it into the pod it created; the subdirectory is isolated, so it will not affect other directories being used by other applications. By the way, if you use NAS, we suggest you use NFS version 3. OSS is fitting for read-more, write-less scenarios; it provides distributed cache acceleration and is able to achieve a 4 Gbit/s read speed. By default, ACK provides StorageClasses for cloud disk, including SSD, ESSD, and efficiency cloud disks, but notice that the minimum cloud disk size is 20 GiB.

So far, we have talked about how to calculate the resource capacity and how to select the network and storage. Next, let's see the checklist before production goes live. Firstly, we should configure CPU and memory requests and limits in your pods; that avoids overscheduling pods onto a node. Enable image security scanning in ACR, the container registry; that protects your base images from security risks. Configure logging, monitoring, and alerting; that lets you know your application's running status and whether the microservices and cluster nodes are running well. Configure liveness and readiness probes, so that the application is only accessed after it is truly running. Enable ECS auto renewal; that avoids the application becoming inaccessible because an ECS stops at its expiration date. Adjust the SLB instance type and bandwidth; the table on the right side shows the QPS and max connections of each SLB instance type, so adjust the SLB instance type according to your application's access situation. At last, don't forget to enable auto scaling and configure it: whether to use HPA or CronHPA, whether to use the cluster autoscaler or the virtual-kubelet autoscaler, and whether the scaler is triggered by CPU and memory or by QPS and latency. A minimal HPA sketch is shown below.
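Here is a minimal HPA sketch, assuming the hypothetical demo-service Deployment from the earlier example and a plain CPU-utilization trigger. ACK's metrics adapter can also expose external metrics such as Ingress QPS or latency from Log Service or ARMS, as discussed next, but the exact metric names depend on your setup, so only the standard CPU metric is shown here.

```yaml
# Minimal HorizontalPodAutoscaler (autoscaling/v2) sketch.
# Scales the hypothetical demo-service Deployment on average CPU usage;
# swap in an External metric (e.g. Ingress QPS/latency) if your cluster
# has a metrics adapter that provides one.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-service
  minReplicas: 5          # constant capacity from stress testing
  maxReplicas: 20         # headroom covered by auto scaling
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```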
For autoscaling, both the interface layer and the service layer should scale together. Otherwise, if you just scale the interface layer, it will be able to support much more workload, but the service layer cannot support that workload and will become the bottleneck. Let's see the demonstration. For the interface layer, the deployment's pods are accessed through the Ingress; the access log can be stored in Log Service, and HPA is triggered by the QPS metric. For the service layer, the deployment's pods' logs are stored in Log Service and the application's real-time latency can be obtained, so HPA can be triggered by the latency from Log Service or ARMS. By the way, it is very easy for ACK to use those external metrics: as the image on the right side shows, just specify the external metric in the HPA YAML file; this defines the latency threshold that will trigger the HPA to scale up.

As I mentioned previously, configure logging, monitoring, and alerting; that lets you know your application's running status and whether the microservices and cluster nodes are running well. Ingress and application logs are stored in Log Service; use NPD (node-problem-detector) to be aware of cluster events, and all the events can be monitored in the Event Center of Log Service; for application performance, use ARMS or Prometheus for monitoring. Here are some dashboards of Log Service. By default, Log Service provides more than 10 dashboards that let you easily know your cluster's and applications' health status, such as the Ingress dashboard, which shows your applications' PV, UV, latency, which areas have accessed your services, and so on. In the Event Center you are able to see what events have happened in the cluster, such as pod eviction, pod start failures, image pull failures, node disk space insufficiency, etc.

The monitoring architecture and workflow are like this demonstration shows. At the resource level, enable CloudMonitor to monitor the resources and the SLB. For the Ingress controller, we can use Log Service, ARMS Prometheus metrics, and tracing for monitoring and analysis. For the gateway and each microservice, we can use ARMS or ARMS Prometheus to monitor the application metrics.

When configuring the layer-4 SLB, we should make sure the SLB instance type is sufficient for the workload; we suggest upgrading the SLB bandwidth to 5 Gbit/s. For the Ingress controller, we suggest deploying separate Ingresses for Internet and intranet traffic, configuring CPU and memory requests and limits for the pods, and enabling [INAUDIBLE]. For the gateway, we suggest an isolated deployment; also configure CPU and memory requests and limits, and configure APM (application performance management) such as ARMS or other APM tools. For each microservice, configure CPU and memory requests and limits, enable APM, and configure the database and cache connection pools.

[INAUDIBLE] Above are the best practices and our suggestions for migrating applications to Alibaba Cloud. I hope they are useful for you. Thank you for your attention.