Let me briefly introduce myself: my name is Thomas Walter and I’m one of the architects who designed NW Cloud. Over the last few months I have received a lot of questions about how NW Cloud works behind the scenes. In this blog post I will try to answer the most common ones. While I cannot outline all the details for several reasons (security, competition, etc.), I hope you still get a good overview of what is happening within our infrastructure.
To structure this article we will look into four different kinds of interaction with NW Cloud:
- Deploying an application
- Starting an application
- Running an application
- Stopping an application
1 Deploying an Application
1.1 High level Architecture
Below you find a simplified high-level diagram showing some entities within NW Cloud. The yellow walls show firewall boundaries. A request from your browser starts on the left side of the picture in the ‘Internet’ box, passes through load balancers in the DMZ and finally reaches your application, which is located in the ‘Your NW Cloud Applications’ segment.
In the box labeled ‘NW Cloud Services’ you find all the services your application can potentially use.
There are also entities that your app does not interact with directly, but which are important to keep NW Cloud up and running. They are located in the segment called ‘NW Cloud Infrastructure’. The most prominent piece there is the ‘Orchestrator’. We gave it this name because it manages the complete NW Cloud landscape.
1.2 The Deploy Process
When you deploy an application via the Eclipse IDE or via the NW Cloud command line tool you let NW Cloud know there is a new application. Technically, the Orchestrator offers a Deploy Service which listens for deployment requests. The protocol used to communicate is REST-based via HTTPS. During deployment you connect to this service and you’re authenticated based on the supplied credentials.
Two main things happen during this process:
- Your JVM byte code and the attributes you specified are transported into the central NW Cloud Repository.
- All the NW Cloud Services are notified that there is a new application which potentially wants to use them. To provide an example: when the NW Cloud Persistence Service becomes aware of your application being deployed, it creates the necessary resources for you, such as a schema in the DBMS.
Now that your binaries are in the NW Cloud Repository and all NW Cloud Services have prepared resources for your application to use, the application is ready to be started.
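To make the two deployment steps more tangible, here is a minimal sketch in Python. It is not the actual Deploy Service: the class names, the `on_application_deployed` hook and the schema naming are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Stand-in for the central NW Cloud Repository (hypothetical interface)."""
    artifacts: dict = field(default_factory=dict)

    def store(self, account, app, binaries, attributes):
        # Keep the byte code together with the attributes supplied at deploy time.
        self.artifacts[(account, app)] = (binaries, dict(attributes))


class PersistenceService:
    """Stand-in for a platform service that prepares resources on deployment."""

    def __init__(self):
        self.schemas = set()

    def on_application_deployed(self, account, app):
        # The example from the text: create a schema for the new application.
        # The naming scheme is an assumption.
        self.schemas.add(f"{account}_{app}")


def deploy(repository, services, account, app, binaries, attributes):
    """Step 1: store binaries and attributes. Step 2: notify every service."""
    repository.store(account, app, binaries, attributes)
    for service in services:
        service.on_application_deployed(account, app)


repo = Repository()
persistence = PersistenceService()
deploy(repo, [persistence], "acme", "shop", b"\xca\xfe\xba\xbe", {"runtime": "java"})
```

After the call, the repository holds the artifact and the persistence stand-in has prepared a schema, which mirrors the "ready to be started" state described above.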
2 Starting an Application
Before I explain what happens at startup, let me dive into some of the infrastructure fundamentals. Each running application process gets its own dedicated virtual machine (VM) at runtime. You don’t have to deal with these VMs directly; instead you only see the application process. We chose VMs over other approaches because virtualization gives us a lot of advantages we wouldn’t get otherwise.
Some of them are:
- Applications become “immortal”: When we have to replace servers with newer ones we can simply move the running VMs between these servers. So your application continues to run although the hardware is exchanged. We do the same if we need to maintain our servers or other components.
- Additional security: Each VM runs as a guest on a virtualization host, often referred to as a ‘hypervisor’. We can restrict the network traffic of the VMs on the hypervisor level, where it is out of reach for the process running a particular NW Cloud application.
- Saving hardware and energy: Most of the applications we see on NW Cloud are only under heavy load for comparatively short time frames. With virtualization many applications share the same CPUs, so we can run more applications on less hardware and save energy and costs. Of course, we make sure that the physical CPUs don’t become the bottleneck for any application, so there is always enough headroom for a sudden load spike.
As we are committed to open source and open standards we decided to use Xen as virtualization technology and Linux as our operating system. Of course we are observing new trends like Linux Lightweight Containers and other exciting developments, so we might introduce new virtualization technologies at some point.
2.2 The Infrastructure as a Service underlying NW Cloud
NW Cloud, which is a PaaS offering, is decoupled via an abstraction layer from the underlying IaaS.
The most prominent task of the IaaS is to create (virtual) machines and storage resources and to manage their lifecycle. All of this is done via an API. We completely hide the underlying IaaS and its complexity from the end users, as NW Cloud is clearly focused on PaaS.
The SAP internal IaaS we use lives within our data centers and is the same IaaS that is already used for many other SAP On Demand applications like SAP Business ByDesign. This doesn’t mean NW Cloud is based on Business ByDesign; it just uses the same infrastructure services. The advantage of having one shared infrastructure is obvious: We need only one 7x24 team and one skill set to care for several of SAP’s applications.
NW Cloud is not only a platform for SAP external developers. Even before we opened it to the public, we used NW Cloud within SAP to develop some of SAP’s Java based OnDemand applications. Here’s a document that provides a list of applications using NW Cloud: http://scn.sap.com/docs/DOC-32389
2.3 Steps happening during Start
But let’s get back to the interaction between NW Cloud and our SAP internal IaaS.
Do we request a new virtual machine from this IaaS each time somebody starts an application? The answer is ‘No’. We have decoupled the request for a VM from the physical instantiation, so we always have a pool of prepared VMs we can use.
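The idea of decoupling the request for a VM from its creation can be sketched with a small pool that is topped up whenever it runs low. This is only an illustration: the `VmPool` class, its `min_size` parameter and the `create_vm` callback are hypothetical, and the refill happens inline here for simplicity rather than in the background.

```python
from collections import deque
import itertools


class VmPool:
    """Keeps a minimum number of prepared VMs ready to be handed out."""

    def __init__(self, create_vm, min_size):
        self._create_vm = create_vm      # stand-in for the (slow) IaaS call
        self._min_size = min_size
        self._ready = deque()
        self.refill()

    def refill(self):
        # Top the pool up to its minimum size.
        while len(self._ready) < self._min_size:
            self._ready.append(self._create_vm())

    def take(self):
        # Hand out a prepared VM immediately, then replenish the pool.
        if not self._ready:
            self._ready.append(self._create_vm())
        vm = self._ready.popleft()
        self.refill()
        return vm

    def __len__(self):
        return len(self._ready)


_ids = itertools.count(1)
pool = VmPool(create_vm=lambda: f"vm-{next(_ids)}", min_size=3)
first = pool.take()
```

The caller of `take()` never waits for the IaaS as long as the pool is not empty, which is the point of preparing VMs ahead of time.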
So what exactly happens if somebody starts an application?
- You already learned there is a central managing entity in NW Cloud we call the ‘Orchestrator’, which takes care of keeping the landscape up and running. When a user starts an application, this Orchestrator gets a request with the account and the name of the application.
- The Orchestrator takes a VM out of the VM pool. If the pool becomes too small it will be filled with fresh VMs in the background. These VMs already have most of the required infrastructure installed on them.
- The Orchestrator provisions the correct execution environment on the VM and the requested version of the Java application server. After this, it takes the user application from the NW Cloud repository and deploys it on top.
- After this provisioning step the VM is registered with several NW Cloud internal services: A monitoring agent on the VM collects the logs and forwards them to a central log service. You can find the log service in the architecture diagram in the ‘NW Cloud Infrastructure’ segment. As a developer you can access these logs in your NW Cloud Account Page by clicking on ‘Logs’.
- Finally the application server is started. An agent on the VM checks its state during the startup process. You can monitor this state in the Account Page: First it is ‘Pending’ while provisioning is happening, later it switches to ‘Starting’ and finally to ‘Started’. If there is an error in the application which prevents the start, the state switches to ‘Error’.
- Once the Orchestrator detects the application is started and ready to receive requests, it registers the application with the load balancers so it becomes externally available.
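The state transitions visible in the Account Page can be sketched as a small state sequence. The state names come from the steps above; the function and its three callbacks are hypothetical stand-ins for provisioning, starting the application server and registering with the load balancers.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    STARTING = "Starting"
    STARTED = "Started"
    ERROR = "Error"


def start_application(provision, start_server, register_with_load_balancer):
    """Return the sequence of states an application passes through on start."""
    history = [State.PENDING]          # provisioning is in progress
    provision()
    history.append(State.STARTING)     # the application server is booting
    try:
        start_server()
    except Exception:
        history.append(State.ERROR)    # the application prevented the start
        return history
    history.append(State.STARTED)
    register_with_load_balancer()      # only now is the app reachable externally
    return history


registered = []
ok = start_application(lambda: None, lambda: None, lambda: registered.append("app"))


def broken_server():
    raise RuntimeError("application failed to start")


failed = start_application(lambda: None, broken_server, lambda: registered.append("bad"))
```

Note that a failed start never reaches the load balancer registration, so a broken application does not receive traffic in this sketch.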
3 Running an Application
3.1 Calling into the Application
The first thing a request from the Internet hits is our load balancer infrastructure. We are using highly available hardware solutions. As one would expect they reside in a DMZ. The idea of a DMZ is that each incoming connection is terminated here at a proxy, in this case the load balancer. This allows the load balancer to check the content before it forwards the request to the application, which sits behind the DMZ (see diagram in the beginning). This is also the reason why we allow only HTTPS and not any other protocol.
If you have started several processes of the same application, the Orchestrator registers all your processes with the load balancer. The balancer will distribute the requests coming from the Internet among these application nodes. A new client is forwarded to the node that currently has the fewest connections. We support session stickiness: a returning client will always be connected to the same node it was in contact with in the past.
As you can see, there is no direct connection from the Internet to your application; the load balancer is always in between.
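The balancing rule described in this section (fewest connections for new clients, stickiness for returning ones) can be sketched as follows. Our load balancers are hardware appliances, so this Python class is only a model of the behavior; the class and method names, and identifying a client by a plain ID, are assumptions.

```python
class LoadBalancer:
    """Least-connections balancing with session stickiness (illustrative model)."""

    def __init__(self):
        self._connections = {}   # node -> number of open connections
        self._sticky = {}        # client id -> node chosen on first contact

    def register(self, node):
        self._connections.setdefault(node, 0)

    def unregister(self, node):
        self._connections.pop(node, None)
        # Forget sticky sessions that pointed at the removed node.
        self._sticky = {c: n for c, n in self._sticky.items() if n != node}

    def route(self, client_id):
        node = self._sticky.get(client_id)
        if node is None:
            # New client: pick the node with the fewest open connections.
            node = min(self._connections, key=self._connections.get)
            self._sticky[client_id] = node
        self._connections[node] += 1
        return node

    def release(self, node):
        # Called when a connection to the node is closed.
        if self._connections.get(node, 0) > 0:
            self._connections[node] -= 1


lb = LoadBalancer()
lb.register("node-a")
lb.register("node-b")
first = lb.route("alice")    # both nodes idle, the first registered one wins
second = lb.route("bob")     # node-a is busier now, so node-b is chosen
again = lb.route("alice")    # returning client sticks to its earlier node
```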
3.2 Outgoing Calls from your Application
What about the NW Cloud services which your application is using during runtime? Can they be accessed from the Internet? The answer is ‘No’: Only your application living in the application segment is allowed to connect to these services. As you can see in the first diagram they are living behind firewalls in their own realm.
If your application wants to connect to a destination in the Internet, the load balancers are not involved. Instead, HTTP and HTTPS traffic is forwarded by an HTTP proxy (see first diagram). It is additionally possible to have a direct route to SAP external servers that bypasses the proxy, but we have enabled this kind of access only for very few trustworthy destinations. One example is the servers Apple hosts, which you need to access if you want to use Apple’s Push Services.
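The routing decision for outgoing calls can be sketched as a small function. The host name in the allowlist is a placeholder, not a real destination, and treating everything else as blocked is an assumption of this sketch rather than a statement about the actual firewall rules.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of trusted destinations with a direct route.
DIRECT_ROUTE_HOSTS = frozenset({"push.example.com"})


def choose_route(url):
    """Decide how an outgoing call from an application leaves the landscape."""
    parts = urlsplit(url)
    if parts.hostname in DIRECT_ROUTE_HOSTS:
        return "direct"          # bypasses the proxy, allowlisted hosts only
    if parts.scheme in ("http", "https"):
        return "http-proxy"      # regular web traffic goes through the proxy
    return "blocked"             # assumption: no other route exists


web = choose_route("https://api.example.org/v1/items")
push = choose_route("tls://push.example.com:2195")
other = choose_route("ftp://files.example.org/data")
```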
3.3 Impact of NW Cloud Updates on a running Application
The NW Cloud development team follows Lean principles and works in Scrum mode. As a result of that we produce new functionality on a bi-weekly schedule (see Release Notes). This means twice a month we have some updates for our productive landscape. Normally this requires an updated version of our Orchestrator.
We inform you about an upcoming update via an infrastructure downtime announcement. The standard message in the announcement is “Development operations like deployment, starting and stopping of new components will not be possible. Productive applications will not be affected.”
During this time, normally only a few minutes, we take the central Orchestrator down. This is why you can’t deploy or start applications during the downtime. Your running applications are not impacted at all: they are still registered with the load balancers and can continue to use the NW Cloud services. After we have updated the Orchestrator we restart it. During the restart it rediscovers all running applications automatically. As you can see, the Orchestrator is not a single point of failure; taking it down is routine for us.
4 Stopping an Application
After reading the previous chapters, this step is easy to follow: If you trigger a stop command, your application is unregistered from the load balancer. After that the application server gets a stop request. The final step is to deregister the VM from the logging infrastructure. Your deployed application stays in the NW Cloud Repository so you can restart it at any time.
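The order of the stop steps matters: traffic is cut off first, then the server is stopped, then logging is detached. A minimal sketch, using plain Python sets and a dict as hypothetical stand-ins for the load balancer, the running application servers, the logging infrastructure and the repository:

```python
def stop_application(app, load_balancer, app_servers, log_sources):
    """Run the three stop steps in the order described above."""
    steps = []
    load_balancer.discard(app)   # 1. no new requests reach the application
    steps.append("unregistered from load balancer")
    app_servers.discard(app)     # 2. the application server is stopped
    steps.append("application server stopped")
    log_sources.discard(app)     # 3. the VM no longer forwards logs
    steps.append("deregistered from logging")
    return steps


repository = {"shop": b"binaries"}   # deployed artifacts are left untouched
load_balancer, app_servers, log_sources = {"shop"}, {"shop"}, {"shop"}
steps = stop_application("shop", load_balancer, app_servers, log_sources)
```

Because the repository is never touched, the application can be started again later from the same deployed binaries.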
I hope I could contribute to a better understanding of the basic NW Cloud infrastructure concepts. Many interesting topics have been omitted, like monitoring, auditing, connectivity to SAP OnPremise systems using the SAP Cloud Connector and so on. Keeping it simple hopefully makes the fundamentals clearer, and there is always the opportunity for a follow-up blog.