In my last post, I described server recovery solutions. They are manual and slow, and used for disaster recovery purposes. For mission-critical applications with rigorous high availability requirements, one must rely on automatic and quick solutions like failover clusters. Failover cluster solutions achieve high availability by managing applications with redundancy of hardware.
Failover cluster solutions are provided by leading vendors such as Symantec, HP, IBM, Microsoft, ORACLE SUN and Red Hat Enterprise Linux. All of these solutions, with the exception of Symantec’s cross-platform cluster software, are proprietary and specific to the vendor’s preferred operating system. According to Gartner, the vendors mentioned above accounted for more than 75% of the clustering software market in 2009. SAP ERP landscapes (including the SAP ASE database server) can be secured with these clustering business continuity solutions.
First I will describe how failover cluster solutions are structured and how they work. Then in part 2 of this topic, I will detail the integration of SAP ASE with failover cluster solutions from various vendors.
Failover Cluster Solutions – An Overview
The cluster solutions cited above use different terminology, but they all share the same basic concepts and components. Below are definitions of the major terms used by providers of cluster solutions.
The main purpose of a failover cluster is to ensure availability of resources. What is a resource? A resource is an application or whatever is needed to have an application running. Typical resources are IP addresses, logical volumes, file systems, database systems and applications themselves. Operating systems manage resources as well, but their scope is “local”: they provide interfaces to hardware resources and schedule logical resources like processes. What is unique about failover cluster solutions is that they extend resource management beyond a single host. In a cluster, resources are managed in a hardware-independent way. Resources are not exclusively bound to a single host but can be managed (imported, started, stopped, exported) by the other hosts belonging to the cluster.
Resource Groups, Packages, Services or Application Groups:
Usually resources are gathered into containers and organized hierarchically. For instance, let’s consider a production SAP ERP landscape as a resource group. The SAP ERP application relies on SAP NetWeaver processes and the SAP ASE database server process. The NetWeaver processes (Enqueue, Message, Dispatcher, Gateway and Dialog) in turn rely on other resources like IP addresses and file systems. The SAP ASE database server relies on IP addresses, file systems and logical volumes.
Resource groups are managed by the cluster software in an atomic way. At any given time, all the resource items within a group are attached to one and only one host and cannot be migrated individually.
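The hierarchy described above can be sketched as a dependency graph. The following is only an illustration, not how any particular cluster product stores its configuration: the resource names are hypothetical, and the dependencies are limited to the ones listed in the SAP ERP example. A topological sort gives an order in which the resources of a group could be started; stopping would use the reverse order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resource names. Each resource maps to the resources it relies on.
SAP_ERP_GROUP = {
    "virtual_ip": set(),
    "logical_volumes": set(),
    "file_systems": set(),
    "netweaver": {"virtual_ip", "file_systems"},
    "sap_ase": {"virtual_ip", "file_systems", "logical_volumes"},
    "sap_erp": {"netweaver", "sap_ase"},
}


def start_order(group):
    """Return the resources ordered so that every dependency comes first."""
    return list(TopologicalSorter(group).static_order())


def stop_order(group):
    """Return the resources in reverse start order (dependents stop first)."""
    return list(reversed(start_order(group)))
```

Because the group is handled as a unit, the whole ordered list would be started or stopped on a single node; individual entries are never moved on their own.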
A failover cluster is based upon a set of redundant computers. They are called physical hosts or nodes.
Failover Clusters – Structure and Working
The following schematic from Klaus Schmidt’s book “High Availability and Disaster Recovery” effectively depicts how a cluster is structured and how it works.
We have 3 architectural layers. Each layer is responsible for a specific area.
The cluster layer is responsible for managing cluster resources. Its main task is to start resource groups, check their availability, and stop them.
- The cluster software runs on each physical host of the cluster.
- The cluster software periodically triggers a heartbeat check on all the nodes of the cluster. The heartbeat technique employed depends on the specific cluster solution from a vendor.
- If the cluster software detects and confirms the failure of a resource item belonging to a resource group on a node, it disables the resource group on that node and triggers a failover of the resource group to the next available physical node in the cluster.
The hardware layer is responsible for running resources; it consists of the physical nodes belonging to the cluster (node A and B). For the sake of simplicity, we have only 2 nodes here, but additional nodes can be added. The maximum number of nodes depends on the cluster solution being used.
- Only one of the physical nodes of the cluster is active at a time (here node A). It supports the resource group and runs the application on top of it.
- Usually, there is a “preferred” physical node for an application, where the application runs by default.
- The physical nodes can be hosted in the same location or in distinct locations. Depending on the nature of the cluster solution and the distance, distributed clusters are known as stretch, geographic, metropolitan, continental, or global clusters.
The persistence layer is responsible for data. It features a shared storage capability available to all the physical nodes in the cluster. It is the “glue” allowing the applications of the cluster layer to work.
- Logical volumes of the databases can be accessed by all the physical hosts. But data is only presented to the active node.
In case of failure, logical volumes are released (exported) and taken up (imported) by the next available physical node. A database recovery process takes place to guarantee the consistency of data.
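The failover sequence described in this section can be sketched as a small simulation. This is a simplified model under stated assumptions, not the algorithm of any vendor: all class and method names are hypothetical, a single `healthy` flag stands in for the vendor-specific heartbeat and resource checks, and the database recovery step appears only as a comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    healthy: bool = True  # assumed outcome of the last heartbeat check
    imported_volumes: Set[str] = field(default_factory=set)


@dataclass
class ResourceGroup:
    name: str
    volumes: Set[str]
    active_node: Optional[Node] = None


class Cluster:
    def __init__(self, nodes: List[Node], group: ResourceGroup) -> None:
        self.nodes = nodes
        self.group = group

    def start_on(self, node: Node) -> None:
        # Import the logical volumes, then bring the group online.
        # A real database would run its recovery process at this point.
        node.imported_volumes |= self.group.volumes
        self.group.active_node = node

    def stop_on(self, node: Node) -> None:
        # Take the group offline, then export (release) the logical volumes.
        node.imported_volumes -= self.group.volumes
        self.group.active_node = None

    def check_and_failover(self) -> Optional[Node]:
        """Keep the group where it is if healthy, otherwise move it."""
        active = self.group.active_node
        if active is not None and active.healthy:
            return active
        if active is not None:
            self.stop_on(active)
        for candidate in self.nodes:
            if candidate.healthy:
                self.start_on(candidate)
                return candidate
        return None  # no healthy node is left in the cluster
```

In this model, starting the group on node A, marking node A as unhealthy and calling `check_and_failover()` moves the volumes and the group to node B, so the data is never presented to two nodes at once.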
Setting up SAP ASE on a Failover Cluster
Declaration of the database resource:
A database server like SAP ASE is treated as a resource inside a resource group (or a service). It depends on other resource types: logical volumes for data and transaction log devices, file systems for its software distribution and event logs, and one or more virtual IP addresses.
Setting up the SAP ASE resource involves registering the following information:
- ASE database server instance name, associated Backup server name, distribution path, monitoring login/password […]
- Startup script to start the resource
- Shutdown script to stop the resource
- Probe script to check the availability of the resources, probing delay, login […]
Some vendors (like HP, Symantec VERITAS and ORACLE SUN) provide frameworks (or agents) to manage SAP ERP application or SAP ASE database resources. When an agent is not available, it is possible to register SAP ASE as an application resource. The declaration step is done once.
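To make the registration data more concrete, here is a minimal sketch of what a generic application resource declaration could hold. It is not the format of any vendor's agent: the field names, the default delays and the `probe` helper are assumptions made for illustration, and credentials are deliberately left out of the record.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class AseResource:
    """Hypothetical declaration of an SAP ASE resource."""
    server_name: str
    backup_server_name: str
    distribution_path: str
    start_script: str
    stop_script: str
    probe_script: str
    probe_interval_s: int = 30  # assumed delay between two probes
    probe_timeout_s: int = 10   # assumed time allowed for one probe


def probe(resource, run=subprocess.run):
    """Return True if the probe script exits with status 0 in time."""
    try:
        result = run([resource.probe_script], timeout=resource.probe_timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A probe that hangs or cannot be launched counts as a failure.
        return False
    return result.returncode == 0
```

The `run` parameter exists only so that the probe can be exercised with a stand-in function, without a real script on disk.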
Validation of the setup:
The final (and most important) step of the setup is to validate the behaviour of the database server resource in the failover cluster. Before delivering the database resource, the following tests must be carried out at a minimum on each node forming the cluster:
- Start/stop of the database server resource as part of the cluster (with cluster monitoring)
- Start/stop of the database server resource without cluster monitoring
- Graceful failover
- Hard or abrupt shutdown of the database server and of the physical hosts
These validation tests must be done in order to check if the database server cluster is properly set up and also to adjust cluster parameters like monitoring timeout. Cluster validation tests are often performed by the cluster vendor’s consultant. Involving system administrators and DBAs ensures that best practices and key operational requirements are incorporated into the clustering solution.
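Since each test has to be run on each node, it can help to track the validation as a simple matrix. The sketch below is one possible way to do that; the node names and the way results are recorded are assumptions, and the test labels simply restate the list above.

```python
from itertools import product

VALIDATION_TESTS = [
    "start/stop with cluster monitoring",
    "start/stop without cluster monitoring",
    "graceful failover",
    "abrupt shutdown of the database server",
    "abrupt shutdown of the physical host",
]


def validation_plan(nodes):
    """Return every (node, test) pair that has to be carried out."""
    return list(product(nodes, VALIDATION_TESTS))


def pending(plan, results):
    """Return the pairs not yet passed; results maps (node, test) to a bool."""
    return [case for case in plan if not results.get(case, False)]
```

For a two-node cluster, `validation_plan(["nodeA", "nodeB"])` yields ten cases, and the database resource would be delivered only once `pending` returns an empty list.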
Documentation and training:
Knowledge is often the weak link of a cluster. Human error is a common source of failures when implementing a cluster. Thus, it is highly recommended to maintain comprehensive documentation for cluster management. Basic tasks like starting, stopping, and enabling or disabling monitoring must be documented.
Once the declaration is done, the database server is managed by the cluster software. This means that regular administration commands to start/stop the database server (like startserver -f RUN or isql/shutdown commands) must no longer be used while cluster monitoring is on. Otherwise, they will interfere with the monitoring process of the clustering solution. As soon as a database server is controlled by the cluster layer, only cluster commands must be used to start/stop it. That said, it remains possible to turn off cluster monitoring temporarily and use regular administration commands.
The information in this post is generic. In my next post, I will discuss vendor-specific cluster solutions.
“Competitive Landscape: Clustering Software Market, Worldwide”, Gartner, December 2009.
“High Availability and Disaster Recovery: Concepts, Design, Implementation”, Klaus Schmidt, Springer 2006, ISBN 3540244603, 9783540244608
 1650511 SYB: High Availability Offerings with Sybase ASE