What’s New in SPS10? SAP HANA remote data sync

tom_slee · ‎06-25-2015

SAP HANA remote data sync is a completely new option included with SAP HANA SPS10, yet it is also a mature technology that has been on the market for more than a decade. This blog post explains what it does, outlines three major use cases, and describes a bit about how to use it. And because “SAP HANA remote data sync” is too much for me to type, I’ll refer to it as RDSync from here on.

So let’s get started.

What it does

Technically, the point of RDSync is to synchronize data between SAP HANA and many remote databases. Let’s take a second to say what each part of that sentence means.

First, synchronization. Data replication is the copying of data between two or more databases so that each has an up-to-date copy of the data. Data synchronization is a particular form of replication, built to replicate data across high latency or intermittent networks. It is session-based: each remote database initiates a synchronization session, uploads any changes made at the remote site to SAP HANA, downloads any changes from HANA to the remote site, and then the session is completed.

In many cases the remote database does not need all the data in SAP HANA, or in the same layout, so RDSync provides ways to subset and transform data as it moves between SAP HANA and the remote databases.

Next: SAP HANA. RDSync is based on the SAP SQL Anywhere data synchronization technology called MobiLink, but RDSync integrates the MobiLink synchronization server into the HANA Platform, with many benefits for IT staff who have to manage the server.

Third: “many remote databases”. The databases at remote sites are either SAP SQL Anywhere databases, which provide a rich client/server database on Windows, Linux and other operating systems, or are SQL Anywhere UltraLite databases, a compact database library that can be built into applications for mobile operating systems such as iOS or Android.

There are existing deployments of SQL Anywhere MobiLink with tens of thousands of remote databases, each with hundreds of tables. Scalability, performance, and correctness in managing these complex and large-scale deployments are a key part of RDSync’s role.

You can see from this description that RDSync joins the family of data provisioning technologies for SAP HANA: software that moves data between SAP HANA and other applications or data stores. In the diagram below, the technologies on the right move data between SAP HANA and other enterprise data stores: real-time replication with other relational databases (SAP Replication Server), complex and high-performance extract, transform and load operations (SAP Data Services), and data exchange with Business Suite (SAP Landscape Transformation).

The two technologies on the left can also exchange data between SAP HANA and external data sources, often over public networks. SAP Smart Data Streaming provides a fast way to collect data from multiple sources, and RDSync is more useful when there is a need for structured data at the remote site.

What it is for

The mainstream of the computing world is moving, as we have often heard, to a centralized cloud computing architecture in which all data is accessed over the Internet. Clearly RDSync is built for a different paradigm, in which data is stored at the edge of the network as well as at the center. Why would one use such a technology when Cloud is the mainstream?

There are three common use cases for RDSync that show why a distributed data model still has a role to play, even in this age of cloud computing. One is what we call “satellite servers”: remote workplaces where enterprise systems might not be available in an acceptable way, but where work needs to carry on. A second is one category of mobile applications: those that drive an employee’s whole day, and which are key to their productivity. Third, and maybe most importantly right now, is a set of Internet of Things (IoT) applications: while simple data acquisition is enough for some, other IoT applications need a significant amount of computing, and of structured data, at the edge.

Let’s look at each one in turn.

Satellite Server use cases

Satellite servers run at remote workplaces where important business processes take place and which need to be able to operate even when not connected to a network. One extreme example is oil rigs, where an SAP offering called Transaction Availability for Remote Sites (TARS) has shown a way to bring SAP functionality to the most isolated of workplaces.

Oil rigs are technically sophisticated workplaces, but the network connectivity between the rig and the enterprise is a satellite link (very high latency) which can be interrupted at any time, especially when the rig is moving to a new site. Yet maintenance operations, which involve collaboration among workers, must carry on, and in fact are often best carried out when the rig is moving (so not actively drilling).

The TARS solution is built on RDSync and MobiLink technology, and makes SAP transactions available through UI5 web applications on the rig at all times, using a local SQL Anywhere server. It extends SAP business processes to new parts of the company.

Other industries where the TARS solution is finding a use include retail (an in-store server guarantees stores can keep running), other resource industries (mining, forestry), and transportation.

Mobility use cases

Instead of a whole workplace, the remote site may just be a tablet, a phone, or a laptop computer. Most mobile applications do not need a database on the device, and fewer still need a data synchronization architecture, but those that do need it are key applications which drive an employee’s whole day. Examples include inspection applications (asset management) for railways, direct store delivery applications for consumer goods companies, and inventory management applications for the retail industry, as well as industry-specific Customer Relationship Management applications.

Internet of Things use cases

As with satellite server and mobility use cases, not all IoT solutions need RDSync, but some do. SQL Anywhere runs on “IoT gateways”: boxes containing a cheap single-board computer, like the Raspberry Pi and others designed more for industrial uses. A gateway can collect data from sensors over low-powered radio or Bluetooth networks and relay it to HANA for analysis. Some IoT applications carry out significant computation at the IoT gateway, and rely on having structured data available, so a data synchronization solution makes sense. In one recent engagement, a 50-table database is being put onto an IoT gateway: even though the data collection feeds almost entirely into a single table, the other tables provide valuable metadata and also enable a smart application at the IoT gateway.

There’s a common thread here: the mainstream of computing deployments may be cloud-based, but that should not lead us to neglect these valuable niches where reliable, highly available data at the edge helps extend business processes to new parts of the organization.

How it works

This is not the place for detailed product documentation on how to build solutions (see http://help.sap.com/hana_options_replication for more), but here is a brief description of how data synchronization works.

Each SQL Anywhere remote database has a built-in data synchronization client which can pick up changes made to the remote database, and communicate over public networks with the RDSync server. Periodically the synchronization client initiates a session: it uploads changes, receives an acknowledgement and then ensures they are not re-delivered, and then collects changes from SAP HANA. The protocol is a proprietary and high-performance communication mechanism that works over TCP/IP or HTTP protocols, and which takes full advantage of network security options.
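The session described above can be illustrated with a toy simulation in Python. This is only a sketch under stated assumptions: the real protocol is proprietary, and the `Server` and `Remote` classes, the sequence-number scheme, and the `lose_ack` switch are all hypothetical names invented here. The point is to show why an acknowledgement matters: if it is lost, the remote uploads the same changes again, and the server must not apply them twice.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for the central database plus sync server."""
    rows: dict = field(default_factory=dict)         # key -> value
    version: int = 0                                 # bumped on every applied change
    changed_at: dict = field(default_factory=dict)   # key -> version of last change
    applied_seq: dict = field(default_factory=dict)  # remote id -> last applied upload seq

    def write(self, key, value):
        self.version += 1
        self.rows[key] = value
        self.changed_at[key] = self.version

    def apply_upload(self, remote_id, changes):
        """Apply each uploaded change at most once; return the acknowledged seq."""
        last = self.applied_seq.get(remote_id, 0)
        for seq, key, value in changes:
            if seq > last:  # anything at or below `last` was applied in an earlier session
                self.write(key, value)
                last = seq
        self.applied_seq[remote_id] = last
        return last

    def changes_since(self, version):
        return {k: self.rows[k] for k, v in self.changed_at.items() if v > version}


@dataclass
class Remote:
    remote_id: str
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (seq, key, value) for every local change
    confirmed_seq: int = 0                   # highest seq the server has acknowledged
    last_download: int = 0                   # server version seen at the last download

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((len(self.log) + 1, key, value))

    def synchronize(self, server, lose_ack=False):
        # 1. Upload every change the server has not yet acknowledged.
        pending = [c for c in self.log if c[0] > self.confirmed_seq]
        ack = server.apply_upload(self.remote_id, pending)
        if lose_ack:
            return  # simulate a dropped connection before the ack arrives
        # 2. Record the ack so these changes are not delivered again.
        self.confirmed_seq = ack
        # 3. Download whatever changed centrally since the last session.
        self.rows.update(server.changes_since(self.last_download))
        self.last_download = server.version


server = Server()
server.write("price:widget", 10)

rig = Remote("rig-1")
rig.write("order:1", "pump seal")
rig.synchronize(server, lose_ack=True)  # upload applied, ack lost
rig.synchronize(server)                 # same change re-sent, applied only once

print(server.version)    # 2
print(sorted(rig.rows))  # ['order:1', 'price:widget']
```

In this sketch the remote also receives its own uploaded rows back in the download, which is harmless here; a real deployment would decide per table what each remote should receive.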

The synchronization server does not have significant data storage of its own; all the metadata for an application is stored in a special schema in SAP HANA.

The server implements an event-based model: each synchronization request fires a number of events (e.g., events to handle the upload data for each table) and the synchronization server runs so-called “synchronization scripts” for each event. It’s up to the developer to provide those scripts. Often, synchronization scripts are written in SAP HANA SQLScript, and can be as little as a single SQL statement (e.g., for an “upload_insert” event, the script may be a single INSERT statement).
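To make the event model concrete, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for SAP HANA. The `SCRIPTS` registry, the `handle_upload` function, and the `inspection` table are hypothetical; only the event names (`upload_insert`, `upload_update`, `upload_delete`) come from MobiLink. The real server keeps its scripts in the metadata schema, not in a Python dictionary, so treat this as an illustration of the idea rather than of the product's API.

```python
import sqlite3

# Hypothetical registry: one SQL script per (table, event) pair.
SCRIPTS = {
    ("inspection", "upload_insert"):
        "INSERT INTO inspection (id, asset, result) VALUES (:id, :asset, :result)",
    ("inspection", "upload_update"):
        "UPDATE inspection SET asset = :asset, result = :result WHERE id = :id",
    ("inspection", "upload_delete"):
        "DELETE FROM inspection WHERE id = :id",
}


def handle_upload(conn, table, operations):
    """Fire one event per uploaded row and run the matching script.

    The whole upload runs in a single transaction: it is committed if
    every script succeeds and rolled back if any of them raises.
    """
    with conn:
        for event, row in operations:
            script = SCRIPTS.get((table, event))
            if script is None:
                raise KeyError(f"no script registered for {table}.{event}")
            conn.execute(script, row)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspection (id INTEGER PRIMARY KEY, asset TEXT, result TEXT)"
)

handle_upload(conn, "inspection", [
    ("upload_insert", {"id": 1, "asset": "track-17", "result": "ok"}),
    ("upload_insert", {"id": 2, "asset": "switch-4", "result": "ok"}),
    ("upload_update", {"id": 2, "asset": "switch-4", "result": "worn"}),
    ("upload_delete", {"id": 1}),
])

print(conn.execute("SELECT id, asset, result FROM inspection").fetchall())
# [(2, 'switch-4', 'worn')]
```

Each script here really is a single SQL statement, which matches the simplest case described above; more demanding applications would put more logic behind each event.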

The synchronization scripts can take advantage of built-in parameters that come from the synchronization client, so that each remote database may have a different set of data.
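As a rough illustration of how one script can serve different data to each remote, the Python sketch below runs a single parameterized download query against an in-memory SQLite database standing in for SAP HANA. The table names, the `download_for` helper, and the `:remote_id` and `:last_download` placeholders are assumptions made for this example; they are not the product's actual parameter syntax.

```python
import sqlite3

# Hypothetical download script: the same SQL runs for every remote, and the
# parameters decide which rows that remote receives.
DOWNLOAD_SCRIPT = """
    SELECT o.id, o.item, o.modified
    FROM work_order AS o
    JOIN site_assignment AS a ON a.site = o.site
    WHERE a.remote_id = :remote_id
      AND o.modified > :last_download
    ORDER BY o.id
"""


def download_for(conn, remote_id, last_download):
    """Return the rows changed since last_download that belong to this remote."""
    params = {"remote_id": remote_id, "last_download": last_download}
    return conn.execute(DOWNLOAD_SCRIPT, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_order (
        id INTEGER PRIMARY KEY, site TEXT, item TEXT, modified INTEGER
    );
    CREATE TABLE site_assignment (remote_id TEXT, site TEXT);
    INSERT INTO work_order VALUES (1, 'rig-north', 'replace pump seal', 10);
    INSERT INTO work_order VALUES (2, 'rig-south', 'inspect crane', 12);
    INSERT INTO work_order VALUES (3, 'rig-north', 'calibrate sensor', 20);
    INSERT INTO site_assignment VALUES ('remote-A', 'rig-north');
    INSERT INTO site_assignment VALUES ('remote-B', 'rig-south');
""")

print(download_for(conn, "remote-A", 0))
# [(1, 'replace pump seal', 10), (3, 'calibrate sensor', 20)]
print(download_for(conn, "remote-A", 10))
# [(3, 'calibrate sensor', 20)]
print(download_for(conn, "remote-B", 0))
# [(2, 'inspect crane', 12)]
```

Filtering on a last-download marker as well as on the remote's identity keeps each session small: a remote receives only the rows that are both its own and new since it last synchronized.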

A framework of over 60 separate events allows for a large amount of customization, so that RDSync can handle the most demanding of applications. And a single server can handle multiple versions of an application or even multiple applications.

Being built on a mature technology, RDSync comes with a set of advanced features. Its design provides transaction integrity guarantees as well as proven performance and scalability; it provides end-to-end encryption both over the network and at the remote site; there are comprehensive logging and error-handling options, and the ability to integrate with a range of enterprise authentication systems.

(In this initial release, development-time assistance is limited: this is not the easiest-to-use technology, and it takes an investment of time and learning to make the most of the powerful capabilities RDSync provides.)

SAP HANA integration

What is new in RDSync is the integration of the synchronization technology into the HANA platform. Instead of being a separate server that has to be managed by itself, RDSync is integrated into HANA’s capabilities.

Lifecycle management, from install to system rename and upgrade, or moving from one host to another, is all carried out using the SAP HANA lifecycle management tools.
The SAP HANA name server utilities provide a single point of configuration, and also provide auto-start services as well as high availability in case the host running RDSync becomes unavailable.
Integration into HANA Cockpit and HANA Studio makes sure that system configuration and monitoring can be carried out as part of the broader picture of managing the HANA landscape.
The port assignments, monitoring capabilities, license management, and metadata schema are all created at install time in a systematic and reliable fashion, in a way that fits with SAP practices, for operational ease of use.

In conclusion: in SPS10, SAP HANA remote data sync is a first release, but the core functionality is mature and proven, and the integration with the HANA platform brings a new level of operational consistency for SAP customers.

How to buy it

SAP HANA remote data sync is a part of the SAP HANA real-time replication bundle, which also includes SAP Replication Server and SAP Landscape Transformation.

Customers using RDSync must also buy SAP SQL Anywhere for use at the remote sites, usually in the form of the SAP SQL Anywhere remote database client.

The software is available from SAP Service Marketplace along with other SAP software. Look for it under the letter H for HANA in the SAP Software Downloads.

Support issues should be filed under the application component HAN-SYN, or under HAN-CPT-SYN for the monitoring components.

You can find out more by reading the documentation at http://help.sap.com/hana_options_rdsync, and the SQL Anywhere documentation at http://dcx.sap.com.