Today’s enterprise is transforming from reactive to proactive and predictive with Big Data applications. The challenge is to bring together a high volume and variety of data, at high velocity, and make it available for actionable insights. With SAP® Replication Server®, disjointed data from disparate sources can be aggregated into a Hadoop Hive warehouse in real time, ensuring that business data is always available to your Big Data applications.
SAP REPLICATION SERVER
SAP Replication Server supports your Big Data enterprise data management environment by replicating and synchronizing transactions originating in SAP Adaptive Server® Enterprise (SAP ASE) and Oracle databases into a Hadoop Hive warehouse on the Hadoop Distributed File System (HDFS), as shown in Figure 1 and Table 1 below. It provides high-performance, secure, reliable, and consistent delivery of data across the enterprise.
Figure 1
DELIVERING KEY FUNCTIONALITY
With SAP Replication Server, you gain high-performance, secure, reliable, and consistent data delivery. This performance can be attributed to the product's replication features and functionality.
The database replication matrix for Big Data environments is shown in Table 1 below.

Table 1

| Sources | Targets |
|---|---|
| SAP ASE | Hadoop Hive (HDFS) |
| Oracle | Hadoop Hive (HDFS) |
SUPPORTING MISSION-CRITICAL ENTERPRISE DATA MOVEMENT NEEDS
SAP Replication Server provides the continuous movement of mission-critical application data. Once established, this environment can be automated to help ensure that information is replicated to meet changing business demands. Whatever the environment, no matter how complex or broadly distributed, and whatever the time constraints, SAP Replication Server can meet your organization’s most demanding data movement requirements. SAP Replication Server provides high-performance, secure, reliable, and consistent delivery of data across the enterprise.
BIG DATA REPLICATION ARCHITECTURE
IMPLEMENTATION MODEL
The replication model into Hadoop Hive follows a warehouse approach of writing all operations as INSERTs; DELETEs and UPDATEs are therefore also written as INSERTs. Every replicated row carries two special columns: OP_TYPE (INT datatype; 1 indicates an INSERT, 2 an UPDATE, and 3 a DELETE) indicates the type of operation that created the row, and OP_TIME (TIMESTAMP datatype) captures the timestamp of the source operation. (Note: these columns must be created manually.) This is illustrated in Figure 2 below.
Figure 2
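As a minimal sketch, a target Hive table for a hypothetical source table with columns (id, name) might be created as follows; the table name and business columns are illustrative, and the OP_TYPE and OP_TIME columns are the ones that must be added manually as described above:

```sql
-- Hypothetical target table for a source table with columns (id, name).
-- OP_TYPE and OP_TIME must be created manually, as noted above.
CREATE TABLE t1 (
    id      INT,
    name    VARCHAR(100),
    OP_TYPE INT,        -- 1 = INSERT, 2 = UPDATE, 3 = DELETE
    OP_TIME TIMESTAMP   -- timestamp of the source operation
);
```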
Based on this approach, two representations of the data are possible. The first, shown above in Figure 2, is a log of all operations on the source tables, with each operation on each table represented as a row in the target Hadoop Hive warehouse. The replication architecture must take the datatype mappings between the sources and the target into account to ensure that all source datatypes are adequately represented in the target. The second model replicates each source table into its own Hive target table (as shown in Figure 3 below).
Figure 3
In Figure 3, source table T1 is mapped to target Hive table T1 and source table T2 is mapped to target Hive table T4. In this model, all rows of source table T1 are initially materialized into target T1 (so the target starts with the same row count as the source), and every subsequent operation on source table T1 is then represented as an INSERT in target table T1 (and similarly for T2).
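Because every source operation lands in the target as an INSERT, the current state of a source table can be reconstructed from the Hive operation log. The following sketch assumes the hypothetical t1 table above with a primary-key column id; the exact query depends on your schema and Hive version (windowing functions require Hive 0.11 or later):

```sql
-- Latest surviving row per key: take the most recent operation for each id
-- and keep it only if that operation was not a DELETE (OP_TYPE = 3).
SELECT id, name
FROM (
    SELECT id, name, OP_TYPE,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY OP_TIME DESC) AS rn
    FROM t1
) latest
WHERE rn = 1 AND OP_TYPE <> 3;
```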
ASE TO HIVE DATATYPE MAPPING
This table identifies the ASE datatype, the Replication Definition datatype, and the Hive datatype.
| ASE datatype | Replication Definition datatype | Hive datatype |
|---|---|---|
| bigint | bigint | bigint |
| int | integer | int |
| smallint | smallint | smallint |
| unsigned bigint | unsigned bigint | varchar |
| unsigned int | unsigned int | bigint |
| unsigned smallint | unsigned smallint | int |
| tinyint | tinyint | smallint |
| int identity | integer | int |
| numeric | numeric | decimal |
| decimal | decimal | decimal |
| float | float | double |
| real | real | float |
| double precision | float | double |
| money | money | decimal |
| smallmoney | smallmoney | decimal |
| datetime | datetime | timestamp |
| smalldatetime | smalldatetime | timestamp |
| bigdatetime | bigdatetime | timestamp |
| date | date | date |
| time | time | timestamp |
| char | char | char |
| varchar | varchar | varchar |
| unichar | unichar | varchar |
| univarchar | univarchar | varchar |
| nchar | char | varchar |
| nvarchar | varchar | varchar |
| image | image | No Default |
| text | text | No Default |
| unitext | unitext | No Default |
| binary | binary | binary |
| varbinary | varbinary | binary |
| bit | tinyint | tinyint |
| timestamp | timestamp | No Default |
| sysname | varchar | varchar |
| longsysname | varchar(255) | varchar |
| bigtime | bigtime | timestamp |
Note: When no default mapping is provided, successful replication of the ASE datatype requires you to map it to a Hive datatype with a customization tool of your choice, based on your business needs.
ORACLE TO HIVE DATATYPE MAPPING
This table identifies the Oracle datatype, the Replication Definition datatype, and the Hive datatype.
| Oracle datatype | Replication Definition datatype | Hive datatype |
|---|---|---|
| INTEGER | rs_oracle_decimal | double |
| NUMBER | rs_oracle_decimal | double |
| NUMBER(10) - NUMBER(18) | rs_oracle_decimal | bigint |
| NUMBER(5) - NUMBER(9) | rs_oracle_decimal | int |
| NUMBER(2) - NUMBER(4) | rs_oracle_decimal | smallint |
| NUMBER(P,S) | rs_oracle_decimal/varchar | decimal/varchar |
| NUMBER(1) | rs_oracle_decimal | tinyint |
| FLOAT | rs_oracle_float | double |
| BINARY_FLOAT | rs_oracle_float | double |
| BINARY_DOUBLE | rs_oracle_decimal | double |
| DATE | rs_oracle_datetime | timestamp |
| TIMESTAMP(N) | rs_oracle_timestamp9 | timestamp |
| CHAR | char | varchar |
| NCHAR | unichar | varchar |
| VARCHAR2 | varchar | varchar |
| NVARCHAR | univarchar | varchar |
| NVARCHAR2 | univarchar | varchar |
| BLOB | image | No Default |
| BFILE | image | No Default |
| NCLOB | unitext | No Default |
| INTERVAL | rs_oracle_interval | No Default |
| ROWID | rs_oracle_rowid | No Default |
| UROWID | rs_oracle_rowid | No Default |
| TIMESTAMP WITH TIME ZONE | rs_oracle_timestamptz | No Default |
| TIMESTAMP WITH LOCAL TIME ZONE | rs_oracle_timestamp9 | No Default |
| ANYDATA | opaque | No Default |
| VARRAY | opaque | No Default |
| NESTEDTAB | opaque | No Default |
| OBJECT | opaque | No Default |
| REF | rs_oracle_binary | No Default |
| RAW | rs_oracle_binary | binary |
Note: When no default mapping is provided, successful replication of the Oracle datatype requires you to map it to a Hive datatype with a customization tool of your choice, based on your business needs. Refer to Oracle Datatype Compatibility and Oracle Datatype Restrictions in the Replication Agent Primary Database Guide for additional Oracle datatype information.
ENVIRONMENT ENHANCEMENTS FOR REPLICATION INTO HADOOP HIVE
Besides the usual environment variables such as LD_LIBRARY_PATH, SHLIB_PATH, or LIBPATH (depending on the platform), two new variables are introduced for replication into Hive:
RS_HIVE_TEMP_FILE_PATH:
The value of this variable is the directory into which Replication Server writes its staging files. The same directory is used by the Hive service to load the data into the warehouse. Appropriate permissions and share/export settings must be in place to ensure that Replication Server can write to this directory and that the Hive service can read from it.
RS_HIVE_AUTH_TYPE:
The value of this variable tells Replication Server which authentication mode to use: NoSASL or SASL (LDAP and anonymous are supported today).
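As a minimal sketch, these variables might be set in the Replication Server startup environment as follows; the staging path and authentication mode shown are illustrative assumptions:

```shell
# Directory where Replication Server writes staging files; must also be
# readable by the Hive service (this path is only an example).
export RS_HIVE_TEMP_FILE_PATH=/shared/rs_hive_staging

# Authentication mode for the Hive connection: NoSASL or SASL.
export RS_HIVE_AUTH_TYPE=NoSASL

# Platform-specific library path (the variable name differs by platform:
# LD_LIBRARY_PATH on Linux, SHLIB_PATH on HP-UX, LIBPATH on AIX).
export LD_LIBRARY_PATH="$SYBASE/$SYBASE_REP/lib:$LD_LIBRARY_PATH"
```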
MapReduce overhead can be managed by increasing the values of the dsi_compile_max_cmds and dsi_cdb_max_size connection properties, so that more commands are staged for each load-data operation.
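These properties can be changed with the standard Replication Server alter connection command; the server and database names below are placeholders, and suitable values depend on your workload. The connection must be suspended while the change is made:

```
-- From an isql session connected to Replication Server
-- (hive_ds.hive_db is a placeholder connection name):
suspend connection to hive_ds.hive_db
go
alter connection to hive_ds.hive_db
    set dsi_compile_max_cmds to '100000'
go
alter connection to hive_ds.hive_db
    set dsi_cdb_max_size to '2048'
go
resume connection to hive_ds.hive_db
go
```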
TROUBLESHOOTING
The ECON trace dsi_buf_dump can be enabled to trace the Hive query commands that the Hive connector sends to Hive.
OPERATING ENVIRONMENTS
SAP Replication Server platform and operating-system certification information can be found at https://websmp203.sap-ag.de/~sapidb/012002523100008725382014E
Hive on Linux AMD64 with SUSE 11.3 and RHEL 5.5 has been certified; in general, Linux on x86_64 (all versions of SUSE and RHEL) can be supported.
LEARN MORE
For more information about how SAP Replication Server can help meet your data challenges, visit http://scn.sap.com/community/sap-replication-server.
To learn more about the Hive replication capability and its usage, refer to the New Features Guide:
http://help.sap.com/Download/Multimedia/zip-rs1571204/Replication_Server_New_Features_Guide_en.pdf