NOTE: It is assumed that the reader has experience with the setup, configuration, and administration of SAP® Adaptive Server Enterprise® (ASE), SAP® Replication Server® (SRS), and the Hortonworks® Data Platform (HDP®).
Introduction
SAP® Adaptive Server Enterprise® (ASE) is a market-leading database management system for Online Transaction Processing and a major part of the SAP® Data Management portfolio for end-to-end, real-time data management (as shown in Figure 1 below).
Figure 1
SAP® Replication Server® (SRS) enables business continuity and active insights, accelerating data recovery and access through real-time replication across the extended enterprise. SRS is an important part of the SAP® Enterprise Information Management® portfolio, as shown in Figure 2, while also being available as a standalone solution.
Figure 2
HDP is based on open-source Apache Hadoop, architected for the enterprise. According to Hortonworks®:
Hortonworks® Data Platform enables the deployment of Open Enterprise Hadoop – leveraging 100% open source components, driving enterprise readiness requirements and empowering the adoption of brand new innovations that come out of the Apache Software Foundation and key Apache projects.
This comprehensive set of capabilities is aligned to the following functional areas: Data Management, Data Access, Data Governance and Integration, Security, and Operations.
NOTE: Replication to Hadoop Hive is supported on the Linux x86_64 platform only.
Steps to Install the Hortonworks HDP Sandbox
Steps to Set Up SAP Adaptive Server Enterprise with Hortonworks
Figure 3
Figure 3 shows a simple replication architecture: a source SAP Adaptive Server Enterprise (ASE) as the primary data server and an SAP Replication Server (SRS, minimum version 15.7.1 SP204) co-existing on the same node, plus a Hortonworks HDP sandbox on a second Linux node, with a shared folder for staging Change Data Capture (CDC) files for load into Hadoop Hive.
To set up the replication environment, perform the following key configuration steps:
# mkdir /remote/repeng10 /* On the SRS host */
# mount -t nfs -o rw,nolock 10.173.20.12:/repeng10 /remote/repeng10 /* On the HDP host */
# vi /etc/group
gzhong:x:1020:
sybase:x:200:root,hue,gzhong,hive,flume,hdfs,storm,spark,mapred,hbase,zookeeper,kafka,falcon,sqoop,yarn,hcat,ams,knox
NOTE: Create the sybase group on the HDP machine and add the id gzhong to it. This is required because, in this example, SRS runs as id gzhong, which belongs to the sybase group, ensuring the correct NFS access permissions. More generally, this ensures that HDP users such as hue have access to the NFS-mounted CDC staging files, which are created with 640 permissions, so they can be uploaded into Hive.
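To see why the group membership matters, recall what mode 640 grants. A minimal Python sketch of those permission bits (illustrative only; the actual file modes are set by SRS when it stages the CDC files):

```python
import stat

MODE = 0o640  # rw-r----- : owner read/write, group read, others nothing

# Staged CDC files are owned by the SRS user (gzhong, group sybase in this
# example). With mode 640, group read is the only way HDP users such as hue
# can read the files, which is why they are added to the sybase group above.
assert MODE & stat.S_IRUSR and MODE & stat.S_IWUSR  # owner: read + write
assert MODE & stat.S_IRGRP                          # group: read
assert not (MODE & stat.S_IWGRP)                    # group: no write
assert not (MODE & stat.S_IROTH)                    # others: nothing
```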
#
# --- These entries added by Sybase InstallShield Installer ---
# --- for the sample Replication Server ---
# In this interfaces example, ZPDS represents the Primary Data Server, REP_hive represents the Hive Server and PIDS the Replication Server
#
ZPDS
master tcp ether replinuxb25 6002
query tcp ether replinuxb25 6002
REP_hive
query tcp ether 10.172.219.254 10000
master tcp ether 10.172.219.254 10000
PIDS
query tcp ether replinuxb25 13515
master tcp ether replinuxb25 13515
#### On the source ASE, assuming that the primary/source database pidb exists
% isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:6002
> use pidb
> go
> create table dl_mat_tb (
col1 bigint,
col2 int null,
col3 smallint null,
col4 unsigned bigint null,
col5 unsigned int null,
col6 unsigned smallint null,
col7 tinyint null,
col8 numeric null,
col9 decimal null,
col10 numeric (10,4) null,
col11 decimal (10,4) null,
col12 float null,
col13 real null,
col14 double precision null,
col15 money null,
col16 smallmoney null,
col17 datetime null,
col18 smalldatetime null,
col19 bigdatetime null,
col20 date null,
col21 time null,
col22 char (20) null,
col23 varchar (20) null,
col24 unichar (40) null,
col25 univarchar (40) null,
col26 nchar (40) null,
col27 nvarchar (40) null,
col28 binary (2) null,
col29 varbinary (2) null,
col30 bit,
col31 sysname null,
col32 longsysname null,
col33 bigtime null)
> go
> sp_setreptable dl_mat_tb, true
> go
> grant all on dl_mat_tb to public
> go
> insert into pidb..dl_mat_tb values (0, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '2012-10-01 10:30:01.123' , '2012-10-02 12:31:11.123' , '2012-10-03 14:32:21.123' , '2012-10-04' , '16:34:31.123' , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL, '2012-10-05 20:35:31.123456' )
> go
> insert into pidb..dl_mat_tb values (1, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, '2012-10-01 10:30:01.123' , '2012-10-02 12:31:11.123' , '2012-10-03 14:32:21.123' , '2012-10-04' , '16:34:31.123' , 'col22 char', 'col23 varchar', 'col24 unichar', 'col25 univarchar', 'col26 nchar', 'col27 nvarchar', 0x2828, 0x0029, 0, 'col31 sysname', 'col32 longsysname', '2012-10-05 20:35:31.123456' )
> go
> insert into pidb..dl_mat_tb values (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '2012-10-01 10:30:01.123' , '2012-10-02 12:31:11.123' , '2012-10-03 14:32:21.123' , '2012-10-04' , '16:34:31.123' , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL, '2012-10-05 20:35:31.123456' )
> go
> insert into pidb..dl_mat_tb values (3, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, '2012-10-01 10:30:01.123' , '2012-10-02 12:31:11.123' , '2012-10-03 14:32:21.123' , '2012-10-04' , '16:34:31.123' , '', '', '', '', '', '', NULL, NULL, 1, '', '', '2012-10-05 20:35:31.123456' )
> go
> insert into pidb..dl_mat_tb values (4, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, '2012-10-01 10:30:01.123' , '2012-10-02 12:31:11.123' , '2012-10-03 14:32:21.123' , '2012-10-04' , '16:34:31.123' , 'col22 char', 'col23 varchar', 'col24 unichar', 'col25 univarchar', 'col26 nchar', 'col27 nvarchar', 0x2828, 0x0029, 0, 'col31 sysname', 'col32 longsysname', '2012-10-05 20:35:31.123456')
> go
# Validate the inserts in the primary database
> select count(*) from pidb..dl_mat_tb
> go
#### On HDP/Hive, assuming that the target database garydb exists with user hue and password sybase (if not, create database garydb with user hue and password sybase). Note that the current implementation treats the Hive replicate as an append-only repository (supporting only 'IUD' operations): all source inserts, updates, and deletes are applied as inserts on the target.
#### OP_TYPE records the CDC operation type (insert, update, or delete), and OP_TS the source operation timestamp.
% beeline -u jdbc:hive2://10.172.219.254:10000/garydb -n hue -p sybase
> create table dl_mat_tb (col1 bigint, col2 int, col3 smallint, col4 varchar (20), col5 bigint, col6 int, col7 smallint, col8 decimal, col9 decimal, col10 decimal (10,4), col11 decimal (10,4), col12 double, col13 float, col14 double, col15 decimal (20,2), col16 decimal (20,2), col17 timestamp, col18 timestamp, col19 timestamp, col20 date, col21 timestamp, col22 char (20), col23 varchar (20), col24 varchar (40), col25 varchar (40), col26 varchar (40), col27 varchar (40), col28 binary, col29 binary, col30 tinyint, col31 varchar(30), col32 varchar(255), col33 timestamp, OP_TYPE int, OP_TS timestamp);
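The Hive DDL above mirrors the ASE table column for column, with each ASE datatype translated to a Hive datatype. As an illustrative sketch, the mapping transcribed from the two CREATE TABLE statements in this example (not an official matrix; consult the SRS documentation for the complete list):

```python
# ASE -> Hive datatype mapping as used for dl_mat_tb in this walkthrough.
# Lengths/precision such as (20) or (10,4) carry over unchanged where both
# sides are parameterized; Hive has no unsigned types, hence the widening.
ASE_TO_HIVE = {
    "bigint": "bigint",
    "int": "int",
    "smallint": "smallint",
    "unsigned bigint": "varchar(20)",   # no unsigned 64-bit type in Hive
    "unsigned int": "bigint",           # widened to fit the unsigned range
    "unsigned smallint": "int",
    "tinyint": "smallint",
    "numeric": "decimal",
    "decimal": "decimal",
    "float": "double",
    "real": "float",
    "double precision": "double",
    "money": "decimal(20,2)",
    "smallmoney": "decimal(20,2)",
    "datetime": "timestamp",
    "smalldatetime": "timestamp",
    "bigdatetime": "timestamp",
    "date": "date",
    "time": "timestamp",
    "char": "char",
    "varchar": "varchar",
    "unichar": "varchar",
    "univarchar": "varchar",
    "nchar": "varchar",
    "nvarchar": "varchar",
    "binary": "binary",
    "varbinary": "binary",
    "bit": "tinyint",                   # Hive has no bit type
    "sysname": "varchar(30)",
    "longsysname": "varchar(255)",
    "bigtime": "timestamp",
}

def hive_type(ase_type: str) -> str:
    """Return the Hive type used for an ASE type in this example."""
    return ASE_TO_HIVE[ase_type.lower()]
```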
##### Create the connection to the primary database ZPDS.pidb using the rs_init utility.
##### On SAP Replication Server, set up the connection profile to the replicate Hive database garydb, with user name hue and password sybase
% isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:13515
> create connection to REP_hive.garydb
> using profile rs_ase_to_hive;standard
> set username "hue"
> set password "sybase"
> go
> admin who_is_down
> go
> create replication definition dl_mat_tb_repdef with primary at ZPDS.pidb with all tables named
dl_mat_tb (
col1 bigint,
col2 integer,
col3 smallint,
col4 unsigned bigint,
col5 unsigned int,
col6 unsigned smallint,
col7 tinyint,
col8 numeric,
col9 decimal,
col10 decimal,
col11 decimal,
col12 float,
col13 real,
col14 float,
col15 money,
col16 smallmoney,
col17 datetime,
col18 smalldatetime,
col19 bigdatetime,
col20 date,
col21 time,
col22 char (20),
col23 varchar (20),
col24 unichar (40),
col25 univarchar (40),
col26 char (40),
col27 varchar (40),
col28 binary (2),
col29 varbinary (2),
col30 tinyint,
col31 varchar (30),
col32 varchar (255),
col33 bigtime) primary key (col1)
> go
> create subscription sub_dl_mat_tb for dl_mat_tb_repdef with replicate at REP_hive.garydb without holdlock direct_load
> go
> admin who_is_down
> go
#### On the Hive server, run a select to verify the initial load (data synchronization)
% beeline -u jdbc:hive2://10.172.219.254:10000/garydb -n hue -p sybase
> select * from dl_mat_tb;
Continue replication with delete and update operations
#### On the ASE server, simulate the CDC operations
% isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:6002
> insert into pidb..dl_mat_tb values (5, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, '2012-10-01 10:30:01.123' , '2012-10-02 12:31:11.123' , '2012-10-03 14:32:21.123' , '2012-10-04' , '16:34:31.123' , 'col22 char', 'col23 varchar', 'col24 unichar', 'col25 univarchar', 'col26 nchar', 'col27 nvarchar', 0x2828, 0x0029, 0, 'col31 sysname', 'col32 longsysname', '2012-10-05 20:35:31.123456' )
> go
> update pidb..dl_mat_tb set col2=20 where col1=1
> go
> delete from pidb..dl_mat_tb where col1=2
> go
##### On the Replication Server, check the stable queues
% isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:13515
> admin who, sqm
> go
#### On the Hive Server, validate the changes
% beeline -u jdbc:hive2://10.172.219.254:10000/garydb -n hue -p sybase
> select * from (select *, max(OP_TS) over (partition by col1) last_ts from dl_mat_tb) newtable where newtable.OP_TS = last_ts and newtable.OP_TYPE != 3;
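Because every source operation lands on Hive as an insert, this query reconstructs the current table state by keeping, per primary key, only the row with the latest OP_TS and discarding keys whose latest operation is a delete (OP_TYPE = 3 here). The same logic as an illustrative Python sketch (the op_type codes and sample rows are assumptions matching this example, not an official encoding):

```python
# Reconstruct current table state from an append-only CDC log.
# Each log entry is (key, op_type, op_ts, payload); op_type 3 = delete,
# mirroring the OP_TYPE filter in the Hive query above.
def current_state(rows):
    latest = {}
    for key, op_type, op_ts, payload in rows:
        # Keep only the most recent operation per primary key.
        if key not in latest or op_ts > latest[key][0]:
            latest[key] = (op_ts, op_type, payload)
    # Drop keys whose most recent operation is a delete.
    return {k: p for k, (ts, op, p) in latest.items() if op != 3}

log = [
    (1, 1, "10:00", "inserted"),  # insert
    (1, 2, "10:05", "updated"),   # update (stored as an insert on Hive)
    (2, 1, "10:01", "inserted"),
    (2, 3, "10:06", None),        # delete (stored as an insert on Hive)
]
print(current_state(log))  # → {1: 'updated'} — key 2's latest op is a delete
```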
Resources
The SAP Adaptive Server Enterprise:
http://www.sap.com/pc/tech/database/software/adaptive-server-enterprise/index.html
The SAP Replication Server:
http://www.sap.com/pc/tech/database/software/sybase-data-replication-software/index.html
Hortonworks:
Acknowledgement
I would like to sincerely thank Ye, Guo Gang (guo.gang.ye@sap.com) and Zhong, Gary (gary.zhong@sap.com) for their contributions and review.