SAP Real Time Loading to IQ
or
How I learned to stop worrying and Love Real Time Loading
From the beginning, the dream was the ability to make decisions based on information in real time. An airline wants to close out cheaper fares based on reservation demand for a market segment using historical and real-time activity. A big box retail store wants to selectively lower prices based on a sale at several competitors. A hedge fund analyzes market fluctuations in trade pricing and volume data to determine risk to exit a position at just the right time.
For decades, the concept of gathering data and executing reports on aggregated data has whetted the business appetite for better and faster delivery of information on the business processes affecting companies. Much like the brain and nervous system of a living organism, the data needs to be delivered, aggregated, analyzed, and actionable information drawn from it.
The principle goal of analytics platforms is to deliver that actionable information in real-time.
SAP Sybase Replication Server has a 20+ year history of data replication to reporting platforms with minimal latency. SAP Sybase IQ also has a long history of delivering lightening-fast query response to aggregation of large data sets for reports.
As of Rep Server version 15.5, a new feature was introduced called Real Time Loading (RTL) for IQ and its ASE companion, High Volume Adaptive Replication (HVAR). RTL and HVAR are similar technologies but with different end targets: RTL is optimized for and targets SAP IQ. HVAR is optimized for and targets SAP ASE.
For this article, we will focus on Replication Server 15.7 with Real Time Loading to populate an IQ 15.4 server. The intent is not to provide a step-by-step implementation since those steps exist in the Heterogeneous Replication Guide (http://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc36924.1571100/doc/pdf/rshetrep.pdf ), but to discuss specific configuration and implementation lessons learned. (Another source for a step-by-step implementation guide / checklist is the presentation slides from ISUG-Tech in Atlanta. http://my.isug.com/p/do/sd/sid=3094&type=0 )
The Performance Dilemma
For the last 30 years, database system performance has always proven challenging. Additionally, as the demand of OLTP performance has increased, so has the demand for reports for decision support. The two tuning approaches proved increasingly difficult to resolve on the same platform as loads increased.
The solutions to the performance dilemma are:
With the first option, the performance advantage of offloading workload and tuning for reports provided advantages. However, since reports usually aggregate from very large data sets, optimized platforms are required.
The second option was implemented in ASE’s engine optimization strategies allrows_oltp, allrows_dss, and allrows_mix.
Sybase’ solution was the columnar analytics platform IQ. While IQ addressed the query performance issues very well, there was no way to replicate directly to IQ efficiently so data was loaded to IQ in bulk fashion. Replication Server’s traditional replication technique applies transactions to the target in a row-by-row fashion, but IQ’s optimal load technique is via bulk load.
Replication Server 15.6 addressed this with the Real Time Loading feature. This feature applies transactions to IQ in a bulk fashion instead of row-by-row. This approach is implemented in the HVAR and RTL functionality of Replication Server. The implementation and operation of both appear similar. Both use a “Net Change database” called the CDB to collect, sort, and group transactions. The resulting group of transactions are applied to IQ or ASE in an optimal bulk fashion.
There are 3 ways to accomplish work faster in a computer:
Computer systems have literally reached melt down on CPU clock speeds. Rep Server Real Time Loading is a combination of the last two, working smarter through optimized algorithms and also using parallelism, i.e. multiple threads to increase system performance.
IQ Advantages
At heart, IQ is a Relational Database Management System, so the known concepts of tables, columns, and SQL still apply as they do in ASE or Oracle. However, IQ’s query speeds are 10x-100x greater than those of a traditional RDBMS. IQ supports heavy ad-hoc query and was designed for analytics on terabyte and petabyte scale databases. IQ is a disk-backed database, i.e. not an in-memory db. With its low TCO, SQL language interface, Multiplex feature, and ability to scale to 1000’s of users, it is an extremely competent analytics platform.
IQ uses a column-based storage strategy which means the data is not stored physically as rows but each column is stored as a separate structure: an index. The indexes use compression techniques (bit maps, dictionary lookup) and only the unique values are stored (cardinality is important for tuning, more on tuning for RTL later).
IQ’s optimal method of data loading is via bulk insert. Performance for single row insert, update, and delete is generally very slow. This is where Rep Server’s Real Time Loading features take advantage of loading and updating strategies that favor IQ’s preference for bulk data load.
IQ Loading Before RTL
Before RTL, loads to IQ were typically staged to an intermediary database server using replication. At scheduled times (off hours), the data was bulk loaded to IQ.
Figure 1 - Staging Database Replication to IQ
Another means of populating IQ was using an ETL (Extract, Transform, Load) approach using a staging server. Data is moved to and prepared in the staging server. Other custom processes for table lookups and scrubbing execute in this intermediary server.
Figure 2 - ETL technique for replicating to IQ
However, neither the ETL nor the staging approaches are considered real-time replication methods.
What is RTL?
RTL enables the constant stream of replicated data to IQ without the intermediary servers described above. RTL puts a minimum load on the IQ server, minimizes locking, and so enables concurrency of the reporting applications. RTL removes many of the moving parts of the staging and ETL solutions and facilitates a more familiar Replication Server implementation.
Figure 3 - Rep Server RTL to IQ
RTL is a licensed feature of Rep Server 15.5+. It uses a ‘Compilation’ and bulk-apply strategy to populate the target IQ server. Insert, update, and delete activities are grouped to fit IQ’s optimal bulk loading method. RTL sorts and groups transactions to build a ‘net change’ database. The ‘net change DB’ (CDB) is an in-memory database within Rep Server, think of it as a transaction ‘holding tank’ that exists until a grouped set of transactions are committed.
Once the grouped transaction is applied, the CDB’s memory is released and the next group of transactions are sorted and assembled into a grouped transaction in a new CDB.
Inserts are applied directly to the target table from the CDB using the insert..location statement. Updates and deletes are optimized by inserting the changes to a temporary table in IQ, and performing a join-delete or join-update to the target table.
Terminology for this transaction grouping and bulk application is “compilation”. During compilation of a group of transactions, only the “net change”, i.e. final outcome of a set of transactions is applied to IQ. For example: if a row is repeatedly updated within the group of transactions, only the final result of the updates is applied to the row in IQ.
RTL does not necessarily apply transactions in the same order they occurred in the source database. Certain transactions may require application in the same order. If replicated transactions fail, with a duplicate for instance, but no duplicate occurred in the source, those tables may need to be removed from RTL processing. If referential constraints are defined in IQ, the replication definitions should reflect this because RTL will follow replication definition defined constraints and order the transactions appropriately.
The basic processing RTL executes is:
To enable RTL, set dsi_compile_enable to ‘on’ at the global level. dsi_compile_enable can also be set at the connection and table level. Problematic tables can be excluded from RTL by turning dsi_compile_enable off just for the problem table.
Special RTL Configuration
The following lists the parameters and areas that require special attention when setting up RTL to IQ.
Prerequisites
Build IQ server
create externlogin db_dbmaint to RRS
remote login db_dbmaint_rs identified by ’pa$$w0rd’;
set option public.Append_Load = ‘On’
Configure Replication Server
Create connection to IQ_DS.IQ_DB
Using profile rs_ase_to_iq:standard
Set username to iq_maint set password to ‘pa$$w0rd’
Create database objects
Create Repdefs & Subs
Materialize subscription
Verify & Monitor
What prevents RTL function?
There are a number of events that can stop Replication Server from compiling transactions. Duplicates that violate primary key or unique index definitions, foreign key constraints, and data conversion issues. When these events occur, Rep Server reverts to row-by-row application of transactions. The performance of non-RTL replication is detrimental to replication performance and typically causes transactions to back up in the queues, so errors must be resolved to continue RTL replication.
Certain replication errors are mapped to invoke retry logic in Replication Server when RTL is active. This ‘retry logic’ is an enhancement to Rep Server’s normal function. When Rep Server receives one of these errors, the defined logic causes Rep Server to split the group transaction in half and retry each half. If a failure occurs in one of the halves, that half will be split in half again, and retries of both halves execute. If both halves succeed, the grouped transaction has completed and RTL releases the current CDB and moves on to begin compiling a new group of transactions.
Other events that can cause Replication Server to drop out of RTL mode and perform row-by-row updates are configuration parameters such as dsi_cmd_batch_size and dsi_compile_max_cmds set too small resulting in lack of resources needed for RTL processing.
IQ Indexing Issues
IQ index implementation is a key difference with other non-columnar RDBMS’s physical structure. By default, flat Fast Projection (FP) indexes are created on columns when a table is created. The default index (if no other is specified) is a Flat Fast Projection.
Flat FPs do not have a look-up table, however, optimized FP<#> indexes have a tokenization (dictionary) look-up table. The three types of optimized Fast Project indexes are detailed and contrasted with the Flat FP in the chart below.
Figure 4 - Optimized FP index details
Take note of the numbers for FP3 indexes which are for cardinality ≥ 65,537 to ≤ 16,777,216 unique values. While FP1’s and FP2’s benefit performance and the look-up tables are small, FP3’s tends to be a problematic for RTL performance due to the size of the lookup table. The entire lookup table must be read into memory. This can cause extraordinarily slow performance with RTL.
Globally set minimize_storage = ‘on’ when configuring IQ or when creating the table schema. This parameter will cause index ‘rollover’ when the number of unique values exceeds the capacity of an FP1, the index will roll over to an FP2. When the number of values of an FP2 is exceeded, the index will roll over to a FP3. This is when the performance problems begin. Reading in a 16 million row look-up table takes time, memory, and cycles.
To avoid rollover to the FP3 index, a special configuration parameter was created. core_options67 = 65536 will cause FP2 rollover to skip the FP3 and go straight to a flat FP index.
To avoid index rollover entirely, implement Sybase IQ 16 which changes to n-bit indexes.
Indexing Recommendation
Each primary key index should have a unique HG index on the composite keys of the index. Each one of the columns making up the composite key should have an LF or HG indexes.
Diagnostic Tools
All of the usual Rep Server admin who commands in Replication Server for queue and transaction movement continue to display the same information with RTL implemented. SAP has added new commands and enhanced the existing tool set to provide information specific to RTL.
The Index Advisor provides run-time advice based on updates.
set option public.index_advisor = ‘ON’
set option public.index_advisor_max_rows = 1000
Collects in-memory and use sp_indexadvice to view the heavy-hitters.
select * from sp_indexadvice()
order by Ninst desc
The ‘Advice’ column will tell you what indexes the optimizer recommends. Index changes can be scripted straight from the text of the sp_indexadvice() ‘Advice’ column.
trace ‘on’, dsi, dsi_trace_groupsize
trace ‘on’, dsi, dsi_trace_group
Command Mapping
Command mapping is a new feature that maps commands to a different sequence or to nothing. Set dsi_command_convert on the connection:
alter connection IQSRVR.iqdb
set dsi_command_convert to ‘i2di, u2di’
To display current setting on a table:
admin config “table_name”, <ds>, <db>
Making the Case for RTL to IQ
One question that frequently comes up is: How do I sell an IQ and RTL solution to upper level management?
First, start with a Proof of Concept (POC) on SAP IQ without replication. Chose a report based on its run-time. Choosing the most run-time intensive report will make the POC have impact.
Only move the tables required for that report to IQ and populate with insert…location. If at all possible, chose a report that requires little to no SQL changes. T-SQL compatibility in IQ helps with this.
Once the speed and ease of use are demonstrated, the need to real-time data will follow.
Summary
This has been a brief introduction to some of the key configuration issues and methods of diagnosing performance issues with Replication Server’s Real Time Loading to IQ feature.
Credits & Thanks
Special thanks to Jeff Tallman for his efforts on the Sybase Replication Server Performance and Tuning white paper, many Powerpoint presentations, and the extraordinary contribution he makes to this technical community. Thanks to Mark Mumy for his Powerpoint presentations and assistance over the years. Also a very special thanks to Rob Verschoor for the time and effort he continues to put in to produce the ASE, Rep Server, and IQ quick references.
About the author
Phil Adams has been an independent consultant since 1999. A highly resourceful and motivated professional, Phil has brought numerous high profile projects to completion in government, health care, pharmacy, and legal fields. Phil provides database, data warehouse, data movement, and business intelligence professional services specializing in the SAP Adaptive Server Enterprise, SAP Replication Server, SAP IQ, and Microsoft SQL Server platforms.
Phil's professional services experience spans over 25 years with companies such as American Airlines, Fidelity Investments, Sybase Professional Services and Sybase Education prior to starting his own consulting firm, FireSteed Consulting. Phil currently works with Dobler Consulting, LLC.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
8 | |
5 | |
5 | |
4 | |
4 | |
4 | |
4 | |
4 | |
3 | |
3 |