SAP IQ Developer Center in SAP Developer CenterWhere is this place located?

Log in to follow, share, and participate in this community.

Overview

sm_competitive2.jpg

Build big-data analytics solutions with the market leader. SAP IQ
is more powerful and efficient than traditional systems for extreme-scale data warehousing, advanced analytics, and business intelligence applications.

Top Liked Content

Refresh this widget

You can't create discussions here, but as discussions appear they might have answers you need.

You can't create documents here, but watch for documents with information you can use.

You can't post to a blog here, but watch blog posts for new insights and ideas.

You can't create polls here, but watch for polls you can vote on.

Test Drive IQ Today!

Download

SAP IQ 16 Express Edition or SAP IQ 16 Evaluation Edition

IQ_Download.jpg

Formatted Text

Featured Links for SAP IQ:

Formatted Text

Popular Blog Posts

This month I spent some time testing the new row level versioning (RLV) feature of IQ and looking at the implications for real time loading of data. This blog has some notes about that and some high level figures from the testing I did.

 

Sybase IQ has, at least in recent versions, always been able to load data at an incredible rate – if you have not already seen it, take a look at this link, Sybase IQ breaks (smashes to pieces) another record loading data at an almost unbelievable 34TB per hour.

http://scn.sap.com/community/developer-center/analytic-server/blog/2013/05/30/sap-sybase-iq-16-sets-guinness-world-record-with-343tb-per-hour-for-big-data-loading

 

When loading, IQ likes things in big chunks – this is where it gets the best performance. When it comes to smaller “nibbles”, performance is modest. However, perhaps the biggest challenge historically, has been dealing with the Table-Level-Locking nature of IQ.

 

This is what the new RLV feature is about; improving the OLTP performance of IQ and providing Row-Level-Locking capability. This extends the range of applications IQ is able to cater for and in many cases will simplify the application design.

 

At this point it is worth remembering that while IQ 15.x already had pretty good OLTP performance for a column based database, and that 16.x improves on this by 4-5 times, it’s OLTP performance is still to be considered modest; this will go nowhere near the fantastic transaction rates of Sybase ASE (which now holds almost all of the SAP SD benchmarks: http://www.sap.com/solutions/benchmark/sd2tier.epx).

 

Setting up the RLV was easy enough. The RLV itself is transparent to the IQ user – a dbspace is created of type RLV STORE, and a new configuration parameter (rlvmem) controls the amount of memory available for RLV operations. The RLV data is held in-memory with persistence provided via a log on the RLV STORE dbspace (put this on a fast disk). Row-Level-Locked tables are actually just normal tables that exist in the standard IQ space (referred to in 16 as TLV) that are enabled to utilize space in the RLV – the choice of whether a transaction uses the TLV or RLV portion is done at run time by setting a new option; SNAPSHOT_VERSIONING is set to either table-level or row-level. In this way, users can still access an RLV enabled table in a non RLV way and get all the high performance characteristics of TLV tables and use the RLV features only when they need to. Tables can have data in both the TLV and RLV portions active at the same time. A merge process (either automatically or manually triggered) brings data from the RLV into the TLV portion.

 

So, what kind of performance did I get? In the test I set up insert, updates, and deletes were around 3-4 times faster on RLV tables than TLV tables – taking the modifications rates into the thousands per second (low thousands on the modest hardware I was using). I also noticed here that the transaction rates in 16.x are higher than in 15.x.

 

The second area I looked at was concurrency. I did this by loading a set of data via a single connection, and then doubling the number of connections and halving the number of rows inserted by each connection. Moving from a single connection through to two, four, eight, and sixteen connections loading into the same table showed a very pleasing constant loading rate.

 

So to me, based on these tests, it looks as though the RLV does exactly what was intended. It would be interesting to hook up Sybase Replication Server to IQ16. Maybe that is what I’ll do next (unless someone already has – spoiler alert ?).

Are you new to SAP Sybase IQ?  Have you used the product but are still not clear on what indexes to use or when to use them?  Have you just migrated to SAP Sybase IQ 16 and are wondering if there are any indexing changes?  This blog will discuss indexes and strategies for SAP Sybase IQ.

 

Basic Indexes

When creating a table in IQ, a default index is placed on every column.  This index is referred to as a Fast Projection (FP) index.  In order to enable optimized FP indexes, one must set the Minimize_Storage option to ON.

set option public.minimize_storage=’on’;

 

From this point, all new tables and columns will use n-bit encoding for highly optimized and compressed data storage and access.  Each FP index will be encoded as 1 to 31 bits depending on the data cardinality (2^1 to 2^31 distinct values).

 

Advanced Indexes

In addition to the basic FP indexes, IQ supports nine other indexes:

  • High Group (HG) – used for join columns and searchable columns whose data has more than 2,000 distinct values
  • Low Fast (LF) – used for searchable columns whose data has fewer than 2,000 distinct values
  • High Non Group (HNG) – used for range searches (except for date, time, and datetime data types) and for single column aggregations
  • Word (WD) – Use for whole word searching of character based data
  • Text (TEXT)– Use for full text searching of character based data
  • Date (DATE) – Use for fields that have a DATE data type (ranges and datepart queries)
  • Time (TIME) – Use for fields that have a TIME data type(ranges and datepart queries)
  • Datetime (DTTM) – Use for fields that have a DATETIME/TIMESTAMP data type (ranges and datepart queries)
  • Compare (CMP) – Use for ore-computing the comparison of two columns in the same table

 

When are indexes used?

In most relational database engines, the optimizer chooses the best single index for a given table in the query.  SAP Sybase IQ takes the optimization process to the next level by looking not only at each column in the query, but at all joins and search arguments.  For example, assume that a date column is used in a join and to search for the month datepart.  The optimizer would prefer to use the HG index on the date field for the join.  It would also prefer to use the DATE index to search on the month datepart.

 

In short, IQ will look at each join, where clause, aggregation, etc. to determine the best index for each operation.  Not just one index per table and not just one index per column.  But rather, one index for each use of each column in the query!

 

Should we be concerned about over indexing?

The short answer is an emphatic NO!  As mentioned previously, typical relation database engines pick the best single index to access data in a table.  Second, when adding indexes to most RDBMS engines the load times tend to slow down due to how data is processed.  Third, most database engines store data and indexes in different structures so that as more indexes are added the table size swells.  All of these lead to trying to minimize the total number of indexes on a table in most engines.

 

SAP Sybase IQ is not prone to these limitations.  We pick as many indexes as needed to process the entire query.  Indexes in IQ are not as large as the data being indexed.  Each of the indexes employs various levels of vectors, bitmaps, data representations, and compression to dramatically decrease the database size.  Finally, when loading data, IQ can consume all CPU resources.  We currently released a fully audited benchmark in which SAP Sybase IQ loaded data at over 34.3 TB/hour on an 80 core Linux system (http://scn.sap.com/community/developer-center/analytic-server/blog/2013/05/30/sap-sybase-iq-16-sets-guinness-world-record-with-343tb-per-hour-for-big-data-loading).

 

 

All of these SAP Sybase IQ features lead to an engine that can handle as many indexes as necessary for all query (canned or ad hoc) activity.  It is quite common to have 30-60% of all columns indexed with the advanced indexes (and in some cases multiple indexes per column).

 

When to use Indexes?

Now that we have some basics about indexes, the big question is when to use them? Here are some simple guidelines that will help cover every situation.  The general idea is to ask a series of questions about each column.  The answers to the questions will dictate whether or not indexes are needed, and if needed which one(s) are best.

 

  • Will this column be used as a JOIN clause?
    • NO: Continue to next question
    • YES: Use an HG index
  • Will this column be used as a search column in the WHERE clause?
    • NO: Continue to next question
    • YES: Use an HG index if the column has more than 2,000 distinct values.  If less than 2,000 distinct values use an LF.
  • Is the column data type a DATE, TIME, TIMESTAMP, or DATETIME?
    • NO: Continue to next question
    • YES: Use a DATE, TIME, or DTTM index.  All date/time/datetime fields should have this index as systems quite frequently use date fields for searching, filtering, and aggregations.
  • Will whole word searching be done on the character data?
    • NO: Continue to next question
    • YES: Use WD (WORD) index
  • Will text searching be done on the character data?
    • NO: Continue to next question
    • YES: Use a TEXT index
  • Will this column be compared to another column in the same table?
    • NO: Continue to next question
    • YES: Use a CMP (compare) index that includes both columns
  • Will this column be used for range searches?
    • NO: Continue to next question
    • YES: Use an HNG (high non group) index
  • Will this column be used in aggregations with no other columns?
    • NO: Continue to next question
    • YES: Use an HNG (high non group) index

 

There are two types of indexes in IQ: those that have detailed statistics and those that have minimal statistics.  The HG, LF, TEXT, and WD indexes contain detailed statistics.  The HNG, CMP, DATE, TIME, and DTTM indexes contain minimal statistics.  After walking through the above checklist, you may find that an index with minimal statistics was all that was warranted for the column.  Should that situation arise, it is also recommended that an index with detailed statistics is placed on that column.  Use an HG or LF index for these detailed statistics.  The choice will depend on the current or expected cardinality of data in the column.

 

The process to choose indexes may seem daunting especially if the data model has a significant number of tables and columns.  The process outlined above assumes that very little is known about the application and its use of the data.  This is, generally, not the case as most DBAs and data modelers do know more about the data and its use.

 

Should you know more about the model and data, we can simplify the process.  I prefer to look at columns that are known to be used in WHERE clauses of most queries and reports.  Indexes on join and search criteria are the single biggest factor in performance, so focusing solely on those will allow IQ to be indexed for nearly every situation.

 

Certainly some columns and indexes will be missed with this approach.  However, since the focus was on columns that are heavily used, the impact of missing a few minor, or less frequently used, columns should be relatively low.

 

Lastly, know that IQ provides a safety net for us!  IQ has an index advisor that can be enabled by setting the option Index_Advisor to ON.

set option public.Index_Advisor=’on’;

 

Once on, IQ will print index advice to query plans (if they are on) as well as to a system table that can be accessed via the stored procedure sp_iqindexadvice.  The index advisor output will note under what circumstance the index is needed.  The stored procedure will also aggregate the number of times the advice was made.  Should that procedure be used to trend data over a period of time (days, weeks, even months) the number of times the same advice was made can help focus efforts on the most recommended advice.

SAP and its business partners, BMMsoft, HP and Red Hat, have generated a new world record for loading and indexing Big Data with SAP Sybase IQ 16, the latest version of SAP Sybase IQ, which implements a new, fully parallel data loading capability  and a next generation column store. The audited result of 34.3 TB/hour greatly surpasses the previous world record of 14 TB/hour, also achieved by the same team using a previous version of SAP Sybase IQ.

 

This new record also easily eclipses published data loading results by companies such as Oracle and EMC. It demonstrates the ability of this solution to efficiently handle extreme-scale EDWs and Big Data analytics by providing high speed data loading, indexing, immediate access and  deep analysis of both structured data (i.e. database records) and unstructured data such as emails, documents and multi-media files. SAP Sybase IQ makes it simpler and more cost-effective to exploit the value of massive amounts of data at the speed of business, helping to transform companies through new insights that were previously impossible or impractical.

 

Way to go team!

Looking at IQ, you may realize that there are many options and features which affect sizing, setup and configuration. This is great for customization, but it can be a lot to look at when getting started.

 

For detailed hardware sizing information and some configuration details, there is a SAP Sybase IQ 16 Hardware Sizing Guide document. This will give detailed information on what is required and how the IQ system works. Now, a checklist is provided which consolidates information from the detailed document. Specific sizing requirements are provided and suggested sizing guidelines are recommended. A series of videos have also been recorded to walkthrough the document and provide details.

 

View Checklist Document

 

View YouTube Videos

We have had so much interest in the new Guinness World record, set by SAP Sybase IQ 16 for the fastest loading of Big Data, that I thought I would share some further details with you all...

 

This test of SAP Sybase IQ 16 was run on a standard HP ProLiant DL980 G7 server in an 80 core system (8 processors, 10 cores per processor) with 1TB RAM running Red Hat RHEL 6.2. The EDMT analytics archive from SAP partner BMMsoft, certified with SAP Sybase IQ 16, was used to ingest and query the following data:

 

  • 6.5 million Documents: Records containing a document and metadata about the document. The document portions of the records were made of images, video clips, audio clips and office type documents. 

 

  • 500 million transactions: Records produced as a result of conventional transaction processing. These records are made of traditional data types such as numbers and character strings.

 

  • 5 million short messages: Records produced by capturing short text messages. These records were made of a combination of email messages and SMS type messages.

 

The complete load process included 50 batches of data. Each batch was made of 16 files containing raw data representing the various data types outlined and totaled above.

 

The test recorded a load speed of 34.3TB/hour (loading and indexing 33.13TB of raw data in 57.91 min). Queries were executed throughout the load process to verify the accessibility of the newly ingested data.

 

This record is significantly faster than published claims by Oracle for Exadata (12 TB/hr- June 2012) and EMC for their Greenplum-based appliance (10TB/hr- Jan 2013) and SAP holds the only independently verified test for Big Data loading.

 

Once again – congratulations team, way to go!

Formatted Text

Formatted Text

12-month Leaderboard