Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
Former Member

When SAP talks about HANA, they quite often talk about the 1000x improvements in performance that you can get. A customer asked me last week why SAP HANA would provide any improvement over their implementation of Sybase IQ, if they pinned all the IQ tables in memory. They conjectured that IQ should be just as fast as HANA, right?

In fact, there are several capabilities of HANA as compared to IQ, which should make it substantially faster in the real world, even when IQ operates entirely in-memory. There is a nice blog written by chris.jones which you can read here, which explains this and some other stuff.

The important thing to note is that IQ is a disk-based data-warehouse that works well with large volumes of memory for caching. HANA is a transactional developer platform written to be in-memory. As we shall see, there are pros and cons to both.

- Both HANA and IQ compress data. But, IQ compresses in bitmaps, and HANA compresses with dictionary encoding which means that HANA only needs to decompress exactly the data it needs, and it does so inside the CPU cache. Because in-memory databases are limited by network bandwidth, this should make HANA much faster than IQ for anything which requires materialization of data.

- HANA is optimized for individual writes. With IQ, you lock a table on write, and it is locked until you specify a commit. This means that you can't have multiple updaters in IQ, whilst HANA has a multi-version concurrency control (MVCC). This means in practice that many people can write to a HANA table with individual writes, whilst IQ requires one person writing to a table, and in batch. It also should mean that IQ is much faster for bulk loading than HANA. In fact, Sybase IQ holds the world record for bulk loading at 34TB/hour.

- Both databases have support for SIMD, but HANA is highly optimized for the Intel E7 platform and its SSE2 instructions, which allow multiple additions/multiplications in one CPU cycle. This should mean that combined with the compression, HANA is faster at aggregating than IQ.

So, I decided to load the same data into IQ and HANA, and do some comparisons on the same hardware.

Test Environment

For this, I used a SAP HANA size "Medium" system from HP. It's has 4 CPUs, 40 cores, 512GB RAM, 25x15k 146GB SAS for data and one FusionIO 640GB for logs. The OS is SUSE Linux for SAP Applications. For my testing I use one database at a time.

For SAP HANA, I used SAP HANA 1.0 SP6 Rev.69, which is the latest available.

For IQ, I used Sybase IQ 16.0 SP2, which is also the latest available.

Installation

It's my first time getting to grips with Sybase IQ and markmumy was a big help in this SCN thread. IQ doesn't come pre-configured out the box and you have to set a few settings to get things to work well. In my database configuration file, I set the following settings:

-c 1g

-iqmc 154000

-iqtc 154000

-iqlm 154000

-iqnumbercpus 40

This basically tells IQ to use 1GB for the cache, and to split the remaining 512GB ram amongst the various processes (30% of RAM to each process). Plus because of hyper threading, IQ thinks I have 80 CPUs, so I have to tell it that I actually have 40.

HANA is definitely easier to install, and requires no special configuration, but this isn't a big deal in the scheme of things.

Data Loading

The Sybase bulk loader is pretty fiddly and very specific about the file format, number of columns and data quality. Actually this is pretty much the feeling of the IQ platform overall - fantastic technology mixed with a relatively poor user experience. The HANA bulk loader isn't very feature-rich, but it is much less picky than the Sybase loader. This is definitely an area that both platforms could work on.

Once you get IQ up and running though, it flies for loading. I'm sure that it could be better optimized by an IQ pro, but I found I could load my 62GB fact table in 2m30s. By comparison, I need 10x this long to load the data into HANA. This doesn't come as a surprise, because IQ doesn't have to worry about multiple inserters, or dictionary encoding. With HANA, you trade off load performance for the behavior of a transactional RDBMS. IQ is a pure data warehouse.

Queries and Aggregations

vishal.sikka often talks about how HANA can aggregate 16m/sec/core. In my 40-core system that should translate to 640m aggregations/sec/core. I actually find it is much more variable than this and depends on the join complexity and grouping sets. For a simple table, you can get as much as 31m, and for very complex joins and grouping I see as low as 9m. You will see this in the results below.

In both cases, I see massively parallel behavior and all 40 cores are used simultaneously.

Still - with my 1.4bn table, I have four queries. I generally find that most questions you can ask fall into one of these categories in terms of performance.

Query

SAP HANA

SPS08

SAP HANA

SPS07

Sybase IQ
SELECT SUM(AMOUNT)/COUNT(AMOUNT) FROM TRANSACTION1.2s1.2s1.2s

SELECT GENDER, SUM(AMOUNT)/COUNT(AMOUNT)

FROM TRANSACTION T

JOIN CUSTOMER C ON T.CUSTOMER_ID=C.CUSTOMER_ID

GROUP BY GENDER

2.0s1.7s18.9s

SELECT MERCHANT, GENDER, SUM(AMOUNT)/COUNT(AMOUNT)

FROM TRANSACTION T

JOIN CUSTOMER C ON T.CUSTOMER_ID=C.CUSTOMER_ID

JOIN MERCHANT M ON T.MERCHANT_ID=M.MERCHANT_ID

GROUP BY MERCHANT, GENDER

2.4s3.4s35.3s

SELECT GENDER, SUM(AMOUNT)/COUNT(DISTINCT T.CUSTOMER_ID)

FROM TRANSACTION T

JOIN CUSTOMER C ON T.CUSTOMER_ID=C.CUSTOMER_ID

JOIN MERCHANT M ON T.MERCHANT_ID=M.MERCHANT_ID

GROUP BY MERCHANT, GENDER

11.2s14s171.8s

select gender, sum(txamount)/count(txamount), count(txamount), count(distinct t.customer_id)

from transaction t

join customer c on t.customer_id=c.customer_id

WHERE gender='M'

AND dob BETWEEN '1980-01-01' AND '1989-12-31'

AND maritalstatus='S'

AND postcode like '%TW1%'

group by gender;

360ms3.2s0.9s
SELECT STDDEV(TXAMOUNT) FROM TRANSACTION410s409s3.8s

With IQ, we see a very similar response time to HANA for Query 1. I'm not sure why that is, but I'm guessing IQ does SUM() and COUNT() on a single table very efficiently. I'd be interested in any IQ expert that can explain this.

Once we get into the realm of complex joins and grouping, HANA outperforms IQ 10:1. This is roughly what we expect because HANA stores its data for faster OLAP retrieval.

Interestingly, the last two questions, where we have a lot of restrictions, plus a COUNT DISTINCT, or a STDDEV, IQ outperforms HANA. We see this in a few places, where IQ's more mature OLAP engine can outperform HANA.

The HANA Development Platform

This little test doesn't bring out a lot of the qualities of HANA. It's worth making a quick note of these:

- If we want to do insertion of data at the same time as loading, HANA queries will continue to run very nicely. It's less clear to me how IQ will behave.

- With HANA, we have a set of engines which run against the core data: predictive, spatial and business functions.

- HANA provides a full development platform including a development IDE, Integration Services and a Web Server for application build.

- With HANA, we need only store one set of data for OLTP and OLAP workloads. IQ only works as a data warehouse and it requires a separate transactional system.

Update June 2014

I've rerun the queries on HANA SPS08 and you can see there has been a nice improvement across the board. Particularly noticeable is the decrease in time to run the query with the most restrictions. This will have a major impact on real-world performance and concurrency. We still don't have STDDEV and other similar calculations running in the column engine, which impacts these sorts of queries.


Conclusions

There's no question about it - in the real world, with complex grouping sets, HANA performs 10x as well as IQ. It's worth noting that IQ is a very fast data warehouse - especially compared to a regular RDBMS.

But make no mistake, IQ is an excellent data warehouse, and its more mature OLAP engine means that for certain operations, it can significantly outperform HANA.

With each release of SAP HANA, the development team optimizes more and more functionality and I have no doubt that what we see here may be very different in future revisions. SAP HANA SP07 is released very soon and I'm interested to see what that will bring.

Quick thank you to markmumy for his assistance getting Sybase IQ running nicely!

33 Comments
Labels in this area