Solved: Understanding the value of hash partition

patrickbachmann · ‎07-21-2014

Hi folks,

In the case of range partitioning I can clearly see and understand the value. For example, if I create a range partition on MSEG based on Material Doc year (MJAHR), I can have a partition for each year. Let's say I have 10 years and 10 partitions. If I query the year 2006, only that partition is read, so the advantage is obvious and I get much faster response times.

In contrast, if I were to create a HASH partition, the data is supposedly distributed equally. Let's say I created a HASH 10 partition on the sales item table VBAP. Since this table has no nice year-based key field like MSEG does, let's say I use Sales Document Number (VBELN) and create HASH 10 VBELN. Now let's say our sales document numbers run from 0000000001 all the way through 9999999999, and I run a query that pulls all sales documents with a number somewhere in the middle, such as 5555555555. Couldn't this sales document exist in ANY of the 10 equally distributed partitions? Meaning, wouldn't each and every partition need to be searched to find the value 5555555555? This is the part that I'm trying to understand about HASH partitioning. Is there still some sort of unseen range within the 10 hash buckets? I need to understand the algorithm by which these records are distributed, because if ALL hash partitions are read when I run my query, then I do not see any advantage in this type of partition.

Many thanks to anyone who can help clarify HASH type partition!

-Patrick

lbreddemann · ‎07-22-2014

Hi Patrick,

The point of a hash function H(x) is that it

- distributes all input data x equally over a predefined output set (say, the partition numbers)
- gives the same value every single time for the same data

So, on the one hand we can evenly distribute the data, e.g. over several nodes/hosts.

And on the other hand, whenever a value x is requested, all we need to do is to evaluate the hash function H(x) to find out in which partition the value is stored.

That way we can simply skip all other partitions (we 'prune' them from the execution path).

A very easy example would be the modulo function that returns the remainder of a division.

Say you have 4 partitions and a partition key column with these values {0, 3, 4, 1, 3, 5, 7, 20, 22}.

In that case the modulo function for a division by 4 would deliver the following partition assignments:

key -> partition

0 -> [ 1 | | | ]

3 -> [ | | | 4 ]

4 -> [ 1 | | | ]

1 -> [ | 2 | | ]

3 -> [ | | | 4 ]

5 -> [ | 2 | | ]

7 -> [ | | | 4 ]

20 -> [ 1 | | | ]

22 -> [ | | 3 | ]

Note that the partition numbers start with 1, while the modulo function will "start" at 0 - thus the shift by one to match the partition numbers.

It is easy to see that when I ask for key = 21 (21 mod 4 = 1), I only need to look into partition 2.
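The modulo scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SAP HANA's actual hash function; the names partition_for and lookup are made up for this example.

```python
# Illustrative only: a modulo "hash" over 4 partitions, numbered from 1.
# This is NOT the hash function SAP HANA uses internally.
NUM_PARTITIONS = 4


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Return the 1-based partition number for an integer key."""
    # key % num_partitions is in 0..num_partitions-1; shift by one
    # so that the result matches the partition numbering 1..num_partitions.
    return key % num_partitions + 1


keys = [0, 3, 4, 1, 3, 5, 7, 20, 22]

# Distribute the keys over the partitions.
partitions = {p: [] for p in range(1, NUM_PARTITIONS + 1)}
for k in keys:
    partitions[partition_for(k)].append(k)


def lookup(key):
    """Point lookup: compute the target partition and search only that one."""
    p = partition_for(key)
    return p, key in partitions[p]


if __name__ == "__main__":
    for p, content in partitions.items():
        print(p, content)
    print(lookup(21))
```

A lookup for key 21 evaluates partition_for(21), gets partition 2, and never touches partitions 1, 3 and 4. That skipped work is the pruning.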

There's a lot more to hash functions (read Knuth to know the most important stuff!) and partitioning, but I hope that the example gets you over the little mental hurdle.

- Lars

Former Member · ‎07-22-2014

Hello Bachmann,

As far as I know, if you're searching for a single Sales_Document_Number then no, not every partition has to be searched. SAP HANA's hashing algorithm is consistent, i.e. records with the same partition key are always placed in the same partition. Conversely, if you are searching for the Sales_Document_Numbers of the last year, the matching tuples are spread evenly across the partitions, so the search cannot be limited to a single partition.

SAP HANA's hashing algorithm distributes the data in a repeatable manner, which makes identical Sales_Document_Numbers fall into the same partition.
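A rough Python sketch of the two cases, a single document number versus a range of numbers, is shown here. It assumes 10 partitions and uses zlib.crc32 purely as a stand-in for a repeatable hash; the function SAP HANA really uses is not stated in this thread, and the helper names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 10  # assumption: HASH 10 on VBELN, as in the question


def partition_for(vbeln):
    """Return the 1-based partition for a 10-digit document number string.

    crc32 is only a stand-in for a repeatable hash function; it is not
    the function SAP HANA uses.
    """
    return zlib.crc32(vbeln.encode("ascii")) % NUM_PARTITIONS + 1


def partitions_for_range(first, last):
    """Return the set of partitions holding any document number in [first, last]."""
    return {partition_for(f"{n:010d}") for n in range(first, last + 1)}


if __name__ == "__main__":
    # Single value: the hash is repeatable, so exactly one partition is needed.
    print(partition_for("5555555555"))
    # Range of values: neighbouring numbers hash to different partitions.
    print(sorted(partitions_for_range(5555555000, 5555555999)))
```

The single-value lookup always resolves to the same partition, while consecutive document numbers scatter over several partitions, so a range predicate on VBELN gains nothing from hash pruning.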

Regards,

Krishna
