
Performance Issue while Joining two Big Tables

Former Member
0 Kudos

Hello Experts,

We have the following scenario: we need the sales rep associated with a sales order. This information is available in table VBPA, with Sales Order, Sales Order Item, and Partner Function as the key.

Now I'm interested in only one partner function, e.g. 'ZP'. This table has around 120 million records.

I tried both options:

Option 1 - Join this table (VBPA) with the Sales Order Item table (VBAP) in the Data Foundation layer of the Analytic View and filter on Partner Function

Option 2 - Create an Attribute View on VBPA with a filter on Partner Function, and then join this Attribute View to the Data Foundation table in the Logical Join layer.

Both these options are killing the performance.

Is there any way to achieve this efficiently?
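For reference, the result both options are modeling could be sketched in plain SQL roughly as follows (PERNR as the sales-rep column in VBPA is an assumption; placing the filter in the ON clause preserves the outer-join semantics):

```sql
-- Sketch only: sales rep (partner function 'ZP') per sales order item.
-- PERNR as the sales-rep column is an assumption.
SELECT vbap."VBELN", vbap."POSNR", vbpa."PERNR"
  FROM "VBAP" AS vbap
  LEFT OUTER JOIN "VBPA" AS vbpa
    ON  vbpa."VBELN" = vbap."VBELN"
   AND vbpa."POSNR" = vbap."POSNR"
   AND vbpa."PARVW" = 'ZP'   -- filter in the ON clause keeps the outer semantics
```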

Your expert opinion is greatly appreciated!!

Thanks & regards,

Jomy

Accepted Solutions (0)

Answers (5)


Former Member
0 Kudos

Hi Experts,

I would like to bring in some more data modeling experts for their views on this, as there should be a clean way of solving this sort of thing.

Thanks & regards,

Jomy

lbreddemann
Active Contributor
0 Kudos

Hi Jomy

I think you already got a lot of coverage here.

What's still missing is:

a) the full model including the sizes of the source tables

b) the planviz file for your long running query

c) the query.

Seriously: I wouldn't even look at this issue without this information.

So enough guessing, get the measuring tape!

- Lars

Former Member
0 Kudos

Hi Lars,

A couple of points on when the performance takes a hit.

We have a Delivery Analytic View with a join between LIPS (Delivery Item) and VBPA (Partner), joined on LIPS~VGBEL = VBPA~VBELN and LIPS~VGPOS = VBPA~POSNR.

Data Volume:

a. LIPS = 23 million

b. VBPA = 130 million

QUERY:

Now when I run the following query:

Select WERKS, SUM(ZAN_CM10_TOT_DEL_QTY_CASE_ZORDDEL)
FROM "_SYS_BIC"."Reconciliation/TESTZSD_AN_DELIVERY_ITEMS"
GROUP BY WERKS

I get the result in 300-400 ms.

Now I'm interested in only one partner function, i.e. 'ZR', and hence I added a filter on VBPA such that PARVW = 'ZR'. (There are only 40 million records with PARVW = 'ZR'.)

Now when I run the above query, I get the result in 15-17 sec.

Let me know how to send you the PlanViz file.

Thanks & regards,

Jomy

Former Member
0 Kudos

Hi Lars,

Just fyi,

After looking at the Visualize Plan of the two scenarios mentioned above, I feel the following:

a) In the former case, the join between LIPS and VBPA is not executed because nothing from it was requested by the query.

b) In the latter case, even though nothing from VBPA was requested by the query, the filtering and the join are still enforced, hence the delay.

Thanks & regards,

Jomy

lbreddemann
Active Contributor
0 Kudos

Jomy, this information doesn't help me to help you.

Without the information I listed before, neither I nor anybody else can tell you why your query is slow.

I think the offer to look into this for you is generous, so if you don't want to take it, that's fine with me.

- Lars

Former Member
0 Kudos

Hi Lars,

I've already mentioned the

  • Data volumes for those tables
  • Query itself, and
  • Asked how to send you the PlanViz file (basically, how to export the overall dump of it for you)

In addition, Lars: I think there is a myth that a join will never be executed in HANA unless and until we request attributes pertaining to that join.

In this case, even though we are not requesting any attributes from that join, it is enforced because of the filtering, which I feel is not correct.

Thanks & regards,

Jomy

lbreddemann
Active Contributor
0 Kudos

Sorry, but I don't see any SQL query in this thread, so no, the query has not been provided.

There is no full diagram of the model anywhere either.

I also fail to find any run-time information on the involved base tables (an output of m_cs_all_columns would help), and I believe there are plenty of options available nowadays to share files on the web.

So, again, crucial information is missing.

Concerning myths: they are just that. Myths.

Depending on how the joins are specified and what data is requested, it might be valid not to execute the join when no joined attributes are requested. But this is obviously not always the case (think outer join, think n:m cardinality) - so 'always' and 'never' are not just over-simplifying matters here; these adjectives are just plain wrong.

- Lars

Former Member
0 Kudos

Hi,

Lars is correct. You may have to spend a little more time and give the bigger picture.

I have used this join; it takes about 2 to 3 seconds to execute for me, and my data volume is smaller than yours.

You must have used a left outer join when joining the attribute view (with constant filter 'ZP' as specified in your first post) to the data foundation. Please cross-check once again; sometimes my fat fingers have inadvertently changed the join type and I had to go back and fix it. If this is a left outer join or a referential join, HANA does not perform the join if you are not requesting any field from the attribute view on table VBPA. This used to be a problem due to a bug in SP4 but was fixed in SP5.

However, if you have performed this join in the data foundation, it does enforce the join even if you did not request any fields from the VBPA table. The reason is that you have put a constant filter 'ZR' on it (the LIPS->VBPA join in the data foundation, as specified in one of your later replies).

If any measure you are selecting in the analytic view is a restricted measure or a calculated measure that needs some field from VBPA, then the join will be enforced, as you would agree. This is where I had the most trouble: my join itself is not bad, but my business requirement to get the current value of a partner attribute in a higher-level calculation view sent too much data from the analytic view to the calculation view.
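To illustrate the two cases described above with the view and measure names from earlier in the thread (a sketch, not verified against this system; PARVW being exposed as an attribute of the view is an assumption):

```sql
-- No field from VBPA requested: with a left outer/referential join
-- the optimizer may skip the join entirely.
SELECT WERKS, SUM(ZAN_CM10_TOT_DEL_QTY_CASE_ZORDDEL)
  FROM "_SYS_BIC"."Reconciliation/TESTZSD_AN_DELIVERY_ITEMS"
 GROUP BY WERKS;

-- A filter on a VBPA attribute (whether modeled as a constant filter
-- in the data foundation or requested here) forces the join to run.
SELECT WERKS, SUM(ZAN_CM10_TOT_DEL_QTY_CASE_ZORDDEL)
  FROM "_SYS_BIC"."Reconciliation/TESTZSD_AN_DELIVERY_ITEMS"
 WHERE PARVW = 'ZR'
 GROUP BY WERKS;
```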

Please send the full diagram of your model and the vizplan. Also, if you are using a front-end tool (like Analysis Office), please trap the SQL sent from it and include it in the message. Even the plain SQL statement in which you detected this performance issue would be helpful.

Ramana

Former Member
0 Kudos

Man, SCN is the place to be.

You've got essentially free consulting services from Ramana and Lars, some of the top guys in the HANA world. This is why SCN is so great.

I don't even want to know what the real-world cost of this would be!

Former Member
0 Kudos

Hi Ramana,

I need a couple of pointers in order to give you those files:

a) Overall diagram of the Analytic View

   - Is there any way to extract the Analytic View diagram from HANA Studio so that I can provide it to you?

b) How do I extract the complete PlanViz file so that I can give it to you?

The SQL query I already mentioned, which is:

          Select WERKS, SUM(ZAN_CM10_TOT_DEL_QTY_CASE_ZORDDEL)
          FROM "_SYS_BIC"."Reconciliation/TESTZSD_AN_DELIVERY_ITEMS"
          GROUP BY WERKS

This is the query which gets triggered from Analysis.

Thanks & regards,

Jomy

param0308
Participant
0 Kudos

Hi Jomy,

I am facing the same problem. Can you please let me know if it was resolved?

To extract the PlanViz, there is a "Save As" option at the top right corner. It will generate a file which you can only open in HANA Studio. The same goes for the view diagram: right-click on the view diagram and select the "Export diagram" option.

lbreddemann
Active Contributor
0 Kudos

Technically speaking, you can open the file with any text editor, as it is just an XML-style file.

0 Kudos

I'm just curious, have you already tried to write a manual SQL statement that performs the same join? And if so, how does it perform?

In my experience, if you have a larger table that you want to join to another large table, doing this in SQL is much faster than it is using an attribute view joined to an analytic view.

Former Member
0 Kudos

Hi Koster,

I would not like to go that route for the following reason:

a) In that case I would need to create a procedure that runs daily, or at some interval, maybe 3 times a day. This would in turn lose the real-time nature of the solution, and the business is not accepting that.

In essence, you are suggesting creating a prefiltered table using a HANA SQL procedure.

Thanks & regards,

Jomy

0 Kudos

Hi Jomy,

Maybe I'm misinterpreting your requirements, but I'm not suggesting you write a procedure and dump its outcome in a new table. I'm suggesting either building a 'normal' database view in SQL, a SQLScript-based calculation view (with the 'execute in SQL engine' option marked), or a BO BI universe on top of the tables.

All these options go through the SQL optimizer/engine instead of the calculation and/or OLAP engine and might give you better real-time results. I would suggest you first write a normal SQL statement that simulates the above options, to see if it indeed gives you an acceptable result.
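A minimal sketch of such a hand-written SQL statement for the delivery scenario discussed earlier (LFIMG as the delivery-quantity column is an assumption; the actual measure in the view may be a calculated one):

```sql
-- Sketch: the model's join and filter expressed as plain SQL.
SELECT lips."WERKS", SUM(lips."LFIMG")
  FROM "LIPS" AS lips
  LEFT OUTER JOIN "VBPA" AS vbpa
    ON  vbpa."VBELN" = lips."VGBEL"
   AND vbpa."POSNR" = lips."VGPOS"
   AND vbpa."PARVW" = 'ZR'   -- filter in the ON clause, as in the model
 GROUP BY lips."WERKS";
```

Comparing this statement's runtime against the view's would show whether the SQL engine handles the join better.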

Best regards,

Stefan

rainer_winkler
Contributor
0 Kudos

Hi Jomy,

did you consider using an index?

Kind Regards

Rainer

Former Member
0 Kudos

Hi Rainer,

Can you please elaborate on your thought?

Thanks & regards,

Jomy

rainer_winkler
Contributor
0 Kudos

Hi Jomy,

To execute a join, the database (simply formulated; I know the precise logic is more sophisticated) loops over the first table and, for each entry of the first table, searches for entries in the second table that match the join condition. In a classical database I would make sure that there is a primary or secondary index on the second table that can be used (if the second table is big).

In HANA, indexes are often not used, because HANA can scan big tables very fast. But in this case HANA has to scan the second table quite often, as the first table is also big. Therefore an index on the second table should improve performance. SAP proposes using indexes only when performance is insufficient without them, but this may be such a case.
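As a sketch, such an index on the join and filter columns of the second table might look like this (the index name is illustrative; whether it actually helps depends on the execution plan, as discussed further down in this thread):

```sql
-- Illustrative only: index on the join/filter columns of VBPA
CREATE INDEX "VBPA_JOIN_IDX" ON "VBPA" ("VBELN", "POSNR", "PARVW");
```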

With kind regards

Rainer

Former Member
0 Kudos

Hello Rainer,

Thanks for the info.

So do we need to create it in ECC or in HANA on these tables?

If in HANA, how do we create an index on these tables?

The second table, i.e. VBPA, has the following key: MANDT, VBELN, POSNR and PARVW.

So in your expert opinion, what should the index be here?

Thanks & regards,

Jomy

Former Member
0 Kudos

Hi Rainer,

I've created an index on VBELN, POSNR and PARVW for table VBPA, and the performance is still bad: I'm getting the result set in 20 sec.

Thanks & regards,

Jomy

lbreddemann
Active Contributor
0 Kudos

Hi Rainer


Rainer Winkler wrote:

To execute a join, the database (simply formulated; I know the precise logic is more sophisticated) loops over the first table and, for each entry of the first table, searches for entries in the second table that match the join condition. In a classical database I would make sure that there is a primary or secondary index on the second table that can be used (if the second table is big).

In HANA, indexes are often not used, because HANA can scan big tables very fast. But in this case HANA has to scan the second table quite often, as the first table is also big. Therefore an index on the second table should improve performance. SAP proposes using indexes only when performance is insufficient without them, but this may be such a case.

What you describe is basically a nested loop join, and it doesn't apply at all to what happens with column store tables.

Most of the time HANA will employ variants of hash joins (and sometimes sort-merge joins).

For large data sets, hash joins typically perform much better (complexity ~O(2n)) than nested loop joins (~O(n^2)) and can more easily be run in parallel.

Depending on the actual query, an inverted index (on both tables!) might be helpful here - but many times it is not.

Table size alone (in records) is not really a factor for HANA here, as HANA can operate on the compressed data in the value vector.

As with any other performance problem, what we need to understand is where the time is spent during query execution.

- Lars

rainer_winkler
Contributor
0 Kudos

Hi Lars,

thank you very much for the answer to my statement.

Kind Regards

Rainer

former_member184768
Active Contributor
0 Kudos

I wonder if these options are really any different from each other. If you refer to Lars's reply in the thread, it is mentioned that

"during activation of the analytic views, the joins in the attribute views get 'flattened' and included in the analytic view run time object"


So, as per my interpretation, both of the options you mentioned should have a similar execution plan.


I think the alternative would be to filter the data as much as possible with additional filter criteria. You can also try partitioning, as suggested by Krishna. Also check the feasibility (although it is quite unlikely to be acceptable) of persisting the data with additional redundancy, so that the join is not required and you can get the data from one table only.


Regards.


Ravi

Former Member
0 Kudos

Hi Ravi,

Yes, both options are almost the same, but joining in the Data Foundation definitely gives a better result when you are joining two big tables.

Nevertheless, according to you there are a few alternatives:

a) Filtering the data as much as possible - I can only filter on Partner Function and nothing more, hence this is not an option for me.

b) Partitioning - I need to try that out. Where shall I partition? In SLT after stopping the replication, or is it possible to partition while the replication is happening?

c) The other possibility is persisting the data with additional redundancy and then just joining. This will work for sure, but I don't want to do that: if there are other tables, I would have to repeat it for each of them.

Thanks & regards,

Jomy

former_member182302
Active Contributor
0 Kudos

Hi Joy,


Jomy Joy wrote:

Yes, both options are almost the same, but joining in the Data Foundation definitely gives a better result when you are joining two big tables.

Can you share the VizPlan screens so we can see why that marginal increase in performance occurs when you join in the data foundation?

The "flattening" happens during activation, so I'm interested to understand this behavior. It would be helpful if you could share them.

Regards,

Krishna Tangudu

Former Member
0 Kudos

Hi Krishna,

Please have a look at this; it might give you additional insights:

http://scn.sap.com/thread/3433018

Thanks & regards,

Jomy

former_member182302
Active Contributor
0 Kudos

Hi Joy,

Thanks for the thread. I had already gone through it; I just wanted to have a look at the visual plans of your model to see that "significant" difference in speeds.

Regards,

Krishna Tangudu

former_member182302
Active Contributor
0 Kudos

Hi Joy,

Did you have a look at the visual plans? Is the filtering happening before the join?

I'd assume it would, considering the options you tried. But check the visual plans and analyze what is consuming the most time and how many records are being transferred.

Regards,

Krishna Tangudu

Former Member
0 Kudos

Hi Krishna,

Earlier I thought that somehow the join was happening first and then the filtering, but after looking at the Visualize Plan I see that it still does the filtering first and then the join, as shown in the attachment. Somehow it still kills the performance:

Any idea what the reason might be and how to overcome this?

Thanks & regards,

Jomy

former_member182302
Active Contributor
0 Kudos

Hi Joy,

Did you try partitioning the tables, to see if it improves the performance?

Regards,

Krishna Tangudu

Former Member
0 Kudos

Hi Krishna,

I've already replicated the table from ECC to HANA.

Do you mean stopping the replication and replicating it again from ECC with the partitioning done via SLT,

OR

do you have some other idea for doing that?

Just FYI: the key of the table is Sales Order, Sales Order Item and Partner Function, so on what shall I partition?

Thanks & regards,

Jomy

former_member182302
Active Contributor
0 Kudos

Yes Joy, I meant partitioning done via SLT.

Have a look at the document below for a sample.

As a starter: as you have a primary key defined, try using HASH partitioning on those columns and check if it gives you any performance benefit.
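Expressed directly in HANA SQL, the suggestion could look roughly like this (8 partitions is illustrative; via SLT the same specification goes into the table settings, and the hash columns must be part of the primary key):

```sql
-- Sketch only: HASH partitioning on the primary-key columns
ALTER TABLE "VBPA"
  PARTITION BY HASH ("MANDT", "VBELN", "POSNR", "PARVW") PARTITIONS 8;
```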

Regards,

Krishna Tangudu

former_member182302
Active Contributor
0 Kudos

Hi Joy,

Let me know if the HASH partitioning works, or whether even after that there is no performance difference for you.

Regards,

Krishna Tangudu

Former Member
0 Kudos

Hi Krishna,

I did the following (I saw that there are 15 different partner functions in Production as of now):

a) Specified PARTITION BY HASH ("MANDT","VBELN","POSNR","PARVW") PARTITIONS 8 in SLT

b) Joined (a left outer join, specifically) VBAP with VBPA in the Data Foundation layer, with the added filter on Partner Function in VBPA

c) When I ran the SQL on this Analytic View, it returned the data in 10-15 sec, instead of the roughly 400 ms I got when using a prefiltered table of VBPA containing only the data for a specific partner function.

This means partitioning will not work in this case either.

This seems to be a real problem.
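For reference, the prefiltered-table baseline mentioned in c) could be produced with something like the following (the table name is illustrative; as discussed earlier in the thread, this sacrifices the real-time nature of the data):

```sql
-- Illustrative only: materialize one partner-function slice of VBPA
CREATE COLUMN TABLE "VBPA_ZR" AS
  (SELECT * FROM "VBPA" WHERE "PARVW" = 'ZR');
```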

Thanks & regards,

Jomy