
Join optimization

rajarshi_muhuri
Active Participant
0 Kudos

This is a new thread opened to continue an earlier discussion.

Currently a join is needed between two huge fact tables. If the join is performed on the fly in HANA, performance degrades and HANA sometimes crashes.

Thus a stored procedure chunks the data, performs the join on each smaller chunk, and dumps the results into a physical table. The chunking is inefficient, but the data inserts are even worse: the majority of the time is spent in the delta merges triggered by the inserts.

Here is the code.

Essentially, there are two tables, Table_A and Table_B, and an analytic view is created from each of them.

We pick data between two timestamps from the analytic view built on Table_A,

and within a loop we join data from Table_A and Table_B,

and dump the results into a physical table.

What can be done to optimize this?

Unfortunately I can't seem to hold the results of the while loop in memory, like

A = A UNION (results of the while loop)

CREATE PROCEDURE CPM_SAP.SP_ZZFSL_CE11_REV_JOIN_V4
(IN MIN_TIMESTAMP NVARCHAR(14), IN MAX_TIMESTAMP NVARCHAR(14),IN MIN_PERIOD NVARCHAR(6), IN MAX_PERIOD NVARCHAR(6))
LANGUAGE SQLSCRIPT AS

XPERIOD INT  := :MIN_PERIOD;
XYEAR   INT  := 0;
XMONTH  INT  := 0;

BEGIN

VAR_Z_NAME_OBSCURED = SELECT "RYEAR", "RTCUR", "RUNIT", "POPER", "RBUKRS", "RACCT", "RCNTR", "RPRCTR", "RZZSITE",
"RZZPOSID", "RZZFKBER", "DOCTY", "REFDOCNR", "AUFNR", "ZZLDGRP", "VBUND", "ZZBUZEI", "AWORG", "CPUDT", "CPUTM",
"ZZBUZEI_RPOSN", "PERIOD_ID", "TIMESTAMP",
"TSL" AS "TSL", "HSL" AS "HSL", "KSL" AS "KSL", "MSL" AS "MSL"
FROM "_SYS_BIC"."wlclose/AN_Z_NAME_OBSCURED_REV_KSLNN"
where "TIMESTAMP" > :MIN_TIMESTAMP  AND "TIMESTAMP" <= :MAX_TIMESTAMP ;

WHILE :XPERIOD <= :MAX_PERIOD DO

VAR1_Z_NAME_OBSCURED = SELECT "RYEAR", "RTCUR", "RUNIT", "POPER", "RBUKRS", "RACCT", "RCNTR", "RPRCTR", "RZZSITE",
"RZZPOSID", "RZZFKBER", "DOCTY", "REFDOCNR", "AUFNR", "ZZLDGRP", "VBUND", "ZZBUZEI", "AWORG", "CPUDT", "CPUTM",
"ZZBUZEI_RPOSN", "PERIOD_ID", "TIMESTAMP",
"TSL" AS "TSL", "HSL" AS "HSL", "KSL" AS "KSL", "MSL" AS "MSL"
FROM :VAR_Z_NAME_OBSCURED
where "PERIOD_ID" = CAST(:XPERIOD AS VARCHAR(6));

VAR_CE11000 = SELECT "PALEDGER","KSTAR", "PERIOD_ID","BUKRS", "COPA_AWORG",  "ZZBUZEI_RPOSN", "RBELN", "PERIO",
"GJAHR", "PERDE", "RPOSN", "PRCTR", "PPRCTR", "SKOST", "RKAUFNR", "KURSF", "VV005_ME", "VV006_ME", "WW004","WW011" , "WW012", "WW013",
"WW008", "WW023", "WW010", "WW020", "WW015", "WW006", "WW024", "WW026",
"VV005" AS "VV005", "VV006" AS "VV006" FROM "_SYS_BIC"."wlclose/AN_CE11000_REV_LDG01"
where "PERIOD_ID" = CAST(:XPERIOD AS VARCHAR(6));

VAR_JOIN =  SELECT Z.RACCT AS RACCT, Z.RBUKRS AS RBUKRS, Z."PERIOD_ID" AS ZPERIOD, Z."AWORG" AS AWORG, Z."ZZBUZEI",
Z."REFDOCNR" as REFDOCNR, Z."RCNTR" as RCNTR, Z."RPRCTR" as RPRCTR, Z."RZZSITE" AS RZZSITE, Z."RZZPOSID" AS RZZPOSID,
Z."RZZFKBER" AS RZZFKBER, Z."DOCTY" AS DOCTY, Z."AUFNR" AS AUFNR, Z."ZZLDGRP" AS ZZLDGRP, Z."VBUND"AS VBUND,
Z."TIMESTAMP" AS TIMESTAMP,Z."RYEAR" AS RYEAR, Z."POPER" AS POPER ,Z."TSL" AS TSL, Z."HSL" AS HSL,  Z."RTCUR" AS RTCUR,
Z."KSL" AS KSL, Z."MSL" AS MSL, Z."RUNIT" AS RUNIT, C."PALEDGER" AS PALEDGER ,C."KSTAR" AS KSTAR, C."BUKRS" AS BUKRS,
C."PERIOD_ID" AS CPERIOD,C."COPA_AWORG" AS COPA_AWORG,C."RBELN" AS RBELN,C."ZZBUZEI_RPOSN" AS RPOSN, 
C."WW004" AS WW004, C."WW011" as WW011,  C."WW012" AS WW012 , "WW013" AS WW013  , C."WW008" AS WW008,  C."WW023" AS WW023, C."WW010" AS WW010,
C."WW020" AS WW020, C."WW015" AS WW015, C."WW006" AS WW006, C."VV005" AS VV005,C."VV005_ME" AS VV005_ME, 
C."VV006" AS VV006, C."VV006_ME" AS VV006_ME
FROM :VAR1_Z_NAME_OBSCURED  Z
LEFT OUTER JOIN :VAR_CE11000 C ON
                        Z."ZZBUZEI_RPOSN"=C."ZZBUZEI_RPOSN" AND
                        Z."REFDOCNR"=C."RBELN" AND
                        Z."AWORG"=C."COPA_AWORG" AND
                        Z."RBUKRS"=C."BUKRS" AND
                        Z."PERIOD_ID"=C."PERIOD_ID" AND
                        Z.RACCT = C.KSTAR;


INSERT INTO CPM_SAP.ZZFSL_CE11_REV_JOIN (SELECT * FROM :VAR_JOIN );

XYEAR  := (:XPERIOD/100);
XMONTH :=  :XPERIOD -  (:XYEAR*100) ;

IF :XMONTH =12
THEN
XPERIOD := ((:XYEAR+ 1) * 100) +1;
ELSE
XPERIOD := :XPERIOD +1;
END IF;

END WHILE;


END;

Accepted Solutions (1)

Former Member
0 Kudos

My apologies as I don't care to rake through the code and figure out what exactly is going on. So, fully admitting my ignorance, I have to ask -

If you're wanting to combine two fact tables, why aren't you performing a "union with constant values" (UCV) on-the-fly? This approach is documented here (slide 15) by Werner if you're not familiar with it: http://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/6056911a-07cc-2e10-7a8a-ffa9b8cf5...

If both fact tables don't share the exact same dimensions, you can sometimes still build an intermediate table with the required dimensions, join that to the fact table - and then do the UCV.
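To make "union with constant values" concrete, here is a minimal generic sketch in plain SQL (the table and column names are invented for illustration; this is just the shape of what the graphical UCV does):

SELECT YEAR, REGION, SUM(SALES_AMOUNT) AS SALES_AMOUNT, SUM(SALES_QUANTITY) AS SALES_QUANTITY
FROM (
      -- each branch supplies a constant for the measure it does not own,
      -- so both branches share one common structure
      SELECT YEAR, REGION, SALES_AMOUNT, 0 AS SALES_QUANTITY FROM FACT_SALES
      UNION ALL
      SELECT YEAR, REGION, 0 AS SALES_AMOUNT, SALES_QUANTITY FROM FACT_QTY
     ) U
GROUP BY YEAR, REGION;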

Former Member
0 Kudos

Great point 🙂

I just suffered a similar problem with two 2.5bn row fact tables that I wanted to join. Performance was poor - 300 seconds or more for any query.

They didn't have a natural key, so I created a composite key which I then added as an additional field to both tables, along with a normalized dimension table containing the keys for both tables.

Then I created two Analytic Views, one for each fact table, which present the data in the same format.

Then I created a Calc view with UCV based on the analytic views.

This increased aggregation performance with no filter from 300 seconds to 60 seconds, but more importantly when you filter, the filter is pushed down into both Analytic Views. The moment you filter on country, or date, you get < 5 second response times.
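(For illustration only - these are not John's actual statements, and the names are made up - the composite key step could look roughly like this:)

-- concatenate the shared dimensions into one join key on each fact table
ALTER TABLE FACT_A ADD (JOIN_KEY NVARCHAR(60));
UPDATE FACT_A SET JOIN_KEY = TO_NVARCHAR(YEAR) || '#' || COUNTRY || '#' || PRODUCT;
-- repeat for FACT_B, then build the normalized dimension table keyed on JOIN_KEY
-- and join each fact table to it inside its own analytic view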

John

rajarshi_muhuri
Active Participant
0 Kudos

What is UCV?

former_member184768
Active Contributor
0 Kudos

Hi Rajarshi,

Ctrl + C, Ctrl + V from Jody's post above:

My apologies as I don't care to rake through the code and figure out what exactly is going on. So, fully admitting my ignorance, I have to ask -

If you need the KPIs from both tables along with the dimensions in a single row, I don't think the union will work. In my opinion, Union with Constants is good when you have an "either this or that" kind of situation. Since here you need the data from both tables in a single row, a join is the option.

And I fully agree with Lars: the smaller the data set to be operated upon, the better the join performance.

Did you try a performance trace of the above procedure to locate the bottlenecks? The trace may provide valuable input on data transfer and memory utilization while the procedure is executing.

Regards,

Ravi

Former Member
0 Kudos

Hi Ravi,

Curious why you don't recommend UCV in this case? So long as dimensions are shared, it's easy to get all required data in one row.

Granted, for "this or that" type situation, filter pushdown works well. But even without a filter, UCV exploits parallelism quite well, and should pretty much always perform better than a join (by orders of magnitude).

Former Member
0 Kudos

Hey Ravi,

I think you misunderstand how UCV works. If we have:

Table1

Year  Measure1
2001  10
2002  15

Table2

Year  Measure2
2001  15
2002  20

And if we do a UCV with "null", we get the following:

Year  Measure1  Measure2
2001  10        null
2002  15        null
2001  null      15
2002  null      20

Which is what you seem to be describing. But now we do an aggregation on SUM(Measure1) and SUM(Measure2) (manually, or using the output node if we don't need to reuse the dataset elsewhere). This returns:

Year  Measure1  Measure2
2001  10        15
2002  15        20

This is semantically equivalent to:

SELECT T1.YEAR, T1.MEASURE1, T2.MEASURE2 FROM TABLE1 T1 JOIN TABLE2 T2 ON T1.YEAR=T2.YEAR

But when you have large data volumes it is much faster - 5-10x in most cases.
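In plain SQL, the UCV-plus-aggregation described here would look roughly like this sketch against Table1/Table2 (the CASTs are only there to give the NULL constants a type):

SELECT YEAR,
       SUM(MEASURE1) AS MEASURE1,
       SUM(MEASURE2) AS MEASURE2
FROM (
      SELECT YEAR, MEASURE1, CAST(NULL AS DECIMAL) AS MEASURE2 FROM TABLE1
      UNION ALL
      SELECT YEAR, CAST(NULL AS DECIMAL) AS MEASURE1, MEASURE2 FROM TABLE2
     ) U
GROUP BY YEAR;
-- a HAVING filter on NULL sums would drop years present in only one table,
-- mimicking the inner join exactly (compare the IS NOT NULL trick later in this thread)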

Hope this helps!

John

former_member184768
Active Contributor
0 Kudos

Hi John,

Thanks for the clarification. I think I understand UCV - at least I did when I tried to put an example together in the document.

As I mentioned, I did not really go through the code posted by Rajarshi in detail, but I think his requirement is to get the data from both tables on characteristics which are NOT COMMON to the tables involved in the join. In that case, the characteristics which are not available in one table will have NULL values, leading to separate lines, and hence the KFs will also be split across two records.

But maybe Rajarshi should comment on whether his requirements can really be addressed by UCV.

Regards,

Ravi

former_member184768
Active Contributor
0 Kudos

Hi Jody,

I think Rajarshi's requirement is to get attributes from both tables, and not all the required attributes are available in both tables. So the records will end up with NULL values and the aggregation may not happen.

As I mentioned in my reply to John, I think Rajarshi is the right person to comment on the feasibility of implementing UCV in his scenario.

Regards,

Ravi

Former Member
0 Kudos

Agreed -

Rajarshi, you're quite likely to get the most helpful feedback if you can provide a very simplified version of your scenario - and only include the required logic.

SCN isn't the place for on-demand code review.

I've faced situations where one table has fields A, B, C and another table has only C. C is the lowest granularity (I would say the leaf in a hierarchy, but it's not always a strictly hierarchical relationship).

In cases like these, in SQLScript, one can create an intermediate table of A, B, C from the first table (which is efficient given HANA's columnar structure), join that to the second table - and then the UCV is easy.

I've often found that this approach is applicable to situations that at first glance make it look like a join between the two fact tables is the only option.

(It's been a long time since I've had to write up any kind of proof, but something tells me one should be able to mathematically prove that if you're combining two fact tables, you can join just the distinct dimensions of each table into an intermediate table, join this back to each fact table, and then UCV.)

Answers (4)

rajarshi_muhuri
Active Participant
0 Kudos

Thanks, everyone, for the effort and analysis you have put into this query of mine. I have not had the time yet, but I will now test out the ideas presented here and update you.

rajarshi_muhuri
Active Participant
0 Kudos

Everybody is right, but we are not on the same page; I agree more with Ravi. So I am taking a simpler example - and I agree with Jody, I was too lazy earlier. I knew UCV too, just not the acronym.

We all like CARS! So:

Table: Dealership Car Sales Amounts

Table: Dealership Car Quantity Sold

The classic inner join would be:

Union with Constants (focusing on the 3rd row of the SQL inner join):

I want them in one line, but with dissimilar dimensions they will come in two rows. So that is not acceptable.

However, if the output only needs the dimensions which are also the join columns, then I get the required result. (I can also simulate a left or right join with a flagging trick.)

So to conclude, and to return to my original problem: the nature of the requirements is that I can't do much filtering, but I need to do this massive left outer join.

This is why I was using the SP to lift the data in chunks, join it, and write the results to a physical table. It performs relatively well when taking data between, say, August 2013 and November 2013; when running from 2011 to 2013, it takes 50 minutes.

I had long since looked at John Appleby's blog and took two ideas:

1. partitioning the table

2. turning off delta merge and manually merging the data while exiting the SP (see the sketch below).
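(For reference, the statements behind idea 2 are roughly the following; the table name is the target table from the procedure above:)

-- switch off automatic delta merges on the target table before the mass insert
ALTER TABLE CPM_SAP.ZZFSL_CE11_REV_JOIN DISABLE AUTOMERGE;
-- ... run the chunked inserts ...
-- merge once at the end, then switch automerge back on
MERGE DELTA OF CPM_SAP.ZZFSL_CE11_REV_JOIN;
ALTER TABLE CPM_SAP.ZZFSL_CE11_REV_JOIN ENABLE AUTOMERGE;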

Unfortunately I tested in DEV, where we have a small data set. In DEV the SP without John's ideas took 2 minutes, and with John's ideas it took the same, or maybe a couple of nanoseconds longer.

I am sure the real advantage of his ideas would show if I were given a chance to implement them in PROD, where we have huge data volumes. But since I implemented this two years back and it works, I am not being allowed to tweak it lest something breaks.

Former Member
0 Kudos

I'm quite confident you can solve this much more quickly via UCV.

I'll try to get a solution together by EOD, unless someone beats me to it.

The trick is to create a COMBINED_DIM table as an intermediate table in SQLScript, by joining the DISTINCT fields YEAR, DEALER, CAR_COMPANY from each fact table, and also including SALES_DESCRIPTION and PRODUCT_DETAIL in the SELECT. HANA should be smart enough to query the dictionary - so this join, despite coming from two massive fact tables, should execute much faster. Fingers crossed.

Then, re-join this table to the fact tables.

Then, UCV.

A left or right outer join can be simulated by a WHERE clause against NULL values of dimensions from the right or left table.
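A rough illustration of that last point only (:UCV_RESULT is a hypothetical table variable holding the aggregated union; exact behaviour depends on how the combined dimension table is built):

-- filtering on NULLs of the columns owned by only one side controls which
-- unmatched rows survive; the worked example later in this thread uses
--   WHERE PRODUCT_DETAIL IS NOT NULL AND SALES_DESCRIPTION IS NOT NULL
-- to simulate an INNER join, while relaxing the filter to one side keeps
-- the unmatched rows of the other side
SELECT * FROM :UCV_RESULT WHERE SALES_DESCRIPTION IS NOT NULL;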

Former Member
0 Kudos

So my solution would be slightly different 🙂 I think we have slightly different development methodologies, and both approaches are right. Still, here is my way:

I'd create two analytic views to normalize each result set onto the same dimensions, each with the correct measures as you want them (you can also filter here if you need to).

Then I would do a UCV against those analytic views in a graphical calculation view. This will FLY, and you will get filter pushdown to the analytic views.

John

rajarshi_muhuri
Active Participant
0 Kudos

Hi John

From what I understand of your idea:

1. I add a calculated column in Analytical_View_A for the dimension that exists only in the other table and mark it null,

and I do the same in the other analytic view, and union them.

But that is the same as going to manage mapping and doing the same thing (without explicitly creating the normalizing columns), e.g.

But in either case, I don't get them in one line like a join. Given your experience, I think I am missing your point. Perhaps you could clarify.

rajarshi_muhuri
Active Participant
0 Kudos

I am really dumb... not sure what you meant, in case you can explain a bit more.

I am pasting the script for the tables, in case you would like to use them.

Finally, in the SP6 studio, in the UNION I mark the measures that do not exist in an analytic view as zero, but the measure gets converted into an attribute. I actually have to open SP5 Studio to mark it as zero and still retain it as a measure. Does anyone else have this problem, or is it a bug in my studio?

CREATE COLUMN TABLE SALES_DETAIL
(
  YEAR INT,
  DEALER NVARCHAR(10),
  CAR_COMPANY NVARCHAR(10),
  SALES_AMOUNT DECIMAL,
  PRODUCT_DETAIL NVARCHAR(10)
);

CREATE COLUMN TABLE SALES_QUANTITY
(
  YEAR INT,
  DEALER NVARCHAR(10),
  CAR_COMPANY NVARCHAR(10),
  SALES_QUANTITY DECIMAL,
  SALES_DESCRIPTION NVARCHAR(10)
);

INSERT INTO SALES_DETAIL VALUES (2010, '100', 'BMW', 10000, '321');
INSERT INTO SALES_DETAIL VALUES (2010, '100', 'MERCEDES', 11000, 'CLK');
INSERT INTO SALES_DETAIL VALUES (2010, '101', 'LOTUS', 5000, 'MASER');
INSERT INTO SALES_DETAIL VALUES (2011, '102', 'BMW', 1000, '321');
INSERT INTO SALES_DETAIL VALUES (2011, '103', 'LOTUS', 1000, '321');
INSERT INTO SALES_DETAIL VALUES (2010, '104', 'MERCEDES', 2000, '321');
INSERT INTO SALES_DETAIL VALUES (2012, '100', 'BMW', 35000, 'X1');

INSERT INTO SALES_QUANTITY VALUES (2010, '100', 'BMW', 2, '3 SERIES');
INSERT INTO SALES_QUANTITY VALUES (2010, '100', 'MERCEDES', 3, 'C CLASS');
INSERT INTO SALES_QUANTITY VALUES (2011, '102', 'BMW', 2, '3 SERIES');
INSERT INTO SALES_QUANTITY VALUES (2011, '104', 'MERCEDES', 2, '3 CLASS');
INSERT INTO SALES_QUANTITY VALUES (2012, '100', 'BMW', 1, 'X1 SUV');

SELECT * FROM SALES_DETAIL;
SELECT * FROM SALES_QUANTITY;

SELECT A.YEAR, A.DEALER, A.CAR_COMPANY, A.PRODUCT_DETAIL, B.SALES_DESCRIPTION, A.SALES_AMOUNT, B.SALES_QUANTITY
FROM SALES_DETAIL A INNER JOIN SALES_QUANTITY B ON
  A.YEAR = B.YEAR AND
  A.DEALER = B.DEALER AND
  A.CAR_COMPANY = B.CAR_COMPANY;

Former Member
0 Kudos

That's OK, check these screenshots. I have 3 analytic views for 4 different measures:

I then map them the same way that you describe:

Note that, as you say, the UNION interleaves the rows. By default that leaves lots of NULLs, which is exactly what we want - so we just aggregate! Note that in my example I'm also filtering on IS NOT NULL, which means I only return rows for which all four measures are not null. That's deliberate (I have a difference calculated column, so I never want to include rows with nulls), but you may not want that behavior. Note that the aggregation type is always SUM().

Finally in my example I re-aggregate by SUM() and COUNT() and have calculated columns for AVERAGE(), but that's specific to my use case. In your example you can probably just aggregate it once in the output node.

Hope it helps,

John

Former Member
0 Kudos

I see your code now and see why it's returning on more than one line. This is fixable, give me a few.

Former Member
0 Kudos

So this is an odd situation, because your join predicate is just weird: Year, Dealer and Car_Company. Assuming that your SQL query returns the right data is worrying - I'm not sure that would be the case. But let's assume it would.

The problem we then face is that you have measures in each fact table that aren't in the other fact table. This is best fixed by normalizing your data model, with proper master data objects and attributes, into a star schema. This will always perform best. If you can't do that, then you can create a composite dimension table that has primary keys back to each fact table.

If you don't want to do that, then you use the join predicate in the UCV (YEAR, DEALER, CAR_COMPANY) and then join back onto the fact tables to get the additional fields (PRODUCT_DETAIL, SALES_DESCRIPTION). This is much faster than the other join.

John

Former Member
0 Kudos

If normalization is not an option, then it looks like your solution is now the same as mine, John.

That being said - the data above looks very much like it should be massaged in an ETL process to give both tables the same structure. It sounds like your tables are rather large, but deltas on such an ETL shouldn't be too bad.

Now, if you're stuck with your structures (i.e. if it's SAP source data via SLT), then the following is sample SQL to show what John and I are talking about.

Please note there are plenty of further opportunities to optimize this code: using CE functions (or modeling graphically) is one example, and duplicating source nodes for the different branches of the data flow graph is another.

Nonetheless:

DROP PROCEDURE UCV_EX;

CREATE PROCEDURE UCV_EX READS SQL DATA AS
BEGIN
  -- build a single dimension table from the two fact tables
  SINGLE_DIM =
    SELECT DISTINCT
      T1.YEAR, T1.DEALER, T1.CAR_COMPANY, T1.PRODUCT_DETAIL, T2.SALES_DESCRIPTION
    FROM SALES_DETAIL T1
    INNER JOIN SALES_QUANTITY T2
      ON T1.YEAR = T2.YEAR AND
         T1.DEALER = T2.DEALER AND
         T1.CAR_COMPANY = T2.CAR_COMPANY;

  -- "star schema" #1
  F1 =
    SELECT
      SUM(SALES_AMOUNT) AS SALES_AMOUNT,
      0.0 AS SALES_QUANTITY, -- constant values
      F.YEAR, F.DEALER, F.CAR_COMPANY, D.PRODUCT_DETAIL, D.SALES_DESCRIPTION
    FROM SALES_DETAIL F
    LEFT OUTER JOIN :SINGLE_DIM D
      ON F.YEAR = D.YEAR AND
         F.DEALER = D.DEALER AND
         F.CAR_COMPANY = D.CAR_COMPANY
    GROUP BY F.YEAR, F.DEALER, F.CAR_COMPANY, D.PRODUCT_DETAIL, D.SALES_DESCRIPTION;

  -- "star schema" #2
  F2 =
    SELECT
      0.0 AS SALES_AMOUNT, -- constant values
      SUM(SALES_QUANTITY) AS SALES_QUANTITY,
      F.YEAR, F.DEALER, F.CAR_COMPANY, D.PRODUCT_DETAIL, D.SALES_DESCRIPTION
    FROM SALES_QUANTITY F
    LEFT OUTER JOIN :SINGLE_DIM D
      ON F.YEAR = D.YEAR AND
         F.DEALER = D.DEALER AND
         F.CAR_COMPANY = D.CAR_COMPANY
    GROUP BY F.YEAR, F.DEALER, F.CAR_COMPANY, D.PRODUCT_DETAIL, D.SALES_DESCRIPTION;

  -- Union with Constant Values (UCV)
  var_out =
    SELECT
      SUM(SALES_AMOUNT) AS SALES_AMOUNT,
      SUM(SALES_QUANTITY) AS SALES_QUANTITY,
      YEAR, DEALER, CAR_COMPANY, PRODUCT_DETAIL, SALES_DESCRIPTION
    FROM
    (
      SELECT * FROM :F1 UNION ALL
      SELECT * FROM :F2
    )
    WHERE
      PRODUCT_DETAIL IS NOT NULL AND -- adding these filters simulates INNER join
      SALES_DESCRIPTION IS NOT NULL
    GROUP BY YEAR, DEALER, CAR_COMPANY, PRODUCT_DETAIL, SALES_DESCRIPTION
    ORDER BY YEAR, DEALER, CAR_COMPANY;

  -- check the result
  SELECT 'UCV' AS MODEL_DESIGN, * FROM :var_out;

  -- compare against inner join approach
  SELECT
    'INNER JOIN' AS MODEL_DESIGN,
    SUM(SALES_AMOUNT) AS SALES_AMOUNT,
    SUM(SALES_QUANTITY) AS SALES_QUANTITY,
    T1.YEAR, T1.DEALER, T1.CAR_COMPANY, T1.PRODUCT_DETAIL, T2.SALES_DESCRIPTION
  FROM SALES_DETAIL T1
  INNER JOIN SALES_QUANTITY T2
    ON T1.YEAR = T2.YEAR AND
       T1.DEALER = T2.DEALER AND
       T1.CAR_COMPANY = T2.CAR_COMPANY
  GROUP BY T1.YEAR, T1.DEALER, T1.CAR_COMPANY, T1.PRODUCT_DETAIL, T2.SALES_DESCRIPTION
  ORDER BY YEAR, DEALER, CAR_COMPANY;

END;

CALL UCV_EX;

former_member184768
Active Contributor
0 Kudos

Hi All,

Trying to put forward a completely different point of view:

I am still not convinced that the UCV is the right approach. Here we are forcefully trying to fit a union by creating the single dimension table. That can be acceptable in the car dealer scenario since the example tables are small, but in real-life scenarios with large tables, such a single dimension table might be as big as the fact tables themselves.

From a purely database perspective, I would expect HANA to allow me to perform table joins with acceptable performance. If the performance is slow, then SAP should fix it. Unfortunately we end up creating workarounds for simple business requirements and spending a lot of time and effort maintaining those workarounds. At times it gets hard to convince business users why their simple requirements cannot be met due to limitations of the technology.

So I think Rajarshi should raise an OSS message, take this example to SAP, and let them figure out a way to optimize the HANA engine to allow large table joins. Maybe SAP will make it available in SPS08 or 09.

Regards,

Ravi

lbreddemann
Active Contributor
0 Kudos

Hi Ravi

I wholeheartedly agree that bugs should be raised with SAP support to get them fixed, and that product limitations which restrict the usability of the platform for business scenarios should be addressed as well.

No question about that.

And SAP HANA really is a prime example of this actually happening.

In fact, the adaptations, bug fixes and enhancements that are built into SAP HANA based on direct customer feedback are delivered so often that some customers cannot keep up and need to skip revisions rather regularly.

Sure enough, if you're working on one super-large project that by its very design challenges the boundaries of SAP HANA, you'll sometimes feel that your requests and requirements take too long to get into the platform.

Still, there's no other platform available that provides these feature turnaround times.

Also, it must always be checked and analyzed closely whether remodeling the solution isn't the better option.

Just taking a model (often with preconceptions about how to properly model on system abc still in mind) and implementing it 1:1 in SAP HANA will not deliver the desired performance.

Sometimes an adjustment in the modeler's point of view is required.

- Lars

lbreddemann
Active Contributor
0 Kudos

Rajarshi Muhuri wrote:

I had long since looked at John Appleby's blog and took two ideas:

1. partitioning the table

2. turning off delta merge and manually merging the data while exiting the SP.

Sorry, but point 2 is very likely not leading anywhere.

Your code does not wait for the merge to finish - merges don't block your queries/transactions.

They just don't.

What they could block is, well, other merges.

You wrote earlier that writing your data to the target table takes so long.

One way you could approach this would be to store the data in a temporary table first and copy the content into the final table afterwards in one go.
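A minimal sketch of that staging pattern against the procedure posted above (the staging table name is made up; untested):

-- empty staging table with the same structure as the final target
CREATE COLUMN TABLE CPM_SAP.ZZFSL_STAGING
  AS (SELECT * FROM CPM_SAP.ZZFSL_CE11_REV_JOIN) WITH NO DATA;

-- inside the chunking loop, write each chunk to the staging table instead
INSERT INTO CPM_SAP.ZZFSL_STAGING (SELECT * FROM :VAR_JOIN);

-- after the loop, copy everything into the final table in one go
INSERT INTO CPM_SAP.ZZFSL_CE11_REV_JOIN (SELECT * FROM CPM_SAP.ZZFSL_STAGING);
DROP TABLE CPM_SAP.ZZFSL_STAGING;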

If you run into memory issues during the merge  due to the size of the table, then partitioning it might be a viable approach to overcome this (however, you might see increased column unloading/loading then).

- Lars

Former Member
0 Kudos

I'm not with you here, Ravi. Joins work well, but they are mathematically expensive with very large data models. Go and try to do this join on Oracle - it will collapse in a heap! HANA makes things possible... but it doesn't mean they are a good idea!


UCV is mathematically much simpler, and so runs faster with larger volumes. That's life, and if you want super-fast aggregation and join performance then you need to normalize your model and use a UCV.

You choose your design - normalize your model and use a UCV, or deal with the performance hit of doing an expensive multi-column join.

HANA can't change the laws of physics!

John

Former Member
0 Kudos

Hey Lars,

We might be talking at cross purposes, but I find that controlling the partition and delta merge process can be very helpful.

If you are loading large volumes of data then you need to make sure that your delta store doesn't get too big, for sure. I've had problems where I tried to load tons of data into lots of small partitions and the delta store RAM grew larger than the available RAM, because I didn't do any interim merges.

So I try to set up a partition strategy where I load one partition at a time from CSV and do a manual delta merge between CSV file loads. So:

- Partition by RANGE(YEAR)

- LOAD 2011, MERGE

- LOAD 2012, MERGE

- LOAD 2013, MERGE

Using this strategy you get smoother loads and sometimes up to 2x performance increase.

In addition, you may find that if load performance is critical, increasing the number of partitions that can be written to in parallel helps. You get an increase with up to 10 partitions in a hash.

- Partition by HASH(MEASURE), RANGE(YEAR)

- LOAD 2011, MERGE

- LOAD 2012, MERGE

- LOAD 2013, MERGE

If you have multiple nodes, then HASH partitioning with 1-3 partitions per node can be beneficial.

Having more partitions that are always accessed can negatively impact query performance, because HANA has to reassemble the data from the partitions. Swings and roundabouts!
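To make the strategy concrete, the DDL could look roughly like this (using the SALES_DETAIL example table from earlier and an invented hash column; the exact ranges are up to you):

-- single-level range partitioning by year
ALTER TABLE SALES_DETAIL PARTITION BY RANGE (YEAR)
  (PARTITION 2011 <= VALUES < 2012,
   PARTITION 2012 <= VALUES < 2013,
   PARTITION 2013 <= VALUES < 2014,
   PARTITION OTHERS);

-- load one year's CSV, then merge that delta before loading the next year
MERGE DELTA OF SALES_DETAIL;

-- multi-level variant: hash first for parallel writes, then range by year
ALTER TABLE SALES_DETAIL PARTITION BY HASH (DEALER) PARTITIONS 4,
  RANGE (YEAR) (PARTITION 2011 <= VALUES < 2012, PARTITION OTHERS);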


John

former_member184768
Active Contributor
0 Kudos

Agreed with John. Although delta merge and partitioning were not the original topic, that doesn't mean the discussion should be restricted.

Recently we saw a significant improvement in the data loading process on BW on HANA by partitioning not only the active data tables, but also the change log and new data tables. The delta merges were comparatively smaller and we did not face out-of-memory situations, as we did before partitioning.

Regards,

Ravi

former_member184768
Active Contributor
0 Kudos

Well, here is the scenario.

We have one transactional application on SAP ERP which caters to most of the big markets in the world, and equivalent applications on non-SAP platforms for smaller / budget-conscious countries. Both applications capture KPIs at document level. The master data is maintained in ERP as well as in the non-SAP application, with a mapping maintained in the ERP application. These applications source data to the BW landscape. We calculate some KPIs in SAP BW which depend on the KPIs coming from the ERP and non-SAP applications. In the current Oracle-based BI landscape, the calculated KPIs are persisted by a weekly job executed on the weekend. Any change in any of the KPIs in the SAP or non-SAP application, missed or back-dated transactions, or clean-up of the error log results in re-validation of the calculated KPIs and hence re-generation of the persisted KPIs. This ends up as heavy data processing, impacting the data availability SLAs.

Now comes HANA with the promise of on-the-fly calculation, no need for redundant persistence, reduced application complexity, focus on business not technology, etc. This is how it was sold to senior IT management and this is how it was perceived by the IT architecture team. So we proposed an architecture with an on-the-fly join, since the KPIs are volatile and the business wants the calculated KPI information at document level. The data volume from both the SAP and non-SAP applications is high, hence the high-volume data-join scenario. It is practically NOT possible to model it as a UNION due to the complexity of deriving the KPIs on a single level. Any de-normalization of the underlying data model to get both data sets to the same attribute level for a union would result in significantly high effort and data redundancy.

But then we faced huge performance issues. We tried all possible combinations with help from HANA experts from SAP, but could not get the required performance from the join model. We had to go back to the previous persistence model with all the known issues of data discrepancy and the re-validation process.

I am not saying HANA is the solution to everything, but it surely has the potential to deliver what was promised, and plenty of opportunity to improve in the coming days. It is a very promising and exciting technology to work on, but at the same time the expectations that have been set are a bit high.

So finally, the point is: union is great, definitely improves performance, and is highly recommended for most business scenarios. At the same time, the expectation for the join is that it should be possible with acceptable performance. I'm not expecting it to happen tomorrow, but it should be made available in the near future.

Regards,

Ravi

Former Member
0 Kudos

Right, that's a great point: only those partitions where there are changes will need to be merged, which massively reduces the merge cost for large tables with a good partition strategy.

John

lbreddemann
Active Contributor
0 Kudos

Hey there,

I didn't want to imply that partitioning and synchronizing data loads with merge activities (like we do in SAP BW with the smart merge) don't help to improve overall system resource usage and performance.

But for the example given I highly doubt that the actual merge process was adding anything to the runtime.

- Lars

Former Member
0 Kudos

Got it, I thought we might be at cross purposes.

Hmm, if you include a merge and an automerge hasn't finished, then it would add to the runtime? Otherwise you could let the merge carry on in the background.

However I have found that doing LOAD -> MERGE -> BIG QUERY is usually faster than doing LOAD -> BIG QUERY because the query can be slow if there is a large delta store that hasn't completed a merge.

John

former_member184768
Active Contributor
0 Kudos

I agree with this too. Our experience was that even when HANA decides a merge is not required because the delta size has not reached the threshold set by the system (whatever formula HANA uses to decide the delta size for a merge), a forced delta merge still improved query performance.

Regards,

Ravi

lbreddemann
Active Contributor
0 Kudos

John, spot on with the observation.

Clearly, running over a huge delta store is less efficient than reading the same data from the highly compressed main store.

That's the reason for having the merge in the first place...

The thing with the manual merge is that, as opposed to the auto merge, your transaction does wait for it to return (also, it doesn't do

As there can usually only be 2 parallel merges going on in the system in total, this could mean that it will even have to wait for unrelated merges to finish.

As I see it, the idea here is that if you use MERGE or SMART MERGE you basically accept all that for the sake of reducing the number of unnecessary 'in-between' merges.

Clearly, in straightforward mass-load scenarios big improvements are possible.

You could tune even further by loading directly to the correct host in a multi-node system and/or setting up indexes/constraints only after data loading.

Hmm... so many knobs to turn and buttons to push, and so little time...

Cheers, Lars

lbreddemann
Active Contributor
0 Kudos

Hey Ravi,

that's the thing with one-size-fits-all heuristics:

The merge decision rules clearly can only aim at evening out the performance impact of performing the delta merge over the whole system.

It will always be possible to find cases where we will know better.

But it's very easy to go wrong when trying to outsmart the heuristics. E.g. putting forced merges into parallel loading threads for the same object doesn't scale and doesn't improve things.

And in my experience, people tend to overlook the impact of their optimizations on the whole system, and what happens when they are used in parallel.

So, call me conservative, but I prefer to be reluctant with these kinds of optimizations.

Also: they are so 'invisible', if you know what I mean.

Two months from now I won't remember what I coded this morning, and things like custom merge schemes really don't stick out when I look at the system.

Hmm... this discussion really branched out from the original thread. Can't say I dislike it, though.

Cheers, Lars

rajarshi_muhuri
Active Participant
0 Kudos

Individual optimization is always more efficient than generic whole-system optimization. For example, in the C language programmers can allocate and de-allocate a chunk of memory for a process, whereas more modern languages do automatic garbage collection and memory management; but nothing is tighter than optimized C code.

However, reading Lars' comment that the system waits for an unrelated merge to finish before it starts another merge thread (during a manual merge) makes me think that manual merges are pointless, as SLT is constantly merging different SAP tables.

On my SP's target table I had not put any constraints at the table level either, so that the data inserts don't have to waste effort doing integrity checks. When the data load is finished I add the constraints manually with an ALTER command.
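(Purely illustrative - the constraint name and key columns below are invented - but the deferred-constraint idea looks like this:)

-- load first, then add the key once, instead of checking it on every insert
ALTER TABLE CPM_SAP.ZZFSL_CE11_REV_JOIN
  ADD CONSTRAINT PK_ZZFSL PRIMARY KEY (REFDOCNR, ZZBUZEI, ZPERIOD);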

I know partitioning definitely helps performance; I am not sure now about exclusive locks or manual merges. But I guess John Appleby has done some comparison studies on really BIG data. Our DEV and QA have smaller data sets, so manual merges actually hinder performance there.

10muscat
Explorer
0 Kudos

What if you put an AGGREGATION after the union?

lbreddemann
Active Contributor
0 Kudos

Hi there,

Indeed, this is a typical situation with DB development in SAP HANA.

There are large tables that are joined with other large tables and/or with information models (in this case an analytic view) and they should produce a large joined result set.

One key point has been overlooked here: to get the super short response times, it's absolutely required to reduce the amount of data that is worked upon as early as possible.

Very likely the usage scenario for the result of this join will be further analysis steps - not the pure list output.

If this is the case, then the models need to be created appropriately. Most of the time, the 'one-model-fits-all-reports' approach won't lead to satisfying results.

To really gain performance for your design, let us know the reporting/analysis requirement(s) you want to solve. Based on that one or several models can be created to accommodate them.

- Lars

Former Member
0 Kudos

Can you paste the table DDL, along with some sample data that we could explode into a big dataset? How big are the tables? What is the performance of the join and the timing of this procedure?

This is a fairly typical challenge with HANA. A few comments:

- It surprises me that a view can't be made to perform well

- CE Functions will typically improve the performance of this sort of code

- Even so, your insert cost will be substantial if there are a lot of rows - you should be able to get 1m rows/sec at best

- Consider partitioning your target table by TIMESTAMP/PERIOD so that your MERGE DELTA doesn't affect the same data that is already stored. This will massively increase INSERT/MERGE DELTA performance

- You need to nest your SQLScript functions so there are no scalars in your SELECT statements. This code will all run in a single thread, which isn't helping

John

rajarshi_muhuri
Active Participant
0 Kudos
CE11000 table count = 365,501,068
Z_SP_FIN table count = 349,816,890
We have to join the two tables; we take some 10+ dimensions and 2 measures from table CE11000, and 15+ dimensions and 4 measures from the Z_SP_FIN table.
The join fields are:
  1.                         Z."ZZBUZEI_RPOSN"=C."ZZBUZEI_RPOSN" AND  
  2.                         Z."REFDOCNR"=C."RBELN" AND  
  3.                         Z."AWORG"=C."COPA_AWORG" AND  
  4.                         Z."RBUKRS"=C."BUKRS" AND 
  5.                         Z."PERIOD_ID"=C."PERIOD_ID" AND 
  6.                         Z.RACCT = C.KSTAR

We have a reconciliation process where I use UNION with constants, as we need company, period, RACCT and all 5 measures. Since the join fields are the only ones used in that final report, I was able to do a union (with a trick using an additional column holding 1 or 0) to get the same results as a left outer join - see the sketch below.
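Roughly, that flag trick looks like the following sketch (simplified to a few columns, with made-up table-variable names):

-- each branch carries a constant flag marking which table the row came from
FLAGGED =
  SELECT RBUKRS, PERIOD_ID, RACCT, HSL, 0.0 AS VV005, 1 AS LEFT_FLAG
  FROM   :LEFT_SIDE
  UNION ALL
  SELECT BUKRS, PERIOD_ID, KSTAR, 0.0 AS HSL, VV005, 0 AS LEFT_FLAG
  FROM   :RIGHT_SIDE;

-- after aggregation, keeping only groups with at least one left-side row
-- reproduces the left-outer-join result at this level of detail
RESULT =
  SELECT RBUKRS, PERIOD_ID, RACCT, SUM(HSL) AS HSL, SUM(VV005) AS VV005
  FROM   :FLAGGED
  GROUP BY RBUKRS, PERIOD_ID, RACCT
  HAVING SUM(LEFT_FLAG) > 0;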

But in the other process we need dimensions and measures on the same line from both tables, so a union with constants would not suffice.

John A: This is a process created in March 2012. Even though it's slow, and there have been many improvements in HANA since, I never got around to changing it, except for a slightly more optimized SP that is used when smaller increments are run; when we pick up data from 2011 to date, this SP is used.

I can share garbled data with you by email, along with the model metadata. Let me know. My mail is rajarshi dot muhuri at Accenture dot com