on 02-03-2016 10:05 PM
Hi there,
I'm working on a job which need to load four datasource by order and any one of them failed, the job should still continually go forward to the end.
In order to do that, I use a series of workflows which connected sequentially. Within each of workflow, there is one dataflow to load one data source to the target table. The target table is composed by those four datasource, so, for each of dataflow, I have to use Table_Comparison to make sure the fields are filled based on the primary key. For this approach, it satisfies on those requirements, however, for each of workflow, they are processing for a while. The whole chain takes too much time. Does anyone have idea on how to improve on the performance or any other strategy to handle this case? Thanks a lot!
The screenshot for each of the layer as below
WF chain:
Within each of WF,
Within each of DF
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Dirk,
Actually I have already checked the doc you mentioned above. I ordered the source based on Primary key and then chose the sorted input in T_C. However, it did not help. My data set is not as huge as 1million. In fact, it's just 300000 roughly. I have no idea why the T_C took so long.
Thanks for your help!
Cindy
Hi Dirk,
How the sort pushed down to the source? I check the source by "View data" and found that the source data already sorted by one of PK. They way I implemented is Query-> Order by -> two PKs.And in the T_C, I put the two PKs on the left, and other fields on the right as the compare columns . Is this the correct way?
That seems like the right approach indeed. In fact, that's how it always works for me with larger data volumes.
You mentioned your input is 300K records. Could you post some extra absolute figures?
Also, can you remove the temp table from your data flow? That might be another bottleneck.
Hi Dirk,
I tested the load without T_C. Apparently it went much faster than I used T_C. I also removed the temp table. With T_C, it did not help a lot.
I checked some docs, (actually a lot) many people mentioned index the fields, however, I haven't find more details on how to do it. Would you pl give me some hints on index? Thanks so much!
From my document on Table_Comparison:
Sorted Input:
Often the most efficient solution when dealing with large data sources, because DS reads the comparison table only once. This option can only be selected when it is guaranteed that the incoming data are sorted in exactly the same order as the primary key in the comparison table. In most cases incoming data must be pre-sorted, e.g. using a Query transform with an Order-by (that may be pushed down to the underlying database), to take advantage of this functionality.
From the log, it seems:
Stage your input data in a database table with the correct primary key (you can use the Data_Transfer transform to that extent) and make sure your comparsion table has the same primary key column(s). You're job will termiate in seconds.
Hi Dirk,
Really appreciated your help. The reason I did not put primary key in the conparison table is that one of data source include NULL value in one field of PK. So, I removed the PK in the target table, and use 'Identify key' as the PK. Now, I just took out that datasource and put the PK back on the comparison table.
In addition, I used a stage table to sort the datasource and after that, use the sorted stage table do T_C. However, I got error message on it. It seems like 'Order by' does not work during the first DF. The stage table is still not sorted. Please look at the below .In order to solve this error, I have to put another Query in front of T_C ( Order by one more time). The error has gone. But the no improvement on the performance compare with the old solution. Would you know where I did wrong? Thanks again!
Sorting in the first data flow is useless. You need the Order by in the 2nd, before the TC.
Can you try and do so? And post the generated SQL (from DS Designer menu: Validation > Display Optimized SQL...)?
Set the Degree of Parallellism (in Dataflow Properties) to 1, run the job and post the Monitor output again, too, if you want.
Hi Dirk,
Thanks for your patience. I added the Order by in the second DF, before the TC. and also change the Degree of Parallellism to 1. Unfortunately, the performance has not been improved. Checked the generated SQL, it looks like below,
in the first DF, load to Stage table
SELECT MATNR , WERKS , PSTAT , LVORM , BWTTY , XCHAR , MMSTA , MMSTD , MAABC , (.................all of fields) ODQ_CHANGEMODE , ODQ_ENTITYCNTR
FROM PLANT_ATTR_PJ_ECC_DATA_SQL
in the second DF, two SQLs
SELECT "TCRdr_4"."MATNR" , "TCRdr_4"."WERKS" , "TCRdr_4"."PSTAT" , "TCRdr_4"."LVORM" , "TCRdr_4"."BWTTY" , "TCRdr_4"."XCHAR" , "TCRdr_4"."MMSTA" (xxxx..................................................................................................................................................................................................................................................................................................)
FROM "SAPSTG"."MaterialPlant" "TCRdr_4"
ORDER BY "TCRdr_4"."MATNR" COLLATE Latin1_General_BIN ASC , "TCRdr_4"."WERKS" COLLATE Latin1_General_BIN ASC
SELECT "StagingPlantAttr"."MATNR" , "StagingPlantAttr"."WERKS" , "StagingPlantAttr"."PSTAT" , "StagingPlantAttr"."LVORM" , "StagingPlantAttr"."BWTTY" , "StagingPlantAttr"."XCHAR" , "StagingPlantAttr"."MMSTA" , (xxxxxx..................................................................................................................)
ORDER BY "StagingPlantAttr"."MATNR" COLLATE Latin1_General_BIN ASC , "StagingPlantAttr"."WERKS" COLLATE Latin1_General_BIN ASC
Thanks again!
Dirk,
I have replaced the 'Query' in the first DF with 'Data_Transfer'. It looks a little bit better. But overall, it's still around 30 mins for the two data source. I attached the monitor, Apparently, the fist DF which load the data into staging table(use Data Transfer) took a while. would you pl suggest on any more improvement if possible? Really appreciated!
Yes, from that log output it seems the extraction process is terribly slow. That probably means there are no issues with the TC and the sorts are OK.
It's your BCE that's causing the problem. There's nothing you can do in DS about this. You should check with your Basis team whether they can improve the extractor performance.
Did you tried with Auto Correct Load option?
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi
1. make 2 separate flows, the first flow is Source table >Query> target table, and the second is with TC. By doing so the first flow will be pushed down to DB, thus no Job server is involved in calculations.
2. How do you to optimize TC transform? This can not be pushed down to the DB, thus local resources are used. It depends on Source and target table, if both are properly indexed according to your Comparison options, then performance should be fine. The best bet here would be to avoid TC transform and simpy use script instead, which compares your 2 tables, seeing that there is no SCD2 needed, then this is very easy to maintain.
Custom scripts are always faster and better than predefined transformations.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You can also try to use sorted input in TC (remember to sort the data before the TC )
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Can you let me know what is the count of the Target table for all the data sources. If Amount of data is not too huge then try using 'Cached Comparison Table ' in Table Comparison.
Performance is okay as long as the amount of data is not too huge.
Regards
Arun Sasi
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
93 | |
10 | |
10 | |
9 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.