on 06-21-2016 8:03 AM
Hi,
I need help with the following scenario:
I have a table with a 4-column primary key: PK1, PK2, PK3, PK4.
Input table:

PK1 | PK2 | PK3 | PK4
--- | --- | --- | ---
T1 | a | b | c
T1 | b | c | d
T2 | b | c | d
T2 | a | b | c
On the 1st load, I need to generate a running count per PK1:

PK1 | PK2 | PK3 | PK4 | Count
--- | --- | --- | --- | ---
T1 | a | b | c | 1
T1 | b | c | d | 2
T2 | b | c | d | 1
T2 | a | b | c | 2
The 2nd load will be a delta load. If I get an input table like:

PK1 | PK2 | PK3 | PK4
--- | --- | --- | ---
T1 | d | e | f
T2 | b | c | d
T2 | a | b | c
I should get output like:

PK1 | PK2 | PK3 | PK4 | Count
--- | --- | --- | --- | ---
T1 | a | b | c | 1
T1 | b | c | d | 2
T1 | d | e | f | 3
T2 | b | c | d | 1
T2 | a | b | c | 2
Any help will be much appreciated.
Thanks in advance.
Warm Regards,
Amit
That's a nice one. I'd solve this in 3 data flows.
1/. Load INPUT table into OUTPUT table. Add CNT column, use gen_row_num_by_group(PK1) to populate CNT.
2/. Load DELTA table into TEMP table. Use Table_Comparison (PK1, PK2, PK3 & PK4 as Input primary key columns, no Compare columns) to compare with OUTPUT and Map_Operation (Update > Discard, Insert > Normal) to get rid of already existing records in output. Make sure there are no primary key columns in TEMP (you may have duplicates).
3/. Extract 2 columns, PK1 and CNT, from OUTPUT; group by PK1 and map CNT to max(CNT). Rename CNT to MaxCNT in the output schema. Join that result with TEMP on PK1. Map MaxCNT + gen_row_num_by_group(PK1) to CNT.
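Outside Data Services, the counting logic of flow 1 can be sketched in plain Python (illustrative only, not Data Services syntax; `gen_row_num_by_group(PK1)` is emulated with a per-group counter, and the table is a list of tuples):

```python
from collections import defaultdict

def initial_load(input_rows):
    """Flow 1: assign a running count per PK1 group, emulating
    gen_row_num_by_group(PK1) in Data Services."""
    counters = defaultdict(int)
    output = []
    for pk1, pk2, pk3, pk4 in input_rows:
        counters[pk1] += 1  # count restarts at 1 for each PK1 value
        output.append((pk1, pk2, pk3, pk4, counters[pk1]))
    return output

rows = [("T1", "a", "b", "c"),
        ("T1", "b", "c", "d"),
        ("T2", "b", "c", "d"),
        ("T2", "a", "b", "c")]
print(initial_load(rows))
# [('T1', 'a', 'b', 'c', 1), ('T1', 'b', 'c', 'd', 2),
#  ('T2', 'b', 'c', 'd', 1), ('T2', 'a', 'b', 'c', 2)]
```

Note that in Data Services, gen_row_num_by_group restarts whenever the group value changes in the incoming stream, so the input should be sorted by PK1; the dictionary counter above happens to tolerate interleaved groups as well.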
Hi Dirk,
Thanks for your input.
I implemented it as you suggested, but I have an issue: I am getting the data from the same source table, so I need to implement the delta logic for the second load while keeping the old data as well.
Data Flow 1) The first time I load data into the OUTPUT table, the assigned counts should not change afterwards. So either I have to exclude this flow from the next load or add a table comparison, but how do we handle that load?
Data Flow 2) There is no separate DELTA table; it's the same source table, but the next load may contain all the old data plus some new rows.
Data Flow 3) Should we merge these two tables?
Thank you very much.
-Amit
1/. No change. You only run this once, for initial load only.
2/. Load INPUT table into TEMP table. Use Table_Comparison (PK1, PK2, PK3 & PK4 as Input primary key columns, no Compare columns) to compare with OUTPUT and Map_Operation (Update > Discard, Insert > Normal) to get rid of already existing records in input and output. Make sure there are no primary key columns in TEMP (you may have duplicates).
You may improve performance if you can extract only the records that are new in INPUT, e.g. using a last_modification timestamp. If not, use the Sorted input option in Table_Comparison.
3/. No Change
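Continuing the Python sketch from above (again illustrative only), flow 2 — the Table_Comparison with all four columns as input primary key and no compare columns, followed by the Map_Operation — amounts to a set-membership filter against OUTPUT, assuming OUTPUT rows carry CNT as a fifth element:

```python
def delta_to_temp(input_rows, output_rows):
    """Flow 2: discard rows whose (PK1, PK2, PK3, PK4) already exist
    in OUTPUT (Update > Discard); keep only new rows (Insert > Normal)."""
    existing = {row[:4] for row in output_rows}  # key columns of OUTPUT
    return [row for row in input_rows if row not in existing]

output = [("T1", "a", "b", "c", 1), ("T1", "b", "c", "d", 2),
          ("T2", "b", "c", "d", 1), ("T2", "a", "b", "c", 2)]
# Second load: the full source again, old rows plus one new row.
delta = [("T1", "d", "e", "f"),
         ("T2", "b", "c", "d"),
         ("T2", "a", "b", "c")]
print(delta_to_temp(delta, output))  # [('T1', 'd', 'e', 'f')]
```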
Hi Dirk,
For the first and second loads it's producing the correct result.
1) INPUT -----> OUTPUT
2) INPUT--->Table Comparison(OUTPUT)--->Map Operation---->TEMP
3) OUTPUT + TEMP ----> FINAL.
but on the third data load:
Now, as you suggested not using the OUTPUT table for the next load, when we run the second data flow it will compare records against the OUTPUT table, which only holds the data from the 1st load; the records from the 2nd load will look new to it, so it will load that data again.
I am not sure if I am missing something here.
Please let me know how to deal with this situation.
Thanks
-Amit
Where did I suggest not using OUTPUT for the next loads? I didn't.
What you're missing is the target of the 3rd data flow (my mistake, I forgot to mention it explicitly, I am sorry): it's the same OUTPUT table (there's no separate FINAL table!).
3/. Extract 2 columns, PK1 and CNT, from OUTPUT; group by PK1 and map CNT to max(CNT). Rename CNT to MaxCNT in the output schema. Join that result with TEMP on PK1. Map MaxCNT + gen_row_num_by_group(PK1) to CNT. Map PK1, PK2, PK3 & PK4 from TEMP. Write to OUTPUT.
Run data flow 1 only once. Then you can repeat flows 2 & 3 as often as you want.
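Putting the correction together in the same Python sketch (illustrative only): flow 3 takes max(CNT) per PK1 from OUTPUT, numbers the TEMP rows per group starting at MaxCNT + 1, and appends them to the same OUTPUT table. A PK1 value that is new in TEMP simply starts counting at 1, which the defaulted lookup below handles:

```python
def append_new_rows(output_rows, temp_rows):
    """Flow 3: continue the per-PK1 count from max(CNT) in OUTPUT
    and write the newly numbered TEMP rows back into OUTPUT."""
    counters = {}
    for pk1, _, _, _, cnt in output_rows:
        counters[pk1] = max(counters.get(pk1, 0), cnt)  # MaxCNT per PK1
    for pk1, pk2, pk3, pk4 in temp_rows:
        # MaxCNT + gen_row_num_by_group(PK1); unseen PK1 starts from 0
        counters[pk1] = counters.get(pk1, 0) + 1
        output_rows.append((pk1, pk2, pk3, pk4, counters[pk1]))
    return output_rows

output = [("T1", "a", "b", "c", 1), ("T1", "b", "c", "d", 2),
          ("T2", "b", "c", "d", 1), ("T2", "a", "b", "c", 2)]
temp = [("T1", "d", "e", "f")]  # survivors of flow 2
print(append_new_rows(output, temp)[-1])  # ('T1', 'd', 'e', 'f', 3)
```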