cancel
Showing results for 
Search instead for 
Did you mean: 

Incremental Key in Group

Former Member
0 Kudos

Hi,

I need help to get the solution of following scenario:

I have table with 4 primary key : PK1,PK2,PK3,PK4.

Input table :

PK1PK2PK3PK4
T1abc
T1bcd
T2bcd
T2abc

1st load I need to generate the count as per PK1.

PK1PK2PK3PK4Count
T1abc1
T1bcd2
T2bcd1
T2abc2

2nd load will be delta load,

If i will get input table like :

PK1PK2PK3PK4
T1def
T2bcd
T2abc

I should get output like:

PK1PK2PK3PK4Count
T1abc1
T1bcd2
T1def3
T2bcd1
T2abc2

Any help will be much appreciated.

Thanks in advance.

Warm Regards,

Amit

Accepted Solutions (1)

Accepted Solutions (1)

former_member187605
Active Contributor
0 Kudos

That's a nice one . I solve this in 3 data flows.

1/. Load INPUT table into OUTPUT table. Add CNT column, use gen_row_num_by_group(PK1) to populate CNT.

2/. Load DELTA table into TEMP table. Use Table_Comparison (PK1, PK2, PK3 & PK4 as Input primary key columns, no Compare columns) to compare with OUTPUT and Map_Operation (Update > Discard, Insert > Normal) to get rid of already existing records in output. Make sure there are no primary key columns in TEMP (you may have duplicates).

3/. Extract 2 columns, PK1, CNT from OUTPUT, group by PK1 and map CNT to max(CNT). Rename CNT column to MaxCNT in output schema. Join that result with TEMP on PK1. Map maxCNT + gen_row_num_by_group(PK1)  to CNT.

Former Member
0 Kudos

Hi Dirk,

Thanks for your input.

I implemented as you suggested but I have issue as I am getting  data from the same source table and I need to implement Delta logic for second load but need to keep the old data as well.

Data Flow1) First time when I will load data in OUTPUT table next time this id should not be changed.So either I have to exclude this table in next load or include table comparison but how we will deal this load?.

DataFlow 2) There is no separate Delta table its same table but next data load may be you can say all the data included some new data.

Data Flow 3) We should merge this two table?.

Thank you very much.

-Amit

former_member187605
Active Contributor
0 Kudos

1/. No change. You only run this once, for initial load only.

2/. Load INPUT table into TEMP table. Use Table_Comparison (PK1, PK2, PK3 & PK4 as Input primary key columns, no Compare columns) to compare with OUTPUT and Map_Operation (Update > Discard, Insert > Normal) to get rid of already existing records in input and output. Make sure there are no primary key columns in TEMP (you may have duplicates).

You may improve performance if you can extract those records only that are new in INPUT, e.g. using a last_modification timestamp. If not, use Sorted input option in Table_Comparison.

3/. No Change

Former Member
0 Kudos

Hi Dirk,

For First and Second load its producing correct result.

1) INPUT -----> OUTPUT

2) INPUT--->Table Comparison(OUTPUT)--->Map Operation---->TEMP

3) OUTPUT + TEMP ----> FINAL.

but in Third time data load

Now as you suggested of not using the OUTPUT table for next load. when we will run second dataflow it will compare record with OUTPUT table which has data of 1st dataload and  2nd data load records will be new for this and it will again load that data.

I am not sure if I am missing something here.

Please let me know how to deal with this situation.

Thanks

-Amit

former_member187605
Active Contributor
0 Kudos

Where did I suggest not using OUTPUT for next loads? I didn't.

What you're missing is the output of the 3rd data flow (my mistake, I forgot to mention that explicitly, I am sorry): that's the same OUTPUT table (there's no separate FINAL table!).

3/. Extract 2 columns, PK1, CNT from OUTPUT, group by PK1 and map CNT to max(CNT). Rename CNT column to MaxCNT in output schema. Join that result with TEMP on PK1. Map maxCNT + gen_row_num_by_group(PK1)  to CNT. Map PK1, PK2, PK3 & PK4 from TEMP. Write to OUTPUT.


Run data flow 1 only once. Then you can repeat flows 2 & 3 as often as you want.

Former Member
0 Kudos

Thanks Dirk,

It worked....

Answers (0)