
How to fill a table column with a value from other column of same table

sksaxena
Participant
0 Kudos

Hi Experts,

This is something tricky, and a few possible approaches have not worked out.

SCENARIO :

I have a table SFE_EXP_BODS_F. Suppose there are 5 columns in table :

1. DIVISION_NAME (varchar)

2. SPECIALITY_NAME (varchar)

3. CALDATE (date)

4. TIMES_EXPOSED (int)

5. PY_TIMES_EXPOSED (int) --> to be filled

CALDATE contains values from 1 Apr 2014 to 31 Mar 2016, in monthly blocks (like 2014-04-01, 2014-05-01).

The first four columns come from the previous transformation, but PY_TIMES_EXPOSED is calculated here in this query transform (via a new output column).

LOGIC to be implemented for PY_TIMES_EXPOSED :

This column should get the same value as the TIMES_EXPOSED column from exactly one year earlier.

This means that if a record arrives in this query transform with CALDATE = 2015-04-01, PY_TIMES_EXPOSED should get the value of the TIMES_EXPOSED column from the row of this table where CALDATE was 2014-04-01 and all remaining fields equal the present record's values.

If a record comes with CALDATE < 2015-04-01, PY_TIMES_EXPOSED should be NULL, because the data starts from 01 Apr 2014.
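The prior-year rule above can be sketched outside Data Services as a self-join. Here's a minimal illustration in Python with an in-memory SQLite table (names simplified from the post, only two dimension columns kept; SQLite's date() stands in for Oracle's add_months):

```python
import sqlite3

# Simplified stand-in for SFE_EXP_BODS_F: two dimension columns plus the
# monthly CALDATE block and the measure.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sfe_exp (
    division_name TEXT, speciality_name TEXT,
    caldate TEXT, times_exposed INTEGER)""")
con.executemany("INSERT INTO sfe_exp VALUES (?, ?, ?, ?)", [
    ("D1", "S1", "2014-04-01", 10),
    ("D1", "S1", "2015-04-01", 25),
])

# PY_TIMES_EXPOSED = TIMES_EXPOSED of the row exactly 12 months earlier,
# matched on every dimension column. Rows before 2015-04-01 get NULL
# automatically, because no 2013-14 data exists to join to.
rows = con.execute("""
    SELECT cur.caldate, cur.times_exposed,
           py.times_exposed AS py_times_exposed
    FROM sfe_exp AS cur
    LEFT JOIN sfe_exp AS py
      ON  py.division_name   = cur.division_name
      AND py.speciality_name = cur.speciality_name
      AND py.caldate         = date(cur.caldate, '-12 months')
    ORDER BY cur.caldate
""").fetchall()
print(rows)
```

The left join makes the NULL case fall out for free: any month with no matching row twelve months back simply gets no value.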

IMPLEMENTATION :

I tried devising a logic for this scenario and here's the mapping I used for PY_TIMES_EXPOSED in query transform :

ifthenelse(CALDATE > '2015-03-31',
sql('DRL_ORACLE_Prod', 'SELECT TIMES_EXPOSED from SFE_EXP_F_BODS where CALDATE = \'[add_months(Q_LOOKUP2.CALDATE, -12)]\'
AND DIVISION_NAME = \'[Q_LOOKUP2.DIVISION_NAME]\'
AND SFE_BRAND = \'[Q_LOOKUP2.SFE_BRAND]\'
AND SPECIALITY_NAME = \'[Q_LOOKUP2.SPECIALITY_NAME]\'
AND HQ_CODE = \'[Q_LOOKUP2.HQ_CODE]\'
AND CLASSIFICATION = \'[Q_LOOKUP2.CLASSIFICATION]\''),
NULL)

The other fields seen in the above sql call are just additional dimension fields which need to be matched to fetch TIMES_EXPOSED.

This mapping gives no error in validation and should have worked in my view, but it fails when execution reaches this query transform: 0 records are sent to the target (which follows this query transform) and the job errors out. The error screen is attached. There is no ODBC connectivity issue.

Is there any workaround to achieve this functionality? Please help; it's urgent.

Thanks a ton as always!

Accepted Solutions (1)


Former Member
0 Kudos

Try the lookup_ext function instead of the sql function.

sksaxena
Participant
0 Kudos

Thank you, Shailesh, for the reply. However, I am afraid it didn't work out; the same thing is happening. The output of the final query transform (with the lookup logic) is being processed extremely slowly, about 1 row every 5 seconds, same as the sql logic. Nothing in the output.

Please help on this.

sksaxena
Participant
0 Kudos

Hi ,

The lookup_ext function worked! However, I had to use 'PRE_LOAD_CACHE'; the job doesn't move with 'NO_CACHE'. It takes around 700 seconds to complete 6 million lookups against a lookup table of 8 million records.

I am still searching for a solution that can achieve this a little faster.

Thanks a ton Shailesh. Really appreciate!

Answers (5)


sksaxena
Participant
0 Kudos


I tried using the condition you suggested and it didn't work out. A record which existed only in Jan 2015 didn't get a corresponding record in Jan 2016.

Here are my Join conditions :

Q1 (SFE_EXP_STAGE left join SFE_EXP_STAGE_1):

SFE_EXP_STAGE.CALDATE = add_months(SFE_EXP_STAGE_1.CALDATE, 12) and

SFE_EXP_STAGE.DIVISION_NAME = SFE_EXP_STAGE_1.DIVISION_NAME and

SFE_EXP_STAGE.SFE_BRAND = SFE_EXP_STAGE_1.SFE_BRAND and

SFE_EXP_STAGE.SPECIALITY_NAME = SFE_EXP_STAGE_1.SPECIALITY_NAME and

SFE_EXP_STAGE.HQ_CODE = SFE_EXP_STAGE_1.HQ_CODE and

SFE_EXP_STAGE.CLASSIFICATION = SFE_EXP_STAGE_1.CLASSIFICATION

Q2 (SFE_EXP_STAGE_1 left join SFE_EXP_STAGE):

SFE_EXP_STAGE.CALDATE = add_months(SFE_EXP_STAGE_1.CALDATE, 12) and

SFE_EXP_STAGE_1.DIVISION_NAME = SFE_EXP_STAGE.DIVISION_NAME and

SFE_EXP_STAGE_1.SFE_BRAND = SFE_EXP_STAGE.SFE_BRAND and

SFE_EXP_STAGE_1.SPECIALITY_NAME = SFE_EXP_STAGE.SPECIALITY_NAME and

SFE_EXP_STAGE_1.HQ_CODE = SFE_EXP_STAGE.HQ_CODE and

SFE_EXP_STAGE_1.CLASSIFICATION = SFE_EXP_STAGE.CLASSIFICATION



Request you to review and suggest where we are going wrong.


Thanks

sksaxena
Participant
0 Kudos

Hi Dirk Venken,

I truncated a few tables in the DB and the job ran. Now, here's what I did in the job:

I took a source table X and its replica X_1 for joining.

Before joining, I put a filter on the X_1 dataset. In query Q1, which receives the X_1 input, I put this in the where clause:


X_1.CALDATE = add_months(sysdate (), -12)


Then the Q1 output was joined with the X table: X left join Q1, with the join condition:

X.CALDATE = add_months(Q1.CALDATE, 12) and

X.HQ_CODE = Q1.HQ_CODE and

X.DIVISION_NAME = Q1.DIVISION_NAME and

X.SFE_BRAND = Q1.SFE_BRAND and

X.SPECIALITY_NAME = Q1.SPECIALITY_NAME and

X.CLASSIFICATION = Q1.CLASSIFICATION


The target was filled this time, with PY values for the years 2015 and 2016 only, which looks good. But to check whether all PY column values got filled properly, I am doing a check:


For example, SUM(PY_TIMES_EXPOSED) for Apr 2016 should be equal to SUM(TIMES_EXPOSED) for Apr 2015. Right?


Here I am NOT getting the same values. Trust me, this is a big setback for me.


Please suggest further.


Thank you

former_member187605
Active Contributor
0 Kudos

What makes you think those sums should be the same? From the moment you've got one combination of HQ_CODE, DIVISION_NAME, SFE_BRAND, SPECIALITY_NAME and CLASSIFICATION that doesn't exist in both years, they will differ.

sksaxena
Participant
0 Kudos

Hi

Yes, you're right! The scenario is that I need all three of these types in my target table. A record can be any of these types:

1. A record with a TIMES_EXPOSED value and also a PY_TIMES_EXPOSED value --> for example, a record in May 2015 which also existed in May 2014.

2. A record with a TIMES_EXPOSED value and a null PY_TIMES_EXPOSED value --> for example, a record which appeared for the first time in any month.

3. A record with a null TIMES_EXPOSED value and a non-null PY_TIMES_EXPOSED value --> for example, a record which existed only in the previous year but didn't appear in the succeeding year.

All records from Apr 2014 to Mar 2015 will have some TIMES_EXPOSED value and a null PY_TIMES_EXPOSED value. For every such record, I MUST insert a corresponding record in the target table in Apr 2015 to Mar 2016, with TIMES_EXPOSED and PLAN_EXPOSURE as 0 but PY_TIMES_EXPOSED set to the TIMES_EXPOSED of 2014-15.

With our present left-join logic, we are unable to capture Case 3, so all those records with valid PY values are not appearing in the target table. On the front end, the dashboard compares present values vs. PY values on a month-by-month basis.

In your own words:

If a combination of HQ_CODE, DIVISION_NAME, SFE_BRAND, SPECIALITY_NAME and CLASSIFICATION doesn't exist in both years, we need to insert a record in the succeeding year with the same values in these 5 columns, but with the TIMES_EXPOSED value in the PY_TIMES_EXPOSED column of the succeeding year's record. Measure columns like TIMES_EXPOSED and PLAN_EXPOSURE can be null or 0.


Please help. It has become pretty challenging to build this table correctly. Thanks a ton for your support on this case!

former_member187605
Active Contributor
0 Kudos

You need a full outer join then. See full outer join in data services | SCN for how to build it.

sksaxena
Participant
0 Kudos

Thanks for the helpful information once again!! Here's how my job looks after implementing Full Outer Join :

I have put the same join conditions in both Q1 and Q2. Q1 has SFE_EXP_STAGE left join SFE_EXP_STAGE_1 and Q2 has SFE_EXP_STAGE_1 left join SFE_EXP_STAGE.

I want some clarifications :

1. I am using the same join conditions in both the Q1 & Q2. I mean in Q1 : SFE_EXP_STAGE.CALDATE = add_months(SFE_EXP_STAGE_1.CALDATE, 12). In Q2 : SFE_EXP_STAGE_1.CALDATE = add_months(SFE_EXP_STAGE.CALDATE, 12).

Do we need to change the CALDATE joining condition in Q2?

2. In Q1 output, I am selecting every column from SFE_EXP_STAGE except PY columns. PY_TIMES_EXPOSED = SFE_EXP_STAGE_1.TIMES_EXPOSED

3. In Q2 output, I am selecting every column from SFE_EXP_STAGE_1 except PY columns. Here PY_TIMES_EXPOSED = SFE_EXP_STAGE.TIMES_EXPOSED. I am mapping TIMES_EXPOSED, PLAN_EXPOSURE(measure columns) to 0 here.

I didn't do anything else. I am selecting Distinct in the 'DISTINCT' query transform. I still didn't get the Case 3 record: for example, I didn't get a record in Mar 2016 with PY values for a record which existed in Mar 2015 but doesn't exist in Mar 2016. I have to get this 2016 record with all measure values 0 except the PY columns.

Please help.

former_member187605
Active Contributor
0 Kudos

You must use exactly the same condition in both joins. Obviously.

Use SFE_EXP_STAGE.CALDATE = add_months(SFE_EXP_STAGE_1.CALDATE, 12) twice.

sksaxena
Participant
0 Kudos

Oh! But Dirk, don't we put Left_table.field = Right_table.field in join conditions? Could you please explain this once more? Why are we using the same condition in two joins when our left and right tables are interchanged?

former_member187605
Active Contributor
0 Kudos

Because that's exactly the condition you would use in an inner, a left outer and a right outer join (you want at least the common records in all 3 cases, don't you?). And that's how you simulate a full outer join:

select l.*, r.* from l
full outer join r
on l.id = r.id
and add_months(r.dat,12) = l.dat

is totally equivalent to

select l.*, r.* from l
left outer join r
on l.id = r.id
and add_months(r.dat,12) = l.dat

union

select l.*, r.* from r
left outer join l
on l.id = r.id
and add_months(r.dat,12) = l.dat

In DS design, that means 2 left outer joins with exactly the same condition, and a merge followed by a distinct (=equivalent to the union).
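This equivalence can be checked with a runnable sketch (illustrative Python/SQLite with hypothetical tables l and r; SQLite's date() stands in for Oracle's add_months):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE l (id INTEGER, dat TEXT, v INTEGER);
CREATE TABLE r (id INTEGER, dat TEXT, v INTEGER);
INSERT INTO l VALUES (1, '2016-03-01', 5);  -- this year only
INSERT INTO r VALUES (2, '2015-03-01', 7);  -- last year only
INSERT INTO l VALUES (3, '2016-03-01', 9);  -- present in both years
INSERT INTO r VALUES (3, '2015-03-01', 4);
""")

# UNION of two left outer joins, both with the SAME condition, simulating
# the full outer join: common rows appear once (the union removes the
# duplicate), one-sided rows survive with NULLs on the missing side.
rows = con.execute("""
    SELECT l.id, l.v, r.v FROM l
    LEFT JOIN r ON l.id = r.id AND l.dat = date(r.dat, '+12 months')
    UNION
    SELECT r.id, l.v, r.v FROM r
    LEFT JOIN l ON l.id = r.id AND l.dat = date(r.dat, '+12 months')
    ORDER BY 1
""").fetchall()
print(rows)
```

The row with id 2 is exactly the "existed only in the previous year" case: it only appears because the second left join (from r) contributes it.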

sksaxena
Participant
0 Kudos

Thanks for the elaborate reply.

As mentioned in my reply below, the full outer join also didn't work out. Can you please help with what we are still missing?

former_member187605
Active Contributor
0 Kudos

Some general "Best Practice" advice:

1/. Never use a sql function call in a mapping. This will generate a round-trip to the database for every record that passes through the data flow.

2/. Use lookup instead of lookup_ext whenever possible. It's slightly more efficient. And, most importantly, use the correct cache option (PRE_LOAD_CACHE or DEMAND_LOAD_CACHE) to avoid database round-trips, too!

3/. Test your logic on a limited set of records, not on the full input data set. Until you've got it right. If it works for 10 records, it will probably work for all.

4/. Try to read and understand my answers to a post first. Chances are high that I am giving the right solution.

sksaxena
Participant
0 Kudos

Really informative reply!

Dirk, in my case I am using lookup_ext because I have multiple conditional expressions. This lookup will be performed for around 6 million records, and my lookup table has approximately 8 million records. I am using PRE_LOAD_CACHE. Is that fine, or should I go for DEMAND_LOAD_CACHE?

former_member187605
Active Contributor
0 Kudos

1/. You can use lookup, even with multiple conditional expressions. The function wizard generates one pair only, but you can manually add as many as you like.

Cf. SAP Data Services Reference Guide, section 6.3.91 lookup:

"

Syntax: lookup (<lookup_table>, <result_column>, <default_value>, <cache_spec>,<compare_column>, <expression>)

"

and

"

Note: You can specify more than one <compare_column> and <expression> pair—simply add additional pairs at the end of the function statement. The values must match for all specified pairs in order for the lookup function to find a matching row.

"

2/. If your lookup is 6M records, use DEMAND_LOAD_CACHE.

3/. If you want to speed up your data flow processing, your only option is to use a left outer join, pushed down to the database. If your data flow is complex, use a Data_Transfer transform to store the result of the previous transformations in a temporary table in the same database as your lookup table. Then don't use a lookup function anymore, but left-outer-join your temp table with the (lookup) table, using exactly the same conditions as in the lookup. If your target table is in the same database too, the full logic will be pushed down to the database (and typically runs much faster).

Cf. for more details on these techniques.
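The pushed-down form described in point 3/ boils down to one set-based INSERT ... SELECT with a left outer join, instead of one lookup round-trip per row. An illustrative sketch in Python/SQLite (hypothetical stage and target tables; date() in place of add_months):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stage  (caldate TEXT, hq_code TEXT, times_exposed INTEGER);
CREATE TABLE target (caldate TEXT, hq_code TEXT, times_exposed INTEGER,
                     py_times_exposed INTEGER);
INSERT INTO stage VALUES ('2014-04-01', 'H1', 10), ('2015-04-01', 'H1', 25);
""")

# One statement the database executes entirely by itself; the left outer
# join carries the same conditions the per-row lookup used.
con.execute("""
    INSERT INTO target
    SELECT cur.caldate, cur.hq_code, cur.times_exposed, py.times_exposed
    FROM stage AS cur
    LEFT JOIN stage AS py
      ON  py.hq_code = cur.hq_code
      AND py.caldate = date(cur.caldate, '-12 months')
""")
rows = con.execute("SELECT * FROM target ORDER BY caldate").fetchall()
print(rows)
```

Because source, lookup, and target all live in one database here, no rows ever cross the wire; that is the whole point of the push-down.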

sksaxena
Participant
0 Kudos

Thanks Dirk for the suggestion! Though I wasn't able to figure out the syntax easily from your reply, I finally figured it out by trial and error. Here's the exact way to write multiple conditional expressions:


ifthenelse(CALDATE > '2015-03-31',
lookup(DRL_ORACLE_Prod.DRL_MIS.SFE_EXP_F_BODS, TIMES_EXPOSED, NULL, 'PRE_LOAD_CACHE',
CALDATE, add_months(Q_LOOKUP2.CALDATE, -12),
HQ_CODE, Q_LOOKUP2.HQ_CODE,
DIVISION_NAME, Q_LOOKUP2.DIVISION_NAME,
SFE_BRAND, Q_LOOKUP2.SFE_BRAND,
SPECIALITY_NAME, Q_LOOKUP2.SPECIALITY_NAME,
CLASSIFICATION, Q_LOOKUP2.CLASSIFICATION),
NULL)
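Conceptually, a multi-pair lookup like the one above behaves like a hash lookup on a composite key in which every pair must match. A tiny illustrative Python analogue (hypothetical data, not DS syntax):

```python
# The pre-loaded cache maps a tuple of all compare columns to the result
# column, like a PRE_LOAD_CACHE lookup table held in memory.
cache = {
    # (caldate, hq_code, division_name) -> times_exposed
    ("2014-04-01", "H1", "D1"): 10,
}

def lookup_times_exposed(caldate, hq_code, division_name, default=None):
    # Every key part must match, mirroring the multiple
    # <compare_column>, <expression> pairs; a miss returns the
    # default, like lookup's <default_value>.
    return cache.get((caldate, hq_code, division_name), default)

print(lookup_times_exposed("2014-04-01", "H1", "D1"))  # match
print(lookup_times_exposed("2014-04-01", "H1", "D2"))  # miss
```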

However, the performance was not visibly different from using lookup_ext().

The DEMAND_LOAD_CACHE option makes the job get stuck; it starts processing about 100 records per 5 seconds. Only PRE_LOAD_CACHE drives the job to completion, with either lookup or lookup_ext. However, a very strange thing is happening: a few lookup values that should be captured are being missed.

Now the only remaining option in my case seems to be a left outer join. Let me try that!

sksaxena
Participant
0 Kudos

Hi,

I am unable to implement my logic using either method now. Even the left join is not working; it is fetching incorrect values.

Let's say I have a table X with all information in fields like CALDATE, TIMES_EXPOSED, etc. Its PY_TIMES_EXPOSED column is all nulls for now.

I take another dataflow, in which I use table X as a source twice. I join the two on these conditions:

X.CALDATE = add_months(X_1.CALDATE, -12) and

X.HQ_CODE = X_1.HQ_CODE and

X.DIVISION_NAME = X_1.DIVISION_NAME and

X.SFE_BRAND = X_1.SFE_BRAND and

X.SPECIALITY_NAME = X_1.SPECIALITY_NAME and

X.CLASSIFICATION = X_1.CLASSIFICATION


Now I fetch all columns from the X table except PY_TIMES_EXPOSED, which I fetch from the X_1 table. This is what you suggested, right?


I am getting wrong values in the PY column everywhere.


As I have input data ONLY from Apr 2014 up to the present month, I MUST NOT get any value in the PY column for the dataset between Apr 2014 and Mar 2015. Right? It should return nulls in the PY column for these dates. But this logic is inserting values into these records as well - which should NOT be the case.


PLEASE give me the correct logic to implement my requirement, as it is becoming a showstopper now. Please take a look and help.


Thanks in advance!

former_member187605
Active Contributor
0 Kudos

Your join condition is wrong. You must add a year, not subtract!

Use X.CALDATE = add_months(X_1.CALDATE, 12).

sksaxena
Participant
0 Kudos

Hey Dirk,

But I want a year's present value and its PREVIOUS year's value. Why use 12 and not -12, then?

In your first reply in this thread, you gave a condition using -12.

Can you please explain?

former_member187605
Active Contributor
0 Kudos

No need to argue. Just do it.

And if you really insist on using -12, do it this way:

X_1.CALDATE = add_months(X.CALDATE, -12)
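Both forms select the same row pairs; it is -12 applied to the wrong side that inverts the direction and pairs each row with the NEXT year's value, which is what produced PY values for the 2014-15 rows earlier in the thread. A quick check (illustrative Python/SQLite, date() in place of add_months):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE x (caldate TEXT, v INTEGER);
INSERT INTO x VALUES ('2014-04-01', 10), ('2015-04-01', 25);
""")

# Dirk's form: current date = other side's date plus 12 months.
a = con.execute("""
    SELECT cur.caldate, prev.v FROM x AS cur
    JOIN x AS prev ON cur.caldate = date(prev.caldate, '+12 months')
""").fetchall()

# The "-12 on the other side" form: identical result.
b = con.execute("""
    SELECT cur.caldate, prev.v FROM x AS cur
    JOIN x AS prev ON prev.caldate = date(cur.caldate, '-12 months')
""").fetchall()

# The broken form: -12 applied to the joined side pairs the 2014 row
# with the 2015 value, i.e. next year instead of the previous year.
wrong = con.execute("""
    SELECT cur.caldate, prev.v FROM x AS cur
    JOIN x AS prev ON cur.caldate = date(prev.caldate, '-12 months')
""").fetchall()

print(a, b, wrong)
```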

sksaxena
Participant
0 Kudos

Hi Dirk,

I tried with 12. I ran it twice, but the job terminated after loading barely 530,000 records, with this error:

8744 1356 DBS-070401 5/28/2016 12:05:56 AM ODBC data source <MIS_ORACLE> error message for operation <SQLExecute>: <[SAP BusinessObjects][ODBC Oracle Wire Protocol driver][Oracle]ORA-01653: unable to extend table DRL_MIS.T5 by 1024 in tablespace USERS>.|<Regular Load Operations>|<T5>

Is something bad in the logic again?

former_member187605
Active Contributor
0 Kudos

No, the logic is correct. This is a database error: the USERS tablespace is full. Is T5 your target table?

Check with your DBA team. Either extend the USERS tablespace or create your table in a tablespace with more free storage.

0 Kudos

Hi Samarth,

It is very simple. Whatever source column you are mapping to TIMES_EXPOSED (the 4th column in the target), use the same source column for the 5th column (PY_TIMES_EXPOSED), with extra logic.

I think it will work.

Thanks & Regards

Abdul Majeed Mohammed

former_member187605
Active Contributor
0 Kudos

Include SFE_EXP_BODS_F twice as a source in your data flow. Use a left outer join. Join condition:

     SFE_EXP_BODS_F.CALDATE = add_months(SFE_EXP_BODS_F_1.CALDATE, -12)

sksaxena
Participant
0 Kudos

Hey Dirk,

Thanks for the solution. However, I didn't understand it.

SFE_EXP_BODS_F is my target table, which is being filled. Request you to please elaborate. It would be really appreciated.

Thanks