
how to pushdown

Former Member
0 Kudos

We are using Data Integrator XI 3.0. Our local repository is an Oracle 10g database.

I have a dataflow that has been running fine for the past few years, but it failed in this quarter's execution.

The dataflow is doing the following:

1. Join table1 (join rank 0, cache, 8M) with table2 (join rank 0, cache, 200K); select distinct; use of ifthenelse(), cast(), ltrim(), rtrim(), and constant $parameter1; where table1.person_id = table2.person_id and load_date > $parameter2. Both tables are in the same datastore1.

2. Split query1 into query_location1 and query_location2

3. Merge query_location1 and query_location2

4. Load to target table3, which is in a different datastore2 (datastores 1 and 2 are hosted on the same server, in different schemas); see the illustrative SQL sketch after this list.
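
For reference, a full pushdown here would mean Display Optimized SQL... shows one single INSERT INTO ... SELECT along these lines (a sketch only; schema and column names are placeholders, not my real ones):

    INSERT INTO schema2.table3 (person_id /* , ... other target columns */)
    SELECT DISTINCT t1.person_id
                    /* , LTRIM(RTRIM(t1.some_col)), other mapped columns */
    FROM schema1.table1 t1, schema1.table2 t2
    WHERE t1.person_id = t2.person_id
      AND t1.load_date > '<value of $parameter2>'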


I have tried a partial pushdown as follows, with no success.

1. Join table1 (join rank 2, no cache, 8M) with table2 (join rank 0, cache, 200K); use of cast(), ltrim(), rtrim(), and constant $parameter1; where table1.person_id = table2.person_id and load_date > $parameter2. Both tables are in the same datastore1.

2. Split query1 into query_location1 and query_location2

3. Merge query_location1 and query_location2

4. select distinct; use of ifthenelse()

5. Load to target table3, which is in a different datastore2 (datastores 1 and 2 are hosted on the same server, in different schemas).



When I check Display Optimized SQL..., it shows only a SELECT statement with where table1.person_id = table2.person_id and load_date > $parameter2.
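
So only the join and the filter are pushed down, roughly like the hypothetical statement below (placeholder names), while the distinct, ifthenelse(), split, and Merge still run in the DI engine:

    SELECT t1.person_id /* , other source columns */
    FROM schema1.table1 t1, schema1.table2 t2
    WHERE t1.person_id = t2.person_id
      AND t1.load_date > '<value of $parameter2>'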


What is the bottleneck in the current DF design that prevents the pushdown? Is it the use of a parameter in the where clause, the Merge transform, or the select distinct?

How do I redesign this DF to improve the performance?

Thank you.

Accepted Solutions (1)

former_member187605
Active Contributor
0 Kudos

DS can only push down logic from Query transforms under certain conditions.

It's all explained in my blog post on full SQL-pushdown.

In your case, it's definitely the Query split and Merge that cause issues. There may be more reasons why: ifthenelse(), cast(), ltrim(), rtrim()... might not be pushed down. Yours is a very old release; pushdown functionality and performance have been significantly enhanced and improved over the years.
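
To see why the split and Merge matter: a Merge is the engine-side equivalent of a UNION ALL, so a full pushdown would require DS to generate one single statement like the hypothetical sketch below (placeholder names and split conditions), which your release evidently does not do here:

    INSERT INTO table3
    SELECT t1.person_id /* , ... */ FROM table1 t1, table2 t2
    WHERE t1.person_id = t2.person_id
      AND t1.location = '<location1>'  -- hypothetical split condition
    UNION ALL
    SELECT t1.person_id /* , ... */ FROM table1 t1, table2 t2
    WHERE t1.person_id = t2.person_id
      AND t1.location = '<location2>'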

Former Member
0 Kudos

The DF fails on the first query transform, which does the join, distinct, ifthenelse(), ltrim(), and rtrim(); as seen from the Monitor Log, query_location1 and query_location2 did not start yet.

After I moved the select distinct and ifthenelse() to a new query4 (after the Merge but before the load to target table3), it fails halfway through query4.

Either ifthenelse() or select distinct is the bottleneck then. So I made the following changes: moved ifthenelse() back to the join query and kept the select distinct in query4. The DF fails during the select distinct.

Does it help to add a Data_Transfer (type: Table) before the select distinct (after the Merge transform)?

Even a simple join of table1 and table2 (keeping only the mapping of one field to $param1 and the where clause on person_id and $param2) to target table3 did not become INSERT INTO optimized SQL.

former_member187605
Active Contributor
0 Kudos
The DF fails on the first query transform, which does the join, distinct, ifthenelse(), ltrim(), and rtrim(); as seen from the Monitor Log, query_location1 and query_location2 did not start yet.

I suspect the distinct is not being pushed down to the database. I do not recall whether that is supported in BODI XI 3.0 (it's such an old version, you know). You can check whether it's present in the generated SQL. The distinct involves an order by that must complete before the next steps in the data flow can start.

And the failure is related to memory issues, isn't it? DI is a 32-bit program that hits the 3GB limit rather easily. In the data flow properties, can you set the cache type to Pageable?


After I moved the select distinct and ifthenelse() to a new query4 (after the Merge but before the load to target table3), it fails halfway through query4.

By definition, the distinct is then not pushed down to the database, so the order by happens in DI memory. Same problem as above.


Either ifthenelse() or select distinct is the bottleneck then. So I made the following changes: moved ifthenelse() back to the join query and kept the select distinct in query4. The DF fails during the select distinct.

Same problem.


Does it help to add a Data_Transfer (type: Table) before the select distinct (after the Merge transform)?

No, it doesn't, as long as the distinct is not pushed down to the database.


Even a simple join of table1 and table2 (keeping only the mapping of one field to $param1 and the where clause on person_id and $param2) to target table3 did not become INSERT INTO optimized SQL.

That seems strange to me. This type of logic is definitely pushed down in more recent versions. Why don't you consider an upgrade?

The best technique for finding the reason is to rebuild your data flow from scratch. Start very simple, gradually increase the complexity, and check the generated SQL each time until no INSERT INTO is generated anymore. That will allow you to find the culprit.

Former Member
0 Kudos

Yes, the error message is about running low on virtual memory. The DF cache type has always been set to Pageable.

The generated SQL shows only SELECT ... FROM ... WHERE.

Based on the Monitor Log, the row count fails wherever the Distinct is (in the last execution I placed it in query4, and it failed at query4). That is why I am thinking of adding a Data_Transfer before the distinct query.

OK, so I rebuilt the DF from scratch: 2 source tables, 1 query (just selecting a few columns from table1), 1 target. The generated SQL is a SELECT; if I tick Distinct, the generated SQL is SELECT DISTINCT. Does this mean the culprit is the join?

former_member187605
Active Contributor
0 Kudos

The Data_Transfer will split the data flow in two at runtime. In that way it might indeed help.

If I understand you right, a select distinct on a join is pushed down properly. Then the culprit is the split that follows. Check my blog post on full SQL-pushdown again for similar cases and workarounds.
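
To make that concrete: a Data_Transfer (type: Table) stages the merged rows in a database table, so the work before and after it becomes two statements that can each be pushed down, roughly as sketched below (the staging table name and columns are made up):

    -- sub data flow 1: join, split, and merge are staged to the transfer table
    INSERT INTO dt_staging
    SELECT t1.person_id /* , ... */ FROM table1 t1, table2 t2
    WHERE t1.person_id = t2.person_id;

    -- sub data flow 2: the distinct can now run inside the database
    INSERT INTO table3
    SELECT DISTINCT person_id /* , ... */ FROM dt_staging;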

Former Member
0 Kudos

I tried what you said about the distinct involving an order by, so I added a q_orderby.

The DF currently is:

1. 2 source tables; q_join: cast(), ltrim_blanks(rtrim_blanks()), ifthenelse(is_valid_decimal(), then cast to decimal, else NULL) (see the expression sketch after this list)

2. q_location1 and q_location2

3. Merge

4. q_orderby

5. q_distinct

6. target
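
For reference, the q_join mapping in step 1 looks roughly like this in DS expression syntax (the column name SRC.AMOUNT and the format string here are placeholders, not my real ones; check the is_valid_decimal() entry in the reference guide for the exact format argument):

    ifthenelse(is_valid_decimal(rtrim_blanks(ltrim_blanks(SRC.AMOUNT)), '9999.99'),
               cast(rtrim_blanks(ltrim_blanks(SRC.AMOUNT)), 'decimal(10,2)'),
               NULL)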


From the Monitor Log, all query transforms up to q_orderby show Proceed 5 Mil (5 Mil is the expected row count after the join). Thereafter, q_distinct kicks off. Unfortunately, it fails with this error:

               Cannot convert data <67>> into type <DECIMAL>. Context: Column <>.

I checked that my source data doesn't contain '67>', but I do have '67.'. I updated this to '67' and it worked, but then it failed again with this error:

               Cannot convert data <88>> into type <DECIMAL>. Context: Column <>.


I ran ifthenelse(is_valid_decimal('88.'), then cast to decimal, else NULL) in my development repo, and the target table displays the output 88.00. So it doesn't look like a data issue either.
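
To rule out other bad values, I could also hunt directly in the source for values with trailing non-numeric characters, with a query like this (hypothetical table and column names; Oracle 10g supports REGEXP_LIKE):

    SELECT person_id, amount_col
    FROM table1
    WHERE REGEXP_LIKE(TRIM(amount_col), '[^0-9.]')
       OR TRIM(amount_col) LIKE '%.';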


The difference from the initial DF (which had been running fine for many years) is the addition of q_orderby and the move of the select distinct from q_join to q_distinct.

What could have gone wrong with this change?

former_member187605
Active Contributor
0 Kudos

You didn't have to add an order by. What I meant was that DS will implicitly sort the data in order to calculate the distinct.

I cannot see any direct relationship between your recent changes and the error you're getting now. Have you changed any datatypes anywhere?

Answers (0)