cancel
Showing results for 
Search instead for 
Did you mean: 

With Stored proc through SQL transform as source, crazy cartesian activity obvious in Monitor

Colm
Explorer
0 Kudos

Simple scenario

Three sql transforms to execute stored procs in sqlserver as source...

query transform to join.....and continue with other stuff

Before anyone answers with why...and push down to data base and so on...not possible, a lot of complexity behind this and job is being run as part of an application splitting the data in to batches running one at a time, large data volumes. much is out of our control.

The problem is that the join is being totally ignored in first instance, from the monitor log running in to billions of rows for data sets in the region of 20,000- 40,000

It then proceeds to apply the simple join and return the correct data and continuing to complete the rest of the batch job without issue, except for the passing of a ridiculous amount of time.

Is there any way to avoid? Any ideas?

Accepted Solutions (1)

Accepted Solutions (1)

former_member187605
Active Contributor
0 Kudos

I am afraid that's the way DS handles in-memory joins (Cache = Yes). It extracts data from its sources, builds the Catesian product and then filters out unnecessary combinations.

This is the main reason I always use a staging area. And have the database deal with the join logic, it's so much better at it.

Answers (1)

Answers (1)

Colm
Explorer
0 Kudos

Forgot to mention, my solution will be, if not possible in Data services to handle the join correctly will be to write to tables at batch level and then read those table to continue which will be pushing down to the database...but will also mean increased maintenance for IT, the underlying applications can change..ie. the stored procs meaning recreation of the tables....simplified but the main reason why I need to avoid creating new tables.