on 05-06-2016 2:39 AM
Simple scenario
Three sql transforms to execute stored procs in sqlserver as source...
query transform to join.....and continue with other stuff
Before anyone answers with why...and push down to data base and so on...not possible, a lot of complexity behind this and job is being run as part of an application splitting the data in to batches running one at a time, large data volumes. much is out of our control.
The problem is that the join is being totally ignored in first instance, from the monitor log running in to billions of rows for data sets in the region of 20,000- 40,000
It then proceeds to apply the simple join and return the correct data and continuing to complete the rest of the batch job without issue, except for the passing of a ridiculous amount of time.
Is there any way to avoid? Any ideas?
I am afraid that's the way DS handles in-memory joins (Cache = Yes). It extracts data from its sources, builds the Catesian product and then filters out unnecessary combinations.
This is the main reason I always use a staging area. And have the database deal with the join logic, it's so much better at it.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Forgot to mention, my solution will be, if not possible in Data services to handle the join correctly will be to write to tables at batch level and then read those table to continue which will be pushing down to the database...but will also mean increased maintenance for IT, the underlying applications can change..ie. the stored procs meaning recreation of the tables....simplified but the main reason why I need to avoid creating new tables.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
93 | |
10 | |
10 | |
9 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.