Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
muthuram_shanmugavel2
Contributor

I come across some Joins scenarios, where I have learnt few Best Practices, which I would like to share.

1. Rank Node (Ex: Get Distinct Records By Latest Date) before Joins:


Two Tables are there and Table names are CUSTOMER and EMAIL_SUBSCRIPTION.

CUSTOMER Table contains the columns - Customer_ID and Email_Address of them.

EMAIL_OPT_IN Table contains the columns - EMAIL_Address, Subscription_Status and Modified_On.


TABLE:  CUSTOMERTABLE:  EMAIL_SUBSCRIPTION

Subscription Status 1 Means "YES" and 0 Means "No"


Requirement:

To Get CURRENT_SUBSCRIPTION_STATUS For Each CUSTOMER.

Solution:



Wrong Approach



Correct Approach


In Dimension Calculation view,

Two tables are directly included in Join node.



Select the required fields and activate the calculation view.

Data Preview of Calculation View is,



This is giving Wrong results.

Customers - C1 and C2 both has subscription status -

Yes and No.

We could not find the Current subscription status of the Customer.


So before joining these two tables,

Current subscription status of Email has to be find with Latest Modified_On date and This output has to be connected to Customers Table in Join node.

In Dimension Calculation view,

Include Rank node in Calculation view.

Add EMAIL_SUBSCRIPTION Table into it.

In Rank node, Enter the following properties.

Sort Direction Descending (Top N) on

MODIFIED_ON Column and

Threshold as 1.

So It will filter out the records based on Latest Modified On and

give distinct records with latest timestamp.

Next, In Join node,

Add Customer Table and RANK_1 Output.

Select the required fields and activate the calculation view.

Let's View the Final output of Calculation view.

It is giving the Correct results.


Email Subscription status Of Customer C1 is NO

Email Subscription status Of Customer C2 is YES


Before Joins, Ranking data is giving Distinct Records By Latest Time stamp. So This approach is giving correct results.

2. Aggregate Table before Joins:


Two Fact Tables are there. Table names are Planning_Sales and Actual_Sales.

Both Tables contain 3 columns - Customer_Id and Product_Id and Sales.

TABLE: PLANNING_SALESTABLE: ACTUAL_SALES

Requirement:

PLANNING and ACTUAL Sales by Product wise.

Solution:



Wrong Approach



Correct Approach


In Calculation view,

Two Tables are directly included in Join node.


Select the required fields and activate the calculation view.

Data Preview of Calculation View is,



This is giving Wrong results.

Lets check the Output at Join node level.



The Cause of the Problem is,


Both the products P1 and P2 have two entries in

Planning_Sales and Actual_Sales Table.

While Joins between these two tables,

Each record in PLANNING table looks Each record in ACTUAL table.


So the Rows_Count is doubled and After Aggregation is giving wrong results.


To avoid this Problem,

First Sales has to be aggregated by Product wise for

both PLANNING table and ACTUAL table,

Then It has to be joined.


In Calculation View,

Planning_Sales Table is Included in

Aggregation Node_1 and

Sales Aggregated by Product wise.

Actual_Sales Table is Included in

Aggregation Node_2 and

Sales Aggregated by Product wise.


Then In Join Node, Two Aggregated Outputs -

Aggregation_1 and Aggregation_2 are Joined.



In Join node, Join Between Two Aggregation Nodes,



Select the required fields and activate the calculation view.

Let's View the Final output of Calculation view.



This is giving Correct results.


Joins before aggregation is giving distinct records with aggregated values.

So This approach is giving correct results.


Regards,

Muthuram

7 Comments
Labels in this area