on 05-28-2013 7:03 AM
Hi All,
I'm new to BODS, can any one please help in implementing SCD type 3 in BODS
Hi Sabarish,
Difference : SCD2 is unlimited history and SCD3 is limited history.
Explanation:
1- Using SCD2 you can save unlimited history with the help of the Surrogate Key . In this structure of the table will never be effected(constant) only the no of the rows will be effected(will be increased) and to prevent the duplication of data ,primary key will be used .
2-While in SCD3 you can save the limited history. There is no use of Surrogate Key.Since here it depends upon you how much old history/data you want to save (for example: till second last , third last and so on....). As much old data want to save as much no. of columns will be required to add to the table.
Implementation: Can you please define how much old data you want to save ,so that i may try to help you
Regards,
Shiva Sahu
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Sabarish,
Is it mandatory to do it by SCD3 only ?
You can try by SCD2 as well also.
Here i am trying to explain it to you.
Step 1 - First implement SCD2 and get your all data into staging area.
Step 2 - Once you get whole data, you can use gen_row_num_by_group() function to generate the row number for each different group.
Step 3 - Now apply filter for two highest no. for each group.
Step 4 - Check for your expected result .
Reason: The reason why am i saying it to implement with SCD2 is because if later your requirement (let say third last or 100th last... etc) gets change you can easily change the filter condition and get the expected data while in SCD3 you will be required to change the structure(adding or removing of columns) of the table as many times your requirement gets change.
But if you have really a wish to implement through SCD3 , yes.. it can be developed.
Please let me know ,if you need any clarification.
Regards,
Shiva Sahu
Hi Sabarish,
SCD- Slowly Changing Dimensions
SCDs are dimensions that have data that changes over time. The following methods of handling SCDs are available:
v Natural consequence of normalization
v Two states of data are preserved: current and old
For SCD Type 1 change, you find and update the appropriate attributes on a specific dimensional record. The new information simply overwrites the original information. In other words, no history is kept.
Example
|
|
|
|
|
|
After Christina moved from Illinois to California, the new information replaces the new record, and we have the following table:
Customer Key | Name | State |
1001 | Christina | California |
Advantages:
This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information
Disadvantages:
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.
With a Type 2 change, we don’t make structural changes in the table. Instead we add a record. A new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key | Name | State |
1001 | Christina | Illinois |
After Christina moved from Illinois to California, we add the new information as a new row into the table.
Customer Key | Name | State |
1001 | Christina | Illinois |
1005 | Christina | California |
Advantages:
This allows us to accurately keep all historical information.
Disadvantages:
This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
This necessarily complicates the ETL process.
With a Type 3 change, we change the dimension structure so that it renames the existing attribute and add two attributes, one to record the new value and one to record the date of change.
In our example, recall we originally have the following table:
Customer Key | Name | State |
1001 | Christina | Illinois |
|
|
|
|
|
|
|
|
|
|
Advantages:
This does not increase the size of the table, since new information is updated.
This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.
Hope this is helpful to you.
Thanks,
Richa
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
83 | |
10 | |
10 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.