Slowly Changing Dimensions (SCD)
Overview
Slowly Changing Dimensions (SCD) enable an organization to track how dimension attributes change over time. For example, it is possible that an item may be associated with a particular product group code but that it is later reclassified into a different product group. The organization wants to be able to analyze the historical sales data that occurred when the item was assigned to the original product group as well as more recent sales data that has occurred after the item was reclassified to the new product group.
This article will discuss the following:
- Different Types of Slowly Changing Dimensions
- How to Implement Type II Slowly Changing Dimensions
- Example 1: Type II Change with Type I Update All
- Example 2: Type II Change with Enable Deletes
Different Types of Slowly Changing Dimensions
Type I : Overwrite (Does not retain historical data)
Type I dimensions will automatically overwrite old data with updated data from the data source. An example of this would be a change in a customer name.
In the data source, the name for a particular customer is changed from ABC Consulting to Acme Consulting. The next time that the data warehouse is updated the customer name will be changed from ABC Consulting to Acme Consulting and no historical record of the change is kept.
All historical, current, and future transactions will be displayed under the new customer name of Acme Consulting. This is the default methodology of updating data in the data warehouse and no setup it required.
The following example illustrates what the Customer dimension table would resemble if a Type I change occurred.
Original row in Customer dimension table:
Customer No | Name | City | State |
---|---|---|---|
123 | ABC Consulting | Portland | OR |
Updated row in Customer dimension table:
Customer No | Name | City | State |
---|---|---|---|
123 | Acme Consulting | Portland | OR |
The type I response is easy to implement, but it does not maintain any history of prior attribute values.
Type II : Overwrite (Retains historical data)
Type II dimensions will enable the tracking of dimension attributes historically by inserting additional records into the table as the values in specified fields are changed. The Jet Data Manager will administer the tracking of the dimension values as well as the updating of the table. Each record for a particular value such as an item number can be viewed as a different version of this item. The transaction table can then be linked to this table to display which version of the item was associated with the transaction based on the transaction date.
The following example illustrates what the Customer dimension table would resemble using a Type II change.
Customer No | Name | City | State | Version | From Date | To Date |
---|---|---|---|---|---|---|
123 | ABC Consulting | Portland | OR | 1 | 7/1/2009 | 9/18/2012 |
123 | Acme Consulting | Portland | OR | 2 | 9/18/2012 | 12/31/9999 |
The type II response will maintain our history
Hybrid
It is possible to combine the basic Slowly Changing Dimension techniques. Examples of this are given toward the end of the article.
How to Implement Type II Slowly Changing Dimensions
The steps below will explain how to utilize Slowly Changing Dimensions in a Jet Analytics project.
In the example, there will be an item named "Bicycle" that has historically had an Inventory Posting Group of "Finished". Recently, however, this item has been reclassified and is now associated with the Inventory Posting Group "Resale". The organization wishes to track sales for this item under both the historical Inventory Posting Group as well as the new one.
-
Enable and Configure Slowly Changing Dimensions on your dimension table within your staging database
-
Identify the table for which historical changes need to be tracked, right click the table name, and go to Table SettingsIn this example we will use the Item table located in our staging database. The Table Settings window is displayed.
-
Check the box Error under Primary Key Behavior and navigate to the History tab.
-
Next, you will check Enable History and navigate to the Performance tab.
There, you will need to check Enable Physical Valid Table and navigate to the Data Extraction tab.
The last step will be to uncheck the Truncate Valid Table Before Data Cleansing box.
-
The table will now display an " H " inside of the table icon denoting that historical tracking has been enabled for this table. Expanding the table will show a node named " History Settings ".
-
Clicking on the " History Settings " node will display the Slowly Changing Dimensions configuration options on the right-hand side of the screen. This screen is broken down into three sections:
- Natural Key: Select all the fields that represent the natural key of the table. In the current example this would be the "DW_Account" and "No." fields.
- Type I: Check boxes should be selected for all fields which should not have history tracking enabled. If values in these fields are changed in the data source the current record values will be overwritten by the new values when the table is next executed.
- Type II: Check boxes should only be selected for all fields which should have history tracking enabled.
-
The table should now be deployed and executed so that the initial history data can be stored.
A primary key comprising at least one field must be set on the table in order for the table to deploy and execute properly. The primary key fields for a table can be set by right-clicking the desired fields within the table and selecting "Include in Primary Key". This can be done for multiple fields in the same table.
The screenshot below illustrates what the Item table currently looks like. There is one record for the item and currently the Inventory Posting Group is set to "Finished".
The item is now reclassified into the Inventory Posting Group Code of "Resale"
Notice how there are now two records. One for the old posting group and one for the new posting group.
-
There are a few more fields that pertain to the historical values that are useful as well. When the table is executed and notices a change in one of the Type II fields it will automatically add in the dates for which the old value ended and the new value begins. These are the "SCD From Datetime", "SCD To Datetime", and "SCD "IsCurrent" fields.
The To and From field represent the date ranges that this version of the dimension was used in and which record is the current dimension value.
-
-
Creating the Surrogate Key
In a standard transaction table the transaction itself will only be linked to the Item No. The problem with this is that the item number alone does not provide any indication as to which version of the item record the transaction applies to. In order to see this detail the surrogate key from the Item table will be brought into the transaction table based on the To and From dates in the Item table as compared with the Posting Date of the transaction. Based on the screenshot above, all transactions between January 1, 1900 and September 25, 2012 will be associated with the first version of the Bicycle item where the Inventory Posting Group is "Finished". Any transaction after September 25, 2012 will be associated with the latest version of the Bicycle item where the Inventory Posting Group is "Resale".
A surrogate key is a substitution for the primary key in a table. The surrogate key most often represents the unique row number in the table and can be used in one table to refer back to a specific record in another table without having to utilize the natural primary key. In Jet Enterprise, all tables in the staging database and data warehouse have a field named "DW_ID" which represents the surrogate key in each respective table.
In order to see the DW_ID field in the Jet Data Manager the user must right-click the table and go to Advanced → Show System Control Fields.
In the screenshot below, the DW_ID field has been moved from the Item table and added to the transaction table. The field has then been renamed to "Item Surrogate Key" to make it easily understandable to other users. Standard joins have been added between the two tables for DW_Account and the Item No. Additional joins are added for "SCD From Date Time" Less Than or Equal to "DW_TimeStamp" and "SCD To Date Time" Great Than or Equal to " DW_TimeStamp ". This will capture the correction version of the item based on the Posting Date of the transaction.
The transaction table is then deployed and executed so that the new field is added and populated.
The process above should be repeated for any additional transaction tables for which history needs to be tracked.
-
Move the Surrogate Key from Transaction Table in Staging Database to Data Warehouse
Now that the surrogate key has been added to the transaction table(s) this field needs to be added to the relevant transaction tables in the data warehouse. This is accomplished by simply dragging the surrogate key field (in this case "Item Surrogate Key") from the table(s) in the staging database and dropping them onto the relevant tables on the data warehouse.
Deploy and execute the transaction table in the data warehouse for these changes to be propagated to the SQL tables.
-
Add Surrogate Key to Dimension Table in Data Warehouse
The DW_ID field now needs to be added to the related dimension table in the data warehouse. This ensures that the proper mapping will be made between the dimension table and the transaction table in the cubes. To do this simply drag the DW_ID field from the dimension table in the staging data (in this example the Item table) to the related dimension table in the data warehouse (in this example the Item table).
Deploy and execute the dimension table for the changes to be propagated to the SQL database.
-
Update Dimension Key
The dimension key must now be updated to include this surrogate key. This will ensure that Analysis Services sees the uniqueness of the dimension not as the natural key (Item No.) but as the combination of Item No. and the surrogate key.
On the Cubes tab locate the dimension under the Dimensions node and edit the key level ("Item").
To the right of the Key Column click the ellipsis (...) and replace the No field with the surrogate key to the dimension key (in this example it is the "Item Surrogate Key" field). Click Ok..
-
Update the Dimension Relationships in the Cube
The relationships between the dimension and the transaction table must be updated in the cube to reflect the change made to the dimension key in the previous step.
Right-click the relevant dimension in the cube(s) and select Dimension Relations → All Fact Tables. Set the dimension relationship to use the surrogate key that was added to the transaction table in a previous step.
Deploy and execute the OLAP database on the cubes tab for the changes to take effect.
The final result is that users can see data based on the historical attributes that may no longer exist in their ERP system because the information has been overwritten. In the screenshot below the "Bicycle" item shows up twice with the sales amounts associated with the various Inventory Posting Groups that have been used for the item over time.
Examples
Let's look at a couple of examples implementing the steps listed above.
Example 1: Type II Change with Type I Update All
There may be times when you need to perform a Type II change in conjunction with a Type I change that updates all prior version of said record. This means all historical records will be overwritten (Type I) for the attributes that you specify.
Example Situation:
Our organization wants to track the historical change related to which salesperson is associated with a customer account. Said another way if a new salesperson gets assigned to a customer account we want to see a new row reflecting this change (Type II). This will allow us to see which salesperson(s) where associated with the account historically. At the same time if other attributes about our customer change (customer name, city, state, etc.) then we want this to be reflected on each row that belongs to the customer (all historical records) (Type I) as opposed to simply overwriting the current record.
In this example we will be enabling history with regards to our customers. If a new salesperson is assigned to the account we want that change to be tracked. If the customers name changes we want to reflect this across all records.
Enable Slowly Changing Dimensions
Identify the table for which historical changes need to be tracked, right click the table name, and go to Advanced > Advanced Settings.
-
Select the box to Enable Slowly Changing Dimensions and Type I - Update All, click OK.
If a red "X" appears on the right-hand side of the Advanced Settings box it indicates that required conditions are not currently met that are necessary for Slowly Changing Dimensions to be utilized. Placing the mouse over the red "X" will provide a dialog explaining which requirements need to be enabled.
-
The table will now display an "H" inside of the table icon denoting that historical tracking has been enabled for this table. Expanding the table will show a node named "History Settings".
-
Clicking on the "History Settings" node will display the Slowly Changing Dimensions configuration options on the right-hand side of the screen. This screen is broken down into three sections:
- Natural Key: Select all the fields that represent the natural key of the table. In the current example this would be the "DW_Account" and "No." fields.
- Type I: Check boxes should be selected for all fields which should not have history tracking enabled. If values in these fields are changed in the data source all the old values will be overwritten by the new values when the table is next executed.
- Type II: Check boxes should only be selected for all fields which should have history tracking enabled.
-
The table should now be deployed and executed so that the initial history data can be stored.
A primary key comprising at least one field must be set on the table in order for the table to deploy and execute properly. The primary key fields for a table can be set by right-clicking the desired fields within the table and selecting "Include in Primary Key". This can be done for multiple fields in the same table.
The screenshot below illustrates what the customer table currently looks like. The table only has a single customer called ABC Corporation. The current salesperson associated with the customer is Linda Martin.
Now lets say a new salesperson (Bart Duncan) gets assigned to this customer.
Notice that a Type II change was made. A new row was added to the table and the old record has been preserved. Also notice that the SCD Is Current field has a value of 1 for the new record and 0 for the old record.
Now let's say the customer has decided to change their name. They no long want to be known as ABC Corporation, but rather XYZ Corporation.
Notice how both records reflect the new name change. This is the Type I -Update All change in action. With a normal Type I change only the current record would have been updated, not both.
Example 2: Type II Change with Enable Deletes
Example Situation:
A Type II change with "Enable Deletes" allows us to keep track of which records have been removed from the source database. This example will be a continuation of the example above.
-
Enable Slowly Change Dimensions
If a red "X" appears on the right-hand side of the Advanced Settings box it indicates that required conditions are not currently met that are necessary for Slowly Changing Dimensions to be utilized. Placing the mouse over the red "X" will provide a dialog explaining which requirements need to be enabled.
The table will now display an "H" inside of the table icon denoting that historical tracking has been enabled for this table.
-
Expanding the table will show a node named "History Settings".
-
The table should now be deployed and executed so that the initial history data can be stored.
A primary key comprising at least one field must be set on the table in order for the table to deploy and execute properly. The primary key fields for a table can be set by right-clicking the desired fields within the table and selecting "Include in Primary Key". This can be done for multiple fields in the same table.
The screenshot below illustrates what the customer table currently looks like. Notice that we have a new field called SCD Is TombStone. This field is used to track which records have been removed from the source database. If SCD Is TombStore = 0 then the record has not been deleted from the source.
Let's say we delete record 2 from our database. Notice how the TombStone field displays a 1. This indicates that the record has been deleted from the source system.