What and Why?? Surroagate keys in your Data Warehouse…

May 17, 2012

Had a chance to discuss and debate on Surrogate keys at work today. SToo many assumptions too many confusions. So its time lets make it clear and put a full stop in place of the question mark.

A surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application. (as said by Hall, Owlett and Codd).

A surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or application. (as said by Wieringa and De Jonge).

OMG this is good enough to carry you off from the real concept. Lets get into some lay men understanding,
The surrogate key was mailnly brought in use to act as a primary key in place of the natural primary key as in the operational database. As in any datawarehouse where there are multiple occurences of the Primary key ID’s used and so to have a unique key to individually refer to each record we make use of the Surrogate Primary key.

As a standard these Surrogate keys are the column of join between the Dimension or Hierarchy tables to the Fact tables, can be visualized as a star schema having one fact table joined to several Dimensions and Hierarchies.

Surrogate_key_blog_banner

Example:
Take an instance where for the employee ‘Emp1’ the Business Unit changes from B1 to B2:
If you used the natural primary key ‘Emp1’ for your employee within your datawarehouse everything would be allocated to Business Unit ‘B2’ even what actualy belongs to ‘B1.’

If you use surrogate keys, you could create on the 17th May a new record for the Employee ‘Emp1’ in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (i.e before 17th May) with the SID of the Employee ‘Emp1’ >> ‘B1.’ All new data (i.e after 17th May) would take the SID of the employee ‘Emp1’ >> ‘B2.’

Leave a comment