Microsoft Business Intelligence And TSQL: Part 2: Incremental Load

Continue with my Previous Posts:

Part 1: Incremental Load

In this approach, we maintain hash value in destination table; create hash value during transformation on source rows.

Figure: Package Screen Shot

Figure: (DFT) Load Data Into Destination Table

Source Sql in current example, That can either be based on CreateDate and LastModifiedDate, to get only new and modified rows from source:

SELECT

[CustomerAlternateKey] -- this is our business key (BK)

,[Title]

,[FirstName]

,[MiddleName]

,[LastName]

,[BirthDate]

,[MaritalStatus]

,[Suffix]

,[Gender]

,[EmailAddress]

,[YearlyIncome]

,[TotalChildren]

,[NumberChildrenAtHome]

,[EnglishEducation]

,[EnglishOccupation]

,[HouseOwnerFlag]

,[NumberCarsOwned]

FROM [dbo].[DimCustomer];

Destination table in current example: (Destination table has Surrogate Key column as well as Hash column)

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MyDestination]')

AND type in (N'U'))

BEGIN

CREATE TABLE [dbo].[MyDestination](

[SK_Customers] [int] IDENTITY(1,1) NOT NULL,

[CustomerAlternateKey] [nvarchar](15) NOT NULL,

[Title] [nvarchar](8) NULL,

[FirstName] [nvarchar](50) NULL,

[MiddleName] [nvarchar](50) NULL,

[LastName] [nvarchar](50) NULL,

[BirthDate] [date] NULL,

[MaritalStatus] [nchar](1) NULL,

[Suffix] [nvarchar](10) NULL,

[Gender] [nvarchar](1) NULL,

[EmailAddress] [nvarchar](50) NULL,

[YearlyIncome] [money] NULL,

[TotalChildren] [tinyint] NULL,

[NumberChildrenAtHome] [tinyint] NULL,

[EnglishEducation] [nvarchar](40) NULL,

[EnglishOccupation] [nvarchar](100) NULL,

[HouseOwnerFlag] [nchar](1) NULL,

[NumberCarsOwned] [tinyint] NULL,

[DWH_Hash] [char](66) NULL,

SCD_StartDate DateTime Null,

SCD_EndDate DateTime Null,

CurrentFlag Bit NULL,

CONSTRAINT [PK_MyDestination] PRIMARY KEY CLUSTERED

([SK_Customers] ASC)

);

END

The original incremental load pattern uses an OLE DB Command in the data flow to perform the updates. Mostly “OLEDB Command Component” dislike, because it issues a statement for every single row that passes through the component, which is horribly slow for large data sets and it can also bloat the transaction log. Therefore a staging table is created that will hold the update rows so that a set-based UPDATE statement can be issued in the third task. This improves performance immensely.

Task 1 ((SQL) Create Update Table): To Create staging table for modified rows, that can be drop later in task 4

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[UPD_MyDestination]')

AND type in (N'U'))

DROP TABLE [dbo].[UPD_MyDestination];

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[UPD_MyDestination]')

AND type in (N'U'))

BEGIN

CREATE TABLE [dbo].[UPD_MyDestination](

[SK_Customers] [int] NOT NULL,

[CustomerAlternateKey] [nvarchar](15) NOT NULL,

[Title] [nvarchar](8) NULL,

[FirstName] [nvarchar](50) NULL,

[MiddleName] [nvarchar](50) NULL,

[LastName] [nvarchar](50) NULL,

[BirthDate] [date] NULL,

[MaritalStatus] [nchar](1) NULL,

[Suffix] [nvarchar](10) NULL,

[Gender] [nvarchar](1) NULL,

[EmailAddress] [nvarchar](50) NULL,

[YearlyIncome] [money] NULL,

[TotalChildren] [tinyint] NULL,

[NumberChildrenAtHome] [tinyint] NULL,

[EnglishEducation] [nvarchar](40) NULL,

[EnglishOccupation] [nvarchar](100) NULL,

[HouseOwnerFlag] [nchar](1) NULL,

[NumberCarsOwned] [tinyint] NULL,

[DWH_Hash] [char](66) NULL

);

END

Task 2 ((DFT) Load Data Into Destination Table):

A. Set the source

B. Add a script task to create a HashByte value for each row using System.Security.Cryptography class, except primary key columns, that will never be change. There we will add the Hash column to the output with string data type and length 66.

C. Lookup task is configured with destination table as full cache, that’s return the matched rows (Modified rows) and unmatched rows (New row). Unmatched rows is directly pushed into destination table.

SELECT SK_Customers,CustomerAlternateKey,DWH_Hash FROM MyDestination;

D. Whereas Matched rows (Modified rows) can further be differentiating with conditional split by comparing hash value. And push only those rows where hash values are not matched with destination.

Task 3 ((SQL) Update Destination Table) Now we have the modified rows in staging table. That can be directly updated using update statement.

UPDATE dest

SET [Title] = upd.[Title]

,[FirstName] = upd.[FirstName]

,[MiddleName] = upd.[MiddleName]

,[LastName] = upd.[LastName]

,[BirthDate] = upd.[BirthDate]

,[MaritalStatus] = upd.[MaritalStatus]

,[Suffix] = upd.[Suffix]

,[Gender] = upd.[Gender]

,[EmailAddress] = upd.[EmailAddress]

,[YearlyIncome] = upd.[YearlyIncome]

,[TotalChildren] = upd.[TotalChildren]

,[NumberChildrenAtHome] = upd.[NumberChildrenAtHome]

,[EnglishEducation] = upd.[EnglishEducation]

,[EnglishOccupation] = upd.[EnglishOccupation]

,[HouseOwnerFlag] = upd.[HouseOwnerFlag]

,[NumberCarsOwned] = upd.[NumberCarsOwned]

,DWH_Hash =upd.[DWH_Hash]

FROM [dbo].[MyDestination] dest

JOIN [dbo].[UPD_MyDestination] upd ON dest.SK_Customers = upd.SK_Customers;

Reference:

http://www.mssqltips.com/sqlservertip/3170/using-hash-values-in-ssis-to-determine-when-to-insert-or-update-rows/

Package Name: MSSQLTIPS_Hash.dtsx

Continue with my Next Posts:

Part 3: Incremental Load - Using Lookup Component

Part 4: Incremental Load - Using MERGE JOIN
Part 5: Incremental Load - Using TSQL Merge Function
Part 6: Incremental Load - Using Lookup Component with Customized SCD (Slowly Changing Dimension) Type 2

Microsoft Business Intelligence And TSQL

Monday, March 2, 2015

Part 2: Incremental Load - Using Lookup Component and Hash Function

No comments:

Post a Comment