Continued from my previous posts:
Part 1: Incremental Load
In this approach, we maintain a hash value in the destination table; the hash is computed during the transformation of the source rows and compared on each load to detect changes.
Figure: Package screenshot
Figure: (DFT) Load Data Into Destination Table
Source SQL in the current example. In a real incremental package this query would be filtered on CreateDate and LastModifiedDate so that only new and modified rows are pulled from the source (a sketch of such a filter follows the query):
SELECT
     [CustomerAlternateKey] -- this is our business key (BK)
    ,[Title]
    ,[FirstName]
    ,[MiddleName]
    ,[LastName]
    ,[BirthDate]
    ,[MaritalStatus]
    ,[Suffix]
    ,[Gender]
    ,[EmailAddress]
    ,[YearlyIncome]
    ,[TotalChildren]
    ,[NumberChildrenAtHome]
    ,[EnglishEducation]
    ,[EnglishOccupation]
    ,[HouseOwnerFlag]
    ,[NumberCarsOwned]
FROM [dbo].[DimCustomer];
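A minimal sketch of such a date-based filter, assuming the source table carried CreateDate and LastModifiedDate audit columns (it does not in AdventureWorksDW, so this is illustrative only; @LastRunDate is a hypothetical value that would come from an ETL control table):

DECLARE @LastRunDate datetime = '20240101';  -- hypothetical: last successful load time

SELECT [CustomerAlternateKey], [Title], [FirstName] -- ...remaining columns as above
FROM [dbo].[DimCustomer]
WHERE [CreateDate] >= @LastRunDate            -- new rows
   OR [LastModifiedDate] >= @LastRunDate;     -- modified rows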
Destination table in the current example (it has a surrogate key column as well as the hash column):
IF NOT EXISTS (SELECT * FROM sys.objects
               WHERE object_id = OBJECT_ID(N'[dbo].[MyDestination]') AND type IN (N'U'))
BEGIN
CREATE TABLE [dbo].[MyDestination](
    [SK_Customers] [int] IDENTITY(1,1) NOT NULL,
    [CustomerAlternateKey] [nvarchar](15) NOT NULL,
    [Title] [nvarchar](8) NULL,
    [FirstName] [nvarchar](50) NULL,
    [MiddleName] [nvarchar](50) NULL,
    [LastName] [nvarchar](50) NULL,
    [BirthDate] [date] NULL,
    [MaritalStatus] [nchar](1) NULL,
    [Suffix] [nvarchar](10) NULL,
    [Gender] [nvarchar](1) NULL,
    [EmailAddress] [nvarchar](50) NULL,
    [YearlyIncome] [money] NULL,
    [TotalChildren] [tinyint] NULL,
    [NumberChildrenAtHome] [tinyint] NULL,
    [EnglishEducation] [nvarchar](40) NULL,
    [EnglishOccupation] [nvarchar](100) NULL,
    [HouseOwnerFlag] [nchar](1) NULL,
    [NumberCarsOwned] [tinyint] NULL,
    [DWH_Hash] [char](66) NULL,
    [SCD_StartDate] [datetime] NULL,
    [SCD_EndDate] [datetime] NULL,
    [CurrentFlag] [bit] NULL,
    CONSTRAINT [PK_MyDestination] PRIMARY KEY CLUSTERED ([SK_Customers] ASC)
);
END
GO
The original incremental load pattern uses an OLE DB Command in the data flow to perform the updates. The OLE DB Command component is widely disliked because it issues a statement for every single row that passes through it, which is horribly slow for large data sets and can also bloat the transaction log. Therefore a staging table is created to hold the modified rows so that a set-based UPDATE statement can be issued in the third task, which improves performance immensely.
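For contrast, a sketch of the statement the OLE DB Command component would execute once per incoming row (each '?' is an OLE DB parameter marker bound to a column of the current data flow row; the column list is abbreviated here):

-- Per-row update as configured in an OLE DB Command component.
UPDATE [dbo].[MyDestination]
SET [Title] = ?, [FirstName] = ?, [LastName] = ?, [DWH_Hash] = ?  -- ...and so on
WHERE [SK_Customers] = ?;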
Task 1 ((SQL) Create Update Table): creates the staging table for the modified rows, which is dropped later in Task 4.
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[dbo].[UPD_MyDestination]') AND type IN (N'U'))
    DROP TABLE [dbo].[UPD_MyDestination];
GO
IF NOT EXISTS (SELECT * FROM sys.objects
               WHERE object_id = OBJECT_ID(N'[dbo].[UPD_MyDestination]') AND type IN (N'U'))
BEGIN
CREATE TABLE [dbo].[UPD_MyDestination](
    [SK_Customers] [int] NOT NULL,
    [CustomerAlternateKey] [nvarchar](15) NOT NULL,
    [Title] [nvarchar](8) NULL,
    [FirstName] [nvarchar](50) NULL,
    [MiddleName] [nvarchar](50) NULL,
    [LastName] [nvarchar](50) NULL,
    [BirthDate] [date] NULL,
    [MaritalStatus] [nchar](1) NULL,
    [Suffix] [nvarchar](10) NULL,
    [Gender] [nvarchar](1) NULL,
    [EmailAddress] [nvarchar](50) NULL,
    [YearlyIncome] [money] NULL,
    [TotalChildren] [tinyint] NULL,
    [NumberChildrenAtHome] [tinyint] NULL,
    [EnglishEducation] [nvarchar](40) NULL,
    [EnglishOccupation] [nvarchar](100) NULL,
    [HouseOwnerFlag] [nchar](1) NULL,
    [NumberCarsOwned] [tinyint] NULL,
    [DWH_Hash] [char](66) NULL
);
END
GO
Task 2 ((DFT) Load Data Into Destination Table):
A. Set the source.
B. Add a script component to create a hash value for each row using the System.Security.Cryptography classes, excluding the business key column, which never changes. The hash column is added to the output with a string data type of length 66 (a T-SQL sketch of an equivalent hash follows this list).
C. A Lookup transformation is configured against the destination table with full cache; it returns matched rows (existing rows) and unmatched rows (new rows). Unmatched rows are pushed directly into the destination table. The lookup query is:
SELECT SK_Customers, CustomerAlternateKey, DWH_Hash FROM MyDestination;
D. Matched rows are further differentiated with a Conditional Split that compares the source hash with the destination hash, passing through only the rows whose hash values do not match (the truly modified rows); these land in the staging table.
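For reference, a minimal T-SQL sketch of an equivalent hash. The package computes it in .NET inside the script component, but HASHBYTES with SHA2_256, converted with binary style 1, produces the same kind of '0x' + 64-hex-digit string that fits the char(66) DWH_Hash column; the '|' delimiter and NULL handling here are assumptions:

-- Concatenate the non-key columns with a delimiter and hash the result.
-- CONCAT treats NULLs as empty strings; the delimiter keeps adjacent
-- values from colliding. SHA-256 output is 32 bytes = 64 hex digits,
-- so binary style 1 ('0x' prefix) yields exactly 66 characters.
SELECT CONVERT(char(66),
           HASHBYTES('SHA2_256',
               CONCAT([Title], N'|', [FirstName], N'|', [MiddleName], N'|',
                      [LastName], N'|', [BirthDate], N'|', [MaritalStatus]
                      -- ...remaining non-key columns in the same pattern
               )), 1) AS DWH_Hash
FROM [dbo].[DimCustomer];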
Task 3 ((SQL) Update Destination Table): now that the modified rows are in the staging table, the destination can be updated with a single set-based UPDATE statement.
UPDATE dest
SET  [Title] = upd.[Title]
    ,[FirstName] = upd.[FirstName]
    ,[MiddleName] = upd.[MiddleName]
    ,[LastName] = upd.[LastName]
    ,[BirthDate] = upd.[BirthDate]
    ,[MaritalStatus] = upd.[MaritalStatus]
    ,[Suffix] = upd.[Suffix]
    ,[Gender] = upd.[Gender]
    ,[EmailAddress] = upd.[EmailAddress]
    ,[YearlyIncome] = upd.[YearlyIncome]
    ,[TotalChildren] = upd.[TotalChildren]
    ,[NumberChildrenAtHome] = upd.[NumberChildrenAtHome]
    ,[EnglishEducation] = upd.[EnglishEducation]
    ,[EnglishOccupation] = upd.[EnglishOccupation]
    ,[HouseOwnerFlag] = upd.[HouseOwnerFlag]
    ,[NumberCarsOwned] = upd.[NumberCarsOwned]
    ,[DWH_Hash] = upd.[DWH_Hash]
FROM [dbo].[MyDestination] dest
JOIN [dbo].[UPD_MyDestination] upd ON dest.SK_Customers = upd.SK_Customers;
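Task 4, referenced in Task 1 but not shown above, presumably just removes the staging table once the update has completed; a minimal sketch:

-- Drop the staging table now that the set-based update is done.
IF OBJECT_ID(N'[dbo].[UPD_MyDestination]', N'U') IS NOT NULL
    DROP TABLE [dbo].[UPD_MyDestination];
GO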
Package Name: MSSQLTIPS_Hash.dtsx
Continued in my next posts:
Part 4: Incremental Load - Using MERGE JOIN
Part 5: Incremental Load - Using TSQL Merge Function
Part 6: Incremental Load - Using Lookup Component with Customized SCD (Slowly Changing Dimension) Type 2