DataFlow Tasks - Source and Destination Connections
Excel Source
Flat File Source
OLEDB Source
ODBC Source
XML Source
Raw File Source: Retrieve data from binary file, This component is one of the only components that does not use a Connection Manager.
Excel Destination
Flat File Destination
OLEDB Destination
ODBC Destination
Partition Processing:
Dimension Processing:
Raw File Destination:
Record-set Destination
Sql Server Destination
DataFlow Tasks - Data Transformation
Row Transformations - It transforms each row present in the pipeline, Comes under synchronous (non-blocking transformation)
- Character Map: Convert String into Lower case / Upper Case / Different supported language within the same column or new column
- Reference: http://sqlage.blogspot.in/2013/08/ssis-how-to-use-ole-db-command.html
- Data Conversion: Perform Casting, Used to convert the data type of a column, It create a new output column with the converted datatype.
- Copy Column: Copies columns to new output columns.
- Derived Column: Used to create a calculated column in the output, also can be used to convert the datatype of a column. With in the expression, we can used system/user variables as well as list of columns. It create a new output column as well as replace the existing column.
- OLE DB Command: Executes a command against a connection manager for each row. This transformation can behave as a destination
- Split and join Transformations - Distribute rows into different output or join multiple output into one. Comes under synchronous (non-blocking or semi blocking transformation)
- Conditional Split: Split data based on different conditions configured. As similar as Case When statement. Return multiple output, based on number of conditions defined into conditional split task.
- Reference: https://www.simple-talk.com/sql/ssis/ssis-basics-using-the-conditional-split/
- Reference: http://oakdome.com/programming/SSIS_ConditionalSplit_Multicast.php
- Lookup: Lookup is used to access additional information in a related table, based on the values in common join columns. Three types of caching options available - Full cache (The referenced dataset is generated and loaded into cache before the Lookup transformation is executed), Partial cache (The referenced dataset is generated when the Lookup transformation is executed, and the dataset is loaded into cache) and No cache (The referenced dataset is generated when the Lookup transformation is executed, but no data is loaded into cache). Lookup source can be configured "Cache Connection Manager" (A pre-defined cache for example set of hardcoded values) and OLE DB Connection Manager (A table stored in the database). The output of lookup can be "Matched Rows" (Return rows that are matched in the referenced table), "Unmached Rows" (Return rows that are not found in referenced table), Error output (returns if any error occur).
- Reference: http://www.dwbiconcepts.com/etl/29-etl-ssis/144-ssis-lookup-transform.html
- Reference: https://www.simple-talk.com/sql/ssis/implementing-lookup-logic-in-sql-server-integration-services/
- Union All / Merge: Merge is as similar as union all. the main difference between merge or union transformation is the merge is used to merge 2 sorted inputs and return as a single sorted output. where as union all transformation can takes more than 2 inputs, no need to be sorted and return single unsorted output. the columns list and there datatype should be match for each input.
- Merge Join: Merge Join transformation is used to join 2 sorted resultset via Inner / Left / Right join. As similar as joins in Tsql.
- Reference: https://www.simple-talk.com/sql/ssis/ssis-basics-using-the-merge-join-transformation/
- MultiCast: The Multicast transformation distributes its input to multiple similar outputs, that means using this we can create many copies of the input dataset. It has one input and multiple outputs. but does not support an error output
- Fuzzy Lookup: The Fuzzy Lookup transformation performs data cleaning tasks such as standardizing data, correcting data, and providing missing values. This transformation differs from the Lookup transformation in its use of fuzzy matching. The Lookup transformation uses an equi-join to locate matching records in the reference table. It returns either an exact match or nothing from the reference table. In contrast, the Fuzzy Lookup transformation uses fuzzy matching to return one or more close matches from the reference table.
- Reference: http://www.codeproject.com/Tips/528243/SSIS-Fuzzy-lookup-for-cleaning-dirty-data
- Fuzzy Grouping: http://ssis-tutorial-online.blogspot.in/2013/04/fuzzy-grouping-transformation.html
- Term Lookup: Available only in enterprise edition. Term lookup is basically used to search set of text records, within a particular table, input is a text column (nVarChar, nText) and output is occurance count of that text row by row basis.
- Reference: http://www.codeproject.com/Tips/574437/Term-Lookup-Transformation-in-SSIS
- Reference: http://www.codeproject.com/Tips/528243/SSIS-Fuzzy-lookup-for-cleaning-dirty-data
- Term Extraction: Available only in enterprise edition. Term Extraction extract list of words from a simple text and retrun no of occurance each word with in a text. This transformation uses its own english dictionary, so if a word is out from this dictionary then it will not be consider.
- Reference: http://www.packtpub.com/article/term-extraction-tasks-in-sql-server-integration-services
Rowset Transformations – These transformations are also called Asynchronous (Blocking Transformation). Stores all the rows into the memory before it begins the process of modifying input data to the required output format.
Pivot
UnPivot
Row Sampling
Percentage Sampling
Aggregate
Sort
No comments:
Post a Comment