Thursday, November 3, 2016

Best Practice for Dimensional Modeling



1. Before designing the dimensional model, keep in mind that the data warehouse should be fast and securable, regardless of whether it is ROLAP, MOLAP, or HOLAP.
2. A dimensional model is either a star schema (simple and flexible) or a snowflake schema (more complex and normalized compared to a star).
   In practice it is often a combination of both, known as a galaxy schema.
3. The data warehouse should be designed as a group of multiple data marts, one per business process.
4. Conformed dimensions should be handled separately so they can be reused across all the data marts.
5. Define proper relationships between facts and dimensions, and between fact and fact where a degenerate dimension is used.
6. Use a junk dimension to consolidate all the small, rarely changing dimensions such as flag values and gender.
7. Define true and proper relationships between attributes, levels, and their hierarchies.
8. Focus on defining attribute relationships, which can be either flexible or rigid.
9. Use calculated members instead of calculated columns: calculations in calculated members run over aggregated data, and they need no extra space to store.
10. Create different periodic measure groups such as FactWeekly, FactMonthly, and FactYearly for different reporting purposes.
11. Avoid snapshotting in the fact table if possible; use additive or semi-additive facts.
12. Maintain dimensions according to the SCD type each one requires. Don't always jump to SCD2 (historical); try SCD4 (rapidly changing dimension)
    to separate rarely changing dimension attributes from frequently changing ones.
13. Define separate partitions per measure group, with proper aggregations.
14. Define separate processing methods based on the nature of each dimension and fact. For example, a date or time dimension may be processed separately, perhaps only once every year or two
    (Process Full, Process Update, Process Clear, Process Delete).
15. Design for Process Incremental instead of Process Full. Although SSAS has no built-in incremental-processing functionality, by adding a date column and maintaining a
    process-control table, we can configure incremental processing of our cube.
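The process-control idea in point 15 can be sketched as follows; the schema, table, and column names (etl.ProcessControl, FactSales, LoadDate) are illustrative placeholders, not from any standard:

```sql
-- Hypothetical process-control table: one row per cube partition,
-- recording the high-water mark of the last successful process.
CREATE TABLE etl.ProcessControl
(
    ObjectName      sysname      NOT NULL PRIMARY KEY,  -- partition / measure group
    LastProcessedOn datetime2(0) NOT NULL
);

-- The partition's source query binds to rows newer than the stored
-- watermark, so an incremental process picks up only new fact rows.
SELECT f.*
FROM dbo.FactSales AS f
WHERE f.LoadDate > (SELECT pc.LastProcessedOn
                    FROM etl.ProcessControl AS pc
                    WHERE pc.ObjectName = N'FactSales Current');

-- After a successful process, advance the watermark.
UPDATE etl.ProcessControl
SET LastProcessedOn = SYSDATETIME()
WHERE ObjectName = N'FactSales Current';
```

The watermark update should run only after the cube process succeeds, otherwise rows loaded between a failure and the next run would be skipped.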




Best Practice for ETL





1. Analyze the source data and fix issues at the source level before starting extraction.
2. Extract only the data that is needed instead of everything; the ETL package should be designed for incremental extraction, based either on last-modified/submit dates or on some key.
3. Configure parallel processing within the ETL package, either at the package level (the MaxConcurrentExecutables property, which defaults to the number of processors + 2) or at the data flow task level (the EngineThreads property).
4. Avoid asynchronous (blocking) transformations: they hold the data until their operation completes and also need more than one buffer.
5. Avoid implicit type casting; use the Data Conversion transformation instead.
6. Disable indexes before pushing data into the destination table, then rebuild or reorganize them afterwards. Even the clustered index should be dropped to save time
   if it is not on an auto-generated surrogate key; note that a clustered index can only be dropped and re-created, not disabled.
7. Use fast load if possible; the OLE DB Destination's fast-load option is only available when the destination is SQL Server.
8. Use SqlBulkCopy.WriteToServer if we need to store a large amount of data programmatically.
9. Use BULK INSERT or the bcp utility to load data from flat files.
10. Data reconciliation should be done after ETL, based on the CHECKSUM, BINARY_CHECKSUM, or HASHBYTES functions.
11. Avoid event handlers where possible.
12. Implement proper logging at the ETL level to track failures and performance issues, and use SQL Server logging.
13. Avoid the built-in Slowly Changing Dimension task; it issues row-by-row updates and performs poorly on large dimensions.
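Point 2's incremental extraction can be sketched with a last-modified watermark; the table and column names (etl.ExtractWatermark, dbo.Orders, LastModified) are assumptions for illustration:

```sql
-- Hypothetical watermark table maintained by the ETL package.
CREATE TABLE etl.ExtractWatermark
(
    SourceTable   sysname      NOT NULL PRIMARY KEY,
    LastExtracted datetime2(3) NOT NULL
);

DECLARE @since datetime2(3) =
    (SELECT LastExtracted
     FROM etl.ExtractWatermark
     WHERE SourceTable = N'dbo.Orders');

-- Pull only the rows changed since the previous run.
SELECT o.OrderID, o.CustomerID, o.Amount, o.LastModified
FROM dbo.Orders AS o
WHERE o.LastModified > @since;

-- On success, move the watermark forward to the highest value extracted.
UPDATE etl.ExtractWatermark
SET LastExtracted = (SELECT MAX(LastModified) FROM dbo.Orders)
WHERE SourceTable = N'dbo.Orders';
```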
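Point 6's disable-load-rebuild pattern looks like this in T-SQL; the index and table names are placeholders:

```sql
-- Disable nonclustered indexes before the bulk load so rows are not
-- indexed one at a time during the insert.
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales DISABLE;

-- ... run the bulk load here (fast load / BULK INSERT) ...

-- Rebuild afterwards; a disabled index is unusable until rebuilt.
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales REBUILD;
```

`ALTER INDEX ALL ON dbo.FactSales DISABLE` would also disable the clustered index, making the table inaccessible, which is why the clustered index is handled by drop and re-create instead.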
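A minimal sketch of the reconciliation in point 10, comparing an order-independent aggregate checksum of the staged rows against the loaded rows (staging.Orders and dbo.FactOrders are assumed names):

```sql
-- A mismatch between the two values flags lost or altered rows.
-- Checksum collisions are possible but rare; HASHBYTES can be used
-- per row when a stronger comparison is needed.
SELECT
    (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(OrderID, CustomerID, Amount))
     FROM staging.Orders)   AS SourceChecksum,
    (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(OrderID, CustomerID, Amount))
     FROM dbo.FactOrders)   AS TargetChecksum;
```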



Best Practice for Database Design or Modeling






1. Use singular names for DB objects.
2. Don't use spaces, special characters, or reserved words in DB object names.
3. Don't use prefixes like tbl / sp.
4. If we need to store passwords or other critical information in tables, encrypt them.
5. Create a B-tree organizational structure (a clustered index) instead of a heap.
6. Try to create clustered or non-clustered indexes on integer columns; an index on a varchar column may cause a performance hit.
7. Double-check the data type chosen for each column: use bit (0/1) for flags, smallint for columns with a predefined list of values, and date instead of datetime where possible.
8. Avoid the sql_variant data type, which is exposed to the outside world as the equivalent of nvarchar(8000).
9. Create proper indexes, and rebuild or reorganize them regularly.
10. Use the INCLUDE clause when creating non-clustered indexes to prevent the Key Lookup operator.
11. Define primary keys and foreign keys in database tables.
12. Define partitions for large tables.
13. Define separate filegroups; tables can be arranged across them based on how frequently they are accessed or joined, on their SCD types, or on table partitioning.
14. Put extra effort into deciding the fill factor value: the more UPDATE activity a table sees, the lower the fill factor should be, so that free space on each page absorbs changes without page splits.
15. Statistics should be set to auto-update.
16. Use separate disks for tempdb, data (.mdf) files, log files, and backup files.
17. Set up database security properly via schemas, views, logins, and roles.
18. Set the MAXDOP option based on the number of CPUs for parallel processing.
19. Normalize tables at least up to BCNF.
20. Create maintenance plans as well as backup plans for unexpected failures.
21. Use FileTables instead of the file system to store document or picture files.
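The covering index from point 10 can be sketched like this; dbo.Orders and its columns are assumed for illustration:

```sql
-- The INCLUDE columns ride along at the leaf level of the index, so the
-- query below is answered entirely from the index, with no Key Lookup
-- back to the clustered index or heap.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, Amount);

SELECT CustomerID, OrderDate, Amount
FROM dbo.Orders
WHERE CustomerID = 42;
```

Included columns are not part of the index key, so they add covering without widening the key or affecting sort order.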
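A minimal sketch of the table partitioning in point 12, splitting a fact table by year; the boundary dates, object names, and filegroup mapping are illustrative:

```sql
-- Partition function: RANGE RIGHT puts each boundary date in the
-- partition to its right, giving one partition per year.
CREATE PARTITION FUNCTION pfOrderYear (date)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2015-01-01', '2016-01-01');

-- Partition scheme: here all partitions map to PRIMARY, but they can be
-- mapped to separate filegroups (see point 13).
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

-- The table is created on the scheme, partitioned by OrderDate.
CREATE TABLE dbo.FactOrders
(
    OrderID   int   NOT NULL,
    OrderDate date  NOT NULL,
    Amount    money NOT NULL
) ON psOrderYear (OrderDate);
```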
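Point 18's MAXDOP setting is a server-level configuration; the value 4 below is only an example and should be chosen based on the machine's CPU layout:

```sql
-- 'max degree of parallelism' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap parallel query plans at 4 CPUs (0 means use all available CPUs).
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```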