SQL Server Analysis Services

SQL Server Analysis Services(SSAS) delivers online analytical processing(OLAP) with data mining functionality for business intelligence applications.
Analysis Services supports OLAP by letting you design, create, and manage multidimensional structures that contain pre aggregated data from other data sources, such as relational databases.
In data mining feature Analysis Services, it lets you design, create and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.

Analysis Services main features include:

User-Friendly Conventions for names:The user experience, especially for ad hoc queries is one of key feature of SSAS. These conformed dimensions which can used for multiple Fact Tables.

Calculations/Pre-Aggregations:Calculations/pre-Aggregations of facts like proft, sales year to date, and sales same period last year is another key feature of SSAS.

Security: This is another main feature where limited Access through Perspective or restricting users from particular Access.

Query performance: Query performance is the frst reason most people are attracted to OLAP. SSAS offers excellent dimensional query performance.

The steps involved normally for Developing SSAS Cube:
1. Develop your plan.
2. Set up the design and development environment.
3. Create a Data Source View.
4. Create and fne tune your dimensions.
5. Run the Cube Wizard and edit the resulting cube.
6. Create calculations and other decorations.
7. Re-Iterate again and again until you get it right and acceptable.

I will post more information on upcoming posts….

SQL Server Partitioned View vs Partition Table

SQL Server Partitioned View vs Partition Table
Small Demo Code for Creating Partition Table
Another Small Demo Code for Creating Partition View
Now with below queries, we can find use of partitions in Partition Table

---------------to check the use of partitions-------------
set statistics io on
set showplan_text on
select * from Partitioned_Table
where col1 ='1521'

set statistics io off
set showplan_text off
Now with below queries, we can find use of partitions in Partition View
set statistics io on
set showplan_text on
select * from Sales_Tbl1
where Sales_TblID ='1555'

set statistics io off
set showplan_text off

I will discuss more on suitable usage for both in next blogs…………..

Setting up ForLoop to load CSV Files and Setting up Execute on error in SSIS

For loop can be used to setup a loop to selecting all CSV’s in particular location based on a pattern like C:App*.csv for all files starting with App in their name in C Drive root Folder.

This can be achieved by using following steps:

1. Create a Variable to store the connection string.

2. Drag a For loop in control and set the directory to look for and pattern. Map the variable as shown below.

3. Drag a DataFlow task and set the FlatFile connection connecting any of the files in that location.

4. Now change connection manager properties in Expressions to use the connection string using the variable created above.

5. Perform transformations and load into destination.

Now, say you have one file which does not match the same columns or invalid. So, you want the execution not to stop on error.

This can be achieved by using event Handler on DataFlow Task Forloop and making propagate on error to False as shown below:

Surrogate keys and lookup Transformation in Developing Data Warehouse

Surrogate keys and lookup Transformation in Developing Data Warehouse:

With conformed Dimensions approach, the Sk’s are created in when Dimensions are created by using Identity property (also can use alternative approach) but this has been convenient to me.
When adding identity to Dimension tables for Surrogate Keys make sure the Dummy rows like Unknown are added before you enable the identity.

Based on Business key, a Surrogate key lookup needs to be made in Package for Fact table and retrieve the Surrogate key from Dimension table by using lookup transformation.(Normally I use, Full Cache and Ole DB Data Connection).

By doing this, this will act as a RI (Referential Integrity) for joining facts and Dimensions tables together in Snow Flake Schema.
The major part in designing the Data Warehouse lies in understanding the underlying Business rules and data. Most of the time, you will need to write TSQL Scripts in SSMS first before designing SSIS packages using joins to analyse the data.

Another consideration in Dimensions is using Slowly changing Dimensions:
With slowly changing dimensions you can respond in one of three ways.
1.Type 1 Response: Overwrite the Old Data Value: You can overwrite the old data with the new data, which works best when the old data has no significance and can be discarded.
2.Type 2 Response: Create a New Dimension Record: You can create a new dimension record for the time period that defines the change, which works best with changes that neatly partition history.
3.Type 3 Response: Create an Old Value Column: You can also create an “old value” column in the slowly changing dimension to store the previous value, which works best with soft changes.

This can be achieved using Slowly Changing Dimensions Transformation which either inserts or UPdates the records based on a Businness key which needs to be defined as Key column with others as changing columns.

There has always been a Debate on adding Referential integrity constraints on Data Warehouse but if you like Data Load to fail instead of loading Wrong or unvalid data then, referential integrity would be handy. More over, if using analysis services then it would help it when you import the tables because it will see the relationships.

If you decide on adding the RI’s then below are few examples of them:

Add Default values to the Fact and Dimension table to columns where needed using a Default Constraint.

Add Primary key constraints and Foreign key contraints very carefully with effecting the package execution to Facts and Dimension tables.
Primary key for Date Dimension will be Datekey with Foreign key contraints on Releavant Fact Tables using it in Snow Flake Model.
A combined Primary key is normally built on Fact table like SalesOrderNum, SalesOrderLineNum,SalesOrderRevisionNUm on Orders Fact Table. We need to make sure its unique for all the table.
We can also add check contraints based on Auditkeys inserted inside the package for checking the audit key used in Dimensions are indeed already exists in AuditTable.

How is the Audit table helpful in DataWarehouse Development Cycle

How is the Audit table helpful in DataWarehouse Development Cycle?

The answer looks obvious which is debugging the package but what kind of information is normally loaded into a Audit Table and how it is helpful is a important topic.
Let us see some of the Audit Table columns helpful in our Audit table created here.

1. Using variable to populate the Audit key from master to child:
• When individual packages are executed from master package, pass the Auditkey to individual packages from master copy to populate the parentAuditkey column in Audit table designed.
• This way, we will have reference of parentAuditkey under which child id running.
• For doing this on Child packages, right click on package configurations and Enable package configuration for the variable by clicking add button. The value property needs to be selected for variable.
2. Auditkey and parentAuditkey provides information about master and child packages run together and if on the whole the execution of all packages is successful or not in the column SuccessfulProcessingInd (=’Y') which needs to be populate in master package after all the child package executions for all packages from master and child using Auditkey in master package.
For Example:
End of Master Package:
UPDATE [DimAudit]
WHERE AuditKey = ?

End of Child Package:

UPDATE [DimAudit]
, ExtractRowCnt = @RC_Xtrct
,TableFinalRowCnt = @RowCount
,ErrorRowCnt = @RC_Error
WHERE AuditKey = @AuditKey

3. [ExecStartDT],[ExecStopDT]will be helpful in finding time taken to execute individual packages as well as master package.
1. [ErrorRowCnt],[TableInitialRowCnt],[TableFinalRowCnt]columns are for checking for any data errors in execution.
2. [InsertRowCnt],[UpdateRowCnt] are helpful in finding number of new rows inserted versus updated rows(already existing data).