Project DWH is finally completed!

Task

Multiple operators receive calls and each call creates a record in the database (SQL Server). Our task was to combine and unify those databases and tables into one single unique DWH (Data Warehouse) system which later on the client should be able to run various reports using the report tool provided. The reports are running on Power BI.

Business Challenge

The problem we faced was that databases did not have unique primary key or referral constraints. Thus many records could potentially be doubled or tripled, or records in the referral tables did not have the key record in.
For example one of the customers changed their name. He was renamed in the dimension tables but all records with the old name was left the same in the transactional tables – (tables where actual data is)! Thus - lost records. Nightmare!
Also there were quite a few problems with current documentation being very outdated and we had to make some assumptions.
However we have managed to get through all of this finally!

Successful Solution

Our pipeline is PROD -> STAGING ->DWH with transformation procedures and procedures to catch up exceptions mentioned above.
Our first stage was TEST-> STAGING ->DWH with transformation procedures and procedures to catch up exceptions mentioned above.
Another challenge we came up with is to refresh TEST env with PROD data, giving the fact there was not much storage space for the TEST. We came up with compression of the old data deletion, partitioning and index rebuilds techniques.
But the refresh data was still one of our biggest issues.
All in all Project took 6 months from the start , first 3 months went on discovery, how the current process works with limited or old documentation in place, setting up TEST environment with limited storage capacity and automate data refresh from PROD to TEST.


Benefits

There is 790 371 876 rows replicated and transformed to DWH and 530 Gb data as per initial DWH set up and now it is receiving 1 million rows on a daily basis.
The client is using Power BI and quite happy ( I hope) with our skills and effort we have made.