Sunday, November 24, 2013

ETL -- Data Warehousing Architecture

Let us first understand what Data Warehousing Architecture means?


The architecture of a Data Warehouse typically comprises of the descriptions and functionalities of components and services. It also tells us how various components interact.

By analyzing the architecture we will be understand the scope for expansion on the existing system.
Below is an image that gives us a sample of a Data Warehouse Architecture




Some important terminologies to remember


Data Mart


Before we jump to the different architecture approaches for building a Data Warehouse, let us understand what Data Marts are?

From the DW, granular level data flows to various departments for their custom needs. Data which is periodically loaded from the DW to a local database, then the local database is referred to as a Data Mart




Data Sources in a DW environment


Data sources in a DW environment can be anything from a flat file to a RDBMS and also a Mainframe application. They can broadly be classified into:

  • Legacy – Mainframes
  • Operational Data – transaction processing systems
  • External Data source – Authorization data for Visa, master cards




Metadata


At all levels of the data warehouse, information is required to support the maintenance and use of a Data Warehouse

Metadata is data about the data.



Operational Data Source (ODS)


Operational data source is similar to a Data Warehouse but with a few differences.
Here are some features

  • It is subject oriented
  • Integrated
  • Current – history is not kept
  • Volatile – Since changes are current
  • Detailed – it is usually more detailed than a DW as summary data is not stored


So ODS is typically something which integrates data from multiple business operations to address operational issues that span one or more business functions of an organization.




Benefits of Operational Data Source (ODS)



  • Supports operational reporting needs of an organization
  • Provides a complete view of customer relationships, the data for which might be stored in several databases
  • Provides more current data than a DW and is more integrated than a OLTP system
  • Reduces the additional burden on other operational or data warehouse systems by providing the required additional data
  • Sends data to other operational systems also
  • Data is updated frequently and so this can be used for drill down reporting from the summary reported by the Data Warehouse


Data Warehousing Architectures


Data warehouses can be built on the following architectures


  • Centralized
  • Federated
  • Tiered






Centralized


In a centralized architecture, there exists only one data warehouse which stores all the necessary data for analysis.
Pros:
It helps organizations answer questions that were ‘unanswerable’ before the data was loaded to DW
It is best suited for integrated business where organizations across enterprise work together.
Cons:
The disadvantage here is loss of performance when compared to a distributed approach.
%nbsp;






Federated


In this architecture, data is logically consolidated but stored on separate physical databases either at the same of different physical location.

The local Data Marts store only relevant information for that particular department.

When compared to a Centralized architecture, the amount of data is reduced.






Tiered


A tired architecture is a like distributed data approach. This process cannot be done at one go as there are many sources that need to be integrated to a DW.

The first step would be the collected data at the lowest level and then keep consolidating until all the data is available in the DW

Ex: Data from all branches in a region is collected first, then data from all regions is integrated to form the DW

Pros:

  • It has a faster response time because data is located near the client applications
  • Data retrieval can be done on a reduced set of data thereby making it faster

2 comments:

TestNG - Can i use the 2 different data providers to same @test methods in TestNG?

public Object [][] dp1 () { return new Object [][] { new Object [] { "a" , "b" }, new Obje...