Sunday, November 24, 2013

ETL -- 6. Data Warehousing Schemas and Objects

Data Warehousing Schemas


A schema is a collection of database objects like tables, views, indexes, synonyms etc.
Schemas can be arranged in different ways in a DW environment


  • Star Schema
  • Snow Flake Schema


Start Schema


It is a highly de-normalized query centric model where information is broken in to two groups ( Facts and Dimensions). It generally follows the principal of One Fact to many Dimensions




Snow-Flake Schema


The snowflake schema is similar to the star schema. However, in the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema’s dimensions are normalized with each dimension represented by a single table.


Data Warehousing Objects


Fact Tables



  • Represent a business process as an artifact in the data model
  • It contains business logic problems, historical data – they mainly contain numeric data
  • Fact tables usually have high volume of data
  • Fact tables contain foreign keys of dimension tables
  • It mostly contains additive data but can also have semi-additive (ledger balance) and non additive (unit price) data also.


Dimension Tables


Contains business attributes and plays a major role in providing analytics
Ex: Customer table, information, product etc.,

  • Define business in terms already familiar to users
  • Descriptive data
  • Small tables compared to fact tables
  • Joined to Fact tables via a Foreign key
  • Heavily indexed




Hierarchies


Hierarchies are logical structures that use ordered levels as a means to organize data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure

Unique Identifiers


Unique Identifiers are specified for one distinct record in a dimension table. Artificial unique identifiers are often used to avoid the potential problem of unique identifiers changing.
Ex: #customer_id

Relationships


Relationships guarantee business integrity. An example is that if a business sells something, there is obviously a customer and a product. Designing a relationship between the sales information in the fact table and the dimension tables products and customers enforces the business rules in databases

Below is an image that shows illustrates schemas and objects in DW environment

No comments:

Post a Comment

TestNG - Can i use the 2 different data providers to same @test methods in TestNG?

public Object [][] dp1 () { return new Object [][] { new Object [] { "a" , "b" }, new Obje...