Thursday, August 22, 2013

Dimension Hierarchies and Categories

Dimension Hierarchies/Categories
When a user analyzes the measurements along a business dimension, the user usually would like to see the numbers first in summary and then at various levels of detail. What the user does here is traverse the hierarchical levels of a business dimension to get the details at various levels. For example, the user first sees the total sales for the entire year. Then the user moves down to the level of quarters and looks at the sales by individual quarters. After this, the user moves down further to the level of individual months to look at monthly numbers. What we notice here is that the hierarchy of the time dimension consists of the levels of year, quarter, and month. The dimension hierarchies are the paths for drilling down or rolling up in our analysis.
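To make the year, quarter, and month drill-down concrete, here is a minimal sketch that rolls daily sales up to each level of the time hierarchy. The sales figures are made up, and the function name `roll_up` is hypothetical; a real data warehouse would do this with aggregate queries rather than a Python loop.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales figures (date, amount); made up for illustration.
daily_sales = [
    (date(2013, 1, 15), 120.0),
    (date(2013, 2, 3), 80.0),
    (date(2013, 4, 20), 200.0),
    (date(2013, 4, 28), 50.0),
    (date(2013, 11, 5), 150.0),
]

def quarter_of(d: date) -> int:
    """Return the calendar quarter (1-4) for a date."""
    return (d.month - 1) // 3 + 1

def roll_up(sales, level):
    """Sum sales at one level of the time hierarchy: 'year', 'quarter', or 'month'."""
    # Each key includes the levels above it, so a month is always
    # reported within its quarter and year (the drill-down path).
    key_funcs = {
        "year": lambda d: (d.year,),
        "quarter": lambda d: (d.year, quarter_of(d)),
        "month": lambda d: (d.year, quarter_of(d), d.month),
    }
    key = key_funcs[level]
    totals = defaultdict(float)
    for d, amount in sales:
        totals[key(d)] += amount
    return dict(totals)

# Drill down from summary to detail: year -> quarter -> month.
for level in ("year", "quarter", "month"):
    print(level, roll_up(daily_sales, level))
```

Reading the output from the bottom up is the roll-up; reading it from the top down is the drill-down along the same path.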

Within each major business dimension there are categories of data elements that can also be useful for analysis. In the time dimension, you may have a data element to indicate whether a particular day is a holiday. This data element would enable you to analyze by holidays and see how sales on holidays compare with sales on other days. Similarly, in the product dimension, you may want to analyze by type of package. The package type is one such data element within the product dimension. The holiday flag in the time dimension and the package type in the product dimension do not necessarily indicate hierarchical levels in these dimensions. Such data elements within the business dimension may be called categories.
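Unlike a hierarchy level, a category simply splits the data into groups. The sketch below compares average daily sales on holidays with other days, and totals sales by package type. The holiday calendar, package types ("box", "bag"), and amounts are all hypothetical.

```python
from datetime import date

# Hypothetical holiday calendar and sales rows (date, package_type, amount).
HOLIDAYS = {date(2013, 7, 4), date(2013, 12, 25)}

sales = [
    (date(2013, 7, 3), "box", 100.0),
    (date(2013, 7, 4), "box", 300.0),
    (date(2013, 7, 4), "bag", 100.0),
    (date(2013, 7, 5), "bag", 60.0),
    (date(2013, 12, 25), "box", 200.0),
]

def average_daily_sales_by_holiday(rows):
    """Compare average sales per day on holidays vs. other days.

    The holiday flag is a category: it splits days into two groups
    without implying any drill-down path.
    """
    per_day = {}
    for d, _package, amount in rows:
        per_day[d] = per_day.get(d, 0.0) + amount
    groups = {True: [], False: []}
    for d, total in per_day.items():
        groups[d in HOLIDAYS].append(total)
    return {
        "holiday": sum(groups[True]) / len(groups[True]) if groups[True] else 0.0,
        "non_holiday": sum(groups[False]) / len(groups[False]) if groups[False] else 0.0,
    }

def total_by_package(rows):
    """Total sales for each package type (a category in the product dimension)."""
    totals = {}
    for _d, package, amount in rows:
        totals[package] = totals.get(package, 0.0) + amount
    return totals

print(average_daily_sales_by_holiday(sales))
print(total_by_package(sales))
```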

Hierarchies and categories are included in the information packages for each dimension. Let us go back to the two examples in the previous section and find out which hierarchical levels and categories must be included for the dimensions. Let us examine the product dimension. Here, the product is the basic automobile. Therefore, we include the data elements relevant to the automobile as hierarchies and categories. These would be model name, model year, package styling, product line, product category, exterior color, interior color, and first model year. Looking at the other business dimensions for the auto sales analysis, we summarize the hierarchies and categories for each dimension as follows:

Product:- Model name, model year, package styling, product line, product category, exterior color, interior color, first model year
Dealer:- Dealer name, city, state, single brand flag, date of first operation
Customer demographics:- Age, gender, income range, marital status, household size, vehicles owned, home value, own or rent
Payment method:- Finance type, term in months, interest rate, agent
Time:- Date, month, quarter, year, day of week, day of month, season, holiday flag
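As an illustration of the time dimension listed above, the sketch below derives all of its attributes from a single calendar date. Two assumptions are ours, not the text's: the holiday list is hypothetical, and seasons follow the northern-hemisphere meteorological convention. Treating day of week, day of month, and season as categories (rather than hierarchy levels) is also an assumption; the text only establishes year, quarter, and month as the hierarchy and the holiday flag as a category.

```python
from datetime import date

# Hypothetical holiday calendar; a real warehouse would load the business's own calendar.
HOLIDAYS = {date(2013, 1, 1), date(2013, 12, 25)}

# Assumption: northern-hemisphere meteorological seasons, keyed by month number.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

# Day names indexed by date.weekday() (Monday == 0); avoids locale-dependent strftime.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def time_dimension_row(d: date) -> dict:
    """Derive the time-dimension attributes for one calendar date."""
    return {
        # Hierarchy levels: year > quarter > month > date
        "date": d.isoformat(),
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
        # Categories (assumed): useful for slicing, not levels in the drill-down path
        "day_of_week": DAY_NAMES[d.weekday()],
        "day_of_month": d.day,
        "season": SEASONS[d.month],
        "holiday_flag": d in HOLIDAYS,
    }

print(time_dimension_row(date(2013, 8, 22)))
```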
Let us go back to the hotel occupancy analysis. We included three business dimensions. Let us list the possible hierarchies and categories for the three dimensions.
Hotel:- Hotel line, branch name, branch code, region, address, city, state, Zip Code, manager, construction year, renovation year
Room Type:- Room type, room size, number of beds, types of beds, maximum occupants, suite, refrigerator, kitchenette
Time:- Date, day of month, day of week, month, quarter, year, holiday flag
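To show how the three hotel dimensions work together, here is a small sketch in which each occupancy record is keyed by hotel, room type, and date, and the dimension attributes are used to group the results. The occupancy measures (rooms occupied and rooms available) are our assumption, since the text does not name them, and all keys, branch names, and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension rows, keyed by surrogate keys.
hotels = {
    "H1": {"branch_name": "Downtown", "region": "East"},
    "H2": {"branch_name": "Airport", "region": "East"},
    "H3": {"branch_name": "Harbor", "region": "West"},
}
room_types = {
    "R1": {"room_type": "Standard", "suite": False},
    "R2": {"room_type": "Executive", "suite": True},
}

# Hypothetical fact rows: (hotel key, room type key, date, rooms occupied, rooms available).
occupancy_facts = [
    ("H1", "R1", "2013-08-01", 80, 100),
    ("H2", "R1", "2013-08-01", 45, 50),
    ("H2", "R2", "2013-08-01", 5, 10),
    ("H3", "R1", "2013-08-01", 30, 60),
    ("H3", "R2", "2013-08-01", 8, 10),
]

def occupancy_rate(facts, hotel_attr, room_attr):
    """Group facts by one hotel attribute and one room-type attribute; return occupancy rates."""
    occupied = defaultdict(int)
    available = defaultdict(int)
    for hotel_key, room_key, _date, occ, avail in facts:
        key = (hotels[hotel_key][hotel_attr], room_types[room_key][room_attr])
        occupied[key] += occ
        available[key] += avail
    # Rates are computed from summed counts, not by averaging per-row rates.
    return {key: occupied[key] / available[key] for key in occupied}

print(occupancy_rate(occupancy_facts, "region", "room_type"))
```

Summing the counts before dividing keeps the rate correct when hotels of different sizes are grouped together; averaging each hotel's rate would overweight small branches.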

Wednesday, August 21, 2013

Business Dimensions

As we have seen, business dimensions form the underlying basis of the new methodology for requirements definition. Data must be stored to provide for the business dimensions. The business dimensions and their hierarchical levels form the basis for all further phases. So we want to take a closer look at business dimensions. We should be able to identify business dimensions and their hierarchical levels. We must be able to choose the proper and optimal set of dimensions related to the measurements. 

We begin by examining the business dimensions for an automobile manufacturer. Let us say that the goal is to analyze sales. We want to build a data warehouse that will allow the user to analyze automobile sales in a number of ways. The first obvious dimension is the product dimension. Again, for the automaker, the sales analysis must include breaking the sales down by dealer. Dealer, therefore, is another important dimension for analysis. As an automaker, you would want to know how your sales break down along customer demographics.

You would want to know who is buying your automobiles and in what quantities. Customer demographics would be another useful business dimension for analysis. How do the customers pay for the automobiles? What effect does financing for the purchases have on the sales? These questions can be answered by including the method of payment as another dimension for analysis. What about time as a business dimension? Almost every query or analysis involves the time element. In summary, we have come up with the following dimensions for the subject of sales for an automaker: product, dealer, customer demographics, method of payment, and time.

Let us take one more example. In this case, we want to come up with an information package for a hotel chain. The subject in this case is hotel occupancy. We want to analyze occupancy of the rooms in the various branches of the hotel chain. We want to analyze the occupancy by individual hotels and by room types. So hotel and room type are critical business dimensions for the analysis. As in the other case, we also need to include the time dimension. In the hotel occupancy information package, the dimensions included are hotel, room type, and time.

INFORMATION PACKAGES - A NEW CONCEPT

We will now introduce a novel idea for determining and recording information requirements for a data warehouse. This concept helps us to give a concrete form to the various insights, nebulous thoughts, and opinions expressed during the process of collecting requirements. The information packages, put together while collecting requirements, are very useful for taking the development of the data warehouse to the next phases.

Requirements Not Fully Determinate. As we have discussed, the users are unable to describe fully what they expect to see in the data warehouse. You are unable to get a handle on what pieces of information you want to keep in the data warehouse. You are unsure of the usage patterns. You cannot determine how each class of users will use the new system. So, when requirements cannot be fully determined, we need a new and innovative concept to gather and record the requirements. The traditional methods applicable to operational systems are not adequate in this context. We cannot start with the functions, screens, and reports. We cannot begin with the data structures. We have noted that the users tend to think in terms of business dimensions and analyze measurements along such business dimensions. This is a significant observation and can form the very basis for gathering information.

The new methodology for determining requirements for a data warehouse system is based on business dimensions. It flows out of the need of the users to base their analysis on business dimensions. The new concept incorporates the basic measurements and the business dimensions along which the users analyze these basic measurements. Using the new methodology, you come up with the measurements and the relevant dimensions that must be captured and kept in the data warehouse. You come up with what is known as an information package for the specific subject.

Let us look at an information package for sales for a certain business. Figure 5-4 contains such an information package. The subject here is sales. The measured facts, or the measurements that are of interest for analysis, are shown in the bottom section of the package diagram. In this case, the measurements are actual sales, forecast sales, and budget sales. The business dimensions along which these measurements are to be analyzed are shown at the top of the diagram as column headings. In our example, these dimensions are time, location, product, and demographic age group. Each of these business dimensions contains a hierarchy of levels. For example, the time dimension has the hierarchy going from year down to the level of individual day. The other intermediary levels in the time dimension could be quarter, month, and week. These levels or hierarchical components are shown in the information package diagram.
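One way to record the sales information package described above is as a small data structure holding the subject, the dimensions with their hierarchy levels, and the measured facts. This is only a sketch: the class names `Dimension` and `InformationPackage` and the method `drill_down_path` are hypothetical. The text gives the levels only for the time dimension, so the other dimensions are left without levels rather than guessed.

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    name: str
    # Hierarchy levels ordered from the top (most summarized) down to the lowest level.
    levels: list[str] = field(default_factory=list)

@dataclass
class InformationPackage:
    subject: str
    dimensions: list[Dimension]
    facts: list[str]

    def drill_down_path(self, dimension_name: str) -> str:
        """Return the drill-down path for one dimension, top level first."""
        for dim in self.dimensions:
            if dim.name == dimension_name:
                return " -> ".join(dim.levels)
        raise KeyError(dimension_name)

sales_package = InformationPackage(
    subject="Sales",
    dimensions=[
        Dimension("Time", ["Year", "Quarter", "Month", "Week", "Day"]),
        # Levels for these dimensions are not listed in the text, so they are left empty.
        Dimension("Location"),
        Dimension("Product"),
        Dimension("Demographic age group"),
    ],
    facts=["Actual sales", "Forecast sales", "Budget sales"],
)

print(sales_package.drill_down_path("Time"))  # prints: Year -> Quarter -> Month -> Week -> Day
```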

Your primary goal in the requirements definition phase is to compile information packages for all the subjects for the data warehouse. Once you have firmed up the information packages, you'll be able to proceed to the other phases.

Essentially, information packages enable you to:
  • Define the common subject areas
  • Design key business metrics
  • Decide how data must be presented
  • Determine how users will aggregate or roll up
  • Decide the data quantity for user analysis or query
  • Decide how data will be accessed
  • Establish data granularity
  • Estimate data warehouse size
  • Determine the frequency for data refreshing
  • Ascertain how information must be packaged

Figure 5-4 An Information Package
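Granularity and size estimation go together: the lowest level of each dimension fixes the grain, and the product of those cardinalities bounds the number of fact rows. The sketch below shows that arithmetic for the sales package. Every number in it (three years of days, 200 stores, 5,000 products, 6 age groups, 1% density, 100 bytes per row) is a hypothetical assumption, and treating location as stores and product as individual items is our choice of grain, not the text's.

```python
from math import prod

# Hypothetical lowest-level cardinalities for the sales package (illustrative numbers only).
lowest_level_cardinality = {
    "time (days)": 3 * 365,      # three years of daily data
    "location (stores)": 200,
    "product (items)": 5_000,
    "demographic age group": 6,
}

def max_fact_rows(cardinalities: dict) -> int:
    """Upper bound on fact rows: one row per combination of lowest-level members."""
    return prod(cardinalities.values())

def estimate_size_bytes(cardinalities: dict, bytes_per_row: int, density_percent: int) -> int:
    """Estimate fact-table size, assuming only density_percent of all combinations have sales."""
    rows = max_fact_rows(cardinalities) * density_percent // 100
    return rows * bytes_per_row

size = estimate_size_bytes(lowest_level_cardinality, bytes_per_row=100, density_percent=1)
print(f"{size / 1e9:.2f} GB")
```

With these made-up inputs the upper bound is 6,570,000,000 rows, and the 1% density assumption brings the estimate to about 6.57 GB. Choosing a coarser grain (for example, weeks instead of days) shrinks the estimate proportionally, which is why granularity is settled before sizing.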