Dimensional modeling (DM) is a logical design technique often used for data warehouses. DM is the only viable technique for databases that are designed to support end-user queries in a data warehouse. It differs from entity-relationship (ER) modeling, which is very useful for the transaction capture and data administration phases of constructing a data warehouse but should be avoided for end-user delivery. This paper explains dimensional modeling and how it contrasts with ER modeling. The essay is structured, and its sources are listed, in accordance with standard academic requirements.
Dimensional modeling is a widely used modeling technique in data warehousing. DM is a logical design technique that seeks to present data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Astera names three basic advantages of DM. First, data can be retrieved faster. Second, the information presented is easy to comprehend. Third, the model is not difficult to change when needed. That is why one must understand the principles of dimensional modeling and distinguish it from ER modeling in order to record information in data warehouses successfully.
Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This characteristic “star-like” structure is often called a star join. Because a fact table has a multipart primary key made up of two or more foreign keys, it always expresses a many-to-many relationship. The most useful fact tables also contain one or more numerical measures, or “facts,” that occur for the combination of keys that defines each record. Dimension tables, by contrast, most often contain descriptive textual information. Dimension attributes are the columns of a dimension table; in a Location dimension, for example, the attributes can be Location Code, State, Country, and Zip Code. Dimension attributes are the source of most of the interesting constraints in data warehouse queries, such as WHERE Country = 'UK', and they are virtually always the source of the row headers in the SQL answer set, which is why they also appear as report labels. Dimension attributes also contain one or more hierarchical relationships; for example, Zip Code rolls up to State, which rolls up to Country. Before designing a data warehouse, one must decide upon the subjects.
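The star structure described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_location, dim_product, fact_sales), the column names, and the sample rows are hypothetical and do not come from any cited source. The sketch builds one fact table with a two-part key and two dimension tables, then runs a query whose constraint and row headers both come from dimension attributes.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema filled with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: single-part primary key plus descriptive attributes.
        CREATE TABLE dim_location (
            location_key  INTEGER PRIMARY KEY,
            location_code TEXT, state TEXT, country TEXT, zip_code TEXT
        );
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT, category TEXT
        );
        -- Fact table: multipart key made of the dimension keys, plus measures.
        CREATE TABLE fact_sales (
            location_key INTEGER NOT NULL REFERENCES dim_location (location_key),
            product_key  INTEGER NOT NULL REFERENCES dim_product (product_key),
            units_sold   INTEGER NOT NULL,
            sales_amount REAL NOT NULL,
            PRIMARY KEY (location_key, product_key)
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_location VALUES (?, ?, ?, ?, ?)",
        [
            (1, "LON-01", "England", "UK", "SW1A"),
            (2, "EDI-01", "Scotland", "UK", "EH1"),
            (3, "NYC-01", "New York", "USA", "10001"),
        ],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Espresso", "Coffee"), (2, "Green Tea", "Tea")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (1, 1, 10, 30.0),
            (1, 2, 5, 10.0),
            (2, 1, 4, 12.0),
            (3, 1, 7, 21.0),
            (3, 2, 2, 4.0),
        ],
    )
    return conn


def sales_by_state_and_category(conn, country):
    """Star join: constrain on one dimension attribute, group by two others."""
    sql = """
        SELECT l.state, p.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_location AS l ON f.location_key = l.location_key
        JOIN dim_product  AS p ON f.product_key  = p.product_key
        WHERE l.country = ?
        GROUP BY l.state, p.category
        ORDER BY l.state, p.category
    """
    return conn.execute(sql, (country,)).fetchall()


if __name__ == "__main__":
    for row in sales_by_state_and_category(build_star_schema(), "UK"):
        print(row)
```

In this sketch the state and category columns act as the row headers of the answer set, while the summed sales_amount is the fact. A real warehouse would normally add a date dimension and use surrogate keys generated by the load process.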
In DM, a model of tables and relations is built to optimize decision-support query performance in relational databases, relative to a measurement or set of measurements of the outcomes of the business process being modeled. In contrast, conventional ER models are built to remove redundancy in the data model and to facilitate retrieval of individual records having certain critical identifiers, and therefore to optimize Online Transaction Processing (OLTP) performance.
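There are eight types of dimensions in DM. A conformed dimension is shared by several fact tables and carries the same domain content and column names wherever it is used. An outrigger dimension is a table referenced from another dimension rather than from the fact table, which gives it a snowflake-like form. A shrunken dimension is a subset of a base dimension at a higher level of summary. A role-playing dimension resembles the conformed one, but the same dimension is referenced several times by one fact table, and each reference carries a different meaning. A junk dimension collects miscellaneous attributes, such as flags and indicators, that do not belong to any other dimension table. A degenerate dimension is a dimension key stored in the fact table without a dimension table of its own. If a dimension has several alternative versions that can be exchanged at query time, it is called swappable. A step dimension shows how many steps a person must take to complete a particular session and which step the person is currently at.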
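Two of these types, the role-playing dimension and the degenerate dimension, are easy to show side by side. The following is a minimal sketch using Python's standard sqlite3 module; the tables dim_date and fact_orders, their columns, and the sample orders are hypothetical and chosen only for illustration. The single date dimension plays two roles (order date and ship date) through table aliases, and order_number sits in the fact table with no dimension table behind it.

```python
import sqlite3


def build_orders_schema():
    """Create a hypothetical orders fact table with one shared date dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_date (
            date_key      INTEGER PRIMARY KEY,
            calendar_date TEXT,
            month_name    TEXT
        );
        CREATE TABLE fact_orders (
            -- Degenerate dimension: a key kept in the fact table, no table behind it.
            order_number   TEXT NOT NULL,
            -- Role-playing dimension: two foreign keys into the same dim_date table.
            order_date_key INTEGER NOT NULL REFERENCES dim_date (date_key),
            ship_date_key  INTEGER NOT NULL REFERENCES dim_date (date_key),
            order_amount   REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [
            (20240130, "2024-01-30", "January"),
            (20240202, "2024-02-02", "February"),
            (20240205, "2024-02-05", "February"),
        ],
    )
    conn.executemany(
        "INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
        [
            ("SO-1001", 20240130, 20240202, 120.0),
            ("SO-1002", 20240202, 20240205, 80.0),
        ],
    )
    return conn


def order_and_ship_months(conn):
    """Join dim_date twice, once per role, using a different alias for each role."""
    sql = """
        SELECT f.order_number, od.month_name, sd.month_name
        FROM fact_orders AS f
        JOIN dim_date AS od ON f.order_date_key = od.date_key
        JOIN dim_date AS sd ON f.ship_date_key  = sd.date_key
        ORDER BY f.order_number
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in order_and_ship_months(build_orders_schema()):
        print(row)
```

In practice the two roles are often exposed as separate views over the one physical date table, so that report labels read as "order month" and "ship month"; the aliases above stand in for that idea.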
In a DM, each row of the fact table usually records, at the table's chosen grain, a quantitative measurement of the outcome of the business process being analyzed. The dimension tables are generally composed of attributes measured on some discrete category scale that describe, qualify, locate, or constrain the fact table's quantitative measurements. Ralph Kimball holds that the data warehouse should always be modeled using a DM/star schema. Indeed, Kimball has stated that DM/star schemas offer greater understandability and superior performance relative to ER models, and that their use involves no loss of information, because any ER model can be represented as a set of DM/star schema models. In ER models, normalization through the addition of attributive and subtype entities destroys the clean dimensional structure of star schemas and creates “snowflakes,” which, in general, slow browsing performance. In star schemas, by contrast, browsing performance is protected by restricting the formal model to associative and fundamental entities, unless certain special conditions exist (Kimball, 1996).
The key to understanding the relationship between DM and ER is that a single ER diagram breaks down into multiple DM diagrams. The ER diagram does itself a disservice by representing in one diagram multiple processes that never coexist in a single data set at a single consistent point in time. It’s no wonder the ER diagram is overly complex. Thus, the first step in converting an ER diagram to a set of DM diagrams is to separate the ER diagram into its discrete business processes and to model each one separately. The dimensional model has a number of important data warehouse advantages that the ER model lacks. The dimensional model is a predictable, standard framework. Report writers, query tools, and user interfaces can all make strong assumptions about the dimensional model to make the user interfaces more understandable and make processing more efficient.
Because the structure of ER models varies so widely, each data warehouse built on one needs custom, handwritten, and tuned SQL. It also means that each schema, once it is tuned, is very vulnerable to changes in the user’s querying habits, because such schemas are asymmetrical. By contrast, in a dimensional model, all dimensions serve as equal entry points to the fact table. Changes in users’ querying habits don’t change the structure of the SQL or the standard ways of measuring and controlling performance (Barquin and Edelstein, 1996).
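The symmetry claim can be illustrated with a short sketch: one query template serves every dimension, so a change in what users constrain on changes only the parameters, not the shape of the SQL. The code below uses Python's standard sqlite3 module; the schema, the DIMENSIONS registry, and the total_sales helper are hypothetical names introduced for this example, not part of any cited tool.

```python
import sqlite3

# Hypothetical registry: dimension name -> (table, key column, allowed attributes).
DIMENSIONS = {
    "location": ("dim_location", "location_key", {"country", "state"}),
    "product": ("dim_product", "product_key", {"category", "product_name"}),
}


def build_sales_schema():
    """Create a small in-memory star schema with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_location (
            location_key INTEGER PRIMARY KEY, state TEXT, country TEXT
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT
        );
        CREATE TABLE fact_sales (
            location_key INTEGER NOT NULL REFERENCES dim_location (location_key),
            product_key  INTEGER NOT NULL REFERENCES dim_product (product_key),
            sales_amount REAL NOT NULL,
            PRIMARY KEY (location_key, product_key)
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_location VALUES (?, ?, ?)",
        [(1, "England", "UK"), (2, "Scotland", "UK"), (3, "New York", "USA")],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Espresso", "Coffee"), (2, "Green Tea", "Tea")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 30.0), (1, 2, 10.0), (2, 1, 12.0), (3, 1, 21.0), (3, 2, 4.0)],
    )
    return conn


def total_sales(conn, dimension, attribute, value):
    """Sum sales for one attribute value, entering through any registered dimension."""
    table, key, attributes = DIMENSIONS[dimension]
    if attribute not in attributes:
        raise ValueError(f"unknown attribute {attribute!r} for dimension {dimension!r}")
    # Identifiers come only from the registry above; the value is bound as a parameter.
    sql = (
        f"SELECT SUM(f.sales_amount) FROM fact_sales AS f "
        f"JOIN {table} AS d ON f.{key} = d.{key} "
        f"WHERE d.{attribute} = ?"
    )
    return conn.execute(sql, (value,)).fetchone()[0]


if __name__ == "__main__":
    conn = build_sales_schema()
    print(total_sales(conn, "location", "country", "UK"))
    print(total_sales(conn, "product", "category", "Tea"))
```

Both calls run the same join pattern against the fact table; only the dimension table and attribute differ. This uniformity is what lets query tools make strong assumptions about a dimensional schema.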
It can be concluded that dimensional modeling is the only viable technique for designing end-user delivery databases. ER modeling defeats end-user delivery and should not be used for this purpose. ER modeling does not really model a business; rather, it models the micro relationships among data elements (Barquin and Edelstein, 1996).