Friday, April 29, 2016

Eight Steps to an Enterprise Data Model (EDM)


By Barry Williams
Principal Consultant
DatabaseAnswers.org
What is an Enterprise Data Model, or EDM? How can you use one, and why should you be prepared to deploy it? In this in-depth article, we’ll cover the fundamentals of EDM theory and design, including what makes an EDM different and what steps are crucial to its success.
Let’s start with the definition.

What Is an Enterprise Data Model?

For the purposes of this article, we’ll use customer retail purchases as the background for our EDM. They’re something we’re all familiar with, and they illustrate the key steps to building an EDM very well.

Essentially, an EDM is a High Level Data Model (HLDM) with a number of Subject Area Data Models. It will always be created and maintained by a commercial data-modeling tool, which will create a data dictionary that houses the definitions of all common terms within a specific business.

What makes an enterprise data model different? It has two unique characteristics:
  1. It includes details of all tables, columns, and data types for everything in an enterprise’s defined business areas. This includes the physical data model with tables, columns, and data types.
  2. There is a comprehensive definition of the data in an EDM. EDM should include a glossary of terms and a set of business rules.
EDMs may start with both High Level (conceptual level) and Subject Area (logical level) data models but will finish with a complete physical model.

Why Use an EDM?

An EDM helps to understand the data that is available on demand within the enterprise.

This will include data for management inquiriess, performance reports and include data from operational systems and external sources. It provides an integrated definition of data, which we call a ‘Single View of the Truth’. Therefore, it’s not surprising that EDMs are frequently used as the foundation for an enterprise data warehouse (EDW).

The next question logically is, How are EDMs created? There are two ways: the waterfall method and the agile method.

The Waterfall Method of Building an Enterprise Data Model

The traditional approach, which emphasizes process and collaboration, usually takes three to six months of work with a team of people to establish business definitions and construct the actual model. Software developers will see the similarities between this process and their waterfall method where progress is measured by the completionnnnn of distinct phases, and each phase is completed before the next one begins.

As anyone in business or any kind of development might expect, there are drawbacks to the waterfall method of EDM development. The phenomenon of large teams taking longer than small teams has been recognized for many years. Nearly 30 years ago, it was detailed in The Mythical Man-Month by IBM Project Manager Fred Brooks. The author of the book explains that because of the division of labor, large programming projects suffer from management problems that differ from those encountered in small projects. In other words, adding people to a project that was already running behind schedule causes it to run even further behind. The reasons for this include more time spent in communication, training, and management—and, as we all know, in a seemingly endless stream of meetings, involving a whole range of team members, key stakeholders, outside consultants, and others.

Since time and money are often at a premium, a different approach has also been developed: the agile method.

The Agile Method of Building an Enterprise Data Model

The agile approach is built around a very small, very experienced team. It could be just one person who is well versed in creating EDMs in close collaboration with business users. Rather than proceeding on the basis of committee and process, it favors speed and efficiency.

In my experience, the agile approach entails working closely with a single senior manager to model the functional area for which he or she is responsible. This is repeated for as many functional areas as needed. The aim is to get the area completed quickly and to promptly secure approval from the manager, and then to move on to the next functional area.

A Case History: Using the Agile Method to Build an EDM

When I am asked to collaborate on an EDM build or extension, one of the following two scenarios usually applies:
  1. Build a new EDM from scratch. In this case, I recommend to the client that we base our approach on a High Level Data Model (HLDM). In taking this approach, significant progress can be made by an expert-level data modeler in two to three months.
  2. Base a new or existing EDM on a third-party data model, such as those supplied by Teradata. For a team of up to 6 people, this can take anywhere from three to six months.
My preference is for the agile method, an approach with which I’ve had considerable experience. When I was invited to single-handedly create an EDM for a local authority in London, I accomplished the following tasks over the course of six months:
  • Created a 200-page document covering the top 20 applications within the organization
  • Ran weekly facilitated workshops to address enterprise-wide issues, such as data quality
  • Interviewed multiple Subject Matter Experts (SMEs) on a weekly basis
  • Built an Online Glossary of Terms and achieved sign-off from the SMEs
  • Designed HDLMs for all major functional areas, such as council tax, electoral register, housing benefits, parking fines, rent areas, and so on
  • Defined a library of Subject Area Data Models for common functions, such as payments
After completing these tasks, I was asked by the CEO to document this approach, which had been approved by senior management and accepted as the way forward. The resulting manual of best practices enabled permanent employees to continue my work on future projects.

I have generalized this approach and applied it to a variety of other clients. It has been very successful, because I’ve been able to work on my own and maximize the results of my time and expertise.

Building a Better EDM in Eight Steps

How complicated is a traditional EDM? The image below is just a partial representation of one EDM:



Traditional EDM
Obviously, we would never want to intentionally create something unduly complicated when a simpler solution is available, so let’s look at the agile way of building an EDM, in eight steps. These steps will create a High Level Data Model with the associated Subject Area Data Models. Our target is to have all parts of the EDM approved by business users.

The eight steps we will use are:
  • Talk to the Business
  • Identify the Dominant Entity
  • Determine relationships and derive business rules
  • Create the High Level Data Model
  • Create the Subject Area Data Models
  • Add identifiers to generate the Logical Data Models
  • Add data types to generate the Physical Data Models
  • Repeat 1 to 7 as needed.

Step 1. Talk to the Business

We start by talking to the business users to establish the scope and the functionality of the project. Bear in mind that if the scope for an EDM is an entire enterprise, then an incremental approach can be used; in fact, it can be beneficial to do this for large projects.

Let’s demonstrate this initial step for our chosen example: customer retail purchases. We talk to the business users to establish the functionality within the scope of our data model. During this process, the following information emerges:
  • Customers are defined as people who visit stores and purchase products. Note, however, that the definition of a Customer might vary from company to company; in some settings, Customers might be people who respond to emails or offers, rather than being limited to actual buyers.
  • Customers make Purchases and pay for their items.
  • Each Purchase may or may not be associated with a Promotion.
  • Each Purchase involves a member of Staff.
  • Products are delivered from Warehouses to Stores and are then sold to Customers.

Step 2. Identify the Dominant Entity

From what we’ve learned from the company in Step 1, we can determine the dominant entity and the other pertinent entities.

Step 3. Determine Relationships and Derive Business Rules

With the entities and their relationships established, we can design our entity relationship diagram (ERD), a logical data model that we refer to as our High Level Data Model. Each entity in the HLDM will have its own Subject Area Data Model.

Next, we establish the business rules that our data model will have to comply with. For example, a Customer can be associated with zero, one, or many Orders. These rules are simple to establish, and very powerful for validating test data and relationships.

It is good practice to specify the business rules that apply to a conceptual data model. It is an even better idea to first discuss them with business users. This provides you with a great opportunity to gather input from the users and establishes a good basis for future interactions. For purposes of documentation, I recommend getting them to sign a statement of approval after they review your iterations of their rules. Thus, we establish buy-in without getting caught up in office politics.

Below you’ll find an example of the business rules that apply to our current scenario. We’ve used a formal syntax to unambiguously translate the rules into data model relationships. To further aid understanding, we’ve listed these rules by entity or table. Validation is achieved by stating that each entity must appear in at least one rule.
  • Business Rules: Channel
    • BR.Ch1. A Channel can be involved in zero, one, or many Sales Purchases
  • Business Rules: Customer
    • BR.C1. Customers can make one or many Sales Purchases. (Recall that in order to be classified as a Customer, the person or Organization must make at least one Sales Purchase.)
  • Business Rules: Product
    • BR.P1. A Product can be associated with zero, one, or many Stores
    • BR.P2. A Product can be associated with zero, one, or many Warehouses
  • Business Rules: Sales Purchase Transactions
    • BR.SP1. A Payment can be associated with one and only one Sales Purchases.
    • BR.SP2. A Product can be associated with zero, one, or many sales purchases.
    • BR.SP3. A Promotion can be associated with one or more Sales Purchases
    • BR.SP4. One or many members of Staff can be involved in a Sales Purchase
  • Business Rules: Distribution
    • BR.ST1. A Store can be associated with one Warehouse.
Note that these rules are clear, easy to understand, and presented logically.

Step 4. Create the High-Level Data Model

Now that we have clear definitions to work with, we can create and review the High Level Data Model. Again, after we create the HLDM, we go back to the users, review the business rules and the model, and get their approval and signature.

Our design pattern consists of the dominant entity and any associated subject areas. We’ll use this pattern for our example, but in real life we would expect to have some variations. The following is what the blank pattern looks like:

Design pattern

Step 5. Create the Subject Area Data Models

To move on, we need to customize and fill in our pattern with the appropriate subject areas that we’ve already identified. Once this is done, our design pattern looks like this, with Retail Purchases being the dominant entity:

Creating the subject area data models

With the HLDM mapped out, we can go on to create a data model for each subject area. At Database Answers, we have an extensive library of subject area models that we can adapt for use here. We’ve designed and built these to a common standard, so we can expect to find some generic models to use on a “plug-and-play” basis.

Step 6. Add Identifiers to Generate the Logical Data Models

Once we’ve mapped out our subject areas, we can separately add identifiers for each Customer, Product and Sales Purchase.

Naturally, no one knows the exact volumes involved here, so we’ll provide for very large numbers. The way we meet this requirement is by using what is called an auto-increment field. This is a field that automatically increases in value every time a new value is referenced. In SQL Server, this is called an identity; in Oracle, it is called a sequence.

If a Customer purchases a small pair of black pants, we consider the attributes “small” and “black” to be Reference Data, because their values do not change and we can define them in advance. Our naming convention for Reference Data is that we define the Primary Keys as character strings of up to 15 characters. In our example, for sizes we use S, M, and L for small, medium, and large.

Step 7. Add Data Types to Generate the Physical Data Models

As I said at the beginning, the EDM must include the data types of all attributes of all entities.

At this point, it’s good to add the appropriate details of the identifier and data types to the simplified conceptual data model.

These are the defaults that apply:
  • Primary key ID fields are all auto-increment fields.
    (In SQL Server, this is called an identity; in Oracle, it is called a sequence.)
  • Primary key code fields are of type Char (15).
    (These are often low-volume reference data that change rarely if at all.)





  • Dates are of “date” type and are stored with or without time.
  • Prices and values are of “money” type.
  • All descriptions and text fields, or any other fields, are of type Varchar(255) in SQL Server, or Varchar2(4000) in Oracle.
During this step, we’ll produce our EDM by consolidating the HLDM’s physical data models with all the subject-area models.

Step 8. Repeat Steps 1 to 7 as Needed

Our final EDM will need to be controlled by master data management, at least for reference data and master data, such as customers and products.

This diagram presents a summary of our approach:

Diagram

The Importance of Design Patterns

We can describe this small data model as follows:
  • A Retail Purchase can include many Products.
  • The same kind of Product can be included in many Retail Purchases.
This defines a many-to-many relationship between products and retail purchases. Some modeling packages can support this feature at the logical level; at the physical level, however, relational databases cannot do so. Therefore, we introduce an “associative entity” to solve this problem.

This example shows an associative entity called Products in a Purchase.

Associative entity products in a purchase

This associative element occurs frequently and is a popular solution to a common problem. Therefore, we can refer to it as a “design pattern.” I find it very useful to maintain a library of design patterns and add to it whenever possible. In this way, my library becomes a part of my best practices for data modeling.

Customers, Parties, and Participants

In everyday language, we refer to ‘Customers’ frequently, without a second thought. However, data modelers are accustomed to thinking about the role that parties or participants play in events.

For example, if we are designing an EDM for a taxi company, we would say that taxi drivers play the role of suppliers to their passengers. Yet, the same taxi driver plays the role of a Customer when he buys fuel for his taxi. Although he figures in two transactions, we cannot count him twice. The data model will have to be able to reflect that the same person is playing different roles.

In a traditional EDM, we use a party entity to represent this. I prefer to call a party entity a Participant because it is more readily understood among non-developers. Either way, we can use an SQL view to implement this role. This means that we can continue to refer to Customers, and then implement a Participant approach with SQL views if that becomes necessary.

Party entity

Exploring Data Model Types

Each model type has a particular usage. The conceptual model is great for business users who want a high-level understanding. In this model, we show only the entity name, since the focus is on the scope and relationships between entities.

In the logical model, we add the attributes of each entity. This model is of interest to those who want a more in-depth understanding but don’t need to know details about data types (for example, financial analysts who need to generate reports).

For DBAs, developers, and anyone involved in database implementation, there is the physical model, where we add the data types for each attribute in each entity in the logical model.

The composite model, shown below, outlines all of these three levels:
  1. At the conceptual level, there are Channels, Payments and Promotions.
  2. At the logical level, there are Staff, Stores and Warehouses.
  3. At the physical level, there are Products, Products_in_Purchases, Ref_Sizes, and Retail Purchases.


In addition, there are two related model types: dimensional and operational.

Dimensional models are extremely useful for developing reports based on dimensions—Date, Product, Geographic Area, and so on. Interestingly, dimensional models are also called data marts, which highlights their capability for taking data from the EDW and making it available for Business Intelligence (BI) tools.

Operational models, or operational data stores, are used to store data extracted from operational systems. These could include common third-party systems such as SAP, Salesforce, and Oracle E-Business Suite. Another product, such as Safyr, can be used to help load data into an EDM-based EDW.

This diagram shows how this works in practice:

Diagram

Getting into the Details: Generating Subject Area Models with Data Types

In the composite EDM shown below we show entities at different levels. At the conceptual level, there are Channels, Payments and Promotions. At the logical level, there are Staff, Stores and Warehouses. And at the physical level, there are Products, Products_in_Purchases, Ref_Sizes, and Retail_Purchases.

To generate physical data models from logical data models, the default data types should be applied.

We’ll show this for each data type, and you can access a link to each data model by clicking on the data type name. For example, click on the “Customers” heading below this text to go to that model at DatabaseAnswers.org.

The final step is to expand each Subject Area Model to include all the Tables in the Physical Data Models, showing the data types.

The purpose of showing this level of details is so that the EDM can be used by developers to identify the low level of data that they will require.

This means that the EDM provides a very powerful facility for people at all levels to understand the data that is available and how they can use it.

Customers


Documents


Locations


Payments


Products

Note: We have many choices for Products on the Database Answers website.


Here is another choice that for products; this is an enterprise–attribute–value approach:


Promotions


Retail Purchases

Note that the Customers table here should have the same fields as the Customers table referenced in the subheading Customers, Parties, and Participants.


Staff


Stores


Stop! Review with Users!

I know we’ve already mentioned this several times, but it bears repeating. It is very important to review your model with business users. You want them to agree with your business rules, the scope you have defined, and the entities within the scope. Reviewing as you go will prevent problems and headaches down the road.

How to Use an EDM as the Foundation for an EDW

I have often seen an EDM used as the foundation for an Enterprise Data Warehouse (EDW).

We can identify three layers of data in an EDM which are implemented directly in the EDW, as the image below demonstrates:


Three layers of data in an EDM
In the EDW, master data is usually associated with a master data management (MDM) function and staffed with individuals who follow corporate MDM governance procedures.

EDW and the Single View of the Truth

A very important aspect of the EDW is the ability to produce a “Single View of the Truth” (aka “Single Vision of the Truth” or simply “SVOT”).

In the case of the Customers entity, our EDM is now supporting an EDW. We have to produce an SVOT for these Customers. This means, for example, that a Customer called Joseph Bloggs might also be known as Joe Bloggs or Joey Bloggs. Our MDM must recognize that these three names represent a single Customer. Thus, we need to be able to integrate the data for Joseph, Joe, and Joey, and then present it as a SVOT for the same Customer.

Naturally, the amount of data being handled requires another system to make sure it’s being processed properly, and this is where data governance is needed. Data governance is used to ensure that the SVOT is maintained whenever changes are made. These include alterations to table structures, or to the way data is loaded into the EDM, or how the data is integrated after loading. Data stewards—people responsible for the management of critical data—are trained and authorized to supervise this process. It’s up to them to see that all changes are done in a controlled manner, to ensure data integrity, and to maintain the consistency of the SVOT.

A Complete View of Agile EDM Creation

In case you were wondering what this entire project will look like when it’s finished, here is the completed version of the EDM’s Physical Model:

We’ve come a long way in this post. We’ve discussed Enterprise Data Models and a unique approach to building them. We’ve dissected the High Level Data Model and explored the many variants of Subject Area Models. We have also looked at how data is used in an EDM-based data warehouse and how it can be loaded into a data warehouse.

Hopefully, you’ll agree that the agile approach offers significant benefits over the traditional approach. I’ve used it for over ten years, and I know from firsthand experience that it works. Please, use it and investigate how it can be tailored to your projects. If you have any questions or comments, feel free to start a discussion below, or go to DatabaseAnswers.org and check out our additional material.

No comments:

Post a Comment