The 10 Most Common Data Modeling Mistakes
Data modeling is the process by which we represent the objects or entities of the information system and the connections between them. These entities can be people, products or anything else related to your business; regardless of entity type, modeling them correctly results in a powerful database configured for fast information retrieval, efficient storage, and more.
SEE: Job Description: Big Data Modeler (TechRepublic Premium)
Given the benefits that data modeling offers for database information, it is important to learn how to effectively model data in your organization. In this guide, I will highlight some key mistakes to avoid when modeling your data.
- Not seeing quality data models as an asset
- Disregarding app data usage
- Assuming “schemaless” means no data model
- Not taming semi-structured data
- Not planning data model evolution
- Rigid mapping of UI to fields and values of your data
- Incorrect or different granularity levels
- Inconsistent or non-existent naming patterns
- Not separating keys from indexes
- Starting data modeling too late
Not seeing quality data models as an asset
As Melissa Coates, a Microsoft Power BI consultant, has pointed out, we sometimes optimize our data models for a particular use case, such as analyzing sales data, and the model quickly becomes harder to use when analysts need to analyze more than one thing.
For example, it can be difficult for analysts to jump straight into analyzing the intersection of sales and support calls if models have been optimized for sales data only. Not to mention the additional time, resources, and possible costs that might be required to create additional models when one model would have sufficed.
To avoid this type of model inefficiency, take the time to ensure that your data model has broader applicability and makes good financial sense in the long run.
Disregarding app data usage
One of the hardest things about data modeling is finding the right balance between competing interests, such as:
- Application data needs
- Performance goals
- How the data will be retrieved
It’s easy to get so wrapped up in looking at the data structure that you spend too little time analyzing how an application will use the data and finding the right balance between querying, updating, and processing it.
SEE: Recruitment kit: Data scientist (TechRepublic Premium)
Another way to express this error is to have insufficient empathy for others who will use the data model. A good data model considers all users and use cases of an application and builds accordingly.
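As a minimal sketch of this balance, consider a hypothetical blog application (the names and data here are illustrative, not from the original article) that always reads a post together with its comments. Modeling for that access pattern — embedding the comments — lets one lookup serve the whole page, while a normalized model forces a second pass over the comments:

```python
# Hypothetical sketch: the same blog data modeled two ways, assuming a
# read pattern where the app always loads a post with its comments.

# Referencing (normalized): comments live in their own collection, so the
# application must scan/join them for every post it renders.
posts = {1: {"title": "Data modeling"}}
comments = [
    {"post_id": 1, "text": "Great post"},
    {"post_id": 1, "text": "Thanks"},
]

def load_post_referenced(post_id):
    post = dict(posts[post_id])
    post["comments"] = [c["text"] for c in comments if c["post_id"] == post_id]
    return post

# Embedding (denormalized): the document already matches the read pattern,
# so a single lookup returns everything the UI needs.
posts_embedded = {
    1: {"title": "Data modeling", "comments": ["Great post", "Thanks"]},
}

def load_post_embedded(post_id):
    return posts_embedded[post_id]
```

Neither shape is “right” in general — embedding favors this read path, while referencing favors comment-heavy updates — which is exactly the trade-off the model should weigh.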
Assuming “schemaless” means no data model
NoSQL databases (document, key-value, wide column, etc.) have become an essential component of enterprise data architecture, given the flexibility they provide for unstructured data. Although sometimes mistakenly thought of as “schemaless” databases, it’s more accurate to think of NoSQL databases as allowing for flexible schema. And although some confuse data schemas with data models, the two perform different functions.
A data schema tells a database engine how data in the database is organized, while a data model is more conceptual and describes data and the relationships between data. Whatever the confusion, flexible schema does not remove the need for modeling: just as with a relational database, developers need to model data in NoSQL databases. Depending on the type of NoSQL database, that data model will be either simple (key-value) or more sophisticated (document).
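The distinction can be sketched in a few lines of Python (a hypothetical example, not tied to any particular NoSQL product): the store itself accepts any dict — flexible schema — but the application still enforces an explicit data model before anything is persisted.

```python
from dataclasses import dataclass, asdict

# Hypothetical sketch: a "schemaless" store (a plain list of dicts)
# paired with an explicit data model enforced in application code.

@dataclass
class Customer:      # the data model: named fields with known types
    id: int
    email: str

store = []           # the flexible-schema store: would accept any dict

def insert_customer(raw: dict):
    # Coerce and validate against the model before persisting;
    # unmodeled fields are deliberately dropped.
    customer = Customer(id=int(raw["id"]), email=str(raw["email"]))
    store.append(asdict(customer))

insert_customer({"id": "7", "email": "a@example.com", "junk": "ignored"})
```

The database never saw a schema, yet every stored record conforms to the model — the modeling work simply moved into the application layer.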
Not taming semi-structured data
Most data today is unstructured or semi-structured but, as with error number three, that doesn’t mean your data model has to follow those same formats. While it can be convenient to put off thinking about how to structure your data when ingesting, it will almost inevitably hurt you. You can’t avoid semi-structured data, but the way to deal with it is to apply rigor in the data model rather than taking a hands-off approach when retrieving data.
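One way to apply that rigor, sketched below with hypothetical field names, is to normalize semi-structured payloads at ingest: required fields are validated and typed, and anything unexpected is quarantined in a catch-all rather than left to surprise query-time consumers.

```python
import json

# Hypothetical sketch: taming semi-structured events at ingest.
# Incoming payloads vary, but every stored record gets one modeled shape.

REQUIRED = {"user_id": int, "event": str}   # assumed required fields

def normalize(payload: str) -> dict:
    raw = json.loads(payload)
    record = {}
    for field, typ in REQUIRED.items():
        if field not in raw:
            raise ValueError(f"missing field: {field}")
        record[field] = typ(raw[field])     # coerce to the modeled type
    # Unmodeled fields survive, but in a clearly separated bucket.
    record["extra"] = {k: v for k, v in raw.items() if k not in REQUIRED}
    return record

row = normalize('{"user_id": "42", "event": "login", "device": "ios"}')
```

Queries can now rely on `user_id` always being an integer and `event` always existing, while nothing from the original payload is thrown away.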
Not planning data model evolution
Given the amount of work required to map out your data model, it can be tempting to assume that your work is done when you’ve created the data model. This is not the case, noted Anna Geller: “Creating data assets is an ongoing process,” she said, because “as your analytical needs change over time, the schema will also have to be adjusted.”
One way to facilitate the evolution of data models, she continued, is to “split and decouple data transformations [to] make the whole process easier to build, debug, and maintain long term.”
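That decoupling advice can be illustrated with a toy pipeline (the step names and fields are illustrative assumptions): each transformation is a small, independently testable function, so adjusting the pipeline when the schema evolves means adding or swapping one step rather than rewriting a monolith.

```python
# Hypothetical sketch: split, decoupled transformation steps.

def clean(rows):
    # Drop records that fail a basic quality rule.
    return [r for r in rows if r.get("amount") is not None]

def enrich(rows):
    # Derive a new field; adding such a step later is a local change.
    return [{**r, "amount_cents": int(r["amount"] * 100)} for r in rows]

def pipeline(rows, steps=(clean, enrich)):
    for step in steps:       # evolving the model = editing this tuple
        rows = step(rows)
    return rows

result = pipeline([{"amount": 1.5}, {"amount": None}])
```

Because each step takes rows in and returns rows out, any step can be unit-tested, reordered, or replaced without touching the others.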
Rigid mapping of UI to fields and values of your data
As Tailwind Labs partner Steve Schoger has pointed out, “Don’t be afraid to ‘think outside the database.’” He goes on to explain that you don’t necessarily have to map your UI directly to every data field and every value. This error tends to come from being fixated on your data model rather than the underlying information architecture. It also means you’re probably missing the chance to present the data in a way that’s more intuitive to the application’s audience than a one-to-one mapping of the underlying data model.
Incorrect or different granularity levels
In analytics, granularity refers to the level of detail we can see. In a SaaS company, we might, for example, want to see the level of consumption of our service per day, per hour or per minute. Getting the right amount of granularity in a data model is important because if it’s too granular, you can end up with all sorts of useless data, making it difficult to decipher and sort.
But with too little granularity, you may lack the detail needed to tease out important patterns or trends. Now add the possibility that your granularity is focused on daily numbers, but the company wants you to determine the difference between peak and off-peak consumption. At this point, you would be dealing with mixed granularity and end up confusing users. Determining exactly what your data use cases are for internal and external users is an important first step in deciding how much granular detail your model needs.
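The SaaS example above can be made concrete with a small sketch (hours and request counts are made-up data). Fine-grained events can always be rolled up to a coarser level, but a model that only stores daily totals cannot answer the later peak-versus-off-peak question:

```python
from collections import defaultdict

# Hypothetical sketch: usage events kept at hourly grain for one day.
events = [  # (hour_of_day, requests)
    (2, 10), (3, 5), (14, 120), (15, 200),
]

def rollup(events, bucket):
    # Aggregate to whatever grain the bucketing function defines.
    totals = defaultdict(int)
    for hour, n in events:
        totals[bucket(hour)] += n
    return dict(totals)

hourly = rollup(events, bucket=lambda h: h)        # fine grain kept
daily = rollup(events, bucket=lambda h: "day")     # coarse grain only

# Peak window (9:00-17:00 is an assumed business definition): this is
# recoverable from the hourly grain, but not from the daily total alone.
peak = sum(n for h, n in events if 9 <= h < 17)
```

Storing the finer grain and deriving coarser views on demand keeps both questions answerable; storing only `daily` would not.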
Inconsistent or non-existent naming patterns
Rather than inventing your own naming conventions, it’s better to follow standard approaches with data models. If tables, for example, lack consistent logic in the way they are named, the data model becomes very difficult to follow. It may seem smart to come up with obscure naming conventions that relatively few people will immediately understand, but it will inevitably lead to confusion later on, especially if new people are onboarded to work with these models.
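Consistency is also cheap to enforce mechanically. As a sketch (the snake_case rule and the table names are illustrative assumptions, not a universal standard), a one-line check can flag any table that strays from the chosen convention:

```python
import re

# Hypothetical sketch: flag table names that break a single agreed
# convention (snake_case is assumed here).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def inconsistent_names(tables):
    return [t for t in tables if not SNAKE_CASE.match(t)]

tables = ["customer_order", "OrderLines", "invoice", "tmpTable2"]
bad = inconsistent_names(tables)
```

Run as part of a review or CI step, a check like this catches naming drift before new teammates have to puzzle over it.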
Not separating keys from indexes
In a database, keys and indexes perform different functions. As Bert Scalzo explained, “Keys enforce business rules – it’s a logical concept. Indexes speed up access to databases – it’s a purely physical concept.”
Because many confuse the two, they end up not implementing candidate keys properly and mismanaging their indexes; in the process, they also slow down performance. Scalzo went on to offer this advice: “Implement the fewest indexes [that] can effectively support all keys.”
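Scalzo’s distinction can be sketched in plain Python (a toy in-memory store with made-up data): the key is the logical rule that no two customers share an email, while the index is the physical structure that happens to make enforcing and looking up that key fast.

```python
# Hypothetical sketch: one key (logical) supported by one index (physical).

customers = []
email_index = {}   # physical: a hash map that speeds up email lookups

def insert(customer):
    email = customer["email"]
    if email in email_index:                  # logical: the key's business rule
        raise ValueError("duplicate key: email")
    customers.append(customer)
    email_index[email] = len(customers) - 1   # physical: keep lookups O(1)

insert({"email": "a@example.com"})
try:
    insert({"email": "a@example.com"})        # rejected by the key, not the index
except ValueError:
    pass
```

Note that a single index supports the key here — in the spirit of Scalzo’s advice, adding more indexes would cost write performance without enforcing any additional rule.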
Starting data modeling too late
If the data model is the blueprint for an application’s data and how that data interacts, it doesn’t make sense to start building the application until the data model has been fully defined. Yet that is precisely what many developers do.
Understanding the shape and structure of data is critical to application performance and ultimately user experience. This should be the first consideration and brings us back to mistake number one: not viewing quality data models as an asset. Not planning the data model is essentially planning for failure (and planning to do a lot of refactoring later to fix the mistakes).
Disclosure: I work for MongoDB, but the opinions expressed here are my own.
SEE: The best data modeling tools (TechRepublic)