CONTACT US: 778-373-4422 |

Photo by Mike from Pexels

Why You Need to Avoid Dinosaurs and Learn Data Normalization

This is a post about data normalization, and why you should understand how it works. If you have never heard this term before, don't worry - most people haven't. If you have, this is a good place to refresh your knowledge.

You see, almost all data that you use and see in your daily life, came from a database somewhere. It may have taken several steps and transformations before it got into your hands, but it started in a database.

Take your phone bill, for example. The data started in your phone company's database, was queried to just get your data for last month, joined together with all your related data (address, usage etc), and then presented to you in a nice pdf or printed format for you to read. Seems simple, right?

Well, yes.. and no. You see, when the phone company took your data, created your account, and tracked your information, they split the data into logical tables that represented objects in the real world. For example, people were put into one table, their address into another, their phone plan into another and so on.

So, if you just looked at one table on its own, you would never understand it! For example, the table that has your phone plan on it might only be a table with columns of ID numbers! Why?

There are many reasons why relational databases have this characteristic:

- It makes the data much more efficient because no data item is duplicated. Imagine the crazy amount of space it would take if they wrote your name on each phone call's record! Instead, they probably just use your device number on that call record. The same ID is used in a device table which has your personal account or ID number on it, which can be used to pull up your single personal record on it, with your name in all of its glory.

- If they need to update your name, they only need to do it in one place!

- It makes the data storage more efficient, because there is almost no empty space or fields in the database.

When we split up our data in tables to represent objects in the real world, we call it data normalization.

For creators, it is an art form. For professionals and budding data analysts, it is a critical bit of knowledge and skill that you should master and practice regularly, until you understand it fully. You need an understanding on this concept as you study SQL and queries.

Click here to get started reading on data normalization. I encourage you to pay attention to the examples.

The first time you deep-dive into a database of any complexity (think manufacturing, insurance) to get data to transform for analytical purposes, you'll understand why you need to understand normalization. If you're an analyst with poor understanding of normalization, these databases will eat you alive. Think Jurassic Park.

Avoid dinosaurs. Don't wait around for someone else to get your data for you. Learn normalization and SQL, get read-only access where you can from your system administrators, and practice your SQL on real databases. Learn how each one was designed. You will see that some are clunky and inefficient, but workable; others, you may find are elegant or even beautiful in their design. You will find that the latter are a joy to run analytics against.

3 Reasons Why You Should Learn Data Techniques

Find 3 big reasons why you should learn data techniques if you are a professional or business person.

Read article

About our workshop

The Professional Problem, 3 Things They Need, and 1 They Don't Have

Why do people constantly run into problems with their data, even when they are highly trained in their area?

Read article

About our workshop

Why Finding Uniqueness is the Key: 6 Methods to Help

Find out why keys and uniqueness are so important for data transformation and data analysis.

Read article

About our workshop

When to Excel and When Not To: Why Specialists Don't Use Excel for Data Transformations

Start your journey in data transformation, and learn why specialists don't use Excel to transform raw data for analysis.

Read article

About our workshop