How to create data model relationships in Power BI


Data model relationships are at the core of Power BI. If the data modelling is done well, report preparation and visualization become easy. So in this article, we will walk through data model relationships with a simple example.

A common misconception among Power BI users is that Power BI is all about visualization. But that is not the case. Visualization is only the icing on the cake; the data modeling is the cake itself.


You should spend roughly 75% of your time on data transformation and modeling; the remaining 25% is enough for visualization. A good data model lets you dig into the data for insight and understand the relationships between all the variables.

Microsoft itself has prepared a good, detailed document on data modeling in Power BI, so there is no point repeating the same material here. My goal here is to present a small practical example of data modeling.

Use of multiple narrow tables

I encourage you to use multiple narrow tables instead of one big flat table with many columns. And here lies the beauty of data modeling and relationships. Power BI can work with a big wide table, but it becomes a constraint when you scale the model.

Over time your project will grow in size. Tables gain rows, the number of tables is likely to increase, and so is the complexity of your model. Fetching the required information from a big table with lots of columns is time-consuming, and it occupies a good amount of memory too.

In this scenario, using multiple tables with few columns makes your life easier.

Data model relationships in Power BI: from the Business Intelligence point of view

Business Intelligence, or BI, is of critical importance in industry. Almost every business maintains a well-developed database to store all kinds of transactions. Over time such databases grow huge, and extracting information from them becomes complex.

Extracting information from such big data requires experts in the field. Businesses employ software developers to extract the data, transform it, and load it into a well-maintained data warehouse.

Creating a data warehouse is essential because no business wants to disturb its always-busy production database; doing so may jeopardize real-time transaction processing. That is why developers keep the required information in a data warehouse of their own.

For a person without a software development background, handling a data warehouse is tough. Here comes the power of Power BI: it makes the whole ETL (Extract, Transform and Load) process a cakewalk, even for someone with zero software development knowledge.

A practical example of data model relationships in Power BI

Here is a simple example of how you can model relationships in Power BI. The data used here is crop production data for different states of India, available at Data.world. It is real-world data without any garbage.

Data.world already provides the data in a refined form, so very little data cleaning is left for us, and we can jump straight into the modeling process.

Below is a glimpse of the data file imported into Power BI. This is the screenshot of the data transformation window of Power BI.

Glimpse of the crop production data

The columns are:

  • “State_Name” contains the different states of India.
  • “District_Name” contains the state-wise district names.
  • “Crop_Year” has the year of production.
  • “Season” has the particular season of crop production.
  • “Crop” is the particular crop name.
  • “Area_ha” contains the total area under the crop in hectares.
  • “Production_tonnes” has the total production for the season in tonnes.

Now this is not a very big table. Still we can reduce its size and create a custom key to further improve the data model.

Creating a custom key

Creating a custom key starts with selecting a column to split out into a separate table. Let’s select “District_Name” to create a new table.

Now, why select this column? The answer is easy: it has redundant values. “State_Name” also does not have unique values, so either of them could be selected.

We will create a separate table from this column in which the values are unique, with far fewer rows. These tables will help us with data munging, data wrangling, or whatever fancy term you want to use 🙂

Trimming the column

The first step is to trim the column to remove any unwanted white space. Just select the column in question, right-click, and choose the trimming option under Transform. See the image below.

Trimming the column
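If you open the Advanced Editor after this click, the trim appears as a single generated M step. Below is a sketch of that step; the previous step name #"Changed Type" is an assumption and may differ in your query.

    // Sketch of the M step generated by the Trim transform on "District_Name".
    // #"Changed Type" stands for whatever your previous step is called.
    #"Trimmed Text" = Table.TransformColumns(
        #"Changed Type",
        {{"District_Name", Text.Trim, type text}}
    )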

Removing the blank values

The second important step is to replace any blank values. Blank values are a real problem in data analytics: they can lead to spurious results because many tools cannot handle them well. So we should deal with them carefully at the data transformation stage itself.

We can easily inspect the column quality and distribution from the View tab, as shown in the image below.

Checking column quality and distribution from the View tab

So we can conclude that this particular column is already free from blank values: the column quality indicator shows “Empty” as 0%.

But if your data has blank values, this is a convenient way to check for them. For example, if we check the “Production_tonnes” column, we can see “Empty” as <1%, which means it has some blank rows.

Checking for rows with “null” values

To see the blank values, click the filter arrow in the column header, uncheck “Select All”, and keep only the null values checked, as shown in the figure above. With this filter applied, the Empty indicator reads 100%.
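For reference, this filter is recorded as an M step roughly like the sketch below (the step and column names are assumptions; remove the step again once you have inspected the rows).

    // Sketch of the generated filter step that keeps only the null rows.
    // PreviousStep is a placeholder for the step before this one in your query.
    #"Filtered Rows" = Table.SelectRows(
        PreviousStep,
        each [Production_tonnes] = null
    )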

Replacing null values with 0

To replace the null values, right-click the column name and select the “Replace Values…” option. In the new window, replace null with 0 and click OK. You can now check that “Empty” has become 0% and all null values have been replaced by 0.

As no other column shows any Empty values, our data is now completely free of blanks, and we can proceed to the next step.
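In M, the “Replace Values…” action becomes a Table.ReplaceValue step. Here is a minimal sketch, assuming the column is “Production_tonnes” and using PreviousStep as a placeholder for the step it follows.

    // Sketch of the generated "Replace Values" step: null -> 0.
    #"Replaced Value" = Table.ReplaceValue(
        PreviousStep,
        null,                      // value to replace
        0,                         // replacement
        Replacer.ReplaceValue,
        {"Production_tonnes"}      // column(s) to search
    )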

Creating a new query and converting it to a table

In the next step we create a new query using the “Add as New Query” option. See the image below for the steps to follow to create the new query and then convert it into a separate table.

Creating new query and converting to new table
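“Add as New Query” on a column produces a list, and “To Table” turns that list into a one-column table. A sketch of the resulting district query, assuming the original query is named crop_data, could look like this:

    // Sketch of the new district query created from the "District_Name" column.
    // crop_data is an assumed name for the original query.
    let
        Source = crop_data[District_Name],          // list of all district values
        #"Converted to Table" = Table.FromList(
            Source,
            Splitter.SplitByNothing(),
            {"District_Name"}                       // name the single column
        )
    in
        #"Converted to Table"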

Removing duplicate entries

The newly created column has several duplicate entries. To turn it into a table with only unique entries, we need to remove the duplicates. As shown in the image below, click the “Remove Duplicates” option. Now you have a column with all the different districts of India.

Creating table with “District_Name”
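The “Remove Duplicates” click is recorded as a Table.Distinct step, roughly as in this sketch (step names are assumptions):

    // Sketch of the "Remove Duplicates" step on the new district table.
    #"Removed Duplicates" = Table.Distinct(
        #"Converted to Table",
        {"District_Name"}
    )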

Creating IDs for the column

It is very helpful to create custom IDs for the values in the newly created column. Since the IDs are created by you, you have full control over them. Power BI has a simple option for this; see the following image to understand the steps.

Creating index column

The “Add Column” tab has options for both a custom column and a default index column starting from 0 or 1. You can choose either. Here, for example, I have selected the index column starting from 0.
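Behind the scenes, “Index Column > From 0” adds a Table.AddIndexColumn step. A sketch, with “District_ID” as an assumed name for the new key column:

    // Sketch of the index step that creates the custom district IDs.
    // "District_ID" is an assumed column name; rename it as you like.
    #"Added Index" = Table.AddIndexColumn(
        #"Removed Duplicates",
        "District_ID",
        0,    // start value
        1     // increment
    )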

Merging the tables

The next step is to merge the queries. Having created the custom keys, we now need to merge the designated tables. Go to the “Home” tab and select the “Merge Queries” option, and follow the images below to complete the task.

Selecting merge queries option

A “Merge” window will now open, where you should verify that the join matches the expected number of rows. Here you can see that 246091 rows match between the two tables.

Merging the tables

The merged table now appears as a single column in the table. You can see in the image below that the whole table is displayed as one column, so you need to expand it, select only the particular column you want, and deselect the “Use original column name as prefix” option.

Selecting particular column from merged table
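In M, the merge and the expand are two generated steps: a Table.NestedJoin that joins the two tables on “District_Name”, followed by a Table.ExpandTableColumn that keeps only the new key column. A sketch, assuming the queries are named crop_data and district_table and the key column is “District_ID”:

    // Sketch of the generated merge and expand steps.
    // crop_data stands for the previous step of the crop query.
    #"Merged Queries" = Table.NestedJoin(
        crop_data, {"District_Name"},
        district_table, {"District_Name"},
        "district_table", JoinKind.LeftOuter
    ),
    // Keep only the key column; plain new names mirror deselecting
    // "Use original column name as prefix" in the UI.
    #"Expanded district_table" = Table.ExpandTableColumn(
        #"Merged Queries", "district_table",
        {"District_ID"}, {"District_ID"}
    )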

Data model relationships in Power BI: task completed

With this last step, we have created a data model relationship between the two tables. You can verify it in the “Model” view of Power BI, which displays a “One to Many” relationship between the tables.

Final data model relationship

Final words

So here is a simple example of data model relationships in Power BI with practical data. It also shows how you can create custom keys to join or merge different tables. This is an effective way of creating a relationship, since you have full control over the keys.

I hope this article helps you start with data model relationships in Power BI with ease. In case of any doubt, comment below, and do suggest any interesting data modeling topics you would like covered.

With this article, I am planning to start a series of articles on using Power BI to solve real-world problems. So keep visiting the website for new and interesting articles.

An introduction to Power BI for data visualization


Microsoft’s Power BI is one of the most popular and most frequently used business intelligence tools for data visualization. This is an introductory article on Power BI, which will be followed by a series of practical problem-solving articles, so it will be “learning by doing”.

Data exploration and visualization is the most basic yet very important step in data analytics. It reveals the information needed to understand the data and the relationships between the variables. Without a thorough understanding of these variables, we cannot approach an in-depth analysis of the data.

Even in data science tasks involving advanced Machine Learning algorithms, deep learning and artificial intelligence applications, Exploratory Data Analysis (EDA) plays a very important role.

Need for data exploration tools

Although programming tools like R and Python are the most capable for advanced data science work, using them for every round of data exploration is tedious.

For example, we can use Python for EDA, but it takes time. In such cases, tools like Microsoft Power BI and Tableau come in very handy.

They prepare exhaustive analytics on the data, with beautiful, professional-looking charts, interactive dashboards and real-time reports, really quickly. Another advantage is collaboration: you can share the report with your team by publishing it through their cloud service.

For industry leaders, it does not matter whether you created a report by writing a thousand lines of code or by using a tool like Power BI; they just need good analytics and insights from the data. So, besides tools like Python and R, a data analyst should have a working knowledge of these tools too.

But why Microsoft Power BI for data visualization?

Gartner Inc., a renowned research and advisory firm providing information, advice and tools for industries across business sectors, has ranked Power BI at the top in its report on the Magic Quadrant for Analytics and Business Intelligence Platforms for 13 consecutive years.

See the figure below, where the different Analytics and Business Intelligence platforms are compared on two parameters: the X-axis shows completeness of vision and the Y-axis shows ability to execute. On both parameters, Microsoft finds its place at the top.

Analytics and Business Intelligence Platforms comparison

Increasing popularity: Google trends

Microsoft Power BI has a close competitor in Tableau, another analytical software for Business Intelligence. Tableau has been a popular choice of businesses for a long time; compared to it, Power BI came to public use quite late.

Tableau made its presence felt in the field of business intelligence way back in 2004, whereas Microsoft Power BI was designed by Ron George only in 2010 and made available for public download on July 11, 2011.

Soon after its release, it rapidly became popular, mainly because of its affordability, its user-friendly interface and, most importantly, its compatibility with the hugely popular and widely used Microsoft Office products.

Below is an example (from Google Trends) of how Power BI, since its inception, has gradually overtaken Tableau in popularity and become the most Google-searched business intelligence tool in the recent past.

Installation of Power BI for data visualization

I will discuss the free version of Power BI, Power BI Desktop, here. You can install it from the Microsoft Store, and the process is well documented by Microsoft. Just search for Power BI in the Windows Store and click download; it will start downloading a file of nearly 500 MB.

Below is the screenshot of my computer when the installation was complete and Power BI was ready to launch.

Power BI desktop in Microsoft store

When it is opened for the first time, you will see a window like the one below.

First screen after Power BI gets installed

Just click “Get started” and you are ready to work with Power BI. The first step is to bring data into Power BI, and here you will get an idea of why Power BI is so popular: it can load from almost any data source you can name.

Compatible data sources for Power BI

Power Query

Power Query is the most powerful feature of Power BI. It helps us clean messy raw data. In the real world, the data we have to deal with is never clean; sometimes it is arranged in a fashion that is only human-friendly.

Human-friendly data means lots of tags and long strings within the data file for the ease of human understanding. But a machine has no use for these texts and arrangements; rather, they create problems when the file is read by a computer.

Power Query helps to clean the data and also to transform it, creating the desired information from the available data. Below is an image showing how you can create a custom column from the columns already available.

Custom column in Power BI
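As a rough illustration, a custom column added this way becomes a Table.AddColumn step in M. The column names below (Production_tonnes, Area_ha) are only examples borrowed from the crop dataset used in the previous article; substitute your own.

    // Sketch of an "Add Column > Custom Column" step computing yield per hectare.
    #"Added Custom" = Table.AddColumn(
        PreviousStep,                              // placeholder for the prior step
        "Yield_tonnes_per_ha",
        each [Production_tonnes] / [Area_ha],
        type number
    )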

Query editor

So in this data transformation step, we need to convert human-friendly data into machine-friendly data. Don’t just click “Load”; spend some time on data cleaning before loading. And here comes the most powerful part of Power BI: the Query Editor.

In the image below, the query editor has been used to modify the data. Power BI builds the query automatically from your transformation actions, using the popular M language, and you can make the necessary changes in the query to get the desired result.

The syntax of the query is quite easy, and in case of any problem a quick Google search is always helpful.

Use of query editor
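To give a feel for what that generated query looks like, here is a minimal sketch of a full query as it might appear in the Advanced Editor; the file path, delimiter and column names are all assumptions.

    // Minimal sketch of a generated query in the Advanced Editor.
    let
        Source = Csv.Document(
            File.Contents("C:\data\sample.csv"),    // assumed file path
            [Delimiter = ",", Encoding = 65001]     // 65001 = UTF-8
        ),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
        #"Changed Type" = Table.TransformColumnTypes(
            #"Promoted Headers",
            {{"Amount", type number}, {"Date", type date}}   // assumed columns
        )
    in
        #"Changed Type"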

You need to master the Power BI query editor in order to dig deep into the analysis. It is a common misconception that Power BI is only about visualization, and many Power BI users jump straight into visualization without proper data transformation and data modelling.

But that is a wrong practice. The query editor is often referred to as the “kitchen” of Power BI: the better you are able to use it, the more information you can extract from the data.

Use narrow tables

A good practice for data modelling is to use narrow tables. A wide table creates problems when you attempt complex calculations. Always break down a wide table by identifying columns with redundant values and placing them in a separate table.

Using tables with lesser columns instead of wide tables

Your tables will grow over time as the number of rows increases, and chances are you will need to add more tables to the data model. In both cases, narrow tables help you create relationships with ease and fetch the required data.

Create as many tables as required for the analysis, and later link them using primary and foreign keys. We will discuss this process in a separate article dedicated to data modelling.

Power Query is the built-in data tool in both MS Excel and Power BI. It helps us transform the data as we wish in order to build a meaningful model between the dimension tables and fact tables.

Star schema: Fact tables & dimension tables

Fact tables and dimension tables are components of dimensional modelling with a star schema.

Power BI for data visualization: Star schema

Fact tables

Fact tables are generally long tables containing the numeric values, that is, the measurements and transactions of the model.

Dimension tables

Dimension tables generally contain information about the Who, What, Where, When and How of the fact tables. They are short compared to fact tables and contain mainly string variables.

Query settings

This is the part of the Power BI data transformation window where all the magic happens. It automatically keeps track of every data transformation step you perform; you can go back to any of these steps at any time, make whatever changes you want and replicate them further.

It lets you, in a sense, travel back in time to see how the data was cleaned and what the transformation steps were.

DAX: Data Analysis Expressions

These are the functions and operators available in Power BI that enable analytical services, formula building, data modelling and information extraction (source: Microsoft documentation).

There are almost 250 DAX functions, used across different data transformation and analysis tasks.

Measures vs calculated columns

Implicit measures

Implicit measures are the measures Power BI defines automatically, for example when you drag a numeric field into a visual and it is aggregated by default.

Explicit measures

These are measures defined by the Power BI user, and they have many benefits over implicit measures: they give the user more freedom, reusability and flexible applicability through connected reports.

Measures and calculated columns

Both options can yield the same result and may appear to have the same functionality, but it is always good practice to prefer measures over calculated columns for their reusability and flexibility.

Measures also have an advantage when you want to use your analytical report in Excel or other applications. If you open a published Power BI file from Excel, you can find the measures you created but not the calculated columns; hence you can reuse only the measures, not the calculated columns.

Measures make your application lighter and more memory-friendly

Measures are calculated dynamically and do not consume space in the file. Hence the use of measures instead of calculated columns keeps the Power BI file very light, even lighter than an equivalent Excel file.

Using explicit measures also helps your data model execute fast. Always keep scalability in mind while building models: your model should remain equally efficient as the data size increases.

Hybrid measures

These are measures that calculate statistics using variables from different tables. Once such measures are created, they become part of the model and can be used everywhere.

Unpivot columns

This is another beautiful feature of Power BI. With a click, it converts columns into attribute/value rows, and the Pivot option does the reverse.
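As a sketch, an unpivot in M uses Table.UnpivotOtherColumns: every column you do not select is turned into attribute/value rows. The column names here are only examples.

    // Sketch of an unpivot step: every column except "Crop" becomes
    // a pair of "Year" (attribute) and "Production" (value) rows.
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(
        PreviousStep,          // placeholder for the prior step
        {"Crop"},              // columns to keep as-is
        "Year",
        "Production"
    )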

Model & relationship view

Once you complete transforming and editing the data, the tables become part of the Power BI file. By default, Power BI identifies the relationships between the tables and displays them in the Model view. You can change these relationships and build your own.

Below is an example of such a model. A typical star schema consists of one-to-many relationships from dimension tables to fact tables. In this image I also have one many-to-many relationship; keep in mind that you should have a clear idea of why you are creating it.

Data relationship view

This modelling part is very important: efficient data modelling leads to easy and effective data analysis. A basic idea of data warehousing is helpful here.

Once you are done with data transformation and model building, click “Close and apply” to exit the query editor.

Here we begin…

Power BI is such an interesting and powerful analytical platform that I wish to devote a complete blog category to it. As I keep learning it, the interest continues to grow.

It may appear overwhelming at first, but with more practice things become easier, and soon you will be hungry for more complex problems with messier data.

This is just the start of learning Power BI, and here is only a glimpse of some important features. I will cover more advanced applications in coming articles and update you about different analytical techniques as I learn them.

Let me know how you find this article and which topics you want covered in coming blogs by commenting below. Your suggestions will help me improve the quality of the articles as well as introduce new and interesting topics.