Joining tables is an important way to combine information from several tables. In Power BI Desktop we can join tables with its very nifty Merge feature.
In this article, I am going to demonstrate how to join tables in Power BI desktop with some practical data. The data are all open source. You can collect them from the links provided and use them to practice.
Combining data
When it comes to combining data in tables, it can be done in two ways. One is to increase the rows of a table with new data. This type of data combination is known as “Appending”.
Whereas when you add columns with new information to an existing table, it is called “Merging”.
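To make the difference concrete, here is a minimal sketch in plain Python (not Power BI itself). The district names and numbers are made up for illustration only.

```python
# Hypothetical rows; the districts and values are invented for illustration.
crop_2019 = [{"district": "Cuttack", "area_ha": 1200}]
crop_2020 = [{"district": "Puri", "area_ha": 950}]

# Appending: same columns, more rows.
appended = crop_2019 + crop_2020

# Merging: same rows, more columns, looked up through a shared key.
rainfall_mm = {"Cuttack": 1500}
merged = [
    {**row, "rainfall_mm": rainfall_mm.get(row["district"])}
    for row in crop_2019
]

print(appended)
print(merged)
```

Appending grows the table downwards, while merging grows it sideways and needs a key column to line the rows up.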
Power BI provides both of these features under the “Home” tab (as in the below figure). You need to use them according to your requirement.
This particular article is on joining tables. So we are going to discuss only the Merge queries option here, with suitable examples.
Let's first discuss the different kinds of joins. This will be an overview so that you can make the right decision when selecting the join type while merging queries. I also suggest going through the Microsoft documentation page for details.
The data set
The data set I have used for demonstration purposes is India’s state-wise crop production, collected from data.world, along with another data set on India’s state-wise rainfall for different years.
Both data sets offer a real-world experience. The data were collected in raw form and refined using the data transformation features of Power BI. You can go through all the data transformation steps here.
The measures used for different calculations are described in this article.
The first table, the crop production table, contains the area under different crops in hectares (ha) for the districts of different Indian states. The second table, the rainfall table, has the rainfall record in mm for the districts of Indian states.
Replacing/removing errors
While importing the data from the CSV file or from the web itself, you may face some missing values. When the data gets imported in Power BI, the missing values are shown as “Error”.
Now you cannot proceed without handling these errors. One way is to replace the errors with proper values, such as the mean or median of the remaining values of that variable. Alternatively, you can simply remove the rows with missing values.
In the below screenshot you can see how I have replaced the error with a suitable descriptive value. The “Remove errors” option will simply remove the corresponding rows.
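The two choices can be sketched in plain Python as follows. This only illustrates the logic, not what Power BI runs internally; the rainfall values are made up and `None` stands in for an error cell.

```python
from statistics import median

# Hypothetical rainfall column in mm; None stands in for an "Error" cell.
rainfall_mm = [1500.0, None, 1100.0, 3000.0]

valid = [v for v in rainfall_mm if v is not None]

# Option 1: replace each error with the median of the remaining values.
fill_value = median(valid)
replaced = [fill_value if v is None else v for v in rainfall_mm]

# Option 2: remove the rows that contain an error.
removed = valid

print(replaced)
print(removed)
```

Replacing keeps the row count intact, which matters when the other columns of that row are still useful. Removing is simpler but loses those rows entirely.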
Now suppose we want information on both the cropped area and the rainfall of particular states and districts. Then we need to join the tables with proper conditions.
Different kinds of joins in Power BI Desktop
Now Power BI will ask you which join type you want to apply. To choose the correct join, we should have an idea of the different joins. So here is a description of each join, with examples.
For demonstration purposes I have picked a few rows from both tables and created two small tables.
Left outer join
It is the most common join and the default one in Power BI. The left or first table (as in the figure below) retains all of its rows, and only the matching rows are brought in from the right or second table. The “Merge queries” option of Power BI describes it as “All from first, matching from second”.
Suppose we want to know the rainfall of some particular districts with a definite amount of cropped area. So how will you join the two tables to fetch the information? The left outer join is here to help. See the below figure.
In the above figure the particular information we are interested in is shown as green cells. The corresponding information has the same colour code in the second table.
So, as per the rule of the left outer join, all the rows (yellow coloured) of the left table and the green-coloured rows of the right table have been joined in the new table.
The Venn diagram at the bottom right corner describes the joining process in colour codes. According to this diagram, the matching rows are the intersection of the two sets. So here the new table consists of the complete left set and the intersection of the two sets.
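The same rule can be sketched in a few lines of plain Python. The rows below are invented, and I am assuming each (state, district) key appears only once per table.

```python
# Hypothetical sample rows keyed by (state, district); all values are made up.
crop_area_ha = {
    ("Odisha", "Cuttack"): 1200,
    ("Odisha", "Puri"): 950,
    ("Bihar", "Patna"): 800,
}
rainfall_mm = {
    ("Odisha", "Cuttack"): 1500,
    ("Bihar", "Patna"): 1100,
    ("Kerala", "Idukki"): 3000,
}

# Left outer join: every crop row is kept; rainfall is None where no match exists.
left_outer = [
    (key, area, rainfall_mm.get(key)) for key, area in crop_area_ha.items()
]

for row in left_outer:
    print(row)
```

Puri has no rainfall record, so it stays in the result with an empty rainfall value, while Idukki, which exists only in the right table, does not appear at all.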
Right outer join
Now suppose we are interested in information exactly opposite to what we fetched earlier. This time we want to know the cropped area of districts having particular rainfall. So here we need to perform the right outer join, which keeps all the rows from the second table and only the matching rows from the first.
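As a rough sketch in plain Python, with invented rows and unique (state, district) keys assumed, the right outer join simply walks the rainfall table instead of the crop table:

```python
# Hypothetical sample rows keyed by (state, district); all values are made up.
crop_area_ha = {("Odisha", "Cuttack"): 1200, ("Odisha", "Puri"): 950}
rainfall_mm = {("Odisha", "Cuttack"): 1500, ("Kerala", "Idukki"): 3000}

# Right outer join: every rainfall row is kept; cropped area is None where no match exists.
right_outer = [
    (key, crop_area_ha.get(key), mm) for key, mm in rainfall_mm.items()
]

for row in right_outer:
    print(row)
```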
Full outer join
When we need information on all the states and districts with their cropped area and rainfall, we should go for a full outer join.
As the name suggests, this join will return all the records from both tables, whether or not they have a match in the other table.
Here is the final output from the full outer join. The rows contain both the rainfall and cropped area information including the matching rows (in green colour).
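Here is a small plain-Python sketch of the same idea, again with made-up rows and assuming unique (state, district) keys.

```python
# Hypothetical sample rows keyed by (state, district); all values are made up.
crop_area_ha = {("Odisha", "Cuttack"): 1200, ("Odisha", "Puri"): 950}
rainfall_mm = {("Odisha", "Cuttack"): 1500, ("Kerala", "Idukki"): 3000}

# Collect every key from the left table, then the right-only keys.
all_keys = list(crop_area_ha) + [k for k in rainfall_mm if k not in crop_area_ha]

# Full outer join: one row per key, with None wherever a side has no data.
full_outer = [(k, crop_area_ha.get(k), rainfall_mm.get(k)) for k in all_keys]

for row in full_outer:
    print(row)
```

Only the matching row (Cuttack) has both values filled in. The other rows carry an empty value on the side where no record exists.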
Inner join
Again, suppose we need the rainfall only of those districts that have cropped area data, as we have nothing to do with rainfall data for districts without any cropped area information. In this situation, the inner join produces the desired result.
See the below figure where we have applied the inner join on both the tables.
As the above image shows, only the matching rows are kept and all other rows have been excluded.
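In plain Python the inner join boils down to keeping the keys that both tables share. The rows are invented and unique keys are assumed.

```python
# Hypothetical sample rows keyed by (state, district); all values are made up.
crop_area_ha = {("Odisha", "Cuttack"): 1200, ("Odisha", "Puri"): 950}
rainfall_mm = {("Odisha", "Cuttack"): 1500, ("Kerala", "Idukki"): 3000}

# Inner join: only keys present in both tables survive.
inner = [
    (key, area, rainfall_mm[key])
    for key, area in crop_area_ha.items()
    if key in rainfall_mm
]

print(inner)
```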
Left anti join
Suppose, for the sake of data analysis, we need only those states and respective districts for which we don’t have any rainfall data. How can we fetch the required information?
Not to worry. The join type we need here is the left anti join, which keeps only those rows of the left table that have no match in the right table. See the below example.
In the above image we have just the required information from the left table of state-wise cropped area, and everything else has been excluded.
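A plain-Python sketch of the left anti join, using invented rows, is just a filter on the left table:

```python
# Hypothetical sample rows keyed by (state, district); all values are made up.
crop_area_ha = {("Odisha", "Cuttack"): 1200, ("Odisha", "Puri"): 950}
rainfall_mm = {("Odisha", "Cuttack"): 1500, ("Kerala", "Idukki"): 3000}

# Left anti join: crop rows whose key has no rainfall record.
left_anti = [
    (key, area) for key, area in crop_area_ha.items() if key not in rainfall_mm
]

print(left_anti)
```

Note that no rainfall column comes along, since by definition there is nothing to bring over from the right table.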
Right anti join
Now suppose we need information exactly opposite to the earlier one, i.e. we need only the rainfall data of those districts for which we don’t have any information on the cropped area. The join we will apply here is the right anti join.
See the below demonstration with the two tables.
In the above figure, the crop production table has been joined with the rainfall table using the right anti join. You can see that only those rows of the rainfall table that have no match in the crop production table have been retained.
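The mirror-image filter in plain Python, again on invented rows, looks like this:

```python
# Hypothetical sample rows keyed by (state, district); all values are made up.
crop_area_ha = {("Odisha", "Cuttack"): 1200, ("Odisha", "Puri"): 950}
rainfall_mm = {("Odisha", "Cuttack"): 1500, ("Kerala", "Idukki"): 3000}

# Right anti join: rainfall rows whose key has no crop production record.
right_anti = [
    (key, mm) for key, mm in rainfall_mm.items() if key not in crop_area_ha
]

print(right_anti)
```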
Join tables in Power BI desktop
Now let's see a practical application of joining tables in Power BI Desktop.
When you click the Merge queries option of Power BI Desktop, you will see the first table as the active table (the table that was selected when you clicked the Merge queries option).
You need to select a column from this table that has unique values and can act as a key. Then a drop-down lets you choose from the available tables. Again, you have to select a column from the second table, so that the two tables can be joined on these columns.
Then you need to select the particular join option from the drop-down as shown below. All the join options are available in the same sequence as we discussed above.
For example, in the above image you can see I have selected the column “State_name” from the table “India_statewise_crop_production” and the column “SUBDIVISION” from the second table “rainfall_India” as the key columns for joining.
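Before merging, it can help to check how well the two key columns line up. Below is a minimal sketch in plain Python. The column names come from the tables above, but the values in the lists are made up for illustration.

```python
from collections import Counter

# Hypothetical key values; the real columns are State_name and SUBDIVISION.
state_name = ["Odisha", "Bihar", "Kerala", "Bihar"]
subdivision = ["Odisha", "Bihar", "Punjab"]

# Keys that occur more than once in the first table.
duplicates = [key for key, count in Counter(state_name).items() if count > 1]

# Keys of the first table that would find no partner in the second table.
unmatched_left = sorted(set(state_name) - set(subdivision))

print(duplicates)
print(unmatched_left)
```

Duplicate keys are not an error in themselves, but they multiply rows once the merged column is expanded, and unmatched keys tell you how many empty values to expect from a left outer join.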
Now our purpose was to keep all the crop production information with matching rainfall data of the corresponding districts. So, I opted for “Left outer join”, which is also the default option. You can choose any of them as per your requirement.
Now simply click OK and watch the operation run.
Selecting columns from the joined table
Now the newly created column will appear in the table. You can see in the below image that the whole second table is displayed as a single column. So you need to select the particular columns you want and deselect the option “Use original column name as prefix”.
As the above image shows, by default the complete table appears as a row element in the first table. You need to select the particular columns of that table you want here.
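What the expand step does can be sketched in plain Python as below. This is only my approximation of the behaviour, not Power BI's own code. The helper `expand` and the rainfall columns `YEAR` and `ANNUAL` are hypothetical, as are the values.

```python
# Hypothetical merged rows: the second table sits nested inside one column.
merged = [
    {"State_name": "Odisha",
     "rainfall_India": [{"SUBDIVISION": "Odisha", "YEAR": 2015, "ANNUAL": 1400.0}]},
    {"State_name": "Goa", "rainfall_India": []},  # no matching rainfall rows
]

def expand(rows, nested_col, keep, use_prefix=False):
    """Flatten a nested table column, keeping only the selected inner columns."""
    out = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != nested_col}
        # An unmatched row still yields one output row, with empty values.
        for inner in row[nested_col] or [{}]:
            extra = {
                (f"{nested_col}.{c}" if use_prefix else c): inner.get(c)
                for c in keep
            }
            out.append({**base, **extra})
    return out

expanded = expand(merged, "rainfall_India", ["ANNUAL"])
prefixed = expand(merged, "rainfall_India", ["ANNUAL"], use_prefix=True)
print(expanded)
print(prefixed)
```

With the prefix option off, the new column is simply called `ANNUAL`. With it on, the name of the nested column is put in front, which is only worth it when both tables have columns with the same name.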
Use of advanced editor
The Advanced Editor of Power BI gives you more flexibility: you can change the M script itself to make the desired changes in the output. The M language is behind every query transformation in Power BI.
See in the below image, the option “Advanced Editor” under the “Home” tab shows the M script behind the current query. You can tweak it to change the join type, the column names and so on.
Final words
So, that is all about joining tables in Power BI Desktop. I have tried to cover the basics of each kind of join with examples, so that you can understand the logic behind the joins and apply the right kind for your need.
Joining tables and fetching the exact information is the core of data modelling. In order to obtain a good visual representation in a Power BI report, good data modelling is a must.
I hope this article will help you to get a good grip on this very fundamental operation of Power BI. If you have any queries or doubts, please comment below. I would be happy to answer them.