This article introduces D-tale, a very easy-to-use data exploration tool for Python. You just install and import the module, and it works alongside whichever Python IDE or notebook you use. D-tale then gives you all of its data exploration features through a simple user interface.
Data exploration is a basic yet very important step of data analysis. You need to understand the data and the relationships between the variables before you dive into advanced analysis. Basic exploration techniques such as visual inspection, calculating summary statistics, identifying outliers, and performing mathematical operations on variables are very effective for getting a quick idea of your data.
These data exploration steps are necessary for any data science project. Even in machine learning and deep learning projects, we pass our data through these exploration techniques. They involve writing a few lines of Python code that are usually repetitive in nature.
This is a completely mechanical task, and writing reusable code helps a bit. But you still need to modify the code to some extent every time a new data set is in use. Every time we write “dataset.head()”, we wish there were a user interface for these basic tasks, because it would be a big time saver.
So here comes D-tale to the rescue. D-tale is a lightweight web client built on top of Pandas data structures. It provides an easy user interface for performing several data exploration tasks without writing any code.
What is D-tale?
D-tale is an open-source tool that came out of a SAS to Python conversion, and it visualizes your data through Pandas data frames. It wraps the code for Pandas data structure operations in the backend, so you don't need to write the same thing repeatedly.
It started out as a wrapper, written as a Perl script, around the SAS insight function, and eventually turned into D-tale. D-tale also integrates easily with Python terminals and IPython notebooks. You just need to install it in Python and then import it.
You can refer to this link for further information about the tool. It is from the developers and also contains some useful resources. Here is a good video resource by Andrew Schonfeld, the developer of D-tale, from FlaskCon 2020.
I have been using it for some time and really like it. It has made some of my regular, repetitive data exploration tasks very easy and saves me a lot of time.
Here I will discuss in detail how to install it and start using it, with screenshots from my computer taken while I installed it.
Installation is also a breeze. Within seconds you can install it and start using it. Just open the Anaconda Powershell Prompt from the Windows Start menu. See the image below.
Now type one of the following commands in the Anaconda Powershell Prompt to install D-tale on Windows.
# To install D-tale
conda install dtale -c conda-forge
pip install -U dtale
Below is the screenshot of my computer's Anaconda shell. Both the conda and pip commands have been executed, although either one is enough since they do the same job. The only difference is that pip installs from the Python Package Index (PyPI), whereas conda installs from an Anaconda channel, here conda-forge.
Now you are ready to use D-tale. Open your Jupyter notebook and execute the following code.
# To import Pandas
import pandas as pd
# To import D-tale
import dtale
Example data set
The example data set I have used here for demonstration was downloaded from kaggle.com. The data, collected by the “National Institute of Diabetes and Digestive and Kidney Diseases”, contains vital parameters of patients of Pima Indian heritage.
Here is a glimpse of the first ten rows of the data set. I imported the data set in CSV format using the usual pd.read_csv() command. To display the table, use dtale.show().
The independent variables in the data set are several physiological parameters of a patient. The dependent variable indicates whether or not the patient has diabetes. The dependent column is binary: 1 indicates that the person has diabetes and 0 that the person does not.
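The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the exact notebook used for the screenshots: the column names are an assumption about the Kaggle file, the rows are invented, and load_data and show_in_dtale are hypothetical helper names. Only pd.read_csv() and dtale.show() come from the text.

```python
import io

import pandas as pd

# Invented sample rows laid out like the Kaggle diabetes CSV.
# Column names are an assumption about the file; the values are made up.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,BMI,Age,Outcome
2,120,70,30.5,29,0
5,160,82,35.1,47,1
1,95,64,24.3,22,0
"""


def load_data(source):
    """Read a CSV path or file-like object into a DataFrame (hypothetical helper)."""
    return pd.read_csv(source)


def show_in_dtale(df):
    """Open the frame in D-tale (hypothetical helper).

    dtale is imported lazily so the loading code runs even without it installed.
    """
    import dtale

    return dtale.show(df)


# Swap io.StringIO(SAMPLE_CSV) for the path to your downloaded CSV file.
df = load_data(io.StringIO(SAMPLE_CSV))
print(df.head(10))
```

In a Jupyter cell, calling show_in_dtale(df) as the last expression should render the interactive grid in the cell output.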
Data exploration with D-tale
Now the Jupyter notebook is displaying the data. Click the arrow button in the top left-hand corner to open all the data manipulation tools. As the image below shows, the left pane has several options such as describe, build column, correlations, and charts.
Descriptive statistics for variables
This option describes variables by showing descriptive or summary statistics. It does the same job as df.describe() in pandas. D-tale lets you get the same result without writing any code: just click “describe” in the left panel.
In the image below, the descriptive statistics of the variable “Pregnancies” are displayed along with a box-and-whisker plot. Select any other variable from the left menu and its summary statistics will be displayed.
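For comparison, the pandas call that the describe panel saves you from typing looks roughly like this. The values are invented for illustration, and “Pregnancies” is assumed to be the column name in the data set.

```python
import pandas as pd

# Invented values; "Pregnancies" is assumed to match the column name in the data set.
df = pd.DataFrame({"Pregnancies": [0, 1, 2, 3, 4, 10]})

# describe() returns count, mean, std, min, quartiles and max for a numeric column.
summary = df["Pregnancies"].describe()
print(summary)
```

Calling df.describe() on the whole frame gives the same statistics for every numeric column at once.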
Calculation of correlation among the variables
Here is an example of calculating the correlations among the variables. Just by clicking on correlations, D-tale creates a neat correlation table for all the variables. The depth of colour makes the correlated variables visible at a glance; here, a darker shade indicates a higher correlation.
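A correlation table of this kind can be produced in plain pandas along the following lines. This is a sketch on invented numbers, with assumed column names. It uses the Pearson coefficient, which is the pandas default and may not match every setting available in D-tale.

```python
import pandas as pd

# Invented values; the column names are assumed to match the data set.
df = pd.DataFrame(
    {
        "Glucose": [90, 110, 130, 150, 170],
        "BMI": [22.0, 27.5, 25.0, 33.0, 31.0],
        "Age": [25, 30, 35, 40, 45],
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))
```

The result is a square, symmetric table with ones on the diagonal, which is what the coloured grid in D-tale visualizes.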
Chart creation is a basic yet very useful data exploration technique. With D-tale you can create different types of charts such as bar, line, scatter, pie, and heatmap. Its interface does away with writing several lines of code. Below is an example of creating a scatter plot with this tool.
As you select the chart option from the left panel of D-tale, a new browser tab opens with the following options. Select the variables you want to plot; there are options to choose the X and Y variables. You can also use the group-by option if there is a categorical variable.
If you wish, also select any of the aggregation options available there, or simply choose the scatter option above. A scatter plot of the two variables will be displayed. Below is the scatter plot with all the options for your reference.
The scatter plot comes with some tool options, as shown in the above image. These tools help you dig further into the plot's details. You can investigate any point of interest with options such as box select or lasso select, change the axes settings, or see the data on hover.
Other helpful options for working with the chart are available, as shown in the figure: popping the chart up in another tab to compare it with another chart, a link you can copy and share, exporting the chart as static HTML that can be attached to an e-mail, exporting the data as CSV, and copying the Python code for further customization.
Highlighting the outliers
Another very useful feature of D-tale is highlighting outliers variable by variable with a single click. See the image below.
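To see what flagging outliers in a column involves when done by hand, here is a sketch using the common 1.5 × IQR rule. This rule is an assumption chosen for illustration and is not claimed to be the exact computation D-tale performs. The helper name iqr_outlier_mask, the column name, and the values are likewise hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (series < lower) | (series > upper)


# Invented values with one obviously extreme reading.
df = pd.DataFrame({"Insulin": [80, 85, 90, 95, 100, 105, 600]})

mask = iqr_outlier_mask(df["Insulin"])
print(df[mask])
```

Applying the helper to each numeric column in turn gives a per-variable view similar in spirit to the one-click highlight.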
Creating a pie chart
Here is an example of a pie chart created with D-tale. The pie chart is a very popular format for showing the proportional distribution of different components. Creating one follows the same simple process: just choose pie chart and then select the variables you want to display.
Another popular chart format is the bar plot. It reveals many important properties of the variables and the relationships between them. For example, here I have created a bar plot of mean blood pressure against the age of different individuals. It is a very effective way to see how blood pressure varies with a person's age, which is otherwise not easily identifiable from the raw data.
Creating the bar plot follows the same easy process. Different aggregation options are available here too. For example, I have chosen mean to display the blood pressure along the Y axis.
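The mean aggregation behind such a bar plot can be reproduced in pandas roughly as follows. This is a hand-written sketch on invented values with assumed column names, not the code that D-tale itself exports.

```python
import pandas as pd

# Invented values; "Age" and "BloodPressure" are assumed column names.
df = pd.DataFrame(
    {
        "Age": [21, 21, 35, 35, 50],
        "BloodPressure": [60, 70, 72, 80, 90],
    }
)

# One mean blood pressure value per distinct age: these are the bar heights.
mean_bp = df.groupby("Age")["BloodPressure"].mean()
print(mean_bp)
```

With matplotlib installed, mean_bp.plot.bar() would draw the corresponding bars, with age on the X axis and mean blood pressure on the Y axis.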
D-tale can also export the code behind each operation, which is a very useful option. You can get the code for the particular data exploration technique you used. You can then make any change you want, or simply study it to learn how such code is written.
Below is the code snippet used to create the bar plot above.
This article presented a very helpful data exploration tool that can make your regular data analysis tasks much easier and quicker. It is a lightweight application that uses the Pandas data manipulation library underneath.
Its simple and neat user interface integrates easily with any IDE you use. Data analysts need a quick idea of the data in hand so that they can plan their advanced analytical tasks. D-tale can therefore be a tool of choice for them, saving the considerable time otherwise spent writing regular, repetitive lines of code.
I hope the article is helpful. I tried to provide as much information as possible so that you can straightaway install and apply it. Do share your experience: how do you find it, and is it helpful? Let me know your opinion, or any further queries or doubts, by commenting below.