data science Archives - Dibyendu Deb

This article discusses two very easy fixes for a problem faced by almost all Jupyter notebook users while doing data science projects: changing the default working folder of Jupyter notebook, the preferred IDE of many data scientists. I have faced this issue myself.

Although it does not seem like a big problem at first, once you start using Jupyter on a daily basis you want it to start from a directory of your choice. That helps you stay organized, with all your data science files in one place.

I searched the internet thoroughly and got many suggestions, but very few of them were really helpful, and it took quite a lot of my time to figure out a process that works. I decided to write it down as a blog post so that neither I nor my readers have to waste time fixing the issue again.

So, without any further ado, let's jump to the solutions…

The easiest way: using Anaconda PowerShell

The first and quickest solution is to run Jupyter notebook right from the Anaconda PowerShell. You just need to change the directory to the desired one there and run Jupyter notebook. It is that simple. See the image below.

Running the jupyter notebook with anaconda powershell

Here you can see that the default working folder of Jupyter notebook was c:\user\Dibyendu, as shown in the PowerShell prompt. I changed the directory to E: and simply ran the command jupyter notebook. Consequently, PowerShell started Jupyter notebook with that folder as the start folder.

This is very effective and changes the start folder for Jupyter notebook very easily. The problem is that the change is temporary, and you have to go through this process every time you open the notebook.

One way to fix this is to create a batch file with these commands and just run the batch file whenever you need to work in Jupyter notebook.
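The same idea can also be sketched as a small Python script instead of a batch file (a swapped-in alternative, chosen only because the rest of the code in this post is Python). This is a minimal sketch, assuming the jupyter command is available on your PATH; the function name launch_jupyter and the folder in the example call are hypothetical.

```python
import subprocess
from pathlib import Path


def launch_jupyter(start_folder):
    """Start Jupyter notebook with start_folder as its working folder."""
    folder = Path(start_folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"No such folder: {folder}")
    # Equivalent to `cd <folder>` followed by `jupyter notebook`;
    # assumes the `jupyter` command can be found on PATH.
    return subprocess.run(["jupyter", "notebook"], cwd=folder)


# Example call (hypothetical folder, not executed here):
# launch_jupyter(r"E:\data science")
```

Like the PowerShell approach, this only sets the start folder for the session it launches.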

Creating a shortcut with the working folder of Jupyter notebook as its target

This solution is my favourite, and I personally follow this procedure. The steps are explained here with screenshots from my system.

First, locate the Jupyter notebook app on your computer by right-clicking the application in your menu, as shown in the image below.

Now navigate to the file location and select the application file, as in the image below. Copy the file to your desktop or to any location where you want a shortcut to the application.

Locating the Jupyter notebook application on your computer

Now right-click the shortcut, open its properties and go to the Shortcut tab. The Target field shown here ends with “%USERPROFILE%”, a Windows variable that expands to your user profile (home) folder. That’s why it is the default start folder for the notebook.

Property dialog for Jupyter notebook app

Now you need to replace the “%USERPROFILE%” part with the exact location of your desired directory.

In the above image you can see that I have replaced “%USERPROFILE%” with the data science folder, which contains all of my data science projects. Now just click Apply and then OK. To open Jupyter notebook, click the shortcut, and Jupyter will open with the directory you specified as the start folder, as in the image below.

Jupyter notebook with the data science folder as the start folder

So, the problem is solved. You can use this trick to create multiple shortcuts, each with a different folder as the start folder of Jupyter notebook.

This article contains a brief discussion of Python functions. In any programming language, be it Python, R, Scala or anything else, functions play a very important role. Data science projects involve repetitive tasks, such as filtering raw data and preprocessing it, that have to be performed every time. In this case, functions are the best friend of a data scientist: instead of redoing the same task every time, you simply call the relevant function.

Functions, both built-in and user-defined, are a very basic yet critical component of any programming language, and Python is no exception. Here is a brief introduction to them, so that you can start using the benefits they provide.

Why use Python for data science?

Python is a favourite language among data enthusiasts. One of the reasons is that Python is easy to understand and code with compared to many other languages.

Besides, there are lots of third-party libraries which make data science tasks a lot easier. Libraries like Pandas, NumPy, Scikit-Learn, Matplotlib and seaborn contain numerous modules catering to almost every kind of operation you may wish to perform in data science. Libraries like TensorFlow and Keras are specially designed for deep learning applications.

Please read these articles about the use of Python in Machine Learning and Deep Learning to know more about the use of Python in data science.

If you are a beginner, or you have some basic ideas about coding in other programming languages, this article will help you get started with Python functions as well as with creating new ones. I will discuss some important Python functions, writing your own functions for repetitive tasks, and handling Pandas data structures with easy examples.

Like other Python objects such as integers, strings and other data types, functions are first-class citizens in Python. They can be created dynamically, destroyed, defined inside other functions, passed as arguments to other functions, returned as values, and so on.

In the field of data science in particular, we need to perform several mathematical operations and pass the calculated values on. So Python functions play a crucial role in data science: they perform a particular repetitive calculation, they can be nested, they can be used as arguments of other functions, and so on.
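As a small illustration of functions being ordinary objects, here is a minimal sketch. The function names mean and value_range and the sample data are made up for this example.

```python
def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


# A function can be assigned to another name like any other object
average = mean

# Functions can be stored in a dictionary and looked up by name
summaries = {"mean": mean, "range": value_range}

data = [4, 8, 15, 16, 23, 42]
for name, fn in summaries.items():
    print(name, fn(data))

print(average is mean)  # True: both names refer to the same function object
```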

So without much ado, let's jump into the details and some really interesting uses of functions, with examples.

Use of Python functions for data science

Using functions is of utmost importance not only in Python but in any programming language. Be it built-in or user-defined functions, you should have a clear idea of how to use them. Functions are a powerful way to keep your code well structured and to increase its reusability.

Some functions come with Python itself; we just need to call these built-in functions to perform the assigned tasks. Most of the basic tasks we need to do frequently in data operations are well covered by these functions. To start with, I will discuss some of these important built-in Python functions.

Built-in Python functions

Let's start with some important built-in functions of Python. These are already included and make your coding experience much smoother. The only condition is that you have to be aware of them and use them frequently. The first function we will discuss is help().

So take help()

Python functions take care of most of the tasks we want to perform through coding. But a common question in any beginner's mind is: how will I know about all these functions? The answer is to take help.

The help() function is there in Python to tell you every detail you need to know about a function in order to use it. You just need to pass the function to help(). See the example below.

# Using help
help(print)

Here I want to know about the print function, so I passed it to help(). The output describes everything you need to know to use the function: the function header with the optional arguments you can pass and their roles, plus a brief plain-English description of what the function does.

Interestingly, you can learn all about the help() function using help() itself :). It is great to see the output. Please type it in and see for yourself.

# Using help() for help
help(help)

Here again help() has produced all the necessary details about itself. It says that the help() function is actually a wrapper around pydoc.help that provides a helpful message when the user types “help” in the Python interactive prompt.
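If you want the same documentation as a string rather than printed to the screen, one option is the standard library's inspect.getdoc(). This is only a minimal sketch of that alternative; the variable names are made up for the example.

```python
import inspect

# inspect.getdoc() returns the cleaned-up docstring of an object as a string
doc_text = inspect.getdoc(sorted)
print(doc_text)

# The first line of a docstring is usually a one-line summary
first_line = doc_text.splitlines()[0]
print(first_line)
```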

list() function

A list is a collection of objects of the same or different data types. Lists are used very frequently in data science to store data for later operations. See the code below, which creates a list with different data types.

# Defining the list item
list_example=["Python", 10, [1,2], {4,5,6}]
# Printing the data type
print(type(list_example))
# Printing the list
print(list_example)
# Using append function to add items
list_example.append(["new item added"])
print(list_example)

The above code creates a list with a string, an integer, a nested list and a set. The type() function is used to print the data type, and finally the append() function is used to add an extra item to the list. Let's see the output.

So, the data type is list, all the list items are printed, and an item is appended to the list with the append() function. Note this function, as it is very handy while performing data analysis. You can also build a complete list from scratch using only the append() function.
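Building a list from scratch with append() could look like the following minimal sketch. The list name squares and the values are made up for illustration.

```python
# Start with an empty list and fill it one item at a time
squares = []
for number in range(1, 6):
    squares.append(number ** 2)

print(squares)  # [1, 4, 9, 16, 25]
```

Note that append() always adds exactly one item, so appending a list (as in the earlier example) adds that whole list as a single nested element.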

sorted() function

This is also an important function that we need frequently while doing numeric computation. For example, a very basic use of sorted() is in calculating the median of sample data: to find the median, we need to sort the data first. By default, sorted() arranges the data in ascending order, but you can reverse the order by using the reverse argument. See the example below.

# Example of sorted function
list_new = [5, 2, 1, 6, 7, 4, 9]
# Sorting and printing the list
print("Sorting in ascending order:", sorted(list_new))
# Sorting the list in descending order and printing
print("Sorting in descending order:", sorted(list_new, reverse=True))

Running the above code prints the list sorted first in ascending and then in descending order.
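To make the median use case concrete, here is a minimal sketch of a median calculation built on sorted(). The function name median is chosen for this example; in practice the standard library's statistics.median() does the same job.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # sorting is the first step
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of items: the middle one is the median
        return ordered[middle]
    # Even number of items: average the two middle ones
    return (ordered[middle - 1] + ordered[middle]) / 2


list_new = [5, 2, 1, 6, 7, 4, 9]
print(median(list_new))      # 5
print(median([5, 2, 1, 6]))  # 3.5
```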

round() function

This function rounds a number to the desired number of decimal places, which is passed as the second argument. The function has some interesting behaviour. See the example below and try to guess what the output will be.

# Example of round() function
print(round(37234.154))
print(round(37234.154,2))
print(round(37234.154,1))
print(round(37234.154,-2))
print(round(37234.154,-3))

Can you guess the output? Note that the second argument can also be negative! Let's see the output and then explain what the function does to a number.

When round() is called without a second argument, it returns the nearest integer, so 37234.154 becomes 37234. It keeps two decimals if the argument is 2 (37234.15) and one decimal when it is 1 (37234.2). When the second argument is -2 or -3, it returns the closest multiple of 100 or 1000, here 37200.0 and 37000.0.

If you are wondering where on earth such a feature is useful, let me tell you that there are occasions, like mentioning a big amount (money, distance, population etc.), where we don't need an exact figure and a close rounded number does the job. In such cases, round() with a negative argument makes the figure easier to remember.
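A few more calls make the behaviour of round() clearer. This is a small sketch; the population figure is made up for illustration. One detail worth knowing is that Python rounds exact halves to the nearest even integer.

```python
# Exact halves are rounded to the nearest even integer
print(round(2.5))  # 2
print(round(3.5))  # 4

# With an integer input, a negative second argument returns an integer
print(round(1234567, -3))  # 1235000

# With a float input, the result stays a float
print(round(37234.154, -2))  # 37200.0

# A rounded figure is easier to quote (hypothetical population)
population = 1382345
print("About", round(population, -5), "people")  # About 1400000 people
```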

There are a lot more built-in functions, and we will touch on them in other articles; here I have covered only a few of them as examples. Let's move on to the next section on user-defined functions, which give you the freedom to create your own functions.

User defined functions

After built-in functions, we will now learn about user-defined functions. If you are learning Python as your first programming language, I should tell you that functions are among the most effective and interesting parts of any programming language.

A coder's expertise depends a lot on how skilled they are at creating functions to automate repetitive tasks. Instead of writing code for the same task again and again, a skilled programmer writes a function for that task and just calls it when the need arises.

Below is an example of how you can create a function that adds two numbers.

# An example of user defined function
def add(x, y):
  '''This is a function to add two numbers'''
  total = x + y
  print("The sum of x and y is:", total)

The above is an example of creating a function which adds two numbers and then prints the result. Let's call the function to add two numbers and see the result.

I have called the function and passed two numbers as arguments, and the user-defined function printed the result of adding them. Now any time I need to add two numbers, I can just call this function instead of writing those few lines again and again.
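For reference, a call to the function could look like the sketch below; the numbers 5 and 7 are arbitrary. The second function, add_and_return, is a hypothetical variant added here to show the difference between printing a result and returning it so that it can be used in further calculations.

```python
def add(x, y):
    '''This is a function to add two numbers'''
    total = x + y
    print("The sum of x and y is:", total)


add(5, 7)  # prints: The sum of x and y is: 12


def add_and_return(x, y):
    '''Add two numbers and return the result instead of printing it'''
    return x + y


# A returned value can be passed on to further calculations
result = add_and_return(5, 7) * 2
print(result)  # 24
```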

Now if we use help() on this function, what will it return? Let's see.

The help() function has returned the text I put within the triple-quoted string. It is called the docstring. A docstring allows us to describe the use of the function. It is very helpful, as complex programmes require a lot of user-defined functions. The function name should indicate its use, but many a time that is not enough. In such cases, a brief docstring quickly reminds you what the function does.
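A minimal sketch of looking up the docstring of the add() function defined earlier (repeated here so the snippet runs on its own):

```python
def add(x, y):
    '''This is a function to add two numbers'''
    total = x + y
    print("The sum of x and y is:", total)


# help() shows the function signature followed by the docstring
help(add)

# The docstring itself is stored in the __doc__ attribute
print(add.__doc__)
```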

Optional arguments in user-defined function

Sometimes providing an optional argument with a default value saves us from writing additional lines. See the following example:

# Defining functions
def hi(Hello="World"):
  print("Hello", Hello)

hi()
hi("Python")
hi()

Can you guess the output of the above function calls? Just for fun, try it without looking at the output below. While trying, notice that only one of the calls passes the optional argument.

Here is the output.

For the first call of the function, it has printed the default argument. But when we passed “Python” as the optional argument, it overrode the default. In the third call, again without the optional argument, the default gets printed. You should try any other combinations that come to your mind; it is fun and will also make the concept clearer.
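Default arguments are handy in data work too. Here is a minimal sketch, with a made-up function name mean_of and made-up sample data, where the number of decimal places is optional.

```python
def mean_of(values, decimals=2):
    '''Return the mean of values rounded to `decimals` places'''
    return round(sum(values) / len(values), decimals)


scores = [1, 2, 4]
print(mean_of(scores))              # 2.33   (default of 2 decimals)
print(mean_of(scores, 4))           # 2.3333 (default overridden by position)
print(mean_of(scores, decimals=0))  # 2.0    (default overridden by keyword)
```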

Nested functions

A nested function is a function defined inside another function. This is also one of the very important Python basics for data science. Below is an example of a very simple nested function. Try it yourself to check the output.

# Example of nested functions
def outer_function(msg):
  # This is the outer function
  def inner_function():
    print(msg)
  # Calling the inner function
  inner_function()
# Calling the outer function
outer_function("Hello world")
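A common reason to nest functions is to return the inner function, so that it remembers a value from the outer one. The following is a minimal sketch of that pattern; make_multiplier and the names below are made up for this example.

```python
def make_multiplier(factor):
    # The inner function remembers `factor` from the enclosing call
    def multiply(value):
        return value * factor
    # Return the inner function itself, without calling it
    return multiply


double = make_multiplier(2)
to_percent = make_multiplier(100)

print(double(21))        # 42
print(to_percent(0.25))  # 25.0
```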

Functions passed as argument of another function

Functions can also be passed as arguments to other functions. It may sound a little confusing at first, but it is a very powerful property among the Python basics for data science. Let's take an example to discuss it. See the piece of code below.

# Calling functions in a function
def add(x):
  return 5 + x

def call(fn, arg):
  # Call fn once with arg
  return fn(arg)

def call_twice(fn, arg):
  # Call fn on arg, then call fn again on the result
  return fn(fn(arg))

print(
    call(add, 5),
    call_twice(add, 5),
    sep="\n"
)

Again, try to understand the logic and guess the output. Copy the code and make small changes to see how the output changes or what errors it produces. The output I got from this code is described below.

Did you guess it right? Here we have created three functions, namely add(), call() and call_twice(), and then passed the add() function into the other two. The call() function returns the result of calling add() with the argument 5, so the output is 10.

In a similar fashion, call_twice() returns 15, because its return statement applies add() twice: add(5) gives 10, and add(10) gives 15. I know it is confusing to some extent, because the example is not driven by a real purpose. When you create such functions to solve an actual problem, the concept will become clear. So, do some practice with the code given here.
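Here is a slightly more purposeful sketch of the same idea. The helper apply_to_all and the sample data are made up for illustration; sorted() with its key argument is a built-in that accepts a function in the same way.

```python
def apply_to_all(fn, values):
    """Call fn on every item of values and return the results as a list."""
    return [fn(value) for value in values]


def add_five(x):
    return 5 + x


# Our own function and a built-in one are passed in the same way
print(apply_to_all(add_five, [1, 2, 3]))  # [6, 7, 8]
print(apply_to_all(abs, [-3, 2, -1]))     # [3, 2, 1]

# sorted() accepts a function through its key argument
words = ["pandas", "numpy", "scikit-learn"]
print(sorted(words, key=len))  # ['numpy', 'pandas', 'scikit-learn']
```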

How to change the default working folder of Jupyter notebook in windows PC?