Jupyter Notebook has become the de facto tool for carrying out Data Science experiments with Python. In this post, we'll walk through getting up and running with this tool and provide an introduction to its basic usage.
Jupyter is written in Python, and Python is the language generally used for
your experiments. However, as you will see later, Jupyter is not limited to
Python. So, one of the requirements is that Python is installed. There are a
number of different ways to install Python, and how you install it can
influence how you decide to install Jupyter. For example, you can download
and install Python from the Python web site and then use pip to install
Jupyter.
Instead, we are going to use a very useful distribution called Anaconda. As
well as including Python, Anaconda installs many Python libraries that are
commonly required for Data Science projects. In addition, Anaconda includes an
installation of Jupyter Notebook, so once it is installed, we are all set. You
can find the Anaconda distribution by visiting the Anaconda web site and
downloading the installer for your preferred platform. Windows, macOS and Linux
are supported.
There are installers available for both Python 2 and Python 3, but we recommend that you choose version 3. Anaconda supports virtual environments, and it is, in fact, possible to add Python 2 as a virtual environment later, but we will assume Python 3 is being used in this post.
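If you are not sure which interpreter Anaconda put on your path, a quick check from `python` can confirm you are on version 3. The function names below are hypothetical helpers for illustration. The code deliberately avoids Python 3-only syntax, so it still runs, and reports the problem, under Python 2.

```python
import sys

def python_major_version():
    """Return the major version of the running interpreter (e.g. 3)."""
    return sys.version_info.major

def require_python3():
    """Raise an error if this interpreter is older than Python 3."""
    if python_major_version() < 3:
        raise RuntimeError(
            "This post assumes Python 3, but found Python %d.%d"
            % (sys.version_info.major, sys.version_info.minor)
        )

if __name__ == "__main__":
    require_python3()
    print("Python %s is OK" % sys.version.split()[0])
```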
Once you've downloaded the installer, run it and follow the on-screen instructions. On macOS, you might see an error message during installation. If you do, just re-select Install for me only and it should clear the error.
Open a command shell in your operating system. Create a new directory
(e.g. dsc_test) and cd to that directory to get started. Now start Jupyter by
running:

jupyter notebook
Jupyter Notebook runs a web server, and the user interface is served as a web page. When run locally, the Jupyter Notebook web page should open automatically in a browser window. If it does not, copy the URL printed at the command prompt and paste it into a browser window. By default, Jupyter Notebook runs on localhost:8888. You must leave the server running while you are working with Notebook, so do not close the command prompt window. Open another command prompt if you need to interact with a shell.
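If the browser window does not appear, you can check whether anything is listening on the default port. The sketch below assumes the default localhost:8888. The helper name `server_is_listening` is hypothetical. A successful connection only shows that some server is on that port, not that it is Jupyter.

```python
import socket

def server_is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    This does not confirm that the listening server is Jupyter.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if server_is_listening():
        print("A server is listening on localhost:8888")
    else:
        print("Nothing is listening on localhost:8888")
```

If port 8888 is already in use, Jupyter may start on a different port. The URL printed at the command prompt is the one to trust.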
You will see the Jupyter user interface and a message that tells you "The
Notebook List is empty." Click the New button on the right side of the window
and select Python 3 to create a new notebook. Your new notebook will open in
the Jupyter Notebook editor. The name of the new notebook defaults to Untitled,
which can be seen in the top line of the window. Click on it to rename your
notebook. Your notebook will be saved in a file with the extension .ipynb in
the directory where you started the server (e.g. dsc_test).
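Because each notebook is an ordinary .ipynb file in the server's starting directory, you can also find your notebooks from Python. This is a minimal sketch; `list_notebooks` is a hypothetical helper. It only looks at the top level of the directory, while the Notebook List in the browser also shows folders and other files.

```python
from pathlib import Path

def list_notebooks(directory="."):
    """Return the sorted names of .ipynb files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.ipynb") if p.is_file())

if __name__ == "__main__":
    # Run from the directory where you start the server (e.g. dsc_test)
    for name in list_notebooks():
        print(name)
```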
Notebooks are multi-modal, multi-media documents. They are composed of cells, where each cell contains content of a particular type. Some cells contain text, such as documentation and commentary. Other cells contain executable programs or program snippets. Quite often, the output of the program is contained within the cell, for example a table or graph of results. Python is the language most commonly used for programs, although other languages and scripts are supported.
The menu and toolbar allow us to do the most common tasks, such as creating or deleting a cell, executing the code in a cell, copying and pasting, etc. There are keyboard shortcuts for most commands and you can see these by clicking the keyboard icon in the toolbar.
Cells are edited by clicking on them and typing. The currently selected cell
is outlined: a green outline indicates edit mode, and a blue outline indicates
command mode. A drop-down in the toolbar shows what type of content the cell
contains; it defaults to Code for new cells. In this post, we will only use
the Code and Markdown types.
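Under the hood, a notebook is a JSON document holding a list of cells, each with a `cell_type` such as `markdown` or `code`. The sketch below builds a minimal document by hand, assuming the nbformat version 4 layout (with `nbformat_minor` 4, which does not require cell ids). The helper names are hypothetical. In practice, the `nbformat` package that ships with Jupyter provides `nbformat.v4.new_notebook`, `new_markdown_cell` and `new_code_cell` for this.

```python
import json

def markdown_cell(text):
    """Build a markdown cell in the nbformat 4 JSON layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}

def code_cell(source):
    """Build an unexecuted code cell: no execution count, no outputs yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }

def make_notebook(cells):
    """Wrap a list of cells in a minimal nbformat 4 document."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

if __name__ == "__main__":
    nb = make_notebook([
        markdown_cell("# Jupyter Notebook Example 1"),
        code_cell("print('hello')"),
    ])
    print(json.dumps(nb, indent=1))
```

Saving that JSON to a file ending in .ipynb should give a notebook Jupyter can open. The document carries no kernel metadata, so Jupyter may ask which kernel to use.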
Documenting the Notebook
When working with notebooks, it is typical to mix code with documentation that describes what is being done at each point. Markdown is the most common format used for this. If you are not familiar with Markdown, GitHub provides a useful introduction on its site (GitHub Markdown reference).
Let's add a title to our notebook. Select the first cell on the page and set
its type to Markdown, using the drop-down list in the toolbar. Now, click
inside the cell to enter edit mode and type in some markdown, for example:

# Jupyter Notebook Example 1
Note that the editor interprets the markdown and gives you some sense of how it
will look in a completed notebook. Clicking Run or pressing Shift-Enter will
show the actual formatted markdown. Double-clicking on the cell will return to
edit mode.
Let's add another cell below the first one and add some descriptive text about the purpose of this particular notebook.
Now we're going to add some Python code. Teaching Python is beyond the scope of this post, so we refer you to the Python web site if you'd like to learn the language.
The sample code produces its output as a plot. For the purposes of
demonstration, we are going to generate some random data and plot it. Paste
the code below into a new cell of type Code:
```python
#
# Simple Scatter Plot
#
import numpy as np
import matplotlib.pyplot as plt

# Get 20 random x,y co-ordinates, 20 random colours and 20 random areas
n = 20
x = np.random.rand(n)
y = np.random.rand(n)
colours = np.random.rand(n)
# area is based on radii between 0 and 15
area = np.pi * (15 * np.random.rand(n))**2

plt.scatter(x, y, s=area, c=colours, alpha=0.5)
plt.show()
```
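The marker areas follow the formula for the area of a circle, area = π·r², with each radius r drawn uniformly from [0, 15). Every area therefore falls in [0, 225π) square points, which is the unit matplotlib uses for the `s` argument of `scatter`. The example above draws new random data on every run. The sketch below, using a hypothetical `scatter_data` helper and NumPy's seeded `default_rng` generator (a swap from the `np.random.rand` calls above), produces the same kind of data reproducibly.

```python
import numpy as np

def scatter_data(n=20, max_radius=15.0, seed=0):
    """Generate x, y, colours and marker areas like the scatter plot example.

    A seeded Generator makes every run return identical values.
    """
    rng = np.random.default_rng(seed)
    x = rng.random(n)                     # uniform in [0, 1)
    y = rng.random(n)
    colours = rng.random(n)               # mapped onto a colormap by scatter
    radii = max_radius * rng.random(n)    # radii in [0, max_radius)
    area = np.pi * radii ** 2             # area = pi * r^2, in points squared
    return x, y, colours, area

if __name__ == "__main__":
    x, y, colours, area = scatter_data()
    print(area.min(), area.max())
```

The four arrays can be passed to `plt.scatter(x, y, s=area, c=colours, alpha=0.5)` exactly as in the original cell.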
The scatter plot code uses two libraries that are commonly used in data
science, numpy and matplotlib. These libraries will already be installed if
you followed the installation using Anaconda. If you installed Python by some
other method, you may need to install them with pip. Once you've pasted in the
code, go ahead and run it. You should see a scatter graph of random data.
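If the cell fails with an import error, you can check which libraries the notebook's Python cannot find. This is a small sketch built on the standard library's `importlib.util.find_spec`; the helper name `missing_libraries` is hypothetical. Running it inside a notebook cell checks the environment the kernel actually uses, which is the one that matters.

```python
import importlib.util

def missing_libraries(names=("numpy", "matplotlib")):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Install with pip:", " ".join(missing))
    else:
        print("All required libraries are available")
```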
At this point, your new notebook should contain the title, the description and the code cell followed by its scatter plot.
So that's it, you've created your first notebook with Jupyter Notebook.
Notebook updates are automatically saved in the .ipynb file, but you can make
sure by clicking the save icon in the toolbar. To exit, select Close and Halt
from the File menu. Once you've closed the notebook, you can close the browser
window, return to the command prompt running the server, and stop it by
pressing Ctrl-C. You can return to your notebook at any time by re-running
jupyter notebook from the directory where your notebook is saved and then
selecting the notebook from the list of notebooks.
Hopefully, this brief introduction has whetted your appetite and you'll want to explore further. We encourage you to learn more about Python at the Python web site and to try some of the available data science libraries, such as numpy and matplotlib.
Because notebooks are stored in a simple JSON file format, they are easy to
share, and many open source notebooks are freely available on the web. We
encourage you to explore these shared notebooks.
Most of the examples are on GitHub. To download them, you can clone the
relevant GitHub repository or simply grab an individual file. Because GitHub
automatically renders notebooks, you'll need to view the file in raw form
(using the Raw button) and copy its contents into a text editor. Once your file
is downloaded and saved with a .ipynb extension, simply copy it into the folder
where you run Jupyter. It will then show up in your list of notebooks, and
you'll be able to open it for editing and running.
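A common mistake is saving the rendered GitHub page instead of the raw file. Since a real notebook is JSON, you can check a download before copying it into your Jupyter folder. The sketch below assumes the current nbformat 4 layout, with a top-level `cells` list and an `nbformat` key. Older version 3 notebooks store their cells under `worksheets`, so this check would reject them. The helper name `summarize_notebook` is hypothetical.

```python
import json
from collections import Counter

def summarize_notebook(path):
    """Load a .ipynb file and count its cells by type.

    Raises ValueError if the file is not notebook JSON, for example when
    the rendered HTML page was saved instead of the raw file.
    """
    with open(path, encoding="utf-8") as f:
        try:
            nb = json.load(f)
        except json.JSONDecodeError as err:
            raise ValueError(f"{path} is not valid JSON: {err}") from err
    if not isinstance(nb, dict) or "cells" not in nb or "nbformat" not in nb:
        raise ValueError(f"{path} is JSON but does not look like an nbformat 4 notebook")
    return Counter(cell.get("cell_type", "unknown") for cell in nb["cells"])
```

For a notebook like the one built in this post, the result would be something like `Counter({'markdown': 2, 'code': 1})`.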
Hopefully, you'll develop some interesting notebooks of your own that you can share with the community.