Taking Jupyter Notebook for a Spin

Jupyter Notebook has become the de facto tool for carrying out Data Science experiments with Python. In this post, we’ll walk through getting up and running with this tool and provide an introduction to its basic usage.

Installation

Jupyter is written in Python, and Python is the language most often used for experiments in it. However, Jupyter is not limited to Python, as you will see later. Either way, Python must be installed first. There are a number of different ways to install Python, and the one you choose can influence how you install Jupyter. For example, you can download and install Python from the Python web site and then use pip to install Jupyter.

Instead, we are going to use a very useful distribution called Anaconda. As well as including Python, Anaconda installs many Python libraries that are commonly needed in Data Science projects. In addition, Anaconda includes an installation of Jupyter Notebook, so once it is installed, we are all set. You can find the Anaconda distribution on the Anaconda web site, where you can download the installer for your preferred platform. Windows, macOS and Linux are supported.

There are installers available for both Python 2 and Python 3, but we recommend that you choose version 3. Anaconda supports virtual environments and it is, in fact, possible to add Python 2 as a virtual environment later, but we will assume Python 3 is being used in this post.
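If you are unsure which Python version an installation provides, a quick check along the following lines can confirm it. This is a minimal sketch using only the standard library; the helper name is_python3 is our own, chosen purely for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or later."""
    return version_info[0] >= 3


# Print the full version string and whether it meets the Python 3 assumption.
print(sys.version)
print("Python 3 or later:", is_python3())
```

You can run this from a Python prompt now, or paste it into a notebook cell later on.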

Once you’ve downloaded the installer, execute it and follow the on-screen instructions. On macOS, you might see the error message shown below. If you do, just re-select Install for me only and it should clear the error.

macOS installation error

Starting Jupyter

Open up a command shell in your operating system. Create a new directory (e.g. dsc_test) and cd to that directory to get started. Now start Jupyter Notebook:

jupyter notebook

Jupyter Notebook runs a web server and the user interface is served as a web page. When run locally, the Jupyter Notebook web page should open automatically in a browser window. If not, note the URL printed at the command prompt and paste it into a browser window. By default, Jupyter Notebook runs on localhost:8888. You must leave the server running while you are working with Notebook, so do not close the command prompt window. Open another command prompt if you need to interact with a shell.
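If the page does not open and you want to confirm that the server is actually up, a rough check like the one below can help. It is only a sketch using the standard library: it assumes the default address of localhost:8888, and the function name server_is_listening is our own. Note that a positive result only tells you that something is accepting connections on that port, not that it is necessarily Jupyter.

```python
import socket


def server_is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the connection is refused
        # or does not complete within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Server reachable on localhost:8888:", server_is_listening())
```

Run it from the second command prompt, since the first one is occupied by the server.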

You will see the Jupyter user interface and a message that tells you The Notebook List is empty. Click on the New button on the right side of the window and select Python 3 to create a new notebook. Your new notebook will open and you will find yourself in the Jupyter Notebook editor. The name of the new notebook defaults to Untitled, which can be seen in the top line of the window. Click on it to rename your notebook (e.g. First Notebook). Your notebook will be saved in a file with extension .ipynb in the directory where you started the server (e.g. "~/dsc_test/First Notebook.ipynb").

Notebook Layout

Notebooks are multi-modal, multi-media documents. They are composed of cells, where each cell contains content of a particular type. Some cells contain text, such as documentation and commentary. Other cells contain executable programs or program snippets. Quite often, the output of the program is contained within the cell, for example a table or graph of results. Python is the language most commonly used for programs, although other languages and scripts are supported.

The menu and toolbar allow us to do the most common tasks, such as creating or deleting a cell, executing the code in a cell, copying and pasting, etc. There are keyboard shortcuts for most commands and you can see these by clicking the keyboard icon in the toolbar.

Cells are edited by clicking on them and typing. The currently selected cell is outlined. A green outline indicates that we are in edit mode and a blue outline indicates command mode, where keystrokes act on the cell as a whole rather than on its contents. A drop-down in the toolbar indicates what type of content the cell contains. It defaults to code for new cells. In this post, we will only use the code and markdown types.

Documenting the Notebook

When working with notebooks, it is typical to mix code with the documentation that describes what is being done at each point. Markdown is the most common format used for this. If you are not familiar with Markdown, GitHub provides a useful introduction in its Markdown reference.

Let’s add a title to our notebook. Select the first cell on the page and set its type to markdown, using the drop-down list in the toolbar. Now, click inside the cell to enter edit mode and type in some h1 text.

# Jupyter Notebook Example 1

Note that the editor interprets the markdown and gives you some sense of how it will look in a completed notebook. Clicking on Run or hitting Shift-Enter will show the actual formatted markdown. Double clicking on the cell will return to edit mode.

Let’s add another cell below the first one and add some descriptive text about the purpose of this particular notebook.

Adding Code

Now we’re going to add some Python code. It is beyond the scope of this post to cover coding in Python, so we would refer you to the Python web site if you’d like to learn the language.

The sample code produces some output in the form of a plot of some data. For the purposes of demonstration, we are going to generate some random data and plot it. Paste the code below into a new cell of type code.

#
# Simple Scatter Plot
#

import numpy as np
import matplotlib.pyplot as plt
# Get 20 random x,y co-ordinates, 20 random colours and 20 random areas
n = 20
x = np.random.rand(n)
y = np.random.rand(n)
colours = np.random.rand(n)
# area is based on radii between 0 and 15
area = np.pi * (15 * np.random.rand(n))**2

plt.scatter(x, y, s = area, c = colours, alpha = 0.5)
plt.show()

The code uses two libraries that are commonly used in data science, numpy and matplotlib. These libraries will have already been installed if you’ve been following the installation using Anaconda. If you’ve installed Python by some other method, you may need to install these with pip. Once you’ve pasted in the code, go ahead and run it. You should see a scatter graph of random data.
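If you are not sure whether the two libraries are present, a check along these lines can be run in a code cell before the plotting code. This is a minimal sketch using only the standard library; the helper name missing_packages is our own and not part of Jupyter or Anaconda.

```python
import importlib.util


def missing_packages(names=("numpy", "matplotlib")):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install these first, e.g. with pip:", ", ".join(missing))
else:
    print("numpy and matplotlib are available.")
```

The check only looks the packages up and does not import them, so it is quick to run even for large libraries.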

At this point, your new Jupyter Notebook should look something like this:

Notebook

Finishing Up

So that’s it: you’ve created your first notebook with Jupyter Notebook. Notebook updates are automatically saved in the .ipynb file, but you can make sure by clicking on the save icon in the toolbar. To exit, click on Close and Halt in the File menu. Once you’ve closed the notebook, you can close the browser window, return to the command prompt running the server and stop it by hitting Ctrl-C. You can return to your notebook at any time by re-running jupyter notebook from the directory where your notebook is saved and then selecting the notebook from the list of notebooks.

What’s Next

Hopefully, this brief introduction has whetted your appetite and you’ll want to explore further. We encourage you to explore Python at its web site, along with some of the available data science libraries such as numpy and matplotlib.

We also encourage you to explore some of the other things you can do in notebooks. For example, you can add HTML to cells, allowing you to embed images or video. You can insert other programming or scripting languages, such as JavaScript or shell scripts. You can use MathJax in markdown cells to add mathematical or chemical equations to your documentation.
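As a small illustration of the MathJax support, recall that the scatter plot code earlier computes each marker's area as the area of a circle whose radius r is drawn at random between 0 and 15. A markdown cell could document this with an expression such as the following, written between double dollar signs. The wording and layout of the formula are our own; only the formula itself comes from the code.

```latex
$$
\text{area} = \pi r^{2}, \qquad 0 \le r < 15
$$
```

When the cell is run, the expression should be rendered as a typeset equation rather than shown as raw text.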

Because notebooks are stored in a simple JSON file format, they can easily be shared, and many open source notebooks are freely available on the web. We encourage you to explore resources such as Jupyter Gallery. Most of the examples are on GitHub. To download them, you can clone the relevant GitHub repository or simply grab an individual file. Because GitHub automatically renders notebooks, you’ll need to view the file in Raw mode and copy its contents into a text editor. Once your file is downloaded and saved with a .ipynb extension, simply copy it into the folder where you run Jupyter. It will now show up in your list of notebooks and you’ll be able to open it for editing and running.
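Since a notebook is just JSON, you can peek inside one with nothing more than the standard library. The sketch below is illustrative only: the function name summarize_notebook is our own, and the tiny hand-written notebook stands in for a file you have downloaded. It assumes the common layout in which a top-level "cells" list holds entries with a "cell_type" and a "source".

```python
import json
import os
import tempfile


def summarize_notebook(path):
    """Return a list of (cell_type, first_line) pairs for a notebook file."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    summary = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        first_line = text.splitlines()[0] if text else ""
        summary.append((cell.get("cell_type"), first_line))
    return summary


# A hand-written, minimal notebook used only for this demonstration.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Jupyter Notebook Example 1"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["import numpy as np\n", "n = 20"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Write the demo notebook to a temporary directory, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(demo_path, "w", encoding="utf-8") as handle:
    json.dump(demo_notebook, handle)

for cell_type, first_line in summarize_notebook(demo_path):
    print(cell_type, "|", first_line)
```

To try it on a real file, replace demo_path with the path to a notebook you have saved, for example the one created earlier in this post.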

Hopefully, you’ll develop some interesting notebooks of your own that you can share with the community.