3.1 Jupyter notebooks
This section will give you a brief overview of what a Jupyter notebook is and how to use them, but if you would like a more detailed understanding, please read the official documentation. Jupyter Labs has now been released as a newer version of notebooks, giving you a full IDE (integrated development environment) and more control over the notebooks and working environment. This guide will not explore these features, as we are more interested in how to use the notebook.
Note: throughout this section you can substitute the phrase “Jupyter notebooks” with “Jupyter Labs” if you would prefer to have a full IDE allowing you more control over the system.
Jupyter notebooks are run on Python, though additional things can be downloaded to allow you to use your programming language of choice. For an example of what you can do with Jupyter notebooks, click here, and here for a collection of neat and applied notebooks.
3.1.1 Installing Jupyter notebooks
Mac’s come shipped with a version of Python, but it is most likely outdated, and it doesn’t contain everything we want. In order to get running, I strongly recommend downloading the Anaconda distribution over other distributions, or even just directly from Python’s website. The instructions below will be enough to get you up and running with Jupyter notebooks in your language of choice.
- Download the full Anaconda distribution i.e. not miniconda
- Be sure to choose
Python 3.x
, notPython 2.x
, as it’s the newer version and is forwards-compatible. - Be sure to only install for one user, not the whole system
- Be sure to select
Add Anaconda to my PATH environment variable
under Advanced Options - Be sure to install Anaconda to the drive where your data lives. To do this you will need to manually edit the installation path within the Anaconda installer wizard, otherwise it will automatically end up in the
C:\
drive
- Be sure to choose
- Use
kernels
to connect your programming language of choice with python and the notebook
3.1.2 Kernels
A kernel is program that allows the notebook to connect with, and run, your code. Jupyter comes with the Python code pre-installed, but if you want to use a different language, you will need to download a specific kernel.
Below, the installation instructions are described for common languages used in epidemiology. To see a full list of kernels available for Jupyter, along with the appropriate documentation and installation instructions, follow this link.
3.1.2.1 Installing the Stata kernel
The instructions for installing the stata_kernel
are based from the original documentation here. It should work with Stata 12
(I have tested it). If these instructions do not work for you, it may be that there has been an update to the kernel, at which point, please refer to the original documentation linked above.
Open a command prompt (Windows) / terminal (linux/mac) and type/copy-paste the following commands, pressing enter after each line
pip install stata_kernel
python -m stata_kernel.install
Windows-specific steps
In order to let stata_kernel
talk to Stata, you need to link the Stata Automation library:
- In the installation directory (most likely
C:\Program Files (x86)\Stata12
or similar), right-click on the Stata executable, for example,StataSE.exe
(this will just show asStataSE
, but is listed as an application). ChooseCreate Shortcut
. Placing it on the Desktop is fine. - Right-click on the newly created
Shortcut to StataSE.exe
, chooseProperties
, and append/Register
to the end of the Target field. So if the target is currently"C:\Program Files\Stata12\StataSE.exe"
, change it to"C:\Program Files\Stata12\StataSE.exe" /Register
(note the space before/
). Click OK. - Right-click on the updated
Shortcut to StataSE.exe
; choose Run as administrator.
3.1.2.2 Installing the SAS kernel
*This has not yet been tested here. The instructions for installing the sas_kernel
are based from the original documentation here*
Open a command prompt (Windows) / terminal (linux/mac) and type/copy-paste the following commands, pressing enter after each line. First we need to install a dependency called saspy
that helps the kernel connect SAS
to python
pip install saspy
pip install sas_kernel
You should now see something like this.
Available kernels:
python3 /home/sas/anaconda3/lib/python3.5/site-packages/ipykernel/resources
sas /home/sas/.local/share/jupyter/kernels/sas
Now verify that the SAS Executable is correct
- find the
sascfg.py
file – it is currently located in the install location (see above)[install location]/site-packages/saspy/sascfg.py
. To querypip
for the location of the file, typepip show saspy
. Failing that, this command will search the OS for the file location:find / -name sascfg.py
- edit the file with the correct path the SAS executable and include any options you wish it include in the SAS invocation. See examples in this file
3.1.2.3 Connecting R with Jupyter
If you are hoping to make nice documents and reproducible work using R, I would highly recommend that you use the R Markdown or R Notebook through RStudio application instead. However, if you would prefer Jupyter, then please read on.
It is possible to download an R kernel, much like for Stata and SAS, but it can be a bit fickle, so a different approach is described below. It is important to note that with this method you are installing a fresh version of R, so you will not have access to the packages you have previously installed - you will need to reinstall them in this R environment, which could be done within a Jupyter notebook.
Open a command prompt (Windows) / terminal (Linux/Mac) and enter the following commands:
conda install r-essentials r-igraph
Rscript -e 'install.packages("languageserver")'
If you would rather install an R kernel than a fresh install of R within the Anaconda distribution, you can follow the instructions here. The advantage of this is that it allows the notebook to access previously installed packages as they are not running off a fresh version of R.