Set up IPython Notebook with PySpark

Hi guys,

Today we will go through how to set up IPython Notebook with PySpark.

Environment

OS: Ubuntu 15.04



1. Please install Spark based on my previous post, and remember to add the following to ~/.bashrc and source it.

export SPARK_HOME=/usr/local/spark


2. Run pyspark from /usr/local/spark
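If pyspark does not start, it can help to first confirm that the launcher is where we expect it. Here is a minimal sketch of such a check. The helper name find_pyspark_launcher is my own (hypothetical), and it assumes the usual layout where the launcher sits at bin/pyspark under SPARK_HOME.

```python
import os


def find_pyspark_launcher(spark_home=None):
    """Return the path to bin/pyspark under spark_home, or None if it is missing.

    Hypothetical helper: falls back to $SPARK_HOME, then to /usr/local/spark.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME", "/usr/local/spark")
    launcher = os.path.join(spark_home, "bin", "pyspark")
    return launcher if os.path.isfile(launcher) else None


if __name__ == "__main__":
    print(find_pyspark_launcher() or "pyspark launcher not found; check SPARK_HOME")
```

If this prints a path, running that file from the terminal should bring up the PySpark shell.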


3. Install IPython through Synaptic Package Manager. Here’s a good tutorial

4. Some packages might be missing

sudo pip install tornado --upgrade

sudo pip install jsonschema

5. Open ipython

ipython notebook

6. Here’s a good tutorial on setting up IPython with Spark

7. Create a Spark profile


8. Create a file in ~/.ipython/profile_spark/startup/ and add the following

import os
import sys

# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = '/usr/local/spark'

# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']

# Add the PySpark/py4j to the Python Path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))

This script puts PySpark on the Python path whenever the profile starts, so PySpark can be used directly from the notebook without the interactive shell showing up in the terminal.
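Depending on the Spark release, py4j may not be found under python/build. Many Spark distributions ship it as a source zip under python/lib instead. Here is a minimal sketch that collects the paths under that assumption. The helper name pyspark_sys_paths is my own (hypothetical), and the py4j-*-src.zip pattern is an assumption about your Spark layout, so please check your own install.

```python
import glob
import os


def pyspark_sys_paths(spark_home):
    """Collect the paths PySpark likely needs on sys.path (hypothetical helper)."""
    python_dir = os.path.join(spark_home, "python")
    paths = [python_dir]
    # Assumption: py4j ships as a versioned source zip under python/lib.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    paths.extend(sorted(glob.glob(pattern)))
    return paths
```

In the startup file, each returned path could then be added with sys.path.insert(0, path), the same way as the two lines above.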

9. Start IPython again

ipython notebook --profile spark

10. Add the IPython environment variable to ~/.bashrc and source it.

export IPYTHON_OPTS="notebook --pylab inline"

Without the above statement, we may run into an error saying that py4j.java_gateway cannot be found.
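To see whether this issue applies to you, a quick check from a notebook cell can tell you if py4j is visible to Python at all. This is only a small diagnostic sketch. The helper name can_import is my own (hypothetical) and it is not part of Spark or IPython.

```python
import importlib


def can_import(module_name):
    """Return True if module_name can be imported in the current session."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


print("py4j.java_gateway importable: %s" % can_import("py4j.java_gateway"))
```

If it reports False, the Python path set up in the startup file is probably not being picked up.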

11. Open a Python 2 notebook and try the following commands; press Alt + Enter to execute


from pyspark import SparkContext

sc = SparkContext("local", "pyspark")

12. Let's try the typical Pi example in Python (refer to this)

The final result should look like the following image, with the value of Pi displayed
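In case the linked example is not at hand, here is a minimal Monte Carlo sketch of the same idea. It is my own version, not a copy of the referenced code. The function names inside_unit_circle and estimate_pi and the default sample count are assumptions, and sc is the SparkContext created in step 11.

```python
import random
from operator import add


def inside_unit_circle(_):
    """Draw one random point in the unit square; return 1 if it lies inside the quarter circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1.0 else 0


def estimate_pi(sc, samples=100000, partitions=2):
    """Monte Carlo estimate of Pi using an existing SparkContext `sc`."""
    hits = (sc.parallelize(range(samples), partitions)
              .map(inside_unit_circle)
              .reduce(add))
    # The fraction of hits approximates pi / 4.
    return 4.0 * hits / samples
```

In the notebook, calling print(estimate_pi(sc)) should show a value close to Pi. The estimate is random, so it changes a little on every run.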


Congratulations! Now you know how to use IPython notebook with Spark!


