Step-by-Step: Installing Apache Spark on Apache Hadoop

Hi guys,

This time, I am going to install Apache Spark on our existing Apache Hadoop 2.7.0.

Environment versions

OS: Ubuntu 15.04
Hadoop: 2.7.0
Spark: 1.4.0 (pre-built for Hadoop 2.6)
Scala: 2.11.7



1. Install Scala
sudo apt-get remove scala-library scala
sudo dpkg -i scala-2.11.7.deb
sudo apt-get update
sudo apt-get install scala
2. Install Spark
tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz
mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark
3. Get the Hadoop version
hadoop version
It should show 2.7.0
4. Add SPARK_HOME
vi ~/.bashrc
export SPARK_HOME=/usr/local/spark
source ~/.bashrc
5. Spark version
Since spark-1.4.0-bin-hadoop2.6.tgz is pre-built for Hadoop 2.6.0 and later, it also works with Hadoop 2.7.0. So we don't need to rebuild Spark with sbt or Maven, which is genuinely complicated. If you download the source code from the Apache Spark site and build it with
build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean package
you are likely to hit lots of build-tool dependency clashes.
So, don't bother building Spark from source.
6. Let’s verify our installation
echo $SPARK_HOME
It should print /usr/local/spark.
7. Launch the Spark shell
$SPARK_HOME/bin/spark-shell
When the scala> prompt appears, the Spark shell is running.
8. Test the Spark shell
scala> sc.parallelize(1 to 100).count()
It should return 100.
Background info
sc: the SparkContext, the main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
parallelize: distribute a local Scala collection to form an RDD.
count: return the number of elements in the dataset.
scala> :quit
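As a plain-Scala sanity check of what that one-liner computes (this is a local analogue written for illustration, not the Spark API; no cluster is involved):

```scala
// Local analogue of sc.parallelize(1 to 100).count().
// parallelize turns the local Range into a distributed RDD, and
// count returns the number of its elements, so the result matches
// the plain collection's size.
val data = 1 to 100
val n = data.size
println(n)  // 100, the same value count() returns in the shell
```

The point of the real shell test is not the arithmetic, of course, but confirming that the SparkContext can schedule work at all.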
9. Let’s try another typical example
bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[*] lib/spark-example* 10
The last argument, 10, is passed to the application's main method; here it is the number of slices used to calculate Pi.
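For context, SparkPi estimates Pi by Monte Carlo sampling: it scatters random points in a square and checks how many land inside the inscribed circle, with the slice count controlling how the sampling is partitioned across the cluster. A minimal local sketch of the same math (my own illustration, not the bundled example):

```scala
import scala.util.Random

// Monte Carlo estimate of Pi, mirroring the math inside SparkPi:
// sample points uniformly in the square [-1, 1] x [-1, 1]; the
// fraction that lands inside the unit circle approaches Pi/4.
def estimatePi(samples: Int, seed: Long = 42L): Double = {
  val rng = new Random(seed)
  val inside = (1 to samples).count { _ =>
    val x = rng.nextDouble() * 2 - 1
    val y = rng.nextDouble() * 2 - 1
    x * x + y * y <= 1.0
  }
  4.0 * inside / samples
}

println(f"Pi is roughly ${estimatePi(1000000)}%.5f")
```

SparkPi does the same sampling, but spreads it over the given number of slices and sums the per-slice hit counts with an RDD reduce, which is why more slices means more parallelism rather than more accuracy per sample.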

Congratulations! We have finished installing Spark, and next we can start using this powerful tool to perform data analysis and many other fun things.


  1. AMK · October 17, 2015

    Do we have to repeat this process on all the nodes including namenode and datanodes.


    • cyrobin · November 24, 2015

      Most of the process you have to repeat.


  2. Biswo · February 26, 2017

    I followed the steps suggested to install spark-2.1.0-bin-hadoop2.7.tgz, but am not getting the Hadoop version
    ("It should show 2.7.0", Step 3).
    I am getting the below error:
    /usr/local/spark$ hadoop version
    hadoop: command not found


  3. ghandzhipeng · August 7, 2017

    This is not building Spark on top of an existing hadoop cluster. Essentially you need to add the HADOOP_CONF_DIR in

