Step-by-Step Guide to Installing Single-Node YARN on Ubuntu

Hi guys,
I am back, and today let’s talk about installing single-node YARN on Ubuntu. Some of you may have heard of Hadoop but may not know about YARN (Yet Another Resource Negotiator), also known as MRv2. Here’s the explanation from the Apache Hadoop YARN documentation: “The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.”

Environment versions

OS: Ubuntu 15.04



1. Download VMware here and install it.

VMware is virtualization software that lets you install the Ubuntu Linux operating system inside a virtual machine. There is another similar product, VirtualBox from Oracle, which may cause some issues when loading the Ubuntu ISO. So let’s use VMware here, and feel free to use VirtualBox if it works on your machine.

2. Go into the BIOS and turn on Virtualization Technology

It is a hardware feature that supports virtual machines, which means that whether you are using VMware or VirtualBox, you have to turn it on.

Here’s an example


3. Download the Ubuntu Image.

You don’t need to be afraid of the “amd64” suffix if you are using an Intel CPU; it just means 64-bit x86, which covers Intel processors too. Read this

Ubuntu download

Either 14.04.2 or 15.04 is fine. I am using 15.04.

4. Load the .iso file

Here’s a good tutorial about loading an image in VMware

5. Ethernet connection issues

If you see this image, especially the Ethernet connection marked with the red rectangle, your connection is fine. The virtual machine uses a virtual Ethernet adapter to connect through your laptop. If Ubuntu shows as disconnected while your host machine (Windows 7, for example) has an internet connection, simply click “Disconnect” and then “Enable Networking” in Ubuntu. Reconnecting this way sometimes solves the issue.


Let’s get started with Hadoop!

Actually, there are some very good tutorials on YouTube, and they install Hadoop 2.7.0, which already includes YARN.

I prefer the following one, which is simple and good enough for us.

I have copied and modified Mr. Chaalpritam’s procedure from the second video, taken from his blog

1. Update packages in Ubuntu

sudo apt-get update

2. Install the Java Development Kit (JDK)

sudo apt-get install openjdk-8-jdk

3. Check Java version

java -version

4. Install remote access tool ssh

sudo apt-get install ssh

5. Install rsync

The following command installs rsync, which is used to keep copies of a file on two computer systems in sync. In most cases, though, Ubuntu already has the latest version of it.

sudo apt-get install rsync

6. Generate an SSH DSA key pair.
Some background information is here. If you read the link, you will understand that your private key is saved as id_dsa and the matching public key as id_dsa.pub

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

7. Append the public key to authorized_keys

cat ~/.ssh/ >> ~/.ssh/authorized_keys
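The cat command in step 7 seems to have lost its source file name when this post was published. Here is a minimal sketch of the idea, assuming the public half of the step 6 key pair is ~/.ssh/id_dsa.pub (ssh-keygen writes the public key next to the private one with a .pub suffix). The authorize_key helper is a hypothetical name of mine, not part of OpenSSH:

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTH_FILE
# Append a public key to an authorized_keys file unless it is already listed,
# then tighten permissions, since sshd can reject a file others may write to.
authorize_key() {
    pub_file=$1
    auth_file=$2
    [ -r "$pub_file" ] || { echo "missing public key: $pub_file" >&2; return 1; }
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: whole-line match, -F: fixed strings, -f: take the patterns from the key file
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Assumed usage for this tutorial (the .pub path is an assumption, see above):
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
#   ssh localhost    # should now log in without asking for a password
```

Running it twice leaves a single copy of the key in the file, which a plain cat >> would not.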

8. Download hadoop 2.7.0

wget -c

9. Unzip the binary file

sudo tar -zxvf hadoop-2.7.0.tar.gz

10. Move hadoop-2.7.0 to /usr/local/hadoop

sudo mv hadoop-2.7.0  /usr/local/hadoop

11. Select the JDK that the java symbolic link points to

update-alternatives --config java

12. Set up linux environment

sudo nano ~/.bashrc

Put the following lines at the end of .bashrc.

#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
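Step 22 later calls hdfs without a path, which only works if Hadoop’s bin directory is on your PATH. Here is a sketch of two extra lines that would do this, assuming the /usr/local/hadoop location from step 10. They are my addition, not part of the variables listed above:

```shell
# Assumption: Hadoop lives in /usr/local/hadoop (step 10).
export HADOOP_HOME=/usr/local/hadoop
# bin holds hdfs, hadoop and yarn; sbin holds the start/stop scripts.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

If you skip this, you can still call every tool with its full path, for example /usr/local/hadoop/bin/hdfs.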

13. Activate bashrc

source ~/.bashrc

14. Go to the Hadoop configuration directory

cd /usr/local/hadoop/etc/hadoop

15. Configure java home in


sudo nano

Change the line export JAVA_HOME=${JAVA_HOME} to the following:
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
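The file name in step 15 did not survive in this post. In a Hadoop 2.x layout the line export JAVA_HOME=${JAVA_HOME} sits in etc/hadoop/hadoop-env.sh, so that is the file I assume in this sketch. set_java_home is a hypothetical helper that makes the same change without opening an editor:

```shell
#!/bin/sh
# set_java_home ENV_FILE JDK_DIR
# Rewrite the "export JAVA_HOME=..." line of ENV_FILE so it points at JDK_DIR.
set_java_home() {
    env_file=$1
    jdk_dir=$2
    tmp_file=$(mktemp) || return 1
    # "|" is the sed delimiter because JDK paths contain "/".
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=\"$jdk_dir\"|" "$env_file" > "$tmp_file" &&
        cat "$tmp_file" > "$env_file"   # overwrite in place, keeping owner and mode
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Assumed usage (needs write permission on the file):
#   set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-openjdk-amd64
```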

16. Configure core-site.xml
It holds the site-specific configuration for a given Hadoop installation.

sudo nano core-site.xml
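The XML for this step is missing from the post. Here is a sketch of a typical single-node core-site.xml, assuming the NameNode listens on hdfs://localhost:9000 as in the Apache single-node setup guide. It is written to a scratch directory (conf_dir is a placeholder of mine), so nothing under /usr/local/hadoop is touched until you copy the file over yourself:

```shell
#!/bin/sh
conf_dir=$(mktemp -d)   # scratch directory; copy the result into etc/hadoop yourself

cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Default file system URI; host and port are assumptions. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

echo "wrote $conf_dir/core-site.xml"
```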


17. Modify YARN configuration options

sudo nano yarn-site.xml
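Only one line of this file survives in the post: the ShuffleHandler class name. Here is a sketch of a yarn-site.xml built around that value. The two property names are the ones I know from Hadoop 2.x’s yarn-default.xml, so treat them as my assumption rather than something recovered from the original. As before, the file goes to a scratch directory (conf_dir is my placeholder):

```shell
#!/bin/sh
conf_dir=$(mktemp -d)   # scratch directory; copy the result into etc/hadoop yourself

cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Let NodeManagers run the shuffle service that MapReduce jobs need. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Class implementing that service (the value that survives in this post). -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF

echo "wrote $conf_dir/yarn-site.xml"
```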

<value> org.apache.hadoop.mapred.ShuffleHandler</value>

18. Change MapReduce configuration options

sudo cp mapred-site.xml.template mapred-site.xml

sudo nano mapred-site.xml
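The contents of mapred-site.xml are also missing here. The one setting that matters for running MapReduce on YARN is mapreduce.framework.name, so a minimal sketch would be the following (again written to a scratch directory, with conf_dir as my placeholder):

```shell
#!/bin/sh
conf_dir=$(mktemp -d)   # scratch directory; copy the result into etc/hadoop yourself

cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local job runner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

echo "wrote $conf_dir/mapred-site.xml"
```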


19. Adjust hdfs-site.xml
Set the HDFS configuration (NameNode, DataNode, and replication).

We are running everything on a single node, so set replication to 1 here.

sudo nano hdfs-site.xml
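The XML for this step is missing too. Here is a sketch that matches the text: replication set to 1, plus NameNode and DataNode directories. I am assuming the directories are the ones created in step 20; the file: URI form and the scratch directory (conf_dir) are my choices:

```shell
#!/bin/sh
conf_dir=$(mktemp -d)   # scratch directory; copy the result into etc/hadoop yourself

cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- One machine, so keep a single copy of each block. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Where the NameNode keeps its metadata (directory from step 20). -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  </property>
  <!-- Where the DataNode keeps its blocks (directory from step 20). -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hadoop_data/hdfs/datanode</value>
  </property>
</configuration>
EOF

echo "wrote $conf_dir/hdfs-site.xml"
```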


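20. Make the NameNode and DataNode directories

/usr/local/hadoop still belongs to root at this point (ownership changes in the next step), so use sudo here.

sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode

sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode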

21. Change ownership of the hadoop folder

This allows your local user to read from and write to HDFS.

Replace $USER with the username on your local machine (the shell normally expands $USER to the current user already).

sudo chown $USER:$USER -R /usr/local/hadoop

22. Format the namenode

hdfs namenode -format

23. Start all components.
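The command for this step is missing from the post. Hadoop 2.7.0 ships start-dfs.sh and start-yarn.sh in its sbin directory, so the step could look something like the sketch below, assuming HADOOP_HOME is set as in step 12. start_hadoop is a hypothetical wrapper of mine:

```shell
#!/bin/sh
# start_hadoop [HADOOP_HOME]
# Start HDFS first, then YARN; refuse to run if the scripts are not where expected.
start_hadoop() {
    home=${1:-${HADOOP_HOME:-/usr/local/hadoop}}
    for script in start-dfs.sh start-yarn.sh; do
        if [ ! -x "$home/sbin/$script" ]; then
            echo "cannot find $home/sbin/$script" >&2
            return 1
        fi
    done
    "$home/sbin/start-dfs.sh" && "$home/sbin/start-yarn.sh"
}

# Assumed usage:
#   start_hadoop                  # uses $HADOOP_HOME
#   start_hadoop /usr/local/hadoop
```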

24. Check the components


If all the components show up as in the following image, congratulations!


Usually the SecondaryNameNode, ResourceManager, and NameNode run on the master machine, while the DataNode and NodeManager are deployed on the slave machines. However, since we are setting up a single-node cluster, all the components run on one machine.
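The command and screenshot for step 24 are missing. The JDK’s jps tool lists running Java processes as “pid name” lines, so one way to check is to look for the five daemon names mentioned above. Here is a sketch; missing_daemons is a hypothetical helper that reads jps-style output on standard input:

```shell
#!/bin/sh
# missing_daemons: read "PID NAME" lines (the format jps prints) from stdin
# and print each expected Hadoop daemon that is not among them.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches whole lines, so SecondaryNameNode does not count as NameNode
        printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

# Assumed usage on the Hadoop machine:
#   jps | missing_daemons     # prints nothing when all five daemons are up
```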

25. Let’s run the classic WordCount MapReduce job on YARN.
25.1 Download and unzip the WordCount sample input file (refer to this).



25.2 Go to the Hadoop directory

cd /usr/local/hadoop

25.3 Create the /user folder in HDFS

bin/hdfs dfs -mkdir /user

If you get this warning


Don’t worry, you might read this

25.4 Create your user directory

bin/hdfs dfs -mkdir /user/$USERNAME

You can verify this by browsing the NameNode web UI on port 50070. My $USERNAME is robin.


25.5 Put the input file from the local machine into HDFS (Hadoop Distributed File System)

bin/hdfs dfs -put  /home/robin/Downloads/4300.txt inputfile

25.6 Use the prebuilt .jar to run the WordCount example.

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount inputfile outputfile


If you see this, congratulations! Our YARN installation is successful!

25.7 Get the output from HDFS to your local machine

bin/hdfs dfs -get /user/robin/outputfile    /home/robin/Downloads/wordcount_output

25.8 Review the result


The wordcount_output folder contains:

_SUCCESS: On the successful completion of a job, the MapReduce runtime creates a _SUCCESS file in the output directory. This may be useful for applications that need to see if a result set is complete just by inspecting HDFS.

part-r-00000: The output files are by default named part-x-yyyyy, where:

  • x is either ‘m’ or ‘r’, depending on whether the file was written by a map task (map-only job) or a reduce task
  • yyyyy is the mapper or reducer task number (zero based)

Refer to this

Congratulations! We have successfully installed YARN on Ubuntu!
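To make the naming rule concrete, here is a small sketch that checks a downloaded output folder for the _SUCCESS marker and explains each part file name. Both function names are hypothetical helpers of mine, and the folder path in the usage line is just the one from this walkthrough:

```shell
#!/bin/sh
# describe_part NAME: explain a part-x-yyyyy file name.
describe_part() {
    case $1 in
        part-m-[0-9]*) kind=map ;;
        part-r-[0-9]*) kind=reduce ;;
        *) echo "$1: not a part file"; return 1 ;;
    esac
    number=${1#part-?-}                                          # e.g. 00000
    number=$(printf '%s' "$number" | sed 's/^0*\([0-9]\)/\1/')   # drop leading zeros, keep one digit
    echo "$1: output of $kind task $number"
}

# inspect_output DIR: report whether the job finished and list its part files.
inspect_output() {
    dir=$1
    if [ -f "$dir/_SUCCESS" ]; then
        echo "job completed"
    else
        echo "no _SUCCESS marker"
    fi
    for f in "$dir"/part-*; do
        if [ -e "$f" ]; then
            describe_part "$(basename "$f")" || true
        fi
    done
    return 0
}

# Assumed usage:
#   inspect_output /home/robin/Downloads/wordcount_output
```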
