Apache Storm - Installation and Configuration Tutorial

Installation and Configuration

Welcome to the third chapter of the Apache Storm tutorial (part of the Apache Storm course). This chapter will help you become familiar with the steps for installing and configuring Storm. Let us start with the objectives of this lesson.

Objectives

By the end of this lesson, you will be able to:

  • Choose proper hardware for Storm installation

  • Install Storm on an Ubuntu system

  • Configure Storm

  • Run Storm on an Ubuntu system

  • Describe the steps to set up a multi-node Storm cluster.

Moving on, let us discuss the Storm versions.

Steps involved in Installation and Configuration of Apache Storm

Following are the steps involved in the installation and configuration of Apache Storm.

  1. Choosing the Storm Version

  2. OS Selection

  3. Machine Selection

  4. Preparation for Installation

  5. Downloading Kafka

  6. Downloading Storm

  7. Installing Kafka

Choosing the Storm Version

Storm has multiple versions. First, choose the latest stable version for installation. Zookeeper is a prerequisite for running Storm on the system, so you start by installing Zookeeper.

Since Zookeeper comes as a part of Kafka, and you will learn about the Kafka interface to Storm later in the lesson, let us start by installing Kafka before installing Storm. Version 0.8.2 is the current stable version of Kafka and the current stable version of Storm is Version 0.9.5.

The stable version of Kafka can be downloaded from: https://www.apache.org/dyn/closer.cgi?path=/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz

The stable version of Storm can be downloaded from: http://apache.mirrors.pair.com/storm/apache-storm-0.9.5/apache-storm-0.9.5.tar.gz

Next, let us look at selecting the right Operating System.

OS selection

You can choose any of the Linux operating systems for installation.

Ubuntu 12.04 or later, which is installed on our virtual machine, is a good choice. You can also choose other Linux systems such as Red Hat Enterprise Linux (referred to as RHEL), CentOS (a free version of RHEL), or Debian.

Let us move on to selecting an appropriate machine for Storm installation.

Machine Selection

Storm needs ample memory and adequate processing power. Below are the recommended machine configurations.

For development systems,

  • Minimum of 2GB RAM.

  • 1 CPU for Storm.

  • 1 TB hard disk.

For production systems,

  • Minimum of 16GB RAM.

  • Up to 32GB of RAM per machine (recommended)

  • At least 6-Core CPUs (recommended)

  • Processors which are 2GHz or more.

  • 4x2TB hard disks.

  • 1 Gigabit Ethernet.
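The development minimums listed above can be checked with a small script. The following is only a sketch: the function name meets_dev_minimum and the threshold of 1900000 kB are assumptions made for illustration (Linux reports slightly less memory than is physically installed, so the threshold sits a little below 2 GB).

```shell
# Hypothetical helper: succeed when a machine meets the development
# minimums listed above (about 2 GB RAM and 1 CPU).
# Arguments: total RAM in kB, number of CPUs.
meets_dev_minimum() {
    ram_kb=$1
    cpus=$2
    min_ram_kb=1900000   # assumed threshold, a little below 2 GB
    [ "$ram_kb" -ge "$min_ram_kb" ] && [ "$cpus" -ge 1 ]
}

# On Linux, the two inputs can be read like this:
#   ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   cpus=$(nproc)
#   meets_dev_minimum "$ram_kb" "$cpus" && echo "OK for development"
```

The disk and network recommendations are not checked here; they are easier to confirm by hand.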

Next, let us look at how to prepare the machine for installation.

Preparing for Installation

Following are the prerequisite software for installing Storm.

  • Java JRE 1.7 or higher. Oracle JRE is recommended, but it works with Open JRE as well.

  • Zookeeper, which can be installed from the Kafka repository.
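To confirm the Java prerequisite, you can compare the installed version with the required one. The sketch below assumes a hypothetical helper named java_major; it handles both the older 1.x numbering (such as 1.7.0_80) and the newer numbering (such as 11.0.2).

```shell
# Hypothetical helper: print the major Java version for a version string.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;   # 1.7.0_80 -> 7
        *)   echo "$1" | cut -d. -f1 ;;   # 11.0.2   -> 11
    esac
}

# Usage with an installed JVM (java -version prints to standard error):
#   v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   [ "$(java_major "$v")" -ge 7 ] && echo "Java is recent enough"
```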

Now, let us look at the steps to download the software.

Download Kafka

Kafka can be downloaded directly from the Apache Kafka website:

wget http://mirrors.advancedhosters.com/apache/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz

You may choose a different mirror based on your location by checking: https://www.apache.org/dyn/closer.cgi?path=/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz

Do this on each machine where Zookeeper has to be installed. A file with the .tgz or .tar.gz extension is called a tarball, which is a compressed tar archive on Linux.
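Before unpacking a downloaded tarball, it can be worth confirming that the download is complete. One simple check, sketched below with a hypothetical helper named is_valid_tarball, is to ask tar to list the archive contents and see whether it succeeds.

```shell
# Hypothetical helper: succeed only if the file is a readable gzip tarball.
# tar -t lists the contents without extracting anything.
is_valid_tarball() {
    tar -tzf "$1" > /dev/null 2>&1
}

# Usage:
#   is_valid_tarball Kafka_2.9.1-0.8.2.1.tgz || echo "download again"
```

This only detects a truncated or corrupted archive; it does not verify that the file is the genuine release.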

Next, let us learn how to download Storm.

Download Storm

Storm can be downloaded directly from the Apache Storm website:

wget http://apache.mirrors.pair.com/storm/apache-storm-0.9.5/apache-storm-0.9.5.tar.gz

Do this on each machine where Storm has to be installed.

Now, we will learn how to install Kafka.

Install Kafka Demo 01

After the download, the archives have to be unzipped and moved to the proper location.

Step 1

Unzip the package using tar utility:

tar -xzf Kafka_2.9.1-0.8.2.1.tgz

Step 2

Move to proper directory:

sudo mv Kafka_2.9.1-0.8.2.1 /usr/local/Kafka

Note that sudo may ask for the password.

Now that Kafka is installed, let us install Storm.

Install Storm

Follow the same steps to install Storm from the downloaded tarball:

Step 1

Unzip the package using tar utility:

tar -xzf apache-storm-0.9.5.tar.gz

Step 2

Move to proper directory:

sudo mv apache-storm-0.9.5 /usr/local/storm

Next, let us set up a path for Kafka and Storm.

Set up Path for Kafka and Storm

Step 1

Edit the .bashrc file in the home directory to add the Kafka and Storm directories to the path. Go to the home directory by using the cd command as below:

cd

Step 2

Edit the .bashrc file using the vi command as below:

vi .bashrc

In vi, use the i command to enter insert mode and add the lines given below at the end of the file:

export KAFKA_PREFIX=/usr/local/Kafka
export PATH=$PATH:$KAFKA_PREFIX/bin
export STORM_PREFIX=/usr/local/storm
export PATH=$PATH:$STORM_PREFIX/bin

Step 3

Now, get out of insert mode using the Escape key and save the file with :wq

Note that all the above commands are case sensitive, so you need to type them exactly as shown. Restart bash for the changes to take effect as shown below:

exec bash

This sets up the path to include the Kafka and Storm directories.
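If you prefer not to edit .bashrc by hand, the same lines can be appended from the shell. The sketch below uses a hypothetical helper named append_once, which adds a line only when the file does not already contain it, so running it twice does not duplicate the entries.

```shell
# Hypothetical helper: append a line to a file unless that exact line
# is already present. Arguments: file path, line to add.
append_once() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the line as plain text
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH unexpanded until .bashrc is read):
#   append_once "$HOME/.bashrc" 'export KAFKA_PREFIX=/usr/local/Kafka'
#   append_once "$HOME/.bashrc" 'export PATH=$PATH:$KAFKA_PREFIX/bin'
#   append_once "$HOME/.bashrc" 'export STORM_PREFIX=/usr/local/storm'
#   append_once "$HOME/.bashrc" 'export PATH=$PATH:$STORM_PREFIX/bin'
```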

Moving on, let us see how to configure memory settings.

Configuring Low Memory Settings

Some development systems have low memory, so the default heap memory settings will not work on them.

Below changes are required for a development cluster with low memory:

Step 1

Change to the bin directory of the Kafka installation.

cd /usr/local/Kafka/bin

Step 2

Next, edit the zookeeper-server-start.sh file using the vi editor. You can use i to enter insert mode in vi and the Escape key to get out of insert mode. (The Escape key is normally located at the top left corner of the keyboard.)

vi zookeeper-server-start.sh

In the above file, replace the line

export KAFKA_HEAP_OPTS="-Xmx512M -Xms512M"

with

export KAFKA_HEAP_OPTS="-Xmx64M -Xms64M"

Press escape and save the file with :wq.

Next, edit the kafka-server-start.sh file using vi:

vi kafka-server-start.sh

In the above file, replace the line

export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"

with

export KAFKA_HEAP_OPTS="-Xmx128M -Xms128M"

Press escape and save the file with :wq.
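The same heap changes can also be made without opening an editor. This is a sketch only: set_heap_opts is a hypothetical helper, and it assumes the start script contains a KAFKA_HEAP_OPTS="..." assignment like the ones shown above.

```shell
# Hypothetical helper: rewrite the KAFKA_HEAP_OPTS value in a start script.
# Arguments: script path, new heap options.
set_heap_opts() {
    script=$1
    opts=$2
    # -i.bak edits the file in place and keeps a backup as <script>.bak
    sed -i.bak "s|KAFKA_HEAP_OPTS=\"[^\"]*\"|KAFKA_HEAP_OPTS=\"$opts\"|" "$script"
}

# Usage (sudo may be needed for files under /usr/local):
#   set_heap_opts zookeeper-server-start.sh "-Xmx64M -Xms64M"
#   set_heap_opts kafka-server-start.sh "-Xmx128M -Xms128M"
```

The backup copy makes it easy to restore the original settings if the server later needs more memory.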

Now, let us look at how to configure zookeeper for Storm.

Configuring Zookeeper for Storm

Since Storm uses zookeeper for distributed coordination, you need to configure the zookeeper instance that comes with Kafka.

Modify the zookeeper.properties file in the Kafka configuration directory as shown below:

cd /usr/local/Kafka/config

vi zookeeper.properties

Check the file and add the following lines if not already present:

initLimit=5
syncLimit=2
maxClientCnxns=0
server.1=localhost:2888:3888

Press Escape and save the file with :wq.

Then, exit the editor. Use the below commands to create a myid file for zookeeper:

echo 1 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

Next, let us learn how to configure Kafka.

Configuring Kafka

The changes given below are required for Kafka configuration.

cd /usr/local/Kafka/config

vi server.properties

Replace the line

broker.id=0

with

broker.id=1

Check that the default port is set to 9092:

port=9092

Check that zookeeper is set to connect at port 2181:

zookeeper.connect=localhost:2181

If you have multiple zookeeper instances, you can specify them as mentioned above, separated by commas.

Now, you will learn to modify some more Kafka properties.

A few more changes are to be made to the server.properties file.

Add the following two lines at the end of the file:

queued.max.requests=1000
auto.create.topics.enable=false

The last line ensures that topics must be created explicitly before messages are sent to them. Press escape to exit insert mode and save the file with :wq
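Property edits like the ones in this section follow one pattern: change the value if the key exists, otherwise add the key at the end of the file. The sketch below shows that pattern with a hypothetical helper named set_property. It assumes keys and values do not contain the | character; the dots in a key are treated as regular-expression wildcards, which is harmless for the keys used here.

```shell
# Hypothetical helper: set key=value in a properties file.
# Replaces the existing line for the key, or appends one if missing.
set_property() {
    file=$1
    key=$2
    value=$3
    if grep -q "^$key=" "$file"; then
        # -i.bak edits in place and keeps a backup as <file>.bak
        sed -i.bak "s|^$key=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage:
#   set_property server.properties broker.id 1
#   set_property server.properties queued.max.requests 1000
#   set_property server.properties auto.create.topics.enable false
```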

Moving on, you will learn how to start the Kafka server.

Start the Zookeeper and Kafka servers

First, you need to start the zookeeper server with the below command:

sudo nohup /usr/local/Kafka/bin/zookeeper-server-start.sh /usr/local/Kafka/config/zookeeper.properties > /tmp/zk.out 2>/tmp/zk.err &

Enter the Simplilearn password, if asked for a password.

Note that sudo is used so that you have proper permissions.

The & (ampersand) is added at the end so that the process runs in the background.

For background processes, nohup is added in the beginning so that the background process does not end, even if your session is terminated. The standard output from the server is sent to /tmp/zk.out file and the standard error is sent to /tmp/zk.err file with the 2> option.

Next, start the Kafka server with the below command:

sudo nohup /usr/local/Kafka/bin/kafka-server-start.sh /usr/local/Kafka/config/server.properties > /tmp/kafka.out 2>/tmp/kafka.err &

sudo and nohup are used here in the same way as explained in the previous command.
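If the redirection in these commands is unfamiliar, you can try it with a harmless stand-in before starting the real servers. In the sketch below, fake_server is a made-up function that writes one line to standard output and one to standard error; nohup and & are left out so that the result can be inspected immediately.

```shell
# Stand-in for a server: one line on standard output, one on standard error.
fake_server() {
    echo "server started"
    echo "warning: low memory" >&2
}

log_dir=$(mktemp -d)   # temporary directory for the demo log files

# Same redirection shape as the zookeeper and Kafka commands:
# > sends standard output to one file, 2> sends standard error to another.
fake_server > "$log_dir/demo.out" 2> "$log_dir/demo.err"

cat "$log_dir/demo.out"   # prints: server started
cat "$log_dir/demo.err"   # prints: warning: low memory
```

With the real servers, the .err file is the first place to look when a process exits unexpectedly.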

Next, let us look at creating directories for Storm.

Create Directories for Storm

Create lib directory for Storm:

  • sudo mkdir -p /var/lib/storm

  • sudo chmod 777 /var/lib/storm

Now that you have learned how to install Storm and create the directories for Storm, you will learn how to configure Storm.

Configuring Storm

Given here are the changes required for Storm configuration.

Change directory to the Storm configuration directory using the command mentioned below:

cd /usr/local/storm/conf

Edit the storm.yaml file using the command:

vi storm.yaml

Replace these lines

#storm.zookeeper.servers:
#    - "server1"

with

storm.zookeeper.servers:
    - "localhost"

Specify the address of the nimbus host:

nimbus.host: "nimbus1"

Change nimbus1 to the IP address of the machine.

Now, you will configure the storm memory parameters.

Configuring Storm Memory Parameters

Continue modifying the same file to specify the memory for the Java processes. You can start with 128MB for all the processes. Add the lines mentioned below.

If the lines already exist, modify them to change the numbers.

nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

supervisor.childopts: "-Djava.net.preferIPv4Stack=true"

worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

Specify the data directory for Storm.

storm.local.dir: "/var/lib/storm"

Press escape and save the storm.yaml file with :wq.
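For reference, the single-node settings from this and the previous section can be written in one step with a here-document. This is a sketch under assumptions: it writes to a temporary directory rather than /usr/local/storm/conf, and the address 192.168.1.10 is a hypothetical placeholder for the IP address of your machine.

```shell
conf_dir=$(mktemp -d)      # stand-in for /usr/local/storm/conf
nimbus_ip="192.168.1.10"   # hypothetical address; use your machine's IP

# Write the complete single-node configuration described above.
cat > "$conf_dir/storm.yaml" <<EOF
storm.zookeeper.servers:
    - "localhost"
nimbus.host: "$nimbus_ip"
nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
storm.local.dir: "/var/lib/storm"
EOF

cat "$conf_dir/storm.yaml"   # review the file before copying it into place
```

Note that YAML is sensitive to indentation, so the list item under storm.zookeeper.servers must stay indented.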

Next, let us look at starting the Storm servers.

Start Storm Servers

Let us start the Storm nimbus and supervisor servers. You will also need to start the Storm UI to monitor through a web interface.

Start the Storm nimbus server on the master node:

nohup storm nimbus > /tmp/nimbus.out 2>/tmp/nimbus.err &

Start the Storm supervisor on each worker node:

nohup storm supervisor > /tmp/supervisor.out 2>/tmp/supervisor.err &

Start the Storm UI for monitoring; you can check this at port 8080 using your favourite browser.

nohup storm ui >/tmp/ui.out 2>/tmp/ui.err &  

Next, let us run a sample Storm program.

Run a Sample Storm Program

Here, you will run a sample program created by Simplilearn that processes the logfile.

cd /tmp

wget simplilearncdn/logfile

wget simplilearncdn/LogProcessTopology.jar

storm jar LogProcessTopology.jar storm.starter.LogProcessTopology test1

storm list

The storm list command will give the following output:

Topology_name    Status    Num_tasks    Num_workers    Uptime_secs
test1            Active    7            1              23
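In a script, you may want only the status of one topology rather than the whole listing. The sketch below assumes that storm list prints one whitespace-separated row per topology, with the name in the first column and the status in the second, as in the output shown above. The helper name topology_status is hypothetical.

```shell
# Hypothetical helper: read "storm list" output on standard input and
# print the status column for the named topology.
topology_status() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage:
#   storm list | topology_status test1
```

Lines that do not start with the topology name, such as the header row, are ignored.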

Next, let us check the output of the sample Storm program.

Check the Output

The output of the sample program is in the /tmp/stormoutput.txt file. You can check the content of this file with the command: cat /tmp/stormoutput.txt.

The output will be displayed as shown below:

INFO:1

ERROR:1

WARNING:1

ERROR:2

WARNING:2

INFO:2

ERROR:3

WARNING:3

ERROR:4

WARNING:4

Note that the actual output might be different in your case.
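The numbers in this output look like running counts per log level. Assuming that is what the sample topology computes, and assuming each line of the log file begins with its level (neither is confirmed here), the same tally can be sketched outside Storm with awk. The helper name running_counts is hypothetical.

```shell
# Hypothetical helper: for each input line, print the first field
# together with how many times it has been seen so far.
running_counts() {
    awk '{ count[$1]++; print $1 ":" count[$1] }'
}

# Usage with an assumed log format ("LEVEL message..."):
#   running_counts < /tmp/logfile
```

Comparing this with /tmp/stormoutput.txt on a small file is one way to sanity-check the topology, keeping in mind that Storm may emit the counts in a different order.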

Next, let us check the Storm UI.

Check the UI

You can check the storm processes using Storm UI at port 8080.

Open IP_address:8080 in your browser (Firefox or Chrome), where IP_address is the IP address of your virtual machine.

The diagram shows the Storm UI from the browser at port 8080. It shows the cluster summary, topology summary, supervisor summary as well as Nimbus server configuration parameters. You can see that the topology test1 is currently running.  

[Figure: Storm process using UI]

Now, you will learn how to stop the Storm topology.

Stop the Storm Topology

You can stop the running Storm topology with the help of the following command:

storm kill test1

Verify that the topology is not running with storm list.

This will produce the following output:

Topology_name    Status    Num_tasks    Num_workers    Uptime_secs
test1            KILLED    7            1              315

Let us now check the log files.

Check the Log Files

The log files are created in the folder /usr/local/storm/logs

cd /usr/local/storm/logs

This directory will have the following files:

nimbus.log

supervisor.log

worker-pid.log

vi can be used to check the content of the log files.

Setting Up Multi-node Storm Cluster

To set up a multi-node cluster, let us take the example of a 3-node cluster whose nodes are named node1, node2, and node3.

First, install Kafka and Storm on each machine as discussed earlier. That is, download the Kafka and Storm tarballs, unzip the compressed archive and move the expanded directory to /usr/local/Kafka and /usr/local/storm respectively.

This has to be done on each of the three nodes.

The second step is to set up zookeeper on each node:

cd /usr/local/Kafka/config

vi zookeeper.properties

Add the following lines if not present already:

initLimit=5

syncLimit=2

maxClientCnxns=0

server.1=node1:2888:3888

server.2=node2:2888:3888

server.3=node3:2888:3888

Press the escape key, save the file with :wq and then exit the editor.

Note that node1, node2, and node3 stand for the IP addresses of the 3 servers.
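For a larger cluster, typing the server lines by hand becomes error-prone. The sketch below generates them from a list of nodes, numbering the servers in the order given and using the same ports as above. The function name zk_server_lines is hypothetical.

```shell
# Hypothetical helper: print one server.N line per node, numbered from 1.
zk_server_lines() {
    i=1
    for host in "$@"; do
        echo "server.$i=$host:2888:3888"
        i=$((i + 1))
    done
}

# Usage:
#   zk_server_lines node1 node2 node3
```

The node order must be the same on every machine so that each server number always refers to the same node.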

Set up the myid file for Zookeeper

The third step is to set up the myid file for zookeeper.

The commands mentioned below are used to create the myid file for zookeeper:

On node1:

echo 1 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

On node2:

echo 2 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

On node3:

echo 3 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

Note that the content of myid file is different on each server.
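Because the myid value must match the server number of the node in zookeeper.properties, it can be derived from the node list instead of being typed on each machine. The sketch below assumes a hypothetical helper named myid_for and assumes the nodes are listed in the same order as the server.N lines.

```shell
# Hypothetical helper: print the 1-based position of a node in a list.
# Arguments: the node to look up, followed by all nodes in order.
myid_for() {
    target=$1
    shift
    i=1
    for host in "$@"; do
        if [ "$host" = "$target" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1   # the node is not in the list
}

# Usage on each node (assuming its hostname is node1, node2, or node3):
#   myid_for "$(hostname)" node1 node2 node3 > /tmp/myid
#   sudo cp /tmp/myid /tmp/zookeeper/myid
```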

The fourth step is to set up the Storm properties. The changes mentioned below are required for Storm configuration on each machine:

cd /usr/local/storm/conf

vi storm.yaml

Replace the following lines:

#storm.zookeeper.servers:
#    - "server1"

with

storm.zookeeper.servers:
    - "node1"
    - "node2"
    - "node3"

Specify the address of the nimbus host.

Please note that you will have Nimbus running on only the master node, which is node1 in our cluster.

nimbus.host: "node1"

The fifth step is to make some more changes to the storm.yaml file.

For childopts, specify the memory for the child processes. You can start with 128MB.

nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

supervisor.childopts: "-Djava.net.preferIPv4Stack=true"

worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

Specify the data directory for Storm:

storm.local.dir: "/var/lib/storm"

Press escape and save the storm.yaml file with :wq.

Finally, the last step of the setup is to start the servers. Start the zookeeper server on each node:

sudo nohup /usr/local/Kafka/bin/zookeeper-server-start.sh /usr/local/Kafka/config/zookeeper.properties > /tmp/zk.out 2>/tmp/zk.err &

Start the Storm Nimbus server on node1:

nohup storm nimbus > /tmp/nimbus.out 2>/tmp/nimbus.err &

Start the Storm supervisor process on each node:

nohup storm supervisor > /tmp/supervisor.out 2>/tmp/supervisor.err &

This completes the setup of the multi-node Storm cluster.

Summary

Here are the key takeaways.

  • Storm has multiple versions and the latest stable version of Storm is 0.9.5.

  • Proper OS and machine configurations should be chosen before starting the installation.

  • Kafka installation is used to install zookeeper.

  • Storm can be installed by downloading the latest tarball.

  • After the installation of zookeeper and Storm, both of them need to be configured.

  • After zookeeper and Storm are configured, the zookeeper server has to be started before starting Storm.

  • The storm command can be used to submit a topology to Storm.

  • To set up a multi-node Storm cluster, a six-step process needs to be followed.

Conclusion

This concludes the lesson: Introduction to the Installation and Configuration of Storm. In the next lesson, you will learn about the Advanced Storm Concepts.
