Apache Cassandra - Course Overview Tutorial

This is the introductory lesson of the Apache Cassandra Tutorial, which is part of the Apache Cassandra Course. This lesson gives an overview of the tutorial, its prerequisites, and the value it offers you.

Let us begin with the objectives of this lesson.

Objectives

By the end of this lesson, you will be able to:

  • Describe Cassandra and its features

  • Describe when Cassandra is used

  • Demonstrate how to work with the Command Line Interface of Cassandra

  • List the advantages and limitations of Cassandra

  • Demonstrate how to install VMware Player.

  • Describe the need for big data and NoSQL

  • Explain the fundamental concepts of Cassandra

  • Describe the architecture of Cassandra

  • Demonstrate data model creation in Cassandra

  • Use Cassandra database interfaces and demonstrate Cassandra database configuration.

Let us begin with an introduction to Cassandra in the next section.

What is Apache Cassandra?

NoSQL is the umbrella term for databases that do not follow traditional relational database management system (RDBMS) principles. NoSQL databases typically work with denormalized data. Cassandra is a NoSQL database that is highly scalable and ready for big data.

Cassandra has the following characteristics:

  • Highly scalable

  • Big data ready

  • Distributed database

  • High-performance database

Behind the Name

Cassandra derives its name from Greek mythology. Cassandra was the daughter of King Priam of Troy and his wife, Hecuba. She had the power to predict the future, but she was also cursed so that nobody would believe her predictions.

True to its name, the Cassandra database holds a lot of promise for the future but also has several limitations.

In the next section, we will talk about the history of Cassandra.

History of Cassandra

Cassandra was developed at Facebook by Avinash Lakshman, one of the authors of Amazon's Dynamo, and Prashant Malik to solve Facebook's inbox search problem. Its design combines ideas from the Bigtable data store used by Google and the Dynamo data store used by Amazon. Facebook released Cassandra as open source in 2008, and it has evolved a lot since then.

It started with the concept of column families and super column families but later evolved into a key-value store. You can still see messages from Cassandra that refer to column families. Cassandra became a top-level Apache project in 2010. Version 1.0 was released in 2011, and version 2.0, the current version at the time of writing, was released in 2013.

Let us discuss the main features of Cassandra in the next section.

Main Features of Cassandra

The following are the main features of Cassandra:

  • Cassandra is a key-value database.

  • Data is stored in tables and columns.

  • Every table has a primary key.

  • Cassandra has a limited SQL-like interface called CQL (Cassandra Query Language).

  • It provides very fast reads and writes.

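A sample Cassandra query, written in CQL, is shown below:

SELECT ticker, value
FROM stocks
WHERE ticker = 'xyz';

Observe that the syntax is similar to that of SQL in a relational database management system. One difference is that CQL allows ORDER BY only on clustering columns (primary key columns that follow the partition key). In the stocks table used in this lesson, ticker is the entire primary key, so the query cannot add ORDER BY ticker.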

Next, let us look at when Cassandra is used.

When is Cassandra Used?

Cassandra is used when:

  • A huge amount of information needs to be stored very quickly. For example, when you process telecom switch data or stock market data, a large volume of data is generated every minute.

  • You need fast lookups by key, and the data needs to be kept sorted in a predetermined order. (A full-indexed search here means a search performed using a key.)

  • You expect rapid growth in data size. Cassandra scales by adding more nodes as the data grows.

  • You want a highly fault-tolerant cluster with no single point of failure.

  • You need high performance for both reads and writes.
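To make the points about fast writes and a predetermined sort order more concrete, here is a minimal sketch of a table for per-minute stock quotes. The table and column names are hypothetical, and the statements assume a keyspace has already been selected with USE. The partition key (ticker) determines where a row is stored, and the clustering column (quote_time) keeps each ticker's rows sorted, newest first.

```sql
-- Hypothetical table for per-minute stock quotes (names are illustrative only).
-- ticker is the partition key: all quotes for one ticker are stored together.
-- quote_time is a clustering column: rows within a partition stay sorted by it.
CREATE TABLE stock_quotes (
    ticker     text,
    quote_time timestamp,
    price      int,
    PRIMARY KEY (ticker, quote_time)
) WITH CLUSTERING ORDER BY (quote_time DESC);

INSERT INTO stock_quotes (ticker, quote_time, price) VALUES ('xyz', '2014-06-02 09:30:00', 300);
INSERT INTO stock_quotes (ticker, quote_time, price) VALUES ('xyz', '2014-06-02 09:31:00', 305);

-- A lookup by key; rows come back newest first because of the clustering order.
SELECT quote_time, price FROM stock_quotes WHERE ticker = 'xyz' LIMIT 10;
```

Because the sort order is fixed when the table is created, reading the latest quotes for a ticker is a single key lookup rather than a sort performed at query time.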

Let us see a simple Cassandra program in the next section.

Simple Cassandra Program

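A simple Cassandra program is shown below:

-- Create and select a keyspace first (the name demo is only an example).
CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;

CREATE TABLE stocks (ticker text PRIMARY KEY, value int);
INSERT INTO stocks (ticker, value) VALUES ('abc', 200);
INSERT INTO stocks (ticker, value) VALUES ('unc', 400);
INSERT INTO stocks (ticker, value) VALUES ('xyz', 300);
SELECT ticker, value FROM stocks WHERE ticker = 'xyz';

Contents of the stocks table:

ticker | value
abc    | 200
xyz    | 300
unc    | 400

Result of the query:

ticker | value
xyz    | 300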

In this example, you create a table, insert a few records, and fetch data from the table. You can see that the Cassandra syntax is very similar to the standard SQL syntax. The example creates the table called stocks with two columns, inserts three rows into this table, and selects data from the table for a particular key.

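Observe that Cassandra uses the primary key of the table to locate the data. The data is not, however, stored in key order across the whole table: with the default partitioner, each row is placed according to a hash (token) of its partition key, so a full-table scan returns rows in token order rather than alphabetical order. Sorted storage applies to clustering columns within a partition.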
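To see how the partition key is used, you can ask Cassandra for the token it computed for each key. This is a sketch that assumes the stocks table from the program above and a cluster using the default Murmur3Partitioner; the token values depend on the partitioner, so no output is shown here.

```sql
-- token() returns the partitioner's hash of the partition key.
-- A full-table scan returns rows in token order, not alphabetical ticker order.
SELECT token(ticker), ticker, value FROM stocks;

-- By contrast, a search on a column that is neither a key nor indexed is rejected,
-- so it is left commented out here:
-- SELECT ticker, value FROM stocks WHERE value = 300;
```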

Let us look into the Cassandra Command-line Interface in the next section.

Cassandra Command Line Interface

Cassandra provides an interactive command-line shell for CQL, called cqlsh, which works much like a Linux shell.

You can start it by typing cqlsh if it is on your PATH, or by running bin/cqlsh from the Cassandra base directory.

Run cqlsh -h (or cqlsh --help) to list its command-line options and arguments.

By default, cqlsh connects to Cassandra on the local machine. To connect to a remote node, set the CQLSH_HOST environment variable to the address of one of the hosts where Cassandra is running, or pass that address as an argument to cqlsh.

Within cqlsh, you can recall previous commands with the up-arrow key. Most commands have detailed help, which you can view with the HELP command.

You can exit the shell with the EXIT command. Note that each CQL statement must be terminated with a semicolon.

The following image shows how to run cqlsh and also depicts the sample output from cqlsh.
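A short cqlsh session might look like the sketch below. It assumes cqlsh is already running and connected, and it uses a hypothetical keyspace named demo that is assumed to contain the stocks table from the earlier program; substitute your own keyspace name.

```sql
-- Typed at the cqlsh> prompt. The keyspace name "demo" is hypothetical.
HELP
DESCRIBE KEYSPACES;
USE demo;
SELECT ticker, value FROM stocks WHERE ticker = 'xyz';
EXIT
```

HELP and DESCRIBE are cqlsh commands rather than CQL statements: HELP with no argument lists the available help topics, and DESCRIBE KEYSPACES lists the keyspaces in the cluster.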

In the next section, we will learn about the advantages of Cassandra.

Advantages of Cassandra

Cassandra has several advantages for processing big data:

  • It is highly fault tolerant, with no single point of failure. If a node in the cluster fails, other nodes that hold copies of its data continue to serve requests.

  • Every node in the cluster plays the same role, because Cassandra has no masters or slaves. Therefore, no single machine can become a bottleneck in the system.

  • You can add machines to the cluster or remove them at any time without downtime.

  • Cassandra provides very fast data writes, which allows real-time processing of big data.

  • Cassandra outperforms many other NoSQL databases in a number of published performance benchmarks.
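The fault tolerance described above comes from replication: each row is copied to several nodes, and any of those nodes can serve it. Below is a minimal sketch, with a hypothetical keyspace name, assuming a single-data-center cluster of at least three nodes.

```sql
-- Hypothetical keyspace: each row written to it is stored on three different nodes.
-- SimpleStrategy is intended for a single data center; multi-data-center
-- clusters normally use NetworkTopologyStrategy instead.
CREATE KEYSPACE market
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```

With three replicas, a read or write that needs only one or two replicas to respond (consistency level ONE or QUORUM) can still succeed while one node is down. Replication strategies and consistency levels are covered in Lesson 7.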

In the next section, let us explore the limitations of Cassandra.

Limitations of Cassandra

Cassandra is not a general-purpose database because of the following limitations:

  • First, Cassandra 2.0 does not aggregate data with GROUP BY, SUM, MIN, or MAX the way relational databases do. Any aggregates have to be pre-computed and stored. (Later releases added basic aggregate functions in version 2.2 and a restricted form of GROUP BY in version 3.10.)

  • Second, there are no joins between tables, so data has to be denormalized before it is stored in Cassandra.

  • Third, it doesn't support arbitrary search clauses or conditions. Only keys or indexed columns can be used for searching. We will talk more about this restriction later in the course.

  • Lastly, sorting is not supported on non-key fields.
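A common way to work within these limitations is to design one table per query and to maintain aggregates at write time. The sketch below is illustrative only: the table names and the sector column are hypothetical, and it assumes a keyspace has already been selected with USE.

```sql
-- 1. Pre-computed aggregate: a hypothetical counter table that is updated on
--    every trade, instead of counting trades at query time.
CREATE TABLE trade_counts (
    ticker text PRIMARY KEY,
    trades counter
);
UPDATE trade_counts SET trades = trades + 1 WHERE ticker = 'xyz';

-- 2. Denormalized, query-specific table: the sector is stored in every row,
--    so "stocks in a sector" needs no join. Within a sector partition,
--    rows are sorted by the clustering column ticker, so ORDER BY is allowed.
CREATE TABLE stocks_by_sector (
    sector text,
    ticker text,
    value  int,
    PRIMARY KEY (sector, ticker)
);
INSERT INTO stocks_by_sector (sector, ticker, value) VALUES ('tech', 'xyz', 300);
SELECT ticker, value FROM stocks_by_sector WHERE sector = 'tech' ORDER BY ticker DESC;
```

A table with counter columns may contain only counters apart from its primary key, which is why the trade count lives in its own table.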

Let us talk about VMware in the next section.

VMware

In a later lesson, you will learn to install Cassandra on an Ubuntu Linux system. However, if you work on an operating system other than Linux, you can use the software provided by VMware.

This software lets you run one operating system inside another as a virtual machine, using VMware Player. For non-commercial use, VMware Player can be downloaded from the VMware website and used free of charge.

Let us learn about the Simplilearn virtual machine in the next section.

Simplilearn Virtual Machine

Simplilearn has created a virtual machine for VMware Player. This machine, known as Hadoop Pseudo Server, comes with Ubuntu 12.04 LTS and Hadoop preinstalled. It can be opened with VMware Player and used for installing Cassandra. Hadoop Pseudo Server can be downloaded from the given link.

Let us talk about PuTTY in the next section.

PuTTY

PuTTY is a popular, free tool for connecting to Linux systems from Windows through a remote terminal. It avoids some of the inconveniences of working directly in the VM window: for example, you can move the mouse pointer freely, scroll in the window, and copy and paste text.

PuTTY can be downloaded from the given link.

http://winscp.net/download/putty-0.64-installer.exe

Let us talk about WinSCP in the next section.

WinSCP

WinSCP is a popular tool for copying files between Windows and Linux. Its name stands for Windows Secure Copy. You can use it to copy files from your local Windows machine to the Ubuntu VM running in VMware Player.

WinSCP can be downloaded from the given link.

Let us explore the Apache Cassandra Tutorial Overview in the next section.

Overview

The Apache Cassandra training tutorial provides:

  • Details on the fundamentals of big data and NoSQL databases

  • An overview of Cassandra and its features.

  • Knowledge of the architecture and data model of Cassandra

  • An overview of the installation, configuration, and monitoring of Cassandra.

  • Knowledge of the Hadoop ecosystem of products around Cassandra.

In the next section, we will look at the target audience of the Apache Cassandra tutorial.

Target Audience

The key beneficiaries of this Apache Cassandra Tutorial are:

  • Professionals aspiring for a career in NoSQL databases and Cassandra.

  • Analytics professionals, research professionals, IT developers, testers, and project managers

  • Other aspirants and students, who wish to gain a thorough understanding of Apache Cassandra

Prerequisites of Apache Cassandra Tutorial

Fundamental knowledge of any programming language is a prerequisite for this tutorial. Participants are expected to have a basic understanding of databases, SQL, and database query languages. Working knowledge of Linux or Unix-based systems is an added advantage, although it is not mandatory.

Let us explore the value of Apache Cassandra to the professionals in the next section.

Value of Apache Cassandra to Professionals

The Apache Cassandra tutorial is best-suited for professionals who want to:

  • Demonstrate their expertise in the fast-growing big data industry

  • Reap the benefits of the growing demand for NoSQL databases

  • Benefit from the shortage of Cassandra-trained professionals

  • Take their organization towards big data analytics using Cassandra

  • Gain experience with tools used to process huge amounts of data

  • Be at the forefront of big data technology, which is expected to remain in demand for the next ten years

Let us take a look at the lessons covered in the Apache Cassandra Tutorial.

Lessons Covered

There are eight core lessons in this Apache Cassandra tutorial in addition to the current lesson, 'Course Overview.' They are listed below.

Lesson 1: Overview of Big Data and NoSQL Database

In this chapter, you’ll be able to:

  • Describe the 3 Vs of big data

  • Discuss some use cases of big data

  • Explain Apache Hadoop and the concept of NoSQL

  • Describe various types of NoSQL databases

Lesson 2: Introduction to Cassandra

In this chapter, you’ll be able to:

  • Describe Cassandra and its features

  • Describe when Cassandra is used

  • Demonstrate how to work with the Command Line Interface of Cassandra

  • List the advantages and limitations of Cassandra

  • Demonstrate how to install VMware Player

Lesson 3: Cassandra Architecture

In this chapter, you’ll be able to:

  • Describe the Cassandra architecture, components of Cassandra, and the effects of Cassandra architecture.

  • Explain the partitioning of data in Cassandra, Cassandra topology, and various failure scenarios handled by Cassandra.

Lesson 4: Cassandra Installation and Configuration

In this chapter, you’ll be able to:

  • State the various versions of Cassandra

  • Explain the steps to install and configure Cassandra on the Ubuntu system

  • List the steps to install Cassandra on CentOS

Lesson 5: Cassandra Data Model

In this chapter, you’ll be able to:

  • Describe Cassandra data model and the components of Cassandra data model

  • Explain the functions of DDL and DML statements and discuss the SELECT statement restrictions in Cassandra

Lesson 6: Cassandra Interfaces

In this chapter, you’ll be able to:

  • List the various interfaces to Cassandra

  • Describe the command line interface in Cassandra

  • Describe options and commands in the Cassandra command line interface.

  • Explain the Java interface to Cassandra

  • Write a Java program to connect to Cassandra

  • Explain the steps to compile and run a Java Program for Cassandra.

Lesson 7: Cassandra Advanced Architecture

In this chapter, you’ll be able to:

  • Explain partitioning

  • Describe replication strategies and consistency levels in Cassandra

  • Explain time to live and tombstones

  • Demonstrate the use of the nodetool utility and the installation and configuration of the OpsCenter utility

Lesson 8: Apache Ecosystem around Cassandra

In this chapter, you’ll be able to:

  • Describe Apache Storm

  • Explain Apache Kafka, and discuss the real-time analytics platform tools.

  • Describe Apache Spark.

  • Discuss Spark and Scala.

Summary

Let us summarize the topics covered in this lesson.

  • Cassandra is a key-value NoSQL data store.

  • Cassandra is a highly fault-tolerant database.

  • Cassandra originated at Facebook, was released as open source in 2008, and became a top-level Apache project in 2010.

  • Cassandra supports tables, columns, and a simple SQL-like query language, CQL.

  • Cassandra provides fast reads and writes, thus supporting real-time data processing.

  • Cassandra does not support joins, and version 2.0 does not support aggregates.

  • If you need to work on an operating system other than Linux, you can use the software provided by VMware.

  • PuTTY is a popular, free tool for connecting to Linux systems from Windows through a remote terminal.

  • WinSCP is a popular tool for copying files between Windows and Linux.

Conclusion

With this, we conclude the Apache Cassandra tutorial overview. The next lesson provides an overview of big data and NoSQL.
