Step by Step: CentOS Installation for Cassandra/DSE Node
The purpose of this blog is to provide a step-by-step guide to installing CentOS in preparation for a Cassandra installation. By following this blog you will have a clean node that is ready for a Cassandra (DSE) install.
Note: We will be using a VirtualBox image as the baseline for this blog and for example purposes only. It is not recommended to run Cassandra or DSE on VirtualBox in production. Please note the bare metal installation steps are fairly similar and we will call out the differences along the way.
Note: For help in selecting the right hardware for Cassandra, please visit the DataStax documentation here: Architecture Planning
Step 1: Prepare Your Installation Media
We used the x86_64 LiveCD ISO version of CentOS when we installed on our VirtualBox image.
When doing a bare metal install, you will need to create installation media. We created a USB key from the LiveCD version of the install. If you need help creating a bootable USB Key, just ask Uncle Google and you will receive a lot of guidance. Also, be sure to change any BIOS settings necessary to ensure your machine will see the USB Key when booting.
Step 2: Start the Installation
Once your bootable media is created and you have configured your server to see the bootable media, it is time to start the install. I created a new VirtualBox image with 4 GB of RAM and 8 GB of disk space for the purposes of this blog. I started the machine and selected the start-up disk.
Hit Esc when CentOS starts loading. This will bring you to the boot menu. Select Install to install with the GUI, which is what we are going to use for this blog. If you're up for it, go with Text mode :)
Now hit enter and let's get the install started!
Step 3: Configure the Basics
On the first screen click Next.
Then, choose your language (we chose English) and click Next.
Then, select your keyboard (we chose U.S. English) and click Next.
Then, select Basic Storage Devices, click Next.
- Note: Do not install or use any type of NAS to support Cassandra. Cassandra is a distributed system. One of the main benefits of Cassandra is that it doesn't have a single point of failure. Leveraging any type of NAS would create a single point of failure for the system, which is undesirable. You may ask, what about all the Amazon installs of Cassandra? When installing on Amazon, we leverage ephemeral storage, not shared storage.
Select Yes, wipe my data, click Next.
Then, give your machine a hostname, click Next.
Select your timezone, click Next.
Select the root password, click Next.
Step 4: Configure Storage
Okay, here's where we did some very specific things for Cassandra.
We chose to install without a swap partition. Doing this means if the node runs out of memory, it will "blow up". We are disabling swap for the following reasons:
1) Cassandra is a distributed system. If a node goes down, that's okay, the system will continue operating and we will get an opportunity to know why, i.e. an opportunity to learn!
2) Masking a memory issue with swapping is undesirable with Cassandra as it will cause disk performance issues. We would much rather find out we have exhausted memory resources and take corrective action. Again, Cassandra is a distributed system. In fact, we call it "anti-fragile", thanks Mr. Taleb for that concept. Cassandra installations, and teams, tend to get better when node failures are observed and corrected.
3) Cassandra is a database system. Like any database system, we want to explicitly control as much disk I/O as possible. We do not want undesired disk I/O from memory swapping.
4) Our colleague, and fellow Bald-Jonathan, Jonathan Shook said it was a good idea and that guy's awesome.
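Once the node is installed, it is worth confirming that swap really is off. Here is a minimal sketch of how we might check it. The function name swap_status and its optional file argument are our own illustration, not part of any CentOS tool. It relies on /proc/swaps listing one header line followed by one line per active swap area.

```shell
# Report whether any swap area is active.
# /proc/swaps starts with a header line; every further line is an active swap area.
# The optional first argument (a hypothetical override) lets you point the check
# at a copy of the file instead of the live /proc/swaps.
swap_status() {
    swaps_file="${1:-/proc/swaps}"
    # Count the non-empty lines after the header.
    active=$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$swaps_file")
    if [ "$active" -eq 0 ]; then
        echo "swap: disabled"
    else
        echo "swap: $active active area(s)"
        return 1
    fi
}
```

Running swap_status with no arguments on the finished node should report that swap is disabled. If it does not, swapoff -a turns swap off for the current boot, and the swap entry in /etc/fstab is what brings it back after a reboot.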
On the following menu select Create Custom Layout, click Next
Delete all partitions that exist on your hard drives. This assumes you have nothing important on the hard drives. If you have anything of importance on your system, then stop the installation and move your data. Once all partitions are deleted, you should see something like the following.
Each physical drive should be empty and should only show "Free" under the physical drive. Because we are using a virtual machine for this exercise, we only see sda.
Note: You should have at least 2 separate drives in your machine when installing Cassandra, preferably more. Try to use SSDs. Here's a good link to physical disk recommendations for Cassandra: Hardware Recommendations (SSDs)
Now create a set of standard partitions:
1) Boot partition for the OS
2) Cassandra partition
-- If you have only 2 disks and are not using SSDs, it is recommended to place the OS and data together on one disk, thus isolating the commit log on the other.
-- If you are utilizing SSDs, then you can place the commit log and data on the same partition.
-- We aren't going to discuss the differences of using RAID or JBOD in this blog. That will be a topic for another day.
Here's a screenshot of what you will see when you click the Create button. Select Standard Partition.
Here's a screenshot of the partition information. Your information will differ. It is recommended to use xfs for the file system type if you are able.
Here's a screenshot of the Cassandra partition. We are using SSDs, so we only have one partition.
Finally, here's what our disk layout looks like once we are ready for the install.
Note that in a real, non-virtual install we would have the /boot partition on a separate disk from the /cassandra partition.
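After the install has finished and the node has rebooted, the layout can be double-checked from a shell. The following is a small sketch, assuming the data partition is mounted at /cassandra as described above. The helper fs_type_of is our own hypothetical name. It reads the mounts table, where each line holds the device, the mount point and the filesystem type.

```shell
# Print the filesystem type of a mount point.
# The mounts table has the fields: device, mount point, type, options, ...
# The optional second argument (hypothetical) selects a different mounts file.
fs_type_of() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    # Keep the last matching entry, which is the mount currently in effect.
    awk -v mp="$mount_point" '$2 == mp { t = $3 } END { if (t != "") print t }' "$mounts_file"
}
```

On a node partitioned as above, fs_type_of /cassandra should print xfs if you chose xfs in the installer. No output means nothing is mounted at that path.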
Click Next.
When prompted stating you have not specified a swap partition, select Yes.
Next, select Format on the format prompt and then click Continue.
Now, select Next when prompted about installing a boot loader.
You are now installing CentOS!
Once the installer completes you will see the following message.
Give yourself a high five, then reboot. Walk through the last few steps.
Note: In Production installations, be sure to use NTP to keep the Cassandra node clocks synchronized.
Step 5: Configure the OS for Cassandra
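For reference, here is a rough sketch of what turning on NTP could look like on this release of CentOS, using the same service and chkconfig tools that appear later in this blog. The run and setup_ntp helpers are our own illustration. By default they only print the commands, so nothing changes until you set DRY_RUN=0 and run as root.

```shell
# Dry-run helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install ntpd, enable it on boot, start it, and list its peers.
setup_ntp() {
    run yum install -y ntp      # install the NTP daemon and tools
    run chkconfig ntpd on       # start ntpd on every boot
    run service ntpd start      # start ntpd now
    run ntpq -p                 # show the time sources ntpd is using
}
```

Calling setup_ntp prints the four commands. Calling DRY_RUN=0 setup_ntp executes them. Which time servers to list in /etc/ntp.conf depends on your environment, so we leave that part out.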
In this step we are going to do a few items to get the OS ready for the Cassandra install:
1) update the OS
2) enable SSH so we can use a terminal. BTW: have you used Cluster SSH (clusterssh)? It makes working with clusters a lot easier. Check it out here:
http://sourceforge.net/projects/clusterssh/ Thanks to Johnny Miller for making us aware of this tool!
3) disable unwanted "stuff", including the UI if you installed it
-- if needed, disable the firewall and SELinux.
- we will not cover this topic. It's up to you to decide if this is a good idea for your environment.
4) since we are using SSDs, we need to tweak a few settings to ensure the system can use SSDs properly
5) install the right version of Java
Note: We are using root to install and interact with the system. Using root as a user is never recommended in a Production environment.
Update the OS:
1) run yum update
Enable SSH:
1) To enable sshd one time run the following: /sbin/service sshd start
2) To enable sshd on startup run the following: chkconfig sshd on
Disable Unwanted Stuff
1) We disabled the UI by opening the following file in our favorite editor: /etc/inittab
-- now change the 5 to a 3 in the initdefault line, so that id:5:initdefault: becomes id:3:initdefault:
-- now reboot
2) We turned off a lot of unwanted services by following these guides:
3) Because we are running a demo machine, we disabled SELinux and the firewall
-- to disable SELinux, open the following file in your favorite editor and set the SELINUX line to disabled (for example, change SELINUX=enforcing to SELINUX=disabled): /etc/sysconfig/selinux
-- to disable the Firewall run the following:
# service iptables save
# service iptables stop
# chkconfig iptables off
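If you would rather script the two file edits above than open an editor, something like the following sketch could work. The function names are our own. One assumption to call out: on CentOS, /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, and sed -i replaces a symlink with a regular file, so the sketch edits /etc/selinux/config by default. Both functions keep a .bak copy of the original file.

```shell
# Switch the default runlevel from 5 (graphical) to 3 (multi-user, text only).
set_text_runlevel() {
    inittab="${1:-/etc/inittab}"
    sed -i.bak 's/^id:5:initdefault:/id:3:initdefault:/' "$inittab"
}

# Set SELINUX=disabled regardless of the current mode.
# The pattern requires "SELINUX=" at the start of the line, so comment lines
# and the SELINUXTYPE= setting are left untouched.
disable_selinux() {
    selinux_conf="${1:-/etc/selinux/config}"   # assumed target of /etc/sysconfig/selinux
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$selinux_conf"
}
```

Run set_text_runlevel and disable_selinux as root, then reboot so both changes take effect.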
Enable SSDs on the OS
We want the OS to understand how to interact with SSDs. Most OSes are configured for HDDs. The following three commands will help the OS understand how to leverage SSDs to their fullest by tweaking the I/O scheduler and two disk I/O specific settings.
echo noop > /sys/block/sda/queue/scheduler
echo 0 > /sys/block/sda/queue/read_ahead_kb
echo 0 > /sys/block/sda/queue/rotational
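Keep in mind that values written under /sys are reset at every reboot, and that a real node usually has more than one disk. Here is a small sketch that wraps the same three settings in a function so they can be applied per device. The name tune_ssd and its optional sysfs root argument are our own, added only so the function can be tried against a scratch directory.

```shell
# Apply the three SSD queue settings from above to one block device.
# $1 is the device name (for example sda).
# $2 is a hypothetical override for the sysfs root and defaults to /sys.
tune_ssd() {
    dev="$1"
    sysfs_root="${2:-/sys}"
    queue="$sysfs_root/block/$dev/queue"
    echo noop > "$queue/scheduler"       # use the noop I/O scheduler
    echo 0 > "$queue/read_ahead_kb"      # turn off read-ahead
    echo 0 > "$queue/rotational"         # mark the device as non-rotational
}
```

For example, for dev in sda sdb; do tune_ssd "$dev"; done applies the settings to two disks. To keep them across reboots, one common approach is to add the same lines to /etc/rc.local.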
Install JAVA
Here is a good Cassandra guide to installing the right version of Java. Please note that OpenJDK is not a good choice for Cassandra.
Cassandra 1.2 JAVA Installation
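To see which Java a node is actually running, you can look at the output of java -version. The sketch below classifies that output. The helper java_vendor is our own hypothetical name, and it only looks for the "OpenJDK" and "HotSpot" strings that the two runtimes print in their version banner.

```shell
# Classify the text printed by `java -version`.
# Pass the captured output as the first argument.
java_vendor() {
    case "$1" in
        *OpenJDK*) echo "openjdk" ;;
        *HotSpot*) echo "oracle-hotspot" ;;
        *)         echo "unknown" ;;
    esac
}
```

Note that java -version writes to stderr, so capture it with java_vendor "$(java -version 2>&1)". If the answer is openjdk, follow the guide above before installing Cassandra.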
Step 6: Install Cassandra
Well, at this point you are ready to install Cassandra/DSE. The documentation for installing Cassandra and DSE is pretty straightforward.
We will follow up with another blog to show you how to create a demo environment that contains a few individual nodes hosted on a single machine.
Good luck and feel free to reach out with questions or feedback.