Increase EBS volume capacity in a running EMR cluster

4 minute read
Content level: Advanced

This article offers instructions on how to configure additional Elastic Block Store (EBS) volumes for HDFS or YARN to increase the storage capacity of a running Amazon EMR cluster.

This process is particularly useful when a core node is marked unhealthy, and eventually terminated, because it exceeds the disk usage threshold. By adding more EBS volumes, you can alleviate the shortage of storage space on the core node and keep your EMR cluster stable and operational.

To increase the storage capacity of your HDFS cluster on Amazon EMR, you can add an additional EBS volume to your core node with the following steps:

1. Create a new EBS volume:

  • Access the EC2 console in the AWS Management Console.
  • Navigate to the "Volumes" section under the "Elastic Block Store" service.
  • Click "Create Volume" and specify the desired volume type (e.g., General Purpose SSD), the size in GiB, and the same Availability Zone as your core node (an EBS volume can only be attached to an instance in its own Availability Zone).
  • Review the settings and click "Create Volume" to provision the new EBS volume.
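If you prefer the AWS CLI, step 1 can be sketched as follows. The Availability Zone and size below are placeholder values, and gp3 is just one reasonable volume type choice:

```shell
# Placeholder values -- substitute your core node's Availability Zone
# and the capacity you need.
VOLUME_AZ="us-east-1a"
SIZE_GIB=100

# Create a General Purpose SSD (gp3) volume and capture its ID for the
# attach step.
VOLUME_ID=$(aws ec2 create-volume \
  --volume-type gp3 \
  --size "$SIZE_GIB" \
  --availability-zone "$VOLUME_AZ" \
  --query 'VolumeId' \
  --output text)
echo "Created volume: $VOLUME_ID"
```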

2. Attach the new EBS volume to your core node:

  • In the EC2 console, return to the "Volumes" section and select the newly created EBS volume.
  • From the "Actions" menu, choose "Attach Volume."
  • In the "Attach Volume" dialog, select your core node instance.
  • Assign a device name for the volume (e.g., /dev/xvdf) and click "Attach."
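The attach step also has a CLI equivalent. The volume ID and instance ID below are placeholders for the values from your own account:

```shell
# Placeholder IDs -- use the volume created in step 1 and your core
# node's instance ID.
VOLUME_ID="vol-0123456789abcdef0"
INSTANCE_ID="i-0123456789abcdef0"
DEVICE="/dev/xvdf"

# Wait until the new volume is available, then attach it to the core node.
aws ec2 wait volume-available --volume-ids "$VOLUME_ID"
aws ec2 attach-volume \
  --volume-id "$VOLUME_ID" \
  --instance-id "$INSTANCE_ID" \
  --device "$DEVICE"
```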

3. Connect to your core node via SSH using the appropriate method (e.g., EC2 Instance Connect, SSH client with a key pair).

4. Prepare the new volume for use with HDFS:

  • Run lsblk to verify the new volume and its device name.
  • Use sudo fdisk /dev/xvdf to create a new partition on the volume (e.g., /dev/xvdf1). Follow the prompts, and when asked for the partition type, select the default option.
  • Create an XFS file system on the new partition: sudo mkfs.xfs -f /dev/xvdf1.
  • Create a new mount point directory: sudo mkdir /mnt1.
  • Mount the new partition to the mount point: sudo mount -t xfs /dev/xvdf1 /mnt1.
  • Create a dedicated directory for HDFS: sudo mkdir /mnt1/hdfs.
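The commands in step 4 can be collected into one script. This is a sketch that assumes the new volume appeared as /dev/xvdf (verify with lsblk first); the fdisk answers simply accept the defaults for a single Linux partition:

```shell
DEVICE="/dev/xvdf"
PARTITION="${DEVICE}1"
MOUNT_POINT="/mnt1"

# Create one partition spanning the whole disk, accepting fdisk defaults:
# n (new), p (primary), 1 (partition number), default first/last sectors,
# w (write changes).
printf 'n\np\n1\n\n\nw\n' | sudo fdisk "$DEVICE"

# Build an XFS file system on the new partition and mount it.
sudo mkfs.xfs -f "$PARTITION"
sudo mkdir -p "$MOUNT_POINT"
sudo mount -t xfs "$PARTITION" "$MOUNT_POINT"

# Dedicated HDFS directory on the new mount (ownership is set in step 6).
sudo mkdir -p "$MOUNT_POINT/hdfs"
```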

5. Update the HDFS configuration:

  • Open the hdfs-site.xml file (e.g., sudo nano /etc/hadoop/conf/hdfs-site.xml) on your core node.
  • Locate the dfs.datanode.data.dir property (named dfs.data.dir in older Hadoop versions).
  • Append the new path /mnt1/hdfs to the existing values, separating multiple paths with a comma (e.g., <value>file:///mnt/hdfs,file:///mnt1/hdfs</value>).
  • Save and close the file.
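The resulting entry in hdfs-site.xml might look like the snippet below. The existing file:///mnt/hdfs path is an assumption based on a typical EMR layout; confirm the current value on your cluster before editing:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///mnt/hdfs,file:///mnt1/hdfs</value>
</property>
```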

6. Set the correct ownership for the new HDFS directory:

  • Run sudo chown hdfs:hdfs /mnt1/hdfs to grant ownership to the HDFS user and group.

7. Restart the HDFS DataNode service:

  • Stop the DataNode: sudo systemctl stop hadoop-hdfs-datanode.
  • Start the DataNode: sudo systemctl start hadoop-hdfs-datanode.

8. Verify the increased capacity:

  • Run hdfs dfsadmin -report or hdfs dfs -df -h to see the updated storage capacity, which should now include the new volume.

9. If you're also using YARN, create a local directory for it (sudo mkdir /mnt1/yarn followed by sudo chown yarn:yarn /mnt1/yarn), append /mnt1/yarn to the yarn.nodemanager.local-dirs property in yarn-site.xml, and restart the YARN NodeManager service using sudo systemctl restart hadoop-yarn-nodemanager.
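For the YARN change, the updated property might look like the snippet below. The existing /mnt/yarn path is an assumption based on a typical EMR layout; check your cluster's current value first:

```xml
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/yarn,/mnt1/yarn</value>
</property>
```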

By following these steps, you can effectively increase the storage capacity of your HDFS cluster on Amazon EMR by adding an additional EBS volume to your core node. This process can help alleviate issues related to running out of disk space, ensuring the smooth operation of your big data workloads.

Additionally, to dynamically resize the storage on your cluster nodes, you can use the bootstrap action (BA) script provided in the AWS documentation. Be sure to test the BA script in a lower environment and confirm that it meets your requirements. Also note that, by default, the script does not scale EBS volumes back down when the extra capacity is no longer needed, so you can customize it further for your use case.
