I have a large number of files to copy. I want to copy the files in parallel on an Amazon Elastic File System (Amazon EFS) file system for my Amazon Elastic Compute Cloud (Amazon EC2) instance.
Resolution
Use one of the following tools to run jobs in parallel on an Amazon EFS file system:
GNU parallel
Complete the following steps:
-
To install GNU parallel, run the following commands for the OS that you use.
Amazon Linux and Red Hat Enterprise Linux (RHEL) 6:
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
sudo yum install parallel nload -y
RHEL 7:
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install parallel nload -y
Amazon Linux 2:
sudo amazon-linux-extras install epel
sudo yum install nload sysstat parallel -y
Amazon Linux 2023 (AL2023):
There's no Extra Packages for Enterprise Linux (EPEL) or EPEL-type repository for AL2023. Instead, install packages from the GitHub repository pages for the individual tools. See nload, sysstat, and gnu-parallel on the GitHub website.
Ubuntu:
sudo apt-get install parallel nload
-
Run one of the following commands to copy the files to Amazon EFS.
With rsync:
sudo time find -L /src -type f | parallel rsync -avR {} /dst
-or-
Without rsync:
sudo time find /src -type f | parallel -j 32 cp {} /dst
-
Run the following command to monitor network traffic and bandwidth on the nload application console:
sudo nload -u M
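After the copy finishes, a quick sanity check is to compare the number of regular files under the source and destination directories. The following is a minimal sketch, not part of GNU parallel: the function names count_files and compare_counts are hypothetical helpers, and the check assumes that file names don't contain newline characters.

```shell
# Hypothetical helper: count regular files under a directory,
# following symbolic links in the same way as "find -L" in the copy step.
count_files() {
  find -L "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: print both counts and return success only when they match.
compare_counts() {
  src_count=$(count_files "$1")
  dst_count=$(count_files "$2")
  echo "source: ${src_count} files, destination: ${dst_count} files"
  [ "$src_count" -eq "$dst_count" ]
}
```

For example, run compare_counts /src /dst after the copy completes. Matching counts don't prove that the file contents are identical. Also, because the cp variant writes every file directly into /dst, files in different subdirectories that share a name overwrite each other, and the counts can differ.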
msrsync
Note: msrsync is a Python script. To run it, use Python version 2.7.14 or later.
Complete the following steps:
-
Run the following command to install msrsync:
sudo curl -s https://raw.githubusercontent.com/jbd/msrsync/master/msrsync -o /usr/local/bin/msrsync && sudo chmod +x /usr/local/bin/msrsync
-
To specify the number of rsync processes that you want to run in parallel, run the following command. The -p option sets the number of processes, and the -P option shows the progress of each job:
sudo time /usr/local/bin/msrsync -P -p X --stats --rsync "-artuv" /src/ /dst/
Note: Replace X with the number of rsync processes.
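Because msrsync needs a recent enough Python interpreter, it can help to check the installed version before you start a long copy. The following is a minimal sketch under two assumptions: that the script runs under whichever interpreter the name python resolves to, and that sort supports the -V (version sort) option, as GNU coreutils does. The function names version_at_least and python_version are hypothetical.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the version of the given interpreter (default: python).
# Python 2 prints its version to stderr, so both streams are captured.
python_version() {
  "${1:-python}" --version 2>&1 | awk '{print $2}'
}
```

For example, version_at_least "$(python_version)" 2.7.14 && echo "Python version is sufficient". If your system provides only python3, pass that name to python_version instead.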
fpsync
Complete the following steps:
-
Activate the EPEL repository.
-
To install the fpart package, run the following commands for the OS that you use.
RHEL 6:
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
sudo yum install fpart -y
RHEL 7:
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install fpart -y
Amazon Linux 2:
sudo amazon-linux-extras install epel
sudo yum install fpart -y
Amazon Linux 2023:
There's no EPEL or EPEL-type repository for AL2023. Instead, install fpart from its GitHub repository page. See fpart on the GitHub website.
Ubuntu:
sudo apt-get install fpart
Note: In Ubuntu, fpsync is part of the fpart package.
-
Run the following command to synchronize the contents of the /src directory to the /dst directory:
sudo fpsync -n X /src /dst
Note: Replace X with the number of concurrent sync jobs that you want fpsync to run in parallel.
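When you script the fpsync step, it can be useful to validate the value of X and the source path before the command runs. The following is a minimal sketch: build_fpsync_cmd is a hypothetical wrapper, not part of the fpart package. It only prints the command line for review and doesn't quote paths, so it assumes paths without spaces.

```shell
# Hypothetical wrapper: validate the inputs, then print the fpsync command line.
# $1 = number of parallel jobs (X), $2 = source directory, $3 = destination directory.
build_fpsync_cmd() {
  case "$1" in
    ''|*[!0-9]*|0)
      echo "jobs must be a positive integer" >&2
      return 1
      ;;
  esac
  if [ ! -d "$2" ]; then
    echo "source directory not found: $2" >&2
    return 1
  fi
  printf 'sudo fpsync -n %s %s %s\n' "$1" "$2" "$3"
}
```

For example, build_fpsync_cmd 8 /src /dst prints the command to run when /src exists, and returns a nonzero status when the job count or the source directory is invalid.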