How do I resolve the "Error Code: 503 Slow Down" error on S3DistCp jobs in Amazon EMR?


I want to resolve the "Error Code: 503 Slow Down" error that causes my S3DistCp jobs to fail in Amazon EMR.

Resolution

When your data uses Apache Hive-style partitions and you exceed the Amazon Simple Storage Service (Amazon S3) request rate, you receive the following error:

"mapreduce.Job: Task Id : attempt_######_0012_r_000203_0, Status : FAILED Error: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: D27E827C847A8304; S3 Extended Request ID: XWxtDsEZ40GLEoRnSIV6+HYNP2nZiG4MQddtNDR6GMRzlBmOZQ/LXlO5zojLQiy3r9aimZEvXzo=), S3 Extended Request ID: XWxtDsEZ40GLEoRnSIV6+HYNP2nZiG4MQddtNDR6GMRzlBmOZQ/LXlO5zojLQiy3r9aimZEvXzo= at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)"

For example, the following Amazon S3 paths use the same prefix, /year=2019/:

  • s3://awsexamplebucket/year=2019/month=11/day=01/mydata.parquet
  • s3://awsexamplebucket/year=2019/month=11/day=02/mydata.parquet
  • s3://awsexamplebucket/year=2019/month=11/day=03/mydata.parquet

Amazon S3 supports 3,500 PUT/COPY/POST/DELETE (write) requests or 5,500 GET/HEAD (read) requests per second for each prefix. All operations within a prefix count toward these quotas.
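
Hive-style partition columns become part of the Amazon S3 object keys, which is how a job ends up concentrating requests under one prefix. As an illustration only, a Spark write such as the following produces exactly this layout (the DataFrame df and its year, month, and day columns are placeholder assumptions, not part of the original job):

// Illustrative sketch: a Hive-style partitioned write.
// Objects land under prefixes such as s3://awsexamplebucket/year=2019/month=11/day=01/.
df.write.partitionBy("year", "month", "day").parquet("s3://awsexamplebucket/")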

If you can't increase the number of partitions, then use one of the following options to resolve throttling errors on S3DistCp jobs.

Decrease the number of reduce tasks

The mapreduce.job.reduces parameter sets the number of reduce tasks for the job. Amazon EMR automatically sets mapreduce.job.reduces based on the number of nodes in the cluster and the cluster's memory resources. Fewer reduce tasks mean fewer tasks writing to the destination at the same time, which lowers the request rate against the destination prefix.

Run the following command to check the default number of reduce tasks for jobs in your cluster:

hdfs getconf -confKey mapreduce.job.reduces

To set a new value for mapreduce.job.reduces, run the following command:

s3-dist-cp -Dmapreduce.job.reduces=10 --src s3://awsexamplebucket/data/ --dest s3://awsexamplebucket2/output/

Note: Replace 10 with the number that you want to set for reduce tasks. Also, replace the example values with your values.
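
The same principle applies to Spark jobs that write to the same destination prefix: fewer output tasks mean fewer simultaneous Amazon S3 write requests. The following is a minimal sketch rather than an S3DistCp step; the DataFrame df and the value 10 are placeholder assumptions:

// Reduce the number of tasks that write to Amazon S3 at the same time.
// 10 is an example value; tune it for your workload.
df.coalesce(10).write.parquet("s3://awsexamplebucket2/output/")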

Increase the EMRFS retry quota

By default, the EMRFS retry quota (fs.s3.maxRetries) is set to 4.

Run the following command to check the retry quota for your cluster:

hdfs getconf -confKey fs.s3.maxRetries

To increase the retry quota for an s3-dist-cp job, run the following command:

s3-dist-cp -Dfs.s3.maxRetries=20 --src s3://awsexamplebucket/data/ --dest s3://awsexamplebucket2/output/

Note: Replace 20 with the quota that you want to set for retries. Also, replace the example values with your values.

To increase the retry quota on a new cluster, add a configuration object when you launch a cluster. To increase the retry quota on a running cluster, use the configuration object to override the cluster configuration for the instance group.

Example configuration object for Amazon EMR versions 5.21.0 and later:

[
    {
        "Classification": "emrfs-site",
        "Properties": {
            "fs.s3.maxRetries": "20"
        }
    }
]
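
One way to supply this classification when you launch a cluster is with the AWS CLI --configurations option. The following sketch assumes that the JSON above is saved locally as emrfs-config.json; the cluster name, release label, and instance settings are placeholder values:

aws emr create-cluster \
  --name "emr-cluster-with-higher-retries" \
  --release-label emr-5.36.0 \
  --applications Name=Hadoop \
  --use-default-roles \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --configurations file://emrfs-config.json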

When you increase the retry quota for the cluster, Apache Spark and Hive applications can also use the new quota. The following example Spark shell session sets the higher retry quota for the session and then reads from and writes to Amazon S3:

spark> sc.hadoopConfiguration.set("fs.s3.maxRetries", "20")
spark> val source_df = spark.read.csv("s3://awsexamplebucket/data/")  
spark> source_df.write.save("s3://awsexamplebucket2/output/")

Related information

Best practices design patterns: optimizing Amazon S3 performance

Why does my Spark or Hive job on Amazon EMR fail with an HTTP 503 "Slow Down" AmazonS3Exception?
