How do I troubleshoot "Error Code: 503 Slow Down" on s3-dist-cp jobs in Amazon EMR?
My S3DistCp (s3-dist-cp) job on Amazon EMR fails because of Amazon Simple Storage Service (Amazon S3) throttling. I get an error message similar to the following:

mapreduce.Job: Task Id : attempt_xxxxxx_0012_r_000203_0, Status : FAILED
Error: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: D27E827C847A8304; S3 Extended Request ID: XWxtDsEZ40GLEoRnSIV6+HYNP2nZiG4MQddtNDR6GMRzlBmOZQ/LXlO5zojLQiy3r9aimZEvXzo=), S3 Extended Request ID: XWxtDsEZ40GLEoRnSIV6+HYNP2nZiG4MQddtNDR6GMRzlBmOZQ/LXlO5zojLQiy3r9aimZEvXzo=
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
Short description
"Slow Down" errors occur when you exceed the Amazon S3 request rate (3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket). This often happens when your data uses Apache Hive-style partitions. For example, the following Amazon S3 paths use the same prefix (/year=2019/). This means that the request limit is 3,500 write requests or 5,500 read requests per second.
- s3://awsexamplebucket/year=2019/month=11/day=01/mydata.parquet
- s3://awsexamplebucket/year=2019/month=11/day=02/mydata.parquet
- s3://awsexamplebucket/year=2019/month=11/day=03/mydata.parquet
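By contrast, objects whose keys begin with different prefixes each get their own request-rate allowance. As an illustration only (the hash values a1, b2, and c3 are hypothetical and not part of the dataset above), a layout like the following spreads requests across three separate prefixes:
- s3://awsexamplebucket/a1/year=2019/month=11/day=01/mydata.parquet
- s3://awsexamplebucket/b2/year=2019/month=11/day=02/mydata.parquet
- s3://awsexamplebucket/c3/year=2019/month=11/day=03/mydata.parquet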
If increasing the number of partitions isn't an option, reduce the number of reducer tasks or increase the EMR File System (EMRFS) retry limit to resolve Amazon S3 throttling errors.
Resolution
Use one of the following options to resolve throttling errors on s3-dist-cp jobs.
Reduce the number of reduce tasks
The mapreduce.job.reduces parameter sets the number of reduce tasks for the job. Amazon EMR automatically sets mapreduce.job.reduces based on the number of nodes in the cluster and the cluster's memory resources. Fewer reduce tasks write to Amazon S3 at the same time, which lowers the request rate against each prefix. Run the following command to confirm the default number of reduce tasks for jobs in your cluster:
$ hdfs getconf -confKey mapreduce.job.reduces
To set a new value for mapreduce.job.reduces, run a command similar to the following. This command sets the number of reduce tasks to 10.
$ s3-dist-cp -Dmapreduce.job.reduces=10 --src s3://awsexamplebucket/data/ --dest s3://awsexamplebucket2/output/
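If you run s3-dist-cp as an EMR step rather than from the primary node, you can pass the same property through command-runner.jar. The following AWS CLI command is a sketch; the cluster ID j-XXXXXXXXXXXXX and step name are placeholders:
$ aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps 'Type=CUSTOM_JAR,Name=S3DistCpStep,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[s3-dist-cp,-Dmapreduce.job.reduces=10,--src,s3://awsexamplebucket/data/,--dest,s3://awsexamplebucket2/output/]'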
Increase the EMRFS retry limit
By default, the EMRFS retry limit is set to 4. Run the following command to confirm the retry limit for your cluster:
$ hdfs getconf -confKey fs.s3.maxRetries
To increase the retry limit for a single s3-dist-cp job, run a command similar to the following. This command sets the retry limit to 20.
$ s3-dist-cp -Dfs.s3.maxRetries=20 --src s3://awsexamplebucket/data/ --dest s3://awsexamplebucket2/output/
To increase the retry limit on a new or running cluster:
- New cluster: Add a configuration object similar to the following when you launch a cluster.
- Running cluster: Use the following configuration object to override the cluster configuration for the instance group (Amazon EMR release versions 5.21.0 and later).
[ { "Classification": "emrfs-site", "Properties": { "fs.s3.maxRetries": "20" } } ]
When you increase the retry limit for the cluster, Spark and Hive applications can also use the new limit. Here's an example of a Spark shell session that uses the higher retry limit:
spark> sc.hadoopConfiguration.set("fs.s3.maxRetries", "20")
spark> val source_df = spark.read.csv("s3://awsexamplebucket/data/")
spark> source_df.write.save("s3://awsexamplebucket2/output/")
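Hive sessions can set the same property before running a query. As a sketch, assuming a hypothetical table named mytable, you could run the following from the shell:
$ hive -e "SET fs.s3.maxRetries=20; INSERT OVERWRITE DIRECTORY 's3://awsexamplebucket2/output/' SELECT * FROM mytable;"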
Related information
Best practices design patterns: optimizing Amazon S3 performance
Why does my Spark or Hive job on Amazon EMR fail with an HTTP 503 "Slow Down" AmazonS3Exception?
