This error indicates an issue with the S3 request parameter "max-keys" when your PySpark job is trying to list objects in the S3 bucket. The error message states that "max-keys must be between 1 and 1000," suggesting there's a configuration problem with how your Spark application is interacting with S3.
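For context, each directory listing that the S3A connector performs maps to one or more S3 ListObjectsV2 calls, and the max-keys parameter of those calls must be between 1 and 1000. A minimal boto3 sketch of the equivalent request (using the "test" bucket from your question; the prefix is hypothetical):

import boto3

s3 = boto3.client("s3")

# One page of a listing; MaxKeys is the same value S3A sends as max-keys.
resp = s3.list_objects_v2(Bucket="test", Prefix="some/path/", MaxKeys=1000)
for obj in resp.get("Contents", []):
    print(obj["Key"])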
Based on your attempts, I notice a few potential issues:
- Configuration timing: you're setting the configurations after the Spark session has already been created. These settings should be supplied when the SparkSession is initialized.
- Configuration method: you're mixing spark.conf.set() and hadoop_conf.set(), which can leave the two configuration layers out of sync.
- Configuration naming: the exact property name depends on your Hadoop/S3A version. The documented S3A property that controls the listing page size (the value sent as max-keys) is fs.s3a.paging.maximum; unrecognized names such as fs.s3a.max-keys are silently ignored.
Try the following approach:
from pyspark.sql import SparkSession

# Set the configurations when creating the SparkSession.
# fs.s3a.paging.maximum is the documented S3A property that controls how many
# keys each LIST request asks for; S3 rejects values outside 1-1000.
spark = SparkSession.builder \
    .appName("YourApp") \
    .config("spark.hadoop.fs.s3a.connection.maximum", "100") \
    .config("spark.hadoop.fs.s3a.max.total.tasks", "100") \
    .config("spark.hadoop.fs.s3a.list.version", "1") \
    .config("spark.hadoop.fs.s3a.paging.maximum", "1000") \
    .getOrCreate()
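Once the session is up, you can confirm the settings actually reached the Hadoop layer. A quick sanity check (PySpark exposes the Hadoop Configuration through the internal _jsc handle):

# Inspect the effective Hadoop configuration at runtime.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
print(hconf.get("fs.s3a.paging.maximum"))  # expect "1000"
print(hconf.get("fs.s3a.list.version"))    # expect "1"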
Additionally, check the following:
- Verify that the S3 bucket "test" exists and that your EMR cluster has the proper IAM permissions to access it (a quick boto3 check is sketched after this list).
- Ensure that the path structure is correct; the error can also occur if the specified path doesn't exist.
- Check whether any bucket policies or access-control settings might be interfering with your access.
- If you're using a VPC endpoint for S3, verify that it's correctly configured.
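For the first item, a fast way to rule out a missing bucket or a permissions problem is to hit S3 directly from the driver with boto3; a sketch, assuming the "test" bucket from your question:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    # A 403 here points at IAM or a bucket policy; a 404 at a missing bucket.
    s3.head_bucket(Bucket="test")
    print("bucket reachable")
except ClientError as e:
    print("S3 error:", e.response["Error"]["Code"])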
If the issue persists, you might want to try the s3:// scheme instead of s3a://; on EMR, s3:// is handled by EMRFS, a different S3 implementation from the open-source s3a:// connector.
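Switching schemes is just a matter of changing the URI. For example, assuming a Parquet dataset under a hypothetical prefix:

# s3:// on EMR goes through EMRFS; s3a:// through the Hadoop S3A connector.
# The path is hypothetical; substitute your real bucket and prefix.
df = spark.read.parquet("s3://test/data/")
df.show(5)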
Sources
Glue Error: error occurred while calling o228.pyWriteDynamicFrame. com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist | AWS re:Post
5.2 - Spark troubleshooting and performance tuning | AWS Open Data Analytics
Troubleshoot Amazon S3 errors from AWS SDK exceptions | AWS re:Post

I tried these configs and they are not working!