Hello Bhaskar,
The output file names are something that is handled by the Spark execution engine. Spark does not offer any configuration option to set the name of these files, so there's no way to configure this on Glue. As a workaround, what some of our customers do is to run a small boto3 script at the end of the ETL job to list all the contents of the job's output path and then rename the objects in there. This could be run as code appended to the end of the job, as a separate ETL Python Shell job that is triggered whenever your job finishes, or even as a Lambda Function that is executed when the job is done.
Please find a sample script for this below.
====Script====
import time
import boto3

BUCKET_NAME = '<bucket-name>'
PREFIX = '<prefix-name>/'

## Glue transformations and actions code ##

# Sleep 30s to ensure the output objects are visible in the listing
time.sleep(30)

client = boto3.client('s3')
response = client.list_objects_v2(
    Bucket=BUCKET_NAME,
    Prefix=PREFIX,
)
key = response['Contents'][0]['Key']

# Copy the object under the new name, then delete the old one
client.copy_object(
    Bucket=BUCKET_NAME,
    CopySource={'Bucket': BUCKET_NAME, 'Key': key},
    Key=PREFIX + 'new_name',
)
client.delete_object(Bucket=BUCKET_NAME, Key=key)
============
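For the Lambda variant mentioned above, a minimal sketch of a handler could be triggered by an EventBridge rule that matches Glue "Job State Change" events and renames every Spark part file under the output prefix. The bucket name, prefix, naming scheme, and helper functions here are illustrative assumptions, not part of the original answer:

```python
# Illustrative values -- replace with your own bucket and output prefix
BUCKET_NAME = 'my-output-bucket'
PREFIX = 'prefix1/prefix2/'


def should_rename(event):
    """Act only when the Glue job finished successfully."""
    return event.get('detail', {}).get('state') == 'SUCCEEDED'


def new_key(prefix, index):
    """Deterministic name (assumed scheme) for the index-th part file."""
    return f'{prefix}output_{index}.csv'


def lambda_handler(event, context):
    if not should_rename(event):
        return
    # boto3 is imported lazily so the helpers above can be tested offline
    import boto3
    s3 = boto3.client('s3')
    response = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=PREFIX)
    # Rename only Spark part files, skipping markers such as _SUCCESS
    part_keys = sorted(
        obj['Key'] for obj in response.get('Contents', [])
        if 'part-' in obj['Key']
    )
    for i, old_key in enumerate(part_keys):
        s3.copy_object(
            Bucket=BUCKET_NAME,
            CopySource={'Bucket': BUCKET_NAME, 'Key': old_key},
            Key=new_key(PREFIX, i),
        )
        s3.delete_object(Bucket=BUCKET_NAME, Key=old_key)
```

Note that list_objects_v2 returns at most 1,000 keys per call; if a job writes more part files than that, the listing would need to be paginated.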
If you would like us to troubleshoot further by looking at the logs in the backend, please feel free to open a support case with AWS using the following link, including the sanitized script and the job run ID, and we would be happy to help.