
AWS Job Bookmark not working and not creating temp files


I have the Python script below in an AWS Glue job. For incremental loads, I set the Job bookmark option to Enable and ran the Glue job again, but it did not create any temporary files in the S3 bucket. The rest of the logic works as expected: it creates the folder and writes the files from the source table into it. Only the job bookmark logic does not work.

Do I have to set anything else, or what is missing?

```python
import sys
import datetime

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Resolve job arguments and initialise the Glue job
args = getResolvedOptions(sys.argv, ['target_BucketName', 'JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

outputbucketname = args['target_BucketName']

# Build a date-stamped output folder, e.g. <bucket>/aks<YYYYMMDD>
timestamp = datetime.datetime.now().strftime("%Y%m%d")
filename = f"aks{timestamp}"
output_path = f"{outputbucketname}/{filename}"

# Script generated for node AWS Glue Data Catalog
AWSGlueDataCatalog_node1712075257312 = glueContext.create_dynamic_frame.from_catalog(
    database="obsdatachecks",
    table_name="_obst_rw_omuc_baag__obst1_aks",
    transformation_ctx="AWSGlueDataCatalog_node1712075257312",
)

# Script generated for node Amazon S3
AmazonS3_node1712075284688 = glueContext.write_dynamic_frame.from_options(
    frame=AWSGlueDataCatalog_node1712075257312,
    connection_type="s3",
    format="csv",
    format_options={"separator": "|"},
    connection_options={"path": output_path, "compression": "gzip", "partitionKeys": []},
    transformation_ctx="AmazonS3_node1712075284688",
)

job.commit()
```

asked a year ago · 152 views
1 Answer

Hi,

Since the logic works as expected but you cannot see any temporary files in your S3 bucket, I recommend checking that the Glue job configuration specifies the correct S3 path for temporary files. Make sure the --TempDir parameter [1] is set at runtime, or that the correct S3 bucket path is configured for storing temporary files. If it is not specified, AWS Glue writes these temporary files to its default S3 temporary directory, which is different from the output_path specified in your script.

Note: if no custom temporary path is set, Glue uses the default one (usually something like s3://aws-glue-assets-<region>-<account_id>/temporary/).
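
To see which values a run actually received, one option is to inspect the argument list the script was started with. The sketch below is only an illustration: it uses the standard-library argparse so that it runs outside Glue, and it assumes that `--TempDir` and `--job-bookmark-option` show up in the argument list under those names (both are names from the job parameter reference, but their presence in `sys.argv` for a given Glue version is an assumption). The function name `glue_runtime_settings` and the example bucket names are hypothetical.

```python
import argparse


def glue_runtime_settings(argv):
    """Return the temp dir and bookmark setting found in a Glue-style argument list."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--TempDir", default=None)
    parser.add_argument("--job-bookmark-option", dest="bookmark_option", default=None)
    # parse_known_args ignores every other argument (JOB_NAME, custom ones, ...)
    known, _unknown = parser.parse_known_args(argv)
    return {
        "temp_dir": known.TempDir,
        "bookmarks_enabled": known.bookmark_option == "job-bookmark-enable",
    }


# Hypothetical argument list, shaped like what a job run might receive
example_argv = [
    "script.py",
    "--JOB_NAME", "my-job",
    "--TempDir", "s3://example-bucket/temporary/",
    "--job-bookmark-option", "job-bookmark-enable",
    "--target_BucketName", "s3://example-output",
]
settings = glue_runtime_settings(example_argv[1:])
print(settings)
```

Inside a real job you would pass `sys.argv[1:]` instead of `example_argv[1:]` and check the printed values in the job's logs.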

References: [1] Job Parameter reference

Thanks, Yuki

AWS Support Engineer, answered a year ago
Reviewed by an Expert a year ago
  • I have specified the S3 bucket path in the Temporary path option, but it still does not create any bookmark files in that bucket.

  • I think the code below is causing the issue with the job bookmark. But it is important for me, because it defines how the output folder is named:

    outputbucketname = args['target_BucketName']

    timestamp = datetime.datetime.now().strftime("%Y%m%d")
    filename = f"aks{timestamp}"
    output_path = f"{outputbucketname}/{filename}"
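
If it helps to test the folder naming on its own, the three lines from the comment above can be wrapped in a small function. This is only a sketch: the function name `build_output_path` and its `today` parameter are hypothetical, added so the date can be fixed for a check, and the bucket name in the example is made up. Whether this naming logic has any effect on the job bookmark is not established in this thread.

```python
import datetime


def build_output_path(bucket, prefix="aks", today=None):
    """Build '<bucket>/<prefix><YYYYMMDD>' as in the script above."""
    # Default to the current date, like datetime.datetime.now() in the original
    today = today or datetime.date.today()
    folder = f"{prefix}{today.strftime('%Y%m%d')}"
    # Unlike the original, strip a trailing slash to avoid a double '/' in the path
    return f"{bucket.rstrip('/')}/{folder}"


# Example with a fixed date and a hypothetical bucket name
print(build_output_path("s3://example-bucket", today=datetime.date(2024, 4, 2)))
```

In the job itself, the call would be `output_path = build_output_path(args['target_BucketName'])`.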
