AWS Glue Job Bookmark not working and not creating temp files


I have the Python script below in an AWS Glue job. For incremental loading, I have now set the Job bookmark option to Enable. I then ran the Glue job again, but it did not create any temporary files in the S3 bucket. The rest of the logic works as expected: it creates a folder and writes files from the source table into it. The job bookmark logic, however, does not work.

Do I have to set anything else, or what is missing?

```python
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import datetime


args = getResolvedOptions(sys.argv, ['target_BucketName', 'JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

outputbucketname = args['target_BucketName']

# Output folder is named after today's date, e.g. aks20240402
timestamp = datetime.datetime.now().strftime("%Y%m%d")
filename = f"aks{timestamp}"
output_path = f"{outputbucketname}/{filename}"


# Script generated for node AWS Glue Data Catalog
AWSGlueDataCatalog_node1712075257312 = glueContext.create_dynamic_frame.from_catalog(
    database="obsdatachecks",
    table_name="_obst_rw_omuc_baag__obst1_aks",
    transformation_ctx="AWSGlueDataCatalog_node1712075257312",
)

# Script generated for node Amazon S3
AmazonS3_node1712075284688 = glueContext.write_dynamic_frame.from_options(
    frame=AWSGlueDataCatalog_node1712075257312,
    connection_type="s3",
    format="csv",
    format_options={"separator": "|"},
    connection_options={"path": output_path, "compression": "gzip", "partitionKeys": []},
    transformation_ctx="AmazonS3_node1712075284688",
)

job.commit()
```
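One way to check that the bookmark setting from the question was actually saved on the job is to read the job definition back through the Glue `GetJob` API: with bookmarks enabled, the job's default arguments are expected to contain `--job-bookmark-option` set to `job-bookmark-enable`. The sketch below assumes a boto3 Glue client is passed in (for example `boto3.client("glue")`); the helper names are hypothetical, and treating a missing argument as "disabled" is an assumption of the sketch. Arguments supplied when starting an individual run can override these defaults, so a run started with different arguments may behave differently.

```python
from typing import Any, Mapping

BOOKMARK_ENABLED = "job-bookmark-enable"


def bookmark_option(glue_client: Any, job_name: str) -> str:
    """Return the job's default --job-bookmark-option value (hypothetical helper)."""
    job = glue_client.get_job(JobName=job_name)["Job"]
    default_args: Mapping[str, str] = job.get("DefaultArguments", {})
    # Assumption of this sketch: a missing argument is reported as disabled.
    return default_args.get("--job-bookmark-option", "job-bookmark-disable")


def bookmarks_enabled(glue_client: Any, job_name: str) -> bool:
    """True if the job definition enables job bookmarks by default."""
    return bookmark_option(glue_client, job_name) == BOOKMARK_ENABLED
```

For example, `bookmarks_enabled(boto3.client("glue"), "<job name>")` returns `False` if the console setting was never stored on the job.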

RahulD
asked 24 days ago

Hi,

Since the logic works as expected but you cannot see any temporary files in your S3 bucket, I would recommend checking that the Glue job configuration correctly specifies the S3 path for temporary files. Ensure that the --TempDir parameter [1] is specified at runtime, or that the correct S3 bucket path is used to store the temporary files. If it is not specified, AWS Glue writes these temporary files to its S3 temporary directory, which is different from the output_path specified in your script.

Note: if no custom temp path is set, Glue uses the default temp path (usually something like s3://aws-glue-assets-<region>-<account_id>/temporary/).
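Following the suggestion to set `--TempDir` at runtime, a run can be started through the Glue `StartJobRun` API with the temporary directory and the bookmark option passed explicitly as run arguments. The sketch below only assembles those arguments and calls a client that is passed in (assumed to be a boto3 Glue client); `start_run_with_bookmark` is a hypothetical helper, and the S3 paths in the usage line are placeholders. Script parameters such as `--target_BucketName` go in `extra_args`.

```python
from typing import Any, Dict, Optional


def start_run_with_bookmark(
    glue_client: Any,
    job_name: str,
    temp_dir: str,
    extra_args: Optional[Dict[str, str]] = None,
) -> str:
    """Start one Glue job run with bookmarks enabled and an explicit --TempDir.

    glue_client is assumed to be a boto3 Glue client; returns the JobRunId.
    """
    if not temp_dir.startswith("s3://"):
        raise ValueError("--TempDir must be an S3 URI such as s3://bucket/prefix/")
    arguments = {
        "--TempDir": temp_dir,
        "--job-bookmark-option": "job-bookmark-enable",
    }
    # Script-specific parameters, e.g. {"--target_BucketName": "s3://..."}
    arguments.update(extra_args or {})
    response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]
```

Usage, with placeholder names: `start_run_with_bookmark(boto3.client("glue"), "<job name>", "s3://<temp-bucket>/temporary/", {"--target_BucketName": "s3://<output-bucket>"})`.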

References: [1] Job Parameter reference

Thanks,
Yuki

AWS
SUPPORT ENGINEER
Yuki_N
answered 24 days ago
EXPERT
reviewed 23 days ago
EXPERT
reviewed 24 days ago
  • I have specified the S3 bucket path in the Temporary path option, but it is still not creating any bookmark files in this S3 bucket.

  • I think the code below is causing the issue with the job bookmark. However, it is important for me because it defines how the folder name is created:

    outputbucketname = args['target_BucketName']
    timestamp = datetime.datetime.now().strftime("%Y%m%d")
    filename = f"aks{timestamp}"
    output_path = f"{outputbucketname}/{filename}"
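Independently of the temporary path, the bookmark state Glue has recorded for a job can be read back with the `GetJobBookmark` API (`aws glue get-job-bookmark` on the CLI). As we understand the mechanism, bookmarks track what was read through the source node's `transformation_ctx`, so the date-based output folder by itself should not stop them from working. The sketch assumes a boto3 Glue client is passed in, that a job with no saved bookmark surfaces as `EntityNotFoundException`, and that the `JobBookmark` field holds a JSON string (it falls back to the raw string otherwise); `read_bookmark` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional


def read_bookmark(glue_client: Any, job_name: str) -> Optional[Dict[str, Any]]:
    """Return the recorded bookmark entry for a job, or None if none exists."""
    try:
        response = glue_client.get_job_bookmark(JobName=job_name)
    except glue_client.exceptions.EntityNotFoundException:
        # Assumption: no saved bookmark is reported as EntityNotFoundException.
        return None
    entry = dict(response.get("JobBookmarkEntry", {}))
    raw = entry.get("JobBookmark")
    if isinstance(raw, str):
        try:
            # Assumption: the payload is JSON keyed by the source transformation_ctx.
            entry["JobBookmark"] = json.loads(raw)
        except json.JSONDecodeError:
            pass  # keep the raw string if it is not JSON
    return entry
```

A `None` result after a run that reached `job.commit()` would suggest that no bookmark was saved for the job.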
