
AWS Glue job bookmark not working and not creating temp files


I have the Python script below in an AWS Glue job. For incremental-load logic, I have now enabled the Job bookmark option. When I run the Glue job again, it does not create any temporary file in the S3 bucket. The main logic works as expected (it creates a folder and writes files from the source table into it), but the job bookmark logic does not work.

Do I have to set anything else, or what is missing?

```python
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import datetime

# Job parameters are passed as --target_BucketName and --JOB_NAME
args = getResolvedOptions(sys.argv, ['target_BucketName', 'JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
# job.init() here and job.commit() at the end are both required for job bookmarks
job.init(args['JOB_NAME'], args)

outputbucketname = args['target_BucketName']

# Daily output folder, e.g. <bucket>/aks20240402
timestamp = datetime.datetime.now().strftime("%Y%m%d")
filename = f"aks{timestamp}"
output_path = f"{outputbucketname}/{filename}"

# Script generated for node AWS Glue Data Catalog
# transformation_ctx identifies the bookmark state kept for this source
AWSGlueDataCatalog_node1712075257312 = glueContext.create_dynamic_frame.from_catalog(
    database="obsdatachecks",
    table_name="_obst_rw_omuc_baag__obst1_aks",
    transformation_ctx="AWSGlueDataCatalog_node1712075257312",
)

# Script generated for node Amazon S3: pipe-separated, gzip-compressed CSV
AmazonS3_node1712075284688 = glueContext.write_dynamic_frame.from_options(
    frame=AWSGlueDataCatalog_node1712075257312,
    connection_type="s3",
    format="csv",
    format_options={"separator": "|"},
    connection_options={"path": output_path, "compression": "gzip", "partitionKeys": []},
    transformation_ctx="AmazonS3_node1712075284688",
)

# Commit the run so the bookmark state is persisted
job.commit()
```
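For reference, the console's job-bookmark setting corresponds to the `--job-bookmark-option` job parameter. Below is a minimal sketch, assuming the job is started with boto3, of passing that option explicitly for a single run together with the script's own `--target_BucketName` parameter. The helper names `build_run_arguments` and `start_run`, and any job name or bucket you pass in, are hypothetical placeholders.

```python
from typing import Dict


def build_run_arguments(target_bucket: str) -> Dict[str, str]:
    # Hypothetical helper. Job parameters are passed with a leading "--";
    # getResolvedOptions in the script reads them back without it
    # (args['target_BucketName']).
    return {
        "--job-bookmark-option": "job-bookmark-enable",
        "--target_BucketName": target_bucket,
    }


def start_run(job_name: str, target_bucket: str) -> str:
    # Hypothetical helper. boto3 is imported here so build_run_arguments()
    # can be used (and tested) without AWS credentials.
    import boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_run_arguments(target_bucket),
    )
    return response["JobRunId"]
```

For example, `start_run("my-glue-job", "s3://my-bucket")` would start one run with bookmarks enabled and return its run ID. Arguments passed this way override the job's default parameters for that run only.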

Asked a year ago · Viewed 152 times
1 answer

Hi,

Since the logic works as expected but you cannot see any temporary files in your S3 bucket, I recommend checking that the Glue job configuration correctly specifies the S3 path for temporary files. Make sure the --TempDir parameter [1] is specified at runtime, or that the correct S3 bucket path is used to store the temporary files. If it is not specified, AWS Glue writes these temporary files to its own S3 temporary directory, which is separate from the output_path specified in your script.

Note: if no custom temporary path is set, Glue uses the default temporary path (usually something like s3://aws-glue-assets-<region>-<account_id>/temporary/).
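To make the precedence described above concrete, here is a small sketch that picks the temporary directory for a run: a custom path (the console's "Temporary path" setting, passed as `--TempDir`) wins, otherwise it falls back to the default pattern quoted in the note. The helper names are hypothetical, and the default pattern is the "usually something like" form from the answer, so check the actual bucket in your account.

```python
from typing import Dict, Optional


def resolve_temp_dir(region: str, account_id: str, custom: Optional[str] = None) -> str:
    # Hypothetical helper: a custom --TempDir takes precedence when set.
    if custom:
        return custom
    # Assumed default pattern, taken from the note above; verify in your account.
    return f"s3://aws-glue-assets-{region}-{account_id}/temporary/"


def temp_dir_argument(region: str, account_id: str, custom: Optional[str] = None) -> Dict[str, str]:
    # Job-parameter form, suitable for merging into a start_job_run Arguments map.
    return {"--TempDir": resolve_temp_dir(region, account_id, custom)}
```

Either way, this location is independent of `output_path`, so files written by the job itself will not appear under it.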

References: [1] Job Parameter reference

Thanks,
Yuki

AWS
Support Engineer
Answered a year ago
Expert
Reviewed a year ago
  • I have specified the S3 bucket path in the Temporary path option, but it is still not creating any bookmark files in this S3 bucket.

  • I think the code below is causing the issue with the job bookmark, but it is important for me because it defines how the folder name is created:

    outputbucketname = args['target_BucketName']

    timestamp = datetime.datetime.now().strftime("%Y%m%d")
    filename = f"aks{timestamp}"
    output_path = f"{outputbucketname}/{filename}"
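The folder-naming lines quoted in this comment can be isolated into a small function so the generated path can be checked on its own, outside the Glue job. This is a sketch with a hypothetical helper name, `build_output_path`, and an optional date argument added only for testability; it computes the output location and nothing else.

```python
import datetime
from typing import Optional


def build_output_path(bucket: str, run_date: Optional[datetime.date] = None) -> str:
    # Hypothetical helper. Defaults to today's date, like datetime.now() in the script.
    if run_date is None:
        run_date = datetime.date.today()
    # Same naming scheme as the script: aks<YYYYMMDD>
    filename = f"aks{run_date.strftime('%Y%m%d')}"
    # rstrip avoids a double slash if the bucket argument ends with "/"
    return f"{bucket.rstrip('/')}/{filename}"
```

For example, `build_output_path("s3://my-bucket", datetime.date(2024, 4, 2))` returns `"s3://my-bucket/aks20240402"`. Note that `target_BucketName` must include the `s3://` scheme for the result to be a valid S3 path.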

