
AWS Job Bookmark not working and not creating temp files


I have the Python script below in an AWS Glue job. For incremental loading, I set the Job bookmark option to Enable and ran the Glue job again, but it did not create any temporary files in the S3 bucket. The export logic works as expected: it creates the folder and writes files into it from the source table. The job bookmark logic, however, did not work.

Do I have to set anything else, or what is missing?

`

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import datetime


args = getResolvedOptions(sys.argv, ['target_BucketName', 'JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

outputbucketname = args['target_BucketName']

timestamp = datetime.datetime.now().strftime("%Y%m%d")
filename = f"aks{timestamp}"
output_path = f"{outputbucketname}/{filename}"


# Script generated for node AWS Glue Data Catalog
AWSGlueDataCatalog_node1712075257312 = glueContext.create_dynamic_frame.from_catalog(database="obsdatachecks", table_name="_obst_rw_omuc_baag__obst1_aks", transformation_ctx="AWSGlueDataCatalog_node1712075257312")

# Script generated for node Amazon S3
AmazonS3_node1712075284688 = glueContext.write_dynamic_frame.from_options(frame=AWSGlueDataCatalog_node1712075257312, connection_type="s3", format="csv", format_options={"separator": "|"}, connection_options={"path": output_path, "compression": "gzip", "partitionKeys": []}, transformation_ctx="AmazonS3_node1712075284688")


job.commit() 

`
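One way to confirm whether a bookmark was recorded for the job is to query the bookmark entry through the Glue API rather than looking for files. The sketch below assumes a boto3 Glue client and its `get_job_bookmark` call; the helper name `summarize_bookmark` and the shape of the returned summary are illustrative and not part of the original script. It also assumes that a missing bookmark surfaces as an `EntityNotFoundException`.

```python
from typing import Any, Optional


def summarize_bookmark(glue_client: Any, job_name: str) -> Optional[dict]:
    """Return a short summary of the job's bookmark entry, or None if absent.

    `glue_client` is assumed to behave like boto3.client("glue").
    """
    try:
        response = glue_client.get_job_bookmark(JobName=job_name)
    except Exception as exc:
        # Assumption: a job without a bookmark raises EntityNotFoundException.
        if type(exc).__name__ == "EntityNotFoundException":
            return None
        raise

    entry = response.get("JobBookmarkEntry", {})
    return {
        "job_name": entry.get("JobName"),
        "run": entry.get("Run"),
        "attempt": entry.get("Attempt"),
        # JobBookmark holds the serialized state; empty means nothing tracked.
        "has_state": bool(entry.get("JobBookmark")),
    }
```

With real credentials this could be called as `summarize_bookmark(boto3.client("glue"), "my-job-name")`, where the job name is a placeholder.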

Asked a year ago, 152 views
1 Answer

Hi,

Since the logic works as expected but you cannot see any temporary files in your S3 bucket, I recommend checking that the Glue job configuration specifies the correct S3 path for temporary files. Ensure that the --TempDir parameter [1] is specified at runtime, or that the correct S3 bucket path is set for the temporary files. If it is not specified, AWS Glue writes these temporary files to its default S3 temporary directory, which is different from the output_path specified in your script.

Note: if no custom temporary path is set, Glue uses the default temporary path (usually something like s3://aws-glue-assets-<account_id>-<region>/temporary/).
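To see which temporary path a run actually received, the script can log the value of `--TempDir` from its arguments. The following is a minimal sketch, assuming Glue passes `--TempDir` on the command line when a temporary path is configured; the helper name `resolve_temp_dir` is hypothetical, and plain `argparse` is used here so the snippet runs outside Glue (inside a job, `getResolvedOptions` from `awsglue.utils` is the usual way to read arguments).

```python
import argparse
from typing import List, Optional


def resolve_temp_dir(argv: List[str]) -> Optional[str]:
    """Return the value of --TempDir from argv, or None if it was not passed."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--TempDir", default=None)
    # Other job arguments (e.g. --JOB_NAME) are ignored rather than rejected.
    known, _unknown = parser.parse_known_args(argv)
    return known.TempDir
```

In a job script this might be used as `print(resolve_temp_dir(sys.argv[1:]))`, so the configured temporary path shows up in the job logs.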

References: [1] Job Parameter reference

Thanks Yuki

AWS
Support Engineer
Answered a year ago
Expert
Reviewed a year ago
  • I have specified the S3 bucket path in the Temporary path option, but it is still not creating any bookmark files in this S3 bucket.

  • I think the code below is causing the issue with the job bookmark. But it is important to me, as it defines how the folder name is created:

    outputbucketname = args['target_BucketName']

    timestamp = datetime.datetime.now().strftime("%Y%m%d")
    filename = f"aks{timestamp}"
    output_path = f"{outputbucketname}/{filename}"
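The folder-naming logic above can be isolated into a small helper so it can be checked on its own, independent of the bookmark setting. This is only a sketch: the function name `build_output_path` and the injectable `today` parameter are assumptions added for testability, and unlike the original lines it also strips a trailing slash from the bucket path. It makes no claim about how the output path interacts with bookmarks.

```python
import datetime
from typing import Optional


def build_output_path(
    bucket: str,
    prefix: str = "aks",
    today: Optional[datetime.date] = None,
) -> str:
    """Build '<bucket>/<prefix><YYYYMMDD>', matching the folder naming in the script."""
    day = today or datetime.date.today()
    # Strip a trailing slash so the result never contains '//' after the bucket.
    return f"{bucket.rstrip('/')}/{prefix}{day.strftime('%Y%m%d')}"
```

For example, `build_output_path(args['target_BucketName'])` would produce a new folder name each calendar day, which is the behavior the original three lines describe.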
