Hi,
What is happening is that when the job reads with a bookmark and there is no new data, Glue generates an empty DynamicFrame, and Spark SQL cannot map the columns it expects onto an empty schema.
The solution is to check whether any of the DynamicFrames generated from the source data is empty, and then branch: either exit cleanly with a message that no processing was needed, or move forward.
You can insert a Custom Transform with the following code:
```python
def validate_source_table(glueContext, dfc) -> DynamicFrameCollection:
    no_data = False
    for key in dfc.keys():
        # Select each frame from the collection by key, then convert it
        # to a Spark DataFrame and test whether it has any rows.
        if dfc.select(key).toDF().rdd.isEmpty():
            print('no data in table:', key)
            no_data = True
    if no_data:
        print('committing and exiting job without processing.')
        job.commit()
        quit('No new Data. Exiting')
    return dfc
```
The job will still fail, but it will give a clearer exit message stating that there is no new data and it is exiting.
After the Custom Transform you need to add a SelectFromCollection node and name it properly based on the schema.
The SQL node will then have the SelectFromCollection transforms as its parent.
Hope this helps
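Outside of a Glue environment, the branching logic above can be sanity-checked with plain-Python stand-ins. All class and function names below are hypothetical mocks, not part of the Glue API; the point is only to mirror the select-then-test pattern, and to show why calling `toDF()` on the collection itself (instead of on the frame returned by `select()`) raises an `AttributeError`:

```python
class FakeFrame:
    """Stand-in for a DynamicFrame: just wraps a list of rows."""
    def __init__(self, rows):
        self._rows = rows

    def is_empty(self):
        return len(self._rows) == 0


class FakeCollection:
    """Stand-in for a DynamicFrameCollection: a keyed set of frames."""
    def __init__(self, frames):
        self._frames = frames

    def keys(self):
        return list(self._frames)

    def select(self, key):
        # Only the selected frame supports frame-level methods;
        # the collection itself does not, which mirrors the
        # "'DynamicFrameCollection' object has no attribute 'toDF'" error.
        return self._frames[key]


def empty_sources(dfc):
    """Return the keys of all empty frames, mirroring the transform's check."""
    return [key for key in dfc.keys() if dfc.select(key).is_empty()]
```

For example, `empty_sources(FakeCollection({"orders": FakeFrame([1]), "refunds": FakeFrame([])}))` flags only `"refunds"`, so the real transform would take the commit-and-exit branch.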
Hi, are there any complete examples of how to address this problem? I'm getting the following error when using your snippet: "'DynamicFrameCollection' object has no attribute 'toDF'".
Additionally, what's the best way to debug such custom code?