Hi,
What is happening is that when you read with the bookmark and there is no new data, Glue generates an empty DynamicFrame, and Spark SQL cannot map the columns it expects to an empty schema.
The solution is to check whether any DynamicFrame generated from the source data is empty, and then branch: either exit cleanly with a message that no processing was needed, or move forward.
You can insert a Custom Transform with the following code:
def validate_source_table(glueContext, dfc) -> DynamicFrameCollection:
    noData = 0
    # Check each DynamicFrame in the collection; select() returns a
    # single DynamicFrame, which can then be converted with toDF().
    for key in dfc.keys():
        if dfc.select(key).toDF().rdd.isEmpty():
            print('no data in table:', key)
            noData = 1
    if noData == 1:
        # job is the Glue Job object defined in the generated script.
        print('Committing and exiting job without processing.')
        job.commit()
        quit('No new data. Exiting.')
    return dfc
The job will still show as failed, but it will give a better exit message specifying that there is no new data and it is exiting.
After the Custom Transform you need to add a SelectFromCollection for each output and name it properly based on the schema.
The SQL node will then have the SelectFromCollection transforms as its parents.
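The empty-check logic in the transform above can be sketched outside Glue with plain Python, using a dict of row lists as a hypothetical stand-in for the DynamicFrameCollection (names here are illustrative, not part of the Glue API):

```python
def validate_sources(frames):
    """Return True if every source frame has rows, else False.

    frames is a hypothetical stand-in for a DynamicFrameCollection:
    a dict mapping node names to lists of rows.
    """
    no_data = False
    for name, rows in frames.items():
        if not rows:  # mirrors dfc.select(key).toDF().rdd.isEmpty()
            print('no data in table:', name)
            no_data = True
    return not no_data

# One empty source flags the whole run as "no new data":
print(validate_sources({"orders": [{"id": 1}], "customers": []}))  # False
```

In the real job, a False result is where you would call job.commit() and exit instead of continuing to the SQL node.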
Hope this helps.
Hi, are there any complete examples of how to address this problem? I'm getting the following error, "'DynamicFrameCollection' object has no attribute 'toDF'", using your snippet.
Additionally, what's the best way to debug such custom code?