Hi,
What is happening is that when you read with a bookmark and there is no new data, Glue generates an empty DynamicFrame, and the Spark SQL node cannot map the columns it expects onto an empty schema.
The solution is to check whether any of the DynamicFrames generated from the source data is empty, and then use an if branch to decide whether to exit early with a message that no processing was needed, or to move forward.
You can insert a Custom Transform with the following code:
from awsglue.dynamicframe import DynamicFrameCollection

def validate_source_table(glueContext, dfc) -> DynamicFrameCollection:
    no_data = False
    # Check each DynamicFrame in the collection for an empty result set
    for key in dfc.keys():
        if dfc.select(key).toDF().rdd.isEmpty():
            print('no data in table:', key)
            no_data = True
    if no_data:
        print('Committing and exiting job without processing.')
        job.commit()  # 'job' is the Job object defined in the generated Glue script
        quit('No new Data. Exiting')  # raises SystemExit with the given message
    return dfc
The job run will still show as failed, but with a clearer exit message stating that there is no new data and the job is exiting.
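If you would rather have the run reported as succeeded instead of failed, a commonly suggested variant is to exit the process with status 0 after committing. This is only a sketch, under the assumption that your Glue version derives the run status from the driver's exit code; note that os._exit bypasses normal Python cleanup:

import os

# Inside validate_source_table, replace the failure-style exit with:
if no_data:
    print('No new data. Committing and exiting job without processing.')
    job.commit()
    os._exit(0)  # exit code 0 so the run is reported as succeeded; skips Python cleanup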
After the Custom Transform you need to add a SelectFromCollection transform for each output and name it properly based on the schema.
The SQL node will then have the SelectFromCollection transforms as its parent.
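For reference, in script form the wiring after the Custom Transform looks roughly like this. This is a sketch: the collection key 'source_table', the transformation_ctx values, and the SQL query are placeholders for whatever your job actually uses:

from awsglue.transforms import SelectFromCollection

# Pick one DynamicFrame out of the collection returned by the custom transform
validated = validate_source_table(glueContext, dfc)
selected = SelectFromCollection.apply(dfc=validated, key='source_table',
                                      transformation_ctx='select_source_table')

# The SQL node is equivalent to registering a temp view and querying it
spark = glueContext.spark_session
selected.toDF().createOrReplaceTempView('source_table')
result = spark.sql('SELECT * FROM source_table')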
hope this helps
Hi, are there any complete examples of how to address this problem? I'm getting the following error using your snippet: "'DynamicFrameCollection' object has no attribute 'toDF'".
Additionally, what's the best way to debug such custom code?