How can I replicate Glue job bookmarks in a custom Spark script for AWS Glue when I cannot read the data with a DynamicFrame but need to process only new files in the source S3 folder?


KG
Asked 6 months ago, 187 views
1 answer

You can't use bookmarks without a DynamicFrame. Why are you unable to use one? (You can convert it to a DataFrame as soon as you have read the data.)
Otherwise, you have to list the files yourself and pass that filtered list to Spark.
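
A minimal sketch of that do-it-yourself listing, assuming the "bookmark" is simply the latest `LastModified` timestamp seen in the previous run. The helper names `list_objects` and `select_new_keys` are hypothetical, and where the bookmark gets stored between runs is left open. This is not how Glue implements bookmarks internally, only one way to approximate them.

```python
from datetime import datetime, timezone


def list_objects(s3_client, bucket, prefix):
    """Yield every object under bucket/prefix.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        yield from page.get("Contents", [])


def select_new_keys(objects, last_bookmark):
    """Return (keys, new_bookmark) for objects modified after last_bookmark.

    `objects` are dicts with "Key" and "LastModified", the shape that
    list_objects_v2 returns. `last_bookmark` is a datetime, or None on
    the first run (in which case every object counts as new).
    """
    new = [
        o for o in objects
        if last_bookmark is None or o["LastModified"] > last_bookmark
    ]
    new.sort(key=lambda o: o["LastModified"])
    keys = [o["Key"] for o in new]
    new_bookmark = new[-1]["LastModified"] if new else last_bookmark
    return keys, new_bookmark
```

The resulting keys can then be turned into paths and handed to plain Spark, for example `spark.read.text([f"s3://{bucket}/{k}" for k in keys])`. Save the new bookmark only after the job has succeeded, so a failed run is retried from the old position. Note that a purely timestamp-based filter can skip an object that shares its timestamp with the saved bookmark; tracking the set of processed keys is a stricter alternative.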

AWS
Expert
Answered 6 months ago
  • The files I am trying to read are gzip-compressed and also base64-encoded, with JSON content. I am unable to read them into a DynamicFrame, and I want to read only the new files that arrive in the source folder.

  • I think the issue there is the base64 encoding. That's not standard, so you would need to read the file as text, then decode it, then parse it as JSON. You can do that with plain Spark, but then you would lose the bookmarks. What you could do instead is read with a DynamicFrame as CSV (even though it is not CSV): all the encoded content will end up in one column, and if you don't have memory issues, you can decode and parse it.
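
The decode-then-parse step described above could look roughly like this in plain Python. It is a sketch under assumptions: the thread does not say in which order the gzip and base64 layers were applied, so the hypothetical helper `decode_record` base64-decodes first and gunzips only if the result starts with the gzip magic bytes. If the file itself is a `.gz` object, Spark's text reader normally decompresses it already, and only the base64 and JSON steps remain.

```python
import base64
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def decode_record(encoded):
    """Decode one base64 string into a parsed JSON value.

    Assumes the payload is UTF-8 JSON, optionally gzip-compressed
    underneath the base64 layer.
    """
    raw = base64.b64decode(encoded)
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

In a Glue job, a function like this could be applied to the single text column, for instance through a Spark UDF or `rdd.map`, after which the parsed records can be converted to a DataFrame.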
