Does Redshift Spectrum support incremental ingestion?


I have created an external table using Redshift Spectrum, and I use AWS Glue to crawl deeply nested JSON files that arrive in an S3 bucket every second.

I was able to populate a Redshift table by extracting values from the external table. However, I am facing an issue with incremental ingestion: every time new files are crawled, all the values are loaded into Redshift again.

How can I capture only the new data in the S3 bucket so that only new columns are loaded into the Redshift table?

Sample file: { "Messages":[ { "Attributes":{"docType":"Test"}, "Data":{ "key1":"value1", "key2":["a1":"v1", "a2":"v2"], "key3":{{"b1":"u1"}, "b2":[{"c1":"u2"}, {"c2":"u3"} ] } } }, { "Attributes":{"docType":"Test2"}, "Data":{ "key1":"value1", "key2":["a1":"v1", "a2":"v2"], "key3":{{"b1":"u1"}, "b2":[{"c1":"u2"}, {"c2":"u3"} ] } } } ] }

sravan
asked 2 months ago · 207 views
2 Answers

You cannot read just the new columns; for that you would need a columnar format such as Parquet.
Also, incremental ingestion normally refers to loading new files. For that, you could use Glue job bookmarks (running a Glue job instead of Spectrum), or put new files in different folders (partitions) and tell Spectrum to load just those.
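
As a rough sketch of the folder-per-partition idea, the snippet below builds the DDL that registers one day's S3 folder as a partition of the external table. The table name `spectrum_schema.messages_ext`, the bucket path, and the partition column `ingest_date` are all hypothetical; the external table is assumed to have been created with `PARTITIONED BY (ingest_date date)` and the files are assumed to land under `ingest_date=YYYY-MM-DD/` prefixes.

```python
from datetime import date


def add_partition_sql(table: str, bucket_prefix: str, day: date) -> str:
    """Build the DDL that registers one day's S3 folder as a Spectrum partition.

    Assumes a hypothetical layout of <bucket_prefix>/ingest_date=YYYY-MM-DD/.
    """
    value = day.isoformat()
    location = f"{bucket_prefix.rstrip('/')}/ingest_date={value}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (ingest_date='{value}') LOCATION '{location}';"
    )


# Example with hypothetical names; run the resulting statement in Redshift.
print(add_partition_sql(
    "spectrum_schema.messages_ext",
    "s3://example-bucket/messages/",
    date(2024, 1, 15),
))
```

Once a partition is registered, a query that filters on `ingest_date` only scans the files under that folder, which is what keeps each load limited to new files.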

AWS
EXPERT
answered 2 months ago
  • How can we change the partition values dynamically so that we can automate this job?

  • If you mean filtering partitions, you would need to build your query with the values you need, for instance using the current date for date-related columns.
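
A minimal sketch of building such a query from the current date, so that a scheduled job needs no hand-edited values. The table names, the column list, and the partition column `ingest_date` are hypothetical placeholders, and the partition for that date is assumed to be registered on the external table already.

```python
from datetime import date
from typing import Optional, Sequence


def incremental_load_sql(
    target: str,
    source: str,
    columns: Sequence[str],
    day: Optional[date] = None,
) -> str:
    """Build an INSERT ... SELECT restricted to one date partition.

    `ingest_date` is an assumed partition column name; `day` defaults to today.
    """
    day = day or date.today()
    column_list = ", ".join(columns)
    return (
        f"INSERT INTO {target} ({column_list}) "
        f"SELECT {column_list} FROM {source} "
        f"WHERE ingest_date = '{day.isoformat()}';"
    )


# Example with hypothetical names; omit `day` to use the current date.
print(incremental_load_sql(
    "public.messages",
    "spectrum_schema.messages_ext",
    ["doc_type", "key1"],
    date(2024, 1, 15),
))
```

A scheduler can call this once per day and submit the resulting statement to Redshift; only the partition value changes between runs.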

AWS
EXPERT
answered 2 months ago
  • We want to explore options for loading the data without using an AWS Glue job to flatten the JSON, in order to reduce costs.
