DynamoDB - scanning a 7.5GB table

I have a customer who is trying to load a DynamoDB table (7.5 GB, 7M rows of about 0.4 KB each) into memory, and finds that it takes about ten minutes.

Assuming the partition key is not an issue, what would be a good approach to speed the process up? I thought about:

  1. Loading in parallel instead of sequentially
  2. Batching small items and loading them in bulk (via Firehose, for example)

I'd be happy to hear your ideas!

Nir_Sh
Asked 4 years ago, 2233 views
2 answers
Accepted answer

In general, we recommend that the customer perform a parallel Scan with multiple threads. The following AWS documentation explains in detail how Parallel Scan works:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan

Each Scan API call returns up to 1 MB of data. A single thread can make around 20 Scan calls per second, assuming a round-trip latency of around 50 ms per call. In other words, one thread can scan around 20 MB of data per second.

On an Amazon EC2 instance with 16 vCPU cores, run one thread per vCPU core. That gives 16 x 20 = 320 MB/s, so roughly 300 MB/s in throughput. As a first approximation, the full scan then takes 7500 / 300 = 25 seconds, well under 1 minute.
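The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and parameters are made up for illustration, and it assumes every Scan call returns a full 1 MB page, that latency stays at 50 ms, and that the table has enough read capacity to serve all threads without throttling.

```python
def estimate_scan_seconds(table_mb, threads, latency_s=0.05, page_mb=1.0):
    """Rough time to scan a table, assuming full pages and no throttling."""
    calls_per_second = 1.0 / latency_s            # about 20 calls/s at 50 ms
    per_thread_mb_s = calls_per_second * page_mb  # about 20 MB/s per thread
    total_mb_s = per_thread_mb_s * threads        # scales linearly in this model
    return table_mb / total_mb_s


single = estimate_scan_seconds(7500, threads=1)
parallel = estimate_scan_seconds(7500, threads=16)
print(f"1 thread: {single:.0f} s, 16 threads: {parallel:.1f} s")
```

With these assumptions a single thread needs about 375 seconds, which is in the same range as the ten minutes reported in the question, and 16 threads need about 23 seconds (25 seconds if the throughput is rounded down to 300 MB/s).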

If you want to load data from Amazon DynamoDB into memory, you will need a thread-safe data structure that multiple threads can share. In Java, for example, look at the concurrent collections in java.util.concurrent.
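A minimal Python sketch of a parallel Scan into memory might look like the following. It is not the DDBImportExport code; the function names are hypothetical. It assumes `client` is a boto3 DynamoDB client (or any object with the same `scan(**kwargs)` method), and it uses only the documented Scan parameters `TableName`, `Segment`, `TotalSegments` and `ExclusiveStartKey`. Instead of a shared thread-safe collection, each worker fills its own list and the lists are merged in the calling thread, which is a simpler alternative to the shared structure described above.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Read every page of one segment and return its items as a list."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items  # no more pages in this segment
        kwargs["ExclusiveStartKey"] = last_key  # continue after the last item


def parallel_scan(client, table_name, total_segments=16):
    """Scan all segments concurrently, one worker thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        results = []
        for future in futures:
            results.extend(future.result())  # merged in the calling thread
    return results
```

With boto3 installed and credentials configured, usage would be along the lines of `parallel_scan(boto3.client("dynamodb"), "MyTable", total_segments=16)`, where `MyTable` is a placeholder table name. Because the work is dominated by network waits, threads are usually adequate in Python despite the GIL, but the actual speedup depends on the table's read capacity and should be measured.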

If you want to load data from Amazon DynamoDB onto disk (export), you might want to refer to the DDBImportExport project on GitHub, which I developed as a demo of how to perform a parallel Scan in DynamoDB. The implementation is in Python, but the same logic can easily be re-implemented in any other language.

AWS
Answered 4 years ago
Expert
Reviewed a month ago
See "Best practices for querying and scanning data", in particular the sections "Taking advantage of parallel scans" and "Choosing TotalSegments": https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-query-scan.html

AWS
Answered 7 months ago
