Hi,
Have you tested the DynamoDB bulk import feature (import from S3) to compare it with EMR? https://aws.amazon.com/about-aws/whats-new/2022/08/amazon-dynamodb-supports-bulk-imports-amazon-s3-new-dynamodb-tables/
This blog post may also help: https://aws.amazon.com/blogs/database/amazon-dynamodb-can-now-import-amazon-s3-data-into-a-new-table/
I would personally try this first to see what performance this simpler way of ingesting large amounts of data gives me. If your EMR job is needed to process the data before loading it, you can still use the S3 bulk import as a performance baseline.
If S3 bulk import gives you the performance you want, you can also consider using EMR only to process the data, writing the results to S3, and then loading them into DynamoDB with the S3 import.
Best,
Didier
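To make the suggestion above concrete, here is a minimal sketch of starting an S3 import with the boto3 `import_table` call. The bucket, prefix, table name, the `pk` key attribute, the GZIP-compressed DynamoDB JSON input and the on-demand billing mode are all assumptions chosen for illustration, not details from this thread. As the linked announcement says, the import creates a new table.

```python
from typing import Any, Dict


def build_import_request(bucket: str, prefix: str, table_name: str) -> Dict[str, Any]:
    """Build keyword arguments for the DynamoDB ImportTable API.

    Assumptions (hypothetical, adjust to your data): the files under `prefix`
    are GZIP-compressed DynamoDB JSON, and the new table has a single string
    partition key named "pk".
    """
    return {
        "S3BucketSource": {"S3Bucket": bucket, "S3KeyPrefix": prefix},
        "InputFormat": "DYNAMODB_JSON",
        "InputCompressionType": "GZIP",
        "TableCreationParameters": {
            "TableName": table_name,
            "AttributeDefinitions": [
                {"AttributeName": "pk", "AttributeType": "S"}
            ],
            "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
            "BillingMode": "PAY_PER_REQUEST",
        },
    }


def start_import(client: Any, request: Dict[str, Any]) -> str:
    """Start the import and return its ARN.

    `client` is expected to be a boto3 DynamoDB client, for example
    boto3.client("dynamodb"); it is passed in so the function can be
    exercised without AWS credentials.
    """
    response = client.import_table(**request)
    return response["ImportTableDescription"]["ImportArn"]
```

With real credentials this would be called as `start_import(boto3.client("dynamodb"), build_import_request("example-bucket", "emr-output/", "example-table"))`, and the elapsed time of the import can then serve as the baseline to compare against the EMR job.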
Hi, can this bulk import handle large documents, meaning documents larger than the DynamoDB item size limit? We are using Spark because some documents are bigger than that limit, so we persist them to S3. Can you help us work out how to mitigate this?
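The approach described in this comment (keeping oversized documents in S3) could be sketched as below. This is only an illustration under stated assumptions: the size estimate uses the compact JSON length rather than DynamoDB's exact accounting, and the bucket name, the `pk` key attribute, the `s3_uri` pointer attribute and the 90% safety margin are hypothetical. It assumes the 400 KB item size limit applies however the table is loaded.

```python
import json
from typing import Any, Dict, Tuple

# DynamoDB's documented maximum item size is 400 KB.
DYNAMODB_ITEM_LIMIT_BYTES = 400 * 1024


def approx_size_bytes(item: Dict[str, Any]) -> int:
    """Rough size estimate: UTF-8 length of the compact JSON form.

    This is an approximation, not DynamoDB's exact size calculation.
    """
    return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))


def route_item(
    item: Dict[str, Any],
    key_attr: str = "pk",            # hypothetical key attribute name
    bucket: str = "example-bucket",  # hypothetical bucket name
    margin: float = 0.9,             # stay below the limit, since the estimate is rough
) -> Tuple[str, Dict[str, Any]]:
    """Decide whether an item goes to DynamoDB as-is or stays in S3.

    Returns ("inline", item) when the item fits, otherwise ("s3", pointer),
    where the pointer is a small item holding only the key and an S3 URI.
    Writing the full document to that URI is left to the caller.
    """
    if approx_size_bytes(item) <= DYNAMODB_ITEM_LIMIT_BYTES * margin:
        return "inline", item
    pointer = {
        key_attr: item[key_attr],
        "s3_uri": f"s3://{bucket}/docs/{item[key_attr]}.json",
    }
    return "s3", pointer
```

In a Spark job the same decision could be applied per record before writing the output, so that the data set loaded into DynamoDB contains only items that fit plus small pointer items for the rest.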
Hi,
Can you please suggest an appropriate way to handle this load in the given time? I would like an efficient, performant solution, and I would like to focus on getting that performance through EMR.
- Check whether your EMR cluster is using its maximum resources to run the Spark application. You can set the maximizeResourceAllocation property in the EMR spark configuration classification to make use of the available resources.
- If your EMR cluster needs more nodes for your workload, resize it or use **auto scaling** as needed.
- On the DynamoDB side, make sure the load uses the full provisioned write capacity (WCU). Also make sure the EMR cluster and the DynamoDB table are in the same Region. You can also leverage this document to further optimize the performance.
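The points above can be sketched in a few lines. The first part shows the `spark` configuration classification with `maximizeResourceAllocation` as it would be passed when creating the cluster; the second is a back-of-the-envelope estimate of the best-case load time from provisioned WCU, assuming standard (non-transactional) writes, where one WCU covers one write per second for an item of up to 1 KB. The function name and the example numbers are hypothetical.

```python
import math
from typing import Dict, List

# Configuration classification supplied at cluster creation, for example as
# the Configurations argument of the EMR RunJobFlow API.
SPARK_CONFIGURATIONS: List[Dict[str, object]] = [
    {
        "Classification": "spark",
        "Properties": {"maximizeResourceAllocation": "true"},
    }
]


def estimate_load_seconds(
    item_count: int, item_size_bytes: int, provisioned_wcu: int
) -> float:
    """Best-case time to write all items at full provisioned write capacity.

    One standard write consumes one WCU per 1 KB of item size, rounded up.
    Real loads are slower if the cluster cannot saturate the table or if
    writes are unevenly spread across partitions.
    """
    wcu_per_item = max(1, math.ceil(item_size_bytes / 1024))
    return item_count * wcu_per_item / provisioned_wcu
```

For example, `estimate_load_seconds(100_000_000, 3 * 1024, 40_000)` gives 7500.0 seconds, a bit over two hours. If the target time window is shorter than this estimate, adding EMR nodes will not help until the table's write capacity is raised.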
@Didier?