Process a large data file (up to 10 GB) and save it to RDS

Hi, I have an architecture like the one below:
user uploads a file -> S3 -> Lambda triggers a Glue job -> the Glue job pulls the file, reads its content, and saves it to a record in a table in Aurora Postgres
Everything works well with small files, but as the file size grows (up to 10 GB), I don't think the architecture fits anymore. I researched and found that Postgres has something like an external table that can store much more data than a regular table. I am also thinking about switching to a NoSQL database like DynamoDB or MongoDB. I have some questions:

  • Is Aurora Postgres good for storing and searching large content?
  • If Aurora Postgres is not a good fit, which NoSQL database fits this scenario?
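
The S3 -> Lambda -> Glue step of the pipeline described above could look roughly like the sketch below. It is only an illustration: the job name `load-file-to-aurora` and the argument names `--source_bucket` and `--source_key` are hypothetical, and the Glue client is passed in as a parameter so the parsing logic can be exercised without AWS access.

```python
import urllib.parse


def build_job_arguments(event):
    """Extract bucket and key from an S3 event and map them to Glue job arguments."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+').
    key = urllib.parse.unquote_plus(record["object"]["key"])
    # Hypothetical argument names; the Glue job script would read them back.
    return {"--source_bucket": bucket, "--source_key": key}


def handler(event, context, glue_client=None, job_name="load-file-to-aurora"):
    """Lambda entry point: start one Glue job run for the uploaded object."""
    if glue_client is None:
        import boto3  # imported lazily so the code above runs without AWS installed

        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=job_name,  # hypothetical job name
        Arguments=build_job_arguments(event),
    )
    return response["JobRunId"]
```

Note that only the bucket and key are handed to the job; the file itself stays in S3, so the Lambda function is unaffected by the file size.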

1 Answer

Hello.

Is Aurora Postgres good for storing and searching large content?

PostgreSQL is a relational database, so it is suitable for performing complex queries such as table joins.
Therefore, I think it is suitable for performing complex updates and searches.

If Aurora Postgres is not a good fit, which NoSQL database fits this scenario?

NoSQL databases such as DynamoDB are generally faster than relational databases for simple reads and writes.
However, they are not suitable for systems that require complex data updates.

We recommend that you check the following documentation to select the appropriate database for your solution.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.WhyDynamoDB.html

Answered 8 months ago by an Expert
Reviewed 8 months ago by an AWS Expert
  • Can you tell me more about the architecture? Does it fit a 10 GB file size? By "Aurora Postgres is good for storing and searching large content" I mean within one row: each file's content is stored in a single row. If the file size is 10 GB, that row will hold a 10 GB data string. There are no complex queries, just a simple search; updates will be performed by the Glue job through the Glue database.

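  • If you don't have complex queries, I think a NoSQL database such as DynamoDB is better. Also, if one row is about 10 GB in size, I think it would be difficult to store it in a relational database: a single PostgreSQL field value cannot exceed 1 GB. Note, though, that DynamoDB caps each item at 400 KB, so a 10 GB file cannot go into a single item there either.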
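
To make the size problem concrete, here is a minimal sketch of how many pieces a 10 GB file would have to be split into under the per-value limits (1 GB for a PostgreSQL field, 400 KB for a DynamoDB item), plus a generator that reads a stream chunk by chunk. The function names are illustrative, and the counts are lower bounds: real rows and items also carry keys and attribute names, so usable payload per piece is smaller.

```python
import io

PG_MAX_FIELD_BYTES = 1 * 1024**3      # PostgreSQL: maximum size of one field value
DYNAMODB_MAX_ITEM_BYTES = 400 * 1024  # DynamoDB: maximum size of one item


def chunks_needed(file_size, chunk_size):
    """Minimum number of chunks of at most chunk_size bytes (integer ceiling)."""
    return (file_size + chunk_size - 1) // chunk_size


def iter_chunks(stream, chunk_size):
    """Yield (chunk_index, data) pairs without loading the whole stream into memory."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


if __name__ == "__main__":
    ten_gb = 10 * 1024**3
    print(chunks_needed(ten_gb, PG_MAX_FIELD_BYTES))
    print(chunks_needed(ten_gb, DYNAMODB_MAX_ITEM_BYTES))
    # Small in-memory demo of the chunk reader.
    for index, data in iter_chunks(io.BytesIO(b"abcdefg"), 3):
        print(index, data)
```

Either way, one row or item per file is not possible at this size; the content has to be split across records, or left in S3 with only a reference and searchable metadata stored in the database.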
