Process large data file (up to 10 GB) and save to RDS


Hi, I have an architecture like the one below:
user uploads file -> S3 -> Lambda triggers Glue job -> Glue job pulls the file, reads its content, and saves it to a record in a table in Aurora PostgreSQL
Everything works with small files, but as the file size grows (up to 10 GB), I don't think the architecture fits anymore. I researched this and found that Postgres has something like an external table that can store much more data than a regular table. I am also thinking about switching to a NoSQL database such as DynamoDB or MongoDB. I have some questions:

  • Is Aurora PostgreSQL good for storing and searching large content?
  • If Aurora PostgreSQL is not a good fit, which NoSQL database fits this scenario?
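
For reference, the "S3 -> Lambda triggers Glue job" step of the pipeline described above could look roughly like the following. This is only a minimal sketch: the job name `load-file-to-aurora` and the `--source_*` argument names are hypothetical placeholders, not taken from the actual setup. It relies on the standard S3 event notification layout and on boto3's Glue `start_job_run` call.

```python
import urllib.parse


def build_glue_arguments(event):
    """Turn an S3 event notification into one Glue argument dict per object."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        # Glue job arguments must be strings; the names below are hypothetical.
        runs.append({
            "--source_bucket": bucket,
            "--source_key": key,
            "--source_size_bytes": str(size),
        })
    return runs


def lambda_handler(event, context, glue_client=None, job_name="load-file-to-aurora"):
    """Start one Glue job run per uploaded object (job_name is a placeholder)."""
    if glue_client is None:
        # Imported lazily so the parsing logic can be exercised without AWS access.
        import boto3
        glue_client = boto3.client("glue")
    run_ids = []
    for arguments in build_glue_arguments(event):
        response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
        run_ids.append(response["JobRunId"])
    return {"job_run_ids": run_ids}
```

Passing the object size along lets the Glue job decide up front how to handle a very large file instead of discovering the size while reading it.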

1 Answer

Hello.

Is Aurora PostgreSQL good for storing and searching large content?

PostgreSQL is a relational database, so it is well suited to complex queries such as table joins.
For the same reason, I think it is a good fit for complex updates and searches.

If Aurora PostgreSQL is not a good fit, which NoSQL database fits this scenario?

NoSQL databases such as DynamoDB typically offer faster reads and writes than relational databases for simple key-based access.
However, they are not suitable for systems that require complex data updates.

We recommend that you check the following documentation to select the appropriate database for your solution.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.WhyDynamoDB.html

Expert, answered 8 months ago
AWS Expert, reviewed 8 months ago
  • Can you tell me more about the architecture? Does it fit a 10 GB file size? By "Aurora Postgres is good for storing and searching large content" I mean that each file's content is stored in a single row. If the file size is 10 GB, that row will hold a 10 GB data string. There are no complex queries, just a simple search; updates will be performed by the Glue job through the Glue database.

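  • If you don't have complex queries, I think a NoSQL database such as DynamoDB is a better fit. Also, if one row is about 10 GB in size, it would be difficult to store in a relational database: PostgreSQL limits a single field value to 1 GB. Note that DynamoDB limits an item to 400 KB, so with either database the content would have to be split across multiple rows or items.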
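
To make the size concern concrete, here is a minimal sketch of splitting a file into fixed-size chunks so that each chunk could be stored as its own row or item. The function names `iter_chunks` and `chunks_needed` are made up for illustration, and the two constants reflect the documented per-field limit of PostgreSQL and per-item limit of DynamoDB. A real chunk size would probably be much smaller than these ceilings, and a DynamoDB item's size also counts attribute names, so treat the numbers as upper bounds only.

```python
import io

POSTGRES_MAX_FIELD_BYTES = 1 * 1024**3  # PostgreSQL: 1 GB per field value
DYNAMODB_MAX_ITEM_BYTES = 400 * 1024    # DynamoDB: 400 KB per item


def iter_chunks(stream, chunk_size):
    """Yield (chunk_index, data) pairs read sequentially from a binary stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:  # empty bytes means end of stream
            break
        yield index, data
        index += 1


def chunks_needed(file_size, chunk_size):
    """Number of rows/items needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


if __name__ == "__main__":
    # Tiny in-memory example; a real job would stream the S3 object instead.
    for index, data in iter_chunks(io.BytesIO(b"abcdefghij"), 4):
        print(index, data)
    print(chunks_needed(10 * 1024**3, POSTGRES_MAX_FIELD_BYTES))
```

Because the stream is read one chunk at a time, the whole 10 GB file never has to be held in memory at once.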
