Process large data file (up to 10 GB) and save to RDS


Hi, I have an architecture like below:
user uploads file -> S3 -> Lambda triggers Glue job -> Glue job pulls the file, reads the content, and saves it to a record in a table in Aurora Postgres
Everything is fine with a small file, but as the file size grows (up to 10 GB) I don't think the architecture fits anymore. I researched and found that Postgres has something like an external table that can store much more data than a regular table. I am also thinking about switching to a NoSQL database such as DynamoDB or MongoDB. I have some questions:

  • Is Aurora Postgres good for storing and searching large content?
  • If Aurora Postgres is not good, then which NoSQL database fits this scenario?
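The trigger step in the pipeline above can be sketched as a Lambda handler that reads the uploaded object's location from the S3 event and starts the Glue job. This is a minimal sketch: the job name "process-upload-job" and the argument names are placeholders, not from the thread.

```python
import json

def extract_s3_object(event):
    """Pull bucket and key from the first record of an S3 put event."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

def lambda_handler(event, context):
    bucket, key = extract_s3_object(event)
    # boto3 ships with the Lambda runtime; imported lazily here so the
    # event parsing above can be exercised without AWS credentials.
    import boto3
    glue = boto3.client("glue")
    run = glue.start_job_run(
        JobName="process-upload-job",  # placeholder Glue job name
        Arguments={"--bucket": bucket, "--key": key},
    )
    return {"statusCode": 200, "body": json.dumps(run["JobRunId"])}
```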
1 Answer

Hello.

Is Aurora Postgres good for storing and searching large content?

PostgreSQL is a relational database, so it is well suited to complex queries such as joins across tables.
Therefore, I think it is a good fit when you need complex updates and searches.
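As a small illustration of the join-style queries mentioned above, here is a sketch using Python's built-in sqlite3 as a stand-in for Aurora Postgres; the table names, columns, and data are invented for the example.

```python
import sqlite3

# In-memory sqlite3 database as a stand-in for Aurora Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uploaders (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, uploader_id INTEGER);
    INSERT INTO uploaders VALUES (1, 'a@example.com');
    INSERT INTO files VALUES (10, 'report.csv', 1), (11, 'logs.csv', 1);
""")

# A join with aggregation: the kind of query relational databases handle well.
rows = conn.execute("""
    SELECT u.email, COUNT(*) AS n_files
    FROM files f JOIN uploaders u ON u.id = f.uploader_id
    GROUP BY u.email
""").fetchall()
print(rows)  # → [('a@example.com', 2)]
```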

If Aurora Postgres is not good, then which NoSQL database fits this scenario?

NoSQL databases such as DynamoDB are generally faster than relational databases for simple reads and writes.
However, they are not suitable for systems that require complex data updates.

We recommend that you check the following documentation to select the appropriate database for your solution.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.WhyDynamoDB.html

EXPERT, answered 7 months ago
AWS EXPERT, reviewed 7 months ago
  • Can you tell me more about the architecture? Would it fit a 10 GB file size? Regarding "Aurora Postgres is good for storing and searching large content": I mean the file content will be stored in one row per file, so if the file size is 10 GB, that row will hold a 10 GB string. There are no complex queries, just a simple search; the update mechanism will be performed by the Glue job through the Glue database.

  • If you don't have complex queries, I think NoSQL such as DynamoDB is the better fit. Also, if one row is about 10 GB in size, I think it would be difficult to store it in a relational database.
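On the size limits behind these comments: PostgreSQL caps a single field value at 1 GB, and DynamoDB caps an item at 400 KB, so a 10 GB value cannot be stored as one row or item in either without splitting it (a common alternative is to keep the object in S3 and store only its key and metadata in the database). A minimal chunking sketch, with the chunk size taken from DynamoDB's limit and the item shape invented for illustration:

```python
CHUNK_SIZE = 400 * 1024  # DynamoDB's 400 KB per-item ceiling

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a byte string into item-sized chunks, keyed by sequence number."""
    return [
        {"part": i, "body": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]

# A 10 GB file would need roughly 26,215 such items:
# 10 * 1024**3 / (400 * 1024) = 26214.4
```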
