DynamoDB vs Aurora
We are planning to upload approximately 1.5 lakh (150,000) documents per day to S3 and to store their metadata in a database (one row per document). We also have a search requirement on this metadata, for which we plan to use Amazon OpenSearch Service, so the database will be used as the system of record only. Which database would be the better choice for storing this volume of data: DynamoDB or Aurora?
DynamoDB can handle virtually any throughput as long as the items are well distributed across partition keys. DynamoDB has a limit of 400 KB per item, so if your document metadata exceeds this size, it is not the right option for you.
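To put the numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the uploads are spread evenly over the day (real traffic is usually burstier) and approximates item size by the length of the JSON encoding, which is not DynamoDB's exact accounting. The 400 KB limit is treated as 400 * 1024 bytes, and the helper names and sample attributes are made up for illustration.

```python
import json

DOCS_PER_DAY = 150_000            # "1.5 lakh" documents per day, from the question
SECONDS_PER_DAY = 24 * 60 * 60
ITEM_LIMIT_BYTES = 400 * 1024     # DynamoDB item size limit, assumed to mean 400 * 1024 bytes


def average_writes_per_second(docs_per_day: int = DOCS_PER_DAY) -> float:
    """Average write rate if uploads were spread evenly across the day."""
    return docs_per_day / SECONDS_PER_DAY


def approx_item_size_bytes(item: dict) -> int:
    """Rough size estimate: UTF-8 length of the compact JSON encoding.

    DynamoDB counts attribute names and values with its own rules,
    so treat this as an approximation only.
    """
    return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))


def fits_in_item(item: dict, limit: int = ITEM_LIMIT_BYTES) -> bool:
    """True if the estimated size is within the item size limit."""
    return approx_item_size_bytes(item) <= limit


if __name__ == "__main__":
    # Hypothetical metadata record; attribute names are illustrative only.
    sample = {"doc_id": "doc-0001", "s3_key": "uploads/doc-0001.pdf", "size": 48213}
    print(round(average_writes_per_second(), 2))  # 1.74
    print(fits_in_item(sample))                   # True
```

At under two writes per second on average, either database can absorb the write load; the item size check matters only if the metadata per document can grow large.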
Both DynamoDB and Aurora are primarily OLTP databases. If your use case needs high throughput and low latency on key-based access, then DynamoDB is a better option. If your use case involves joins or more complex, ad hoc SQL queries, then Aurora, as a relational engine, is a better option.
|Amazon Aurora|Amazon DynamoDB|
|---|---|
|It was released by Amazon in 2015.|It was released by Amazon in 2012.|
|It is a MySQL- and PostgreSQL-compatible cloud database service by Amazon.|It is a hosted, scalable database service by Amazon, with data stored in the Amazon cloud.|
|It provides referential integrity, so foreign keys are supported.|It does not provide referential integrity, so there are no foreign keys.|
|It uses immediate consistency to ensure consistency in a distributed system.|It uses eventual consistency and immediate consistency to ensure consistency in a distributed system.|
|Its primary database model is relational DBMS.|Its primary database models are document store and key-value store.|
|It supports server-side scripting.|It does not support server-side scripting.|
|Partitioning can be done with horizontal partitioning.|It supports sharding as its partitioning method.|
|It supports the SQL query language.|It does not support SQL directly, but offers SQL-compatible queries via the PartiQL API.|
|It supports only one replication method: master-slave replication.|It supports replication, for example across Availability Zones and, with global tables, across Regions.|
|It does not offer an API for user-defined Map/Reduce methods.|It does not offer an API for user-defined Map/Reduce methods, but these may be implemented via Amazon Elastic MapReduce.|
Thank you for the question. I just wanted to add some information about concurrency, which should help you decide which engine to use. I do not have visibility into your application, but I understand that this metadata database will serve a lot of I/O (both writes and reads).
MySQL and Aurora MySQL use an MVCC (Multi-Version Concurrency Control) mechanism [1] and implement locks [2] on tables and rows under different transaction isolation levels [3]. For example, on an Aurora MySQL database with default settings, including the default isolation level (REPEATABLE READ [3]), a transaction cannot update a row or table while another transaction is still writing to it.
A write operation requires an exclusive lock, so an earlier write transaction that has already acquired the lock blocks any later transaction that tries to acquire it on the same rows or table that its query scans. When a later transaction is blocked for longer than innodb_lock_wait_timeout [5], it aborts with an error like "Lock wait timeout exceeded; try restarting transaction".
The advantage is that data consistency is ensured; the disadvantage is that blocking can occur. For reads, the REPEATABLE READ isolation level ensures that reads are not blocked and that reads are consistent within the same transaction [3].
To avoid blocking, there are a few options: review the query logic, minimize concurrent write operations on the same records, and configure a retry mechanism on the application side to retry transactions that were aborted because of a lock wait timeout.
I have researched this for DynamoDB as well. DynamoDB has a feature called "optimistic locking" [4], which can be used to ensure that your writes are not overwritten by the writes of others, and vice versa [4]. You may refer to link [4] for more information on this feature and how to use it.
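The application-side retry mentioned above could look roughly like the following Python sketch. It is only an illustration: `LockWaitTimeout` is a hypothetical stand-in for whatever exception your MySQL driver raises for error 1205 (lock wait timeout), and `run_with_retry`, the attempt count, and the delays are assumed names and values, not part of any library.

```python
import random
import time


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for the driver error raised on MySQL error 1205."""


def run_with_retry(txn, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run txn(); on a lock wait timeout, back off and retry.

    txn is a callable that executes the whole transaction from the start,
    so that a retry does not reuse state from the aborted attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except LockWaitTimeout:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Exponential backoff with jitter, so that competing writers
            # do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    attempts = {"count": 0}

    def update_metadata():
        # Simulated transaction: blocked twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise LockWaitTimeout("Lock wait timeout exceeded; try restarting transaction")
        return "committed"

    print(run_with_retry(update_metadata))  # committed
```

Retrying only helps when the blocking is transient; if the same rows are contended constantly, reducing concurrent writes to them is the more effective fix.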
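To illustrate the idea behind optimistic locking, here is a small in-memory Python sketch. It does not call DynamoDB at all: `InMemoryTable`, `put_if_version`, and `VersionConflict` are hypothetical names that only mimic the pattern, in which each item carries a version number and a write succeeds only if the version the writer read is still the stored one. In DynamoDB the same check would typically be expressed as a conditional write on a version attribute.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the writer read."""


class InMemoryTable:
    """Toy stand-in for a table; it only illustrates the version check."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._items[key])

    def put_if_version(self, key, item, expected_version):
        """Write item only if the stored version equals expected_version.

        expected_version=None means "the item must not exist yet".
        """
        current = self._items.get(key)
        current_version = current["version"] if current else None
        if current_version != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        stored = dict(item)
        stored["version"] = (expected_version or 0) + 1
        self._items[key] = stored
        return stored


if __name__ == "__main__":
    table = InMemoryTable()
    table.put_if_version("doc-1", {"status": "uploaded"}, expected_version=None)

    # Two writers read the same version of the item.
    a = table.get("doc-1")
    b = table.get("doc-1")

    # Writer A succeeds and bumps the version.
    table.put_if_version("doc-1", {"status": "indexed"}, a["version"])

    # Writer B still holds the old version, so its write is rejected.
    try:
        table.put_if_version("doc-1", {"status": "archived"}, b["version"])
    except VersionConflict as err:
        print("conflict:", err)
```

Unlike the lock-based approach, no writer is blocked here; the losing writer has to re-read the item and decide whether to reapply its change.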
That being said, please do accept an answer if it resolves your question. That is a best practice in Q&A communities, both for other viewers and out of respect for the time others spent answering.
As always, Happy Cloud Computing.
[2] InnoDB locks: https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html
[3] MySQL isolation levels: https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html
[4] DynamoDB optimistic locking: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html
[5] innodb_lock_wait_timeout: https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout