Best AWS Database Solution for Managing Contact Data with Google Sheets Integration

Hi everyone,

I'm working on a project where we need to store and manage contact details in a database that integrates with Google Sheets via a Google Script. We have this working today, but the script checks each entry against a master spreadsheet, which causes frequent timeouts and long delivery times. We plan to replace the master sheet with a database to reduce execution time and timeouts. Our goal is to maintain a simple database that holds around 60K rows of text-based data across 4-6 columns (fields like HubSpot Ref ID, other reference IDs, Email Address, Instagram Handle). Here’s the setup and the challenges we’re trying to address:

Requirements

Database Choice: We need a database that can handle simple text data and allows querying for duplicates quickly. The initial database would store about 60K rows and might grow moderately over time.

Google Sheets Integration: Our team inputs data into Google Sheets, and we have Google Scripts to check this database for existing duplicates based on the fields above.

Performance & Scalability: The current method, querying another Google Sheet for duplicates, results in timeouts and errors, so we’re exploring a switch to AWS for greater efficiency and scalability.

Considered Solutions

The main question we are looking to get answered is which AWS option would best suit us. We’re looking at the following options and would love input on which might be best for this use case:

Amazon RDS (with MySQL): This seems like a straightforward option and provides SQL compatibility. Does RDS offer enough flexibility for fast duplicate checks from Google Sheets via Google Scripts, or would another AWS database be more efficient?

Amazon Aurora: We’ve read that Aurora offers better performance, but would it be overkill for a project of this size? Would Aurora’s added cost and features make sense for storing and querying simple contact data?

Amazon DynamoDB: Since DynamoDB is a NoSQL option, we’re curious if it might provide a more cost-effective solution. Given its high scalability, could DynamoDB handle duplicate checks efficiently, or would we face limitations with querying capabilities compared to a relational database?

Additional Questions

Free Tier Usage: For initial setup, can we start with the free tier in RDS or DynamoDB, and is it possible to let a developer set it up using a shared account? We’d like to build and test without incurring high costs until it’s fully implemented.

Further Guidance: Are there AWS resources or forums where we could get specific support on configuring this for efficient Google Sheets integration? I’ve read about AWS IQ and re:Post, but if there are other resources, please let us know.

Thanks in advance for any advice on the best AWS database solution for this type of project! We’re eager to streamline this process and avoid timeouts, so any insights into setup, efficiency, or costs would be greatly appreciated.

2 Answers

Based on your requirements and the options you're considering, here's a recommendation for your AWS database solution:

Amazon DynamoDB would likely be the best fit for your use case. It's a NoSQL database that excels at handling simple data structures and provides fast, consistent performance at any scale. For your project with 60K rows of text-based data across 4-6 columns, DynamoDB would be more than capable and cost-effective.

Key advantages of DynamoDB for your project:

  1. Scalability: DynamoDB can easily handle your current data volume and scale as needed without performance degradation.

  2. Performance: It offers single-digit millisecond latency, which is crucial for your duplicate checking operations from Google Sheets.

  3. Simplicity: As a fully managed service, you don't need to worry about database administration tasks.

  4. Cost-effectiveness: You only pay for the resources you use, making it potentially more economical than running a relational database for your relatively simple data structure.

  5. Flexible querying: While not as flexible as SQL databases, DynamoDB supports both key-value and document data models, which should be sufficient for your duplicate checking needs.

Regarding your other options:

Amazon RDS (MySQL) or Aurora would be overkill for your current requirements. These relational databases are more suitable for complex data structures and relationships, which your project doesn't seem to need. They also come with higher operational complexity and potentially higher costs.

For free tier usage, DynamoDB offers a generous free tier that includes 25 GB of storage and enough read and write capacity for most small-scale applications. This should be sufficient for your initial setup and testing. A developer can set it up in a shared account, but give them their own IAM user or role with scoped permissions instead of sharing root credentials, and follow AWS best practices for account security and access management.
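As a rough check on the storage figure, the sketch below estimates the table size for the workload in the question. The 60K row count and the 4-6 column range come from the question; the average bytes per field is an assumption for illustration, not a measurement, and per-item overhead is ignored.

```python
# Back-of-the-envelope storage estimate for the contact table.
ROWS = 60_000                  # from the question
FIELDS_PER_ROW = 6             # upper end of the "4-6 columns" in the question
AVG_BYTES_PER_FIELD = 60       # ASSUMPTION: attribute name plus a short text value
FREE_TIER_BYTES = 25 * 10**9   # the 25 GB storage allowance mentioned above


def estimated_table_bytes(rows: int, fields: int, avg_field_bytes: int) -> int:
    """Return an approximate table size in bytes (ignores per-item overhead)."""
    return rows * fields * avg_field_bytes


size = estimated_table_bytes(ROWS, FIELDS_PER_ROW, AVG_BYTES_PER_FIELD)
print(f"~{size / 10**6:.1f} MB, {size / FREE_TIER_BYTES:.3%} of the 25 GB allowance")
```

Even if the per-field assumption is off by an order of magnitude, the table stays far below the storage allowance, so storage is unlikely to be the limiting factor here.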

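To integrate DynamoDB with Google Sheets via Google Scripts, note that the AWS SDK for JavaScript targets Node.js and browsers and is not straightforward to load in Google Apps Script. A more typical approach is for the script to call an HTTPS endpoint with UrlFetchApp, for example Amazon API Gateway in front of a Lambda function that queries DynamoDB. This takes more setup than querying another Google Sheet, but it should significantly reduce lookup time and timeouts.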
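One possible integration path is a small HTTPS endpoint that the Google Script calls for each duplicate check. The sketch below is a hypothetical Lambda-style handler in Python for such an endpoint. The field names, the request shape, and the `find_duplicate` and `make_handler` helpers are illustrative assumptions, not part of any AWS API. The database access is injected as a `lookup` callable so the example runs locally; in a real deployment that callable would wrap a DynamoDB query.

```python
import json
from typing import Callable, Optional

# ASSUMPTION: fields the sheet sends for duplicate checking (names are hypothetical).
CHECKED_FIELDS = ("hubspot_ref_id", "email", "instagram_handle")

# A lookup takes (field, normalized_value) and returns the matching record or None.
Lookup = Callable[[str, str], Optional[dict]]


def normalize(value: str) -> str:
    """Trim and lowercase so ' A@B.com ' and 'a@b.com' count as the same contact."""
    return value.strip().lower()


def find_duplicate(contact: dict, lookup: Lookup) -> Optional[dict]:
    """Return the first existing record that matches any checked field."""
    for field in CHECKED_FIELDS:
        raw = contact.get(field)
        if not raw:
            continue  # skip fields the sheet row left empty
        match = lookup(field, normalize(str(raw)))
        if match is not None:
            return {"field": field, "record": match}
    return None


def make_handler(lookup: Lookup):
    """Build a Lambda-style handler(event, context) around a lookup function."""
    def handler(event, context=None):
        try:
            contact = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            contact = None
        if not isinstance(contact, dict):
            return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
        hit = find_duplicate(contact, lookup)
        return {
            "statusCode": 200,
            "body": json.dumps({"duplicate": hit is not None, "match": hit}),
        }
    return handler


# Local stand-in for the database so the sketch runs without AWS access.
_fake_db = {("email", "ana@example.com"): {"hubspot_ref_id": "123"}}
handler = make_handler(lambda field, value: _fake_db.get((field, value)))
response = handler({"body": json.dumps({"email": " Ana@Example.com "})})
print(response["body"])
```

On the Google side, the script would POST the row's fields as JSON to the endpoint and read the `duplicate` flag from the response. Normalizing values before lookup only helps if the stored values were normalized the same way when they were written.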

For further guidance and support, AWS provides extensive documentation for DynamoDB. You can also use AWS re:Post (which replaced the AWS Forums) for community support. AWS IQ is a good option if you need personalized expert help for implementation.

Remember to design your DynamoDB table structure carefully to optimize for your most frequent query patterns, particularly for duplicate checking. This might involve creating composite keys or using secondary indexes to enable efficient querying on multiple attributes.
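To make the key and index advice concrete, here is a sketch of table parameters in the shape accepted by boto3's `create_table` call. The table name, the attribute names, the choice of `hubspot_ref_id` as the partition key, and on-demand billing are all assumptions for illustration. The function only builds the parameter dictionary, so it runs without boto3 or AWS credentials.

```python
def contact_table_params(table_name: str = "Contacts") -> dict:
    """Build create_table parameters: one partition key plus one GSI per lookup field."""
    # ASSUMPTION: these are the extra fields that need their own duplicate lookup.
    indexed = ["email", "instagram_handle"]
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand, no capacity planning needed
        # Only attributes used in a key schema are declared; DynamoDB is schemaless otherwise.
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"}
            for name in ["hubspot_ref_id", *indexed]
        ],
        "KeySchema": [{"AttributeName": "hubspot_ref_id", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": f"{name}-index",
                "KeySchema": [{"AttributeName": name, "KeyType": "HASH"}],
                # KEYS_ONLY keeps the index small; enough to detect a duplicate.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
            for name in indexed
        ],
    }


params = contact_table_params()
print([index["IndexName"] for index in params["GlobalSecondaryIndexes"]])
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("dynamodb").create_table(**params)
```

A duplicate check on email would then query the email index with an equality condition on that attribute. Reads from a global secondary index are eventually consistent, so a contact written a moment ago may not yet appear in an index lookup.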

answered 24 days ago
  • Thanks for this - would be looking for someone to validate this and let us know which may be best! Super grateful for any and all help - thanks in advance!!

Hi. Based on your requirements, I agree with the previous response and would also suggest DynamoDB. However, you need to be aware of a few things:

  1. Indexes. If you want to query data in DynamoDB on multiple fields, you will need to create a secondary index for each field that is not part of the table's primary key. Each index adds storage and write cost, so this usually increases the DynamoDB bill.
  2. Data synchronization. As I understand it, your team enters data into sheets while you want to check for duplicates in the database connected to the sheet. This might lead to data loss due to multiple sources of truth. You need to have one source of truth, either a Google Sheet or a database.
  3. In general, there is no direct integration of Google Sheets with DynamoDB. You would need a custom middleware application in between.
  4. As an alternative you might want to look at other databases with more native integration with Google Sheets. Google Big

Using a SQL (relational) database might also be an option if you want to use this data in other applications, or if your data schema grows in a way that makes a relational model more efficient.
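To put a rough number on the index cost raised in point 1, the sketch below uses a simplified model of write consumption when inserting a new item: one write unit per 1 KB (rounded up) for the base table, plus one per 1 KB of projected data for each global secondary index whose key attribute is present. The item size, projected size, and index count are assumptions for illustration, and the model ignores updates that change an indexed value.

```python
import math

WRITE_UNIT_BYTES = 1024  # one write unit covers an item of up to 1 KB


def write_units_for_new_item(item_bytes: int, gsi_count: int, projected_bytes: int) -> int:
    """Estimate write units consumed when inserting one new item (simplified model)."""
    base = math.ceil(item_bytes / WRITE_UNIT_BYTES)
    per_index = math.ceil(projected_bytes / WRITE_UNIT_BYTES)
    return base + gsi_count * per_index


# ASSUMPTIONS: ~300-byte items, 2 indexes, ~100 bytes projected into each index.
per_item = write_units_for_new_item(item_bytes=300, gsi_count=2, projected_bytes=100)
initial_load = 60_000 * per_item  # 60K rows from the question
print(per_item, initial_load)
```

Under these assumptions each index adds one write unit per insert, so two indexes roughly triple the write consumption compared with an unindexed table. At this data volume the absolute numbers remain small.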

answered 14 days ago
