Storing records for work, where should I start?

0

I have an excel file (395 KB) containing three years' worth of records from my job that I would like to put into a database. Here's how I'm envisioning it: -Each record (row) is an item, there are 3,079 records total over three spreadsheets. -I want to put each record into a table according to year, and query it based on particular attributes (columns). I've been looking at DynamoDB since it seems like it would make the most sense according what I've read in the developer guide. I already made a test table and have put an item in it, but I'm worried about pricing. Spoke to a rep who said it should fall under the free tier since the file I'm working from isn't that big. Also recommended using S3 bucket instead, but I don't know if that method will produce the results I want.

I pretty much want to store ~3,000 items in a database taken from a smaller excel file without breaking the bank. Tried using the pricing calculator, but that didn't help much.

TLDR: I want to make a database but have no idea what I'm doing. Any recommendations on what I should do?

asked a year ago217 views
1 Answer
2
Accepted Answer

First, to pricing: Look at the on-demand pricing page for DynamoDB. The first 25 GB of storage is free; and the read/write requests are very low cost so you might only end up spending cents depending on your usage. You can also see from the free tier page that this is perpetual - not just for the first year.

Second to the database design - some tips that I've learned the hard way over the years:

  • With DynamoDB the structure is completely up to you and the best way to think about it is as a de-normalised database. That is, you might end up repeating a bunch of data but it's cheaper to store and retrieve that way even if it seems inefficient compared to a relational database.
  • That brings the challenge of querying the database: Pulling a record back out based on a unique identifier is extremely fast and very inexpensive. But scanning the table for (say) "all records on a particular date" can be "expensive" in terms of the amount of data that has to be read in order to satisfy the query and that can also lead to higher charges.
  • Therefore, consider carefully how you want to retrieve the data. If you just want to bring things back based on date that's totally fine; but you'll also need some other uniqueness in there to select which record from that date you want.
  • You can always add indexes to make querying simpler; but those add to the cost of the database table. It might be more cost effective to use a different schema instead. Don't be afraid (especially with the amount of data that you have) to just "start again".

Finally, there are many guides on the internet about how to design a schema. Start here and then experiment.

profile pictureAWS
EXPERT
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions