How are Neptune I/O's calculated for billing purposes?

Question

I've been experimenting with Neptune for the past month, and my usage has been pretty minimal. I've probably only got about a dozen nodes in the database, and I've been running some queries through the Gremlin Console and through AWS Lambda functions just to familiarize myself with how things work. I'm the only person who has access to this account.
  
I just got my first month's bill, and it says that I've had 797,681 I/O's, which seems impossible! I'm not worried about the $0.16 charge for that this month, but I'm pretty concerned about how this might scale.  
  
I've only had one Neptune cluster running for a total of 124.266 hours this month. I'd estimate that for as much as 25% of that time I wasn't interacting with the cluster at all. When I was interacting with it, I'd estimate I was submitting about 100 queries an hour at the very most, and often far fewer. Yet with 797,681 I/O's on my bill, and assuming one I/O per query, that works out to an average of 6,419 queries per hour!
  
Am I missing something here? How does Amazon calculate I/O's? I had assumed that a single, multi-step query would count as a single I/O, but might it actually count as hundreds of I/O's, given that there are multiple steps and each step may need to touch multiple quads to find the desired information? If that's the case, it seems like the cost has the potential to get out of hand quickly. If someone could clarify how I/O's are calculated, I'd appreciate it!
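For reference, the arithmetic in the question can be checked with a few lines of Python. This is only a sketch using the figures quoted above; the per-million rate it prints is inferred from this one (rounded) bill and is an assumption, not a published price.

```python
# Figures quoted in the question.
total_ios = 797_681        # billed I/O's for the month
cluster_hours = 124.266    # hours the cluster was running
charge_usd = 0.16          # I/O charge on the bill (rounded to cents)

# Average I/O's per hour of cluster uptime.
ios_per_hour = total_ios / cluster_hours

# Rate implied by this bill alone (assumption: billing is linear in I/O count).
implied_rate_per_million = charge_usd / (total_ios / 1_000_000)

print(f"{ios_per_hour:,.0f} I/O's per hour")
print(f"${implied_rate_per_million:.2f} per million I/O's (implied)")
```

At that implied rate, the charge grows linearly with the I/O count, so the useful number to watch is how many I/O's a workload generates, not how many queries it sends.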

Accepted Answer

IOs occur whenever the database needs to fetch data from the underlying storage layer; in addition, a small number of IOs are caused by periodic pings. The majority of IOs typically happen (a) during data loading and (b) when querying a database that does not fit entirely into main memory. Conversely, querying is free of IO cost if your data (or the portion of the data touched during querying) fits into the portion of memory that is dedicated to buffering data.
  
CloudWatch offers two metrics, VolumeReadIOPs and VolumeWriteIOPs, that let you understand and monitor IO cost. If you're seeing consistently high IO during query execution, it may be beneficial to switch to a larger instance type (but in your case, a total of <1B IOs per month seems fairly low). I would recommend monitoring these metrics as you develop and run benchmarks for your application.
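As a rough illustration of monitoring these metrics from code, here is a minimal sketch using boto3's CloudWatch client. It assumes the metrics live in the AWS/Neptune namespace under the DBClusterIdentifier dimension and are spelled VolumeReadIOPs and VolumeWriteIOPs, which is how I understand CloudWatch lists them; please confirm against your own console. The helper names and the cluster ID in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed namespace and metric names; verify them in the CloudWatch console.
NAMESPACE = "AWS/Neptune"
IO_METRICS = ("VolumeReadIOPs", "VolumeWriteIOPs")


def build_request(cluster_id, metric_name, start, end, period_seconds=3600):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Sum"],
    }


def total_from_datapoints(datapoints):
    """Add up the 'Sum' statistic across all returned datapoints."""
    return sum(point["Sum"] for point in datapoints)


def fetch_io_totals(cluster_id, days=7):
    """Return total read and write IOs for the last `days` days."""
    import boto3  # imported lazily so the helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    totals = {}
    for metric_name in IO_METRICS:
        request = build_request(cluster_id, metric_name, start, end)
        response = cloudwatch.get_metric_statistics(**request)
        totals[metric_name] = total_from_datapoints(response["Datapoints"])
    return totals
```

With AWS credentials configured, calling `fetch_io_totals("my-neptune-cluster")` (a placeholder ID) should return a dictionary with one total per metric, which can be compared before and after a benchmark run to see which queries drive IO.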
  
If you would like help looking in more detail at the queries that are causing IOs, please reach out to our support team with your cluster ID. We'd be happy to help.
