Better control over storage tiering

I had another thread going about storing data in SiteWise in a way that lets me query across the hot and cold storage tiers a little more seamlessly. A few ideas came up. One of them was to create a shadow copy by having property change notifications published to IoT Core, then having an IoT Core rule action store the data somewhere else. I think if I did it that way I could disable cold storage entirely and just give hot storage a long enough retention to cover live views and dashboards. Then I could store the data for analytics in something like S3, but I don't know why I'd really do that; I think it probably makes more sense to just stream to Timestream. What I lose by all of this is the connection to the models. What I gain is that the cold data is in a real time-series database, rather than something that's mainly usable only through something like Athena (or a pile of custom code).
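To make the "shadow" idea a bit more concrete, here is a minimal sketch of how one property change notification could be flattened into Timestream-style records. It assumes the notification carries a `payload` with `assetId`, `propertyId`, and a `values` list, and that the output follows the record shape used by Timestream's WriteRecords call; the function name and the choice of dimensions are illustrative, not something from the thread. Keeping the asset and property IDs as dimensions is one way to hold on to at least part of the link back to the models.

```python
from typing import Any


def notification_to_timestream_records(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one property change notification into Timestream-style records.

    Assumed input shape (verify against the notifications you actually receive):
    {"payload": {"assetId": ..., "propertyId": ..., "values": [
        {"timestamp": {"timeInSeconds": ..., "offsetInNanos": ...},
         "quality": "GOOD", "value": {"doubleValue": ...}}]}}
    """
    payload = message["payload"]
    # Keep the asset/property IDs so rows can be joined back to the asset model later.
    base_dimensions = [
        {"Name": "assetId", "Value": payload["assetId"]},
        {"Name": "propertyId", "Value": payload["propertyId"]},
    ]

    records = []
    for entry in payload["values"]:
        value = entry["value"]
        # This sketch only handles numeric values; anything else is skipped.
        if "doubleValue" in value:
            number = float(value["doubleValue"])
        elif "integerValue" in value:
            number = float(value["integerValue"])
        else:
            continue

        ts = entry["timestamp"]
        nanos = int(ts["timeInSeconds"]) * 1_000_000_000 + int(ts.get("offsetInNanos", 0))

        records.append({
            "Dimensions": base_dimensions
            + [{"Name": "quality", "Value": entry.get("quality", "GOOD")}],
            "MeasureName": "value",
            "MeasureValue": str(number),
            "MeasureValueType": "DOUBLE",
            "Time": str(nanos),
            "TimeUnit": "NANOSECONDS",
        })
    return records
```

The returned list could then be handed to whatever writes to Timestream (for example from a Lambda behind the IoT rule); that part is left out here on purpose.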

What I'm wondering, though, is: why not do this in reverse instead? In other words, have data ingested through IoT Core, then have an IoT Core rule action write it to SiteWise? I don't think you pay an API fee on SiteWise when you do it this way. The message ingestion rate (per message) is $1/1M, which is the same. You do pay for the action, but you can then set a second action so that incoming data gets written both to SiteWise and to Timestream. I'm trying to figure out if there's a big cost driver and I don't really see it. If SiteWise is writing to IoT Core, then to process that you still need to pay IoT Core charges for the rule and action invocation. I feel like it's a wash cost-wise unless your volumes are really high.
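As a rough illustration of the "reverse" setup, the sketch below builds a single IoT rule definition with two actions, one writing to SiteWise and one to Timestream. The dictionary follows the shape of the `topicRulePayload` argument that boto3's `create_topic_rule` accepts, but nothing is sent to AWS here. The topic layout (`plant/<device>/telemetry`), the `value` field in the device message, the property alias, and every name passed in are assumptions made for the example.

```python
from typing import Any


def build_dual_write_rule(
    topic_filter: str,
    property_alias: str,
    sitewise_role_arn: str,
    timestream_role_arn: str,
    database: str,
    table: str,
) -> dict[str, Any]:
    """Return a topic rule payload that fans each message out to two stores."""
    return {
        "sql": f"SELECT * FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                # Action 1: write the reading to a SiteWise property by alias.
                "iotSiteWise": {
                    "roleArn": sitewise_role_arn,
                    "putAssetPropertyValueEntries": [
                        {
                            "propertyAlias": property_alias,
                            "propertyValues": [
                                {
                                    # timestamp() is in milliseconds; SiteWise wants seconds.
                                    "timestamp": {
                                        "timeInSeconds": "${floor(timestamp() / 1000)}"
                                    },
                                    # Assumes the device message has a numeric "value" field.
                                    "value": {"doubleValue": "${value}"},
                                }
                            ],
                        }
                    ],
                }
            },
            {
                # Action 2: write the same message to Timestream.
                "timestream": {
                    "roleArn": timestream_role_arn,
                    "databaseName": database,
                    "tableName": table,
                    # Assumes the second topic segment identifies the device.
                    "dimensions": [{"name": "device", "value": "${topic(2)}"}],
                    "timestamp": {"value": "${timestamp()}", "unit": "MILLISECONDS"},
                }
            },
        ],
    }
```

With this layout one rule evaluation triggers both actions, which is why each incoming message is billed as one rule invocation plus two action executions.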

I really like the multi-tiered approach to SiteWise. There are just two things I would improve. One is that I would like a way to query data across tiers without knowing where it is. That's really the whole reason I started thinking about this.

The other is that I would like other cold storage options, maybe even custom ones where, when data is being expired, SiteWise just invokes a Lambda and you can use custom logic to figure out where to put the data. Accessing cold storage files with Athena is great, but it's a little slow, a little pricey, and a little less flexible.

It's a bit odd to me that SiteWise doesn't just integrate with Timestream out of the box.

Thoughts?

wz2b
asked 7 months ago, 163 views
1 Answer
Accepted Answer

Hi wz2b. If your devices are connecting directly to IoT Core (as opposed to a SiteWise gateway ingesting directly to SiteWise), then I think it makes sense to use IoT rules to route data to SiteWise and to your preferred cold storage option (if you don't want to use the SiteWise cold tier).

> What I gain is that the cold data is in a real time-series database, rather than something that's mainly useable only through something like Athena (or a pile of custom code).

Fair enough, but please be aware there are lots of different ways to consume the data once it's in S3. A few examples:

Sometimes ETL is used to land the data somewhere like Redshift.

> I really like the multi-tiered approach to SiteWise. There are just two things I would improve. One is that I would like a way to query data across tiers without knowing where it is. That's really the whole reason I started thinking about this. The other is I would like other cold storage options, maybe even custom ones where when data is being expired sitewise just invokes a lambda and you can use custom logic to figure out where to put the data. Accessing cold storage files with Athena is great, but it's a little slow, a little pricey, and a little less flexible. It's a bit odd to me that SiteWise doesn't just integrate with Timestream out of the box.

I can't speak to roadmap here, but we hear you, and your patience will likely be rewarded.

Greg_B (AWS, EXPERT)
answered 7 months ago
  • Hey, thanks for the response, that's all great info. You're right, I wasn't really considering what other things someone might want to do with the data, like ETL or feeding it into other AWS services. I kind of think I can do all those things by gluing to Timestream, too, but I think I'd lose the connection back to the structure (models etc.), which is a good reason to do it this way.

    I think the main thing is figuring out how to unify the querying. If you're looking at SiteWise data using Grafana, for example, with a hot tier retention of 30 days, it works great, but when you zoom out past the retention period there's nothing there.
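One way to picture "unified" querying is a thin wrapper that splits the requested time range at the hot tier's retention boundary and asks each tier only for its part. The sketch below is just that idea in plain Python: `fetch_hot` and `fetch_cold` are hypothetical callables you would supply (for example one backed by the SiteWise API and one by Athena or Timestream), and both are assumed to treat their range as half-open, `[start, end)`, so the boundary point is not returned twice.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional

Point = tuple[datetime, float]
Fetcher = Callable[[datetime, datetime], Iterable[Point]]


def query_across_tiers(
    start: datetime,
    end: datetime,
    hot_retention: timedelta,
    fetch_hot: Fetcher,
    fetch_cold: Fetcher,
    now: Optional[datetime] = None,
) -> list[Point]:
    """Fetch [start, end) from whichever tiers cover it and merge by time."""
    now = now or datetime.now(timezone.utc)
    # Anything older than this is assumed to have left the hot tier.
    boundary = now - hot_retention

    points: list[Point] = []
    if start < boundary:
        # The part of the range that has aged out comes from cold storage.
        points.extend(fetch_cold(start, min(end, boundary)))
    if end > boundary:
        # The recent part of the range comes from the hot tier.
        points.extend(fetch_hot(max(start, boundary), end))
    return sorted(points, key=lambda p: p[0])
```

With a 30-day retention, a dashboard query for the last 60 days would turn into one cold-tier call for the older 30 days and one hot-tier call for the newer 30, so zooming out no longer lands on an empty range.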
