Better control over storage tiering


I had another thread going about storing data in SiteWise in a way that lets me query across the hot and cold storage tiers a little more seamlessly. A few ideas came up. One of them was to create a shadow of the data by having property change notifications published to IoT Core, then having an IoT Core action store the data somewhere else. I think if I did it that way I could disable cold storage entirely, and just give hot storage a long enough retention to cover live views and dashboards. Then I could store the data for analytics in something like S3, but I don't know why I'd really do that. I think it probably makes more sense to just stream to Timestream. The thing I lose by all of this is the connection to the models. What I gain is that the cold data is in a real time-series database, rather than something that's mainly usable only through something like Athena (or a pile of custom code).

What I'm wondering, though, is: why not do this in reverse instead? In other words, have data ingested through IoT Core, then have an IoT Core action write it to SiteWise? I don't think you pay an API fee on SiteWise when you do it this way. The message ingestion rate is $1 per 1M messages, which is the same. You do pay for the action, but you can then set a second action so that incoming data gets written both to SiteWise and to Timestream. I'm trying to figure out if there's a big cost driver and I don't really see it. If SiteWise is writing to IoT Core, then to process that you still need to pay IoT Core charges for the rule and action invocation. I feel like it's a wash cost-wise unless your volumes are really high.
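A rough way to sanity-check the "it's a wash" intuition is to put the numbers in a small calculator. This is only a sketch: the $1 per 1M ingestion rate is the figure quoted above, while the rule and action prices below are placeholders I made up for illustration and would need to be replaced with values from the current pricing page for your region.

```python
def per_million_cost_usd(events, price_per_million):
    """Cost of `events` billable events at a flat per-million price."""
    return events / 1_000_000 * price_per_million


def fanout_cost_usd(messages, ingest_price, rule_price, action_price,
                    actions_per_message):
    """Monthly cost when every message is ingested once, triggers one rule,
    and that rule runs `actions_per_message` actions."""
    ingest = per_million_cost_usd(messages, ingest_price)
    rules = per_million_cost_usd(messages, rule_price)
    actions = per_million_cost_usd(messages * actions_per_message, action_price)
    return ingest + rules + actions


MESSAGES_PER_MONTH = 100_000_000   # assumed volume, for illustration only
INGEST_PRICE = 1.00                # $ per 1M messages, as quoted in the post
RULE_PRICE = 0.15                  # placeholder, check the pricing page
ACTION_PRICE = 0.15                # placeholder, check the pricing page

one_target = fanout_cost_usd(MESSAGES_PER_MONTH, INGEST_PRICE,
                             RULE_PRICE, ACTION_PRICE, actions_per_message=1)
two_targets = fanout_cost_usd(MESSAGES_PER_MONTH, INGEST_PRICE,
                              RULE_PRICE, ACTION_PRICE, actions_per_message=2)
print(f"one action:  ${one_target:.2f}")
print(f"two actions: ${two_targets:.2f}")
```

With those placeholder prices the second action adds only the per-action charge on top of ingestion, so the gap between one and two targets stays small relative to the ingestion cost. The downstream write costs (SiteWise, Timestream) are deliberately left out here.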

I really like the multi-tiered approach to SiteWise. There are just two things I would improve. One is that I would like a way to query data across tiers without knowing where it is. That's really the whole reason I started thinking about this.

The other is that I would like other cold storage options, maybe even custom ones where, when data is being expired, SiteWise just invokes a Lambda function and you can use custom logic to figure out where to put the data. Accessing cold storage files with Athena is great, but it's a little slow, a little pricey, and a little less flexible.

It's a bit odd to me that SiteWise doesn't just integrate with Timestream out of the box.

Thoughts?

wz2b
asked 7 months ago, 172 views
1 Answer
Accepted Answer

Hi wz2b. If your devices are connecting directly to IoT Core (as opposed to a SiteWise gateway ingesting directly to SiteWise), then I think it makes sense to use IoT rules to route data to SiteWise and to your preferred cold storage option (if you don't want to use SiteWise cold tier).
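For illustration, here is a minimal sketch of what such a rule could look like as a `topicRulePayload` with two actions, one for SiteWise and one for Timestream. Everything specific in it is an assumption rather than something from this thread: the topic layout (`factory/<device>/telemetry`), a numeric `value` field in the message payload, the property alias, and the role ARNs, database, and table names.

```python
def build_fanout_rule_payload(topic_filter, property_alias, sitewise_role_arn,
                              timestream_role_arn, database, table):
    """Build an IoT topic rule payload that writes each message to both
    SiteWise and Timestream. Assumes the payload carries a numeric `value`."""
    return {
        "sql": f"SELECT value FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {   # Action 1: write the value to a SiteWise asset property.
                "iotSiteWise": {
                    "roleArn": sitewise_role_arn,
                    "putAssetPropertyValueEntries": [{
                        "propertyAlias": property_alias,
                        "propertyValues": [{
                            "value": {"doubleValue": "${value}"},
                            "timestamp": {
                                "timeInSeconds": "${floor(timestamp() / 1E3)}"
                            },
                            "quality": "GOOD",
                        }],
                    }],
                }
            },
            {   # Action 2: write the same message to a Timestream table.
                "timestream": {
                    "roleArn": timestream_role_arn,
                    "databaseName": database,
                    "tableName": table,
                    # Assumes the device name is the 2nd topic segment.
                    "dimensions": [{"name": "device", "value": "${topic(2)}"}],
                    "timestamp": {"value": "${timestamp()}",
                                  "unit": "MILLISECONDS"},
                }
            },
        ],
    }


# All names and ARNs below are hypothetical placeholders.
payload = build_fanout_rule_payload(
    topic_filter="factory/+/telemetry",
    property_alias="/factory/line1/temperature",
    sitewise_role_arn="arn:aws:iam::123456789012:role/example-sitewise-role",
    timestream_role_arn="arn:aws:iam::123456789012:role/example-timestream-role",
    database="example_db",
    table="example_table",
)
# To create the rule (needs boto3 and credentials):
#   boto3.client("iot").create_topic_rule(
#       ruleName="example_fanout", topicRulePayload=payload)
```

A single alias is hard-coded here to keep the sketch short. In practice the alias would more likely be derived from the topic or payload with a substitution template.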

"What I gain is that the cold data is in a real time-series database, rather than something that's mainly usable only through something like Athena (or a pile of custom code)."

Fair enough, but please be aware there are lots of different ways to consume the data once it's in S3. A few examples:

Sometimes ETL is used to land the data somewhere like Redshift.

"I really like the multi-tiered approach to SiteWise. There are just two things I would improve. One is that I would like a way to query data across tiers without knowing where it is. That's really the whole reason I started thinking about this. The other is that I would like other cold storage options, maybe even custom ones where, when data is being expired, SiteWise just invokes a Lambda function and you can use custom logic to figure out where to put the data. Accessing cold storage files with Athena is great, but it's a little slow, a little pricey, and a little less flexible. It's a bit odd to me that SiteWise doesn't just integrate with Timestream out of the box."

I can't speak to roadmap here, but we hear you, and your patience will likely be rewarded.

Greg_B (AWS, Expert)
answered 7 months ago
  • Hey, thanks for the response, that's all great info. You're right, I wasn't really considering what other things someone might want to do with the data, like ETL or feeding it into other AWS services. I kind of think I can do all those things by gluing to Timestream, too, but I think I'd lose the connection back to the structure (models, etc.), which is a good reason to do it this way.

    I think the main thing is figuring out how to unify the querying. If you're looking at SiteWise data using Grafana, for example, with a hot tier retention of 30 days, it works great, but when you zoom out past the retention period there's nothing there.
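One way to picture the unified query is a thin layer that splits the requested time range at the hot-tier retention boundary and sends each piece to the right store. The sketch below only does the splitting. The function name is hypothetical, and the actual fetches (hot tier through the SiteWise API, cold data through Timestream or Athena) are left to the caller.

```python
from datetime import datetime, timedelta, timezone


def split_by_retention(start, end, now, hot_retention):
    """Split [start, end) at the hot-tier boundary (now - hot_retention).

    Returns (cold_range, hot_range). Each is a (start, end) tuple, or None
    if the requested range does not touch that tier.
    """
    if start >= end:
        raise ValueError("start must be before end")
    boundary = now - hot_retention
    cold = (start, min(end, boundary)) if start < boundary else None
    hot = (max(start, boundary), end) if end > boundary else None
    return cold, hot


# Example: a 30-day hot tier and a dashboard zoomed out to about 3 months.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
cold, hot = split_by_retention(
    start=datetime(2024, 4, 1, tzinfo=timezone.utc),
    end=now,
    now=now,
    hot_retention=timedelta(days=30),
)
# `cold` would go to the cold-storage query path, `hot` to SiteWise,
# and the two result sets would be concatenated in time order.
```

Splitting on a single boundary keeps the two result sets from overlapping, so merging them is just concatenation. A real implementation would also have to account for data that has aged past the boundary but has not yet landed in cold storage.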
