Keep schema across environments


I need to keep schemas consistent across environments and I am after a recommendation on best practice. I understand Timestream is schemaless: the schema is created dynamically from the metrics we write to the engine. This is all good until it isn't. When I am evolving my application, data sometimes reaches my non-production environments first. The behaviour of the application should be the same across environments: if production has no data, the queries should simply return no rows and code execution should end. Instead, what I get in production are errors of this type:

failed to run timestream query: operation error Timestream Query: 
Query, https response error StatusCode: 400, RequestID: U3NQ24QTXMOUYPQGDWHJXPS7QU, 
ValidationException: line 26:11: Column 'new_column_with_no_value_in_prod' does not exist

I am thinking of adding a step to my pipeline that writes a dummy record, to guarantee the same schema across environments. I have plenty of options for achieving this. I am wondering, though: is there a recommended approach? Am I missing something?
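For what it's worth, the dummy-record pipeline step described above could look roughly like this. This is only a sketch under assumptions: the `schema-seed` dimension, the helper names, and the idea of tagging seed rows are all made up for illustration; only the `write_records` call shape comes from the boto3 Timestream Write API.

```python
import time


def build_seed_record(measure_name, measure_type="DOUBLE", value="0"):
    """Build a single dummy record whose only purpose is to make
    Timestream materialize this measure/column in the environment."""
    return {
        "Dimensions": [
            # Tag the record so real queries can filter it out later.
            {"Name": "origin", "Value": "schema-seed"},
        ],
        "MeasureName": measure_name,
        "MeasureValue": value,
        "MeasureValueType": measure_type,
        "Time": str(int(time.time() * 1000)),
        "TimeUnit": "MILLISECONDS",
    }


def seed_schema(database, table, measure_names):
    """Hypothetical pipeline step: write one dummy record per new
    measure so production's schema matches non-production."""
    import boto3  # imported here so the record builder stays dependency-free

    client = boto3.client("timestream-write")
    client.write_records(
        DatabaseName=database,
        TableName=table,
        Records=[build_seed_record(m) for m in measure_names],
    )
```

Real queries could then exclude the seed rows with something like `WHERE origin <> 'schema-seed'` (or whatever dimension name you pick), so the dummy data never pollutes results.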

Asked 3 months ago · Viewed 129 times
1 Answer

Hi,

You may want to use the same strategy as the one used in this blog post. They create the schema for reasons other than yours, but the method they use remains applicable to your use case.

See https://aws.amazon.com/blogs/database/store-and-analyze-time-series-data-with-multi-measure-records-magnetic-storage-writes-and-scheduled-queries-in-amazon-timestream/

They provide code samples that you can reuse and adapt to your situation.

Best,

Didier

AWS
EXPERT
Answered 3 months ago
  • Thank you Didier,

    Just so I am clear on your recommendation here:

    Based on what I see in that post, the strategy is: write data to ensure the schema exists. They have a Python script, triggered by run.sh, that populates some data first.

    In other words, if I want to handle this in my pipeline, I would add a step to my production deployment, either before or after the deployment itself, that triggers a dummy data insertion to ensure the schema has been created?

    My problem is that not all changes I introduce should immediately have an effect in production, since not all the measures we collect are always readily available there. However, as I explained previously, the application should tolerate absence of data: absence of data is just an empty result set, and my application can handle that scenario.

    So, is the recommendation here:

    • Ensure production always has data, even if that data is just dummy data.

    Is that correct?
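Since the application already tolerates empty result sets, an alternative to seeding dummy data would be to treat this specific ValidationException as "no data" at query time. A rough sketch under assumptions: the helper names are made up, and the error inspection follows botocore's `ClientError` response shape.

```python
def is_missing_column_error(error_code, message):
    """Heuristic: the 'Column ... does not exist' ValidationException
    that Timestream raises when a column has never been written."""
    return error_code == "ValidationException" and "does not exist" in message


def run_query_or_empty(query_string):
    """Hypothetical wrapper: run a Timestream query, returning [] when
    the only problem is a column that exists elsewhere but not here."""
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("timestream-query")
    try:
        rows = []
        for page in client.get_paginator("query").paginate(QueryString=query_string):
            rows.extend(page["Rows"])
        return rows
    except ClientError as err:
        code = err.response["Error"]["Code"]
        msg = err.response["Error"]["Message"]
        if is_missing_column_error(code, msg):
            return []  # behave exactly as if the query matched no rows
        raise
```

The trade-off: swallowing this error could mask a genuine typo in a column name, whereas the dummy-record approach keeps queries strict. Which fits better depends on how much you trust your query review process.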
