Keep schema across environments


I need to keep schemas consistent across environments, and I am looking for a best-practice recommendation. I understand that Timestream is schemaless: the schema is created dynamically from the metrics we write to the engine. This is all good, until it isn't. As my application evolves, some data arrives in my non-production environments first. The application should behave the same in every environment: if production has no data yet, the queries should return no rows and code execution should end there. Instead, I get errors in production like this:

failed to run timestream query: operation error Timestream Query: 
Query, https response error StatusCode: 400, RequestID: U3NQ24QTXMOUYPQGDWHJXPS7QU, 
ValidationException: line 26:11: Column 'new_column_with_no_value_in_prod' does not exist

I am thinking of adding a step to my pipeline that writes a dummy record, so that the schema is the same across environments. There are plenty of ways to do this. Is there a recommended approach? Am I missing something?
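As a rough illustration of the dummy-record step described above, here is a minimal Python sketch. It assumes the table uses multi-measure records and that the pipeline passes in a boto3 `timestream-write` client (whose `write_records` call takes `DatabaseName`, `TableName`, and `Records`). The function names, the `schema_seed` measure name, and the placeholder values are hypothetical choices for this example, not something from the thread or the blog post.

```python
import time
from typing import Any, Dict, Optional

# Placeholder value written for each measure type (hypothetical choice).
PLACEHOLDERS = {"DOUBLE": "0", "BIGINT": "0", "VARCHAR": "seed", "BOOLEAN": "false"}


def build_seed_record(measure_columns: Dict[str, str],
                      dimensions: Dict[str, str],
                      measure_name: str = "schema_seed",
                      now_ms: Optional[int] = None) -> Dict[str, Any]:
    """Build one multi-measure record that carries every expected column.

    measure_columns maps column name -> Timestream measure type.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        # A distinct measure name lets real queries filter the seed row out.
        "MeasureName": measure_name,
        "MeasureValueType": "MULTI",
        "MeasureValues": [
            {"Name": name, "Value": PLACEHOLDERS[type_], "Type": type_}
            for name, type_ in measure_columns.items()
        ],
        "Time": str(now_ms),
        "TimeUnit": "MILLISECONDS",
    }


def seed_schema(client: Any, database: str, table: str,
                measure_columns: Dict[str, str],
                dimensions: Dict[str, str]) -> Dict[str, Any]:
    """Write the seed record through the injected client and return it."""
    record = build_seed_record(measure_columns, dimensions)
    client.write_records(DatabaseName=database, TableName=table, Records=[record])
    return record
```

In a deployment step this might be called as `seed_schema(boto3.client("timestream-write"), "my_db", "my_table", {"new_column_with_no_value_in_prod": "DOUBLE"}, {"host": "seed"})`, where the database, table, and dimension names are placeholders. Application queries would then need to exclude rows where `measure_name = 'schema_seed'`.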

Asked 3 months ago, 129 views
1 Answer

Hi,

You may want to use the same strategy as the one in this blog post. They create the schema for different reasons than yours, but their method remains applicable to your use case.

See https://aws.amazon.com/blogs/database/store-and-analyze-time-series-data-with-multi-measure-records-magnetic-storage-writes-and-scheduled-queries-in-amazon-timestream/

They provide code samples that you can reuse and adapt to your situation.

Best,

Didier

AWS Expert
Answered 3 months ago
  • Thank you Didier,

    Just to be sure I understand your recommendation:

    Based on what I see in that post, the strategy is to write data in order to establish the schema. They have a Python script that is run to populate some data first (run.sh).

    In other words, if I want to handle this in my pipeline, I would add a step to my production deployment, either before or after the deployment itself, that triggers a dummy insertion of data to ensure the schema has been created?

    My problem is that not every change I introduce should take effect in production immediately, because not all the measures we collect are readily available in production. However, as I explained previously, the application should support the absence of data: it is just an empty result set, and my application can handle that scenario.

    So, is the recommendation here:

    • Always ensure there is data in production, even if that data is just dummy data.

    Is that correct?
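One alternative to seeding dummy data, sketched here in Python under stated assumptions, is to have the application treat the "column does not exist" validation error as an empty result set. The sketch relies only on the error text quoted in the question. `run_query` is a hypothetical callable standing in for whatever function executes the Timestream query and raises on failure, and both helper names are invented for this example.

```python
import re
from typing import Callable, List, Optional

# Matches the message quoted in the question:
#   Column 'new_column_with_no_value_in_prod' does not exist
_MISSING_COLUMN = re.compile(r"Column '([^']+)' does not exist")


def missing_column(error_message: str) -> Optional[str]:
    """Return the column name if the message reports a missing column."""
    match = _MISSING_COLUMN.search(error_message)
    return match.group(1) if match else None


def query_or_empty(run_query: Callable[[str], List[dict]], sql: str) -> List[dict]:
    """Run the query; map a missing-column error to an empty result set."""
    try:
        return run_query(sql)
    except Exception as exc:  # assumption: the error text is in str(exc)
        if missing_column(str(exc)) is not None:
            return []
        raise  # every other failure still surfaces
```

The trade-off is that a genuinely misspelled column name would also be swallowed, so logging the name returned by `missing_column` is probably worthwhile if this route is taken.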
