Keep schema across environments


I need to keep schemas consistent across environments, and I am after a recommendation on best practice. I understand Timestream is schemaless: the schema is created dynamically from the measures we write to the engine. This is all good, until it isn't. When I am evolving my application, sometimes data arrives first in my non-production environments. The behaviour of the application should be the same across environments: if production has no data yet, the queries should simply return no rows and code execution ends. Instead, what I get in production is errors of this type:

failed to run timestream query: operation error Timestream Query: 
Query, https response error StatusCode: 400, RequestID: U3NQ24QTXMOUYPQGDWHJXPS7QU, 
ValidationException: line 26:11: Column 'new_column_with_no_value_in_prod' does not exist

I am thinking adding a step in my pipeline that ensures to write a dummy record, to ensure I have the same schema across environments. Here I have plenty of options how to achieve this. I am wondering though, is there any recommended approach? Am I missing something?
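One way to implement that pipeline step is a small script that writes a single seed record per new measure, so the column exists in every environment before any query references it. Below is a minimal sketch using boto3; the database, table, dimension, and measure names are placeholders, not anything from the thread:

```python
import time

# Placeholder names -- substitute your own database, table, and new measures.
DATABASE = "my_database"
TABLE = "my_table"
NEW_MEASURES = ["new_column_with_no_value_in_prod"]


def build_seed_record(measure_name: str, now_ms: int) -> dict:
    """Build one dummy record whose write forces Timestream to create the column.

    The 'seed' dimension tags the record so real queries can filter it out later.
    """
    return {
        "Dimensions": [{"Name": "seed", "Value": "schema-seed"}],
        "MeasureName": measure_name,
        "MeasureValue": "0",
        "MeasureValueType": "DOUBLE",
        "Time": str(now_ms),
        "TimeUnit": "MILLISECONDS",
    }


def seed_schema(measures):
    """Write one seed record per measure via the Timestream Write API."""
    import boto3  # imported lazily so build_seed_record stays testable offline

    now_ms = int(time.time() * 1000)
    records = [build_seed_record(m, now_ms) for m in measures]
    client = boto3.client("timestream-write")
    client.write_records(DatabaseName=DATABASE, TableName=TABLE, Records=records)


if __name__ == "__main__":
    seed_schema(NEW_MEASURES)
```

Because the seed records carry a dedicated dimension, application queries can exclude them (or you can simply ignore them, since the application already tolerates dummy rows).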

asked 3 months ago · 123 views
1 Answer

Hi,

You may want to use the same strategy as the one used in this blog post. The authors create the schema for reasons other than yours, but the method they use remains applicable to your use case.

See https://aws.amazon.com/blogs/database/store-and-analyze-time-series-data-with-multi-measure-records-magnetic-storage-writes-and-scheduled-queries-in-amazon-timestream/

They provide code samples that you can reuse and adapt to your situation.

Best,

Didier

AWS
EXPERT
answered 3 months ago
  • Thank you Didier,

    Just to be clear I understand your recommendation here:

    Based on what I see in that post, the strategy is to write data in order to ensure the schema exists. They have a Python script, invoked from run.sh, that populates some data first.

    In other words, if I want to handle this in my pipeline, I would also have a step in my deployment to production, either before or after the deployment itself, that triggers a dummy insertion of data to ensure the schema has been created?

    My problem is that not all changes I introduce should immediately have an effect in production, because not all the measures we collect are readily available there yet. However, as I explained previously, absence of data should be supported by the application: an empty result set is a scenario my application can handle.

    So, is the recommendation here:

    • Always ensure there is data in production for every measure, even if that data is just dummy data.

    Is that correct?
