
Glue Schema Registry Checkpoint Definition


I have some questions about Glue Schema Registry Checkpoint.

The checkpoint definition in the AWS documentation doesn't seem to be right. In the Glue Schema Registry documentation, under the schema versioning and compatibility section, it says:

A schema version that is marked as a checkpoint is used to determine the compatibility of registering new versions of a schema.

However, I find that this is not true. Based on my experiments, when you register a new version of a schema, it always needs to be compatible with the previous version. For example,

  • v3 needs to be compatible with v2,
  • v2 needs to be compatible with v1

The documentation, however, implies that

  • v3 just needs to be compatible with v1, not v2, given that v1 is the checkpoint version.

but registering such a version gives me a schema version with a failed status, saying the schema definition is incompatible.

Another bit of documentation, on a different page, says:

If the version changed the compatibility mode, the version will be marked as a checkpoint.

This looks right to me, as I found that if users need to change the schema compatibility, they need to change the checkpoint to be the latest version.

To conclude, I wonder

  • what is its actual definition?
  • what is it used for?
  • does this affect schema validation?
asked 3 years ago, 1.1K views
1 Answer
Accepted Answer

Hi,

By default, the Glue Schema Registry checkpoint moves to the most recent version of the schema when the schema definition or the compatibility mode is changed. If you want to set the checkpoint to a specific version of the schema, you must use the CLI/SDK: the UpdateSchema API lets you specify a version number, or you can specify the latest version of the schema.
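As a rough illustration of the SDK route mentioned above, here is a minimal Python sketch that builds an UpdateSchema request through a boto3-style Glue client. The helper name `set_checkpoint` and the registry and schema names in the usage comment are hypothetical; the client is passed in so that the function itself does not depend on AWS credentials. It assumes that supplying `SchemaVersionNumber` is what moves the checkpoint, and that `Compatibility` may be left out when only the checkpoint is being reset.

```python
def set_checkpoint(glue, registry_name, schema_name, version_number=None,
                   compatibility=None):
    """Point a schema's checkpoint at a version using the UpdateSchema API.

    glue           -- a Glue client, e.g. boto3.client("glue")
    version_number -- checkpoint version; None means "use the latest version"
    compatibility  -- optional new compatibility mode, e.g. "BACKWARD"
    """
    if version_number is None:
        version_ref = {"LatestVersion": True}
    else:
        version_ref = {"VersionNumber": version_number}

    request = {
        "SchemaId": {"RegistryName": registry_name, "SchemaName": schema_name},
        "SchemaVersionNumber": version_ref,
    }
    # Only send Compatibility when the caller wants to change the mode.
    if compatibility is not None:
        request["Compatibility"] = compatibility

    return glue.update_schema(**request)


# Hypothetical usage (requires AWS credentials and an existing schema):
#   import boto3
#   glue = boto3.client("glue")
#   set_checkpoint(glue, "my-registry", "my-schema", version_number=3,
#                  compatibility="BACKWARD")
```

Passing the client as an argument keeps the request-building logic easy to check against a stub before running it on a real registry.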

When a new version of a schema is submitted to the registry, the compatibility rule applied to the schema name is used to determine whether the new version can be accepted. There are 8 compatibility modes: NONE, DISABLED, BACKWARD, BACKWARD_ALL, FORWARD, FORWARD_ALL, FULL, FULL_ALL. Here, BACKWARD compatibility implies that the new schema will be compared to the checkpoint schema. If the schemas are not compatible, the registry will reject the new version of the schema. With BACKWARD compatibility, you are only able to delete fields and add optional fields.
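To see which outcome the registry actually produces for a given definition, one option is to register the version and read back its status. The sketch below assumes a boto3-style Glue client passed in by the caller; the helper name `register_and_wait` and its polling parameters are hypothetical. It makes no claim about which earlier version the registry compares against, it only reports the status the registry returns.

```python
import time


def register_and_wait(glue, registry_name, schema_name, definition,
                      attempts=10, delay=1.0):
    """Register a new schema version and return (version_number, status).

    The status is polled while it is still "PENDING"; a rejected,
    incompatible definition is expected to end up as "FAILURE".
    """
    response = glue.register_schema_version(
        SchemaId={"RegistryName": registry_name, "SchemaName": schema_name},
        SchemaDefinition=definition,
    )
    version_id = response["SchemaVersionId"]
    status = response["Status"]

    for _ in range(attempts):
        if status != "PENDING":
            break
        time.sleep(delay)
        # Look the version up again by its id to refresh the status.
        status = glue.get_schema_version(SchemaVersionId=version_id)["Status"]

    return response["VersionNumber"], status


# Hypothetical usage (requires AWS credentials and an existing schema):
#   import boto3
#   glue = boto3.client("glue")
#   number, status = register_and_wait(glue, "my-registry", "my-schema",
#                                      new_definition_json)
```

Running this once with a definition that is compatible only with the checkpoint version, and once with one that is compatible only with the previous version, would be one way to reproduce the experiment described in the question.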

Thank you.

AWS SUPPORT ENGINEER
answered 3 years ago
AWS EXPERT
reviewed 3 years ago
  • If I'm understanding this correctly, when you register a new version of the schema, that new version should become the checkpoint. Is that correct? I have registered new versions via the web console and observed the previous version remaining tagged as the checkpoint (i.e. add v2, but v1 remains the checkpoint).

    The registry documentation (https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html) introduces checkpoints early on, but then completely ignores them. As it currently stands, the documentation states that BACKWARD compatibility is relative to the previous schema version, not the checkpoint (which is the behavior the OP and I observe). Can you clarify what is expected here?
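The observation in the comment above (v2 registered, v1 still tagged as the checkpoint) can be checked programmatically rather than through the console. This is a minimal sketch assuming a boto3-style Glue client whose GetSchema response carries `SchemaCheckpoint`, `LatestSchemaVersion`, and `Compatibility`; the helper name `checkpoint_summary` is hypothetical.

```python
def checkpoint_summary(glue, registry_name, schema_name):
    """Report where the checkpoint sits relative to the latest version."""
    schema = glue.get_schema(
        SchemaId={"RegistryName": registry_name, "SchemaName": schema_name}
    )
    checkpoint = schema["SchemaCheckpoint"]
    latest = schema["LatestSchemaVersion"]
    return {
        "compatibility": schema["Compatibility"],
        "checkpoint": checkpoint,
        "latest": latest,
        "checkpoint_is_latest": checkpoint == latest,
    }


# Hypothetical usage (requires AWS credentials and an existing schema):
#   import boto3
#   glue = boto3.client("glue")
#   print(checkpoint_summary(glue, "my-registry", "my-schema"))
```

Calling this before and after registering a new version shows whether the checkpoint moved with the registration or stayed on the earlier version.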
