Skip to content

Zero-ETL Aurora PostgreSQL → Redshift: Table stuck in ResyncInitiated for hours

0

A table with approximately 1,000 rows and a primary key has been stuck in ResyncInitiated state for over 2 hours after triggering a refresh via ALTER DATABASE INTEGRATION REFRESH TABLE.

The table initially failed with "Found unsupported value in column" error. After fixing the source data, we triggered a refresh. The table moved to ResyncInitiated but never progressed to Synced.

Environment:

  • Aurora PostgreSQL 17.9
  • Redshift provisioned (ra3.large)
  • Same region
  • 309 other tables synced successfully

Steps taken:

  1. Verified source data is clean — no unsupported values remain
  2. Checked svv_integration_table_state — shows ResyncInitiated, no error reason
  3. Checked sys_load_error_detail — no entries for this table
  4. Checked CloudWatch metrics for the integration — no data points returned
  5. Checked RDS and Redshift events — nothing related

Questions:

  1. What could cause a small table to remain in ResyncInitiated for hours with no errors?
  2. Is there a way to view detailed logs for the zero-ETL resync process?
  3. Can a single table resync be canceled without deleting the entire integration?
  4. Is the QUERY_ALL_STATES feature available on provisioned clusters, and what is the correct syntax?
3 Answers
3

To specifically address the "ResyncInitiated" stall, check these high-impact areas:

1. Aurora Replication Health

Zero-ETL relies on logical replication slots. Run this on your Aurora source:

SELECT * FROM pg_replication_slots WHERE slot_name LIKE '%redshift%';

If active is false or confirmed_flush_lsn isn't advancing, the bottleneck is on the source side (e.g., WAL lag or locked transactions).

2. Detailed System Views

On provisioned RA3, use these for deeper status codes than the standard SVV views:

  • SELECT * FROM SYS_ZEROETL_INTEGRATION_TABLES_STATUS;
  • SELECT * FROM SYS_ZEROETL_INTEGRATION_STATUS; (Check last_error_message).

3. The "QUERY_ALL_STATES" Syntax

This is a parameter for AWS CLI/SDK. not SQL.

  • CLI Syntax: aws redshift describe-table-statistics --provisioned-identifier <your-cluster-id> --query-all-states

4. How to "Cancel" or Reset

There is no native cancel command. To force a reset for a single table:

  1. Temporarily exclude the table via your integration filter.

  2. Wait for it to be removed from Redshift.

  3. Re-include it to trigger a fresh handshake.

5. Lock Contention

Check for exclusive locks on the Aurora source table. Even a small table will hang in ResyncInitiated indefinitely if the Zero-ETL process cannot acquire a lock to start the initial snapshot.

EXPERT
answered a month ago
0

Thanks for the detailed response. We tried the system views:

  • SYS_ZEROETL_INTEGRATION_STATUS doesn't exist on our provisioned cluster
  • sys_integration_table_state_change and sys_integration_activity do exist and show state transitions, but only the final state — no insight into what's happening during the 3-hour gap
    between ResyncInitiated and Synced

The table eventually synced after ~3 hours, but our core concerns remain:

  1. No error skipping — one bad row (e.g., an infinity timestamp) fails the entire table. There's no equivalent to DMS's LOG_ERROR or COPY's MAXERROR. In a production environment where we
    don't control all data inputs, this is a major risk.
  2. No resync visibility — during the 3 hours, CloudWatch had zero data points for the integration, sys_load_error_detail was empty, and no events appeared in RDS or Redshift. We had no way to know if it was progressing, stuck, or failing silently.
  3. No granular control — can't cancel a single table resync, can't modify the integration filter while it's syncing, can't skip a table that's blocking. The only nuclear option is deleting and recreating the entire integration.
  4. Resync duration — 3 hours for a 1,000-row table suggests the resync overhead is fixed and independent of table size. Is this expected?

We're coming from DMS CDC which had different problems (silent data loss, stuck checkpoints), but at least offered configurability. Zero-ETL feels like a black box — great when it works,
impossible to troubleshoot when it doesn't.

answered a month ago
EXPERT
reviewed a month ago
  • hm, it’s fair to say that Zero-ETL is currently a 'happy path' tool. As soon as your data quality fluctuates or you need tight SLAs for resyncs, the control provided by DMS or Glue remains indispensable for now.

0

When a table is stuck in ResyncInitiated state during a zero-ETL integration refresh, here are some insights based on your situation:

What could cause the delay:

While the sources don't specifically address ResyncInitiated state delays, there are a few factors that could contribute to this issue. First, ensure that your Amazon Redshift provisioned cluster is not paused, as this would prevent any data replication from proceeding. Additionally, verify that all required database parameters in your Aurora PostgreSQL source are correctly configured - if parameters were recently modified, you may need to reboot the DB cluster for changes to take effect.

Viewing detailed logs:

Unfortunately, the available documentation doesn't provide information about accessing detailed logs specifically for the zero-ETL resync process beyond what's visible in the svv_integration_table_state system view and sys_load_error_detail tables that you've already checked.

Canceling a single table resync:

The documentation doesn't describe a method to cancel an individual table resync operation without affecting the entire integration. The resynchronization process is designed to proceed automatically without manual intervention once triggered.

Alternative approaches to consider:

Since you've already verified the source data is clean, you might try running the ALTER DATABASE INTEGRATION REFRESH TABLES command again for this specific table. If the table continues to remain stuck, you may need to consider whether any recent DDL operations on the source table could be affecting the resync process.

It's worth noting that for Amazon SageMaker AI lakehouse zero-ETL integrations, if issues arise that cannot be resolved, the only option is to delete the integration and create a new one, as zero-ETL integrations do not support refresh or resync operations in that context. However, this is a last resort option.

If the issue persists after these checks, you may need to contact AWS Support for deeper investigation into the integration's internal state.
Sources
Troubleshooting zero-ETL integrations - Amazon Redshift
Troubleshooting Aurora zero-ETL integrations - Amazon Aurora

answered a month ago
EXPERT
reviewed a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.