RDS MySQL to Iceberg table in S3: CDC using Firehose

I tried the new feature for CDC replication from RDS MySQL to an Iceberg table in S3, following the AWS documentation. I set up a private RDS MySQL instance, created a parameter group with the modified properties and attached it to the instance, and increased the retention period to 30 hours. I then created a target group with the IPv4 address of the RDS instance, created an NLB with that target group as its target, and created a VPC endpoint service backed by that NLB. On the endpoint service I modified the acceptance setting and added Firehose as an allowed principal. Finally, I configured Firehose, using AWS Secrets Manager for the RDS credentials. I connect to the RDS instance through an EC2 jump host. I am still not getting any data in Firehose, and there are no errors either.

2 Answers
I understand you're trying to set up change data capture (CDC) from an RDS MySQL database to Apache Iceberg tables in S3 using Amazon Data Firehose, and you're encountering issues. Let's review the process and potential areas to check:

  1. Database Configuration: You've set up the RDS MySQL instance with a parameter group that includes the properties needed for CDC. Increasing the binlog retention period to 30 hours is also a good step.

  2. Network Configuration: You've created a target group with the RDS instance's IPv4 address, set up a Network Load Balancer (NLB) with this target group, and created a VPC endpoint service using the NLB. This setup is intended to allow Firehose to connect to your RDS instance securely.

  3. Firehose Configuration: You've configured Firehose and used AWS Secrets Manager for RDS credentials, which is the recommended approach for secure credential management.

  4. Connectivity: You mentioned using an EC2 jump host to connect to the RDS instance, which suggests your RDS is in a private subnet.

Given that you're not seeing any errors but also not getting data in Firehose, here are some areas to investigate:

  1. Verify that the Firehose delivery stream is correctly configured to use the VPC endpoint service you created.

  2. Check that the VPC endpoint service is properly associated with your Firehose delivery stream.

  3. Ensure that the necessary permissions are in place for Firehose to access the RDS instance, Secrets Manager, and S3.

  4. Confirm that the database user specified in Secrets Manager has the required permissions for CDC operations on the RDS instance.

  5. Verify that the target Iceberg tables in S3 are properly set up and Firehose has the necessary permissions to write to them.

  6. Check Firehose metrics and logs for any indications of connection issues or data processing problems.

  7. Ensure that your RDS instance is actually generating CDC events. You might want to make some test changes to the database to trigger CDC events.

  8. Verify that the VPC endpoint service's acceptance settings allow Firehose to establish a connection.
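For the permissions check in item 3, it can help to write the Firehose role's policy out explicitly and compare it with what is actually attached. The following is only a sketch: the function name, statement IDs, and all ARNs are placeholders invented for illustration, and the action list is an assumption about what a Secrets Manager + S3 + Glue setup typically needs, not the authoritative list from the Firehose documentation.

```python
import json


def build_firehose_role_policy(secret_arn, bucket_name, glue_resource_arns):
    """Return an IAM policy document (as a dict) for the Firehose role.

    Hypothetical helper: the actions below are an assumed minimal set,
    so compare them with the current Firehose documentation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets Firehose read the database credentials.
                "Sid": "ReadDatabaseSecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],
            },
            {
                # Lets Firehose write data files to the bucket.
                "Sid": "WriteToDestinationBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:PutObject",
                    "s3:DeleteObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            },
            {
                # Lets Firehose look up and update the Iceberg table metadata.
                "Sid": "UseGlueCatalog",
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:UpdateTable"],
                "Resource": list(glue_resource_arns),
            },
        ],
    }


# Placeholder values; substitute your own account, Region, and resource names.
example = build_firehose_role_policy(
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-rds-secret",
    "example-iceberg-bucket",
    ["arn:aws:glue:us-east-1:111122223333:catalog"],
)
print(json.dumps(example, indent=2))
```

If the secret is encrypted with a customer managed KMS key, the role would likely also need permission to decrypt with that key.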

If after checking these areas you still don't see data flowing, you may want to engage AWS support for further troubleshooting, as they can provide more detailed diagnostics on the Firehose service side.
Sources
Replicate changes from databases to Apache Iceberg tables using Amazon Data Firehose (in preview) | AWS News Blog
Replicate database changes to Apache Iceberg Tables with Amazon Data Firehose - Amazon Data Firehose
Amazon Data Firehose supports continuous replication of database changes to Apache Iceberg Tables in Amazon S3 - AWS

answered a year ago
Hi Akshat,

You're definitely on the right path — setting up CDC from RDS MySQL to Iceberg via Amazon Data Firehose involves several moving parts, and it's great that you've followed the full chain including Secrets Manager, NLB, VPC endpoint, and private connectivity.

Here are some specific troubleshooting areas to help you identify where things might be stalling:

Things to Double-Check:

  1. Is the Source Generating CDC Events? Confirm that binary logging is enabled on your RDS MySQL instance (binlog_format = ROW, binlog_row_image = FULL, etc.).

Make sample insert/update/delete operations on the source table to trigger CDC.

Ensure the tables you're modifying are not excluded by the Firehose source table filters.

  2. Firehose Delivery Stream Settings: Check that your VPC configuration in Firehose points to the correct VPC endpoint service backed by your NLB.

Make sure the IAM role used by Firehose has permissions to:

Read from AWS Secrets Manager

Connect to your RDS MySQL (via NLB)

Write to the destination S3 bucket

  3. VPC Endpoint Service Acceptance: Confirm the VPC endpoint service has accepted the connection from Firehose.

Ensure Firehose is listed as an allowed principal on the endpoint service.

  4. Monitoring & Logs: Firehose may not show errors unless logging is enabled.

Go to Monitoring > CloudWatch Logs for your delivery stream and check for failures or silent drops.

Enable Amazon Data Firehose extended S3 logging to a separate bucket temporarily.

  5. Target S3/Iceberg Table Setup: Make sure your Iceberg tables are properly cataloged (e.g., using AWS Glue) and the Firehose destination format matches.

Firehose needs write access to S3 and your Glue catalog, if applicable.
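For the first check in the list (is the source generating CDC events), here is a small sketch that validates the binlog settings once you have read them from the database. It assumes you collect the values yourself from the jump host, for example with `SHOW VARIABLES LIKE 'binlog%';` and `CALL mysql.rds_show_configuration;`. The function name is hypothetical, and the 30-hour threshold is simply the value from the question, not a confirmed requirement.

```python
# Expected values taken from the settings named in this answer.
REQUIRED_VARIABLES = {"binlog_format": "ROW", "binlog_row_image": "FULL"}

# Assumption: 30 hours is the retention value used in the question.
MIN_RETENTION_HOURS = 30


def check_binlog_settings(variables, retention_hours):
    """Return a list of human-readable problems; an empty list means OK.

    variables: dict of MySQL variable name -> value, e.g. copied from
               the output of SHOW VARIABLES LIKE 'binlog%'.
    retention_hours: the 'binlog retention hours' value reported by
               CALL mysql.rds_show_configuration, or None if it is NULL.
    """
    problems = []
    for name, expected in REQUIRED_VARIABLES.items():
        actual = str(variables.get(name, "")).upper()
        if actual != expected:
            problems.append(f"{name} is {actual or 'unset'!r}, expected {expected!r}")
    if retention_hours is None:
        # On RDS for MySQL, NULL means binary logs are not retained.
        problems.append("binlog retention hours is NULL, so binlogs are not retained")
    elif retention_hours < MIN_RETENTION_HOURS:
        problems.append(
            f"binlog retention hours is {retention_hours}, "
            f"expected at least {MIN_RETENTION_HOURS}"
        )
    return problems


# Example with made-up values: MIXED format and no retention configured.
for problem in check_binlog_settings({"binlog_format": "MIXED"}, None):
    print(problem)
```

If the retention value turns out to be NULL, `CALL mysql.rds_set_configuration('binlog retention hours', 30);` sets it on RDS for MySQL.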

Tips:

Latency can occur: by default Firehose buffers records before pushing to S3 (60s / 1MB).

You can force flushing during tests by lowering the buffer size temporarily.

Test Firehose by pointing it to a public MySQL source (or a less restricted env) to isolate the issue.
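To help isolate a network problem, you can also run a plain TCP check from the EC2 jump host against both the RDS endpoint and the NLB DNS name on the MySQL port. This is a minimal sketch using only the Python standard library. The function name and the hostnames in the usage comment are placeholders, and a successful result only proves the path from the jump host, not the path Firehose takes through the endpoint service.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Usage from the jump host (placeholder hostnames, replace with your own):
#   can_connect("example-db.abc123.us-east-1.rds.amazonaws.com", 3306)
#   can_connect("example-nlb-0123456789.elb.us-east-1.amazonaws.com", 3306)
```

If the RDS endpoint is reachable but the NLB is not, the target group health check and the security group rules on the RDS instance are reasonable places to look next.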

If after checking all this you're still not seeing data flow, I recommend opening a support case with AWS so they can look into the internal diagnostics for your Firehose stream. Since this CDC → Iceberg feature is in preview, it's also possible you're hitting a silent edge case.

Let me know if you’d like help validating the IAM role or Firehose config in detail!

answered a year ago
