Firehose RDS Postgres CDC to Iceberg Issues

0

Hi, I'm following the instructions for the new preview feature to stream RDS Postgres CDC to Apache Iceberg tables. I've gotten as far getting my Firehose to talk to my RDS Postgres via a VPC endpoint service. However, my Cloudwatch logs is now showing me this error:

"message": "The database user configured in Firehose Delivery Stream does not have sufficient permission to read from source tables, replication, or watermark table. Database must have logical replication enabled. You need to grant the user REPLICATION privilege to add the source table to a publication, CREATE privileges on the database to add publications, SELECT privileges on the source tables to copy the initial table data, INSERT and SELECT privileges on the watermark table to add snapshot watermarks. ERROR: permission denied for schema replication", "errorCode": "DatabaseSource.MissingPermission"

I'm not a RDS admin by any means, but I did read up on the grants and what a publication is. The documentation for Firehose database connectivity specifies two stored procedure that need to be created and invoked. First sproc creates the Postgres role and grants that Firehose will use to read in the CDC changes, update the watermark table, and add the role to a group. The second sproc here add shared ownership for specific tables with Firehose, though, it's not clear to me the purpose for this and whether it's actually needed.

Given the above, I thought that would complete the DB setup, but after researching the above message, I realized that transaction logs need to be enabled per https://docs.aws.amazon.com/firehose/latest/dev/das-sql-procedure-enable.html, which seems to indicate that I need to manually create publications for the resources that I want to replicate to Iceberg? If so, this seems rather clunky to do given that in my Firehose stream definition, I had specified the tables that I want to replicate, so creating publications would mean an additional provisioning step? Knowing this feature is still in preview, happy to do this to get the data flow going, so I'm curious what SQL commands folks have issues to create the publications in addition to whatever grants are needed by the Firehose role. Note that per the Firehose docs, my source is against the primary RDS endpoint - not ideal, but I could not get the reader endpoint to work.

Thanks!

asked 17 days ago77 views
1 Answer
0

Thank you for providing such a detailed description of your issue. It seems you're encountering permission-related problems while setting up the Firehose RDS Postgres CDC to Apache Iceberg tables feature. Let's address your concerns step by step.

  1. Permissions: The error message indicates that the database user configured in your Firehose Delivery Stream lacks sufficient permissions. To resolve this, ensure that the user has the following privileges:

    • REPLICATION privilege to add source tables to a publication
    • CREATE privileges on the database to add publications
    • SELECT privileges on the source tables for initial data copy
    • INSERT and SELECT privileges on the watermark table
  2. Logical Replication: The error also mentions that logical replication must be enabled on the database. This is a crucial step for CDC to work properly.

  3. Publications: You're correct that publications need to be created for the tables you want to replicate. While it may seem redundant given that you specified tables in your Firehose stream definition, publications are a PostgreSQL feature necessary for logical replication. They define which tables should be replicated.

  4. Transaction Logs: Enabling transaction logs is indeed necessary, as they record all database changes that Firehose needs to replicate.

To create publications for your tables, you can use SQL commands like this:

CREATE PUBLICATION publication_name FOR TABLE table1, table2, table3;

Replace 'publication_name' with a meaningful name and list all the tables you want to replicate.

As for grants, ensure you've run something like:

GRANT SELECT ON ALL TABLES IN SCHEMA public TO firehose_user;
GRANT USAGE ON SCHEMA public TO firehose_user;
GRANT ALL PRIVILEGES ON DATABASE your_database TO firehose_user;
ALTER USER firehose_user WITH REPLICATION;

Replace 'firehose_user' with the actual user you've set up for Firehose, 'public' with your schema name if different, and 'your_database' with your actual database name.

While it may seem like an additional step to create publications, it's a necessary part of setting up logical replication in PostgreSQL. This approach allows for more granular control over what gets replicated, which can be beneficial in more complex scenarios.

Regarding using the primary RDS endpoint instead of the reader endpoint, this is typically recommended for CDC setups to ensure all changes are captured accurately.

Remember, as this feature is in preview, some aspects of the setup process might be refined in future releases to make it more streamlined.
Sources
Replicate database changes to Apache Iceberg Tables with Amazon Data Firehose - Amazon Data Firehose
[Solved] Connectivity Issues with Kinesis Firehose Preview Capability and RDS (CDC) | AWS re:Post
Prerequisites to use database as a source - Amazon Data Firehose

profile picture
answered 17 days ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions