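Capturing Salesforce CDC events with EventBridge and processing them in Lambda to update an Iceberg or Hudi table in your data lake is a sound approach. Updating the table through Athena also suits a team made up mostly of database developers, since the work stays in SQL. Note that Athena can run UPDATE, DELETE, and MERGE INTO against Iceberg tables but can only read Hudi tables, so Iceberg is the natural fit if Athena is doing the writes.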
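As a rough sketch of the statement the Lambda could hand to Athena for an update event, assuming the target is an Iceberg table: the helper below turns the changed fields into an UPDATE that touches only those columns. The function names `sql_literal` and `build_update`, the table name `accounts`, and the key column `id` are all hypothetical, and the code only builds the SQL string; it does not call Athena.

```python
import re

# Accept only plain identifiers so column names cannot inject SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sql_literal(value):
    """Render a Python value as a SQL literal (None, bool, numbers, strings)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape embedded single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def build_update(table, key_column, record_id, changed_fields):
    """Build an UPDATE that sets only the columns present in the change event."""
    if not changed_fields:
        raise ValueError("no changed fields to apply")
    for name in (table, key_column, *changed_fields):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    assignments = ", ".join(
        f"{column} = {sql_literal(value)}"
        for column, value in changed_fields.items()
    )
    return (
        f"UPDATE {table} SET {assignments} "
        f"WHERE {key_column} = {sql_literal(record_id)}"
    )


# Hypothetical example: two fields changed on one record.
statement = build_update(
    "accounts", "id", "001xx", {"name": "O'Brien", "employees": 42}
)
```

Because only the changed columns appear in the SET clause, the rest of the stored row is left as it was, which is one way to avoid rebuilding the full record in the Lambda.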
You mentioned that overlaying the changes onto the last full record is difficult. Delta Lake is another option to consider alongside Iceberg and Hudi. With Delta Lake, you keep a table that stores the full version of each record, and its MERGE operation applies the change records to that table. A transaction log tracks every commit, so version history is maintained automatically. Be aware that Athena currently reads Delta Lake tables but does not write to them, so the merge would have to run in Spark (for example, on AWS Glue or EMR) rather than in Athena.
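Whichever table format you pick, the overlay itself is small. Here is a minimal sketch in plain Python, assuming each event has already been parsed into a change type, a dictionary of changed fields, and a list of fields that were set to null. The function name `overlay_change` and the sample record are hypothetical, and only CREATE, UPDATE, and DELETE are handled; other change types such as UNDELETE or gap events would need their own handling.

```python
def overlay_change(last_full, change_type, fields, nulled=()):
    """Return the new full record after applying one change event.

    last_full: the stored full record as a dict, or None if there is none.
    fields:    the field values carried by the event.
    nulled:    names of fields the event reports as cleared.
    Returns None when the record was deleted.
    """
    if change_type == "DELETE":
        return None
    if change_type == "CREATE":
        merged = {}
    elif change_type == "UPDATE":
        if last_full is None:
            raise KeyError("UPDATE received for a record with no stored version")
        merged = dict(last_full)  # copy so the stored record is not mutated
    else:
        raise ValueError(f"unhandled change type: {change_type}")
    merged.update(fields)  # changed values win over stored values
    for name in nulled:
        merged[name] = None  # explicitly cleared fields
    return merged


# Hypothetical example: rename an account and clear its phone number.
last = {"Id": "001", "Name": "Acme", "Phone": "555-0100"}
new = overlay_change(last, "UPDATE", {"Name": "Acme Corp"}, nulled=["Phone"])
```

Events for the same record need to be applied in the order they were committed, otherwise an older value can overwrite a newer one.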
Overall, the plan is solid. You could start with Iceberg or Hudi and move to Delta Lake if overlaying the changes proves too cumbersome. Whichever format you choose, plan for ongoing maintenance and monitoring to confirm that events are being captured correctly and that the pipeline is running smoothly.