Real-time / near real-time data reads and updates by LOBs

Hello team,

We're aiming to establish a centralized data repository to streamline the handling of large data payloads originating from different Lines of Business (LOBs).

The concept is to capture a substantial data payload from one LOB, store it in the central data repository, and allow other LOBs to read and update specific sections of this payload.

The objective is to give any LOB easy access to the most recent data updates and to avoid direct API calls or data replication between LOBs.

In short, we want to consolidate the data in a single location for convenient access by all stakeholders.

The captured payload is in JSON format, typically around 50KB (a complex object cluster containing numerous entities and sub-entities related to client information).

For the central data repository we have selected Aurora DB (DynamoDB could be an alternative), so anyone interested in specific data segments can retrieve them directly from this centralized location.

I'm seeking insights on the optimal format for storing this large payload in the central Aurora DB. Should we retain it as a JSON payload, or should we design a comprehensive Entity-Relationship Diagram (ERD) with the associated tables, relationships, and constraints and store the payload across those tables (which could be a challenging task)?

If JSON is the better option, should we store it as one big JSON payload, or split it across a few tables, e.g. the client info JSON in one table, the contact info JSON in another, the child info JSON in another, and so on?
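For concreteness, here is a very rough sketch of the two layouts we are comparing. It assumes the Aurora PostgreSQL flavor with JSONB columns and the psycopg2 driver, and every table and column name is just a placeholder rather than our actual schema:

```python
# Sketch only (not our real schema): the two candidate JSON layouts.
# Assumes the Aurora PostgreSQL flavor and the psycopg2 driver.
import psycopg2

OPTION_A_SINGLE_TABLE = """
CREATE TABLE IF NOT EXISTS client_payloads (
    client_id  TEXT PRIMARY KEY,
    payload    JSONB NOT NULL,              -- the entire ~50KB document
    updated_at TIMESTAMPTZ DEFAULT now()
);
"""

OPTION_B_PER_ENTITY_TABLES = """
CREATE TABLE IF NOT EXISTS client_info (
    client_id TEXT PRIMARY KEY,
    info      JSONB NOT NULL                -- only the client-info slice
);
CREATE TABLE IF NOT EXISTS contact_info (
    client_id TEXT REFERENCES client_info(client_id),
    contact   JSONB NOT NULL                -- only the contact-info slice
);
CREATE TABLE IF NOT EXISTS child_info (
    client_id TEXT REFERENCES client_info(client_id),
    child     JSONB NOT NULL                -- one row per child
);
"""

# Placeholder connection string; the real endpoint and credentials differ.
with psycopg2.connect("dbname=central host=aurora-endpoint") as conn:
    with conn.cursor() as cur:
        cur.execute(OPTION_A_SINGLE_TABLE)
        cur.execute(OPTION_B_PER_ENTITY_TABLES)
```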

We're particularly interested in the best approach to facilitate real-time reading and updating of the centralized data by different LOBs.

Your expertise and input on this matter would be greatly appreciated.

Thank you.

1 Answer

Good Morning Jess,

Choosing the optimal format for storing your data in a centralized Aurora DB to facilitate efficient reading and updating by different LOBs involves weighing the pros and cons of storing the data as a JSON payload versus normalizing it into relational tables. Let's break down the considerations.

Storing as JSON in a Single Table:

Pros:
- Simplicity: Storing the entire JSON payload in a single column can be straightforward and align well with the structure of your incoming data.
- Quick reads: Retrieving the entire payload in one query can be efficient for applications that need the full dataset.

Cons:
- Limited query flexibility: Extracting specific data elements within the JSON can be complex and less efficient, especially for ad hoc queries.
- Indexing challenges: It may be challenging to efficiently index and query specific fields nested within the JSON structure.
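To make these trade-offs concrete, here is a minimal sketch, assuming Aurora PostgreSQL, a JSONB column, and psycopg2 (the table and field names are purely illustrative). The full document comes back in a single query, but reaching a nested field requires JSON path operators, and keeping such lookups fast usually means adding a GIN or expression index:

```python
# Sketch: whole-document reads are trivial, but ad hoc queries inside the
# JSON need path operators and extra indexes. Assumes Aurora PostgreSQL,
# psycopg2, and a client_payloads(client_id, payload JSONB) table; the
# JSON field names are illustrative only.
import psycopg2

conn = psycopg2.connect("dbname=central host=aurora-endpoint")  # placeholder
cur = conn.cursor()

# Quick read: one query returns the whole ~50KB document for a client.
cur.execute("SELECT payload FROM client_payloads WHERE client_id = %s", ("C-123",))
row = cur.fetchone()
full_doc = row[0] if row else None  # psycopg2 returns JSONB as a Python dict

# Ad hoc query on a nested field: harder to write and to optimize than a
# plain column filter.
cur.execute(
    """
    SELECT client_id
    FROM client_payloads
    WHERE payload #>> '{contactInfo,email}' = %s
    """,
    ("jane@example.com",),
)

# Indexing nested fields takes deliberate work. A GIN index with
# jsonb_path_ops accelerates containment (@>) queries; an expression
# index targets equality filters on one known path, like the query above.
cur.execute(
    "CREATE INDEX IF NOT EXISTS idx_payload_gin "
    "ON client_payloads USING GIN (payload jsonb_path_ops)"
)
cur.execute(
    "CREATE INDEX IF NOT EXISTS idx_contact_email "
    "ON client_payloads ((payload #>> '{contactInfo,email}'))"
)
conn.commit()
```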

Normalization into Relational Tables:

Pros:
- Query flexibility: Breaking down the JSON into relational tables based on entities (e.g., client info, contact info, child info) allows for optimized querying.
- Data integrity: Enforcing constraints and relationships through the use of foreign keys ensures data integrity.
- Performance optimization: Indexing and optimizing queries become more straightforward with normalized data.

Cons:
- Increased complexity: Designing an ERD and managing the relationships between tables can be more complex upfront.
- Data retrieval overhead: Joining tables to reconstruct the original payload can add overhead, especially if the data structure changes frequently.
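As a rough illustration of both sides, again assuming Aurora PostgreSQL and psycopg2 with placeholder entity names: foreign keys give you integrity checks, but rebuilding the original document takes joins and JSON aggregation, and every additional entity adds another join:

```python
# Sketch: normalized tables with foreign keys, plus the join needed to
# rebuild a client document. Assumes Aurora PostgreSQL and psycopg2;
# the entities and columns are illustrative, not a proposed final ERD.
import psycopg2

SCHEMA = """
CREATE TABLE IF NOT EXISTS clients (
    client_id TEXT PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS contacts (
    contact_id SERIAL PRIMARY KEY,
    client_id  TEXT NOT NULL REFERENCES clients(client_id),  -- integrity via FK
    email      TEXT,
    phone      TEXT
);
"""

# Reconstructing something close to the original payload already needs a
# join and JSON aggregation for a single child entity; each additional
# entity (children, addresses, ...) adds another join.
REBUILD_DOC = """
SELECT jsonb_build_object(
    'clientId', c.client_id,
    'fullName', c.full_name,
    'contacts', COALESCE(
        jsonb_agg(to_jsonb(ct) - 'client_id')
            FILTER (WHERE ct.contact_id IS NOT NULL),
        '[]'::jsonb)
)
FROM clients c
LEFT JOIN contacts ct ON ct.client_id = c.client_id
WHERE c.client_id = %s
GROUP BY c.client_id, c.full_name;
"""

with psycopg2.connect("dbname=central host=aurora-endpoint") as conn:  # placeholder
    with conn.cursor() as cur:
        cur.execute(SCHEMA)
        cur.execute(REBUILD_DOC, ("C-123",))
        row = cur.fetchone()
        doc = row[0] if row else None  # None when the client does not exist yet
```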

Recommendation:

Given your scenario, where data payloads are around 50KB and contain complex structures, consider a hybrid approach:

Store the Full JSON Payload: Keep a primary table where the entire JSON payload is stored for quick retrieval when needed.

Normalize Specific Entities: Identify key entities within the payload (e.g., client, contact, child) and create separate relational tables for them.

Use Views or Materialized Views: Create database views or materialized views that provide a normalized representation of the data for efficient querying.

Implement Indexing Strategically: Use indexing to optimize queries on specific fields or entities within the JSON payload or relational tables.
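A minimal sketch of how these four pieces could fit together, assuming Aurora PostgreSQL and psycopg2 (every table, view, and field name below is a placeholder to adapt to your own entities):

```python
# Sketch of the hybrid approach: keep the raw document, normalize the hot
# entities, expose a relational picture through a view, and index the
# fields LOBs filter on. Assumes Aurora PostgreSQL and psycopg2; all
# names are placeholders.
import psycopg2
from psycopg2.extras import Json

HYBRID_SETUP = """
-- 1. Full payload kept for cheap whole-document reads.
CREATE TABLE IF NOT EXISTS client_payloads (
    client_id  TEXT PRIMARY KEY,
    payload    JSONB NOT NULL,
    updated_at TIMESTAMPTZ DEFAULT now()
);

-- 2. Normalized table for an entity that LOBs query and update often.
CREATE TABLE IF NOT EXISTS client_contacts (
    client_id TEXT PRIMARY KEY REFERENCES client_payloads(client_id),
    email     TEXT,
    phone     TEXT
);

-- 3. View that gives consumers a relational picture without JSON paths.
CREATE OR REPLACE VIEW v_client_summary AS
SELECT p.client_id,
       p.payload #>> '{clientInfo,fullName}' AS full_name,
       c.email,
       c.phone,
       p.updated_at
FROM client_payloads p
LEFT JOIN client_contacts c USING (client_id);

-- 4. Targeted index on a field LOBs actually filter on.
CREATE INDEX IF NOT EXISTS idx_payload_name
    ON client_payloads ((payload #>> '{clientInfo,fullName}'));
"""

def upsert_payload(conn, client_id: str, doc: dict) -> None:
    """Write the raw document and refresh the normalized slice in one transaction."""
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO client_payloads (client_id, payload, updated_at)
            VALUES (%s, %s, now())
            ON CONFLICT (client_id) DO UPDATE
              SET payload = EXCLUDED.payload, updated_at = now()
            """,
            (client_id, Json(doc)),
        )
        contact = doc.get("contactInfo", {})
        cur.execute(
            """
            INSERT INTO client_contacts (client_id, email, phone)
            VALUES (%s, %s, %s)
            ON CONFLICT (client_id) DO UPDATE
              SET email = EXCLUDED.email, phone = EXCLUDED.phone
            """,
            (client_id, contact.get("email"), contact.get("phone")),
        )
    conn.commit()

with psycopg2.connect("dbname=central host=aurora-endpoint") as conn:  # placeholder
    with conn.cursor() as cur:
        cur.execute(HYBRID_SETUP)
    conn.commit()
    upsert_payload(conn, "C-123", {
        "clientInfo": {"fullName": "Jane Doe"},
        "contactInfo": {"email": "jane@example.com", "phone": "555-0100"},
    })
```

In this sketch the write path updates the raw payload and the normalized slice in one transaction, so LOBs reading either representation see consistent data; a materialized view with a refresh schedule would be the alternative if the normalized copy is allowed to lag slightly.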

This approach combines the simplicity of storing the entire payload with the flexibility and performance gains of relational tables where necessary. It allows LOBs to access the data in the way that best suits their needs—whether that's retrieving the full payload or querying specific entities. Additionally, leveraging database features like views and indexing can further optimize performance and data accessibility.

Ismael Murillo

AWS
answered 14 days ago
  • Thank you for your answer! I would also love some insight from people who have experience with Aurora or who have worked on a similar use case and can share their thoughts and experience, rather than a GenAI response.
