
AWS Glue relationalize method?


I am following the sample provided by AWS here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html#aws-glue-programming-python-samples-legislators-writing

A few JSON files are processed by a crawler, which creates Glue Data Catalog tables. From what I can tell, these tables are then joined on their keys to create one DynamicFrame with all the data:

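from awsglue.transforms import Join

# orgs, persons and memberships are DynamicFrames read from the crawled catalog tables
l_history = Join.apply(orgs,
                       Join.apply(persons, memberships, 'id', 'person_id'),
                       'org_id', 'organization_id').drop_fields(['person_id', 'org_id'])
print("Count: ", l_history.count())
l_history.printSchema()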

In the step after that, the relationalize function is called to build a DynamicFrameCollection, so that the data can be written to a relational database such as Redshift.

dfc = l_history.relationalize("hist_root", "s3://glue-sample-target/temp-dir/")
dfc.keys()
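For illustration, here is a toy, pure-Python model of the behaviour relationalize is documented to have: struct fields are flattened into the root table and array fields are pivoted out into child tables. In the AWS sample, the extra tables come from array-valued fields in the legislator records. The function name `toy_relationalize`, the child-table naming, and the `id`/`index`/`val` columns are assumptions made for this sketch; it is not Glue's implementation and only handles one level of arrays.

```python
from typing import Any

Row = dict[str, Any]


def toy_relationalize(rows: list[Row], root: str) -> dict[str, list[Row]]:
    """Hypothetical, simplified model of relationalize (not Glue's code).

    Nested dicts (structs) are flattened into dotted column names on the root
    table; every list (array) is pivoted into a child table named
    "<root>_<column>", and the root column keeps an id pointing at those rows.
    List items are stored as-is; nested arrays are not pivoted further.
    """
    tables: dict[str, list[Row]] = {root: []}
    for row_id, row in enumerate(rows, start=1):
        flat: Row = {}
        pending = [("", row)]  # (column prefix, struct still to flatten)
        while pending:
            prefix, struct = pending.pop()
            for key, val in struct.items():
                col = prefix + key
                if isinstance(val, dict):
                    pending.append((col + ".", val))
                elif isinstance(val, list):
                    child = f"{root}_{col.replace('.', '_')}"
                    tables.setdefault(child, [])
                    for index, item in enumerate(val):
                        tables[child].append({"id": row_id, "index": index, "val": item})
                    flat[col] = row_id  # join key into the child table
                else:
                    flat[col] = val
        tables[root].append(flat)
    return tables


# Hypothetical record loosely shaped like the legislator data (one array field).
records = [
    {"name": "A. Person", "contact": {"email": "a@example.com"}, "links": ["url1", "url2"]},
]
tables = toy_relationalize(records, "hist_root")
print(sorted(tables))  # ['hist_root', 'hist_root_links']
```

The struct `contact` ends up as the column `contact.email` on the root table, while the array `links` becomes a second table. Without any array field, only the root table would be produced.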

Question: this example builds multiple tables in addition to the "hist_root" table named in the line above. I have two JSON files (samples below). I follow the same steps as the example, join the DynamicFrames on the id and product_id keys, and apply relationalize, but it gives me only one table.

I assume that, given the quantity and nature of my data, a single table with everything technically works as well. However, my goal is to push this to RDS, and I would like two tables: one called product with id and product, and another with the product details.

Is there a way to force the relationalize method to produce two different tables?

File 1

id   | product
abc  | apple
efg  | banana
....

File 2

price | description | product_id
20    |             | abc
30    | ......      | efg
asked 2 years ago
1 Answer

The relationalize() function in AWS Glue is designed to flatten a nested schema: struct fields are flattened into columns of the root table, and array fields are pivoted out into separate tables. In your case the data is already flat and contains no arrays, so relationalize() has nothing to pivot and produces only the root table.
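A quick way to see this with the sample data from the question is to join the two files in plain Python and check for array-valued columns. This is only a sketch: the descriptions were elided in the question, so they are left as `None` placeholders, and `array_columns` is a hypothetical helper that inspects top-level columns only (enough for this flat sample).

```python
from typing import Any

# Rows from the two sample files; descriptions are placeholders (elided in the question).
products = [{"id": "abc", "product": "apple"}, {"id": "efg", "product": "banana"}]
details = [
    {"price": 20, "description": None, "product_id": "abc"},
    {"price": 30, "description": None, "product_id": "efg"},
]

# Inner join on products.id == details.product_id, similar in spirit to Join.apply.
by_id = {p["id"]: p for p in products}
joined = [{**by_id[d["product_id"]], **d} for d in details if d["product_id"] in by_id]


def array_columns(rows: list[dict[str, Any]]) -> set[str]:
    """Return the top-level columns holding lists (arrays).

    relationalize pivots arrays into extra tables, so an empty result means
    only the root table is expected for data shaped like this.
    """
    return {key for row in rows for key, val in row.items() if isinstance(val, list)}


print(len(joined), array_columns(joined))  # 2 set()
```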

If you want two separate tables, one called product with id and product and another with the product details, you don't need relationalize(). Instead, create two separate DynamicFrames, one per table, and write each of them separately to your RDS instance.
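Below is a minimal sketch of the two-table layout, using Python's built-in `sqlite3` as a stand-in for the RDS database so it runs anywhere. The table name `product_details`, the column types, and the foreign key are assumptions for illustration. In a Glue job the two frames would be written with Glue's JDBC writers instead (for example after splitting columns with `SelectFields`); as far as I know those writers do not create foreign-key constraints themselves, so one option is to create the tables with the constraint in the database beforehand and write the parent table first.

```python
import sqlite3

# sqlite3 stands in for RDS; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE product (
        id      TEXT PRIMARY KEY,
        product TEXT NOT NULL
    );
    CREATE TABLE product_details (
        price       INTEGER,
        description TEXT,
        product_id  TEXT NOT NULL REFERENCES product(id)
    );
    """
)

# One dataset per table, mirroring "two separate DynamicFrames".
product_rows = [("abc", "apple"), ("efg", "banana")]
detail_rows = [(20, None, "abc"), (30, None, "efg")]  # descriptions elided in the question

# Write the parent table first so every product_id already has a matching id.
conn.executemany("INSERT INTO product (id, product) VALUES (?, ?)", product_rows)
conn.executemany(
    "INSERT INTO product_details (price, description, product_id) VALUES (?, ?, ?)",
    detail_rows,
)
conn.commit()

# A detail row pointing at a missing product is rejected by the constraint.
try:
    conn.execute(
        "INSERT INTO product_details (price, description, product_id) VALUES (10, NULL, 'xyz')"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

rows = conn.execute(
    "SELECT p.product, d.price FROM product p "
    "JOIN product_details d ON d.product_id = p.id ORDER BY p.id"
).fetchall()
print(rows)  # [('apple', 20), ('banana', 30)]
```

Because the relationship lives in the database schema, the load order matters: writing the details before the products would violate the constraint.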

answered 2 years ago
  • @Giovanni - thank you for your response. Is there a way I can retain a foreign key relationship between these two tables when I write them to the database?
