I am following the sample provided by AWS here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html#aws-glue-programming-python-samples-legislators-writing
A few JSON files are processed via a crawler, which creates Glue catalog tables. From what I can tell, these tables are then joined on keys to create one DynamicFrame with all the data:
l_history = Join.apply(orgs,
                       Join.apply(persons, memberships, 'id', 'person_id'),
                       'org_id', 'organization_id').drop_fields(['person_id', 'org_id'])
print("Count:", l_history.count())
l_history.printSchema()
In the step after that, the relationalize function is called to build a DynamicFrame collection, so that the data can be written to a relational database like Redshift:
dfc = l_history.relationalize("hist_root", "s3://glue-sample-target/temp-dir/")
dfc.keys()
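For context on why the AWS sample ends up with multiple tables: relationalize splits out a separate child table only where the schema contains a nested array or struct; flat records all collapse into the single root table. Below is a minimal pure-Python sketch of that behavior, not the actual Glue implementation; the function and table names (relationalize_sketch, parent_id) are hypothetical.

```python
def relationalize_sketch(records, root_name):
    """Split records into a root table plus one child table per list-valued
    field, roughly mimicking what Glue's relationalize does to nested data."""
    tables = {root_name: []}
    for rid, rec in enumerate(records):
        flat = {}
        for key, value in rec.items():
            if isinstance(value, list):
                # Nested arrays become a separate child table keyed back to the root.
                child = tables.setdefault(f"{root_name}_{key}", [])
                for item in value:
                    child.append({"parent_id": rid, "value": item})
                flat[key] = rid  # root keeps a reference id instead of the array
            else:
                flat[key] = value
        tables[root_name].append(flat)
    return tables

# Flat records (like the joined product data) yield only one table...
flat_tables = relationalize_sketch([{"id": "abc", "product": "apple"}], "hist_root")
print(sorted(flat_tables))  # ['hist_root']

# ...while a nested array field produces an extra child table.
nested_tables = relationalize_sketch([{"id": "abc", "links": ["a", "b"]}], "hist_root")
print(sorted(nested_tables))  # ['hist_root', 'hist_root_links']
```

So if the joined DynamicFrame is already flat, relationalize has nothing to split and returns a single table.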
Question: this example builds multiple tables along with the "hist_root" table mentioned in the line above. I have two JSON files (samples below). I follow the same steps as the example: combine the DynamicFrames on the id / product_id keys and apply relationalize, but it gives me only one table. Given the quantity and nature of my data, one table with everything technically works, but my goal is to push this to RDS, and I would like to have two tables: a first table called product with id and product, and a second table with the product details.
Is there a way to force the relationalize method to produce two different tables?
file 1

id  | product
abc | apple
efg | banana
....

file 2

price | description | product_id
20    | ......      | abc
30    | ......      | efg
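Since the joined data here is flat, one option is to project the two target tables yourself before writing (in Glue this would be something like calling select_fields on the DynamicFrame for each projection, or simply writing the two source frames separately). A plain-Python sketch of the shape, using made-up sample rows and a hypothetical product_details table name:

```python
# Joined rows, as the id/product_id join would leave them (sample data).
joined = [
    {"id": "abc", "product": "apple",  "price": 20, "description": "..."},
    {"id": "efg", "product": "banana", "price": 30, "description": "..."},
]

# Project the "product" table: id + product.
product_table = [{"id": r["id"], "product": r["product"]} for r in joined]

# Project the "product_details" table, keeping product_id as a foreign key
# back to product.id.
details_table = [
    {"product_id": r["id"], "price": r["price"], "description": r["description"]}
    for r in joined
]

print(product_table[0])                # {'id': 'abc', 'product': 'apple'}
print(details_table[0]["product_id"])  # abc
```

Each projected list would then map to its own table when written to RDS.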
@Giovanni - thank you for your response. Is there a way I can retain the foreign key relationship between these two tables when I write them to the database?