Just a small tweak is needed to get the filename:
yfs_category_item_df = glueContext.create_dynamic_frame.from_options(
    format_options={"attachFilename": "your_filename_column_name"},
    connection_type="s3",
    format="parquet",
    connection_options={
        "paths": [
            # "s3://" + args['s3_bucket'] + "/" + args['s3_key']
            "s3://abc/oms/YFS_CATEGORY_ITEM/"
        ],
        "recurse": True,
    },
    transformation_ctx="yfs_category_item_df",
)
Change the value of attachFilename in format_options to the column name you want.
The sink being JDBC has no impact on what you are trying to do (by the way, for Redshift it would be better to use the Redshift connector). You could show() the DynamicFrame before writing; if the filename column is there, it will be written like any other column.
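In case it helps, a rough sketch of that preview-then-write idea using the Redshift connector; the connection name, target table and temp dir below are placeholders, not taken from your job:

# Preview the frame before writing; if the filename column shows up here,
# it will be written to Redshift like any other column.
yfs_category_item_df.show(10)

# Write through the Glue Redshift connector instead of plain JDBC
# ("my-redshift-connection", the table and the temp dir are placeholders).
glueContext.write_dynamic_frame.from_options(
    frame=yfs_category_item_df,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": "s3://abc/temp/",
        "useConnectionProperties": "true",
        "dbtable": "public.yfs_category_item",
        "connectionName": "my-redshift-connection",
    },
)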
You could use the option attachFilename on the source (see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html#aws-glue-programming-etl-format-shared-reference), or convert to a DataFrame, call the function, and then convert back to a DynamicFrame (input_file_name() is not an "option", it is a SQL function).
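A minimal sketch of the second approach, assuming the frame and glueContext from the snippet above (the column name source_filename is just an example):

from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import input_file_name

# Convert to a Spark DataFrame, attach the source file path with the SQL
# function input_file_name(), then convert back to a DynamicFrame.
df_with_filename = yfs_category_item_df.toDF().withColumn(
    "source_filename", input_file_name()
)
yfs_category_item_df = DynamicFrame.fromDF(
    df_with_filename, glueContext, "yfs_category_item_df"
)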
Thanks for your input. "attachFilename" cannot be added when we are using the JDBC method to write to Redshift; it threw an error. You are right, input_file_name() is a SQL function only, so the way I phrased my statement was wrong. When I said "I tried using the input_file_name() option", I meant that this function was also tried and it did not work.
I tried this also, but it did not work; I got a blank value. I read somewhere that this only works when using a Crawler/Glue table. Anyway, I found a workaround. In the code you mentioned above, I am using args['s3_key']; this value comes from Lambda for me. I pass it into a variable and use that variable in my post query while doing write_dynamic_frame.from_jdbc_conf (roughly as in the sketch below). Thanks for your inputs.
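For anyone reading later, a rough sketch of that workaround, assuming the job receives s3_key as a job argument from Lambda (the table, database, connection name and the column source_filename are placeholders):

import sys
from awsglue.utils import getResolvedOptions

# 's3_key' arrives as a job argument, passed in by the Lambda that starts the job.
args = getResolvedOptions(sys.argv, ["s3_bucket", "s3_key"])
source_file = args["s3_key"]

# Use the key in a Redshift post-action that runs after the load
# ("target_table", "dev" and the connection name are placeholders).
post_query = (
    f"UPDATE target_table SET source_filename = '{source_file}' "
    "WHERE source_filename IS NULL;"
)

glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=yfs_category_item_df,
    catalog_connection="my-redshift-connection",
    connection_options={
        "dbtable": "target_table",
        "database": "dev",
        "postactions": post_query,
    },
    redshift_tmp_dir="s3://abc/temp/",
    transformation_ctx="write_redshift",
)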
This doesn't work and I REALLY need this to work.
This line, https://github.com/awslabs/aws-glue-libs/blob/master/awsglue/context.py#L234, prevents options from being passed to the source when "parquet" is the format. I bypassed that line, and even then I get the error:
An error occurred while calling o90.setFormat. Formats not supported for SparkSQL data sources. Got parquet
Anyone have any magic?