Glue ETL job writes part-r-00 files to the same bucket as my input. Any way to change this?

I read files from an S3 bucket, convert them to a Spark DataFrame, transform them, convert back to a DynamicFrame, and then write to the Data Catalog. This creates a bunch of part-r-00 files in the same bucket as my input, so my script then tries to read and process those files as well! Does it have to create these files? Is it possible to set a different bucket for these files? If not, is it possible to have my ETL job read only files that end in .csv?

S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={"quoteChar": '"', "withHeader": True, "separator": ","},
    connection_type="s3",
    format="csv",
    connection_options={"paths": ["s3://bpf-load-forecast/lfo_data/"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)
.
# convert from DynamicFrame to Spark DataFrame
.
.
# transformations
.
.
# convert from Spark DataFrame to DynamicFrame
.
.
DataCatalogtable_node2 = glueContext.write_dynamic_frame.from_catalog(
    frame = dynamic_df,
    database = db_name,
    table_name = tbl_name,
    transformation_ctx = "DataCatalogtable_node2",
)
bfeeny
asked 2 years ago · 1154 views
2 Answers
Accepted Answer

I figured this out. When the Glue Data Catalog asked for my "Data Store" folder (which is where it stores the part-r files), I entered the same folder as my S3 source files. I simply changed this to a new, empty folder, and that fixed it.
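
The question also asked whether the job could skip the part-r files when reading. The sketch below is not part of the fix above. It relies on the `exclusions` key that Glue's S3 connection options accept, whose value is a JSON-encoded list of glob patterns. The helper name `build_s3_options` and the pattern `**part-r-*` are assumptions made for illustration, so check the pattern against your actual file names before relying on it.

```python
import json


def build_s3_options(path, exclude_globs):
    """Build connection_options for an S3 source (hypothetical helper).

    Files matching any glob in exclude_globs are meant to be skipped on read.
    """
    return {
        "paths": [path],
        "recurse": True,
        # Glue expects "exclusions" as a JSON string, not a Python list.
        "exclusions": json.dumps(exclude_globs),
    }


# Assumed pattern: skip anything whose file name starts with "part-r-".
options = build_s3_options("s3://bpf-load-forecast/lfo_data/", ["**part-r-*"])
print(options)

# In the job, pass the dict in place of the inline connection_options:
# S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
#     format_options={"quoteChar": '"', "withHeader": True, "separator": ","},
#     connection_type="s3",
#     format="csv",
#     connection_options=options,
#     transformation_ctx="S3bucket_node1",
# )
```

Writing the output to a separate folder, as described above, is still the simpler fix. An exclusion only helps if the output has to share a prefix with the input.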

bfeeny
answered 2 years ago
AWS EXPERT reviewed 2 years ago

I am facing the same challenge now, but I don't see the "Data Store" section in the new interface. Can you kindly share some pointers?

Mike
answered 12 days ago
  • Actually, I was able to resolve this with a classifier that uses OpenCSVSerDe and identifies the delimiter, quote character, etc. in the file.
