Glue ETL job writes part-r-00 files to the same bucket as my input. Is there any way to change this?


I read files from an S3 bucket, convert them to a Spark DataFrame, transform the data, convert back to a DynamicFrame, and then write to the Data Catalog. This creates a bunch of part-r-00 files in the same bucket as my input, so my script then tries to read and process those files as well! Does it have to create these files? Is it possible to set a different bucket for them? If not, is it possible to have my ETL job read only files that end in .csv?

S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={"quoteChar": '"', "withHeader": True, "separator": ","},
    connection_type="s3",
    format="csv",
    connection_options={"paths": ["s3://bpf-load-forecast/lfo_data/"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)
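
# Convert from DynamicFrame to Spark DataFrame
df = S3bucket_node1.toDF()

# Transformations (omitted)
# ...

# Convert from Spark DataFrame back to DynamicFrame
# (requires: from awsglue.dynamicframe import DynamicFrame)
dynamic_df = DynamicFrame.fromDF(df, glueContext, "dynamic_df")
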
DataCatalogtable_node2 = glueContext.write_dynamic_frame.from_catalog(
    frame = dynamic_df,
    database = db_name,
    table_name = tbl_name,
    transformation_ctx = "DataCatalogtable_node2",
)
bfeeny
asked 2 years ago, 1166 views
2 answers
Accepted answer

I figured this out. When the Glue Data Catalog asked for my "Data Store" folder (which is where it stores the part-r files), I had entered the same folder as my S3 source files. I simply changed this to a new, empty folder, and that fixed it.
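
For the other part of the question (keeping the job from re-reading its own output), one possible alternative is the `exclusions` connection option of the Glue S3 reader, which takes a JSON-encoded list of glob patterns to skip. The sketch below only builds the options dictionary; the helper name `build_s3_options` and the default pattern `**part-r-*` are assumptions for illustration, and the pattern may need adjusting to how the output files are actually named.

```python
import json


def build_s3_options(input_path, exclude_globs=("**part-r-*",)):
    """Build S3 connection options that skip files matching exclude_globs.

    Hypothetical helper: the name and the default pattern are assumptions,
    not something taken from the original job.
    """
    return {
        "paths": [input_path],
        "recurse": True,
        # Glue expects `exclusions` as a JSON string holding a list of globs.
        "exclusions": json.dumps(list(exclude_globs)),
    }


options = build_s3_options("s3://bpf-load-forecast/lfo_data/")
print(options["exclusions"])
```

The resulting dictionary would be passed as `connection_options=options` in the `create_dynamic_frame.from_options` call from the question. Writing the output to a separate folder is still the simpler fix, since it keeps input and output apart without relying on file-name patterns.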

bfeeny
answered 2 years ago
AWS EXPERT reviewed 2 years ago

I am facing the same challenge now, but I don't see the "Data Store" section in the new interface. Can you kindly share some pointers?

Mike
answered 23 days ago
  • Actually, I was able to resolve this with a classifier that uses OpenCSVSerDe and identifies the delimiter, quote character, etc. in the file.
