Uploading a Dataframe to AWS S3 Bucket from SageMaker

After successfully loading CSV files from S3 into my SageMaker notebook instance, I am stuck on doing the reverse.

I have a dataframe and want to upload it to an S3 bucket as CSV or JSON. The code I have is below:

bucket='bucketname'
data_key = 'test.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
df.to_csv(data_location)
I assumed that since pd.read_csv() worked when loading the data, df.to_csv() would work as well, but it didn't. It is probably raising an error because, unlike a manual upload to S3, this approach doesn't let me pick the privacy options. Is there a way to upload the data to S3 from SageMaker?
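For completeness, the read direction that already works in the notebook looks roughly like this (the bucket and key are placeholders; as far as I understand, pandas relies on the optional s3fs package to resolve s3:// paths):

import pandas as pd

# reading works because pandas hands the s3:// URL to s3fs
# when that package is installed in the notebook environment
bucket = 'bucketname'
data_key = 'test.csv'
df = pd.read_csv('s3://{}/{}'.format(bucket, data_key))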

Asked 5 years ago · Viewed 4,114 times
1 Answer
Accepted Answer

One way to solve this is to save the CSV to local storage on the SageMaker notebook instance, and then use the S3 API via boto3 to upload the file as an S3 object. See the S3 documentation for upload_file().

Note: you'll need to ensure that your SageMaker-hosted notebook instance has the proper read/write permissions for S3 in its IAM role; otherwise you'll receive a permissions error.
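If you're unsure whether the role can reach the bucket, a quick sanity check along these lines can help before uploading (the bucket name is a placeholder):

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')
try:
    # HEAD the bucket; this raises if the notebook's role can't access it
    s3_client.head_bucket(Bucket='YOUR_S3_BUCKET_NAME')
    print('Bucket is reachable with the current IAM role')
except ClientError as err:
    print('Cannot access bucket: {}'.format(err))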

# code you already have, saving the file locally to whatever directory you wish
file_name = "mydata.csv"
df.to_csv(file_name)

# instantiate the S3 resource via boto3 and upload the file to S3
import boto3

s3 = boto3.resource('s3')
s3.meta.client.upload_file(file_name, 'YOUR_S3_BUCKET_NAME', 'DESIRED_S3_OBJECT_NAME')
Alternatively, upload_fileobj() may help, since it can parallelize the transfer as a multipart upload.
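As a sketch of that alternative, you could skip the local file and stream an in-memory buffer instead (bucket and object names are placeholders):

import io
import boto3

# serialize the dataframe to an in-memory text buffer, then wrap it as bytes,
# because upload_fileobj() expects a binary file-like object
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)
bytes_buffer = io.BytesIO(csv_buffer.getvalue().encode('utf-8'))

s3_client = boto3.client('s3')
s3_client.upload_fileobj(bytes_buffer, 'YOUR_S3_BUCKET_NAME', 'DESIRED_S3_OBJECT_NAME')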

Answered 5 years ago
Expert
Reviewed 10 months ago
