Way to edit a file in S3

0

Hello, I am storing some AWS Batch and CloudWatch logs on S3 in CSV format. I wrote a Python script that gets the data for the past 24 hours. I am going to use Athena to query this data in S3 and use it in Amazon QuickSight. I heard that files in S3 are immutable, but I want to keep the data in one file, since I don't want to assign a new database in QuickSight every day. Is the only way to edit the file to download it from the S3 bucket, make changes, and upload the new file with the same key? Or is there another way to do this?

2 Answers
4
Accepted Answer

Hi johnkimm,

Please try the solution below; it should help resolve your issue.

If the daily data is substantial, continually appending to a single file can become inefficient. Instead, consider using a partitioned approach, where you store each day's data as a separate file in S3. Athena supports querying partitioned data efficiently, and you can still use QuickSight without creating a new database every day.

Store Data in Partitioned Format: Save each day's data with a specific key pattern, such as 'logs/year=2024/month=05/day=24/data.csv'.
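As a rough illustration of that key pattern, the daily script could build the object key from the date it is exporting. This is only a sketch: the function name partition_key and the default prefix and file name are hypothetical choices, not something from the original setup.

```python
from datetime import date


def partition_key(day: date, prefix: str = "logs", filename: str = "data.csv") -> str:
    """Build a Hive-style partitioned S3 key for the given day.

    The prefix and filename defaults are assumptions for illustration.
    """
    # %m and %d are zero-padded, so May 24 becomes month=05/day=24.
    return f"{prefix}/year={day:%Y}/month={day:%m}/day={day:%d}/{filename}"


if __name__ == "__main__":
    # Key for today's export; pass this as Key= when uploading to S3.
    print(partition_key(date.today()))
```

Each run then writes a new object under its own day prefix, so nothing that already exists in the bucket has to be edited.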

Define Partitions in Athena: Define the partitions in your Athena table. You can automate the process of adding new partitions using AWS Glue or by running MSCK REPAIR TABLE in Athena.
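One way to register each new day without a full MSCK REPAIR TABLE scan is to issue an ALTER TABLE ... ADD PARTITION statement after the upload. The sketch below assumes the table's partition columns are strings named year, month, and day, matching the key pattern above. The helper names, table name, bucket, database, and output location are all hypothetical.

```python
from datetime import date


def add_partition_sql(table: str, bucket: str, day: date, prefix: str = "logs") -> str:
    """Build an Athena DDL statement that registers one day's partition.

    Assumes string partition columns named year, month, and day.
    """
    location = f"s3://{bucket}/{prefix}/year={day:%Y}/month={day:%m}/day={day:%d}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{day:%Y}', month='{day:%m}', day='{day:%d}') "
        f"LOCATION '{location}'"
    )


def run_in_athena(sql: str, database: str, output_location: str) -> str:
    """Submit a statement to Athena and return its query execution ID."""
    import boto3  # imported here so the SQL builder can be used without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

For example, run_in_athena(add_partition_sql("logs", "your-bucket-name", date.today()), "your_database", "s3://your-bucket-name/athena-results/") would register today's partition, with those three names being placeholders.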

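Query Partitioned Data in Athena: Use Athena to query data across multiple partitions. For example, to query data from the last 7 days, you can use a query like the following, which assumes year, month, and day are string partition columns matching the key pattern above:

SELECT * FROM logs WHERE date_parse(concat(year, '-', month, '-', day), '%Y-%m-%d') >= current_date - interval '7' day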

Visualize in QuickSight: Create a dataset in QuickSight based on your Athena queries. QuickSight can handle the data aggregation and visualization without needing a single static file.

EXPERT
answered 2 years ago
EXPERT
reviewed 2 years ago
  • oh wow I didn't know that! I'll give this a try! Thank you.

3

Hello,

1. Download the File: Use the AWS SDK or any other method to retrieve the file from your S3 bucket. You'll need the bucket name and the key (file path) to locate the file.

2. Modify the File: Once you have downloaded the file, make the necessary changes to it. This could involve appending new data, updating existing data, or any other modifications you need.

3. Upload the Updated File: After making the modifications, upload the updated file back to the same location in your S3 bucket. Be sure to specify the bucket name and key correctly.

Example Process: Let's say you have a CSV file named data.csv in your S3 bucket and you want to append new data to it using Python:

import io

import boto3
import pandas as pd

# Initialize S3 client
s3_client = boto3.client('s3')

# Download the file
bucket_name = 'your-bucket-name'
file_key = 'data.csv'
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
file_content = response['Body'].read()  # raw bytes of the CSV object

# Modify the file (example: append new data)
new_data = pd.DataFrame({'column1': [1, 2, 3], 'column2': ['a', 'b', 'c']})
# read_csv expects a path or file-like object, so wrap the bytes in BytesIO
existing_data = pd.read_csv(io.BytesIO(file_content))
updated_data = pd.concat([existing_data, new_data], ignore_index=True)

# Upload the updated file (this overwrites the object at the same key)
updated_file_content = updated_data.to_csv(index=False)
s3_client.put_object(Bucket=bucket_name, Key=file_key, Body=updated_file_content.encode('utf-8'))

print("File updated successfully.")

Updating a file in S3 involves downloading the object, making modifications locally, and then uploading the updated version to the same key, which replaces the whole object; S3 does not modify objects in place. Note that this sequence is not atomic: if two processes update the same key at the same time, the last upload wins. I hope this is helpful. Thank you.
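If pandas is more than you need, the modify step can also be written as a small function over raw bytes, which makes it easy to check without touching S3. This is a minimal sketch using only the standard library. The name append_rows is hypothetical, and treating empty input as "no file yet, write a header first" is an assumption about how a first run should behave.

```python
import csv
import io


def append_rows(existing_csv: bytes, new_rows: list[dict], fieldnames: list[str]) -> bytes:
    """Return CSV bytes holding the existing content followed by new_rows."""
    text = existing_csv.decode("utf-8")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    if text.strip():
        # Keep the existing content as-is, making sure it ends with a newline.
        out.write(text if text.endswith("\n") else text + "\n")
    else:
        # Nothing stored yet (assumed first run): start with a header row.
        writer.writeheader()
    writer.writerows(new_rows)
    return out.getvalue().encode("utf-8")
```

The bytes read from get_object go in as existing_csv, and the returned bytes can be passed directly as Body to put_object.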

EXPERT
answered 2 years ago
  • I see. So I guess that is the only way. Thank you!
