Questions tagged with Amazon Simple Storage Service


1 answer · 0 votes · 34 views · asked 2 months ago

Best practice: looking to structure my AWS Lambda function with pandas manipulations

I'm new to AWS Lambda, so please take it easy on me :) I'm getting a lot of errors in my code and I'm not sure of the best way to troubleshoot other than looking at the CloudWatch console and adjusting things as necessary. If anyone has any tips for troubleshooting, I'd appreciate it! Here's my plan for what I want to do; please let me know if this makes sense:

1. Upload a file to an S3 bucket.
2. The upload triggers a Lambda to run (inside this Lambda is a Python script that modifies the data source; the source data is a messy file).
3. Store the output in the same S3 bucket in a separate folder.
4. (Future state) perform analysis on the new JSON file.

I have my S3 bucket created and I have set up the Lambda to trigger when a new file is added. That part is working! I have added the Python script (which works on my local drive) to the Lambda function within the code section of Lambda. The errors I'm getting say that my six global variables (df_a1 through df_q1) are not defined. If I move them out of the function then it works; however, when I get to the merge portion of my code I get an error that says "cannot merge a series without a name". I gave them a name using the name= argument and I'm still getting this issue.

```
try:
    import json
    import boto3
    import pandas as pd
    import time
    import io
    print("All Modules are ok ...")
except Exception as e:
    print("Error in Imports ")

s3_client = boto3.client('s3')

#df_a1 = pd.Series(dtype='object', name='test1')
#df_g1 = pd.Series(dtype='object', name='test2')
#df_j1 = pd.Series(dtype='object', name='test3')
#df_p1 = pd.Series(dtype='object', name='test4')
#df_r1 = pd.Series(dtype='object', name='test5')
#df_q1 = pd.Series(dtype='object', name='test6')

def Add_A1(xyz, RC, string):
    #DATA TO GRAB FROM STRING
    global df_a1
    IMG = boolCodeReturn(string[68:69].strip())
    roa = string[71:73].strip()
    #xyzName = string[71:73].strip()
    #ADD RECORD TO DATAFRAME
    series = pd.Series(data=[xyz, IMG, roa], index=['XYZ', 'IMG', 'Roa'])
    df_a1 = df_a1.append(series, ignore_index=True)

def Add_G1(xyz, RC, string):
    global df_g1
    #DATA TO GRAB FROM STRING
    gcode = string[16:30].strip()
    ggname = string[35:95].strip()
    #ADD RECORD TO DATAFRAME
    series = pd.Series(data=[xyz, gcode, ggname], index=['XYZ', 'Gcode', 'Ggname'])
    df_g1 = df_g1.append(series, ignore_index=True)

def Add_J1(xyz, RC, string):
    #DATA TO GRAB FROM STRING
    global df_j1
    xyzName = string[56:81].strip()
    #ADD RECORD TO DATAFRAME
    series = pd.Series(data=[xyz, xyzName], index=['XYZ', 'XYZName'])
    df_j1 = df_j1.append(series, ignore_index=True)

def Add_P01(xyz, RC, string):
    global df_p1
    #DATA TO GRAB FROM STRING
    giname = string[50:90].strip()
    #ADD RECORD TO DATAFRAME
    series = pd.Series(data=[xyz, giname], index=['XYZ', 'Giname'])
    df_p1 = df_p1.append(series, ignore_index=True)

def Add_R01(xyz, RC, string):
    global df_r1
    #DATA TO GRAB FROM STRING
    Awperr = boolCodeReturn(string[16:17].strip())
    #PPP = string[17:27].lstrip("0")
    AUPO = int(string[27:40].lstrip("0"))
    AUPO = AUPO / 100000
    AupoED = string[40:48]
    #ADD RECORD TO DATAFRAME
    series = pd.Series(data=[xyz, AUPO, Awperr, AupoED], index=['XYZ', 'AUPO', 'Awperr', 'AupoED'])
    df_r1 = df_r1.append(series, ignore_index=True)

def Add_Q01(xyz, RC, string):
    global df_q1
    #DATA TO GRAB FROM STRING
    #PPP = string[17:27].lstrip("0")
    UPPWA = int(string[27:40].lstrip("0"))
    UPPWA = UPPWA / 100000
    EDWAPPP = string[40:48]
    #ADD RECORD TO DATAFRAME
    series = pd.Series(data=[xyz, UPPWA, EDWAPPP], index=['XYZ', 'UPPWA', 'EDWAPPPer'])
    df_q1 = df_q1.append(series, ignore_index=True)

def boolCodeReturn(code):
    if code == "X":
        return 1
    else:
        return 0

def errorHandler(xyz, RC, string):
    pass

def lambda_handler(event, context):
    print(event)
    #Get Bucket Name
    bucket = event['Records'][0]['s3']['bucket']['name']
    #get the file/key name
    key = event['Records'][0]['s3']['object']['key']
    response = s3_client.get_object(Bucket=bucket, Key=key)
    print("Got Bucket! - pass")
    print("Got Name! - pass ")
    data = response['Body'].read().decode('utf-8')
    print('reading data')
    buf = io.StringIO(data)
    print(buf.readline())
    #data is the file uploaded
    fileRow = buf.readline()
    print('reading_row')

    while fileRow:
        currentString = fileRow
        xyz = currentString[0:11].strip()
        RC = currentString[12:15].strip()  #this grabs the code that indicates what the data type is
        #controls which function to run based on the code
        switcher = {
            "A1": Add_A1,
            "J1": Add_J1,
            "G1": Add_G1,
            "P01": Add_P01,
            "R01": Add_R01,
            "Q01": Add_Q01
        }
        runfunc = switcher.get(RC, errorHandler)
        runfunc(xyz, RC, currentString)
        fileRow = buf.readline()

    print(type(df_a1), "A1 FILE")
    print(type(df_g1), 'G1 FILE')
    buf.close()

    ##########STEP 3: JOIN THE DATA TOGETHER##########
    df_merge = pd.merge(df_a1, df_g1, how="left", on="XYZ")
    df_merge = pd.merge(df_merge, df_j1, how="left", on="XYZ")
    df_merge = pd.merge(df_merge, df_p1, how="left", on="XYZ")
    df_merge = pd.merge(df_merge, df_q1, how="left", on="XYZ")
    df_merge = pd.merge(df_merge, df_r1, how="left", on="XYZ")

    ##########STEP 4: SAVE THE DATASET TO A JSON FILE##########
    filename = 'Export-Records.json'
    json_buffer = io.StringIO()
    df_merge.to_json(json_buffer)
    s3_client.put_object(Buket='file-etl', Key=filename, Body=json_buffer.getvalue())

    t = time.localtime()
    current_time = time.strftime("%H:%M:%S", t)
    print("Finished processing at " + current_time)

    response = {
        "statusCode": 200,
        'body': json.dumps("Code worked!")
    }
    return response
```
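For context on the two errors described above, here is a minimal sketch (not the code from the question) of one way to restructure the parsing step: rows are collected as plain dicts per record code and turned into DataFrames once at the end, so the left joins run on DataFrames with named columns rather than bare Series (merging an unnamed Series is what raises pandas' "cannot merge a series without a name"), and no global Series objects are needed inside the handler. The slice offsets for the A1 and G1 records are taken from the code above; the function name parse_records, the record-code subset, and the usage comments are illustrative assumptions only.

```
# Illustrative sketch only -- names and structure are assumptions, not the asker's code.
import pandas as pd

def parse_records(data: str) -> dict:
    """Collect parsed rows per record code, then build one DataFrame per code."""
    rows = {"A1": [], "G1": []}  # extend with J1/P01/R01/Q01 in the same way
    for line in data.splitlines():
        xyz = line[0:11].strip()
        rc = line[12:15].strip()
        if rc == "A1":
            # offsets taken from Add_A1 above; "X" flag converted to 1/0 as boolCodeReturn does
            rows["A1"].append({
                "XYZ": xyz,
                "IMG": 1 if line[68:69].strip() == "X" else 0,
                "Roa": line[71:73].strip(),
            })
        elif rc == "G1":
            # offsets taken from Add_G1 above
            rows["G1"].append({
                "XYZ": xyz,
                "Gcode": line[16:30].strip(),
                "Ggname": line[35:95].strip(),
            })
    # Building DataFrames from lists of dicts avoids Series.append (deprecated and
    # later removed in pandas) and guarantees every merge column has a name.
    return {code: pd.DataFrame(recs) for code, recs in rows.items()}

# Hypothetical usage inside lambda_handler, after decoding the S3 object body into `data`:
# frames = parse_records(data)
# df_merge = frames["A1"].merge(frames["G1"], how="left", on="XYZ")
```

The key point is only that pd.merge needs named columns (or named Series) on both sides; whether the rows are accumulated per record code like this or some other way is a separate design choice.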
1 answer · 0 votes · 23 views · asked 2 months ago

Find out hidden costs when moving to AWS

Hello everyone 👋 I'm opening this post to ask for some cost-related information, as I'd like to get an estimate of how much I would be paying for the architecture of my service.

The architecture is quite simple: I would like to ship some data from Google Cloud infrastructure to an AWS S3 bucket and then download it onto an EC2 machine to process it. This is a picture of that diagram:

![AWS arch diagram](/media/postImages/original/IMu5VnT59oTDuiZ7T2sYOHPg)

With regards to costs, and as far as I have found, I would **only** be paying for the data transfer from Google Cloud to AWS as **network egress costs**, plus the costs related to hosting the data in S3. As stated in the ["Overview of data transfer costs for common architectures"](https://aws.amazon.com/blogs/architecture/overview-of-data-transfer-costs-for-common-architectures/) blog post and the [Amazon S3 pricing guide - data transfer section](https://aws.amazon.com/s3/pricing/):

- I don't pay for data transferred from an Amazon S3 bucket to any AWS service(s) within the same AWS Region as the S3 bucket (including to a different account in the same AWS Region) if I use an Internet Gateway to access that data.
- I don't pay for data transferred in from the internet.

Am I right? Am I missing anything in this diagram? (Sorry for the networking abstraction I'm making in the diagram above; as stated, I'd be accessing the S3 bucket through an Internet Gateway, with both EC2 and S3 running in the same Region.) Thanks a lot in advance!
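If it helps to see the two transfer legs from the diagram as code, here is a minimal sketch of the path being priced; the Region, bucket name, object key, and file paths are placeholders and not taken from the post, and the cost notes in the comments simply restate the bullets above.

```
# Rough sketch of the data path in the diagram; all names below are placeholders.
import boto3

# Assumption: the bucket and the EC2 instance live in the same Region (e.g. eu-west-1).
s3 = boto3.client("s3", region_name="eu-west-1")

# Leg 1: push the export from the Google Cloud side over the internet.
# AWS does not charge for data transferred *in* from the internet;
# S3 PUT request and storage charges still apply. The network egress for
# this leg is billed on the Google Cloud side, not by AWS.
s3.upload_file("/tmp/export.parquet", "my-transfer-bucket", "exports/export.parquet")

# Leg 2: pull the object down on the EC2 instance in the same Region as the bucket.
# Per the S3 pricing page cited above, there is no data transfer charge for this
# leg; S3 GET request charges still apply.
s3.download_file("my-transfer-bucket", "exports/export.parquet", "/tmp/export.parquet")
```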
1 answer · 0 votes · 40 views · asked 2 months ago