current_time minus 1 hr in Glue PySpark


I need to fetch the files that arrived in my S3 bucket up to current_time - 1 hr so I can process them. My file names use the format yyyymmdd-hhmmsssss.parquet (the time part includes milliseconds). So I am running a Glue job to fetch the files whose file name timestamp is <= current_timestamp - 1 hr. Below is the code I have used to get the current time in the required format:

desired_timezone = pytz.timezone('America/New_York')  # Replace 'Your_Time_Zone' with your actual time zone
current_datetime_2 = datetime.now(desired_timezone).strftime("%Y%m%d-%H%M%S")

I do not know how to get the time for current_time - 1 hr using the above commands in my Glue PySpark job. Can someone please help me achieve this?

1 Answer

Accepted Answer

Just subtract an hour from the current time with timedelta(hours=1) and format it like your file names using strftime("%Y%m%d-%H%M%S").

You will have something like:

from datetime import datetime, timedelta
import pytz

desired_timezone = pytz.timezone('America/New_York')  # Replace with the time zone your file names use
current_datetime = datetime.now(desired_timezone)
# Subtract one hour; normalize() re-applies the correct UTC offset if the
# subtraction crosses a daylight-saving change (required with pytz zones)
one_hour_ago_datetime = desired_timezone.normalize(current_datetime - timedelta(hours=1))

# Same layout as the file names (without the milliseconds part)
formatted_current_datetime = current_datetime.strftime("%Y%m%d-%H%M%S")
formatted_one_hour_ago_datetime = one_hour_ago_datetime.strftime("%Y%m%d-%H%M%S")

print("Current time:", formatted_current_datetime)
print("One hour ago:", formatted_one_hour_ago_datetime)
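Because the file names also carry milliseconds, the cutoff string can be built with the same width and compared directly against each file name. The sketch below assumes every name is exactly yyyymmdd-hhmmsssss.parquet (optionally behind an S3 prefix) and that the timestamps in the names are in the same time zone as the cutoff; the helper names cutoff_stamp and files_up_to are hypothetical, and listing the bucket itself is left out.

```python
from datetime import datetime, timedelta

import pytz


def cutoff_stamp(tz, now=None):
    """Return (now - 1 hour) formatted as yyyymmdd-hhmmss plus 3-digit milliseconds.

    Hypothetical helper: `tz` is assumed to be a pytz time zone, and `now`
    (if given) an aware datetime already localized to that zone.
    """
    if now is None:
        now = datetime.now(tz)
    # normalize() fixes the UTC offset if the hour crosses a DST change (pytz-specific)
    cutoff = tz.normalize(now - timedelta(hours=1))
    millis = cutoff.microsecond // 1000
    return cutoff.strftime("%Y%m%d-%H%M%S") + f"{millis:03d}"


def files_up_to(file_names, stamp):
    """Keep the files whose name timestamp is <= stamp.

    Hypothetical helper. Assumes fixed-width, zero-padded names such as
    'prefix/20240115-093000123.parquet', so plain string comparison
    follows time order.
    """
    selected = []
    for name in file_names:
        base = name.rsplit("/", 1)[-1]   # drop any S3 prefix
        stem = base.split(".", 1)[0]     # drop the ".parquet" extension
        if stem <= stamp:
            selected.append(name)
    return selected


if __name__ == "__main__":
    tz = pytz.timezone("America/New_York")
    stamp = cutoff_stamp(tz)
    print("Cutoff:", stamp)
    print(files_up_to(["20240115-093000123.parquet"], stamp))
```

String comparison is used instead of parsing every name back into a datetime; it only works because all stamps have the same length and digit layout, so a file named with a different format would need parsing instead.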


EXPERT
answered 2 months ago
AWS
SUPPORT ENGINEER
verified a month ago
  • Thanks a lot. The way you added the timedelta made the difference. Your solution worked for me :)
