AWS Glue job parameter not working


Hi Team,

I am trying to back up a MongoDB collection to S3 in Parquet format using an AWS Glue Spark script. When I run the script without job parameters, it works fine, but when I run it with parameters, it fails with an error. My script is below. Can someone help me get this working with parameters?

Code

import sys
from datetime import datetime
from awsglue.transforms import *
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import SparkSession
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['collection'])
args = getResolvedOptions(sys.argv, ['mongodatabase'])
args = getResolvedOptions(sys.argv, ['mongo_host'])
args = getResolvedOptions(sys.argv, ['password'])
args = getResolvedOptions(sys.argv, ['s3_bucket'])
args = getResolvedOptions(sys.argv, ['username'])

# Create a GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Set the MongoDB connection options
mongo_uri = "mongodb://'username':'password'@'mongo_host':27017/'mongodatabase'.'collection'"
mongo_read_config = {
    "uri": mongo_uri,
    "spark.mongodb.input.uri": mongo_uri
}

# Read data from MongoDB
mongo_data = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .options(**mongo_read_config) \
    .load()
    
# Get the current year, month, and date
current_year = datetime.now().strftime("%Y")
current_month = datetime.now().strftime("%m")
current_date = datetime.now().strftime("%d")

# Create the S3 output path
#s3_bucket = 's3_bucket'
#database_name = 'mongo_db'
#collection_name = 'collection'

# Write the data to S3 as Parquet format
s3_output_path = f"s3://{s3_bucket}/{database_name}/{collection_name}/year={current_year}/month={current_month}/date={current_date}"
mongo_data.write.parquet(s3_output_path)

#job.commit()


If I run the script without parameters (all values hard-coded), it works fine. How can I resolve this issue?

Asked 9 months ago, viewed 556 times
2 Answers

Could you please share the error you are getting?

It looks like the problem is in your code:

args = getResolvedOptions(sys.argv, ['collection'])
args = getResolvedOptions(sys.argv, ['mongodatabase'])
args = getResolvedOptions(sys.argv, ['mongo_host'])
args = getResolvedOptions(sys.argv, ['password'])
args = getResolvedOptions(sys.argv, ['s3_bucket'])
args = getResolvedOptions(sys.argv, ['username'])

When you do this you are always overwriting your args variable, so you will only keep the last value (username).

You should do:

args = getResolvedOptions(
    sys.argv,
    [
        'collection',
        'mongodatabase',
        'mongo_host',
        'password',
        's3_bucket',
        'username',
    ],
)

To access a parameter, index args by its key name, for example username = args['username'].

See the AWS Glue documentation for getResolvedOptions.

Edit: to read a parameter's value you must use args['username']. 'username' on its own is just a string.

# Set the MongoDB connection options
mongo_uri = f"mongodb://{args['username']}:{args['password']}@{args['mongo_host']}:27017/{args['mongodatabase']}.{args['collection']}"
mongo_read_config = {
    "uri": mongo_uri,
    "spark.mongodb.input.uri": mongo_uri
}
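
As a rough sketch of how the rest of the script could use the same args dictionary: the helpers below (build_mongo_uri and build_s3_output_path are hypothetical names, not part of Glue) build the connection URI and the S3 output path from a plain dict shaped like the one getResolvedOptions returns. Percent-encoding the credentials is an assumption that the username or password may contain reserved characters such as '@' or ':'. All example values are made up.

```python
from datetime import datetime
from urllib.parse import quote


def build_mongo_uri(args, port=27017):
    """Build a MongoDB URI from a dict shaped like getResolvedOptions output."""
    # Percent-encode the credentials so that reserved characters
    # ('@', ':', '/') in them do not break the URI.
    user = quote(args["username"], safe="")
    password = quote(args["password"], safe="")
    return (
        f"mongodb://{user}:{password}@{args['mongo_host']}:{port}/"
        f"{args['mongodatabase']}.{args['collection']}"
    )


def build_s3_output_path(args, now=None):
    """Build the date-partitioned S3 output path used for the Parquet write."""
    now = now or datetime.now()
    return (
        f"s3://{args['s3_bucket']}/{args['mongodatabase']}/{args['collection']}/"
        f"year={now:%Y}/month={now:%m}/date={now:%d}"
    )


# Hypothetical values standing in for what getResolvedOptions would return.
example_args = {
    "collection": "orders",
    "mongodatabase": "sales",
    "mongo_host": "mongo.example.internal",
    "password": "p@ss:word",
    "s3_bucket": "my-backup-bucket",
    "username": "backup_user",
}

example_uri = build_mongo_uri(example_args)
example_path = build_s3_output_path(example_args, datetime(2024, 3, 5))
```

In the Glue job itself you would pass the real args from getResolvedOptions instead of example_args. Note that the original script's S3 path refers to s3_bucket, database_name and collection_name, which are never defined once the hard-coded assignments are commented out, so they also need to come from args.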
Answered 9 months ago
  • @ArthurLopes I am getting the error below:

    An error occurred while calling o92.load. Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='username', source=''mongo_db'', password=<hidden>, mechanismProperties=<hidden>}

  • Even after changing the code as shown above, I get the same error.


@Arthur Lopes

I am getting the error below:

An error occurred while calling o92.load. Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='username', source=''mongo_db'', password=<hidden>, mechanismProperties=<hidden>}

Answered 9 months ago
  • You are passing literal strings to your code, not the values of your parameters. Check the edit on my answer.
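
For illustration, here is a minimal sketch of why the error message reports userName='username'. The host name and the username value are hypothetical. A quoted name inside an ordinary string is sent to MongoDB as-is, while an f-string substitutes the variable's value.

```python
username = "backup_user"  # hypothetical value, e.g. taken from args['username']

# Ordinary string: the text 'username' (quotes included) ends up in the URI.
literal_uri = "mongodb://'username'@mongo.example.internal:27017"

# f-string: the value of the variable is substituted into the URI.
interpolated_uri = f"mongodb://{username}@mongo.example.internal:27017"
```

The doubled quotes in source=''mongo_db'' in the error message are consistent with the same cause: the quote characters were part of the string passed to the driver.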
