AWS Glue job parameter not working


Hi Team,

I am trying to back up a MongoDB collection and store it in S3 in Parquet format, using an AWS Glue Spark script. When I run the script with hard-coded values it works fine, but when I run it with job parameters it fails with an error. My script is below. Can someone help me get this working with parameters?

Code

import sys
from datetime import datetime
from awsglue.transforms import *
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import SparkSession
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['collection'])
args = getResolvedOptions(sys.argv, ['mongodatabase'])
args = getResolvedOptions(sys.argv, ['mongo_host'])
args = getResolvedOptions(sys.argv, ['password'])
args = getResolvedOptions(sys.argv, ['s3_bucket'])
args = getResolvedOptions(sys.argv, ['username'])

# Create a GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Set the MongoDB connection options
mongo_uri = "mongodb://'username':'password'@'mongo_host':27017/'mongodatabase'.'collection'"
mongo_read_config = {
    "uri": mongo_uri,
    "spark.mongodb.input.uri": mongo_uri
}

# Read data from MongoDB
mongo_data = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .options(**mongo_read_config) \
    .load()
    
# Get the current year, month, and date
current_year = datetime.now().strftime("%Y")
current_month = datetime.now().strftime("%m")
current_date = datetime.now().strftime("%d")

# Create the S3 output path
#s3_bucket = 's3_bucket'
#database_name = 'mongo_db'
#collection_name = 'collection'

# Write the data to S3 as Parquet format
s3_output_path = f"s3://{s3_bucket}/{database_name}/{collection_name}/year={current_year}/month={current_month}/date={current_date}"
mongo_data.write.parquet(s3_output_path)

#job.commit()


If I run the code without parameters (all values hard-coded), it works fine. How can I resolve this issue?

2 Answers

Could you please share the error you are getting?

It looks like the problem is in your code:

args = getResolvedOptions(sys.argv, ['collection'])
args = getResolvedOptions(sys.argv, ['mongodatabase'])
args = getResolvedOptions(sys.argv, ['mongo_host'])
args = getResolvedOptions(sys.argv, ['password'])
args = getResolvedOptions(sys.argv, ['s3_bucket'])
args = getResolvedOptions(sys.argv, ['username'])

Each of these calls reassigns the args variable, so you only keep the result of the last call (username).

Instead, request all the parameters in a single call:

args = getResolvedOptions(
    sys.argv,
    [
        'collection',
        'mongodatabase',
        'mongo_host',
        'password',
        's3_bucket',
        'username',
    ],
)

To read a parameter, index args by its name, for example username = args['username'].
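To see why the repeated assignments lose values, here is a minimal sketch that runs without Glue. resolve_options is a hypothetical stand-in built on argparse, not the awsglue implementation, and the argument values are made up. It only imitates the idea that each call returns a dictionary of the names you asked for.

```python
import argparse


def resolve_options(argv, names):
    """Hypothetical stand-in for getResolvedOptions (NOT the Glue code).

    Parses "--name value" pairs from argv and returns them as a dict.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in names:
        parser.add_argument("--" + name, required=True)
    # Arguments that were not requested are ignored rather than rejected.
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


# Made-up command line, shaped like the arguments a job run would receive.
argv = [
    "script.py",
    "--collection", "orders",
    "--mongodatabase", "shop",
    "--username", "backup_user",
]

# Repeated assignment: each call replaces the previous dictionary.
args = resolve_options(argv, ["collection"])
args = resolve_options(argv, ["username"])
overwritten_keys = sorted(args)  # 'collection' is no longer available

# Single call: every requested name ends up in the same dictionary.
args = resolve_options(argv, ["collection", "mongodatabase", "username"])
combined_keys = sorted(args)
username = args["username"]

print(overwritten_keys)
print(combined_keys)
```

After the repeated assignments, looking up args['collection'] would raise a KeyError, which is why the single call is the safer pattern.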

See the AWS Glue documentation for getResolvedOptions.

Edit: to use a parameter's value you must write args['username']. The quoted 'username' on its own is just a string literal.

# Set the MongoDB connection options
mongo_uri = f"mongodb://{args['username']}:{args['password']}@{args['mongo_host']}:27017/{args['mongodatabase']}.{args['collection']}"
mongo_read_config = {
    "uri": mongo_uri,
    "spark.mongodb.input.uri": mongo_uri
}
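As a hedged extension of the snippet above, the sketch below wraps the same lookups in two helper functions. build_mongo_uri and build_s3_path are hypothetical names, and the sample values are invented. One assumption worth stating: MongoDB connection strings expect reserved characters such as @, : and / in the username or password to be percent-encoded, so the credentials are passed through urllib.parse.quote first. The S3 path mirrors the year/month/date layout from the question and takes the timestamp as an argument so that all three parts come from the same instant.

```python
from datetime import datetime
from urllib.parse import quote


def build_mongo_uri(args, port=27017):
    """Build the connection string from the resolved job parameters."""
    # Percent-encode credentials so characters like '@' or ':' in a
    # password do not break the URI structure.
    user = quote(args["username"], safe="")
    password = quote(args["password"], safe="")
    return (
        f"mongodb://{user}:{password}@{args['mongo_host']}:{port}/"
        f"{args['mongodatabase']}.{args['collection']}"
    )


def build_s3_path(args, now):
    """Build the partitioned output path from the same parameters."""
    return (
        f"s3://{args['s3_bucket']}/{args['mongodatabase']}/{args['collection']}/"
        f"year={now:%Y}/month={now:%m}/date={now:%d}"
    )


# Invented sample values standing in for the dict from getResolvedOptions.
sample_args = {
    "username": "backup_user",
    "password": "p@ss:word/1",
    "mongo_host": "db.example.internal",
    "mongodatabase": "shop",
    "collection": "orders",
    "s3_bucket": "my-backups",
}

uri = build_mongo_uri(sample_args)
path = build_s3_path(sample_args, datetime(2024, 1, 5))
print(uri)
print(path)
```

In the job itself you would pass the real args dictionary and datetime.now() instead of the sample values.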
answered 8 months ago
  • @ArthurLopes I am getting the error below:

    An error occurred while calling o92.load. Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='username', source=''mongo_db'', password=<hidden>, mechanismProperties=<hidden>}

  • Even after changing the code as shown above, I get the same error.


@Arthur Lopes

I am getting the error below:

An error occurred while calling o92.load. Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='username', source=''mongo_db'', password=<hidden>, mechanismProperties=<hidden>}

answered 8 months ago
  • You are passing literal strings to your code, not the values of your parameters. Check the edit in my answer.
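One way to confirm which values actually reach the connector is to print the URI with the password masked before calling load. The sketch below is only a suggestion: redact_uri is a hypothetical helper, the URIs are invented, and it relies on the standard library's urlsplit to locate the password part.

```python
from urllib.parse import urlsplit


def redact_uri(uri):
    """Return the URI with its password replaced by asterisks."""
    parts = urlsplit(uri)
    if parts.password is None:
        # No credentials in the URI, nothing to hide.
        return uri
    # urlsplit returns the password exactly as written, so a plain
    # replace on ":<password>@" finds it in the original string.
    return uri.replace(f":{parts.password}@", ":****@", 1)


# A URI built from literal text still shows the placeholder names...
literal = "mongodb://'username':'password'@'mongo_host':27017/'mongodatabase'.'collection'"
# ...while one built from parameter values shows the real user and host.
interpolated = "mongodb://backup_user:p%40ss@db.example.internal:27017/shop.orders"

print(redact_uri(literal))
print(redact_uri(interpolated))
```

If the printed line still contains placeholder names such as 'username', the string was not built from args.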
