AWS Glue Job AnalysisException Error


Hi Team,

I am moving MongoDB data to S3 in Parquet format using an AWS Glue job, and I am trying to pass the MongoDB collection name as a job parameter. When I run the script, I get an AnalysisException error.

I followed the post below on passing parameters to an AWS Glue job:

https://stackoverflow.com/questions/52316668/aws-glue-job-input-parameters

Here is my script. Please take a look and let me know how to solve this issue.

`

import sys
from datetime import datetime
from awsglue.transforms import *
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import SparkSession
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['collection'])

# Create a GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Set the MongoDB connection options
mongo_uri = "mongodb://<username>:<password>@<hostname>:27017/<databasename>.'collection'"
mongo_read_config = {
    "uri": mongo_uri,
    "spark.mongodb.input.uri": mongo_uri
}

# Read data from MongoDB
mongo_data = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .options(**mongo_read_config) \
    .load()
    
# Get the current year, month, and date
current_year = datetime.now().strftime("%Y")
current_month = datetime.now().strftime("%m")
current_date = datetime.now().strftime("%d")

# Create the S3 output path
s3_bucket = "<bucketname>/output_parquet/"
database_name = "<dbaname>"
collection_name = 'collection'

# Write the data to S3 as Parquet format
#s3_output_path = "s3://<bucketname>/output_parquet/"
s3_output_path = f"s3://{s3_bucket}/{database_name}/{collection_name}/year={current_year}/month={current_month}/date={current_date}"
mongo_data.write.parquet(s3_output_path)

`

  • The full stack trace should give more indication of the cause; that error message on its own is too generic.
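
One way to capture the full stack trace is to wrap the failing step and print the traceback before re-raising, so it ends up in the job's logs. This is only a sketch using the standard library: run_step and failing_read are hypothetical names, and in the real job the wrapped call would be the spark.read ... load() step.

```python
import traceback


def run_step(name, func):
    """Run func(); on failure print the full traceback, then re-raise."""
    try:
        return func()
    except Exception:
        # format_exc() returns the complete traceback of the active exception
        print(f"Step '{name}' failed:\n{traceback.format_exc()}")
        raise


def failing_read():
    # Hypothetical stand-in for the MongoDB read that raises in the job
    raise ValueError("collection not found")


try:
    run_step("read_mongo", failing_read)
except ValueError as exc:
    caught = str(exc)
```

The exception is re-raised on purpose, so the job still fails visibly instead of continuing with missing data.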

1 Answer

Krishna,

I think the issue is coming from the MongoDB URI in the script. You are trying to use a variable 'collection' in the URI string, but it's not being interpolated correctly. Here's the problematic line:

mongo_uri = "mongodb://<username>:<password>@<hostname>:27017/<databasename>.'collection'"

Here 'collection' is just text inside the string literal, so the URI ends with the literal characters 'collection' (quotes included) instead of the value passed to the job. To interpolate the parameter value into the string, use a Python f-string. Here's how to fix it:

mongo_uri = f"mongodb://<username>:<password>@<hostname>:27017/<databasename>.{args['collection']}"

This will replace {args['collection']} with the value of the 'collection' argument passed to the script.
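
A minimal sketch of the same fix, assuming the job is started with a collection argument as in the question. The helper names build_mongo_uri and build_s3_path and all sample values are hypothetical, and a plain dict stands in for the result of getResolvedOptions so the snippet runs outside Glue. Note that collection_name = 'collection' in the output-path part of the script has the same literal-string problem, so the sketch takes that value from the argument as well.

```python
from datetime import datetime


def build_mongo_uri(user, password, host, database, collection, port=27017):
    """Build a URI in the same <database>.<collection> form used in the question."""
    return f"mongodb://{user}:{password}@{host}:{port}/{database}.{collection}"


def build_s3_path(bucket, prefix, database, collection, when):
    """Build the partitioned output path; strip slashes to avoid '//' in the key."""
    parts = [
        bucket.strip("/"),
        prefix.strip("/"),
        database,
        collection,
        f"year={when:%Y}",
        f"month={when:%m}",
        f"date={when:%d}",
    ]
    return "s3://" + "/".join(parts)


# In the Glue job this would be:
#   args = getResolvedOptions(sys.argv, ['collection'])
args = {"collection": "orders"}  # hypothetical value for illustration

mongo_uri = build_mongo_uri("user", "secret", "localhost", "mydb", args["collection"])
s3_output_path = build_s3_path(
    "my-bucket", "output_parquet", "mydb", args["collection"], datetime(2024, 1, 15)
)
print(mongo_uri)
print(s3_output_path)
```

Calling datetime.now() once and formatting all three partition values from that single timestamp also avoids a mismatch if the job happens to run across midnight.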

Hope this helps with the issue!

Zac Dan
answered 10 months ago
