AWS Glue Job AnalysisException Error


Hi Team,

I have moved MongoDB data to S3 in Parquet format using an AWS Glue job, and I am trying to pass the MongoDB collection name as a parameter. When I execute the script, I get an AnalysisException error.

I referred to the following post for passing parameters to an AWS Glue job:

https://stackoverflow.com/questions/52316668/aws-glue-job-input-parameters
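For context on how that works: in a Glue job, each job parameter reaches the script as a `--name value` pair in `sys.argv`, and `getResolvedOptions(sys.argv, ['collection'])` returns a dictionary keyed by the name without the dashes. Since `awsglue` is only available inside Glue, a rough stand-in built on `argparse` can mimic this for local testing. The helper `parse_job_args` is hypothetical (not part of the Glue library), and the argument values are made up.

```python
import argparse
from typing import Dict, List


def parse_job_args(argv: List[str], names: List[str]) -> Dict[str, str]:
    """Hypothetical local stand-in for awsglue.utils.getResolvedOptions.

    Reads the required --<name> <value> options and ignores any other
    arguments (Glue also passes its own, such as --JOB_NAME).
    """
    parser = argparse.ArgumentParser(add_help=False)
    for name in names:
        parser.add_argument(f"--{name}", required=True)
    # parse_known_args skips unrecognised options instead of failing
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


if __name__ == "__main__":
    fake_argv = ["script.py", "--JOB_NAME", "mongo-to-s3", "--collection", "orders"]
    args = parse_job_args(fake_argv, ["collection"])
    print(args["collection"])
```

The value is then available as `args["collection"]`, just as it is with the real `getResolvedOptions`.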

Here is my script. Please take a look and let me know how to solve this issue.

```python
import sys
from datetime import datetime
from awsglue.transforms import *
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import SparkSession
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['collection'])

# Create a GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Set the MongoDB connection options
mongo_uri = "mongodb://<username>:<password>@<hostname>:27017/<databasename>.'collection'"
mongo_read_config = {
    "uri": mongo_uri,
    "spark.mongodb.input.uri": mongo_uri
}

# Read data from MongoDB
mongo_data = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .options(**mongo_read_config) \
    .load()

# Get the current year, month, and date
current_year = datetime.now().strftime("%Y")
current_month = datetime.now().strftime("%m")
current_date = datetime.now().strftime("%d")

# Create the S3 output path
s3_bucket = "<bucketname>/output_parquet/"
database_name = "<dbaname>"
collection_name = 'collection'

# Write the data to S3 as Parquet format
#s3_output_path = "s3://<bucketname>/output_parquet/"
s3_output_path = f"s3://{s3_bucket}/{database_name}/{collection_name}/year={current_year}/month={current_month}/date={current_date}"
mongo_data.write.parquet(s3_output_path)
```

  • The full stack trace should give more indication of the cause; that error message alone is too generic.
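One way to get that fuller error is to log the complete traceback rather than only the one-line exception summary. This is a minimal sketch, assuming you wrap the failing step (for example, the `.load()` or `.write.parquet(...)` call) in a function; `run_step` is a hypothetical helper, and the output goes through Python's standard `logging` module.

```python
import logging
import traceback
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("mongo_to_s3")


def run_step(name: str, step: Callable[[], T]) -> T:
    """Run one job step; on failure, log the full traceback and re-raise."""
    try:
        return step()
    except Exception:
        # format_exc() includes every frame plus the exception message,
        # which is usually more telling than the summary line alone
        logger.error("Step %r failed:\n%s", name, traceback.format_exc())
        raise
```

In the job this could be used as `run_step("write parquet", lambda: mongo_data.write.parquet(s3_output_path))`; the exception is re-raised so the job run still fails.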

1 Answer

Krishna,

I think the issue is coming from the MongoDB URI in the script. You are trying to use a variable 'collection' in the URI string, but it's not being interpolated correctly. Here's the problematic line:

mongo_uri = "mongodb://<username>:<password>@<hostname>:27017/<databasename>.'collection'"

Because the whole URI is an ordinary (non-f) string literal, 'collection' is copied into it verbatim, quotes included; it is not a reference to the job argument. The argument's value is in args['collection'], the dictionary returned by getResolvedOptions. To insert that value into the string, use an f-string. Here's how to fix it:

mongo_uri = f"mongodb://<username>:<password>@<hostname>:27017/<databasename>.{args['collection']}"

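This replaces {args['collection']} with the value of the collection argument, which getResolvedOptions reads from the --collection job parameter (for example, --collection orders when you start the job run). The same problem affects collection_name = 'collection' further down: it makes every run write to a folder literally named collection, so change that line to collection_name = args['collection'] as well.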
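Putting both fixes together, here is a minimal sketch of just the string-building part of the job, which runs without Spark or Glue. `build_mongo_uri` and `build_output_path` are hypothetical helpers, and the host, database, and bucket values in the usage are placeholders. Reading `datetime.now()` once keeps year, month, and day consistent if a run crosses midnight, and stripping the trailing slash from the bucket prefix avoids the `output_parquet//` double slash that the original f-string produces. Credentials are percent-encoded with `urllib.parse.quote_plus`, since MongoDB URIs require characters such as `@` or `:` in a username or password to be escaped.

```python
from datetime import datetime
from typing import Dict, Optional
from urllib.parse import quote_plus


def build_mongo_uri(args: Dict[str, str], user: str, password: str,
                    host: str, database: str, port: int = 27017) -> str:
    """Build a <database>.<collection> connection URI from the job arguments."""
    # Percent-encode credentials in case they contain ':' or '@'
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}.{args['collection']}")


def build_output_path(args: Dict[str, str], bucket_prefix: str, database: str,
                      now: Optional[datetime] = None) -> str:
    """Build the partitioned S3 output path for this run."""
    now = now or datetime.now()  # read the clock once for all three parts
    return (f"s3://{bucket_prefix.rstrip('/')}/{database}/{args['collection']}"
            f"/year={now:%Y}/month={now:%m}/date={now:%d}")
```

In the Glue script these would replace the hard-coded strings, e.g. `mongo_uri = build_mongo_uri(args, ...)` and `s3_output_path = build_output_path(args, ...)`.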

Hope this helps with the issue!

Zac Dan
Answered 10 months ago
