AWS Glue Connection with Secrets Manager Issue

2

I have a Glue Job for migrating data from a Postgres database in RDS to a S3 bucket in Parquet format and a Crawler that connects to Postgres to infer the table schema. Previously, the Glue Connection used by both was configured to authenticate to Postgres via Username and Password, however now I would like it to authenticate via credentials stored in Secrets Manager instead. After updating the Glue Connection to use Secrets Manager, the Glue Job is failing with the following error:

2022-12-14 14:58:07,092 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/parquet-job.py", line 25, in <module>
    database=glue_source_database, table_name=table, transformation_ctx="Datasource")
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 787, in from_catalog
    return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 186, in create_dynamic_frame_from_catalog
    makeOptions(self._sc, additional_options), catalog_id),
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o75.getCatalogSource.
: java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:349)
	at scala.None$.get(Option.scala:347)
	at com.amazonaws.services.glue.util.DataCatalogWrapper.$anonfun$getJDBCConf$1(DataCatalogWrapper.scala:218)
	at scala.util.Try$.apply(Try.scala:209)
	at com.amazonaws.services.glue.util.DataCatalogWrapper.getJDBCConf(DataCatalogWrapper.scala:209)
	at com.amazonaws.services.glue.GlueContext.getGlueNativeJDBCSource(GlueContext.scala:487)
	at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:320)
	at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:185)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

This is the code for the Glue Job script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TABLE_NAME', 'PARQUET_LOCATION', 'PARQUET_TABLE_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

glue_source_database = "postgres-db"

table = args['TABLE_NAME'].replace("-", "_")

output_s3_path = args['PARQUET_LOCATION'] + \
    "/" + args['PARQUET_TABLE_NAME']

Datasource = glueContext.create_dynamic_frame.from_catalog(
    database=glue_source_database, table_name=table, transformation_ctx="Datasource") # the script fails at this line

print("Items Count:  ", Datasource.count())
Datasource.printSchema()

Transform = ApplyMapping.apply(frame=Datasource, mappings=[
    ("id", "int", "id", "int"),
    ("name", "string", "name", "string")],
    transformation_ctx="Transform")

DF = Transform.toDF()
formatted_path = output_s3_path.replace("_", "-")
DF.write.mode('overwrite').format(
    'parquet').save(formatted_path)

job.commit()

I have confirmed that the IAM role for AWS Glue has permission to access my secret. The Crawler is running successfully using the same connection as the Glue Job. I also tried reverting the connection to use Username and Password, and the job succeeded. Is there anything that I am missing? Could this possibly be a bug with AWS Glue?

kyleh
質問済み 1年前1601ビュー
1回答
2

Hi,

I see that you are facing the “NoSuchElementException: None.get” error when using Secrets Manager credentials over the database connection. Unfortunately getCatalogSource and getCatalogSink currently do not support Secret Managers connection which is resulting in this exception. There is already a feature request for the same. However, we do not have an ETA as to when this feature would be implemented. However as a workaround you can either use boto3 to retrieve credentials from secrets manager or use the username/password to your connection options.

I hope the above information helps.

Thank you.

AWS
サポートエンジニア
回答済み 1年前
  • That makes sense. Hopefully that feature will be implemented soon. Thank you!

  • I get the same error when using from_options. Is it a similar issue?

    connection_options = {
            "useConnectionProperties": "true",
            "dbtable": table,
            "connectionName": connectionName,
        }
    
    return glueContext.create_dynamic_frame.from_options(
            connection_type='sqlserver',
            connection_options=connection_options
        )
    
  • Is there an ETA for this fix now ? I ran into the same issue recently and was wondering if something would change soon.

    Thanks a lot for the help

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ