An error occurred while calling o79.getDynamicFrame. java.lang.reflect.InvocationTargetException

0

I'm getting the error in the title while running a Glue Job to transform a CSV file to Parquet format. The code is below (unedited from the AWS recommendation):

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "appdata", table_name = "payment_csv", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "appdata", table_name = "payment_csv", transformation_ctx = "datasource0")
## @type: ApplyMapping
## @args: [mapping = [("_id", "string", "_id", "string"), ("amountpaidcents", "long", "amountpaidcents", "long"), ("_p_establishment", "string", "_p_establishment", "string"), ("type", "string", "type", "string"), ("ispaid", "boolean", "ispaid", "boolean"), ("for", "string", "for", "string"), ("_created_at", "string", "_created_at", "string"), ("_updated_at", "string", "_updated_at", "string"), ("formailingorder", "string", "formailingorder", "string"), ("formailinglistorder", "string", "formailinglistorder", "string"), ("_p_user", "string", "_p_user", "string"), ("`spcharge.created`", "long", "`spcharge.created`", "long"), ("`spcharge.data.object.billing_details.address.postal_code`", "long", "`spcharge.data.object.billing_details.address.postal_code`", "long"), ("`spcharge.data.object.payment_method_details.card.brand`", "string", "`spcharge.data.object.payment_method_details.card.brand`", "string"), ("`spcharge.data.object.payment_method_details.card.funding`", "string", "`spcharge.data.object.payment_method_details.card.funding`", "string")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource0]
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("_id", "string", "_id", "string"), ("amountpaidcents", "long", "amountpaidcents", "long"), ("_p_establishment", "string", "_p_establishment", "string"), ("type", "string", "type", "string"), ("ispaid", "boolean", "ispaid", "boolean"), ("for", "string", "for", "string"), ("_created_at", "string", "_created_at", "string"), ("_updated_at", "string", "_updated_at", "string"), ("formailingorder", "string", "formailingorder", "string"), ("formailinglistorder", "string", "formailinglistorder", "string"), ("_p_user", "string", "_p_user", "string"), ("`spcharge.created`", "long", "`spcharge.created`", "long"), ("`spcharge.data.object.billing_details.address.postal_code`", "long", "`spcharge.data.object.billing_details.address.postal_code`", "long"), ("`spcharge.data.object.payment_method_details.card.brand`", "string", "`spcharge.data.object.payment_method_details.card.brand`", "string"), ("`spcharge.data.object.payment_method_details.card.funding`", "string", "`spcharge.data.object.payment_method_details.card.funding`", "string")], transformation_ctx = "applymapping1")
## @type: ResolveChoice
## @args: [choice = "make_struct", transformation_ctx = "resolvechoice2"]
## @return: resolvechoice2
## @inputs: [frame = applymapping1]
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")
## @type: DropNullFields
## @args: [transformation_ctx = "dropnullfields3"]
## @return: dropnullfields3
## @inputs: [frame = resolvechoice2]
dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")
## @type: DataSink
## @args: [connection_type = "s3", connection_options = {"path": "s3://wp-datalake-v1/zohoexport"}, format = "parquet", transformation_ctx = "datasink4"]
## @return: datasink4
## @inputs: [frame = dropnullfields3]
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://wp-datalake-v1/zohoexport"}, format = "parquet", transformation_ctx = "datasink4")
job.commit()

Does anyone know what this error means and how I can resolve it?

  • There is not enough information to resolve this. It would help if you could paste the complete stack trace.

    The best guess is that this is related to permissions to write into S3 (Lake Formation or S3 permissions, etc.), or to an issue with flattening your struct. See if you can do the flattening using Spark instead:

    resolvechoice3 = resolvechoice3.toDF()  # convert to data frame
    resolvechoice3 = resolvechoice3.select(flatten(resolvechoice3.schema))  # flatten
    resolvechoice3 = DynamicFrame.fromDF(resolvechoice3, glue_context, "final_convert")

  • Thank you! I posted the full trace as an answer below; no luck with the Spark additions unfortunately.

  • Hello jweyenberg, Can you post the stack trace before and after the exact error message "An error occurred while calling o79.getDynamicFrame. java.lang.reflect.InvocationTargetException"?

  • Ah, sorry - misunderstood. Please see that also in an Answer response below.

asked 2 years ago · 7384 views
3 Answers
1

Hello jweyenberg,

Below is the key error message:

An error occurred while calling o79.getDynamicFrame.\n: java.lang.RuntimeException: class com.amazonaws.services.gluejobexecutor.model.AccessDeniedException:User: arn:aws:sts::390172919295:assumed-role/AWSGlueServiceRole-zohodatacrawl/GlueJobRunnerSession is not authorized to perform: lakeformation:GetDataAccess on resource: arn:aws:glue:us-east-2:390172919295:table/appdata/payment_csv because no identity-based policy allows the lakeformation:GetDataAccess action 

This indicates that the role you are using to run the Glue job (AWSGlueServiceRole-zohodatacrawl) does not have permission for the "lakeformation:GetDataAccess" action on the table "arn:aws:glue:us-east-2:390172919295:table/appdata/payment_csv".
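As a rough illustration of how to read such a message, the sketch below pulls the principal, the denied action and the resource out of the error text with a regular expression. The pattern and the function name `parse_access_denied` are assumptions made for this example; they are not part of any AWS library, and other services may word their errors differently.

```python
import re

# Assumed pattern for the "is not authorized to perform" wording seen in this error.
ACCESS_DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def parse_access_denied(message):
    """Return a dict with principal, action and resource, or None if no match."""
    match = ACCESS_DENIED.search(message)
    return match.groupdict() if match else None


# The message text is copied from the error quoted in this answer.
message = (
    "class com.amazonaws.services.gluejobexecutor.model.AccessDeniedException:"
    "User: arn:aws:sts::390172919295:assumed-role/AWSGlueServiceRole-zohodatacrawl/"
    "GlueJobRunnerSession is not authorized to perform: lakeformation:GetDataAccess "
    "on resource: arn:aws:glue:us-east-2:390172919295:table/appdata/payment_csv "
    "because no identity-based policy allows the lakeformation:GetDataAccess action"
)

details = parse_access_denied(message)
# For an assumed-role ARN, the role name is the segment after "assumed-role/".
role_name = details["principal"].split("/")[1]

print(role_name)
print(details["action"])
print(details["resource"])
```

The three printed values tell you which role needs which permission on which resource.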

You can add the policy below to your role to fix this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LakeFormationDataAccess",
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess"
            ],
            "Resource": "*"
        }
    ]
}

Reference: https://docs.aws.amazon.com/lake-formation/latest/dg/upgrade-glue-lake-formation-step3.html

Because Lake Formation applies several layers of permissions, you may run into other Lake Formation permission errors after this one. If so, check this page: https://docs.aws.amazon.com/lake-formation/latest/dg/troubleshooting.html
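If you prefer to script the change rather than use the console, here is a minimal sketch that builds the policy from this answer as a Python dict and attaches it as an inline policy with boto3's `put_role_policy`. The function names and the default policy name are assumptions for illustration, and the attach step assumes your own credentials are allowed to call `iam:PutRolePolicy`.

```python
import json


def lake_formation_access_policy():
    """Build the policy document from this answer as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LakeFormationDataAccess",
                "Effect": "Allow",
                "Action": ["lakeformation:GetDataAccess"],
                "Resource": "*",
            }
        ],
    }


def attach_inline_policy(role_name, policy_name="LakeFormationDataAccess"):
    """Attach the policy to the role as an inline policy (needs AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,  # hypothetical name, pick your own
        PolicyDocument=json.dumps(lake_formation_access_policy()),
    )


policy_json = json.dumps(lake_formation_access_policy(), indent=4)
print(policy_json)

# Not executed here because it changes your account:
# attach_inline_policy("AWSGlueServiceRole-zohodatacrawl")
```

Keeping the document builder separate from the IAM call lets you inspect the JSON before anything is changed.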

AWS
SUPPORT ENGINEER
Jann_P
answered 2 years ago
0

OP posting the full error logs for additional detail:

Aug 26, 2022, 5:44:08 AM Pending execution 2022-08-26 12:44:21,934 main WARN JNDI lookup class is not available because this JRE does not support JNDI. JNDI string lookups will not be available, continuing configuration. java.lang.ClassNotFoundException: org.apache.logging.log4j.core.lookup.JndiLookup at java.net.URLClassLoader.findClass(URLClassLoader.java:387) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:264) at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:173) at org.apache.logging.log4j.util.LoaderUtil.newInstanceOf(LoaderUtil.java:211) at org.apache.logging.log4j.util.LoaderUtil.newCheckedInstanceOf(LoaderUtil.java:232) at org.apache.logging.log4j.core.util.Loader.newCheckedInstanceOf(Loader.java:301) at org.apache.logging.log4j.core.lookup.Interpolator.<init>(Interpolator.java:95) at org.apache.logging.log4j.core.config.AbstractConfiguration.<init>(AbstractConfiguration.java:114) at org.apache.logging.log4j.core.config.DefaultConfiguration.<init>(DefaultConfiguration.java:55) at org.apache.logging.log4j.core.layout.PatternLayout$Builder.build(PatternLayout.java:430) at org.apache.logging.log4j.core.layout.PatternLayout.createDefaultLayout(PatternLayout.java:324) at org.apache.logging.log4j.core.appender.ConsoleAppender$Builder.<init>(ConsoleAppender.java:121) at org.apache.logging.log4j.core.appender.ConsoleAppender.newBuilder(ConsoleAppender.java:111) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.createBuilder(PluginBuilder.java:158) at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:119) at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:813) at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:753) at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:745) at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:389) at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:169) at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:181) at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:446) at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:520) at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:536) at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:214) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41) at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194) at org.apache.logging.log4j.LogManager.getLogger(LogManager.java:597) at org.apache.spark.metrics.sink.MetricsConfigUtils.<clinit>(MetricsConfigUtils.java:12) at org.apache.spark.metrics.sink.MetricsProxyInfo.fromConfig(MetricsProxyInfo.java:17) at com.amazonaws.services.glue.cloudwatch.CloudWatchLogsAppenderCommon.<init>(CloudWatchLogsAppenderCommon.java:62) at com.amazonaws.services.glue.cloudwatch.CloudWatchLogsAppenderCommon$CloudWatchLogsAppenderCommonBuilder.build(CloudWatchLogsAppenderCommon.java:79) at 
com.amazonaws.services.glue.cloudwatch.CloudWatchAppender.activateOptions(CloudWatchAppender.java:73) at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:81) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383) at org.apache.spark.network.util.JavaUtils.<clinit>(JavaUtils.java:41) at org.apache.spark.internal.config.ConfigHelpers$.byteFromString(ConfigBuilder.scala:67) at org.apache.spark.internal.config.ConfigBuilder$$anonfun$bytesConf$1.apply(ConfigBuilder.scala:235) at org.apache.spark.internal.config.ConfigBuilder$$anonfun$bytesConf$1.apply(ConfigBuilder.scala:235) at org.apache.spark.internal.config.TypedConfigBuilder$$anonfun$transform$1.apply(ConfigBuilder.scala:101) at org.apache.spark.internal.config.TypedConfigBuilder$$anonfun$transform$1.apply(ConfigBuilder.scala:101) at org.apache.spark.internal.config.TypedConfigBuilder.createWithDefault(ConfigBuilder.scala:143) at org.apache.spark.internal.config.package$.<init>(package.scala:121) at org.apache.spark.internal.config.package$.<clinit>(package.scala) at org.apache.spark.SparkConf$.<init>(SparkConf.scala:716) at 
org.apache.spark.SparkConf$.<clinit>(SparkConf.scala) at org.apache.spark.SparkConf.set(SparkConf.scala:95) at org.apache.spark.SparkConf$$anonfun$loadFromSystemProperties$3.apply(SparkConf.scala:77) at org.apache.spark.SparkConf$$anonfun$loadFromSystemProperties$3.apply(SparkConf.scala:76) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:76) at org.apache.spark.SparkConf.<init>(SparkConf.scala:71) at org.apache.spark.SparkConf.<init>(SparkConf.scala:58) at com.amazonaws.services.glue.SparkProcessLauncherPlugin$class.getSparkConf(ProcessLauncher.scala:41) at com.amazonaws.services.glue.ProcessLauncher$$anon$1.getSparkConf(ProcessLauncher.scala:78) at com.amazonaws.services.glue.ProcessLauncher.<init>(ProcessLauncher.scala:84) at com.amazonaws.services.glue.ProcessLauncher.<init>(ProcessLauncher.scala:78) at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:29) at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala) 2022-08-26 12:44:21,938 main INFO Log4j appears to be running in a Servlet environment, but there's no log4j-web module available. If you want better web container support, please add the log4j-web JAR to your web archive or server lib directory. GlueTelmty1*,SM,GAUGE,2022-08-26T12:44:26Z,,jr_2201071918b9e489e76f8ab97876e7ba3a757354740fb454ff6f706dbbe16a63,glue.spark-application-

answered 2 years ago
0

Here is the stack trace from the run:

22/08/24 15:24:40 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] {
    "Event": "GlueETLJobExceptionEvent",
    "Timestamp": 1661354680434,
    "Failure Reason": "Traceback (most recent call last):\n  File \"/tmp/wp-zoho-parquet-transform\", line 20, in <module>\n    datasource0 = glueContext.create_dynamic_frame.from_catalog(database = \"appdata\", table_name = \"payment_csv\", transformation_ctx = \"datasource0\")\n  File \"/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py\", line 642, in from_catalog\n    return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)\n  File \"/opt/amazon/lib/python3.6/site-packages/awsglue/context.py\", line 186, in create_dynamic_frame_from_catalog\n    return source.getFrame(**kwargs)\n  File \"/opt/amazon/lib/python3.6/site-packages/awsglue/data_source.py\", line 36, in getFrame\n    jframe = self._jsource.getDynamicFrame()\n  File \"/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py\", line 1257, in __call__\n    answer, self.gateway_client, self.target_id, self.name)\n  File \"/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py\", line 63, in deco\n    return f(*a, **kw)\n  File \"/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py\", line 328, in get_return_value\n    format(target_id, \".\", name), value)\npy4j.protocol.Py4JJavaError: An error occurred while calling o79.getDynamicFrame.\n: java.lang.RuntimeException: class com.amazonaws.services.gluejobexecutor.model.AccessDeniedException:User: arn:aws:sts::390172919295:assumed-role/AWSGlueServiceRole-zohodatacrawl/GlueJobRunnerSession is not authorized to perform: lakeformation:GetDataAccess on resource: arn:aws:glue:us-east-2:390172919295:table/appdata/payment_csv because no identity-based policy allows the lakeformation:GetDataAccess action (Service: AWSLakeFormation; Status Code: 400; Error Code: AccessDeniedException; Request ID: 4813cf44-7a93-4f9e-b833-88ba1f383457; Proxy: null) (Service: AWSGlueJobExecutor; Status Code: 400; Error 
Code: AccessDeniedException; Request ID: 253e2ef8-35e6-46f9-9a63-3300136a7ead; Proxy: null)\n\tat com.amazonaws.services.glue.remote.LakeformationCredentialsProvider.refresh(LakeformationCredentialsProvider.scala:51)\n\tat com.amazonaws.services.glue.remote.LakeformationCredentialsProvider.<init>(LakeformationCredentialsProvider.scala:78)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat com.amazonaws.services.glue.remote.MichiganAWSCredentialProviderProxy$.get(MichiganAWSCredentialProviderProxy.scala:14)\n\tat com.amazonaws.services.glue.util.FileSchemeWrapper.setHadoopConfiguration(FileSchemeWrapper.scala:50)\n\tat com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:81)\n\tat com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:89)\n\tat com.amazonaws.services.glue.HadoopDataSource.getDynamicFrame(DataSource.scala:543)\n\tat com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:97)\n\tat com.amazonaws.services.glue.HadoopDataSource.getDynamicFrame(DataSource.scala:240)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat 
py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\n",
    "Stack Trace": [
        {
            "Declaring Class": "get_return_value",
            "Method Name": "format(target_id, \".\", name), value)",
            "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py",
            "Line Number": 328
        },
        {
            "Declaring Class": "deco",
            "Method Name": "return f(*a, **kw)",
            "File Name": "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py",
            "Line Number": 63
        },
        {
            "Declaring Class": "__call__",
            "Method Name": "answer, self.gateway_client, self.target_id, self.name)",
            "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
            "Line Number": 1257
        },
        {
            "Declaring Class": "getFrame",
            "Method Name": "jframe = self._jsource.getDynamicFrame()",
            "File Name": "/opt/amazon/lib/python3.6/site-packages/awsglue/data_source.py",
            "Line Number": 36
        },
        {
            "Declaring Class": "create_dynamic_frame_from_catalog",
            "Method Name": "return source.getFrame(**kwargs)",
            "File Name": "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py",
            "Line Number": 186
        },
        {
            "Declaring Class": "from_catalog",
            "Method Name": "return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)",
            "File Name": "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py",
            "Line Number": 642
        },
        {
            "Declaring Class": "<module>",
            "Method Name": "datasource0 = glueContext.create_dynamic_frame.from_catalog(database = \"appdata\", table_name = \"payment_csv\", transformation_ctx = \"datasource0\")",
            "File Name": "/tmp/wp-zoho-parquet-transform",
            "Line Number": 20
        }
    ],
    "Last Executed Line number": 20,
    "script": "wp-zoho-parquet-transform"
}
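For anyone reading a similar log, the sketch below shows one way to pull the actual error lines out of the "Failure Reason" field of such an event, skipping the Python and Java stack frames. The function name `summarize_failure` and the shortened sample event are assumptions for this example; the filtering rule (drop indented lines and the "Traceback" header) is a heuristic based on the trace above, not a documented Glue format.

```python
import json


def summarize_failure(event_json):
    """Return the error lines of a Glue exception event, without stack frames."""
    event = json.loads(event_json)
    summary = []
    for line in event["Failure Reason"].split("\n"):
        if not line:
            continue  # skip empty lines such as the one after a trailing newline
        if line[0].isspace():
            continue  # indented lines are Python or Java stack frames
        if line.startswith("Traceback"):
            continue  # header of the Python traceback
        summary.append(line)
    return summary


# Shortened sample modelled on the event above ("..." marks omitted text).
sample_event = json.dumps(
    {
        "Event": "GlueETLJobExceptionEvent",
        "Failure Reason": (
            "Traceback (most recent call last):\n"
            '  File "/tmp/wp-zoho-parquet-transform", line 20, in <module>\n'
            "    datasource0 = ...\n"
            "py4j.protocol.Py4JJavaError: An error occurred while calling "
            "o79.getDynamicFrame.\n"
            ": java.lang.RuntimeException: ... is not authorized to perform: "
            "lakeformation:GetDataAccess ...\n"
            "\tat py4j.Gateway.invoke(Gateway.java:282)\n"
        ),
    }
)

error_lines = summarize_failure(sample_event)
for line in error_lines:
    print(line)
```

On the sample this leaves only the `Py4JJavaError` line and the `java.lang.RuntimeException` line, which is where the AccessDeniedException message appears.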

answered 2 years ago
