Questions tagged with AWS Glue

Glue Job started failing after working for weeks with Py4JJavaError: An error occurred while calling o111.pyWriteDynamicFrame. java.lang.reflect.InvocationTargetException

Hi, I am a security architect, not a programmer, so I am the blind squirrel looking for a nut in this situation and could use some help. We have a Glue job that is failing, and the error bounces between Py4JJavaError: An error occurred while calling o117.pyWriteDynamicFrame. java.lang.reflect.InvocationTargetException and Py4JJavaError: An error occurred while calling o116.pyWriteDynamicFrame. I believe the full error is the one below. Any help would be awesome and greatly appreciated.

"Event": "GlueETLJobExceptionEvent", "Timestamp": 1666211480954, "Failure Reason": "Traceback (most recent call last):\n File \"/tmp/governed_tables.py\", line 68, in <module>\n sink.writeFrame(dy_df)\n File \"/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py\", line 32, in writeFrame\n return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + \"_errors\")\n File \"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py\", line 1305, in __call__\n answer, self.gateway_client, self.target_id, self.name)\n File \"/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py\", line 111, in deco\n return f(*a, **kw)\n File \"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py\", line 328, in get_return_value\n format(target_id, \".\", name), value)\npy4j.protocol.Py4JJavaError: An error occurred while calling o117.pyWriteDynamicFrame.\n: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat com.amazonaws.services.glue.remote.MichiganAWSCredentialProviderProxy$.get(MichiganAWSCredentialProviderProxy.scala:14)\n\tat 
com.amazonaws.services.glue.util.LakeformationClientWrapper.$anonfun$lakeformationUpdateTableObjectsInternal$1(LakeformationGovernedWrapper.scala:126)\n\tat scala.collection.immutable.List.map(List.scala:282)\n\tat com.amazonaws.services.glue.util.LakeformationClientWrapper.lakeformationUpdateTableObjectsInternal(LakeformationGovernedWrapper.scala:115)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat com.amazonaws.services.glue.GlueUtility$.callLakeformationMethod(GlueUtility.scala:44)\n\tat com.amazonaws.services.glue.sinks.HadoopDataSink.lakeformationUpdateTableObjects(HadoopDataSink.scala:344)\n\tat com.amazonaws.services.glue.sinks.HadoopDataSink.$anonfun$writeDynamicFrame$2(HadoopDataSink.scala:313)\n\tat com.amazonaws.services.glue.util.FileSchemeWrapper.$anonfun$executeWithQualifiedScheme$1(FileSchemeWrapper.scala:90)\n\tat com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:83)\n\tat com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:90)\n\tat com.amazonaws.services.glue.sinks.HadoopDataSink.$anonfun$writeDynamicFrame$1(HadoopDataSink.scala:158)\n\tat org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)\n\tat com.amazonaws.services.glue.sinks.HadoopDataSink.writeDynamicFrame(HadoopDataSink.scala:152)\n\tat com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:72)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\nCaused by: java.lang.RuntimeException: class com.amazonaws.services.gluejobexecutor.model.EntityNotFoundException:Database null not found. (Service: AWSLakeFormation; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 37ecd5ab-d47a-4c25-ad7a-80a070ed1558; Proxy: null) (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: EntityNotFoundException; Request ID: b3862a0e-3bb4-440e-9801-452716b82337; Proxy: null)\n\tat com.amazonaws.services.glue.remote.LakeformationCredentialsProvider.refresh(LakeformationCredentialsProvider.scala:50)\n\tat com.amazonaws.services.glue.remote.LakeformationCredentialsProvider.<init>(LakeformationCredentialsProvider.scala:77)\n\t... 
33 more\n", "Stack Trace": [ { "Declaring Class": "get_return_value", "Method Name": "format(target_id, \".\", name), value)", "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", "Line Number": 328 }, { "Declaring Class": "deco", "Method Name": "return f(*a, **kw)", "File Name": "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", "Line Number": 111 }, { "Declaring Class": "__call__", "Method Name": "answer, self.gateway_client, self.target_id, self.name)", "File Name": "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", "Line Number": 1305 }, { "Declaring Class": "writeFrame", "Method Name": "return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + \"_errors\")", "File Name": "/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py", "Line Number": 32 }, { "Declaring Class": "<module>", "Method Name": "sink.writeFrame(dy_df)", "File Name": "/tmp/governed_tables.py", "Line Number": 68 } ], "Last Executed Line number": 68, "script": "governed_tables.py"
1 answer, 0 votes, 75 views, asked a month ago
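
In a trace like the one in this question, the `Py4JJavaError` and `InvocationTargetException` lines are wrappers, and the underlying failure tends to sit on the last `Caused by:` line (here, `EntityNotFoundException:Database null not found` reported by Lake Formation). The following is a minimal sketch for pulling that line out, assuming the failure reason is available as a plain Python string with real newlines. The helper name `root_cause` is hypothetical and not part of any AWS library, and `sample` is an abbreviated copy of the trace above.

```python
import re


def root_cause(failure_reason: str) -> str:
    """Return the text of the last 'Caused by:' line in a Java stack trace.

    Falls back to the last non-empty line when no 'Caused by:' line exists.
    """
    causes = re.findall(r"^Caused by: (.*)$", failure_reason, flags=re.MULTILINE)
    if causes:
        return causes[-1].strip()
    lines = [ln.strip() for ln in failure_reason.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


# Abbreviated from the failure reason quoted in the question.
sample = (
    "py4j.protocol.Py4JJavaError: An error occurred while calling "
    "o117.pyWriteDynamicFrame.\n"
    ": java.lang.reflect.InvocationTargetException\n"
    "\tat java.lang.Thread.run(Thread.java:750)\n"
    "Caused by: java.lang.RuntimeException: class "
    "com.amazonaws.services.gluejobexecutor.model.EntityNotFoundException:"
    "Database null not found.\n"
    "\t... 33 more\n"
)

print(root_cause(sample))
```

Reading the trace this way, the message suggests that the sink may have been asked to update a governed table without a resolvable database name, so the database and table settings passed to the sink in `governed_tables.py` would be a reasonable first thing to check.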

Athena JSON SerDe Error

I have successfully created a database using a Glue crawler, but I run into this error (or an extremely similar one) when trying to access it with even a basic Athena query.

```
GENERIC_INTERNAL_ERROR: IndexOutOfBoundsException thrown initializing deserializer org.openx.data.jsonserde.JsonSerDe. Cause: Index: 55, Size: 55
```

I have successfully run Athena queries on another database, which I believe is formatted the same way. I tried taking a single file out of the S3 bucket and using a new crawler to make a new table, but it produced the same error.

My best guess is that there is a special character or something similar in the JSON that this particular deserializer can't handle. A JSON linter reads the file as valid JSON. Are there reserved characters for the AWS SerDe? Where can I look to find more information on this particular error?

The data I am trying to assess belongs to a vendor, so I don't think I can post it here, but here is an example line from the file:

```
{"AE Weather Station (Standard w/ GHI) (Estimated GHI) (Watt hours/meter2)_Irradiance": "216.806", "AE Weather Station (Standard w/ GHI) (GHI sensor) (Watts/meter2)_GHI comparison": "0", "AE Weather Station (Standard w/ GHI) (GHI) (Watt hours/meter2)_Irradiance": "0", "AE Weather Station (Standard w/ GHI) - Ambient Temperature (Degrees Fahrenheit)_Temperature": "75.028", "AE Weather Station (Standard w/ GHI) - Module temperature (est) (Degrees Fahrenheit)_Temperature": "75.028", "AE Weather Station (Standard w/ GHI), Sun (GHI) (Watts/meter2)_Irradiance Sensor Degrade": "0", "AE Weather Station (Standard w/ GHI), Sun (GHI) (Watts/meter2)_Irradiance sensor orientation": "0", "AE Weather Station (Standard w/ GHI), Sun (GHI) * (Watts/meter2)_POA comparison": "0", "AE Weather Station (Standard w/ GHI): Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "AE Weather Station (Standard w/ GHI): Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "Ambient temperature (Degrees Celsius)_Clean average weather data": "23.904", "Ambient temperature Quality (Quality (asdf))_Clean average weather data": "100", "Bin Size": "Bin1Hour", "Blue-sky ratio: AE Weather Station (Standard w/ GHI), Sun (GHI) (Percent)_Irradiance Sensor Degrade": "104.792", "Blue-sky ratio: IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "102.432", "Blue-sky ratio: IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "111.587", "Blue-sky ratio: IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Percent)_Irradiance Sensor Degrade": "NaN", "GHI (Watts/meter2)_Clean average weather data": "0", "GHI Quality (Quality (asdf))_Clean average weather data": "100", "IMT 1 (1810) Ref Cell w/ Mod Temp (Estimated POA) (Watt hours/meter2)_Irradiance": "0", "IMT 1 (1810) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "71.581", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.090", "IMT 1 (1810) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 1 (1810) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 2 (1550) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "71.331", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.048", "IMT 2 (1550) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 2 (1550) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 3 (1220) Ref Cell w/ Mod Temp (Watt hours/meter2)_Irradiance": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp - Irradiance weighted module temperature (Degrees Fahrenheit)_Temperature": "70.246", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance Sensor Degrade": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance (Watts/meter2)_Irradiance sensor orientation": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance * (Watt hours/meter2)_GHI comparison": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp, Irradiance * (Watts/meter2)_POA comparison": "0.052", "IMT 3 (1220) Ref Cell w/ Mod Temp: Clear Sky (E-W) (Watts/meter2)_Irradiance sensor orientation": "0", "IMT 3 (1220) Ref Cell w/ Mod Temp: Clear Sky GHI (Watts/meter2)_Irradiance sensor orientation": "0", "Max wind speed - AE Weather Station (Standard w/ GHI) (Miles/hour)_Wind": "0", "Module temperature (Degrees Celsius)_Clean average weather data": "21.652", "Module temperature Quality (Quality (asdf))_Clean average weather data": "100", "POA (Watts/meter2)_Clean average weather data": "0", "POA Quality (Quality (asdf))_Clean average weather data": "100", "Rain (Rain (Inches))_Rain, Humidity, Barometric Pressure": "0", "Variance (Percent)_Irradiance": "NaN", "datetime": "2021-06-01 00:00:00", "time_retrieved": "20221007T10:56:1665165415"}
```
1 answer, 0 votes, 47 views, asked 2 months ago
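
One way to test the special-character guess from this question is to list the JSON keys that contain anything other than letters, digits, and underscores, paying particular attention to commas, since a comma inside a column name could plausibly confuse a comma-separated column list. This is a diagnostic sketch only: the function name `suspicious_keys` and the allowed-character rule are assumptions made for illustration, not documented SerDe behavior, and `line` is a three-key excerpt of the sample record.

```python
import json
import re

# Assumed "safe" column-name pattern: letters, digits, and underscores only.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def suspicious_keys(json_line: str) -> dict:
    """Map each key that falls outside SAFE_NAME to the characters that break it."""
    record = json.loads(json_line)
    report = {}
    for key in record:
        if not SAFE_NAME.fullmatch(key):
            report[key] = sorted(set(re.findall(r"[^A-Za-z0-9_]", key)))
    return report


# Three keys taken from the sample record in the question.
line = (
    '{"Bin Size": "Bin1Hour", '
    '"datetime": "2021-06-01 00:00:00", '
    '"Rain (Rain (Inches))_Rain, Humidity, Barometric Pressure": "0"}'
)

for key, chars in suspicious_keys(line).items():
    print(repr(key), "->", chars)
```

If the flagged keys turn out to be the problem, renaming the fields to plain lowercase identifiers before crawling the data would be one thing to try.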