Hive on EMR Serverless - StatsTask Exception


I developed a data processing application using Hive on EMR EC2. I'm trying to run the same code on EMR Serverless and am getting the following exception:

Job failed with Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask

The error occurs after trying to open this file:

2022-10-19T17:33:08,256 INFO  [d0599018-6ded-4feb-992c-c34f060d969e main([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(936)) - Opening 's3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/-ext-10001/000000_0' for reading

Earlier in the log, another file was renamed to this path, so the file should exist:

2022-10-19T17:32:44,882 INFO  [d0599018-6ded-4feb-992c-c34f060d969e main([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:rename(963)) - rename s3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/_tmp.-ext-10001/000000_0 s3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/-ext-10001/000000_0 using algorithm version 1

Here's the stack trace of the error:

2022-10-19T17:33:08,256 INFO  [d0599018-6ded-4feb-992c-c34f060d969e main([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(936)) - Opening 's3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/-ext-10001/000000_0' for reading
2022-10-19T17:33:08,321 ERROR [d0599018-6ded-4feb-992c-c34f060d969e main([])]: exec.StatsTask (StatsTask.java:execute(114)) - Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:4697) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:179) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:83) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:108) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2682) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2353) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2029) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1727) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1721) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:472) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:488) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:796) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:762) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:684) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_342]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_342]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323) [hadoop-common-3.2.1-amzn-8.jar:?]
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236) [hadoop-common-3.2.1-amzn-8.jar:?]
Caused by: java.lang.NullPointerException
	at com.amazonaws.glue.catalog.converters.HiveToCatalogConverter.convertDecimal(HiveToCatalogConverter.java:347) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.converters.HiveToCatalogConverter.convertColumnStatisticsData(HiveToCatalogConverter.java:302) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.converters.HiveToCatalogConverter.convertColumnStatisticsObjList(HiveToCatalogConverter.java:250) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.updateTableColumnStatistics(GlueMetastoreClientDelegate.java:1581) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.setPartitionColumnStatistics(GlueMetastoreClientDelegate.java:1945) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.setPartitionColumnStatistics(AWSCatalogMetastoreClient.java:1994) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:4694) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	... 27 more

Patrick
asked a year ago, 293 views
1 Answer

From the stack trace, it looks like Hive is trying to store column statistics in the AWS Glue Data Catalog for a decimal column whose values are all null. This is a known issue, and the EMR team is working on a fix.
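
Until a fix is available, one possible workaround is to stop Hive from gathering column statistics automatically, so the failing StatsTask step has nothing to persist. The sketch below is an assumption, not something confirmed in this thread: it builds a StartJobRun request for EMR Serverless that sets the Hive property hive.stats.column.autogather to false through a hive-site override. The function name build_hive_job_request and all argument values are hypothetical placeholders.

```python
import json


def build_hive_job_request(application_id, execution_role_arn, query_s3_uri):
    """Build keyword arguments for an EMR Serverless StartJobRun call.

    Assumption: turning off automatic column statistics avoids the
    code path that fails in HiveToCatalogConverter.convertDecimal.
    """
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        # Hive job driver: the query is an S3 URI pointing at a .sql file.
        "jobDriver": {"hive": {"query": query_s3_uri}},
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "hive-site",
                    "properties": {
                        # Skip automatic column statistics on write.
                        "hive.stats.column.autogather": "false",
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder values only; replace with your own identifiers.
    request = build_hive_job_request(
        "your-application-id",
        "arn:aws:iam::123456789012:role/your-job-role",
        "s3://your-bucket/path/to/query.sql",
    )
    print(json.dumps(request, indent=2))
    # To submit (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("emr-serverless").start_job_run(**request)
```

The same property can also be set at the top of the Hive script itself with SET hive.stats.column.autogather=false; if changing the job submission is not convenient. Either way, column statistics will not be written for the affected tables while the setting is off.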

AWS
answered a year ago
