Hive on EMR Serverless - StatsTask Exception


I developed a data processing application using Hive on EMR EC2. I'm trying to run the same code on EMR Serverless and am getting the following exception:

Job failed with Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask

The error occurs after trying to open this file:

2022-10-19T17:33:08,256 INFO  [d0599018-6ded-4feb-992c-c34f060d969e main([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(936)) - Opening 's3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/-ext-10001/000000_0' for reading

Earlier in the log, another file was renamed to this path, so the file should exist:

2022-10-19T17:32:44,882 INFO  [d0599018-6ded-4feb-992c-c34f060d969e main([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:rename(963)) - rename s3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/_tmp.-ext-10001/000000_0 s3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/-ext-10001/000000_0 using algorithm version 1

Here's the stack trace of the error:

2022-10-19T17:33:08,256 INFO  [d0599018-6ded-4feb-992c-c34f060d969e main([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(936)) - Opening 's3://redacted-serverless-shakedown-s3/hive/scratch/hadoop/d0599018-6ded-4feb-992c-c34f060d969e/hive_2022-10-19_17-30-28_695_6254544867894550511-1/-mr-10000/.hive-staging_hive_2022-10-19_17-30-28_695_6254544867894550511-1/-ext-10001/000000_0' for reading
2022-10-19T17:33:08,321 ERROR [d0599018-6ded-4feb-992c-c34f060d969e main([])]: exec.StatsTask (StatsTask.java:execute(114)) - Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:4697) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:179) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:83) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:108) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2682) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2353) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2029) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1727) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1721) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) [hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:472) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:488) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:796) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:762) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:684) [hive-cli-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_342]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_342]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323) [hadoop-common-3.2.1-amzn-8.jar:?]
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236) [hadoop-common-3.2.1-amzn-8.jar:?]
Caused by: java.lang.NullPointerException
	at com.amazonaws.glue.catalog.converters.HiveToCatalogConverter.convertDecimal(HiveToCatalogConverter.java:347) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.converters.HiveToCatalogConverter.convertColumnStatisticsData(HiveToCatalogConverter.java:302) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.converters.HiveToCatalogConverter.convertColumnStatisticsObjList(HiveToCatalogConverter.java:250) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.updateTableColumnStatistics(GlueMetastoreClientDelegate.java:1581) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.setPartitionColumnStatistics(GlueMetastoreClientDelegate.java:1945) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.setPartitionColumnStatistics(AWSCatalogMetastoreClient.java:1994) ~[aws-glue-datacatalog-hive3-client-3.6.0.jar:?]
	at org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:4694) ~[hive-exec-3.1.3-amzn-1.jar:3.1.3-amzn-1]
	... 27 more

Patrick
asked 2 years ago, 313 views
1 Answer

From the stack trace, it looks like Hive is trying to store column statistics in Glue for a decimal column whose values are all null; the NullPointerException is thrown in HiveToCatalogConverter.convertDecimal. This is a known issue, and the EMR team is working on a fix.
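Until a fix is available, one possible workaround is to stop Hive from gathering column statistics automatically, so that the column-stats path seen in the stack trace (ColStatsProcessor) is not reached. The sketch below is untested against this exact failure and rests on that assumption. It only builds the arguments for an EMR Serverless StartJobRun request with a hive-site override for hive.stats.column.autogather. The function name, the application ID, the role ARN, and the S3 path are hypothetical placeholders.

```python
def build_hive_job_request(application_id, execution_role_arn, query_file,
                           disable_column_stats=True):
    """Build keyword arguments for an EMR Serverless StartJobRun call.

    Hypothetical helper: it only assembles a dict and does not call AWS.
    """
    request = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        # The Hive job driver takes the S3 location of the query file.
        "jobDriver": {"hive": {"query": query_file}},
    }
    if disable_column_stats:
        # Assumption: with automatic column statistics turned off, the
        # column-stats update that fails for all-null decimal columns
        # is skipped.
        request["configurationOverrides"] = {
            "applicationConfiguration": [
                {
                    "classification": "hive-site",
                    "properties": {"hive.stats.column.autogather": "false"},
                }
            ]
        }
    return request


if __name__ == "__main__":
    # Placeholder values; replace with your own application, role, and script.
    req = build_hive_job_request(
        "example-application-id",
        "arn:aws:iam::123456789012:role/example-job-role",
        "s3://example-bucket/scripts/query.sql",
    )
    print(req)
    # To submit (requires boto3 and AWS credentials):
    #   boto3.client("emr-serverless").start_job_run(**req)
```

The same property can also be set at the top of the Hive script with `set hive.stats.column.autogather=false;`. Note that disabling it means column statistics are no longer collected on insert, which can affect query planning for later jobs.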

AWS
answered a year ago
