
Questions tagged with Amazon EMR



Apache Hudi on Amazon EMR and AWS Database

I followed the tutorial from this [link](https://aws.amazon.com/pt/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/) successfully, but when I try to do the same with other data and tables it does not work. I receive this error:

```
[hadoop@ip-10-99-2-111 bin]$ spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.5 \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  /usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field dms_received_ts \
  --props s3://hudi-test-tt/properties/dfs-source-health-care-full.properties \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path s3://hudi-test-tt/hudi/health_care \
  --target-table hudiblogdb.health_care \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --enable-hive-sync
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hudi#hudi-utilities-bundle_2.11 added as a dependency
org.apache.spark#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fbc63aec-b48f-4ef4-bc38-f788919cf31c;1.0
    confs: [default]
    found org.apache.hudi#hudi-utilities-bundle_2.11;0.5.2-incubating in central
    found org.apache.spark#spark-avro_2.11;2.4.5 in central
    found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 270ms :: artifacts dl 7ms
    :: modules in use:
    org.apache.hudi#hudi-utilities-bundle_2.11;0.5.2-incubating from central in [default]
    org.apache.spark#spark-avro_2.11;2.4.5 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-fbc63aec-b48f-4ef4-bc38-f788919cf31c
    confs: [default]
    0 artifacts copied, 3 already retrieved (0kB/7ms)
22/08/25 21:39:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/08/25 21:39:38 INFO RMProxy: Connecting to ResourceManager at ip-10-99-2-111.us-east-2.compute.internal/10.99.2.111:8032
22/08/25 21:39:38 INFO Client: Requesting a new application from cluster with 1 NodeManagers
22/08/25 21:39:38 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
22/08/25 21:39:38 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
22/08/25 21:39:38 INFO Client: Setting up container launch context for our AM
22/08/25 21:39:38 INFO Client: Setting up the launch environment for our AM container
22/08/25 21:39:39 INFO Client: Preparing resources for our AM container
22/08/25 21:39:39 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/08/25 21:39:41 INFO Client: Uploading resource file:/mnt/tmp/spark-4c327077-6693-4371-9e41-10e2342e0200/__spark_libs__5969710364624957851.zip -> hdfs://ip-10-99-2-111.us-east-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1661296163923_0003/__spark_libs__5969710364624957851.zip
22/08/25 21:39:41 INFO Client: Uploading resource file:/usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar -> hdfs://ip-10-99-2-111.us-east-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1661296163923_0003/hudi-utilities-bundle_2.11-0.5.2-incubating.jar
22/08/25 21:39:41 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.11-0.5.2-incubating.jar -> hdfs://ip-10-99-2-111.us-east-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1661296163923_0003/org.apache.hudi_hudi-utilities-bundle_2.11-0.5.2-incubating.jar
22/08/25 21:39:41 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.5.jar -> hdfs://ip-10-99-2-111.us-east-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1661296163923_0003/org.apache.spark_spark-avro_2.11-2.4.5.jar
22/08/25 21:39:41 INFO Client: Uploading resource file:/home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar -> hdfs://ip-10-99-2-111.us-east-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1661296163923_0003/org.spark-project.spark_unused-1.0.0.jar
22/08/25 21:39:41 INFO Client: Uploading resource file:/etc/spark/conf/hive-site.xml -> hdfs://ip-10-99-2-111.us-east-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1661296163923_0003/hive-site.xml
22/08/25 21:39:42 INFO Client: Uploading resource file:/mnt/tmp/spark-4c327077-6693-4371-9e41-10e2342e0200/__spark_conf__6985991088000323368.zip -> hdfs://ip-10-99-2-111.us-east-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1661296163923_0003/__spark_conf__.zip
22/08/25 21:39:42 INFO SecurityManager: Changing view acls to: hadoop
22/08/25 21:39:42 INFO SecurityManager: Changing modify acls to: hadoop
22/08/25 21:39:42 INFO SecurityManager: Changing view acls groups to:
22/08/25 21:39:42 INFO SecurityManager: Changing modify acls groups to:
22/08/25 21:39:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
22/08/25 21:39:43 INFO Client: Submitting application application_1661296163923_0003 to ResourceManager
22/08/25 21:39:43 INFO YarnClientImpl: Submitted application application_1661296163923_0003
22/08/25 21:39:44 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:44 INFO Client:
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1661463583358
     final status: UNDEFINED
     tracking URL: http://ip-10-99-2-111.us-east-2.compute.internal:20888/proxy/application_1661296163923_0003/
     user: hadoop
22/08/25 21:39:45 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:46 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:47 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:48 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:49 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:50 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:39:50 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: ip-10-99-2-253.us-east-2.compute.internal
     ApplicationMaster RPC port: 33179
     queue: default
     start time: 1661463583358
     final status: UNDEFINED
     tracking URL: http://ip-10-99-2-111.us-east-2.compute.internal:20888/proxy/application_1661296163923_0003/
     user: hadoop
22/08/25 21:39:51 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:39:52 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:39:53 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:39:54 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:39:55 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:55 INFO Client:
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1661463583358
     final status: UNDEFINED
     tracking URL: http://ip-10-99-2-111.us-east-2.compute.internal:20888/proxy/application_1661296163923_0003/
     user: hadoop
22/08/25 21:39:56 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:57 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:58 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:39:59 INFO Client: Application report for application_1661296163923_0003 (state: ACCEPTED)
22/08/25 21:40:00 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:40:00 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: ip-10-99-2-253.us-east-2.compute.internal
     ApplicationMaster RPC port: 34591
     queue: default
     start time: 1661463583358
     final status: UNDEFINED
     tracking URL: http://ip-10-99-2-111.us-east-2.compute.internal:20888/proxy/application_1661296163923_0003/
     user: hadoop
22/08/25 21:40:01 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:40:02 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:40:03 INFO Client: Application report for application_1661296163923_0003 (state: RUNNING)
22/08/25 21:40:04 INFO Client: Application report for application_1661296163923_0003 (state: FINISHED)
22/08/25 21:40:04 INFO Client:
     client token: N/A
     diagnostics: User class threw exception: java.io.IOException: Could not load schema provider class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
    at org.apache.hudi.utilities.UtilHelpers.createSchemaProvider(UtilHelpers.java:101)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:364)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:95)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:89)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
    at org.apache.hudi.utilities.UtilHelpers.createSchemaProvider(UtilHelpers.java:99)
    ... 9 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
    ... 11 more
Caused by: org.apache.hudi.exception.HoodieNotSupportedException: Required property hoodie.deltastreamer.schemaprovider.source.schema.file is missing
    at org.apache.hudi.DataSourceUtils.lambda$checkRequiredProperties$1(DataSourceUtils.java:173)
    at java.util.Collections$SingletonList.forEach(Collections.java:4824)
    at org.apache.hudi.DataSourceUtils.checkRequiredProperties(DataSourceUtils.java:171)
    at org.apache.hudi.utilities.schema.FilebasedSchemaProvider.<init>(FilebasedSchemaProvider.java:55)
    ... 16 more
     ApplicationMaster host: ip-10-99-2-253.us-east-2.compute.internal
     ApplicationMaster RPC port: 34591
     queue: default
     start time: 1661463583358
     final status: FAILED
     tracking URL: http://ip-10-99-2-111.us-east-2.compute.internal:20888/proxy/application_1661296163923_0003/
     user: hadoop
22/08/25 21:40:04 ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Could not load schema provider class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
    at org.apache.hudi.utilities.UtilHelpers.createSchemaProvider(UtilHelpers.java:101)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:364)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:95)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:89)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(Refl
```
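The root cause at the bottom of the trace is `Required property hoodie.deltastreamer.schemaprovider.source.schema.file is missing`: `FilebasedSchemaProvider` reads its schema locations from the file passed via `--props`, and that property is absent from `dfs-source-health-care-full.properties`. A minimal sketch of the missing entries is below; the S3 paths and the `health_care.avsc` file name are hypothetical and need to point at real Avro schema files for the new table:

```properties
# Sketch only: schema locations for FilebasedSchemaProvider (paths are assumptions).
hoodie.deltastreamer.schemaprovider.source.schema.file=s3://hudi-test-tt/schema/health_care.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=s3://hudi-test-tt/schema/health_care.avsc
```

In the linked blog post the tutorial's properties file carries equivalent `schemaprovider` entries for its own table, so copying that file for a new table means updating these paths, not just the record key and partition settings.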
1 answer · 0 votes · 20 views · asked a month ago

Overriding java logging config

```
-rw-r--r-- 1 yarn yarn 4.8G Jul 29 04:43 hadoop-yarn-timelineserver-ip-172-31-23-227.out

Jul 29, 2022 4:44:14 AM com.sun.jersey.server.wadl.generators.AbstractWadlGeneratorGrammarGenerator attachTypes
INFO: Couldn't find grammar element for class javax.ws.rs.core.Response
Jul 29, 2022 4:44:14 AM com.sun.jersey.server.wadl.generators.AbstractWadlGeneratorGrammarGenerator attachTypes
INFO: Couldn't find grammar element for class javax.ws.rs.core.Response
Jul 29, 2022 4:44:14 AM com.sun.jersey.server.wadl.generators.AbstractWadlGeneratorGrammarGenerator attachTypes
INFO: Couldn't find grammar element for class javax.ws.rs.core.Response
Jul 29, 2022 4:44:14 AM com.sun.jersey.server.wadl.generators.AbstractWadlGeneratorGrammarGenerator attachTypes
INFO: Couldn't find grammar element for class javax.ws.rs.core.Response
Jul 29, 2022 4:44:14 AM com.sun.jersey.server.wadl.generators.AbstractWadlGeneratorGrammarGenerator attachTypes
INFO: Couldn't find grammar element for class javax.ws.rs.core.Response
```

I've just found that the `.out` file of the YARN timeline server is abnormally large and is consuming the /mnt partition. The file contains tons of messages that are useless to me, written by the AbstractWadlGeneratorGrammarGenerator class of the jersey-server module: [AbstractWadlGeneratorGrammarGenerator.java](https://github.com/javaee/jersey-1.x/blob/master/jersey-server/src/main/java/com/sun/jersey/server/wadl/generators/AbstractWadlGeneratorGrammarGenerator.java)

```
...
import java.util.logging.Logger;
...
private static final Logger LOGGER = Logger.getLogger(AbstractWadlGeneratorGrammarGenerator.class.getName());
...
LOGGER.info("Couldn't find grammar element for class " + parameterClass.getName());
...
```

However, AbstractWadlGeneratorGrammarGenerator writes the message via the **Java (java.util.logging) logger, not log4j**. Can I override the Java logger configuration for EMR daemons, especially the YARN timeline server?
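Because java.util.logging (JUL) loggers inherit their effective level from parent package loggers, raising the level of `com.sun.jersey.server.wadl.generators` silences the INFO spam from every class under it without touching log4j. The sketch below demonstrates that mechanism in isolation; how to attach the equivalent JVM flag to the EMR timeline-server daemon is an assumption to verify against the EMR docs:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal sketch: suppressing a JUL logger by raising its parent package's level.
public class SilenceJerseyWadl {
    public static void main(String[] args) {
        // Raise the package-level logger; INFO from any class under it is dropped.
        Logger pkg = Logger.getLogger("com.sun.jersey.server.wadl.generators");
        pkg.setLevel(Level.SEVERE);

        // The noisy logger has no level of its own, so it inherits SEVERE.
        Logger noisy = Logger.getLogger(
            "com.sun.jersey.server.wadl.generators.AbstractWadlGeneratorGrammarGenerator");
        noisy.info("Couldn't find grammar element for class javax.ws.rs.core.Response"); // suppressed

        System.out.println("INFO loggable: " + noisy.isLoggable(Level.INFO));
    }
}
```

Process-wide, the same effect can usually be achieved without code by passing `-Djava.util.logging.config.file=/path/to/logging.properties` in the daemon's JVM options, with a file containing `com.sun.jersey.server.wadl.generators.level = SEVERE`; the exact configuration classification or env var EMR uses to carry that flag to the timeline server is an assumption to check for your EMR release.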
1 answer · 0 votes · 70 views · asked 2 months ago