
Questions tagged with Database



Executing Hive CREATE TABLE in spark.sql -- java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

1. The issue: We have a Spark EMR cluster that connects to a remote Hive metastore so it can use our EMR Hive data warehouse. When we execute this PySpark statement in a Zeppelin notebook:
   sc.sql("create table userdb_emr_search.test_table (id int, attr string)")
   we get this exception:
   java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
2. EMR Spark cluster configuration:
   Release label: emr-6.3.0
   Hadoop distribution: Amazon 3.2.1
   Applications: Spark 3.1.1, JupyterHub 1.2.0, Ganglia 3.7.2, Zeppelin 0.9.0
3. The JAR that provides org.apache.hadoop.fs.s3a.S3AFileSystem is correctly on the Spark classpath:
   'spark.executor.extraClassPath', '....:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:....
   'spark.driver.extraClassPath', '....:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:....
4. The JAR files are in the right places:
   Under /usr/lib/hadoop:
   -rw-r--r-- 1 root root 501704 Mar 30 2021 hadoop-aws-3.2.1-amzn-3.jar
   lrwxrwxrwx 1 root root 27 Sep 8 01:50 hadoop-aws.jar -> hadoop-aws-3.2.1-amzn-3.jar
   -rw-r--r-- 1 root root 4175105 Mar 30 2021 hadoop-common-3.2.1-amzn-3.jar
   lrwxrwxrwx 1 root root 30 Sep 8 01:50 hadoop-common.jar -> hadoop-common-3.2.1-amzn-3.jar
   Under /usr/share/aws/aws-java-sdk/:
   -rw-r--r-- 1 root root 216879203 Apr 1 2021 aws-java-sdk-bundle-1.11.977.jar
5. Hadoop storage: Amazon S3 is used for Hadoop storage instead of HDFS.
6. Error log when executing the Spark SQL CREATE TABLE in the Zeppelin notebook:
   WARN [2022-09-05 03:24:11,785] ({SchedulerFactory3} NotebookServer.java[onStatusChange]:1928) - Job paragraph_1662330571651_66787638 is finished, status: ERROR, exception: null, result: %text Fail to execute line 2: sc.sql("create table userdb_emr_search.test_table (id int, attr string)")
   Traceback (most recent call last):
     File "/tmp/1662348163304-0/zeppelin_python.py", line 158, in <module>
       exec(code, _zcUserQueryNameSpace)
     File "<stdin>", line 2, in <module>
     File "/usr/lib/spark/python/pyspark/sql/session.py", line 723, in sql
       return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
     File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/usr/lib/spark/python/pyspark/sql/utils.py", line 117, in deco
       raise converted from None
   pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found)
   INFO [2022-09-05 03:24:11,785] ({SchedulerFactory3} VFSNotebookRepo.java[save]:144) - Saving note 2HDK22P2Z to Untitled Note 1_2HDK22P2Z.zpln

Please help us investigate why Spark SQL cannot see the class org.apache.hadoop.fs.s3a.S3AFileSystem even though its JAR files are in the right place and on the correct classpath.
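One way to double-check point 3 and point 4 together is to confirm that some JAR on the configured classpath really contains the class file, following symlinks such as hadoop-aws.jar and expanding JVM-style "dir/*" wildcards. The sketch below is a hypothetical helper (the function names are illustrative, not part of Spark or EMR) that does this with the Python standard library only. Because the error arrives wrapped in a Hive MetaException, it may originate in the remote metastore process rather than in Spark, so running the same check against the metastore host's classpath may also be worthwhile.

```python
import glob
import os
import zipfile


def expand_classpath(classpath: str) -> list[str]:
    """Expand a colon-separated Java classpath into concrete entries.

    A trailing '/*' is treated like the JVM does: every .jar file directly
    inside that directory (not recursive).
    """
    entries = []
    for entry in classpath.split(":"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty segments such as '::'
        if entry.endswith("/*"):
            directory = entry[:-2]
            entries.extend(sorted(glob.glob(os.path.join(directory, "*.jar"))))
        else:
            entries.append(entry)
    return entries


def jars_providing(class_name: str, classpath: str) -> list[str]:
    """Return the classpath entries whose JAR contains the fully qualified class."""
    member = class_name.replace(".", "/") + ".class"
    found = []
    for path in expand_classpath(classpath):
        # realpath follows symlinks like hadoop-aws.jar -> hadoop-aws-3.2.1-amzn-3.jar
        real = os.path.realpath(path)
        if not (os.path.isfile(real) and zipfile.is_zipfile(real)):
            continue  # missing file, directory, or non-JAR entry
        with zipfile.ZipFile(real) as jar:
            if member in jar.namelist():
                found.append(path)
    return found
```

For example, `jars_providing("org.apache.hadoop.fs.s3a.S3AFileSystem", "/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*")` run on a cluster node should list hadoop-aws.jar if the JAR is intact; an empty list would point at the file rather than the Spark configuration.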
1 answer | 0 votes | 45 views | asked a month ago

Error in Lambda, no clue why

Hi, we are getting this error in the logs and have no clue why. We have a Lambda function that does a Scan on DynamoDB. The Lambda seems to register errors at that time, but there seems to be nothing in DynamoDB. This is the error; can anyone tell me why this is happening?

{
  "errorType": "ValidationException",
  "errorMessage": "Invalid FilterExpression: Expression size has exceeded the maximum allowed size; expression size: 4172",
  "code": "ValidationException",
  "message": "Invalid FilterExpression: Expression size has exceeded the maximum allowed size; expression size: 4172",
  "time": "2022-09-07T16:12:20.819Z",
  "requestId": "RG2TJNCJ36LIHKIHPJ8V7BQ3ARVV4KQNSO5AEMVJF66Q9ASUAAJG",
  "statusCode": 400,
  "retryable": false,
  "retryDelay": 6.469955607178434,
  "stack": [
    "ValidationException: Invalid FilterExpression: Expression size has exceeded the maximum allowed size; expression size: 4172",
    " at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)",
    " at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
    " at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
    " at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)",
    " at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)",
    " at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)",
    " at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10",
    " at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)",
    " at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)",
    " at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18)"
  ]
}
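DynamoDB limits any single expression string to 4 KB, and the error reports a FilterExpression of 4172 bytes. The question does not show how the filter is built; a common way to hit the limit is generating a long `IN` (or `OR`) list from data. Assuming that is the case here, the sketch below splits such a filter into several smaller Scan parameter sets. The attribute name, the `:vN`/`#attr` placeholders, and the assumption that all values are strings are hypothetical.

```python
MAX_EXPRESSION_BYTES = 4096  # DynamoDB: any expression string may be at most 4 KB
MAX_IN_OPERANDS = 100        # DynamoDB: an IN comparator takes at most 100 operands


def build_in_filters(attribute: str, values: list[str],
                     batch_size: int = MAX_IN_OPERANDS) -> list[dict]:
    """Split one oversized `attribute IN (...)` filter into several Scan parameter sets.

    Each dict uses the low-level (typed) request format, with string values
    assumed ({"S": ...}). Run one Scan per dict and merge the results.
    """
    if not 1 <= batch_size <= MAX_IN_OPERANDS:
        raise ValueError(f"batch_size must be between 1 and {MAX_IN_OPERANDS}")

    params = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        # Only the short placeholders (:v0, :v1, ...) appear in the expression string;
        # the actual values travel separately in ExpressionAttributeValues.
        attribute_values = {f":v{i}": {"S": v} for i, v in enumerate(batch)}
        expression = f"#attr IN ({', '.join(attribute_values)})"
        if len(expression.encode("utf-8")) > MAX_EXPRESSION_BYTES:
            raise ValueError("batch produces an expression larger than 4 KB")
        params.append({
            "FilterExpression": expression,
            "ExpressionAttributeNames": {"#attr": attribute},  # avoids reserved-word clashes
            "ExpressionAttributeValues": attribute_values,
        })
    return params
```

Each dict could then be passed to a boto3 low-level client as `client.scan(TableName=..., **params)`, following `LastEvaluatedKey`/`ExclusiveStartKey` to page through results. The low-level `scan` call in the Node.js SDK the Lambda appears to use takes the same parameter names.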
1 answer | 0 votes | 40 views | asked a month ago

[Announcement]: Amazon RDS for PostgreSQL 10 deprecation

Amazon RDS for PostgreSQL is announcing the deprecation of PostgreSQL major version 10, starting April 17, 2023. The PostgreSQL community plans to end support for PostgreSQL 10 on November 10, 2022 [1], and it will not receive any security patches after that date. We strongly recommend that you monitor the PostgreSQL security page for documented vulnerabilities [2].

We recommend that you upgrade your PostgreSQL databases running on major version 10 to a later major version, such as version 14. PostgreSQL 14 includes performance improvements for parallel queries, heavily concurrent workloads, partitioned tables, logical replication, and vacuuming. To learn more about the benefits of PostgreSQL 14, refer to the 'PostgreSQL 14 Community Release Announcement' [3].

To upgrade your DB instance from major version 10, we recommend that you first upgrade to a minor version, such as 10.19 or later, that has an upgrade path to the target major version, and then upgrade to a later major version, such as 14. You can initiate an upgrade from the Modify DB Instance page in the AWS Management Console by changing the database version setting to a newer major version of PostgreSQL. Alternatively, you can use the AWS CLI to perform the upgrade.

Although major version upgrades typically complete within the standard maintenance window, the duration of the upgrade depends on the number of objects in the database. To avoid unplanned unavailability outside your maintenance window, we recommend that you first take a snapshot of your database and test the upgrade to estimate its duration. To learn more about upgrading PostgreSQL major versions in RDS, review the 'Upgrading Database Versions' page [4]. The upgrade process shuts down the DB instance, performs the upgrade, and restarts the DB instance. The DB instance may restart multiple times during the process.
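The two-step path above (confirm the minor version has an upgrade path, then upgrade the major version) maps onto two RDS API calls. A minimal Python sketch, assuming boto3 is used to make the calls: one helper reads the `ValidUpgradeTarget` list from a `describe_db_engine_versions` response, and the other builds keyword arguments for `modify_db_instance`. The helper names and the instance identifier used below are hypothetical.

```python
def _version_key(version: str) -> tuple[int, ...]:
    # "14.3" -> (14, 3); non-numeric parts sort as 0 so the key never mixes types
    return tuple(int(part) if part.isdigit() else 0 for part in version.split("."))


def major_upgrade_targets(engine_versions_response: dict) -> list[str]:
    """Collect major-version upgrade targets from a describe_db_engine_versions response."""
    targets = set()
    for version in engine_versions_response.get("DBEngineVersions", []):
        for target in version.get("ValidUpgradeTarget", []):
            if target.get("IsMajorVersionUpgrade"):
                targets.add(target["EngineVersion"])
    return sorted(targets, key=_version_key)


def upgrade_params(instance_id: str, target_version: str,
                   apply_immediately: bool = False) -> dict:
    """Keyword arguments for modify_db_instance performing a major version upgrade."""
    return {
        "DBInstanceIdentifier": instance_id,
        "EngineVersion": target_version,
        "AllowMajorVersionUpgrade": True,       # required for a major version change
        "ApplyImmediately": apply_immediately,  # False = wait for the next maintenance window
    }
```

With boto3, the response could come from `boto3.client("rds").describe_db_engine_versions(Engine="postgres", EngineVersion="10.19")`, and one of the returned versions could be applied with `modify_db_instance(**upgrade_params("my-pg10-instance", version))`. The matching CLI commands are `aws rds describe-db-engine-versions` and `aws rds modify-db-instance` with `--allow-major-version-upgrade`.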
If you choose the "Apply Immediately" option, the upgrade starts immediately after you click the "Modify DB Instance" button. If you choose not to apply the change immediately, the upgrade is performed during your next maintenance window. Your affected Amazon RDS DB instances are listed in the "Affected Resources" tab of the AWS Health Dashboard.

Starting February 14, 2023 00:00:01 UTC, you will not be able to create new RDS instances with PostgreSQL major version 10 from either the AWS Management Console or the AWS CLI. We recommend that you upgrade your databases before April 17, 2023. RDS will upgrade your PostgreSQL 10 databases to PostgreSQL 14 during a scheduled maintenance window between April 17, 2023 00:00:01 UTC and July 18, 2023 00:00:01 UTC. On July 18, 2023 00:00:01 UTC, any remaining PostgreSQL 10 databases will be force-upgraded to version 14 regardless of the instance's scheduled maintenance window.

If you have any questions or concerns, please reach out to AWS Support [5]. To plan proactively for future deprecations of PostgreSQL major version 11 and above, refer to the published RDS for PostgreSQL deprecation dates [6].

[1] https://www.postgresql.org/support/versioning/
[2] https://www.postgresql.org/support/security/
[3] https://www.postgresql.org/about/news/postgresql-14-released-2318/
[4] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.PostgreSQL.html#USER_UpgradeDBInstance.PostgreSQL.MajorVersion.Process
[5] http://aws.amazon.com/support
[6] https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLReleaseNotes/postgresql-release-calendar.html
0 answers | 3 votes | 131 views | asked a month ago