Questions tagged with AWS Glue


Creating a table in the Glue Catalog in Parquet format fails, even with enableUpdateCatalog enabled

I am trying to modify a PySpark/Glue job to write a DynamicFrame to a new table in the Glue Catalog, in Parquet format. The database exists; the table does not yet. I am following the instructions posted here: https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html

I am getting this error:

```
  File "/tmp/resource_logs_with_attributes.py", line 443, in write_transformed_data_to_s3
    format="glueparquet"
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 386, in write_dynamic_frame_from_catalog
    makeOptions(self._sc, additional_options), catalog_id)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o91.getCatalogSink.
: com.amazonaws.services.glue.model.EntityNotFoundException: Table api_logs_core not found. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 4ed5f8f4-e8bf-4764-afb8-425804d3bcd5; Proxy: null)
```

```
sink = glueContext.getSink(
    connection_type="s3",
    path="s3://mybucket/test",
    enableUpdateCatalog=True,
    format="glueparquet"
)
sink.setCatalogInfo(
    catalogDatabase='myexistinggluedatabase',
    catalogTableName='newtable'
)
newdyf = glueContext.write_dynamic_frame_from_catalog(
    frame=my_dyf,
    database='myexistinggluedatabase',
    table_name='newtable',
    format="glueparquet"
)
```

I have tried variants including:

* including the partition key array in the sink (`partitionKeys=["dt_pk"]`)
* `format="parquet"` + `format_options={"useGlueParquetWriter": True}`
* including `"updateBehavior": "UPDATE_IN_DATABASE"` in additional_options
* `sink.writeFrame()` with options as documented
* `my_dyf.write()` with connection_options and format + format_options

This is generally functional code that has worked like this:

* reads data from a source into a DynamicFrame
* does basic transformations to a new schema
* works in the context of a Glue Catalog
* the existing code uses `write_dynamic_frame.with_options`, specifying the S3 location and Parquet format, partitioned by date, etc., writing to a table whose schema was discovered and written to the Glue Catalog by a Glue Crawler. This works in two phases: the Glue job runs and writes Parquet files to S3, then the Glue Crawler updates the Glue Catalog
* writes to a partitioned S3 data set in Parquet format
* works with either standard Parquet or the Glue Parquet Writer

My only change is that I keep the same schema, partitions, and data format as before, but I now want to write to a new S3 location and have the Glue job create the table and update the Data Catalog.
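For reference, the traceback above comes from `getCatalogSink`, which looks the target table up in the Data Catalog (hence the `EntityNotFoundException` when it does not exist yet). The pattern shown on the linked update-from-job page creates the new table through the sink itself (`setCatalogInfo` + `writeFrame`) rather than through `write_dynamic_frame_from_catalog`. A minimal sketch, using the placeholder database, table, path, and partition-key names from this question (not verified values):

```python
# Sketch based on the "creating new tables" example in
# https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html
# All names below are the placeholders used in this question.
sink = glueContext.getSink(
    connection_type="s3",
    path="s3://mybucket/test",
    enableUpdateCatalog=True,
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=["dt_pk"],
)
sink.setCatalogInfo(
    catalogDatabase="myexistinggluedatabase",
    catalogTableName="newtable",
)
sink.setFormat("glueparquet")

# writeFrame() writes the Parquet files to S3 and creates/updates the
# catalog table, so no pre-existing table definition is required.
sink.writeFrame(my_dyf)
```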
0 answers · 0 votes · 12 views · asked 18 days ago

Loading data from an RDS PostgreSQL database to an S3 bucket, but getting an "Unable to execute HTTP request" error

Hi all. I created a Glue job that extracts data from an RDS PostgreSQL database and loads it to an S3 bucket. I used a crawler to create the schema of the RDS PostgreSQL source database. But when I run the job, it keeps running for almost an hour and then fails with the following error. I have created a role with AWSGluerole access, which I am using to run the Glue job. I have granted permissions in Lake Formation and also added an inline policy for access to the S3 bucket. Any solution will be highly appreciated.

![The image of error is also attached](/media/postImages/original/IMOiM94Y5RQGm1Hia-xEBR1A)

```
22/10/31 13:55:48 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] { "Event": "GlueExceptionAnalysisTaskFailed", "Timestamp": 1667224548950, "Failure Reason": "Unable to execute HTTP request: Connect to my-first-bucket-0.s3.ap-northeast-1.amazonaws.com:443 [my-first-bucket-0.s3.ap-northeast-1.amazonaws.com/52.219.196.98, my-first-bucket-0.s3.ap-northeast-1.amazonaws.com/52.219.8.63, my-first-bucket-0.s3.ap-northeast-1.amazonaws.com/52.219.4.75, my-first-bucket-0.s3.ap-northeast-1.amazonaws.com/52.219.16.187, my-first-bucket-0.s3.ap-northeast-1.amazonaws.com/52.219.1.27, my-first-bucket-0.s3.ap-northeast-1.amazonaws.com/52.219.152.58] failed: connect timed out", "Stack Trace": [ { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor", "Method Name": "handleRetryableException", "File Name": "AmazonHttpClient.java", "Line Number": 1207 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor", "Method Name": "executeHelper", "File Name": "AmazonHttpClient.java", "Line Number": 1153 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor", "Method Name": "doExecute", "File Name": "AmazonHttpClient.java", "Line Number": 802 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor", "Method Name": "executeWithTimer", "File Name": "AmazonHttpClient.java", "Line Number": 770 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor", "Method Name": "execute", "File Name": "AmazonHttpClient.java", "Line Number": 744 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor", "Method Name": "access$500", "File Name": "AmazonHttpClient.java", "Line Number": 704 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl", "Method Name": "execute", "File Name": "AmazonHttpClient.java", "Line Number": 686 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient", "Method Name": "execute", "File Name": "AmazonHttpClient.java", "Line Number": 550 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient", "Method Name": "execute", "File Name": "AmazonHttpClient.java", "Line Number": 530 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client", "Method Name": "invoke", "File Name": "AmazonS3Client.java", "Line Number": 5140 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client", "Method Name": "invoke", "File Name": "AmazonS3Client.java", "Line Number": 5086 }, { "Declaring Class": 
"com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client", "Method Name": "access$300", "File Name": "AmazonS3Client.java", "Line Number": 394 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy", "Method Name": "invokeServiceCall", "File Name": "AmazonS3Client.java", "Line Number": 6032 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client", "Method Name": "uploadObject", "File Name": "AmazonS3Client.java", "Line Number": 1812 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client", "Method Name": "putObject", "File Name": "AmazonS3Client.java", "Line Number": 1772 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3.lite.call.PutObjectCall", "Method Name": "performCall", "File Name": "PutObjectCall.java", "Line Number": 35 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3.lite.call.PutObjectCall", "Method Name": "performCall", "File Name": "PutObjectCall.java", "Line Number": 10 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3.lite.call.AbstractUploadingS3Call", "Method Name": "perform", "File Name": "AbstractUploadingS3Call.java", "Line Number": 87 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor", "Method Name": "execute", "File Name": "GlobalS3Executor.java", "Line Number": 114 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient", "Method Name": "invoke", "File Name": "AmazonS3LiteClient.java", "Line Number": 191 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient", "Method Name": "invoke", "File Name": "AmazonS3LiteClient.java", "Line Number": 186 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient", "Method Name": "putObject", "File Name": "AmazonS3LiteClient.java", "Line Number": 107 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore", "Method Name": "storeFile", "File Name": "Jets3tNativeFileSystemStore.java", "Line Number": 152 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream", "Method Name": "uploadSinglePart", "File Name": "MultipartUploadOutputStream.java", "Line Number": 198 }, { "Declaring Class": "com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream", "Method Name": "close", "File Name": "MultipartUploadOutputStream.java", "Line Number": 427 }, { "Declaring Class": "org.apache.hadoop.fs.FSDataOutputStream$PositionCache", "Method Name": "close", "File Name": "FSDataOutputStream.java", "Line Number": 73 }, { "Declaring Class": "org.apache.hadoop.fs.FSDataOutputStream", "Method Name": "close", "File Name": "FSDataOutputStream.java", "Line Number": 102 }, { "Declaring Class": "com.fasterxml.jackson.dataformat.csv.impl.UTF8Writer", "Method Name": "close", "File Name": "UTF8Writer.java", "Line Number": 74 }, { "Declaring Class": "com.fasterxml.jackson.dataformat.csv.impl.CsvEncoder", "Method Name": "close", "File Name": "CsvEncoder.java", "Line Number": 989 }, { "Declaring Class": "com.fasterxml.jackson.dataformat.csv.CsvGenerator", "Method Name": "close", "File Name": "CsvGenerator.java", "Line Number": 479 }, { "Declaring Class": "com.amazonaws.services.glue.writers.JacksonWriter", "Method Name": "done", "File Name": "JacksonWriter.scala", "Line Number": 73 }, { "Declaring Class": "com.amazonaws.services.glue.hadoop.TapeOutputFormat$$anon$1", "Method Name": "close", "File Name": 
"TapeOutputFormat.scala", "Line Number": 217 }, { "Declaring Class": "com.amazonaws.services.glue.sinks.HadoopWriters$", "Method Name": "$anonfun$writeNotPartitioned$3", "File Name": "HadoopWriters.scala", "Line Number": 125 }, { "Declaring Class": "org.apache.spark.util.Utils$", "Method Name": "tryWithSafeFinallyAndFailureCallbacks", "File Name": "Utils.scala", "Line Number": 1495 }, { "Declaring Class": "org.apache.spark.sql.glue.SparkUtility$", "Method Name": "tryWithSafeFinallyAndFailureCallbacks", "File Name": "SparkUtility.scala", "Line Number": 39 }, { "Declaring Class": "com.amazonaws.services.glue.sinks.HadoopWriters$", "Method Name": "writeNotPartitioned", "File Name": "HadoopWriters.scala", "Line Number": 125 }, { "Declaring Class": "com.amazonaws.services.glue.sinks.HadoopWriters$", "Method Name": "$anonfun$doStreamWrite$1", "File Name": "HadoopWriters.scala", "Line Number": 138 }, { "Declaring Class": "com.amazonaws.services.glue.sinks.HadoopWriters$", "Method Name": "$anonfun$doStreamWrite$1$adapted", "File Name": "HadoopWriters.scala", "Line Number": 129 }, { "Declaring Class": "org.apache.spark.scheduler.ResultTask", "Method Name": "runTask", "File Name": "ResultTask.scala", "Line Number": 90 }, { "Declaring Class": "org.apache.spark.scheduler.Task", "Method Name": "run", "File Name": "Task.scala", "Line Number": 131 }, { "Declaring Class": "org.apache.spark.executor.Executor$TaskRunner", "Method Name": "$anonfun$run$3", "File Name": "Executor.scala", "Line Number": 497 }, { "Declaring Class": "org.apache.spark.util.Utils$", "Method Name": "tryWithSafeFinally", "File Name": "Utils.scala", "Line Number": 1439 }, { "Declaring Class": "org.apache.spark.executor.Executor$TaskRunner", "Method Name": "run", "File Name": "Executor.scala", "Line Number": 500 }, { "Declaring Class": "java.util.concurrent.ThreadPoolExecutor", "Method Name": "runWorker", "File Name": "ThreadPoolExecutor.java", "Line Number": 1149 }, { "Declaring Class": "java.util.concurrent.ThreadPoolExecutor$Worker", "Method Name": "run", "File Name": "ThreadPoolExecutor.java", "Line Number": 624 }, { "Declaring Class": "java.lang.Thread", "Method Name": "run", "File Name": "Thread.java", "Line Number": 750 } ], "Task Launch Time": 1667223676080, "Stage ID": 1, "Stage Attempt ID": 0, "Task Type": "ResultTask", "Executor ID": "9", "Task ID": 11 }
1 answer · 0 votes · 17 views · asked a month ago