Questions tagged with Extract Transform & Load Data

Browse through the questions and answers listed below or filter and sort to narrow down your results.

Redshift Serverless Data API SQL calls

Hi, I am new to Redshift Serverless and am working on a Lambda function that connects to a serverless database using the Python Redshift Data API and executes a set of SQL scripts and stored procedures stored in a Nexus repository artifact. I am seeing errors when I try to execute SQL statements read from a file as a string. Here is an example of the DDL from one of the scripts:

```
-- Table Definitions
-- ----------------------------------
-- test1 -----------------------
-- ----------------------------------
DROP TABLE IF EXISTS test1;
CREATE TABLE test1 (
    id varchar(32),
    name varchar(64) DEFAULT NULL,
    grade varchar(64) DEFAULT NULL,
    zip varchar(5) DEFAULT NULL
);

-- test2 --------------------------
-- ----------------------------------
DROP TABLE IF EXISTS test2;
CREATE TABLE test2 (
    id varchar(32),
    user_id varchar(32) DEFAULT NULL,
    hnum varchar(6),
    ts_created timestamp DEFAULT NULL,
    ts_updated timestamp DEFAULT NULL
);

-- and a few other tables in the same script
```

The function runs fine if I hard-code the SQL query in the code, and I don't see any syntax or other errors in the SQL file contents, since I can run those DDLs in the Redshift query editor by manually copying and pasting them. Am I missing anything, or is the Data API not the right approach for this use case?

Error and traceback from the Lambda function execution:

```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/runtime/bootstrap.py", line 60, in <module>
    main()
  File "/var/runtime/bootstrap.py", line 57, in main
    awslambdaricmain.main([os.environ["LAMBDA_TASK_ROOT"], os.environ["_HANDLER"]])
  File "/var/runtime/awslambdaric/__main__.py", line 21, in main
    bootstrap.run(app_root, handler, lambda_runtime_api_addr)
  File "/var/runtime/awslambdaric/bootstrap.py", line 405, in run
    handle_event_request(
  File "/var/runtime/awslambdaric/bootstrap.py", line 165, in handle_event_request
    xray_fault = make_xray_fault(etype.__name__, str(value), os.getcwd(), tb_tuples)
FileNotFoundError: [Errno 2] No such file or directory
```
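Not a diagnosis of the error above, but for context, a minimal sketch of one way a Lambda handler can push a multi-statement script through the Redshift Data API, assuming a Redshift Serverless workgroup. The `execute_statement` call takes a single SQL string, while `batch_execute_statement` accepts a list of statements, so the script is split first. The workgroup, database, and local file path below are placeholders, not values from the question.

```
import boto3

# Placeholder names -- substitute your own workgroup, database, and script path.
WORKGROUP = "my-serverless-workgroup"
DATABASE = "dev"

client = boto3.client("redshift-data")

def run_script(sql_text):
    # Naive split on ';' -- assumes no semicolons inside string literals
    # or stored-procedure bodies, and no trailing comment-only fragments.
    statements = [s.strip() for s in sql_text.split(";") if s.strip()]

    # batch_execute_statement runs the statements in order as one batch.
    response = client.batch_execute_statement(
        WorkgroupName=WORKGROUP,
        Database=DATABASE,
        Sqls=statements,
    )
    return response["Id"]

def handler(event, context):
    # Assumes the script has already been downloaded from the artifact repository.
    with open("/tmp/ddl.sql") as f:
        statement_id = run_script(f.read())

    # The Data API is asynchronous; poll describe_statement until it finishes.
    status = client.describe_statement(Id=statement_id)["Status"]
    return {"statementId": statement_id, "status": status}
```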
1 answer · 0 votes · 48 views · asked 5 days ago

Create table in Glue Catalog with Parquet format fails, even when enableUpdateCatalog is set

I am trying to modify a PySpark/Glue job to write a DynamicFrame to a new table in the Glue Catalog, in Parquet format. The database exists; the table does not yet. I am following the instructions posted here: https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html

I am getting the error:

```
File "/tmp/resource_logs_with_attributes.py", line 443, in write_transformed_data_to_s3
    format="glueparquet"
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 386, in write_dynamic_frame_from_catalog
    makeOptions(self._sc, additional_options), catalog_id)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o91.getCatalogSink.
: com.amazonaws.services.glue.model.EntityNotFoundException: Table api_logs_core not found. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 4ed5f8f4-e8bf-4764-afb8-425804d3bcd5; Proxy: null)
```

```
sink = glueContext.getSink(
    connection_type="s3",
    path="s3://mybucket/test",
    enableUpdateCatalog=True,
    format="glueparquet"
)
sink.setCatalogInfo(
    catalogDatabase='myexistinggluedatabase',
    catalogTableName='newtable'
)

newdyf = glueContext.write_dynamic_frame_from_catalog(
    frame=my_dyf,
    database='myexistinggluedatabase',
    table_name='newtable',
    format="glueparquet"
)
```

I have tried variants including:

* including the partition key array in the sink (`partitionKeys=["dt_pk"]`)
* `format="parquet"` + `format_options={"useGlueParquetWriter": True}`
* including `"updateBehavior": "UPDATE_IN_DATABASE"` in `additional_options`
* `sink.writeFrame()` with options as documented
* `my_dyf.write()` with `connection_options` and `format` + `format_options`

This is generally functional code that has worked like this:

* reads data from a source into a DynamicFrame
* does basic transformations to a new schema
* works in the context of a Glue Catalog
* the existing code uses `write_dynamic_frame.with_options`, specifying the S3 location and Parquet format, partitioned by date, etc., writing to a table whose schema was discovered and written to the Glue Catalog by a Glue Crawler; this works in two phases: the Glue job runs and writes Parquet files to S3, then the Glue Crawler updates the Glue Catalog
* writes to a partitioned S3 data set in Parquet format
* works with either standard Parquet or the Glue Parquet Writer

My only change is that I keep the same schema, partitions, and data format as before, but I now want to write to a new S3 location and have the Glue job create the table and update the Data Catalog.
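For comparison, the pattern on the linked documentation page for creating a new catalog table from a job writes through the sink object itself (`sink.writeFrame(...)`) rather than through `write_dynamic_frame_from_catalog`, which, per the `EntityNotFoundException` above, appears to look up an already-existing table. A minimal sketch of that sink-based write, reusing the names from the question purely as illustration:

```
# Sketch only: let the Glue job create the catalog table on first write.
# Bucket path, database, table, and partition key are taken from the question.
sink = glueContext.getSink(
    connection_type="s3",
    path="s3://mybucket/test",
    enableUpdateCatalog=True,
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=["dt_pk"],
)
sink.setCatalogInfo(
    catalogDatabase="myexistinggluedatabase",
    catalogTableName="newtable",
)
sink.setFormat("glueparquet")

# Write the DynamicFrame through the sink; the catalog entry for "newtable"
# is created/updated in the existing database as part of the write.
sink.writeFrame(my_dyf)
```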
0 answers · 0 votes · 11 views · asked 15 days ago