Questions tagged with AWS Glue


Browse through the questions and answers listed below or filter and sort to narrow down your results.

How can I get the "RuleResults" generated by a Glue Studio job that has Data Quality rules? In the job's Data Quality tab I can manually download the results to a file once the "RuleResults" appear. I have a Step Functions state machine that calls this job, and I would like to know where that file is generated (the S3 bucket and key) so that a subsequent step (e.g. a Lambda function) can evaluate which rules passed and which did not. Thanks.
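One possible approach (a sketch, not from the question): rather than locating the downloaded file, the Lambda step could pull the same results through the Glue Data Quality API, assuming the state machine forwards the job name and run ID; the event shape below is an assumption.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Hypothetical input: job name and run ID forwarded by Step Functions
    job_name = event["JobName"]
    job_run_id = event["JobRunId"]

    # Find the Data Quality results produced by that specific job run
    listing = glue.list_data_quality_results(
        Filter={"JobName": job_name, "JobRunId": job_run_id}
    )

    passed, failed = [], []
    for summary in listing["Results"]:
        result = glue.get_data_quality_result(ResultId=summary["ResultId"])
        for rule in result["RuleResults"]:
            # Result is PASS/FAIL/ERROR; anything but PASS is grouped as failed here
            (passed if rule["Result"] == "PASS" else failed).append(rule["Name"])

    return {"passed": passed, "failed": failed}
```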
0 answers · 0 votes · 10 views
Willi5 · asked 6 days ago
Hi, I have a Glue job running with PySpark. It is taking too long to write the dynamic frame to S3: for around 1,200 records, the write alone takes around 500 seconds. I have observed that even when the data frame is empty, it takes the same amount of time to write to S3. Below are the code snippets:
```python
test1_df = test_df.repartition(1)
invoice_extract_final_dyf = DynamicFrame.fromDF(test1_df, glueContext, "invoice_extract_final_dyf")
glueContext.write_dynamic_frame.from_options(frame=invoice_extract_final_dyf, connection_type="s3", connection_options={"path": destination_path}, format="json")
```
The conversion on the second line and the write to S3 consume most of the time. Any help will be appreciated; let me know if any further details are needed.
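One thing worth trying (a sketch, an assumption on my part rather than something from the question): skip the DynamicFrame conversion entirely and write the Spark DataFrame with the native writer, reusing `test_df` and `destination_path` from the snippet above.

```python
# Write the Spark DataFrame directly, avoiding the DynamicFrame round trip.
# coalesce(1) keeps the single-file output that repartition(1) was producing.
test_df.coalesce(1).write.mode("append").json(destination_path)
```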
2 answers · 0 votes · 36 views
asked 6 days ago
I set up the resources to trigger a Glue job through EventBridge, but when I tested it in the console, Invocations == FailedInvocations == TriggeredRules == 1. What can I do to fix it?
```hcl
######### AWS Glue Workflow ############
# Create a Glue workflow that triggers the Glue job
resource "aws_glue_workflow" "example_glue_workflow" {
  name        = "example_glue_workflow"
  description = "Glue workflow that triggers the example_glue_job"
}

resource "aws_glue_trigger" "example_glue_trigger" {
  name          = "example_glue_trigger"
  workflow_name = aws_glue_workflow.example_glue_workflow.name
  type          = "EVENT"

  actions {
    job_name = aws_glue_job.example_glue_job.name
  }
}

######### AWS EventBridge ##############
resource "aws_cloudwatch_event_rule" "example_etl_trigger" {
  name        = "example_etl_trigger"
  description = "Trigger Glue job when a request is made to the API endpoint"

  event_pattern = jsonencode({
    "source" : ["example_api"]
  })
}

resource "aws_cloudwatch_event_target" "glue_job_target" {
  rule      = aws_cloudwatch_event_rule.example_etl_trigger.name
  target_id = "example_event_target"
  arn       = aws_glue_workflow.example_glue_workflow.arn
  role_arn  = local.example_role_arn
}
```
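Worth checking first: FailedInvocations with a Glue workflow target is commonly caused by the target's `role_arn` lacking the `glue:notifyEvent` permission on the workflow. To reproduce the failure on demand while testing permissions, a matching event can be published with boto3 (a sketch; the detail-type and payload here are made up, only the `source` must match the rule's pattern):

```python
import json
import boto3

events = boto3.client("events")

# Publish a test event matching the rule's pattern ("source": ["example_api"])
events.put_events(
    Entries=[{
        "Source": "example_api",
        "DetailType": "test",
        "Detail": json.dumps({"ping": "pong"}),
    }]
)
```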
1 answer · 0 votes · 23 views
asked 6 days ago
Unable to add custom data types to a classifier through a CloudFormation template. It can be done only through the console; there is no such parameter available in the CloudFormation template.
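As a possible workaround (an assumption, not verified against the template in question): the Glue API itself accepts custom data types on a CSV classifier, so a one-off boto3 call or a CloudFormation custom resource could fill the gap; the classifier name and data type list below are examples.

```python
import boto3

glue = boto3.client("glue")

# Create a CSV classifier with custom data types via the Glue API
glue.create_classifier(
    CsvClassifier={
        "Name": "my-csv-classifier",  # placeholder name
        "CustomDatatypeConfigured": True,
        "CustomDatatypes": ["BOOLEAN", "DATE", "TIMESTAMP"],
    }
)
```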
0 answers · 1 vote · 15 views
nikhil · asked 7 days ago
Using Glue, we can crawl a Snowflake table into the catalog properly, but Athena fails to query the table data: HIVE_UNSUPPORTED_FORMAT: Unable to create input format. Search results suggest this is because the table created by the crawler has empty "input format" and "output format" fields, and indeed they are empty for this table crawled from Snowflake. So the questions are: 1) Why didn't the crawler set them? (The crawler classifies the table as Snowflake correctly.) 2) What should the values be if a manual edit is needed? Can a Snowflake table be queried by Athena at all? Any ideas? Thanks.
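To confirm what the crawler actually stored, the catalog entry can be inspected directly (a diagnostic sketch; the database and table names are placeholders):

```python
import boto3

glue = boto3.client("glue")

# Inspect the storage descriptor the crawler wrote. For a JDBC-backed
# (Snowflake) source, InputFormat/OutputFormat are typically empty, which
# is what Athena's Hive engine trips over.
table = glue.get_table(DatabaseName="my_db", Name="my_snowflake_table")
sd = table["Table"]["StorageDescriptor"]
print(sd.get("InputFormat"), sd.get("OutputFormat"))
print(table["Table"].get("Parameters"))
```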
1 answer · 0 votes · 39 views
asked 8 days ago
Hi AWS experts, I have code that reads data from AWS Aurora PostgreSQL. I want to bookmark the table on a custom column named 'ceres_mono_index', but it seems the bookmark still uses the primary key as the bookmark key instead of the column 'ceres_mono_index'. Here is the code:
```python
cb_ceres = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={
        "url": f"jdbc:postgresql://{ENDPOINT}:5432/{DBNAME}",
        "dbtable": "xxxxx_raw_ceres",
        "user": username,
        "password": password,
    },
    additional_options={"jobBookmarkKeys": ["ceres_mono_index"], "jobBookmarkKeysSortOrder": "asc"},
    transformation_ctx="cb_ceres_bookmark",
)
```
How could I fix the issue? Thank you.
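A sketch of a possible fix (my assumption: `from_options` does not take a separate `additional_options` argument the way `from_catalog` does, so the bookmark keys may be silently ignored). Moving them into `connection_options` might help; variable names are reused from the question:

```python
cb_ceres = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={
        "url": f"jdbc:postgresql://{ENDPOINT}:5432/{DBNAME}",
        "dbtable": "xxxxx_raw_ceres",
        "user": username,
        "password": password,
        # Bookmark settings passed alongside the other connection options
        "jobBookmarkKeys": ["ceres_mono_index"],
        "jobBookmarkKeysSortOrder": "asc",
    },
    transformation_ctx="cb_ceres_bookmark",
)
```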
1 answer · 0 votes · 34 views
asked 9 days ago
Hello, I am trying to run my first job in AWS Glue, but I am encountering the following error: "An error occurred while calling o103.pyWriteDynamicFrame. /run-1679066163418-part-r-00000 (Permission denied)". The error message indicates that permission has been denied. I am using an IAM role that has AmazonS3FullAccess, AWSGlueServiceRole, and even AdministratorAccess; although I understand this is not ideal for security reasons, I added that policy to rule out the IAM role as the issue. I have attempted to use different sources (such as DynamoDB and S3) and targets (such as Redshift and the Data Catalog), but I consistently receive the same error. Does anyone know how I can resolve this issue? Thank you in advance!
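One pattern that produces exactly this local-looking path (an assumption, since the script isn't shown): the target path missing its `s3://` scheme, which makes Spark resolve it against the local filesystem instead of S3. The bucket name below is a placeholder:

```python
# Hadoop resolves a scheme-less path locally, so the job tries to create
# /run-...-part-r-00000 on the driver's filesystem and gets
# "(Permission denied)" instead of an S3 error.
destination_path = "s3://my-bucket/output/"   # not "my-bucket/output/"
assert destination_path.startswith("s3://"), "path would resolve locally!"
```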
1 answer · 0 votes · 27 views
asked 10 days ago
How can I apply a negative exclusion pattern in the configuration of my crawler? I would like to exclude every folder that does not match `prd**/queries/**`, i.e. something like `!prd**/queries/**`: exclude everything that does not match this pattern.
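For reference, exclusion patterns go in the S3 target's Exclusions list (a sketch with placeholder crawler and bucket names; whether Glue's glob syntax honors a leading `!` negation is something to verify against the docs):

```python
import boto3

glue = boto3.client("glue")

glue.update_crawler(
    Name="my-crawler",  # placeholder
    Targets={
        "S3Targets": [{
            "Path": "s3://my-bucket/",
            "Exclusions": ["!prd**/queries/**"],  # the pattern from the question
        }]
    },
)
```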
1 answer · 0 votes · 17 views
asked 10 days ago
I tried to set up cross-account Athena access. I could see the database in Lake Formation, Glue, and Athena in the target account. At first I did not see any tables in the target Athena console. After I did something in the Lake Formation console (target account), I could see one table in the target Athena console and query it successfully, but I could not see the other tables from the same database, no matter what I tried. I always get the error below, even though I granted KMS access everywhere (both on the KMS key and the IAM role) and even turned off KMS encryption in Glue. I don't know the actual reason. Here is an example of the error message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access. (Service: AWSKMS; Status Code: 400; Error Code: AccessDeniedException; Request ID: cb9a754f-fc1c-414d-b526-c43fa96d3c13; Proxy: null) (Service: AWSGlue; Status Code: 400; Error Code: GlueEncryptionException; Request ID: 0c785fdf-e3f7-45b2-9857-e6deddecd6f9; Proxy: null) This query ran against the "xxx_lakehouse" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: b2c74c7e-21ed-4375-8712-cd1579eab9a7. I have already added the permissions pointed out in https://repost.aws/knowledge-center/cross-account-access-denied-error-s3. Does anyone know how to fix the error and see the cross-account tables in Athena? Thank you very much.
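One way to narrow this down (a diagnostic sketch; the database name is taken from the error message, and the call assumes the shared database is visible locally via a resource link): call GetTables directly from the target account with the same role. If the source account's Data Catalog is encrypted, this should reproduce the GlueEncryptionException outside of Athena, confirming the catalog KMS key policy is the blocker.

```python
import boto3

glue = boto3.client("glue")

# Listing tables requires kms:Decrypt on the catalog's key when the
# source catalog is encrypted; an AccessDeniedException here points at
# the key policy/grant rather than at Athena or Lake Formation.
resp = glue.get_tables(DatabaseName="xxx_lakehouse")
print([t["Name"] for t in resp["TableList"]])
```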
1 answer · 0 votes · 46 views
asked 11 days ago
I followed all the steps mentioned in https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html and the steps in https://www.youtube.com/watch?v=04LMQxDxjGM. When I run the jupyter notebook command from PyCharm, it opens in Internet Explorer. But when I try to create a Jupyter notebook directly in PyCharm, I do not get the Glue PySpark or Glue Spark options mentioned in the video (screenshots attached). It also never showed me the https://localhost URL shown in the video.
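Missing kernel options usually mean the Glue kernels are not registered with the Jupyter installation PyCharm is using (the interactive sessions docs install them via the `aws-glue-sessions` pip package). A quick check (a sketch; the exact kernel names are my assumption):

```python
# List the kernelspecs Jupyter knows about; if no Glue entries appear,
# the aws-glue-sessions kernels were installed into a different
# Python environment than the one PyCharm is using.
from jupyter_client.kernelspec import KernelSpecManager

specs = KernelSpecManager().get_all_specs()
print(sorted(specs))  # expect entries like 'glue_pyspark' and 'glue_spark'
```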
1 answer · 0 votes · 31 views
asked 11 days ago
I have created an Iceberg table in Athena with the table property **vacuum_min_snapshots_to_keep**. When I run the command `VACUUM hamza_iceberg_table;`, I get the error below:
```
[ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.
This query ran against the "db-name" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: a12te0e1a84-4028-87d3-a6e2
```
1 answer · 0 votes · 46 views
asked 11 days ago
Hello everyone, we have been receiving this EntityNotFoundException error intermittently for one of our Lambda functions. We've found that the EntityNotFoundException is actually a result of the Glue job not being processed properly, which is why it's failing. Looking at the logs, we can see that the Glue job is having trouble creating the new table after deleting the old one, which is why the Lambda function cannot locate it. We need guidance on why the error is occurring, as there have been no code changes on our side. Error details:
```
[ERROR] EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the GetTable operation: Table tbl_site_crowdflower not found.
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 104, in lambda_handler
    create_generic_db(data, bus_date, src_bucket_name)
  File "/var/task/lambda_function.py", line 146, in create_generic_db
    res = glue.get_table(DatabaseName=backup_databasename, Name=tbl_name)
  File "/var/runtime/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
```
(screenshot attached)
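If the table is briefly absent while the Glue job drops and recreates it, a retry around GetTable can bridge the race window (a sketch; the function name and backoff parameters are made up, the argument names mirror the traceback):

```python
import time
import boto3
from botocore.exceptions import ClientError

glue = boto3.client("glue")

def get_table_with_retry(database, name, attempts=5, delay=10):
    """Retry GetTable while the Glue job recreates the table."""
    for i in range(attempts):
        try:
            return glue.get_table(DatabaseName=database, Name=name)
        except ClientError as e:
            # Re-raise anything other than the transient not-found case,
            # and give up after the last attempt.
            if e.response["Error"]["Code"] != "EntityNotFoundException" or i == attempts - 1:
                raise
            time.sleep(delay)  # fixed backoff; tune to the job's recreate window
```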
1 answer · 0 votes · 30 views
asked 12 days ago