Error Category: UNCLASSIFIED_ERROR; Failed Line Number: 36; An error occurred while calling z:com.amazonaws.services.glue.ml.EntityDetector.detect. Task not serializable

I get the error below when I try to use a custom regex pattern to detect sensitive data in Glue Studio:

Error Category: UNCLASSIFIED_ERROR; Failed Line Number: 36; An error occurred while calling z:com.amazonaws.services.glue.ml.EntityDetector.detect. Task not serializable

Detection works for the other, built-in categories of sensitive data.

ERROR 2025-03-19T00:32:59,235 28286 com.amazonaws.services.glue.ProcessLauncher [main] 76 Error from Python:Traceback (most recent call last):
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/tmp/glue-job-14421657534699783885/test job.py", line 39, in <module>
    DetectSensitiveData_node1742300301037 = entity_detector.detect(AWSGlueDataCatalog_node1742301366920, detection_parameters, "DetectedEntities")
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hadoop/.local/lib/python3.11/site-packages/awsglueml/transforms/entity_detector.py", line 56, in detect
    jvm.EntityDetector.detect(
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/usr/lib/spark/python/lib/py

asked a year ago, 177 views
2 Answers
1

I downgraded the visual job from Glue 5.0 to Glue 4.0.

Glue Version: Ensure you're using a Glue version that fully supports custom regex patterns for sensitive data detection.

answered a year ago
  • Glue 4.0 works for me. Thank you!
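
For anyone who wants to script the downgrade instead of changing the Glue version in the job's details in Glue Studio, here is a minimal sketch using boto3. It assumes a Spark job configured with WorkerType and NumberOfWorkers. The helper names `build_job_update` and `downgrade_job` and the `COPYABLE_FIELDS` list are my own, not part of any AWS library, and the field list may be incomplete for your job. UpdateJob overwrites the previous job definition, so the sketch copies the existing settings across and changes only GlueVersion.

```python
# Fields returned by get_job that JobUpdate also accepts.
# Assumption: illustrative list for a Spark job that uses WorkerType and
# NumberOfWorkers; check it against your own job before relying on it.
COPYABLE_FIELDS = (
    "Description", "LogUri", "Role", "ExecutionProperty", "Command",
    "DefaultArguments", "NonOverridableArguments", "Connections",
    "MaxRetries", "Timeout", "WorkerType", "NumberOfWorkers",
    "SecurityConfiguration", "NotificationProperty",
    "CodeGenConfigurationNodes", "ExecutionClass", "SourceControlDetails",
)


def build_job_update(job: dict, glue_version: str = "4.0") -> dict:
    """Copy the existing job settings and change only the Glue version."""
    update = {key: job[key] for key in COPYABLE_FIELDS if key in job}
    update["GlueVersion"] = glue_version
    return update


def downgrade_job(job_name: str, glue_version: str = "4.0") -> None:
    """Fetch the job definition and write it back with a new Glue version."""
    import boto3  # imported here so build_job_update has no AWS dependency

    glue = boto3.client("glue")
    job = glue.get_job(JobName=job_name)["Job"]
    glue.update_job(
        JobName=job_name,
        JobUpdate=build_job_update(job, glue_version),
    )
```

Calling `downgrade_job("test job")` would apply the change. Read-only fields such as Name, CreatedOn and LastModifiedOn are left out on purpose, because JobUpdate does not accept them.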

0
Accepted Answer

The error you're encountering suggests there might be an issue with the custom regex pattern you're trying to use in AWS Glue Studio for sensitive data detection. Here are a few things to check and consider:

  1. Regex String Format: Ensure that your regular expression string is properly formatted and follows the constraints specified by AWS Glue. The regex string must be between 1 and 255 characters long and must be a single-line string.

  2. Validation: Before using the custom pattern in your Glue job, it's a good practice to validate the regex. AWS Glue Studio provides a validation feature when creating a new detection entity pattern. Use this to confirm your regex is valid.

  3. Context Words: If you've added context words to your custom pattern, make sure they are relevant and don't conflict with the regex. Context words are optional but can help increase the accuracy of detection.

  4. Unique Name: Ensure that the name you've given to your custom pattern is unique within your AWS account and doesn't exceed 255 characters.

  5. API Usage: If you're creating the custom pattern programmatically outside of Glue Studio, make sure you're using the CreateCustomEntityType API correctly, providing all required parameters.

  6. Serialization: The error message says "Task not serializable". Spark raises this error when an object referenced by a task cannot be serialized for shipping to the executors. This could indicate a problem with how the custom pattern is being passed to or processed by the Glue job, so review how you're incorporating the custom pattern into your Glue script.

  7. Glue Version: Ensure you're using a Glue version that fully supports custom regex patterns for sensitive data detection.
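
As a rough illustration of points 1, 4 and 5, here is a minimal sketch that checks a pattern locally and builds the parameters for CreateCustomEntityType. The helper names `check_custom_pattern` and `build_request` are hypothetical, and the example name, regex and context word are made up. Python's `re` module is only an approximation here: a regex that compiles in Python is not guaranteed to behave the same way inside Glue.

```python
import re

MAX_LEN = 255  # upper bound quoted in the list above


def check_custom_pattern(name: str, regex_string: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for label, value in (("name", name), ("regex", regex_string)):
        if not 1 <= len(value) <= MAX_LEN:
            problems.append(
                f"{label} must be 1-{MAX_LEN} characters, got {len(value)}"
            )
    if "\n" in regex_string or "\r" in regex_string:
        problems.append("regex must be a single-line string")
    try:
        re.compile(regex_string)  # local sanity check only
    except re.error as exc:
        problems.append(f"regex does not compile in Python: {exc}")
    return problems


def build_request(name: str, regex_string: str, context_words=None) -> dict:
    """Build keyword arguments for the CreateCustomEntityType API call."""
    problems = check_custom_pattern(name, regex_string)
    if problems:
        raise ValueError("; ".join(problems))
    request = {"Name": name, "RegexString": regex_string}
    if context_words:  # context words are optional
        request["ContextWords"] = list(context_words)
    return request


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# params = build_request("EMPLOYEE_ID", r"EMP-\d{6}", ["employee"])
# boto3.client("glue").create_custom_entity_type(**params)
```

These checks only catch malformed input before the API call. They do not verify that the name is unique in the account, and they would not by themselves explain a "Task not serializable" failure at job run time.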

If you've checked all these points and are still encountering the error, you may need to reach out to AWS support for further assistance. They can provide more specific guidance based on your exact configuration and use case.
Sources
Detect and process sensitive data - AWS Glue
Sensitive data detection API - AWS Glue
CustomEntityType - AWS Glue
CreateCustomEntityType - AWS Glue

answered a year ago
EXPERT
reviewed a year ago