
Error while creating and running a Blueprint in AWS Lake Formation: IllegalArgumentException: Unrecognized scheme null; expected s3, s3n, or s3a


I am trying to create a blueprint with Amazon Redshift as my data source. When I run the blueprint, I get the following error:

IllegalArgumentException: Unrecognized scheme null; expected s3, s3n, or s3a

I assume this is related to an S3 location. The only place in the blueprint where I provide an S3 location is the "import target" parameters. I am following the tutorial on this page: https://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-tutorial-jdbc.html. I have double-checked the value for "Target storage location", and it's a valid S3 location (s3://data-lake-custom-location).

When I investigated the logs on the AWS Glue side, I saw the following trace:

25/06/30 16:42:14 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] Event: GlueETLJobExceptionEvent, Timestamp: 1751301734197, script: jdbc_snapshot.py, Last Executed Line number: 328

Failure Reason:

Traceback (most recent call last):
  File "/tmp/jdbc_snapshot.py", line 328, in <module>
    main()
  File "/tmp/jdbc_snapshot.py", line 324, in main
    driver.run_transform()
  File "/tmp/jdbc_snapshot.py", line 305, in run_transform
    transform.transform()
  File "/tmp/jdbc_snapshot.py", line 91, in transform
    self._snapshot_transform()
  File "/tmp/jdbc_snapshot.py", line 79, in _snapshot_transform
    table_name=self.source.table_name)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 791, in from_catalog
    return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 188, in create_dynamic_frame_from_catalog
    return source.getFrame(**kwargs)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/data_source.py", line 36, in getFrame
    jframe = self._jsource.getDynamicFrame()
  File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: Unrecognized scheme null; expected s3, s3n, or s3a

It was followed by this error description:

pyspark.sql.utils.IllegalArgumentException: Unrecognized scheme null; expected s3, s3n, or s3a

This does not help me identify the root cause. Can you please help me resolve this error? Thanks.

PS: I have already tested my Redshift connection, and it works.

asked a year ago, 102 views
1 Answer

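This error occurs when an S3 path used by the job has no recognized scheme: "null" means the path had no scheme at all, which typically happens when a location is empty or unset rather than a valid s3:// URI. In a blueprint workflow, the cause is usually how an S3 location is referenced in your blueprint configuration, or missing permissions for the workflow role on the S3 target location.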

When creating a Lake Formation workflow using blueprints, there are several permissions and configuration requirements that need to be met:

  1. Workflow Role Permissions: Ensure that the workflow role has the required Lake Formation permissions on both the target database and the Amazon S3 target location. The error "Unrecognized scheme null; expected s3, s3n, or s3a" suggests that the workflow is unable to properly access or recognize the S3 location.

  2. S3 Location Registration: The target S3 location needs to be registered with Lake Formation. If it's not registered, the workflow won't be able to properly access it.

  3. IAM Permissions: The workflow role needs the appropriate IAM permissions to access the S3 bucket, including s3:GetObject, s3:PutObject, and other relevant permissions.

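  4. Temporary Directory for Redshift: When Redshift is the source, AWS Glue reads the data by unloading it to a temporary S3 directory (the redshift_tmp_dir argument, visible in the from_catalog call in your trace). If that directory is not set, the path handed to S3 would have no scheme, which is consistent with this error. Make sure a temporary directory is configured as an s3:// path and that the workflow role can read and write it.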

  5. Data Location Permissions: Grant data location permissions on the target Amazon S3 location to the IAM role used for the workflow.
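Points 2 and 5 above, plus the database grant from point 1, can be scripted. The following is a minimal sketch, assuming boto3's `lakeformation` client (`register_resource` and `grant_permissions`). The role ARN, account ID, and database name in the usage line are hypothetical placeholders; the bucket is the one from the question. It only builds the request parameters unless you pass a client, so you can review them first.

```python
from typing import Any, Dict, Optional


def s3_uri_to_arn(s3_uri: str) -> str:
    """Turn 's3://bucket/prefix' into 'arn:aws:s3:::bucket/prefix'."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    return "arn:aws:s3:::" + s3_uri[len("s3://"):].rstrip("/")


def lake_formation_requests(
    target_location: str, role_arn: str, database: str
) -> Dict[str, Dict[str, Any]]:
    """Build keyword arguments for boto3 lakeformation calls (nothing is sent here)."""
    location_arn = s3_uri_to_arn(target_location)
    principal = {"DataLakePrincipalIdentifier": role_arn}
    return {
        # Point 2: register the target location with Lake Formation.
        "register_resource": {
            "ResourceArn": location_arn,
            "UseServiceLinkedRole": True,
        },
        # Point 5: data location permission for the workflow role.
        "grant_data_location": {
            "Principal": principal,
            "Resource": {"DataLocation": {"ResourceArn": location_arn}},
            "Permissions": ["DATA_LOCATION_ACCESS"],
        },
        # Point 1: let the workflow role create tables in the target database
        # (assumption: CREATE_TABLE is enough; add ALTER/DROP if your setup needs them).
        "grant_database": {
            "Principal": principal,
            "Resource": {"Database": {"Name": database}},
            "Permissions": ["CREATE_TABLE"],
        },
    }


def apply_requests(requests: Dict[str, Dict[str, Any]], client: Optional[Any] = None) -> None:
    """Send the requests with a boto3 lakeformation client; do nothing if none is given."""
    if client is None:
        return
    client.register_resource(**requests["register_resource"])
    client.grant_permissions(**requests["grant_data_location"])
    client.grant_permissions(**requests["grant_database"])


if __name__ == "__main__":
    reqs = lake_formation_requests(
        "s3://data-lake-custom-location",
        "arn:aws:iam::111122223333:role/LFWorkflowRole",  # hypothetical role
        "lakeformation_tutorial",  # hypothetical database name
    )
    print(reqs)
    # import boto3; apply_requests(reqs, boto3.client("lakeformation"))
```

If the location is already registered, `register_resource` raises an `AlreadyExistsException`; in that case only the two grants are needed.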

To resolve this issue:

  • Verify that the S3 location is correctly formatted (should start with "s3://")
  • Ensure the S3 location is registered with Lake Formation
  • Check that the workflow role has the necessary Lake Formation permissions on both the database and S3 location
  • Confirm that the workflow role has the required IAM permissions to access the S3 bucket
  • If using Redshift, ensure any required temporary directories are properly configured
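The format checks in the first and last bullets can be done locally before re-running the workflow. This is a minimal sketch, assuming you collect the paths yourself from the blueprint parameters and job arguments; the dictionary keys are hypothetical labels, and the empty temp-dir value in the example illustrates the "no scheme" case described above.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Schemes accepted by the S3 filesystem, as listed in the error message.
ACCEPTED_SCHEMES = {"s3", "s3n", "s3a"}


def check_s3_locations(locations: Dict[str, Optional[str]]) -> List[str]:
    """Describe every location that would fail the S3 scheme check."""
    problems: List[str] = []
    for name, value in locations.items():
        if value is None or not value.strip():
            # An unset or empty path has no scheme at all.
            problems.append(f"{name}: not set")
            continue
        parsed = urlparse(value.strip())
        if parsed.scheme not in ACCEPTED_SCHEMES:
            scheme = parsed.scheme or "none"
            problems.append(f"{name}: {value!r} has scheme {scheme}; expected s3, s3n, or s3a")
        elif not parsed.netloc:
            problems.append(f"{name}: {value!r} has no bucket name")
    return problems


if __name__ == "__main__":
    # Hypothetical inputs: the target from the question and an unset Redshift temp dir.
    for problem in check_s3_locations({
        "target_location": "s3://data-lake-custom-location",
        "redshift_tmp_dir": "",
    }):
        print(problem)
```

A bucket name without the s3:// prefix (for example, data-lake-custom-location on its own) is reported as having no scheme, which is the same condition the error message complains about.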

Since you mentioned that your Redshift connection is working, the issue is most likely related to the S3 target location configuration or permissions.
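For the IAM side (point 3 above), a policy along the following lines gives the workflow role bucket-level and object-level access to the target location and the Redshift temporary directory. This is a sketch, not a complete policy: it grants access to whole buckets, the action list is an assumption you may need to narrow or extend, and the temp-dir prefix in the example is hypothetical.

```python
import json
from typing import Any, Dict, List


def workflow_s3_policy(s3_uris: List[str]) -> Dict[str, Any]:
    """Build an IAM policy document for the buckets behind the given s3:// locations."""
    buckets = set()
    for uri in s3_uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
        bucket = uri[len("s3://"):].split("/", 1)[0]
        if not bucket:
            raise ValueError(f"no bucket name in {uri!r}")
        buckets.add(bucket)
    ordered = sorted(buckets)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: listing and locating the bucket.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [f"arn:aws:s3:::{b}" for b in ordered],
            },
            {
                # Object-level actions: reading, writing, and cleaning up files.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{b}/*" for b in ordered],
            },
        ],
    }


if __name__ == "__main__":
    policy = workflow_s3_policy([
        "s3://data-lake-custom-location",
        "s3://data-lake-custom-location/redshift-temp/",  # hypothetical temp dir
    ])
    print(json.dumps(policy, indent=2))
```

The resulting JSON can be attached to the workflow role as an inline policy; if the temp directory lives in a different bucket, include its URI in the list so both buckets are covered.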
Sources
Creating a workflow - AWS Lake Formation
Ingestion layer - AWS Serverless Data Analytics Pipeline
Crawler errors when the crawler is using Lake Formation permissions - AWS Glue

answered a year ago; reviewed by an EXPERT a year ago
