boto3 - unable to put string to s3 in EMR cluster - "Not a directory" error

In a workflow where Airflow spins up an EMR cluster to run a PySpark job, the job attempts to write a string to S3 using the boto3 API. I can open a terminal on the cluster, start pyspark, and write the string manually with no issues. However, when the same code runs inside the job, I get this error:

botocore.exceptions.SSLError: SSL validation failed for https://my-bucket-name.s3.amazonaws.com/path/to/file.txt [Errno 20] Not a directory

The exception is raised from the following two chained tracebacks, which end in the urllib3 package:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 322, in ssl_wrap_socket
    context.load_verify_locations(ca_certs, ca_cert_dir)
NotADirectoryError: [Errno 20] Not a directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/botocore/httpsession.py", line 262, in send
    chunked=self._chunked(request.headers),
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 344, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 344, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 843, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 370, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    raise SSLError(e)
urllib3.exceptions.SSLError: [Errno 20] Not a directory

I am at a loss as to what is going on here. It appears to be a problem with the CA certificates, but I don't understand why the call works manually and fails when run through the job. Any insight would be greatly appreciated.

jfaath
asked 5 years ago, 746 views
1 Answer
This was resolved by installing the certifi library on the EMR image.
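
The answer above does not say why certifi helps. One plausible explanation, which is an assumption and not something confirmed in this thread, is that the CA bundle path resolved inside the job had a regular file (for example, a zipped dependency archive) as one of its parent components. That is the condition under which the OS reports errno 20, "Not a directory". The sketch below is a small diagnostic that could be run both in the interactive pyspark shell and inside the job to compare results. The helper names describe_ca_path and candidate_ca_paths are hypothetical, and the environment variables checked are only the ones commonly used to point Python HTTP clients at a CA bundle.

```python
import os
import ssl


def describe_ca_path(path):
    """Classify a CA bundle path: file, directory, missing, or blocked by a file parent."""
    if path is None:
        return "unset"
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    # Walk upward: a regular file used as a parent component is what
    # makes the OS answer ENOTDIR (errno 20) instead of ENOENT.
    parent = os.path.dirname(path)
    while parent and parent != os.path.dirname(parent):
        if os.path.isfile(parent):
            return "parent is a file: " + parent
        parent = os.path.dirname(parent)
    return "missing"


def candidate_ca_paths():
    """Collect CA bundle locations that may be in effect (an illustrative, incomplete list)."""
    paths = {}
    for var in ("AWS_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
        paths[var] = os.environ.get(var)
    paths["ssl default cafile"] = ssl.get_default_verify_paths().cafile
    try:
        import certifi  # optional; this is the package the answer installed
        paths["certifi.where()"] = certifi.where()
    except ImportError:
        paths["certifi.where()"] = None
    return paths


if __name__ == "__main__":
    for name, path in candidate_ca_paths().items():
        print(name, "->", path, "(" + describe_ca_path(path) + ")")
```

If the job reports "parent is a file" or "missing" for a path that the interactive shell reports as "file", the two environments are resolving different CA bundles, which would be consistent with the error going away once certifi is installed on the image.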

jfaath
answered 5 years ago