boto3 - unable to put string to S3 in EMR cluster - "Not a directory" error

In a workflow where Airflow spins up an EMR cluster to run a PySpark job, the job attempts to write a string to S3 using the boto3 API. I can open a terminal on the cluster, start pyspark, and write the string manually with no issues. However, when the same code runs in the job, I get this error:

botocore.exceptions.SSLError: SSL validation failed for https://my-bucket-name.s3.amazonaws.com/path/to/file.txt [Errno 20] Not a directory

The exception originates from the following two chained tracebacks, which pass through the urllib3 package:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 322, in ssl_wrap_socket
    context.load_verify_locations(ca_certs, ca_cert_dir)
NotADirectoryError: [Errno 20] Not a directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/botocore/httpsession.py", line 262, in send
    chunked=self._chunked(request.headers),
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 344, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 344, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 843, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 370, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    raise SSLError(e)
urllib3.exceptions.SSLError: [Errno 20] Not a directory

I am at a loss as to what is going on here. It appears to be an issue with the certificates, but I don't understand why it works manually and not through the job. Any insight would be greatly appreciated.

jfaath
asked 5 years ago, 758 views
1 Answer

This was resolved by installing the certifi library on the EMR image.
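
For anyone hitting the same error, here is a minimal diagnostic sketch that could be run inside the job to see which CA bundle the Python environment offers and whether that path is a real file. The helper names `diagnose_ca_bundle` and `current_ca_bundle` are hypothetical; `certifi.where()` is certifi's own call. It assumes that botocore picks up certifi's bundle when certifi is importable, and that the "Not a directory" error comes from a bundle path with a regular file (for example a .zip archive) in the middle of it. Neither assumption is confirmed by the traceback above.

```python
import os


def diagnose_ca_bundle(path):
    """Classify a CA bundle path as 'ok', 'not-a-directory', or 'missing'.

    'missing' means there is no regular file at that path.
    """
    if os.path.isfile(path):
        return "ok"
    # Walk up until some ancestor of the path exists on disk.
    ancestor = os.path.dirname(path)
    while ancestor and not os.path.exists(ancestor):
        parent = os.path.dirname(ancestor)
        if parent == ancestor:
            break
        ancestor = parent
    # If that ancestor is a regular file (for example a .zip archive),
    # opening the original path fails with ENOTDIR,
    # i.e. "[Errno 20] Not a directory".
    if os.path.isfile(ancestor):
        return "not-a-directory"
    return "missing"


def current_ca_bundle():
    """Return certifi's bundle path, or None when certifi is not installed."""
    try:
        import certifi
    except ImportError:
        return None
    return certifi.where()


if __name__ == "__main__":
    bundle = current_ca_bundle()
    if bundle is None:
        print("certifi is not importable in this interpreter")
    else:
        print(bundle, "->", diagnose_ca_bundle(bundle))
```

Running this both in the interactive pyspark shell and from within the submitted job may show whether the two environments resolve different bundle paths, which would explain why the manual attempt works and the job does not.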

jfaath
answered 5 years ago
