1 Risposta
- Più recenti
- Maggior numero di voti
- Maggior numero di commenti
0
Seems like the you ran out of HTTPConnection objects that is either trying to connect to source (s3) or connect to sink (temp location of s3). I have seen issues with EMR like this before and I did set fs.s3.maxConnections to high value to increase the connection pool size. You can increase that as mentioned here 1. You can set the value by below
scala: sparkcontext.hadoopConfiguration.set("spark.hadoop.fs.s3.maxConnections", 1000)
python: sparkcontext._jsc.hadoopConfiguration().set('spark.hadoop.fs.s3.maxConnections', '1000')
The issue might be because of large files being fetched and written to sink and thus the HTTP Connection might take longer then normal and the pool might not just be enough for you.
con risposta 4 anni fa
Contenuto pertinente
- AWS UFFICIALEAggiornata 2 anni fa
- AWS UFFICIALEAggiornata 3 anni fa
- AWS UFFICIALEAggiornata 2 anni fa