Questions tagged with Amazon EMR
I know the recommended strategy is to use EMR Serverless or EMR. However, I have a particular use case where I only need to run a fairly small PySpark job and need quick results. I've already gotten...
0 answers, 0 votes, 30 views, asked 11 hours ago
Why does Amazon EMR create inbound rule entries for master and core security groups?
![Core SG](/media/postImages/original/IM6Mggxg_vTQSTJFNCM0FRPA)
![Master...
1 answer, 0 votes, 168 views, asked 21 days ago
I have an EMR workspace with 4 Jupyter notebooks that run PySpark code blocks.
I want to get the time of the last code block execution across all 4 notebooks to determine the...
1 answer, 0 votes, 150 views, asked a month ago
I want to change the default S3 storage class of the Hive connector in EMR Trino 426 (EMR 6.15.0) to INTELLIGENT_TIERING.
I found the [hive.s3.storage-class option in the Trino 426 official...
Accepted Answer
Amazon EMR
2 answers, 0 votes, 185 views, asked a month ago
I am running an EMR cluster with an attached notebook and using Apache Spark to load and process data. However, I have not been able to load data into Apache. Whenever I try to run...
1 answer, 0 votes, 344 views, asked a month ago
I have a Spark application running on EMR 7 that took 15+ hours, whereas it took 9 hours on EMR 6.14. There are no changes in code or data volume. One observation is that the application attempted thrice...
Accepted Answer
Amazon EMR
3 answers, 0 votes, 283 views, asked 2 months ago
I have an EMR cluster, and I have used the Treasure Data connector to read data from tables into a DataFrame using PySpark. The tables that I'm trying to read have approximately 100 million to 500...
1 answer, 0 votes, 334 views, asked 2 months ago
Issue: PySpark works in the first cells (likely SparkSession creation) but throws import errors when using my Python files in later cells.
Environment: AWS EMR (Amazon EMR...
1 answer, 0 votes, 344 views, asked 2 months ago
Let me know if this is something AWS EMR Studio does:
1. In Databricks Community Edition and in Google Colab, one can fire up a simple Jupyter notebook with an automatically started cluster (small...
1 answer, 0 votes, 410 views, asked 2 months ago
Hi everyone,
I am using AWS EMR to run ETL operations on very large datasets (millions to billions of records). I am using PySpark and reading the CSV files using *spark.read.csv*. The results...
1 answer, 0 votes, 465 views, asked 2 months ago
While running the serverless job, I am getting the error below:
"Number of cores specified by 'spark.driver.cores '7' is invalid".
2 answers, 0 votes, 471 views, asked 2 months ago
Hi
I have an EMR cluster with HBase in S3 storage mode. I have a read replica cluster pointing to the same S3 bucket.
Now when I add a record in the primary cluster and flush the table on the primary, and then run refresh_hfiles...
1 answer, 0 votes, 457 views, asked 2 months ago