All Content tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

431 results
I have a use case where I need to run a batch EMR job on a daily schedule. I can organize the data coming from IoT into folders by date, or I can make a folder for each device sending IoT data and put da...
1 answer, 0 votes, 436 views, asked 5 months ago
I'm trying to load 200 GB of data into DynamoDB using Spark on EMR but am facing performance issues. """ Copy paste the following code in your Lambda function. Make sure to change the following key parameters for...
4 answers, 0 votes, 803 views, asked 5 months ago
AWS Support Engineer, published 5 months ago, 3 votes, 1.3K views
This article offers instructions on how to set up and access Delta tables from SQL Explorer in EMR JupyterHub. SQL Explorer utilizes the Presto engine configured within the EMR cluster to process data...
Amazon EMR
I'm trying to create an EMR 7.1.0 cluster with HBase enabled for full S3 backup (including WAL) via the web console. However, no AWSServiceRoleForEMRWAL role is automatically being created and thus my ...
2 answers, 0 votes, 424 views, asked 5 months ago
I'm trying to find out if Trino on EMR supports access controls maintained in Lake Formation. My catalog is AWS Glue. I couldn't find any documentation on the Lake Formation or EMR side that would talk ab...
1 answer, 0 votes, 592 views, asked 5 months ago
AWS Support Engineer, published 5 months ago, 2 votes, 2.2K views
This article offers instructions on how to configure additional Elastic Block Store (EBS) volumes for HDFS or YARN to increase the storage capacity of a running Amazon EMR cluster.
Amazon EMR
Hello, can we get a solution for this error `Service: EmrServerlessResourceManager; Status Code: 403; Error Code: AccessDeniedException` that occurs while running Spark submit jobs on EMR Serverless? Below is th...
1 answer, 0 votes, 756 views, asked 5 months ago
I noticed that when you create a new EMR cluster using Spark, the default Python environment includes two different packages that both provide the "dateutil" package: ``` py-dateutil==2.2 python-date...
1 answer, 1 vote, 652 views, asked 6 months ago
Hello Experts, Technically speaking, EBS volumes assigned to the EMR core nodes are persistent storage and I have specifically created them to not delete on cluster termination. Then, I have attached...
Accepted Answer
Amazon EMR
1 answer, 0 votes, 515 views, asked 6 months ago
I know the recommended strategy is to use EMR Serverless or EMR. However, I have a particular use case where I only need to run a fairly small PySpark job and need quick results. I've already gotten...
1 answer, 0 votes, 795 views, asked 6 months ago
Why does Amazon EMR create inbound rule entries for master and core security groups? ![Core SG](/media/postImages/original/IM6Mggxg_vTQSTJFNCM0FRPA) ![Master SG](/media/postImages/original/IMqAmPsK...
1 answer, 0 votes, 631 views, asked 7 months ago