Questions tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

344 results
Hello, I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and...
1 answer · 0 votes · 164 views · asked 3 months ago
Hi Team, I'm trying to set up Amazon Q in an EMR Studio notebook workspace and followed this guide: https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/emr-setup.html?trk=769a1a2b-8c19-4976...
1 answer · 0 votes · 90 views · asked 4 months ago
I'm getting this issue in Amazon EMR during a PySpark job execution. ``` df = spark.read.parquet("s3a://test/raw-billing-cor-data/cur2/123456789/cid-cur2/data/BILLING_PERIOD=2025-08/") py4j.protocol.Py4...
1 answer · 0 votes · 105 views · asked 4 months ago
Hi. I am trying to configure ZGC in HBase following the recommendations, but the JAVA_HOME and HBASE_REGIONSERVER_GC_OPTS variables are not modified in the /etc/hbase/conf/hbase-env.sh file. Has anyo...
1 answer · 0 votes · 54 views · asked 4 months ago
I am deploying an EMR HBase cluster with EMR WAL enabled using Terraform. The cluster is created successfully and the WALs are visible using the emrwal CLI. When I change some configuration of my clus...
1 answer · 0 votes · 71 views · asked 5 months ago
Hi everyone, I currently have an EMR cluster (emr-6.9.0) running a real-time ingestion process. To save disk space, I’ve been using the **Cloud Shuffle Storage Plugin** for Apache Spark. Now, I need ...
1 answer · 0 votes · 191 views · asked 5 months ago
Hi, I recently started facing issues with EMR (EC2 is out of capacity), with the message "EC2 is out of capacity for m6a.12xlarge in availability zone us-east-1c". I tried different machines in the same ...
Accepted Answer · Amazon EC2 · Amazon EMR
1 answer · 0 votes · 132 views · asked 7 months ago
Hi Team, I'm getting the error below while writing a table to Postgres using a Glue Spark script. Can you please help me with this issue? "Table or view "asset_aircraft_rpt" already exists. SaveMode: ...
1 answer · 0 votes · 95 views · asked 7 months ago
Hello, I'm facing a pretty annoying error. Whenever I try to execute a UDF in an EMR Notebook, I get the following error: ``` py4j.protocol.Py4JJavaError: An error occurred while calling o157...
2 answers · 0 votes · 123 views · asked 7 months ago
I'm trying to build a simple collaborative filtering recommendation engine using Apache Spark MLlib on Amazon EMR. So I created an EMR on EC2 cluster with the following configuration: ![Enter image...
1 answer · 0 votes · 77 views · asked 7 months ago
Hello! We're trying to migrate from a stand-alone Hive Metastore to Glue. We've modified the definition of some EMR clusters (v7.0.0) to use Glue as the metastore, we use Spark on Hadoop to process da...
2 answers · 0 votes · 183 views · asked 7 months ago
We're looking for native API/CLI/SDK support to manage EMR Studio workspaces programmatically. Currently, these operations (create, list, delete, etc.) are only possible via the UI, making it difficul...
1 answer · 0 votes · 152 views · asked 8 months ago