
Questions tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

345 results
**Service and environment** * Product: Amazon EMR * Release label: emr-7.13.0 * Applications: Spark, Livy (and others as applicable) * Region: [e.g. us-east-1] * Workload: PySpark / Livy submitting Py...
2 answers, 2 votes, 88 views, asked 13 days ago
Hello, I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and...
1 answer, 0 votes, 293 views, asked 7 months ago
Hi Team, I'm trying to set up Amazon Q in an EMR Studio notebook workspace and followed this guide: https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/emr-setup.html?trk=769a1a2b-8c19-4976...
1 answer, 0 votes, 135 views, asked 7 months ago
I'm getting this error in Amazon EMR while running a PySpark job: ``` df = spark.read.parquet("s3a://test/raw-billing-cor-data/cur2/123456789/cid-cur2/data/BILLING_PERIOD=2025-08/") py4j.protocol.Py4...
1 answer, 0 votes, 192 views, asked 7 months ago
Hi. I am trying to configure ZGC in HBase following the recommendations, but the JAVA_HOME and HBASE_REGIONSERVER_GC_OPTS variables are not being updated in the /etc/hbase/conf/hbase-env.sh file. Has anyo...
1 answer, 0 votes, 103 views, asked 7 months ago
I am deploying an EMR HBase cluster with EMR WAL enabled using Terraform. The cluster is created successfully and the WALs are visible using the emrwal CLI. When I change some configuration of my clus...
1 answer, 0 votes, 133 views, asked 8 months ago
Hi everyone, I currently have an EMR cluster (emr-6.9.0) running a real-time ingestion process. To save disk space, I’ve been using the **Cloud Shuffle Storage Plugin** for Apache Spark. Now, I need ...
1 answer, 0 votes, 267 views, asked 9 months ago
Hi, recently I started facing an EC2 capacity issue with EMR: "EC2 is out of capacity for m6a.12xlarge in availability zone us-east-1c". I tried different machines in the same ...
Accepted answer. Tags: Amazon EC2, Amazon EMR
1 answer, 0 votes, 192 views, asked 10 months ago
Hi Team, I'm getting the error below while writing a table to PostgreSQL using an AWS Glue Spark script. Can you please help me with this issue? "Table or view "asset_aircraft_rpt" already exists. SaveMode: ...
1 answer, 0 votes, 142 views, asked 10 months ago
Hello, I'm facing a pretty annoying error. Whenever I try to execute a UDF in an EMR notebook, I get the following error: ``` py4j.protocol.Py4JJavaError: An error occurred while calling o157...
2 answers, 0 votes, 153 views, asked 10 months ago
I'm trying to build a simple collaborative filtering recommendation engine using Apache Spark MLlib on Amazon EMR, so I created an EMR on EC2 cluster with the following configuration: ![Enter image...
1 answer, 0 votes, 118 views, asked 10 months ago
Hello! We're trying to migrate from a standalone Hive Metastore to Glue. We've modified the definition of some EMR clusters (v7.0.0) to use Glue as the metastore. We use Spark on Hadoop to process da...
2 answers, 0 votes, 253 views, asked a year ago