All Content tagged with Amazon EMR
Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.
464 results
Naveen Jagathesan (EXPERT)
published 15 days ago, 0 votes, 94 views
Running Spark on EMR with KMS-encrypted S3 data? Every object read triggers a kms:Decrypt API call — and at scale, those costs add up fast. If your compliance requirements prevent switching to S3 Buck...
Mark Twomey (EXPERT)
published 16 days ago, 1 vote, 99 views
A field guide for PyTorch, TensorFlow, Spark, and Kubernetes workloads reading training data from Amazon S3 Express One Zone directory buckets.
**Service and environment**
* Product: Amazon EMR
* Release label: emr-7.13.0
* Applications: Spark, Livy (and others as applicable)
* Region: [e.g. us-east-1]
* Workload: PySpark / Livy submitting Py...
2 answers, 2 votes, 147 views, asked a month ago
shubhranshu (EXPERT)
published 2 months ago, 6 votes, 210 views
I want to view the Spark UI for my AWS Glue job runs, but I cannot use Docker on my local machine. I need an alternative way to run the Apache Spark History Server natively on macOS to read Spark even...
shubhranshu (EXPERT)
published 4 months ago, 11 votes, 291 views
I'm trying to install Python modules in my AWS Glue Python Shell job using wheel files stored in Amazon Simple Storage Service (Amazon S3). My job runs in a private Virtual Private Cloud (VPC) with Am...
Ram Achanta (EXPERT)
published 6 months ago, 2 votes, 501 views
This framework provides a structured approach for migrating analytics workloads from EMR on EC2 to EMR Serverless in enterprise environments. It guides organizations through the complete migration lif...
Ram Achanta (EXPERT)
published 7 months ago, 2 votes, 305 views
Enterprises struggle with EMR version upgrades, facing challenges like production downtime, performance degradation, and compliance risks. Without a structured approach, organizations often experience...
Hello,
I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and...
1 answer, 0 votes, 300 views, asked 8 months ago
Hi Team,
I'm trying to set up Amazon Q in an EMR Studio notebook workspace and followed this guide: https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/emr-setup.html?trk=769a1a2b-8c19-4976...
1 answer, 0 votes, 139 views, asked 8 months ago
I'm getting this error on Amazon EMR while running a PySpark job.
```
df = spark.read.parquet("s3a://test/raw-billing-cor-data/cur2/123456789/cid-cur2/data/BILLING_PERIOD=2025-08/")
py4j.protocol.Py4...
```
1 answer, 0 votes, 198 views, asked 8 months ago
Hi.
I am trying to configure ZGC in HBase following the recommendations, but the JAVA_HOME and HBASE_REGIONSERVER_GC_OPTS variables are not modified in the /etc/hbase/conf/hbase-env.sh file. Has anyo...
1 answer, 0 votes, 109 views, asked 8 months ago
