All Content tagged with Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

431 results
Issue: PySpark works in the first cells (likely SparkSession creation) but throws import errors when using my Python files in later cells. Environment: AWS EMR (Amazon EMR version emr-6.4.0 Installe...
1 answer · 0 votes · 674 views · asked 8 months ago
Let me know if this is something AWS EMR Studio does: 1. In Databricks Community Edition and in Google Colab, one can fire up a simple Jupyter notebook with an automatically started cluster (small ...
1 answer · 0 votes · 693 views · asked 8 months ago
Hi everyone, I am using AWS EMR to do some ETL operations on very large datasets (millions or billions of records). I am using PySpark and reading the CSV files using *spark.read.csv*. The results ...
1 answer · 0 votes · 930 views · asked 8 months ago
While running a serverless job run, I am getting the error below: "Number of cores specified by 'spark.driver.cores '7' is invalid".
2 answers · 0 votes · 565 views · asked 8 months ago
Hi, I have an EMR cluster with HBase in S3 storage mode. I have a read replica cluster pointing to the same S3 bucket. Now when I add a record in the primary cluster and flush the table on the primary, and then run refresh_hfiles...
1 answer · 0 votes · 570 views · asked 8 months ago
Hi, I am getting an error while launching EMR with HBase on S3 storage and WAL backup enabled. Caused by: java.lang.RuntimeException: createWal failed for wal WALMetadata(WALWorkspace=testworkspace2, Ta...
1 answer · 0 votes · 688 views · asked 8 months ago
AWS
SUPPORT ENGINEER
published 9 months ago · 3 votes · 1.4K views
This article describes the high-level procedure for integrating the Tableau application with a Kerberized EMR cluster.
Amazon EMR
I have a Python package saved in CodeCommit and need to use it in the notebook attached to my EMR cluster workspace. The package is already successfully installed via bootstrap. To do this, in my .sh ...
1 answer · 0 votes · 585 views · asked 9 months ago
I have an EMR Serverless application, and I am submitting a Spark job via a Python script. I have packaged all the dependencies and the script to an S3 bucket. When I execute the job, the Spark job is runnin...
2 answers · 0 votes · 666 views · asked 9 months ago
Hello, I configured an Iceberg-formatted table with transactions in Hive on EMR 6.4.1. When I insert data into the table, the operation gets stuck, without any error. Any insights are highly appreciate...
Accepted Answer
Amazon EMR
1 answer · 0 votes · 516 views · asked 9 months ago
I've started seeing the following error on JupyterHub on EMR `TypeError: required field "type_ignores" missing from Module` from the simplest commands ![the command](/media/postImages/original/IM0G...
2 answers · 0 votes · 574 views · asked 9 months ago
Hi Team, we have an EMR 6.10 cluster where Flink jobs are submitted to an existing application. In my case, the container was running on a task node. Then I resized the task instance group from 1 to 0 in task instance...
Accepted Answer
Amazon EMR
1 answer · 0 votes · 522 views · asked 9 months ago