Implementing a Governance Framework for Migrating from Amazon EMR on EC2 to EMR Serverless
This framework provides a structured approach for migrating analytics workloads from EMR on EC2 to EMR Serverless in enterprise environments. It guides organizations through the complete migration lifecycle—from initial workload assessment and benchmarking to implementation, security controls, and post-migration optimization. Designed for platform teams, application owners, and operations staff; it focuses on migration methodology and governance rather than specific code implementations.
Introduction
As organizations scale their big data workloads on AWS, many are evaluating the transition from Amazon EMR on EC2 to EMR Serverless. EMR Serverless offers compelling advantages for enterprises, significantly reducing operational overhead by eliminating cluster management tasks while providing automatic scaling and pay-per-use pricing. Organizations can focus on running analytics workloads without managing clusters, benefit from near-instantaneous startup times, and achieve cost savings through improved resource utilization. Additionally, fine-grained scaling helps organizations optimize costs by paying only for the resources actually consumed during job execution.
However, this migration presents complex challenges, particularly for large enterprises managing multiple application teams and diverse workloads. Common challenges include maintaining consistent security controls, managing shared resources (for example, maximum concurrent vCPUs) effectively, ensuring predictable performance, and controlling costs—all while coordinating migrations across multiple application teams. Without a structured approach, organizations risk inconsistent implementations, resource contention, and potential security gaps during this transition.
This framework provides a centralized, prescriptive approach for managing these migrations while ensuring operational excellence and security compliance.
Solution Overview
The governance framework implements a three-phased approach to EMR Serverless migration, combining AWS best practices with enterprise-grade controls. The solution addresses key migration challenges through:
Workload Assessment and Planning: The framework begins with a systematic evaluation of existing EMR on EC2 workloads using detailed decision matrices. This helps teams determine which applications are best suited for EMR Serverless, considering factors such as business needs, job patterns and resource requirements.
Resource Management and Control: The framework establishes a centralized resource management strategy that enables organizations to effectively monitor and control the EMR Serverless environment. Through Amazon CloudWatch integration, teams can track key metrics including vCPU utilization, memory consumption, and job execution patterns. CloudWatch dashboards can provide visibility into resource usage trends and help identify optimization opportunities. The framework highlights automated alerts for resource utilization thresholds and job performance anomalies, enabling proactive management of the environment.
Security and Compliance Integration: The framework enforces security controls through standardized IAM roles, AWS Lake Formation for fine-grained access control, encryption requirements, and networking configurations.
Migration Execution and Strategy
The migration from EMR on EC2 to EMR Serverless requires a methodical approach guided by clear principles and decision frameworks. This section outlines the strategic components that ensure successful transition while maintaining operational stability and security compliance.
1. Guiding Principles for Migration
Workload Suitability Assessment [Right-Workload, Right-Platform]: Your migration journey starts with a thorough workload suitability assessment. Before initiating any migration, validate your workload's compatibility with EMR Serverless by analyzing job patterns, resource utilization, and performance requirements. Pay particular attention to job duration, memory usage patterns, and data access patterns, as these factors significantly impact migration success.
Resource Management Strategy [Shared Quota Awareness]: Resource management becomes critical in a shared environment. Implement a comprehensive approach that accounts for concurrent execution needs across all applications. You will want to establish pre-initialized capacity for latency-sensitive workloads and define appropriate vCPU and memory configurations for different job types. Regular monitoring and adjustment of these configurations will help you achieve optimal resource utilization.
Cost Optimization Framework [Cost Visibility First]: Cost optimization should begin before production deployment. Start by establishing baseline costs from your existing EMR on EC2 deployments and model EMR Serverless costs using the AWS Cost Explorer and EMR Serverless Cost Estimator. Set up cost allocation tags and budget alerts early in the process, and implement automated cost optimization measures to maintain control over spending.
Security and Governance Controls [Security by Default]: Security remains paramount throughout the migration. Implement a security-first approach through standardized IAM roles and permissions aligned with least-privilege principles. Ensure comprehensive encryption for data at rest and in transit, and integrate with AWS Lake Formation for robust data access governance.
2. Core Platform Differences
When transitioning from EMR on EC2 to EMR Serverless, you'll encounter several architectural changes that require attention.
Infrastructure Management: The most significant shift occurs in infrastructure management, where EMR Serverless removes cluster management and EC2 instance selection requirements. This change requires you to adapt your monitoring and optimization strategies to focus on job-level metrics rather than instance-level metrics.
Resource Scaling: Resource scaling in EMR Serverless operates differently from EMR on EC2's instance-based approach. You will work with fine-grained scaling at the vCPU and memory level, which requires redefining your auto-scaling strategies and adjusting job configurations for optimal resource utilization. Consider implementing pre-initialization for performance-sensitive workloads to minimize startup times.
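For illustration, pre-initialized capacity is declared when the EMR Serverless application is created. The sketch below shows the general shape of a CreateApplication request body with pre-initialized driver and executor workers; the application name and sizes are placeholder values, and the field names should be verified against the current EMR Serverless API reference.

```json
{
  "name": "example-latency-sensitive-app",
  "releaseLabel": "emr-6.10.0",
  "type": "SPARK",
  "initialCapacity": {
    "DRIVER": {
      "workerCount": 1,
      "workerConfiguration": { "cpu": "2 vCPU", "memory": "4 GB" }
    },
    "EXECUTOR": {
      "workerCount": 4,
      "workerConfiguration": { "cpu": "4 vCPU", "memory": "8 GB", "disk": "20 GB" }
    }
  },
  "maximumCapacity": { "cpu": "48 vCPU", "memory": "96 GB" }
}
```

Pre-initialized workers are billed while they sit idle, so reserve them only for the latency-sensitive workloads identified in your assessment.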
Storage Architecture: The storage architecture in EMR Serverless is built around Amazon S3. This requires migrating your data storage patterns to S3, optimizing data access patterns for S3 performance, and implementing S3 lifecycle policies for cost optimization. Workers also provide local disks for the temporary storage EMR Serverless uses to shuffle and process data during job runs.
Below is a matrix that explores the differences between EMR on EC2 and AWS serverless big data compute engines such as EMR Serverless and AWS Glue. Your choice among these deployment models depends on specific workload characteristics and operational requirements.
| Dimension | EMR on EC2 | EMR Serverless | AWS Glue |
|---|---|---|---|
| Provisioning | Full cluster control (EC2, networking, OS) | No infrastructure to manage—just submit jobs | No infrastructure; fully abstracted |
| Scaling | Instance-based auto scaling (slow granularity) | Fine-grained worker scaling by vCPU/memory; elastic capacity | Auto-scaled DPUs; pre-warmed workers (Glue 3.0+) |
| Startup Time | 5–10 mins (with bootstrap overhead) | ~ 10–20 seconds; near instantaneous with pre-initialized capacity | 1–2 minutes (can be longer for cold starts) |
| Cost Efficiency | Best with Reserved/Spot for steady, long-running jobs | Pay-per-use; best for bursty, seasonal, or interactive jobs | Pay-per-DPU-second; good for scheduled jobs with small-to-medium complexity |
| Customization | Full control: install libraries, agents, OS tuning | Limited; Spark/Hive configuration and code tuning only, with custom image support available | Minimal customization; no access to underlying Spark environment |
| Framework Support | Spark, Hive, Flink, HBase, Presto, custom apps | Spark, Hive | Apache Hudi, Iceberg, Spark, Python (Ray, Pandas), Scala, SQL |
| Storage | HDFS (cluster-local) + S3 integration | No HDFS; I/O goes to S3 | S3 + Glue Catalog |
| Shuffle Behavior | HDFS/local disk used for shuffle | S3 + shuffle-optimized disks | Internal storage, S3 |
| Monitoring & Logging | CloudWatch, custom tools | CloudWatch with job-level insights, fewer tuning knobs, Spark UI | CloudWatch + job metrics |
| Governance | Per-VPC and per-job IAM; tagging requires customization | Tag-based governance supported; works well in multi-tenant shared services | IAM + Lake Formation + Data Catalog integration |
| Security | Full control over networking, encryption, OS patches | Secure by default; VPC and encryption managed | Fully managed encryption, VPC support; Glue-specific role |
| Resilience | Depends on instance & AZ config; Spot interruption risk | Multi-AZ, managed execution, automatic retries, Spot is not yet supported | Fully managed, auto-retries, and resilient runtime |
| Ideal Use Case | Complex ML pipelines, persistent long-running jobs | Interactive applications, seasonal data prep, ad-hoc analytics | Scheduled ETL, catalog-integrated pipelines, serverless ELT |
Additional differences among EMR on EC2, EMR on EKS, and EMR Serverless are outlined in the EMR FAQs.
3. Migration Decision Framework
| Decision Question | Decision Driver | Recommended Compute Model | Rationale |
|---|---|---|---|
| Do you need OS-level control or custom libraries? | Custom AMIs, bootstrap scripts, native agents | EMR on EC2 | Full control over the cluster environment, suitable for complex dependencies |
| Are jobs short-lived or sporadic (bursty)? | Pay-per-use, startup speed | EMR Serverless; AWS Glue | Serverless models scale fast and cost less for intermittent workloads |
| Is the workload primarily scheduled ETL? | Integration with Glue Catalog, transform + load | AWS Glue | Purpose-built for ETL with strong metadata and orchestration support |
| Do you need consistent performance with long job runtimes? | Resource reservation, predictable cost | EMR on EC2 (with Reserved/Spot instances) | Efficient for persistent workloads |
| Is fast startup critical (e.g., interactive analysis)? | Latency sensitivity | EMR Serverless | Pre-initialized capacity enables near-instant responsiveness |
| Are you running Spark/Hive-only jobs? | Framework limitations | EMR Serverless | No need for broader frameworks |
| Is your platform multi-tenant or shared? | Quota enforcement, team isolation, chargeback | EMR Serverless | Supports tag-based governance and custom quota controls for shared platform environments |
| Do you require GPU or specialty instance types? | ML/DL workloads with custom infra | EMR on EC2 | Supports GPU-enabled EC2, ideal for custom ML pipelines |
| Do you need fine-grained scaling with minimal tuning? | Auto-scaling simplicity | EMR Serverless; AWS Glue | Both handle resource level scaling |
| Is your organization optimizing for compliance and security without custom infra? | Managed encryption, access policies | EMR Serverless; AWS Glue | Serverless options offer built-in compliance and secure defaults |
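The decision table above can be read as a simple voting exercise. The sketch below encodes each row as a flag-to-platform rule and recommends the platform with the most matching decision drivers; the flag names, equal weighting, and default are illustrative assumptions, not AWS guidance.

```python
# Each tuple mirrors one row of the decision framework:
# (workload characteristic flag, recommended compute model).
RULES = [
    ("needs_os_control", "EMR on EC2"),
    ("bursty_short_jobs", "EMR Serverless"),
    ("scheduled_etl", "AWS Glue"),
    ("long_steady_runtimes", "EMR on EC2"),
    ("latency_sensitive", "EMR Serverless"),
    ("spark_hive_only", "EMR Serverless"),
    ("multi_tenant_platform", "EMR Serverless"),
    ("needs_gpu", "EMR on EC2"),
]

def recommend(workload: dict) -> str:
    """Return the compute model with the most matching decision drivers."""
    votes = {}
    for flag, target in RULES:
        if workload.get(flag):
            votes[target] = votes.get(target, 0) + 1
    if not votes:
        return "EMR Serverless"  # assumed default when no driver applies
    return max(votes, key=votes.get)

print(recommend({"bursty_short_jobs": True, "spark_hive_only": True}))
# EMR Serverless
```

In practice a real assessment weighs these drivers unequally (a hard requirement such as GPU support overrides everything else), so treat the equal-weight vote as a first-pass filter only.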
EMR Serverless Considerations & Limitations:
- Support limited to Apache Spark and Apache Hive workloads
- No support for EC2 Spot instances currently
- Cannot modify resource configurations after job submission
- Unable to install custom libraries at runtime
- Resource changes require application restart
- The EMR Serverless documentation maintains the complete, current list of considerations and limitations
4. Migration Intake & Communication
Successful migrations require clear communication and structured processes between application teams and the Platform Team. To initiate a migration, application teams submit a comprehensive migration request to the Platform Team through a standardized intake process. This process captures the essential information needed to evaluate and support the migration effectively.
| Required Information | Description | Example |
|---|---|---|
| Application Details | Name, owner, business unit, environment | DataLake ETL, Jane Smith, Finance, Dev |
| Current Configuration | EMR version, instance types, cluster size | EMR 6.9.0, m5.4xlarge, 10 core nodes, 20 task nodes with EMR Managed Scaling |
| Target Platform | EMR Serverless | EMR Serverless 6.10.0 |
| Workload Characteristics | Job frequency, duration, peak/average resource utilization | Daily, 45 min avg runtime, 80% CPU peak |
| Data Profile | Volume, file formats, partition strategy | 500GB daily, Parquet, partitioned by date |
| SLA Requirements | Processing window, criticality tier | Complete by 06:00 PM daily, Tier 2 |
| Benchmark Window | Proposed testing period | Jan 15-22, 2026 |
| Dependencies | Upstream/downstream systems | Depends on Sales data, feeds BI dashboards |
The Platform Team uses this information to:
- Assess migration needs and complexity
- Identify potential risks and mitigation strategies
- Plan appropriate resources and support
- Establish appropriate monitoring and alerting thresholds
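A standardized intake is easy to enforce mechanically. The sketch below is a hypothetical completeness check a Platform Team might run on a submitted request; the field names simply mirror the intake table and are not part of any AWS API.

```python
# Required intake fields, mirroring the migration request table above.
REQUIRED_FIELDS = [
    "application_details", "current_configuration", "target_platform",
    "workload_characteristics", "data_profile", "sla_requirements",
    "benchmark_window", "dependencies",
]

def validate_intake(request: dict) -> list:
    """Return the missing or empty fields; an empty list means complete."""
    return [f for f in REQUIRED_FIELDS if not request.get(f)]

incomplete = {"application_details": "DataLake ETL, Jane Smith, Finance, Dev"}
print(validate_intake(incomplete)[:2])
```

Incomplete requests can then be bounced back automatically before a human reviewer ever looks at them.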
5. Benchmarking & Sizing Methodology
Benchmarking plays a crucial role in ensuring successful migrations to EMR Serverless. This process prevents unbounded resource consumption, ensures predictable performance, validates your S3 architecture, establishes cost baselines, and identifies optimal configuration requirements.
Your benchmarking journey begins with baseline measurements from your existing EMR on EC2 environment. Collect comprehensive job profiling data, including vCPU utilization, memory utilization, shuffle data volume, I/O patterns, job duration, and cost per run. Document your current performance metrics and establish clear targets for the migration.
Standardized Benchmarking Process
The phased approach below is an example and can be adapted to individual application team requirements:
Phase 1: Baseline Measurement (EMR on EC2)
1. Job Profiling Data Collection: Capture the following metrics over at least 5 job executions:
   - vCPU utilization (avg/peak)
   - Memory utilization (avg/peak)
   - Shuffle data volume from the Spark UI
   - I/O patterns (read/write)
   - Job duration (min/max/avg)
   - Cost per run from AWS Cost Explorer [EMR + EC2 + EBS]
2. Performance Baseline Documentation
| Metric | Current Performance | Target Performance | Acceptable Range |
|---|---|---|---|
| Job Duration | 45 minutes | ≤40 minutes | +/- 10% |
| Cost per Run | $X.XX | ≤$X.XX | +/- 15% |
| Resource Efficiency | X vCPU-hrs | Improve by ≥10% | N/A |
| Failure Rate | X% | ≤X% | + 0.5% |
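The acceptable ranges in the baseline table can be checked mechanically after each benchmark run. A minimal sketch, assuming symmetric percentage tolerances:

```python
def within_range(baseline: float, measured: float, tolerance_pct: float) -> bool:
    """True if measured is within +/- tolerance_pct of the baseline value."""
    return abs(measured - baseline) <= baseline * tolerance_pct / 100

# Example: 45-minute baseline with the +/-10% duration tolerance from the table
print(within_range(45, 43, 10))  # True
print(within_range(45, 52, 10))  # False
```

One-sided targets (such as "failure rate must not increase by more than 0.5%") need a directional check rather than this symmetric one.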
Phase 2: EMR Serverless Configuration Tuning
1. Initial Resource Sizing Guidelines:
   - Driver Size: Begin with memory equivalent to the current driver node, typically 4-16 GB
   - Executor Size: Start with the equivalent of the current executor configuration, typically 4-8 GB
   - Initial Worker Count: Begin with approximately 80% of the current core node count
   - Max Worker Count: Set to 120% of the current core node count as a starting point
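The sizing guidelines above reduce to simple arithmetic on the current core node count. A small helper, assuming the 80%/120% starting ratios are appropriate for your workload:

```python
def initial_sizing(core_nodes: int) -> dict:
    """Starting-point worker counts from the sizing guidelines:
    roughly 80% of current core nodes initially, 120% as the maximum."""
    return {
        "initial_workers": max(1, round(core_nodes * 0.8)),
        "max_workers": max(1, round(core_nodes * 1.2)),
    }

print(initial_sizing(10))  # {'initial_workers': 8, 'max_workers': 12}
```

Treat these as starting points only; the configuration testing matrix below is what actually determines the right values.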
2. Configuration Testing Matrix: Test each workload with the following configuration variations:
| Test ID | Configuration Focus | Description |
|---|---|---|
| TS-1 | Baseline | Direct equivalent to current EMR on EC2 resources |
| TS-2 | Pre-Initialization (optional) | Enable pre-initialized capacity for faster startup |
| TS-3 | Resource Scaling | Increase worker resources by 20% to evaluate performance impact |
| TS-4 | Driver Optimization | Increase driver resources to evaluate coordination improvements |
| TS-5 | Shuffle Optimization | Enable shuffle-optimized storage for shuffle-heavy workloads |
3. Storage Performance Testing:
1. Small file test: Process x files smaller than 1 MB
2. Large file test: Process y files larger than 1 GB
3. Mixed workload: Realistic data pattern
4. Capture S3 request count and latency
Note: Thresholds & Limits - Define job-level caps and a mandatory application maximum capacity. The Platform Team blocks workloads without an approved application maximum capacity.
Phase 3: Cost-Performance Analysis
EMR Serverless Cost Estimation (https://aws.amazon.com/emr/pricing/): You are charged for the aggregate vCPU, memory, and storage resources used from the time workers are ready to run your workload until the time they stop, rounded up to the nearest second with a 1-minute minimum.
- vCPU-seconds consumed:
- GB-seconds of memory consumed:
- Standard Storage:
- Shuffle Optimized Storage (if applicable):
- S3 request costs:
Total price = $xx
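The pricing formula above can be sketched directly. The unit rates below are placeholders, not current AWS prices; substitute the per-region rates from the EMR Serverless pricing page, and note that the 1-minute billing minimum and per-second rounding are not modeled here.

```python
# Placeholder unit rates (USD) -- look up the real rates for your region.
VCPU_PER_HOUR = 0.052624         # assumed rate per vCPU-hour
GB_PER_HOUR = 0.0057785          # assumed rate per GB of memory per hour
STORAGE_GB_PER_HOUR = 0.000111   # assumed rate for billable storage per GB-hour

def job_cost(vcpu_seconds: float, gb_seconds: float,
             billable_storage_gb_seconds: float = 0.0) -> float:
    """Aggregate vCPU-, memory-, and storage-seconds priced via hourly rates."""
    per_hour = 1 / 3600
    return (vcpu_seconds * per_hour * VCPU_PER_HOUR
            + gb_seconds * per_hour * GB_PER_HOUR
            + billable_storage_gb_seconds * per_hour * STORAGE_GB_PER_HOUR)

# A 30-minute job averaging 16 vCPUs and 64 GB of memory:
print(round(job_cost(16 * 1800, 64 * 1800), 2))
```

Feeding in the vCPU-seconds and GB-seconds reported for real benchmark runs gives the per-run cost used in the comparison table of the benchmark report.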
There is also an EMR Serverless Cost Estimator tool for Apache Spark applications: https://github.com/aws-samples/aws-emr-utilities/tree/main/utilities/emr-serverless-estimator
Benchmarking Success Criteria
All migrations must meet the following criteria before production approval:
- Performance Validation:
  - Job duration within 10% of baseline (or better)
  - Resource utilization efficiency improved by ≥10%
  - All SLAs consistently met across 10+ test executions
- Cost Optimization:
  - Cost comparison analysis completed and favorable
  - Total cost per execution documented and predictable
  - Resource limits properly configured to prevent runaway costs
- Operational Readiness:
  - CloudWatch dashboards configured for monitoring [may depend on the Platform Team for PROD]
  - Alerts established for performance/cost anomalies
  - Runbooks created for common failure scenarios
  - Max application capacity limits set [approved by the Platform Team for PROD]
- Technical Validation:
  - Workload successfully operates on the S3 storage model
  - No performance degradation under concurrent execution
  - All downstream dependencies continue to function correctly
  - Graviton performance validated (if applicable)
EMR Serverless Migration Benchmark Report
Application Information
- Name: [Application Name]
- Owner: [Team/Individual]
- Environment: [Dev/Test/Prod]
- Migration Timeline: [Planned Timeline]
Baseline Performance (EMR on EC2)
- Configuration: [EMR Version, Instance Types, Count]
- Average Runtime: [Minutes]
- Resource Utilization: [vCPU-hrs, Memory-GB-hrs]
- Cost Per Run: [$X.XX]
EMR Serverless Performance
- Configuration: [Driver Size, Worker Size, Max Capacity]
- Average Runtime: [Minutes]
- Resource Utilization: [vCPU-seconds, GB-seconds]
- Cost Per Run: [$X.XX]
Performance Comparison
- Runtime Delta: [+/- X%]
- Cost Delta: [+/- X%]
- Resource Efficiency: [+/- X%]
Configuration Optimization
- Driver Configuration: [Settings]
- Executor Configuration: [Settings]
- Pre-Initialization: [Yes/No, Capacity]
- Shuffle Optimization: [Settings]
Cost Analysis
- Break-Even Point: [X jobs/day]
- Projected Annual Savings: [$X]
- Cost Control Measures: [Limits, Alerts]
Validation Results
- SLA Compliance: [Pass/Fail]
- Concurrent Execution: [Pass/Fail]
- S3 Performance: [Pass/Fail]
- Downstream Systems: [Pass/Fail]
Approval Process
- Platform Team to set up monitoring dashboards and review alert configurations for the PROD environment
- Platform Team Sign-off: [Name, Date]
- Production Go-Live: [Date]
Critical Governance Control: No migration may proceed to production without formal Platform Team sign-off. This control ensures consistent implementation of organizational standards and prevents uncontrolled proliferation of EMR Serverless applications.
6. Security & Compliance
EMR Serverless security controls must be implemented consistently across all applications to maintain the organizational security posture. Best practices are outlined below:
Identity & Access Management
Begin with a robust identity and access management structure that implements application-specific roles following least-privilege principles. Create distinct roles for development, testing, and production environments, and implement permission boundaries to prevent privilege escalation.
When integrating with AWS Lake Formation, you will need to assess and implement column-level, row-level, and cell-level security policies. This integration provides centralized data access auditing and compliance reporting capabilities, crucial for maintaining governance standards across your analytics environment.
Data Protection
Data protection in EMR Serverless requires a comprehensive approach to encryption. Implement mandatory KMS encryption for all S3 data and use S3 Bucket Keys to optimize encryption costs. Ensure your temporary storage and shuffle operations use encrypted storage, and protect your logs and metrics with organization-managed KMS keys. For secrets management, integrate with AWS Secrets Manager for database credentials, avoiding the storage of credentials in job parameters or environment variables.
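As one way to enforce the mandatory KMS encryption described above, an S3 bucket policy can deny any upload that does not specify SSE-KMS. The statement below is a common pattern; the bucket name is a placeholder.

```json
{
  "Sid": "DenyUnencryptedUploads",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::example-analytics-bucket/*",
  "Condition": {
    "StringNotEquals": {
      "s3:x-amz-server-side-encryption": "aws:kms"
    }
  }
}
```

Pairing a policy like this with S3 default bucket encryption keeps compliant writers working transparently while rejecting explicitly unencrypted requests.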
Network Security
Your network security configuration should follow established organizational standards. Deploy EMR Serverless applications within approved VPC configurations using required S3 endpoints to prevent public internet data transfer. Implement standardized security groups and maintain comprehensive network traffic monitoring and logging.
Governance Controls
- Mandatory Tagging
- Audit & Logging
- Compliance Validation
7. Release, Cutover and Rollback
Structured Migration Execution:
A disciplined approach to production migration ensures business continuity and minimizes disruption. Begin with thorough staging validation, conducting end-to-end testing with production-like data volumes. Verify your security and compliance controls, and test all integration points with dependent systems.
1. Staging Validation:
   - Complete end-to-end testing in the staging environment
   - Performance validation against production-like data volumes
   - Integration testing with all dependent systems
   - Security and compliance validation
2. Cutover Execution:
   - Scheduled during an approved change window
   - Enhanced monitoring during the transition period
   - Functional validation of all critical paths
   - Performance metrics comparison to baseline
   - Data integrity verification
   - Downstream system impact assessment
Detailed Rollback Plan
1. Rollback Triggers (values to be defined by app teams):
   - Performance Degradation: > x% increase in job duration
   - Cost Overrun: > y% increase in execution cost
   - Reliability Issues: > z% increase in failure rate
   - Data Quality: Any detected data integrity issues
   - Downstream Impact: Disruption to dependent systems
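The triggers above lend themselves to an automated check after each production run. A sketch, with placeholder thresholds standing in for the x/y/z values each application team defines:

```python
# Placeholder thresholds -- each app team substitutes its own x/y/z values.
THRESHOLDS = {"duration_pct": 20, "cost_pct": 15, "failure_pct": 5}

def rollback_triggers(baseline: dict, current: dict) -> list:
    """Return the names of any rollback triggers that have fired."""
    fired = []
    if current["duration"] > baseline["duration"] * (1 + THRESHOLDS["duration_pct"] / 100):
        fired.append("performance_degradation")
    if current["cost"] > baseline["cost"] * (1 + THRESHOLDS["cost_pct"] / 100):
        fired.append("cost_overrun")
    if current["failure_rate"] > baseline["failure_rate"] + THRESHOLDS["failure_pct"]:
        fired.append("reliability")
    # Data-quality and downstream issues are boolean flags raised elsewhere.
    if current.get("data_quality_issue") or current.get("downstream_impact"):
        fired.append("data_or_downstream")
    return fired
```

An empty list means the run stays on EMR Serverless; any non-empty result feeds the rollback decision step below.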
2. Rollback Procedure Template:
1. Initiate Rollback Decision
- Confirm rollback trigger has been met
- Obtain approval from designated authority
- Notify all stakeholders of rollback decision
2. Stop New Job Submissions
- Pause orchestration systems (Airflow/Step Functions)
- Hold incoming job requests
- Allow in-progress jobs to complete if possible
3. Reactivate EMR on EC2 Infrastructure
- Verify EC2 clusters are available or launch if needed
- Confirm all dependencies are functional
- Validate security configurations
4. Redirect Workflow
- Update orchestration configuration to use EMR on EC2
- Verify job submission is properly routed
- Confirm first job executes successfully
5. Verification Steps
- Validate job execution metrics
- Confirm data integrity
- Verify downstream system functionality
6. Post-Rollback Analysis
- Document root cause of rollback trigger
- Develop remediation plan
- Schedule re-attempt with fixes implemented
Acceptance Criteria
1. Stability Period:
   - 7-14 days of continuous operation
   - All SLAs consistently met
   - No critical or high-severity incidents
   - Cost metrics within projected ranges
2. Formal Acceptance:
   - Stakeholder sign-off on successful migration
   - Documentation of lessons learned
   - Transition to business-as-usual operations
   - Closure of the migration project
8. Sunsetting EMR on EC2 Clusters
The retirement of your EMR on EC2 infrastructure requires careful planning to ensure a complete transition while preserving historical data. During the first three weeks after your production migration to EMR Serverless, maintain your EMR on EC2 configurations as a safety net.
Create a comprehensive archive of your EMR on EC2 environment by documenting custom AMIs, storing bootstrap scripts, and backing up critical configuration files. Transfer your historical Spark logs and job histories to a designated S3 location for future reference. This preservation strategy supports both audit requirements and provides valuable historical performance data.
Begin your infrastructure decommissioning 30 days after confirming stable EMR Serverless operations. Remove unused security groups, IAM roles, and other associated resources systematically. Document each removal step to maintain a clear audit trail and prevent unintended service disruptions.
9. Service Quota & Noisy Neighbor Governance
Resource Management Framework: EMR Serverless introduces a shared resource model in multi-tenant accounts that requires structured governance to prevent resource contention and ensure fair allocation across teams. The quota management and noisy-neighbor prevention strategies below address this critical challenge in shared serverless environments.
Organizational Quota Structure [handled by Platform Team]
Account-Level Controls:
- Maximum Concurrent vCPUs: Set at the AWS account level (default: 16 vCPUs)
- Service Quotas: Managed through AWS Service Quotas with quarterly reviews
Team-Level Allocations:
- Baseline Quota: Guaranteed minimum resources for each app team
- Burst Allocation: Additional resources available on request
- Reservation System: Pre-allocation for critical processing windows
Application-Level Limits:
- Maximum Capacity: Required for all applications (no exceptions)
- Default Worker Configuration: Standardized sizes based on workload type
Implementation Controls:
- The Platform Team builds a custom solution that tracks the concurrent vCPUs requested by each application team
- Requests that exceed the pre-configured values are rejected; only requests within the defined thresholds are accepted
- For example, the custom solution can use AWS Lambda as a validation engine with Amazon DynamoDB for tracking resource-level metrics
Monitoring & Alerting Framework
1. Standard Monitoring Dashboards:
   - Resource Utilization: vCPU/memory usage by application team
   - Performance Metrics: Job duration, success rates, and SLA compliance
   - Cost Analytics: Cost per job, cost trends, and optimization opportunities
2. Alerting Hierarchy (Noisy Neighbor Detection):
   - Warning Thresholds: 80% quota utilization triggers a notification
   - Critical Thresholds: ≥90% quota utilization triggers escalation
Real-World Implementation Pattern:
1. Dual Enforcement Approach:
   - Proactive validation before application creation (for orchestrated workflows from Airflow)
   - Reactive monitoring after application creation (for interactive sessions using EMR Studio)
2. Resource Tracking System:
   - Team-level quota allocation and usage tracking
   - Application-level resource consumption monitoring
   - Centralized configuration management
3. Automated Enforcement:
   - Event-driven architecture monitoring EMR Serverless API calls
   - Automatic validation against team quotas
   - Notification system for quota violations
   - Enforcement actions for non-compliant applications
This pattern demonstrates how organizations can effectively manage shared EMR Serverless resources while maintaining team autonomy and preventing resource contention.
10. Post-Migration Optimization
Continuous Improvement Framework: Migration to EMR Serverless is not the end goal but the beginning of an optimization journey:
Short-Term Optimization (30 Days)
Within the first 30 days after migration, conduct a detailed performance analysis of your job runtime patterns. Identify and address performance bottlenecks, optimize resource configurations, and fine-tune your Spark parameters for maximum efficiency.
Focus your cost optimization efforts on analyzing actual versus projected costs. Review your resource allocation patterns and adjust pre-initialization settings based on real-world usage data. Monitor your quota utilization closely, adjusting team quotas based on observed consumption patterns and addressing any resource contention issues that emerge.
Continuous Improvement Process
Implement a continuous improvement process that includes regular assessment of Graviton performance opportunities. Compare x86 versus Graviton performance metrics and develop migration plans for workloads that could benefit from Graviton's price-performance advantages. Pay particular attention to shuffle optimization by analyzing patterns, tuning partition counts, and implementing data skew mitigation strategies.
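The x86-versus-Graviton comparison reduces to cost per run at each architecture's runtime and rate. The runtimes and unit rates below are placeholder benchmark values, not measured or published figures:

```python
def cost_per_run(runtime_hours: float, vcpus: int, rate_per_vcpu_hour: float) -> float:
    """Compute cost of one run (ignores the memory component for brevity)."""
    return runtime_hours * vcpus * rate_per_vcpu_hour

# Placeholder values: same job on x86 vs. an assumed ~20% cheaper,
# slightly faster Graviton configuration.
x86 = cost_per_run(0.75, 32, 0.052624)
graviton = cost_per_run(0.70, 32, 0.042094)
savings_pct = (x86 - graviton) / x86 * 100
print(f"Savings per run: {savings_pct:.1f}%")
```

Run the same comparison with your own benchmarked runtimes and current regional rates before committing a workload to a Graviton migration plan.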
For shared tenant environments, regularly review your organizational resource allocation strategy. Adjust team quotas based on evolving business priorities and develop predictive models for future resource needs. This proactive approach helps maintain optimal performance while controlling costs.
Conclusion
The transition from EMR on EC2 to EMR Serverless represents a significant opportunity to modernize your analytics infrastructure while reducing operational overhead. By following this governance framework, you can achieve a structured, secure, and efficient migration that sets the foundation for continued optimization and growth. Success in this journey requires careful planning, thorough testing, and ongoing optimization. Remember that AWS Support offers extensive resources to help you navigate this transition.
Call to Action
Take these next steps to begin your EMR Serverless migration journey:
- Evaluate your current EMR on EC2 workloads using the decision framework provided in this article
- Begin benchmarking your priority workloads using the methodology outlined above
- Review the EMR Serverless documentation for detailed technical specifications
- Reach out to AWS Support if you have any questions
Appendix
- EMR Serverless Shuffle Optimized Disks: https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-shuffle-optimized-disks.html
- EMR Serverless Pre-initialized capacity: https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/pre-init-capacity.html
- EMR Serverless Job Resiliency: https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-resiliency.html
About the Author
Ram Achanta is a Senior Technical Account Manager within AWS Enterprise Support, where he specializes in supporting Global Financial Services (GFS) customers. With over eight years of industry experience, Ram helps organizations design and implement highly scalable, resilient, and secure solutions on AWS. He provides strategic guidance on AWS service implementation and works closely with customers on their operational challenges.
Sakshi Arya is a Technical Account Manager in AWS Enterprise Support, bringing nearly five years of comprehensive industry experience. She built her foundation in Analytics domain specializing in Amazon EMR service, handling critical escalations and engagements for Strategic Industries customers. She now supports Global Financial Services (GFS) customers, focusing on optimizing and scaling their data infrastructure while ensuring operational excellence and architectural best practices.