I launched the US East (N. Virginia) stack from the AWS Glue user guide (https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-history.html) to work through its YAML template and get a better understanding of creating resources with AWS CloudFormation.
I have been stuck on a CREATE_FAILED state for the WaitCondition resource.
I read that using a WaitConditionHandle to gate an EC2 instance is not a best practice and that a CreationPolicy is preferred. I removed the wait handle and replaced the wait condition with a CreationPolicy on the instance, but even after reducing the count to the 4 resources the stack was expected to create, it still returned a failed result and rolled back all the resources.
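For reference, the change I attempted looked roughly like this (a minimal sketch; the Count and Timeout values shown are illustrative, not necessarily the exact ones I tried):

HistoryServerInstance:
  Type: AWS::EC2::Instance
  CreationPolicy:
    ResourceSignal:
      Count: 1        # number of cfn-signal calls CloudFormation waits for (illustrative)
      Timeout: PT20M  # ISO 8601 duration, equivalent to the wait condition's 1200 seconds
  Properties:
    # ... properties unchanged from the template below ...

My understanding is that with a CreationPolicy in place, every cfn-signal call (the one in UserData and the one in the spark_hs_test config set) has to target the instance resource, e.g. /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource HistoryServerInstance --region ${AWS::Region}, rather than a wait handle URL.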
Is there something I'm doing wrong?
Below is the template:
Parameters:
InstanceType:
Type: String
Default: t3.small
AllowedValues:
- t3.micro
- t3.small
- t3.medium
- t3.large
- t3.xlarge
- t3.2xlarge
- m5.large
- m5.xlarge
- m5.2xlarge
- m5.4xlarge
- m5.8xlarge
- m5.12xlarge
- m5.16xlarge
- m5.24xlarge
- r5.large
- r5.xlarge
- r5.2xlarge
- r5.4xlarge
- r5.8xlarge
- r5.12xlarge
- r5.16xlarge
- r5.24xlarge
Description: Instance type for the EC2 instance that hosts the Spark history server.
Enter one of [t3.micro/small/medium/large/xlarge/2xlarge,
m5.large/xlarge/2xlarge/4xlarge/8xlarge/12xlarge/16xlarge/24xlarge,
r5.large/xlarge/2xlarge/4xlarge/8xlarge/12xlarge/16xlarge/24xlarge].
Default is t3.small.
LatestAmiId:
Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
Description: Latest AMI ID of Amazon Linux 2 for Spark history server instance.
You can use the default value.
Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
VpcId:
Type: AWS::EC2::VPC::Id
Description: 'VPC ID for the Spark history server instance. You can use a VPC in
your account. Warning: Using the default VPC with a default NACL is not
recommended.'
Default: ''
SubnetId:
Type: AWS::EC2::Subnet::Id
Description: Subnet ID for the Spark history server instance. You can use any
subnet in your VPC. You need network reachability from your client to the
subnet. If you want to access the server over the Internet, you need a public
subnet with an Internet gateway in its route table.
Default: ''
IpAddressRange:
Type: String
Description: 'IP address range that can be used to view the Spark UI. You should
use a custom value if you want to restrict access to a specific IP address
range. Warning: Using the IP address range 0.0.0.0/0 would make the Spark UI
publicly accessible.'
MinLength: 9
MaxLength: 18
HistoryServerPort:
Type: Number
Description: History Server Port for the Spark UI. You can use the default value.
Default: 18080
MinValue: 1150
MaxValue: 65535
EventLogDir:
Type: String
Description: 'Event Log Directory where Spark event logs are stored from the
Glue job or dev endpoints. You must use s3a:// for the event logs path
scheme (example: s3a://path_to_eventlog).'
Default: s3a://path_to_eventlog
SparkPackageLocation:
Type: String
Description: You can use the default value.
Default: https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
KeystorePath:
Type: String
Description: SSL/TLS keystore path for HTTPS. If you want to use a custom
keystore file, you can specify the S3 path s3://path_to_your_keystore_file
here. If you leave this parameter empty, a keystore based on a self-signed
certificate is used.
KeystorePassword:
Type: String
NoEcho: true
Description: SSL/TLS keystore password for HTTPS. A valid password can contain 6
to 30 characters.
MinLength: 6
MaxLength: 30
Metadata:
AWS::CloudFormation::Interface:
ParameterGroups:
- Label:
default: Spark UI Configuration
Parameters:
- IpAddressRange
- HistoryServerPort
- EventLogDir
- SparkPackageLocation
- KeystorePath
- KeystorePassword
- Label:
default: EC2 Instance Configuration
Parameters:
- InstanceType
- LatestAmiId
- VpcId
- SubnetId
Mappings:
MemoryBasedOnInstanceType:
t3.micro:
SparkDaemonMemory: 512m
t3.small:
SparkDaemonMemory: 1g
t3.medium:
SparkDaemonMemory: 3g
t3.large:
SparkDaemonMemory: 6g
t3.xlarge:
SparkDaemonMemory: 12g
t3.2xlarge:
SparkDaemonMemory: 28g
m5.large:
SparkDaemonMemory: 6g
m5.xlarge:
SparkDaemonMemory: 12g
m5.2xlarge:
SparkDaemonMemory: 28g
m5.4xlarge:
SparkDaemonMemory: 28g
m5.8xlarge:
SparkDaemonMemory: 28g
m5.12xlarge:
SparkDaemonMemory: 28g
m5.16xlarge:
SparkDaemonMemory: 28g
m5.24xlarge:
SparkDaemonMemory: 28g
r5.large:
SparkDaemonMemory: 12g
r5.xlarge:
SparkDaemonMemory: 28g
r5.2xlarge:
SparkDaemonMemory: 28g
r5.4xlarge:
SparkDaemonMemory: 28g
r5.8xlarge:
SparkDaemonMemory: 28g
r5.12xlarge:
SparkDaemonMemory: 28g
r5.16xlarge:
SparkDaemonMemory: 28g
r5.24xlarge:
SparkDaemonMemory: 28g
Resources:
Imds2LaunchTemplate:
Type: AWS::EC2::LaunchTemplate
Properties:
LaunchTemplateData:
MetadataOptions:
HttpEndpoint: enabled
HttpTokens: required
HistoryServerInstance:
Type: AWS::EC2::Instance
Properties:
LaunchTemplate:
LaunchTemplateId: !Ref Imds2LaunchTemplate
Version: !GetAtt Imds2LaunchTemplate.LatestVersionNumber
ImageId: !Ref LatestAmiId
InstanceType: !Ref InstanceType
SubnetId: !Ref SubnetId
SecurityGroupIds:
- !Ref InstanceSecurityGroup
IamInstanceProfile: !Ref HistoryServerInstanceProfile
UserData: !Base64
Fn::Sub: |
#!/bin/bash -xe
yum update -y aws-cfn-bootstrap
echo "CA_OVERRIDE=/etc/pki/tls/certs/ca-bundle.crt" >> /etc/environment
export CA_OVERRIDE=/etc/pki/tls/certs/ca-bundle.crt
rpm -Uvh https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
pip3 install requests
/opt/aws/bin/cfn-init -v -s ${AWS::StackName} -r HistoryServerInstance --region ${AWS::Region}
/opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource HistoryServerInstance --region ${AWS::Region}
Metadata:
AWS::CloudFormation::Init:
configSets:
default:
- cloudwatch_agent_configure
- cloudwatch_agent_restart
- spark_download
- spark_init
- spark_configure
- spark_hs_start
- spark_hs_test
cloudwatch_agent_configure:
files:
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json:
content: !Sub |
{
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/cfn-init.log",
"log_group_name": "/aws-glue/sparkui_cfn/cfn-init.log"
},
{
"file_path": "/opt/spark/logs/spark-",
"log_group_name": "/aws-glue/sparkui_cfn/spark_history_server.log"
}
]
}
}
}
}
cloudwatch_agent_restart:
commands:
01_stop_service:
command: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a
stop
02_start_service:
command: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a
fetch-config -m ec2 -c
file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
-s
spark_download:
packages:
yum:
java-1.8.0-openjdk: []
maven: []
python3: []
python3-pip: []
sources:
/opt: !Ref SparkPackageLocation
commands:
create-symlink:
command: ln -s /opt/spark-* /opt/spark
export:
command: !Sub |
echo "export JAVA_HOME=/usr/lib/jvm/jre" | sudo tee -a /etc/profile.d/jdk.sh
echo "export SPARK_HOME=/opt/spark" | sudo tee -a /etc/profile.d/spark.sh
export JAVA_HOME=/usr/lib/jvm/jre
export SPARK_HOME=/opt/spark
download-pom-xml:
command: curl -o /tmp/pom.xml
https://aws-glue-sparkui-prod-us-east-1.s3.amazonaws.com/public/mvn/glue-4_0/pom.xml
download-setup-py:
command: curl -o /tmp/setup.py
https://aws-glue-sparkui-prod-us-east-1.s3.amazonaws.com/public/misc/glue-4_0/setup.py
download-systemd-file:
command: curl -o /usr/lib/systemd/system/spark-history-server.service
https://aws-glue-sparkui-prod-us-east-1.s3.amazonaws.com/public/misc/spark-history-server.service
spark_init:
commands:
download-mvn-dependencies:
command: cd /tmp; mvn dependency:copy-dependencies
-DoutputDirectory=/opt/spark/jars/
install-boto:
command: pip3 install boto --user; pip3 install boto3 --user
files:
/opt/spark/conf/spark-defaults.conf:
content: !Sub |
spark.eventLog.enabled true
spark.history.fs.logDirectory ${EventLogDir}
spark.history.ui.port 0
spark.ssl.historyServer.enabled true
spark.ssl.historyServer.port ${HistoryServerPort}
spark.ssl.historyServer.keyStorePassword ${KeystorePassword}
group: ec2-user
mode: '000644'
owner: ec2-user
/opt/spark/conf/spark-env.sh:
content: !Sub
- |
export SPARK_DAEMON_MEMORY=${SparkDaemonMemoryConfig}
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem"
- SparkDaemonMemoryConfig: !FindInMap
- MemoryBasedOnInstanceType
- !Ref InstanceType
- SparkDaemonMemory
group: ec2-user
mode: '000644'
owner: ec2-user
spark_configure:
commands:
create-symlink:
command: ln -s /usr/lib/systemd/system/spark-history-server.service
/etc/systemd/system/multi-user.target.wants/
enable-spark-hs:
command: systemctl enable spark-history-server
configure-keystore:
command: !Sub |
python3 /tmp/setup.py --keystore "${KeystorePath}" --keystorepw "${KeystorePassword}" > /tmp/setup_py.log 2>&1
spark_hs_start:
commands:
start_spark_hs_server:
command: systemctl start spark-history-server
spark_hs_test:
commands:
check-spark-hs-server:
command: !Sub |
curl --retry 60 --retry-delay 10 --retry-max-time 600 --retry-connrefused https://localhost:${HistoryServerPort} --insecure;
/opt/aws/bin/cfn-signal -e $? "${WaitHandle}"
WaitHandle:
Type: AWS::CloudFormation::WaitConditionHandle
WaitCondition:
Type: AWS::CloudFormation::WaitCondition
DependsOn: HistoryServerInstance
Properties:
Handle: !Ref WaitHandle
Timeout: 1200
InstanceSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Enable HTTPS access
VpcId: !Ref VpcId
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: !Ref HistoryServerPort
ToPort: !Ref HistoryServerPort
CidrIp: !Ref IpAddressRange
HistoryServerRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service:
- ec2.amazonaws.com
Action:
- sts:AssumeRole
Path: /
Policies:
- PolicyName: root
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- kms:Decrypt
Resource: '*'
ManagedPolicyArns:
- arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
- arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
HistoryServerInstanceProfile:
Type: AWS::IAM::InstanceProfile
Properties:
Path: /
Roles:
- !Ref HistoryServerRole
Outputs:
SparkUiPublicUrl:
Description: The Public URL of Spark UI
Value: !Join
- ''
- - https://
- !GetAtt HistoryServerInstance.PublicDnsName
- ':'
- !Ref HistoryServerPort
SparkUiPrivateUrl:
Description: The Private URL of Spark UI
Value: !Join
- ''
- - https://
- !GetAtt HistoryServerInstance.PrivateDnsName
- ':'
- !Ref HistoryServerPort
CloudWatchLogsCfnInit:
Description: CloudWatch Logs Console URL for cfn-init.log in History Server Instance
Value: !Join
- ''
- - https://console.aws.amazon.com/cloudwatch/home?region=
- !Ref AWS::Region
- '#logEventViewer:group=/aws-glue/sparkui_cfn/cfn-init.log;stream='
- !Ref HistoryServerInstance
CloudWatchLogsSparkHistoryServer:
Description: CloudWatch Logs Console URL for spark history server logs in
History Server Instance
Value: !Join
- ''
- - https://console.aws.amazon.com/cloudwatch/home?region=
- !Ref AWS::Region
- '#logEventViewer:group=/aws-glue/sparkui_cfn/spark_history_server.log;stream='
- !Ref HistoryServerInstance
Edit: I made the modifications as suggested: added a CreationPolicy to the EC2 instance and removed the WaitHandle and WaitCondition. The instance never got created and the stack rolled back. Is the template given in the documentation guide outdated?
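In case it helps with diagnosis, this is how I can re-create the stack with rollback disabled so whatever does get created survives for inspection (a sketch, assuming the AWS CLI is configured and the template is saved locally as template.yaml; the stack name and all parameter values are placeholders):

aws cloudformation create-stack \
  --stack-name spark-history-server \
  --template-body file://template.yaml \
  --capabilities CAPABILITY_IAM \
  --disable-rollback \
  --parameters ParameterKey=VpcId,ParameterValue=vpc-xxxxxxxx \
               ParameterKey=SubnetId,ParameterValue=subnet-xxxxxxxx \
               ParameterKey=IpAddressRange,ParameterValue=203.0.113.0/24 \
               ParameterKey=KeystorePassword,ParameterValue=placeholder-pw

With rollback disabled, the stack events keep the exact failure reason, and if the instance does come up, /var/log/cfn-init.log on it (also shipped to CloudWatch Logs by this template) should show which config set step failed.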