Need help with adding SSH port, missing `sshd.service` and rebooting EC2 instance.


What I want to do
I'm trying to create multiple EC2 instances using Terraform and then add an additional SSH port to all of them, because port 22 traffic might be blocked on some networks. I'm able to add the extra port to /etc/ssh/sshd_config on the instances using the user_data attribute of the Terraform EC2 resource, but activating that port is proving problematic.

Can't apply SSH changes
I need to use Ubuntu instances, and I noticed that they don't have an sshd.service unit, even though they accept SSH connections and tab-completion offers ssh.service as an option. If I connect to an instance through the AWS Console and run sudo systemctl restart sshd.service, it fails with an error saying that the service isn't available.

Previously working method
I found that the port changes get applied when I reboot the instance. So I installed AWS CLI v2 and created a null_resource that runs aws ec2 reboot-instances from the Terraform code. The following snippet worked and gave the expected outcome until yesterday.

# EC2 INSTANCE MODULE
module "ec2_module" {
	source     = "./Modules/EC2"
	depends_on = [ aws_key_pair.ec2_ssh_key, module.sg_module, tls_private_key.ansible_ssh_key ]
	for_each = var.ec2_instances
	instance_name  = "${var.project_prefix}-${each.value.name}"
	instance_type  = each.value.type
	instance_sg    = [ module.sg_module.security_group_id ]
	root_vol_size  = each.value.root_size
	ssh_public_key = aws_key_pair.ec2_ssh_key.key_name
	
	user_data = <<-EOF
	            #!/usr/bin/env bash
	            echo -e "Port 22\nPort ${var.external_access_ports["SSH_Alt"]}" | sudo tee -a /etc/ssh/sshd_config
	            echo "${tls_private_key.ansible_ssh_key.public_key_openssh}" | sudo tee -a /home/${var.ec2_username}/.ssh/authorized_keys
	            EOF
}

# Reboot EC2 instances once after creation
resource "null_resource" "reboot_ec2_instances" {
	for_each = module.ec2_module
	triggers = {
		instance_id = each.value.instance_id
		# instance_state = each.value.instance_state
	}
	provisioner "local-exec" {
		# aws ec2 wait instance-running --instance-ids ${each.value.instance_id} --region ${var.infra_region}
		command = "aws ec2 reboot-instances --instance-ids ${each.value.instance_id} --region ${var.infra_region}"
	}
	depends_on = [ module.ec2_module ]
}

New/Current problems
The snippet above would successfully create the instances, add the SSH port, and then reboot each instance, which led to the new port becoming active. But for some reason it has stopped working since yesterday. Moreover, the null_resource seems to make the instances take much longer than normal to become reachable over SSH: connecting through the AWS Console is also unavailable for several minutes, even though the status checks have already passed.

Other approaches that didn't work:

  • Changing the trigger to wait for instance state.
  • Running aws ec2 wait instance-running ... before reboot command (commented in snippet).
  • Any kind of rebooting command like systemctl reboot, shutdown -r, etc. in user_data doesn't apply the change.
  • Tried a cloud-init YAML on a suggestion from Amazon Q, but it doesn't even seem to add the port.
  • Same behaviour between Snap and direct download versions.

I just want to apply the SSH port changes. Rebooting seemed to do the trick, but now my method of rebooting the instances doesn't seem to work. What is a reliable way to reboot or otherwise apply the port changes?

2 Answers
Accepted Answer

Hello.

What version of Ubuntu are you using?
On Ubuntu 24-based releases, the SSH daemon is started through a systemd socket unit called ssh.socket.
Therefore, rewriting the settings in "/etc/ssh/sshd_config" alone does not change the port that is being listened on.
https://www.reddit.com/r/Ubuntu/comments/1gybsi7/changing_ssh_port_does_not_work_ubuntu_2410/

You need to either disable ssh.socket as follows, or rewrite the settings in "/usr/lib/systemd/system/ssh.socket" and perform a daemon reload to overwrite the ssh.socket settings.

  - systemctl stop ssh.socket
  - systemctl disable ssh.socket
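
As an alternative to editing the packaged unit file under /usr/lib (which a package upgrade can replace), systemd also supports drop-in overrides. The following is only a minimal sketch of that idea: the helper name write_ssh_socket_override and the file name override.conf are made up for illustration, and it assumes no other drop-in resets the addresses afterwards. It has not been tested on the AMIs mentioned in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in that replaces the addresses
# ssh.socket listens on. Arguments: drop-in directory, then one or more ports.
write_ssh_socket_override() {
  local dir="$1"
  shift
  mkdir -p "$dir"
  {
    echo "[Socket]"
    # An empty ListenStream= clears the addresses inherited from the packaged unit.
    echo "ListenStream="
    local port
    for port in "$@"; do
      echo "ListenStream=0.0.0.0:${port}"
      echo "ListenStream=[::]:${port}"
    done
  } > "${dir}/override.conf"
}

# Example use on the instance (as root), keeping port 22 and adding 10022:
#   write_ssh_socket_override /etc/systemd/system/ssh.socket.d 22 10022
#   systemctl daemon-reload
#   systemctl restart ssh.socket
```

Keeping 22 in the list means an existing connection path stays available if the security group has not yet been opened for the new port.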

Incidentally, I was able to confirm that the following sample Terraform code works in my AWS account using an Ubuntu 24.04 AMI in the Tokyo region.

locals {
  cloud_init_config = <<-YAML
  #cloud-config

  runcmd:
    - sed -i 's/ListenStream=0.0.0.0:22/ListenStream=0.0.0.0:10022/' /usr/lib/systemd/system/ssh.socket
    # Terraform heredocs keep backslashes literally, so each bracket is escaped with a single backslash.
    - sed -i 's/ListenStream=\[::\]:22/ListenStream=[::]:10022/' /usr/lib/systemd/system/ssh.socket
    - systemctl daemon-reexec
    - systemctl daemon-reload
    - systemctl restart ssh.socket
    - sed -i 's/^#\?Port 22/Port 10022/' /etc/ssh/sshd_config
    - systemctl restart ssh.service
  YAML
}

resource "aws_instance" "example" {
  ami                    = "ami-0f8faa29480e7e6de"
  instance_type          = "t3.micro"
  key_name               = aws_key_pair.example.key_name
  vpc_security_group_ids = [aws_security_group.ec2_sg.id]

  user_data = base64encode(local.cloud_init_config)
}
EXPERT
answered a month ago
EXPERT
reviewed a month ago
  • Thank you for sharing your approach. I'm using AMIs ami-019715e0d74f695be and ami-05d2d839d4f73aafb. I tested and found a working method for my project. It seems that the following user_data is enough for what I'm trying to do:

    	user_data = <<-EOF
    	            #!/usr/bin/env bash
    	            echo -e "Port 22\nPort ${var.external_access_ports["SSH_Alt"]}" | sudo tee -a /etc/ssh/sshd_config
    	            echo "${tls_private_key.ansible_ssh_key.public_key_openssh}" | sudo tee -a /home/${var.ec2_username}/.ssh/authorized_keys
    	            systemctl daemon-reexec
    	            systemctl restart ssh.socket ssh.service
    	            EOF

    I didn't need to edit the systemd files for the services.

    However, I also discovered the following through my testing:

    • The systemctl commands need to be run in the following order: daemon-reexec, then restart ssh.socket, then restart ssh.service. This order works both from the Terraform code and when running the commands manually after connecting to the instance. They can be shortened as below:
      systemctl daemon-reexec
      systemctl restart ssh.socket ssh.service
    • sudo isn't necessary to run the systemctl commands, but it is required for adding the ports to the file using tee.
    • Consistent indentation is mandatory; otherwise the user_data commands don't execute. Code editors might cause problems here with their auto-indentation.
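
To check on the instance whether the restart sequence above actually took effect, one option is to look at the listening sockets. This is a rough sketch only: listening_on is a hypothetical helper name, and it assumes the usual `ss -tln` column layout where the fourth field is the local address and port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the `ss -tln` output on stdin shows a
# listener whose local address (4th column) ends in ":PORT".
listening_on() {
  local port="$1"
  awk -v p="$port" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Example use on the instance:
#   sudo sshd -t                      # syntax check of sshd_config; non-zero exit on errors
#   ss -tln | listening_on 10022 && echo "alternate port is active"
```

If the check fails right after boot, /var/log/cloud-init-output.log is usually the first place to see whether the user_data script ran at all.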

Based on your situation, here are some recommendations for reliably rebooting your EC2 instances and applying SSH configuration changes:

Understanding the Reboot Command

The AWS CLI reboot-instances command is asynchronous - it only queues a request to reboot the specified instances rather than waiting for the reboot to complete. This means your Terraform provisioner may be completing before the instance has actually rebooted and come back online, which could explain the timing issues you're experiencing.
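
Since the reboot request returns immediately, a local-exec step could poll the SSH port itself until it answers again. The sketch below is one way to do that under a few assumptions: wait_for_tcp is a hypothetical helper, it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command being available on the machine running Terraform, and the retry counts are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll HOST:PORT until a TCP connection succeeds.
# Arguments: host, port, number of attempts (default 30), delay in seconds (default 10).
wait_for_tcp() {
  local host="$1" port="$2" tries="${3:-30}" delay="${4:-10}"
  local i
  for ((i = 1; i <= tries; i++)); do
    # /dev/tcp/HOST/PORT makes bash open a TCP connection; timeout bounds each attempt.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example use after the reboot request (address and port are placeholders):
#   aws ec2 reboot-instances --instance-ids "$INSTANCE_ID" --region "$REGION"
#   sleep 30   # give the instance time to actually go down first
#   wait_for_tcp "$PUBLIC_IP" 10022 || echo "SSH port did not come back" >&2
```

Without the initial sleep, the first probe may succeed against the instance before it has started shutting down, so the result would say nothing about the reboot.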

Recommended Approaches

  1. Add Wait Conditions: After issuing the reboot command, you should wait for the instance to complete its reboot cycle. Consider adding a wait command that checks for the instance to return to a running state and pass status checks before proceeding.

  2. Handle the sshd Service Properly: For Ubuntu instances, the SSH service might be managed differently. Instead of trying to restart sshd.service, you could:
     • Use sudo systemctl restart ssh.service (without the 'd')
     • Or include a reboot command at the end of your user_data script using shutdown -r +1 to schedule a reboot after the configuration changes are written

  3. Improve Your user_data Script: Consider making your user_data script more robust by:
     • Checking if the port already exists in the config before adding it
     • Using sed or proper configuration management to modify sshd_config
     • Adding logging to track what's happening during instance initialization
     • Ensuring the SSH configuration is valid before attempting any service restart

  4. Alternative to null_resource: Instead of using a separate null_resource to reboot, you could handle the reboot within the user_data itself after making the configuration changes, though you mentioned this didn't work - this might be due to timing of when user_data executes.
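
The "check before adding" and "validate before restart" suggestions could look roughly like the following in a user_data script. Treat it as an illustrative sketch: ensure_port is a hypothetical helper, 10022 is just an example port, and the restart order in the usage comment is the one reported as working in the comment on the accepted answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "Port N" to an sshd_config-style file only if
# no active (uncommented) Port line for N exists yet. Keyword match is case-insensitive.
ensure_port() {
  local file="$1" port="$2"
  if ! grep -Eqi "^[[:space:]]*Port[[:space:]]+${port}[[:space:]]*$" "$file"; then
    echo "Port ${port}" >> "$file"
  fi
}

# Example use inside user_data (runs as root, so no sudo is needed there):
#   ensure_port /etc/ssh/sshd_config 22
#   ensure_port /etc/ssh/sshd_config 10022
#   sshd -t && systemctl daemon-reexec && systemctl restart ssh.socket ssh.service
```

Because the helper is idempotent, re-running the script does not pile up duplicate Port lines the way repeated tee -a calls would.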

The key issue is likely the asynchronous nature of the reboot operation combined with timing dependencies in your Terraform workflow. Adding proper wait conditions and status checks should help ensure the instances are fully operational before Terraform considers the provisioning complete.
Sources
reboot-instances — AWS CLI 2.27.41 Command Reference

answered a month ago
EXPERT
reviewed a month ago
