
Switching from EC2Launch v1 to EC2Launch v2 has broken my website.


I've automated updating an IIS/.NET website as follows:

  • A zip file of the latest version of the website is uploaded to S3.
  • An S3 notification triggers a Lambda.
  • The Lambda launches a "staging instance" from a "base AMI".
  • The staging instance's UserData does everything needed to make the website work: installs prerequisites, unzips the zip file, configures IIS...
  • The last thing the staging instance does is run the appropriate (version-dependent) EC2Launch command to get the instance ready to be imaged, then invoke another Lambda that...
  • images the staging instance, and
  • does a CloudFormation update, changing the parameter that specifies which image the load-balanced instances should launch from.
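The pure-logic parts of the two Lambdas in the list above can be sketched in Python. This is a minimal illustration under assumptions, not the poster's code. It assumes the standard S3 event notification shape and that the stack exposes the image ID as a stack parameter. The bucket name, object key, and the parameter names "WebAmiId" and "InstanceType" are hypothetical. The AWS calls themselves (launching the staging instance, "update_stack") are left out so only the decision logic is shown.

```python
# Sketch only: the pure-logic parts of the two Lambdas described above.
# No AWS calls are made; in a real handler the results would be passed to
# boto3 (for example cloudformation.update_stack(Parameters=...)).
from urllib.parse import unquote_plus


def zip_objects_from_s3_event(event):
    """Return (bucket, key) pairs for each .zip object in an S3 notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # S3 URL-encodes object keys in notifications (a space arrives as '+').
        key = unquote_plus(s3["object"]["key"])
        if key.lower().endswith(".zip"):
            pairs.append((s3["bucket"]["name"], key))
    return pairs


def stack_parameters(parameter_names, image_parameter, new_image_id):
    """Build an update_stack Parameters list that changes only the image."""
    params = []
    for name in parameter_names:
        if name == image_parameter:
            params.append({"ParameterKey": name, "ParameterValue": new_image_id})
        else:
            # Leave every other stack parameter at its current value.
            params.append({"ParameterKey": name, "UsePreviousValue": True})
    return params


if __name__ == "__main__":
    event = {"Records": [{"s3": {"bucket": {"name": "my-deploy-bucket"},
                                 "object": {"key": "site/website+v42.zip"}}}]}
    print(zip_objects_from_s3_event(event))
    print(stack_parameters(["WebAmiId", "InstanceType"], "WebAmiId", "ami-0123456789abcdef0"))
```

Passing "UsePreviousValue" for every other parameter means the stack update touches only the image, which matches the one-parameter change the pipeline describes.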

For v1, the "appropriate command" is "InitializeInstance.ps1 -Schedule". For v2, it's "EC2Launch.exe sysprep".
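The version-dependent last step can be made explicit with a small helper that returns the PowerShell line to append to the staging instance's UserData. This is a sketch that assumes EC2Launch is installed in its default location on the base AMI: "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts" for v1 and "C:\Program Files\Amazon\EC2Launch" for v2. Adjust the paths if your image differs.

```python
# Sketch: pick the "get ready to be imaged" command by EC2Launch major version.
# The paths below are the default install locations (an assumption; adjust if
# the base AMI installs EC2Launch elsewhere).
V1_INIT_SCRIPT = r"C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1"
V2_EXE = r"C:\Program Files\Amazon\EC2Launch\EC2Launch.exe"


def imaging_prep_command(ec2launch_major: int) -> str:
    """Return the PowerShell line that ends the staging instance's UserData."""
    if ec2launch_major == 1:
        # v1: schedule instance initialization to run on the next boot.
        return f"& '{V1_INIT_SCRIPT}' -Schedule"
    if ec2launch_major == 2:
        # v2: the EC2Launch executable drives sysprep itself.
        return f"& '{V2_EXE}' sysprep"
    raise ValueError(f"unsupported EC2Launch major version: {ec2launch_major}")


print(imaging_prep_command(1))
print(imaging_prep_command(2))
```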

The only thing that has changed on the "base image" is the version of EC2Launch. The only thing that has changed in the staging instance's UserData is how EC2Launch is used.

One more possibly relevant detail: I need to set the computer name, because the website needs to determine which environment it's running in, and therefore which database server to connect to. It does this based on the computer name, which contains the environment as a substring. With v1, I just used the PowerShell cmdlet "Rename-Computer". That wasn't working with v2, so now I'm putting the following in the config file before running sysprep:

      - task: setHostName
        inputs:
          reboot: true
          hostName: myNewName

That does seem to get the hostname changed. But...

With v2, attempts to connect to the website fail with a connection reset.

Looking around on the server instance, I find very little obviously amiss. IIS is running. The website itself is running and configured correctly. The app pool is running. I tried to get a clue by looking at \inetpub\logs, but the really curious thing was that there were no logs. Really? There were always logs before, even when things were broken. I guess it's now breaking before it even gets around to writing logs. A clue, maybe, but not much of one.
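The "no logs at all" check is worth repeating on every test instance. Here is a small Python sketch that does it. It assumes the default IIS log root ("C:\inetpub\logs\LogFiles", with one "W3SVC<n>" folder per site). It only reports whether any ".log" files exist and cannot tell you why IIS didn't write them.

```python
# Sketch: check whether IIS has written any W3C log files at all.
# Assumes the default log root; per-site folders (W3SVC1, W3SVC2, ...) sit under it.
from pathlib import Path

DEFAULT_IIS_LOG_ROOT = r"C:\inetpub\logs\LogFiles"


def iis_log_files(root: str = DEFAULT_IIS_LOG_ROOT) -> list:
    """Return every *.log file under root, or [] if root does not exist."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(p for p in root_path.rglob("*.log") if p.is_file())


logs = iis_log_files()
print(f"{len(logs)} IIS log file(s) found")
for path in logs[-5:]:
    print(path, path.stat().st_size, "bytes")
```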

The only other thing I could find was that every time I try to reach the broken website from a client, the Windows Security log records one or more Audit Failure events with event ID 5061. (These events don't occur on a working system.) I've looked into event ID 5061 enough to find out that it has something to do with an encryption key. But I'm having real trouble figuring out why changing the EC2Launch version would have that result, or what to do about it. Any ideas?
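To see which key the failing 5061 events refer to, the rendered event messages can be grouped by key name, operation, and return code. This is a hedged Python sketch. It assumes each message contains "Key Name:", "Operation:" and "Return Code:" lines, as the rendered 5061 message does. How you export the messages is up to you; for example, you could pipe "Get-WinEvent" output through "Select-Object -ExpandProperty Message" into a text file. The sample message and return code below are illustrative, not taken from the poster's logs.

```python
# Sketch: group exported 5061 "Cryptographic operation" messages by key and result.
# Assumes "Label:<tab>Value" lines as in the rendered event message text.
import re
from collections import Counter

FIELD_RE = re.compile(r"^[ \t]*(Key Name|Operation|Return Code):[ \t]*(.*)$", re.MULTILINE)


def summarize_5061(messages):
    """Count events per (key name, operation, return code); '?' if a field is missing."""
    counts = Counter()
    for message in messages:
        # strip() also removes a trailing '\r' from Windows line endings.
        fields = {label: value.strip() for label, value in FIELD_RE.findall(message)}
        counts[(fields.get("Key Name", "?"),
                fields.get("Operation", "?"),
                fields.get("Return Code", "?"))] += 1
    return counts


sample = ("Cryptographic Parameters:\n\tKey Name:\tExampleKey\n\tKey Type:\tMachine key.\n\n"
          "Cryptographic Operation:\n\tOperation:\tOpen Key.\n\tReturn Code:\t0x80090016\n")
for (key, operation, code), n in summarize_5061([sample, sample]).items():
    print(n, key, operation, code)
```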

[Image: Security log showing event ID 5061]

  • What happens when you try to connect to the website? What errors do you get? What happens when you go direct versus through CloudFront?

  • Despite the difference in username, I am the same person who posted originally.

    I'm still stuck, but I've made a few discoveries.

    First, I've taken the changing of the system name out of the equation. The web server code now uses other means to determine which environment it's in (and so which database server to connect to). This helped me answer two questions I had:

    1. Is it the changing of the system name that's causing the trouble? I didn't think so, but it would be nice to eliminate it as a possibility. The answer is: no, that was not the trouble. I'm still having the problem.

    TBC

    2. EC2Launch/sysprep couldn't change the system name without a reboot. Without changing the system name, I don't have to reboot. That makes it easier to do these checkpoints:

    Again, to be continued...

    • Do everything necessary to beat the instance into shape, except the EC2Launch/sysprep.
    • Hit the instance's website. Works fine. Cool. So far, so good.
    • Run "EC2Launch.exe sysprep", without the --reboot. Okay, well, it did clobber my RDP, but the instance was still running. And after EC2Launch was done running, I could still hit the website. Still, so far, so good.
    • Power off the instance, image it, launch from image.  Website broken.  Same audit failures.
  • And one more test. Everything exactly as in my last comment (from the top, from scratch) except for the last bullet. After EC2Launch finished running and I'd confirmed that the web server was still answering, I just did a plain old reboot. When the instance (the same instance) came back up, the web server didn't answer. Same errors in the Security log.

    So it isn't running EC2Launch, in and of itself, that breaks it. Whatever breaks it apparently happens either during shutdown or during the subsequent startup.

    But I still haven't a clue what.
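The checkpoints above (the site still works after sysprep but breaks after a reboot) are easier to repeat with a probe that reports how a request fails, not just that it fails. This is a minimal Python sketch using only the standard library. The host and port are up to you. Skipping certificate validation for HTTPS is an assumption made for a diagnostic run, so that a certificate mismatch doesn't hide the underlying failure. It is not a recommendation for real traffic.

```python
# Sketch: one GET against the site, classified as an HTTP status or a socket-level failure.
import http.client
import socket
import ssl


def probe(host, port=80, path="/", use_tls=False, timeout=5.0):
    """GET host:port/path once and return a short label for the outcome."""
    if use_tls:
        # Diagnostic only: don't let certificate checks mask the real failure.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return f"http {response.status}"
    except ConnectionResetError:
        # Also covers http.client.RemoteDisconnected (closed without a reply).
        return "connection reset"
    except ConnectionRefusedError:
        return "connection refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. ssl.SSLError during the TLS handshake.
        return f"error: {type(exc).__name__}"
    finally:
        conn.close()
```

For example, run probe("localhost") and probe("localhost", 443, use_tls=True) on the instance itself, then run the same probes from a client, before and after each checkpoint.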

Posted a year ago, 747 views
4 Answers

The issues you're experiencing after switching from EC2Launch v1 to EC2Launch v2 are likely related to the differences in how these versions handle instance configuration and sysprep processes. Here are some potential causes and solutions to consider:

  1. Configuration migration: When migrating from EC2Launch v1 to v2, not all configurations are automatically transferred. The migration tool doesn't detect scheduled tasks linked to EC2Launch v1 scripts, so you may need to manually configure these in EC2Launch v2. Review your agent-config.yml file and ensure all necessary tasks are properly set up.

  2. Sysprep process: The sysprep process in EC2Launch v2 might be handling some aspects differently than v1. This could affect how your instance is prepared for imaging. Make sure you're using the correct sysprep command for v2: "EC2Launch.exe sysprep".

  3. Computer name setting: Your approach to setting the computer name has changed. While using the setHostName task in the config file is correct for EC2Launch v2, ensure that this change isn't causing any unexpected side effects. The reboot after setting the hostname might be interfering with other setup processes.

  4. User data execution timing: In EC2Launch v2, the timing of user data execution has changed. It now runs during the PostReady stage, and the exact timing depends on the format of your user data (YAML 1.0, YAML 1.1, or XML). Ensure your user data is in the correct format and running at the appropriate time.

  5. IIS configuration: The lack of IIS logs suggests that IIS might not be fully initialized or configured correctly after the sysprep process. Review your IIS setup process in the user data script to ensure it's compatible with the EC2Launch v2 workflow.

  6. Encryption key issues: Event ID 5061 ("Cryptographic operation") is logged when a cryptographic operation, such as opening, creating, or deleting a key, is performed through a key storage provider; an Audit Failure means that operation did not succeed. This could indicate that keys used by the system or by IIS are no longer accessible after the sysprep process, which might be related to how EC2Launch v2 prepares the instance for imaging.

To troubleshoot:

  1. Review and update your agent-config.yml file to ensure all necessary tasks are properly configured for EC2Launch v2.
  2. Check the EC2Launch v2 logs (located in C:\ProgramData\Amazon\EC2Launch\log) for any error messages or unexpected behavior.
  3. Verify that your user data script is running at the correct time and completing all necessary setup steps.
  4. Consider temporarily disabling the setHostName task to see if it's causing any conflicts.
  5. Ensure that your IIS setup process is compatible with the EC2Launch v2 workflow.
  6. Investigate the cryptographic issues indicated by Event ID 5061, possibly by reviewing system security policies and how they're applied during the sysprep process.
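Troubleshooting step 2 can start with a crude scan of the log folder. This is a hedged Python sketch. The folder is the one named above, but matching lines on "error" or "fail" is only a heuristic, not a documented log format. UTF-8 encoding is also an assumption, so undecodable bytes are replaced rather than raising.

```python
# Sketch: list log lines that look like errors in the EC2Launch v2 log folder.
# The "error|fail" match is a heuristic, not a documented log format.
import re
from pathlib import Path

EC2LAUNCH_V2_LOG_DIR = r"C:\ProgramData\Amazon\EC2Launch\log"


def error_lines(log_dir=EC2LAUNCH_V2_LOG_DIR, pattern=r"error|fail"):
    """Yield (file name, line number, line) for lines matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    root = Path(log_dir)
    if not root.is_dir():
        return
    for path in sorted(root.glob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.name, number, line.rstrip()


for name, number, line in error_lines():
    print(f"{name}:{number}: {line}")
```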

If these steps don't resolve the issue, you may need to engage with AWS support for more in-depth troubleshooting, as there could be complex interactions between EC2Launch v2, your custom setup process, and Windows system configurations.
Sources
Migrate to EC2Launch v2 for Windows instances - Amazon Elastic Compute Cloud
Troubleshoot issues with the EC2Launch v2 agent - Amazon Elastic Compute Cloud
Use the EC2Launch v2 agent to perform tasks during EC2 Windows instance launch - Amazon Elastic Compute Cloud
Windows Service administration for EC2Launch v2 and EC2Config agents - Amazon Elastic Compute Cloud

Answered a year ago
  • I indicated in my original post that I wasn't sure whether or not changing the machine name had anything to do with it. Now I am sure. It doesn't.

    I tried removing my edit of the config file, so the machine name would not be changed. And it's failing in exactly the same way. Still getting the failed event ID 5061 entries in the Security log. Still no logs in \inetpub\logs. And it doesn't matter that the website code can't figure out which DB to connect to from the system name: it looks like the website code just isn't running at all.

  • Anyone out there? Did the comments I posted (as @philmftcom) make any sense? Is there any hope of getting this resolved?

  • One more data point...

    When I was using v1, I ran InitializeInstance.ps1 -Schedule right at the start, before installing IIS or any of the rest of it. And that worked.

    When I switched to v2, I was mucking with the host name and found out that v2's "EC2Launch sysprep" wouldn't apply the host name change unless agent-config.yml told it to reboot. So I had no choice but to move sysprep to the end, after installing the web server, so that the web server installation (and everything else) got done before sysprep forced a reboot.

    Well, a comment or two ago, I mentioned that I'd worked out another way to tell the web server code which environment it's running in, so changing the host name was no longer necessary. So I took that bit (including the reboot) out of agent-config.yml. But that, by itself, didn't fix my problem. However, at that point I was still running sysprep as the last step, after installing everything else.

    But then I found out that certutil reported one set of key values just before I ran sysprep and a different set just after. It occurred to me to wonder whether moving sysprep back to the beginning (which I now can) would help. After all, if IIS hasn't even installed the keys yet when sysprep runs, sysprep can't muck with those keys, right?

    So I gave it a go.

    (To be continued...)

    • check the keys. There are none.
    • run sysprep.
    • check the keys again. Still none. As expected.
    • do all the rest, including installing IIS, which creates the keys.
    • check the keys. They exist. Note their values.
    • power off, image, launch from image
    • on the newly launched instance, check the keys. They have changed. D'oh! Surely that means the website won't work? But just to make sure...
    • try hitting the website. Yup. Same problem. Connection reset. Same audit failure.
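The before/after key comparison in that list is easy to get wrong by eye, so here is a small Python sketch that diffs two snapshots. The snapshot format is hypothetical: "name: value" lines written by whatever capture script you use, for example one that records the relevant certutil output. The thread doesn't say how the values were recorded.

```python
# Sketch: diff two key "snapshots" (hypothetical "name: value" text format),
# e.g. one captured before sysprep and one after relaunching from the image.


def load_snapshot(text):
    """Parse 'name: value' lines into a dict, ignoring lines without a colon."""
    snapshot = {}
    for line in text.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            snapshot[name.strip()] = value.strip()
    return snapshot


def diff_snapshots(before, after):
    """Report which entries changed value, appeared, or disappeared."""
    common = before.keys() & after.keys()
    return {
        "changed": sorted(k for k in common if before[k] != after[k]),
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
    }


before = load_snapshot("KeyA: 1111\nKeyB: 2222\n")
after = load_snapshot("KeyA: 9999\nKeyB: 2222\n")
print(diff_snapshots(before, after))
```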

I have been searching through every config file I can find in \ProgramData\Amazon\EC2Launch, \Program Files\Amazon\EC2Launch, and \Windows\System32\Sysprep, hoping to find a line that obviously says to muck with the keys, so that I could edit or remove it. I haven't found one.

Answered 10 months ago
  • Oops. I pressed the wrong button. This "answer" was supposed to be a comment.

  • AFAIK, the only reason I even need to run EC2Launch is so that, after launching from the new image, EC2 will be able to tell me the administrative password of the newly launched instance. Without running EC2Launch, I don't get that.

    I would be perfectly happy if I could work out a way to do only that part (set the password) without doing anything else, especially this cert mucking it's doing. But after looking long and hard at the EC2Launch docs, and a lot of trial and error, I haven't found anything that gets me the password without messing up the certs.


Just in case anyone runs into this, looking for an answer to a similar problem, I'll post where I ended up with this.

I'm just not running sysprep any more. (me: "Doctor, it hurts when I do this." Doctor: "Well, don't do that.")

Of course, that has implications. But, AFAIK, the only one I care about is that instances launched from the non-sysprepped image don't have an admin password that can be retrieved from EC2. Well, I'm (grudgingly) okay with that, because:

  • I can still SSM to the instance and get a running-as-admin PowerShell.
  • I'm actually getting kinda handy with PowerShell only, no desktop. I was raised on *nix and learned to embrace the command line. Even when *nix desktops became ubiquitous, I mostly used them as a way to get multiple command lines. And, though I'm no MS fan, I must say PowerShell is pretty darned good. (Certainly a far cry better than stinkin' cmd.)
  • Even when I "can't" get something done from PS, I know how to set a user PW from PS, so I can SSM in, set the PW, then I can RDP again. (I've tried it. It works.)
  • If I ever get tired of having to set the admin password by hand now and then, I'm sure I can work out how to automate it. Maybe keep the password in an encrypted (SecureString) Parameter Store parameter, or some such, and have the instance set the admin password from that in its UserData.
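The automation idea in the last bullet could look something like this Python helper, which renders the UserData script. It is a sketch under assumptions and does not come from the thread. It assumes the AWS Tools for PowerShell cmdlet "Get-SSMParameter" is available on the image. It also assumes the instance role is allowed to read and decrypt the parameter, and that the parameter (here the hypothetical "/web/admin-password") stores the password as a SecureString. "Set-LocalUser" and "ConvertTo-SecureString" are standard Windows PowerShell cmdlets.

```python
# Sketch: render Windows UserData that sets a local account's password from a
# SecureString Parameter Store parameter. Assumes AWS Tools for PowerShell
# (Get-SSMParameter) on the image and an instance role allowed to read and
# decrypt the parameter. The parameter name used below is hypothetical.


def _ps_quote(value: str) -> str:
    """Quote a value as a PowerShell single-quoted string literal."""
    return "'" + value.replace("'", "''") + "'"


def admin_password_user_data(parameter_name: str,
                             account: str = "Administrator",
                             persist: bool = False) -> str:
    lines = [
        "<powershell>",
        # Fetch and decrypt the password into memory only.
        f"$plain = (Get-SSMParameter -Name {_ps_quote(parameter_name)} -WithDecryption $true).Value",
        "$secure = ConvertTo-SecureString -String $plain -AsPlainText -Force",
        f"Set-LocalUser -Name {_ps_quote(account)} -Password $secure",
        "Remove-Variable -Name plain",
        "</powershell>",
    ]
    if persist:
        # The <persist> tag asks EC2Launch to run the script on every boot.
        lines.append("<persist>true</persist>")
    return "\n".join(lines)


print(admin_password_user_data("/web/admin-password"))
```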

So I'm limping along again just by not using EC2Launch at all.

p.s.,

  • If anyone out there knows of some other Very Important Reason why I have to run EC2Launch, I'd greatly appreciate you letting me know.
  • If anyone knows a more direct way to accomplish only the setting of the admin password and making it available to EC2 -- without doing whatever-it-was that was breaking the website -- I would greatly appreciate hearing about that, too.
Answered 8 months ago

Well, no. I can't get by without running EC2Launch. Because the other thing it does, besides making the get-pw-from-ec2 thing happen, is that it makes instances launched from the image run their UserData. No EC2Launch, no UserData run. And I need UserData to be run. So, unless I can work out how to run EC2Launch V2 without it killing the website, I'm stuck running EC2Launch V1 indefinitely. So, again, help would be appreciated. But the evidence is mounting that there's no one out there. So, I'm not holding my breath.

Answered 7 months ago

