Questions tagged with Amazon EC2
Autoscaling scale in policy working
I have an Auto Scaling group that uses a target-tracking policy and scales out once my instance hits 75% CPU. I want to know how long it takes to scale in, that is, to terminate the instance launched by the Auto Scaling group. Right now it takes 25 minutes, which is a lot. Can I set a custom time for my instance to be terminated during the scale-in process? Thanks
How can I work around spontaneous nvml mismatch errors in AWS ECS gpu image?
We're running g4dn.xlarge instances in a few ECS clusters for some ML services, and we use the AWS-provided GPU-optimized ECS AMI (https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#Images:visibility=public-images;imageId=ami-07dd70259efc9d59b). This morning at around 7-8am PST (12/7/2022), newly provisioned container instances stopped being able to register with our ECS clusters. After some poking around on the boxes and reading /var/log/ecs/ecs-init.log, it turned out that we were getting errors in nvml that prevented the ECS init routine from completing:

```
[ERROR] Nvidia GPU Manager: setup failed: error initializing nvidia nvml: nvml: Driver/library version mismatch
```

This is the same AMI as some older instances in the cluster that started up fine. We noticed the issue simultaneously across 4 different clusters. Manually unloading the NVIDIA kernel modules on individual hosts and letting them reload resolved the mismatch and allowed ECS init to complete (and the instances to become available for task allocation):

```
[ec2-user@- ~]$ lsmod | grep nvidia
nvidia_drm             61440  0
nvidia_modeset       1200128  1 nvidia_drm
nvidia_uvm           1142784  0
nvidia              35459072  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        184320  1 nvidia_drm
drm                   421888  4 drm_kms_helper,nvidia,nvidia_drm
i2c_core               77824  3 drm_kms_helper,nvidia,drm
[ec2-user@- ~]$ sudo rmmod nvidia_uvm
[ec2-user@- ~]$ sudo rmmod nvidia_drm
[ec2-user@- ~]$ sudo rmmod nvidia_modeset
[ec2-user@- ~]$ sudo rmmod nvidia
[ec2-user@- ~]$ nvidia-smi
```

This seems a bit bonkers, as it's a regression in the absence of a new AMI or any changes to our application or AWS resources. What causes this spontaneous mismatch, and how can we work around it in an automated fashion?
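As a starting point for automating the manual step described above, here is a minimal Python sketch that reads `lsmod` output and works out an order in which the NVIDIA modules can be unloaded (each module only after the modules that use it). It does not address the cause of the mismatch. The function names and the `SAMPLE` text are illustrative, and the sketch assumes the usual `lsmod` column layout (name, size, use count, users).

```python
SAMPLE = """\
nvidia_drm 61440 0
nvidia_modeset 1200128 1 nvidia_drm
nvidia_uvm 1142784 0
nvidia 35459072 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 184320 1 nvidia_drm
"""


def parse_lsmod(text):
    """Map each module name to the set of modules listed in its 'Used by' column."""
    users = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Module":
            continue  # skip the header line and anything malformed
        users[fields[0]] = set(fields[3].split(",")) if len(fields) > 3 else set()
    return users


def unload_order(users, prefix="nvidia"):
    """Order the modules starting with `prefix` so that each one comes after its users.

    Users outside the prefix are ignored here, so rmmod could still fail
    if some other module holds a reference.
    """
    remaining = {m: set(u) for m, u in users.items() if m.startswith(prefix)}
    order = []
    while remaining:
        # A module is removable once none of its users is still loaded.
        ready = sorted(m for m, u in remaining.items() if u.isdisjoint(remaining))
        if not ready:
            raise RuntimeError("circular dependency between modules")
        for m in ready:
            order.append(m)
            del remaining[m]
    return order


if __name__ == "__main__":
    for module in unload_order(parse_lsmod(SAMPLE)):
        print("sudo rmmod", module)
```

On a real host the input would come from running `lsmod`, and the resulting commands would need root. Whether unloading is safe while containers are using the GPU is a separate question that this sketch does not check.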
Medium EC2 Instance Keeps going down
Hi, my medium EC2 instance keeps going down. The server fails its status check, and the website it hosts no longer loads and only times out. It doesn't make sense; it only recovers once I get in there and manually reboot it. Why does the server keep going down? Is there a technical fault with the hardware or software?
Elastic IP Addresses disappeared (EC2 Instances)
Hi, I've had a few Elastic IP addresses for years and associate them with specific EC2 instances, as required, in a specific Region. They have now disappeared, and I did not release them (100% positive). I tried allocating a new IP address as a test, but the scope of this new public IP address is VPC only (I cannot edit or change it). Any thoughts?
Best practice sending SMS with EC2 PHP without the AWS PHP SDK
Hello! We would like to send SMS text messages to our clients for certain functions of our website. We are using EC2 with Apache and would like to send an SMS with PHP without using the AWS PHP SDK. We cannot find out how to send commands from PHP directly to AWS SNS or AWS Pinpoint. The telephone numbers come from our database, so 'topics' do not seem usable. Any help is appreciated!
amazon-ssm-agent ubuntu patch management
Hi, when will patching of Ubuntu 22.04 be available via amazon-ssm-agent? I have a lot of EC2 instances running Ubuntu 22.04, which has been on the market for some time. Moreover, AWS supports this version of Ubuntu as an official AMI, but amazon-ssm-agent doesn't fully support it.
EBS initialize cost horribly long time
I restored a 4TiB EBS volume (provisioned IOPS 16000, throughput 500MB/s) from a snapshot and am copying its data to the instance store of the EC2 instance. (I need the data to be in the instance store, and I think this can also be regarded as an initialization process.) There is around 2.5TiB of data to copy, and the copy has been running for over a week, while the disk's latency is still too high and the throughput is just 3~5MiB/s:

![Enter image description here](/media/postImages/original/IMptbq3YqlSPqqmgZrmifsoA)

I know that an EBS volume newly restored from a snapshot is lazily loaded, but I also read in this blog post that a background process keeps loading the data: https://aws.amazon.com/blogs/storage/addressing-i-o-latency-when-restoring-amazon-ebs-volumes-from-ebs-snapshots/

> The replicated volume loads data in the background so that you can begin using it immediately. If you access data that hasn’t been loaded yet, the volume immediately downloads the requested data from Amazon S3, and then continues loading the rest of the volume’s data in the background

May I know how this background process works exactly? Since my initialization process (the copy) is ongoing as well, will this background job run at the same time?
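To put the numbers in the question side by side, here is a small worked calculation in Python. It only uses the figures quoted above (2.5TiB of data, 3~5MiB/s observed, 500MB/s provisioned) and assumes the provisioned figure is decimal megabytes per second; the function name is illustrative.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4
SECONDS_PER_DAY = 86400


def copy_days(total_bytes, bytes_per_second):
    """Days needed to copy total_bytes at a constant rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


data = 2.5 * TIB

# Observed rate from the question: 3 to 5 MiB/s.
days_at_3 = copy_days(data, 3 * MIB)
days_at_5 = copy_days(data, 5 * MIB)

# Provisioned throughput from the question, taken as 500 * 10**6 bytes/s (assumption).
hours_at_provisioned = copy_days(data, 500e6) * 24

print(f"at 3 MiB/s: {days_at_3:.1f} days")
print(f"at 5 MiB/s: {days_at_5:.1f} days")
print(f"at 500 MB/s: {hours_at_provisioned:.1f} hours")
```

At the observed rate the copy takes roughly 6 to 10 days, which matches the "over a week" in the question, while the provisioned throughput would in principle move the same data in about an hour and a half.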
AWS Free Tier usage limit - hit 85% limit after a month
Hello, I am extremely new to the AWS world. I just finished a personal project where I used S3, EC2 and RDS, though I haven't done much on the website. I deployed my website on Oct 04 and got a notice on Nov 03 saying that I had used up 85% of my free tier. I also turned off the project on Nov 3rd. I built the project/website using Django and React. It would be greatly appreciated if someone could give me some info or direct me to the right sources to understand this problem better, and why/how I already used up 85% of the free tier in barely a month. Thank you in advance to whoever replies! It is greatly appreciated!
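One possible reading, sketched as arithmetic in Python: if the alert refers to an allowance of 750 instance-hours per month (an assumption here; the question does not say which service triggered the notice), then an instance that runs around the clock reaches 85% well before the month ends. The constant and function names are illustrative.

```python
FREE_HOURS_PER_MONTH = 750   # assumption: monthly instance-hour allowance
ALERT_FRACTION = 0.85        # threshold quoted in the notice
HOURS_PER_DAY = 24


def days_until_alert(instances=1):
    """Days of continuous running before the alert threshold is reached."""
    alert_hours = FREE_HOURS_PER_MONTH * ALERT_FRACTION
    return alert_hours / (HOURS_PER_DAY * instances)


one_instance = days_until_alert(1)
two_instances = days_until_alert(2)

print(f"1 instance running 24/7: {one_instance:.1f} days")
print(f"2 instances running 24/7: {two_instances:.1f} days")
```

Under that assumption a single always-on instance crosses the threshold after about 26.6 days, and two always-on instances after about 13.3 days, regardless of how much traffic the website receives.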
How to connect to AWS EC2 instance if lost SSH key pair
Hi, I lost the SSH private key used to access our production web server. After searching Google, I tried the two methods described at https://aws.amazon.com/premiumsupport/knowledge-center/user-data-replace-key-pair-ec2/ but neither method 1 nor method 2 worked. Could you please help me figure this out?
Method 1: Enter user data. After the configuration completed, I connected with SSH and the server refused the public key.
Method 2: Use AWS Systems Manager. I was unable to find the runbook called AWSSupport-ResetAccess among the Automation runbooks.
Roy