Hi, perhaps this document will provide more clarity: https://docs.aws.amazon.com/efs/latest/ug/how-it-works.html#how-it-works-conceptual (especially the part "We recommend that you access the file system from a mount target within the same Availability Zone for performance and cost reasons."). In addition, cross-AZ mounts (EC2 in one AZ and the mount target in another) also reduce availability: if either of those AZs goes down, your application's availability is impacted. That's why the EC2 instance and its mount target should be in the same AZ (static), and any change to the network settings of the mount target will make the mount unresponsive on the EC2 instance. In the event of a zonal failure, EC2 instances and mount targets in another AZ should be able to pick up the additional load.
The challenge here is the interaction between DNS names, IP addresses and applications - in this case, the application is the NFS client.
The intention behind using a DNS name rather than an IP address for a resource (and this is regardless of application) is that the DNS lookup can return multiple IP addresses. The application can then choose one (normally at random, but it might have some mechanism for using something "closer" to itself - or maybe on the local network; most of the time this doesn't happen though) and that IP address is what it connects to. If the IP address doesn't respond then it can choose another from the list that was returned in the first place.
Note that this choice and the retries (in the event of a failure) might be done at an operating system level rather than within the application. But either way, it works pretty much the same.
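To make the lookup-then-choose behaviour concrete, here is a minimal Python sketch of it. The function names (`resolve_all`, `connect_first_reachable`) and the 3-second timeout are assumptions made for illustration; this is not how any particular NFS client is implemented.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct IP address the lookup yields for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_reachable(addresses, port, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order and return (ip, connection) for the
    first one that answers within `timeout` seconds (assumed value)."""
    last_error = None
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout=timeout)
        except OSError as exc:
            # This address did not respond; fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no address responded: {last_error}")
```

A caller would pass the result of `resolve_all(name, 2049)` (2049 being the standard NFS port) to `connect_first_reachable`. The sketch only covers failover at connection time; it says nothing about what happens once an established connection goes quiet.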
But what happens when the application connects to an IP that works; but later on that IP stops working? The application has to use one of the other IP addresses; it might have them cached or it might do a DNS lookup again. That's not really the issue - the challenge is, how long does it take for the application to figure out that the IP address it is communicating with is not responding? And in that case, does it do another DNS lookup to try another IP address; or does it just retry the existing IP address forever? At what point does it give up?
In this case, the NFS client might eventually figure out that the original IP address isn't responding and try another. But it also might not. From this distance, it's not possible to say because I don't know what NFS client is being used; and even if I did it's more a question for the developers/designers of that client. The timeout might even be adjustable - again, a question for the developers of the client.
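The questions above (how long until the failure is noticed, and whether to retry the same IP or look the name up again) come down to a small policy. The sketch below is purely hypothetical: `FailoverPolicy`, its threshold of three failures, and the returned action names are assumptions for illustration, not the behaviour of any real NFS client. On Linux, the `timeo` and `retrans` mount options play a broadly similar role.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    """Decides what to do after each probe of the current IP address."""
    max_failures: int = 3  # consecutive timeouts tolerated (assumed value)
    failures: int = 0

    def record(self, responded: bool) -> str:
        """Return 'ok', 'retry' (same IP) or 're-resolve' (new DNS lookup)."""
        if responded:
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            return "re-resolve"
        return "retry"


def worst_case_detection_seconds(timeout: float, max_failures: int) -> float:
    """Upper bound on how long a dead IP goes unnoticed under this policy."""
    return timeout * max_failures
```

With a 60-second timeout and three tolerated failures, for example, this policy would take up to 180 seconds to move off a dead address. A client that never returns 're-resolve' corresponds to the "retry forever" case.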
You might find that rather than a reboot it would be possible to reset just the NFS client by sending a signal/interrupt of some sort. But again, you'd need to reliably detect that the endpoint wasn't responding.
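One possible way to detect an unresponsive mount is to probe a path on it from a helper thread and give up waiting after a deadline. This is a rough sketch under assumptions: the name `path_responds`, the 5-second default, and the use of `os.stat` as the probe are all illustrative choices. On a hard mount the helper thread may stay blocked indefinitely, so the sketch only reports the problem; it does not recover from it.

```python
import os
import threading


def path_responds(path, timeout=5.0, probe=os.stat):
    """Return True if probe(path) completes without error within
    `timeout` seconds, False if it fails or is still blocked."""
    result = {}

    def worker():
        try:
            probe(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # A daemon thread will not keep the process alive if it stays stuck.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return False  # still waiting on the file system
    return result.get("ok", False)
```

A monitoring script could call this periodically against the mount point and trigger whatever reset action is appropriate when it returns False several times in a row.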
Thanks to both of you for taking the time to give me your thoughts. This EFS file system is shared across the EC2 instances in an ASG. The instances run Apache2, so I think I'll move the health.html file used for the health check from the AMI onto the EFS file system. This should guard against an EC2 instance that isn't fully functional remaining in the ASG.
What DNS hostname is your mount using?
The DNS name is fs-0d2<obfuscated>5b.efs.eu-west-1.amazonaws.com