AWS Network Firewall and FQDN


Problem statement: We are migrating some on-premises workloads (VMs and DBs) to AWS. The AWS network is AD-aware, but we are using the existing on-premises DNS server instead of Route 53. As part of the migration, we need to build firewall policies and Security Group rules for network protection and restriction. Is there any option to use FQDN-based rules in AWS Network Firewall for protocols other than HTTP/HTTPS? The aim is to use FQDNs instead of hardcoded IP addresses for the on-premises servers, so that the firewall policies do not need to change if the on-prem server IPs change in future. AWS Network Firewall doesn't seem to support this. Even if it did, I am not sure how AWS Network Firewall could use the on-premises DNS server for IP resolution.

Ask: Is there any alternative solution for this type of requirement? I am trying to understand the best way to manage all these different firewall rule and network restriction requirements using the likes of AWS Network Firewall (stateless/stateful rules), Security Groups, NACLs etc., so that the final solution follows recommended standards.

Given this is a very specific requirement, please let me know if anything is unclear or if you need further details.

Any advice you can provide would be greatly appreciated.

3 Answers
Accepted Answer

This is a common ask and the short answer is "no". But read on if you're interested as to why that is.

HTTP and HTTPS are interesting protocols because the hostname of the intended destination is carried in the protocol itself (in a couple of different ways - the Host header in HTTP, or the Server Name Indication and the certificate in the TLS handshake for HTTPS). Therefore, if you can capture that information (easy in HTTP; harder in HTTPS, and it will get even more difficult as TLS 1.3 gains adoption because it's designed to hide more of the handshake, including the certificate) then the firewall can make a decision based on it. Excellent stuff.
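
To make the Host header point concrete, here is a minimal Python sketch of how a filter could pull the hostname out of a plaintext HTTP request. The function name `extract_host` and the sample request are illustrative assumptions, not anything AWS Network Firewall exposes; the equivalent step for HTTPS would mean parsing the TLS handshake and is not shown.

```python
from typing import Optional


def extract_host(raw_request: bytes) -> Optional[str]:
    """Return the hostname from the Host header of a plaintext HTTP/1.x request."""
    # Headers end at the first blank line; anything after that is the body.
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    # Skip the request line ("GET / HTTP/1.1") and scan the header lines.
    for line in head.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", errors="replace")
            # Drop an optional ":port" suffix (IPv6 literals are not handled here).
            return host.rsplit(":", 1)[0] if ":" in host else host
    return None


# Hypothetical request, made up for illustration.
sample = b"GET /index.html HTTP/1.1\r\nHost: db01.corp.example:8080\r\nAccept: */*\r\n\r\n"
print(extract_host(sample))
```

A protocol with no equivalent field in its payload gives the filter nothing to extract, which is the situation described next.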

Note that if this information wasn't available, the only thing the firewall has to go on is the destination IP address. Mapping that back to an FQDN is time-consuming (reverse DNS lookups take time), may not work (not all sites have up-to-date or even accurate reverse DNS records), and doesn't work for services that host multiple sites on a single IP address. And that is the case for pretty much everything except HTTP and HTTPS. There will be exceptions but go with that for now.
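
The shared-IP problem can be sketched in a few lines of Python. The record names and addresses below are made up for illustration; the point is only that inverting a name-to-IP mapping does not give a unique answer.

```python
from collections import defaultdict

# Hypothetical forward DNS records: two names share one address.
forward_records = {
    "app.corp.example": "10.20.0.5",
    "reports.corp.example": "10.20.0.5",
    "db01.corp.example": "10.20.0.9",
}

# Invert the mapping, which is the best a firewall could do from a destination IP.
names_by_ip = defaultdict(list)
for name, ip in forward_records.items():
    names_by_ip[ip].append(name)


def candidate_names(dest_ip):
    """Return every name that could sit behind dest_ip (possibly none, possibly many)."""
    return sorted(names_by_ip.get(dest_ip, []))


print(candidate_names("10.20.0.5"))   # more than one candidate: which rule applies?
print(candidate_names("192.0.2.1"))   # no record at all
```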

The mapping between the FQDN and the IP address is done at the host where the connection originates. And usually that process (DNS) is completely decoupled from the network filtering process (the firewall). The firewall has zero knowledge of what the host did to resolve the FQDN to the IP address. Absolutely none.

Could you build a system where your DNS resolver communicates the DNS query, the resolved IP and the originating host IP to the firewall? Yes, you could. It still wouldn't be perfect - if a host looked up two DNS records which both go to the same IP address and then connected to both of them how would the firewall know which session to allow or drop? That's an edge case but the real-world implications make this process much harder. And things like DNSSEC and DNS-over-HTTPS will make it nearly impossible.

Assuming the resolver can find that information, it needs to communicate it to the firewall before the originating host tries to connect. That's not too difficult in a small network and it's even easier if the DNS resolver and the firewall are a single device at the edge. But in large networks it simply doesn't scale. Now you need the DNS resolver to communicate with a fleet of (perhaps) distributed firewalls (because you don't know which firewall instance is going to receive the session).

And: It could even be seconds or minutes before the host tries to connect - what if that information has timed out? How much memory do you need to reserve on the firewalls to track everything?
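
As a rough illustration of the last few paragraphs, here is a toy Python model of a firewall that is fed DNS answers by a resolver. The class `ResolutionTable`, its methods and all addresses are hypothetical, invented for this sketch; it is not how any AWS service works.

```python
from dataclasses import dataclass, field


@dataclass
class ResolutionTable:
    """What one firewall instance would hold if a resolver pushed every answer to it."""
    # (client_ip, resolved_ip) -> {fqdn: expiry time in seconds}
    entries: dict = field(default_factory=dict)

    def record(self, client_ip, fqdn, resolved_ip, ttl, now):
        # Called by the (hypothetical) resolver each time it answers a query.
        self.entries.setdefault((client_ip, resolved_ip), {})[fqdn] = now + ttl

    def names_for(self, client_ip, dest_ip, now):
        # Called when a new session arrives; only unexpired entries count.
        names = self.entries.get((client_ip, dest_ip), {})
        return sorted(n for n, expiry in names.items() if expiry > now)


table = ResolutionTable()
table.record("10.0.1.15", "allowed.corp.example", "10.20.0.5", ttl=60, now=0)
table.record("10.0.1.15", "blocked.corp.example", "10.20.0.5", ttl=60, now=0)

print(table.names_for("10.0.1.15", "10.20.0.5", now=10))    # two names: allow or drop?
print(table.names_for("10.0.1.15", "10.20.0.5", now=120))   # entries have timed out
print(ResolutionTable().names_for("10.0.1.15", "10.20.0.5", now=10))  # another instance never heard
```

Even this toy version shows the three failure modes: an ambiguous match, an expired entry, and a firewall instance that never received the update. Every entry also costs memory on every instance that has to hold it.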

In short, it's not something that can easily be done at scale and there are many ways it can fail.

To answer your question at the end (in long form again): Be careful trying to lock down your network so that only device X can talk to device Y for a specific set of protocols. While you can do that, you will end up in a situation where your business, developers or anyone else trying to do something will need to wait for you (or your team) to change the network in order for them to do their job. Don't let "security" be a blocker to the business.

Sure, set up security groups so that only internal (to your network) hosts can connect to certain instances. That makes sense. But don't write NACLs and firewall rules that are so specific that when a host changes IP address (this happens all the time!) you need to spend hours troubleshooting and then get an emergency change control approval so that you can fix things and everyone can go back to work. Ask me how I learned this lesson... (or not - you can guess).
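
A small Python sketch of the difference, using only the standard `ipaddress` module. The CIDR ranges and host addresses are assumptions chosen for illustration, not a recommendation for any particular network.

```python
import ipaddress

# Hypothetical rule sources: one pinned host address versus the internal range.
specific_rule = ipaddress.ip_network("10.20.0.5/32")
broad_rule = ipaddress.ip_network("10.20.0.0/16")


def allowed(rule, source_ip):
    """True if source_ip falls inside the rule's network."""
    return ipaddress.ip_address(source_ip) in rule


# The same server before and after it is re-addressed.
old_ip, new_ip = "10.20.0.5", "10.20.3.77"

print(allowed(specific_rule, old_ip), allowed(specific_rule, new_ip))  # pinned rule breaks
print(allowed(broad_rule, old_ip), allowed(broad_rule, new_ip))        # range rule keeps working
```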

I realise I'm preaching now - apologies. But this is a huge topic as it delves into the risk the business and security teams are willing to tolerate versus the speed at which they wish to carry on working.

Well worth engaging with your local AWS account team - they can bring in Solutions Architects who are specialists in networking, security, compliance, etc. to work out what the "best" solution is for you.

AWS
EXPERT
answered 2 years ago
EXPERT
reviewed 23 days ago
  • "don't write NACLs and firewall rules which are so specific such that when a host changes IP address (this happens all the time!) that you need to spend hours troubleshooting" Makes sense. I can (and do already) see this happening with the uptick in remote working. Would a suggested method be to use a VPN and filter traffic that way?

  • No, I wouldn't do that as you're adding more complexity, points of failure and bottlenecks. I'd suggest not offloading application-level security problems to the network. Design authentication, authorisation and encryption into the application and then provide (relatively) broad controls at the network. It's called "Zero Trust" - the applications shouldn't trust the network; and when you do that, all of a sudden it doesn't matter where each of the components is. There doesn't have to be an "outside" or "inside". Lofty goal. Difficult to achieve. But it delivers far more agility and flexibility.


Thanks Brettski@AWS, it took me a while to read all this but you have explained it so well and in so much detail. Thanks for that. Based on all the challenges you have highlighted, it looks to me like even the commercial NVAs (Palo Alto etc.) would face the same challenges with respect to leveraging DNS resolution. Can you confirm this? In other words, is there any commercial product available that can help with this type of implementation, given that AWS Network Firewall clearly cannot?

I completely agree with the approach of not locking down the network. That was one of the reasons I wanted to explore this FQDN-based option, so that we need not hardcode IPs in the policies.

AWSuser
answered 2 years ago
  • Definitely any third-party appliance that does this is going to have the same challenges. At small scale (think home router) it isn't really a big deal - it's (mostly) easy to do. Once you get to "I need redundancy" it's much harder. When you get to "I need multiple ingress and egress points" it's significantly harder. At AWS customer scale it's an enormous challenge. As with all things AWS, I'd never say never - but it's not an easy thing to deliver without impacting scale and performance.


Well, DiscrimiNAT Firewall (third-party appliance on AWS marketplace) doesn't seem to have this problem: https://aws.amazon.com/marketplace/pp/prodview-gdrdl5m67w6vg

Autoscales up and down with AWS GWLB as well.

answered 6 months ago
  • I can't speak to that solution specifically, but looking at the Marketplace offering it seems to be a single instance that is deployed. As with the same type of solution you might deploy on a home or small office network, doing all the work on a single device makes everything a lot simpler. As per my comments above, when the systems are distributed (which they are in AWS) and operate at scale (which they do in AWS) it's a much harder problem to solve.
