Amazon VPC Lattice Troubleshooting Part 1 - Client to Amazon VPC Lattice Communication
This is the first in a series of three articles providing guidance on troubleshooting connectivity issues with Amazon VPC Lattice.
Note: Part 2 - Troubleshooting VPC Lattice Authentication can be found here and Part 3 - Troubleshooting Target Connectivity can be found here.
Before You Begin
We’ll assume you have created a VPC Lattice Service Network, a VPC Lattice Service, a Target Group and you have a Target connected to the Target Group.
You will also need an EC2 host, which will already be associated with a subnet and a VPC. You will need to be able to get a shell on the EC2 host, so ensure the EC2 host has the AmazonSSMManagedInstanceCore policy attached, as well as the following inline policy (to allow us to use the CLI to inspect our configuration)
In terms of logging, you’ll want to enable VPC Flow logs for both the client and the target side VPCs. You’ll also want to enable VPC Lattice access logs for your service network and service.
In our configuration, we have the following details for the VPC Lattice Service:
- Custom Domain: webserver.example
- Lattice Domain: webserver-00412bb2881b9a1b9.7d67968.vpc-lattice-svcs.ap-southeast-2.on.aws
One or more of these may not be functioning correctly, so we’ll walk through the steps required to identify where the fault lies, and how to remediate it.
Establish connection to VPC Lattice Service
When troubleshooting connectivity to a VPC Lattice Service, follow these steps in order:
- Verify DNS resolution
  - For any custom domains, ensure that the zone containing the custom domain is associated with your VPC
- Verify IP network connectivity between the source and the destination
  - Ensure that your VPC has been associated with the VPC Lattice Service Network
- Verify TCP network connectivity between the source and the destination
  - Use curl to distinguish a connection refused or an HTTP error code from a connection timed out error
- Validate that security groups and NACLs on both the source and the destination permit the required access
  - Use VPC Flow Logs to troubleshoot security group and NACL related issues
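As a rough aid for working through the steps above, the sketch below maps the curl exit codes that appear later in this article (6, 28, 56 and 60, plus 7 for a refused connection) to the step worth revisiting. The mapping and its wording are our own summary of this walkthrough, not official guidance, and the name `triage` is hypothetical.

```python
# Map curl exit codes to the troubleshooting step to revisit.
# The step descriptions are this sketch's own summary of the article.
CURL_TRIAGE = {
    0: "Request completed; check the HTTP status code in the response",
    6: "Step 1: DNS resolution failed; check the hosted zone and its VPC associations",
    7: "Step 3: connection refused; check the listener port and protocol",
    28: "Step 2 or 4: connection timed out; check the service network VPC "
        "association, then security groups and NACLs",
    56: "Step 3: connection reset; check the listener port and protocol",
    60: "TLS: certificate not trusted; IP and TCP connectivity are working",
}


def triage(curl_exit_code: int) -> str:
    """Return the suggested next check for a curl exit code."""
    return CURL_TRIAGE.get(
        curl_exit_code,
        "Unrecognised exit code; rerun curl with -v and read the trace",
    )


if __name__ == "__main__":
    for code in (6, 28, 56, 60):
        print(code, "->", triage(code))
```

In a shell session the exit code of the last curl command is available as `$?`, so it can be passed straight to a helper like this one.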
1. Verifying DNS resolution
Identify the domain name of your VPC Lattice Service in the console:
From the command line of your EC2 host, run the following command:
sh-5.2$ dig webserver-00412bb2881b9a1b9.7d67968.vpc-lattice-svcs.ap-southeast-2.on.aws
; <<>> DiG 9.18.33 <<>> webserver-00412bb2881b9a1b9.7d67968.vpc-lattice-svcs.ap-southeast-2.on.aws
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42144
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;webserver-00412bb2881b9a1b9.7d67968.vpc-lattice-svcs.ap-southeast-2.on.aws. IN A
;; ANSWER SECTION:
webserver-00412bb2881b9a1b9.7d67968.vpc-lattice-svcs.ap-southeast-2.on.aws. 60 IN A 169.254.171.1
;; Query time: 0 msec
;; SERVER: 10.0.0.2#53(10.0.0.2) (UDP)
;; WHEN: Sat Mar 15 06:27:46 UTC 2025
;; MSG SIZE rcvd: 119
This shows that there is a globally resolvable name for the VPC Lattice service we’ve configured.
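The answer above is a link-local address rather than an address from our VPC CIDR. A minimal sketch for checking that a resolved address looks like a VPC Lattice address follows. The two ranges are an assumption based on the addresses seen in this article (169.254.171.1 and fd00:ec2:80::a9fe:ab01) and on the documented VPC Lattice link-local ranges; confirm them against the managed prefix lists in your Region. The function names are hypothetical.

```python
import ipaddress
import socket

# Assumed VPC Lattice link-local ranges; verify against the managed
# prefix lists for your Region before relying on them.
LATTICE_NETWORKS = (
    ipaddress.ip_network("169.254.171.0/24"),
    ipaddress.ip_network("fd00:ec2:80::/64"),
)


def is_lattice_address(address: str) -> bool:
    """Return True if address falls inside one of the assumed Lattice ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net for net in LATTICE_NETWORKS)


def resolve(name: str) -> list[str]:
    """Return the sorted unique addresses for name, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

Calling `resolve()` on the Lattice domain from the EC2 host and passing each result to `is_lattice_address()` gives a quick confirmation that the name points at VPC Lattice and not at something else.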
We can also try to resolve the custom domain name.
sh-5.2$ dig webserver.example
; <<>> DiG 9.18.33 <<>> webserver.example
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25627
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;webserver.example. IN A
;; AUTHORITY SECTION:
. 1356 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2025031500 1800 900 604800 86400
;; Query time: 99 msec
;; SERVER: 10.0.0.2#53(10.0.0.2) (UDP)
;; WHEN: Sat Mar 15 06:28:29 UTC 2025
;; MSG SIZE rcvd: 121
sh-5.2$ curl webserver.example
curl: (6) Could not resolve host: webserver.example
This shows that we’re unable to locate the custom domain name in DNS.
For VPC Lattice services to be available via a custom domain name, the custom domain name needs to be registered in Route 53, as described in the documentation: https://docs.aws.amazon.com/vpc-lattice/latest/ug/service-custom-domain-name.html#dns-associate-custom
In our scenario, our client is in an entirely separate VPC (client-vpc) from the VPC we created the service in (service-vpc). You need to ensure that:
- You have a hosted zone that contains the domain name you are mapping to your VPC Lattice service
- That hosted zone is associated with the VPC your client is located in
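The second condition can also be checked from code. The sketch below inspects the `VPCs` list returned by the Route 53 `GetHostedZone` API for a private hosted zone. It assumes boto3 is installed and that the caller is allowed `route53:GetHostedZone`; the function names and the sample VPC ID `vpc-0123456789abcdef0` are hypothetical.

```python
def zone_vpc_ids(hosted_zone_response: dict) -> set[str]:
    """Extract the VPC IDs associated with a private hosted zone."""
    return {vpc["VPCId"] for vpc in hosted_zone_response.get("VPCs", [])}


def is_zone_associated(hosted_zone_response: dict, vpc_id: str) -> bool:
    """Return True if vpc_id is associated with the hosted zone."""
    return vpc_id in zone_vpc_ids(hosted_zone_response)


def fetch_hosted_zone(zone_id: str) -> dict:
    """Fetch the zone with boto3 (imported lazily so the helpers above stay stdlib-only)."""
    import boto3  # third-party dependency, assumed to be installed

    return boto3.client("route53").get_hosted_zone(Id=zone_id)


if __name__ == "__main__":
    # Hypothetical response in which only a placeholder service VPC is associated.
    sample = {"VPCs": [{"VPCRegion": "ap-southeast-2", "VPCId": "vpc-0123456789abcdef0"}]}
    print(is_zone_associated(sample, "vpc-0957cde841630ab8f"))  # prints False
```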
We can look in Route 53 to ensure we have the correct VPCs associated with our hosted zone:
In our scenario, client-vpc (ending in b8f) is not associated with the hosted zone. Associating this zone with client-vpc and testing DNS resolution again results in the following:
sh-5.2$ dig webserver.example
; <<>> DiG 9.18.33 <<>> webserver.example
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51412
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;webserver.example. IN A
;; ANSWER SECTION:
webserver.example. 300 IN CNAME webserver-00412bb2881b9a1b9.7d67968.vpc-lattice-svcs.ap-southeast-2.on.aws.
webserver-00412bb2881b9a1b9.7d67968.vpc-lattice-svcs.ap-southeast-2.on.aws. 60 IN A 169.254.171.1
;; Query time: 0 msec
;; SERVER: 10.0.0.2#53(10.0.0.2) (UDP)
;; WHEN: Sat Mar 15 07:11:28 UTC 2025
;; MSG SIZE rcvd: 150
Associating a VPC with a Route 53 hosted zone can take some time to take effect, so names may not be visible via DNS lookup immediately.
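Because of that delay, it can be convenient to poll for the name instead of rerunning dig by hand. This is a minimal sketch, assuming a fixed delay between attempts is acceptable; the attempt count and delay are arbitrary defaults, and `wait_for_dns` is a hypothetical helper. The resolver and sleep functions are injectable so the loop can be exercised without touching the network.

```python
import socket
import time
from typing import Callable


def resolves(name: str) -> bool:
    """Return True if the name currently resolves to at least one address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def wait_for_dns(
    name: str,
    attempts: int = 10,
    delay_seconds: float = 15.0,
    check: Callable[[str], bool] = resolves,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until name resolves or the attempts run out."""
    for attempt in range(attempts):
        if check(name):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no sleep after the final failed attempt
    return False
```

For example, `wait_for_dns("webserver.example")` would return True once the custom domain becomes resolvable from the client VPC.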
2. Verifying IP connectivity
If we try to connect via curl to the VPC Lattice Service, we see the following:
sh-5.2$ curl https://webserver.example -v
* Host webserver.example:443 was resolved.
* IPv6: fd00:ec2:80::a9fe:ab01
* IPv4: 169.254.171.1
* Trying 169.254.171.1:443...
* Trying [fd00:ec2:80::a9fe:ab01]:443...
* Immediate connect fail for fd00:ec2:80::a9fe:ab01: Network is unreachable
* connect to 169.254.171.1 port 443 from 10.0.154.64 port 40324 failed: Connection timed out
* Failed to connect to webserver.example port 443 after 131481 ms: Couldn't connect to server
* Closing connection
curl: (28) Failed to connect to webserver.example port 443 after 131481 ms: Couldn't connect to server
We see that we are trying to connect to 169.254.171.1 on port 443, but this connection is failing with a ‘Connection timed out’ error. We also see that we attempt to connect to an IPv6 address, and it fails immediately. This is due to our VPC not having IPv6 enabled, and can be ignored.
Connection timed out errors are usually symptomatic of one of two things:
- Inability to route to the destination host
- Something preventing the traffic flow between the source and destination
We’ll focus on the first item for now. Identify the subnet that your EC2 client is associated with.
Drilling into that subnet, we can take a look at the route table:
When a VPC is associated with a VPC Lattice Service Network, routes towards the VPC Lattice destination are automatically added to its route tables and are visible there. Here is an example of one correctly configured:
Our route table has no such routes, which indicates that the VPC our client is in has not been associated with the VPC Lattice Service Network. We can validate this in the VPC Lattice console - our client is in VPC vpc-0957cde841630ab8f, which is not listed.
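The same check can be made against the VPC Lattice API. The sketch below looks for our VPC in the items returned by `ListServiceNetworkVpcAssociations`. It assumes boto3 is installed and the caller has permission to list associations, and it only reads the first page of results; the function names are hypothetical.

```python
def association_status(associations: list[dict], vpc_id: str) -> str:
    """Return the association status for vpc_id, or 'NOT_ASSOCIATED' if absent."""
    for item in associations:
        if item.get("vpcId") == vpc_id:
            return item.get("status", "UNKNOWN")
    return "NOT_ASSOCIATED"


def fetch_associations(service_network_id: str) -> list[dict]:
    """List VPC associations for a service network (first page only)."""
    import boto3  # third-party dependency, assumed to be installed

    client = boto3.client("vpc-lattice")
    response = client.list_service_network_vpc_associations(
        serviceNetworkIdentifier=service_network_id
    )
    return response["items"]


if __name__ == "__main__":
    # With no associations at all, the client VPC is reported as missing.
    print(association_status([], "vpc-0957cde841630ab8f"))  # prints NOT_ASSOCIATED
```

A status other than ACTIVE, such as CREATE_IN_PROGRESS, means the association exists but is not yet usable.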
After associating the VPC with the VPC Lattice Service Network and waiting for the VPC association to move to the ‘Active’ state, the routes are populated in the routing table:
Going back to our SSM session on the client machine, we can retry our original curl command:
sh-5.2$ curl -v webserver.example
* Host webserver.example:80 was resolved.
* IPv6: fd00:ec2:80::a9fe:ab01
* IPv4: 169.254.171.1
* Trying 169.254.171.1:80...
* Connected to webserver.example (169.254.171.1) port 80
> GET / HTTP/1.1
> Host: webserver.example
> User-Agent: curl/8.5.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection
curl: (56) Recv failure: Connection reset by peer
We see DNS is resolving and we’re no longer getting a connection timed out. This means we’re making an IP connection all the way to the VPC Lattice Service Network, but we’re still having problems with traffic flow.
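To separate a timeout from other outcomes without reading a full curl trace, a plain TCP connect is enough. The following is a minimal sketch using only the standard library; `probe_tcp` is a hypothetical helper and the 5 second default timeout is arbitrary. Note that a successful connect only shows that the address is reachable on that port: in the run above, port 80 accepted the connection and then reset it.

```python
import socket


def probe_tcp(host: str, port: int, timeout_seconds: float = 5.0) -> str:
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return "connected"
    except socket.timeout:
        # Typical of a missing route, or a security group or NACL dropping traffic.
        return "timed out"
    except ConnectionRefusedError:
        # The destination answered, but nothing accepts connections on this port.
        return "refused"
    except OSError as error:
        # Covers DNS failures, unreachable networks and similar errors.
        return f"failed: {error}"
```

For example, `probe_tcp("webserver.example", 443)` returning "timed out" points back at routing or filtering, while "connected" means it is time to look at the listener and TLS.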
3. Troubleshooting TCP connectivity
In the above, we received a ‘connection reset by peer’. This is indicative of the remote service refusing our connection, and means the traffic is making its way to the remote host. We can check the port we have our VPC Lattice Listener available on, to make sure we’re matching:
You can see our listener is on TCP port 443, but we made our connection with curl without specifying a scheme, so it used the default, which is HTTP (port 80). Let’s try again with https, which sets the port to 443:
sh-5.2$ curl -v https://webserver.example
* Host webserver.example:443 was resolved.
* IPv6: fd00:ec2:80::a9fe:ab01
* IPv4: 169.254.171.1
* Trying 169.254.171.1:443...
* Connected to webserver.example (169.254.171.1) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: self-signed certificate in certificate chain
* Closing connection
curl: (60) SSL certificate problem: self-signed certificate in certificate chain
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
This is better - we can see the TLS handshake taking place, so we know layer 3 (IP) and layer 4 (TCP) are working now. If we tell curl to ignore the certificate with the --insecure flag, we can see the full connection being made to the target.
sh-5.2$ curl --insecure -v https://webserver.example
* Host webserver.example:443 was resolved.
* IPv6: fd00:ec2:80::a9fe:ab01
* IPv4: 169.254.171.1
* Trying 169.254.171.1:443...
* Connected to webserver.example (169.254.171.1) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / X25519 / RSASSA-PSS
* ALPN: server accepted h2
* Server certificate:
* subject: CN=webserver.example
* start date: Mar 14 23:17:50 2025 GMT
* expire date: Apr 14 00:17:50 2026 GMT
* issuer: C=AU; O=LatticeSolution; OU=LatticeSolution; dnQualifier=LatticeSolution; ST=LatticeSolution; CN=LatticeSolution; serialNumber=12345
* SSL certificate verify result: self-signed certificate in certificate chain (19), continuing anyway.
* Certificate level 0: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
* Certificate level 1: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://webserver.example/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: webserver.example]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.5.0]
* [HTTP/2] [1] [accept: */*]
> GET / HTTP/2
> Host: webserver.example
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/2 403
< date: Sat, 15 Mar 2025 21:22:00 GMT
< server: Apache/2.4.62 (Amazon Linux)
< last-modified: Mon, 11 Jun 2007 18:53:14 GMT
< etag: "2d-432a5e4a73a80"
< accept-ranges: bytes
< content-length: 45
< content-type: text/html; charset=UTF-8
<
<html><body><h1>It works!</h1></body></html>
* Connection #0 to host webserver.example left intact
4. Troubleshooting Security Groups and NACLs
For client to VPC Lattice connectivity, there are two places where security groups and NACLs can restrict traffic flow:
- Outbound from the client
- Inbound to the VPC Lattice Service network.
For both of these scenarios, we’ll use VPC Flow Logs to identify where the traffic is being restricted.
Typically a security group or NACL issue will look a lot like the routing issue above. We’ll see ‘connection timed out’ issues when trying to establish a TCP connection to the destination.
sh-5.2$ curl --insecure -v https://webserver.example
* Host webserver.example:443 was resolved.
* IPv6: fd00:ec2:80::a9fe:ab01
* IPv4: 169.254.171.1
* Trying 169.254.171.1:443...
* Trying [fd00:ec2:80::a9fe:ab01]:443...
* Immediate connect fail for fd00:ec2:80::a9fe:ab01: Network is unreachable
* connect to 169.254.171.1 port 443 from 10.0.154.64 port 53472 failed: Connection timed out
* Failed to connect to webserver.example port 443 after 133432 ms: Couldn't connect to server
* Closing connection
curl: (28) Failed to connect to webserver.example port 443 after 133432 ms: Couldn't connect to server
To troubleshoot these, we need to locate the ENI for our client instance:
Find your VPC Flow Logs in CloudWatch, and then find the flow log that matches this ENI. You’re looking for traffic that is addressed to the IP address that shows up in DNS resolution, which in this case is 169.254.171.1.
2 1234567890 eni-05e275679dec1cf0e 10.0.154.64 169.254.171.1 42628 443 6 5 300 1742074263 1742074299 REJECT OK
The REJECT is indicative of a security group or NACL related issue, so verify that the security groups on the client permit outbound traffic to the VPC Lattice address space. You can limit the address space of the destination by using the predefined VPC Lattice prefix list, com.amazonaws.ap-southeast-2.vpc-lattice. If you are using NACLs, you will also need to verify that your NACL permits the traffic flow.
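When there are many records to scan, a small parser helps. This sketch assumes the default (version 2) flow log format, which is the 14-field layout of the record shown above, and treats 169.254.171.0/24 as the VPC Lattice IPv4 range (an assumption consistent with the address in this article). The function names are hypothetical.

```python
import ipaddress

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)

# Assumed VPC Lattice IPv4 range; verify against the managed prefix list.
LATTICE_V4 = ipaddress.ip_network("169.254.171.0/24")


def parse_flow_record(line: str) -> dict:
    """Split a default-format flow log record into named string fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_rejected_lattice_flow(record: dict, eni_id: str) -> bool:
    """True if the record is a REJECT from eni_id towards the Lattice range."""
    if record["interface_id"] != eni_id or record["action"] != "REJECT":
        return False
    try:
        destination = ipaddress.ip_address(record["dstaddr"])
    except ValueError:
        return False  # NODATA and SKIPDATA records carry "-" instead of an address
    return destination.version == 4 and destination in LATTICE_V4


if __name__ == "__main__":
    line = ("2 1234567890 eni-05e275679dec1cf0e 10.0.154.64 169.254.171.1 "
            "42628 443 6 5 300 1742074263 1742074299 REJECT OK")
    print(is_rejected_lattice_flow(parse_flow_record(line), "eni-05e275679dec1cf0e"))  # prints True
```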
Security groups can also be in place on the service network association on the VPC Lattice side, so you will have to look in both locations if you are still seeing ‘connection timed out’ issues. NACLs are applied at the subnet level, so you’ll have to check the NACLs associated with the subnet for your client if you’re using them.
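The client-side security group check can be sketched against the rule structure that the EC2 `DescribeSecurityGroups` API returns (`IpPermissionsEgress`, with `IpRanges` and `PrefixListIds` entries). This is a simplified illustration: it ignores rules that reference other security groups as well as IPv6 ranges, and the function names and the `pl-example` prefix list ID are hypothetical.

```python
import ipaddress
from typing import Optional


def rule_covers_port(rule: dict, port: int) -> bool:
    """True if an egress rule applies to TCP traffic on the given port."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    if protocol != "tcp":
        return False
    return rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)


def egress_allows(
    security_group: dict,
    port: int,
    dest_ip: str,
    prefix_list_id: Optional[str] = None,
) -> bool:
    """True if any egress rule permits TCP to dest_ip:port by CIDR or prefix list."""
    destination = ipaddress.ip_address(dest_ip)
    for rule in security_group.get("IpPermissionsEgress", []):
        if not rule_covers_port(rule, port):
            continue
        if prefix_list_id and any(
            entry.get("PrefixListId") == prefix_list_id
            for entry in rule.get("PrefixListIds", [])
        ):
            return True
        if any(
            destination in ipaddress.ip_network(entry["CidrIp"], strict=False)
            for entry in rule.get("IpRanges", [])
        ):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical group that only allows HTTPS to a placeholder prefix list.
    group = {"IpPermissionsEgress": [{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [], "PrefixListIds": [{"PrefixListId": "pl-example"}],
    }]}
    print(egress_allows(group, 443, "169.254.171.1", "pl-example"))  # prints True
```

Remember that NACLs are evaluated separately from security groups, so a True result here does not rule out a NACL deny on the client subnet.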
Summary
In this article we've covered how to troubleshoot a number of different client to Amazon VPC Lattice related connectivity issues. In Part 2 - Troubleshooting VPC Lattice Authentication of this series, we'll explain how to troubleshoot connection issues with VPC Lattice itself, such as authentication policy and SigV4/SigV4A signing related failures.