How to Troubleshoot LACP Issues on Direct Connect LAGs
Learn how to troubleshoot issues that may occur when using LACP (Link Aggregation Control Protocol) with Direct Connect.
Overview
This article provides a structured approach to diagnose and resolve LACP issues on AWS Direct Connect LAG.
Resolution
What is LACP?
LACP (Link Aggregation Control Protocol) is the control protocol originally defined in IEEE 802.3ad (now maintained as IEEE 802.1AX). It lets two devices negotiate which Ethernet links to bundle into a single logical link-layer interface. In AWS Direct Connect, you can create a LAG (link aggregation group): a logical interface that uses LACP to aggregate multiple dedicated connections at a single Direct Connect endpoint, so that you can manage them as a single connection.
AWS-Side Diagnosis
Before checking vendor equipment, narrow down the scope of the problem using the AWS Management Console, the AWS CLI, and Amazon CloudWatch.
Check LAG and Connection State
AWS Console:
- Direct Connect Console → LAGs
- Verify LAG status and each member connection's state
- Check "Minimum Links" configuration
- Verify number of "Available" connections meets minimum requirement
AWS CLI:
aws directconnect describe-lags --lag-id dxlag-00000
aws directconnect describe-connections --connection-id dxcon-00000
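To read the `describe-lags` output quickly, you can summarize it with a short script. This is a minimal sketch that assumes you have saved the CLI's default JSON output; it relies only on the `lags`, `lagId`, `lagState`, `minimumLinks`, `connections`, `connectionId`, and `connectionState` fields of that output. The function names `summarize_lags` and `summarize_lags_json` are hypothetical.

```python
import json


def summarize_lags(describe_lags_output: dict) -> list:
    """Summarize LAG and member states from `aws directconnect describe-lags` JSON."""
    lines = []
    for lag in describe_lags_output.get("lags", []):
        members = lag.get("connections", [])
        # Members that AWS reports as "available" count toward minimumLinks.
        available = [c for c in members if c.get("connectionState") == "available"]
        minimum = lag.get("minimumLinks", 0)
        lines.append(
            f"{lag.get('lagId')}: lagState={lag.get('lagState')} "
            f"available={len(available)}/{len(members)} minimumLinks={minimum}"
        )
        # List every member that is not available, with its current state.
        for c in members:
            if c.get("connectionState") != "available":
                lines.append(f"  {c.get('connectionId')} is {c.get('connectionState')}")
        if len(available) < minimum:
            lines.append("  WARNING: available members are below minimumLinks, so the LAG is down")
    return lines


def summarize_lags_json(text: str) -> list:
    """Convenience wrapper that accepts the raw CLI output as a JSON string."""
    return summarize_lags(json.loads(text))
```

For example, redirect `aws directconnect describe-lags --lag-id dxlag-00000` to `lag.json`, then print each line of `summarize_lags_json(open("lag.json").read())`.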
CloudWatch Metrics:
- ConnectionState: connection status, reported per member connection (1 = UP, 0 = DOWN). Expected value: 1 for every member of the LAG
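ConnectionState is published per connection, so checking a LAG means querying each member. The following sketch builds one CloudWatch GetMetricData query per member using the `AWS/DX` namespace and the `ConnectionId` dimension. The one-minute period and the `Minimum` statistic are illustrative choices: `Minimum` makes a period show 0 if any DOWN sample occurred in it. The function name is hypothetical.

```python
def connection_state_queries(connection_ids: list, period_seconds: int = 60) -> list:
    """Build GetMetricData queries for AWS/DX ConnectionState, one per LAG member."""
    queries = []
    for i, cid in enumerate(connection_ids):
        queries.append({
            # Query IDs must start with a lowercase letter and cannot contain '-',
            # so the connection ID goes in the Label instead.
            "Id": f"state{i}",
            "Label": cid,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/DX",
                    "MetricName": "ConnectionState",
                    "Dimensions": [{"Name": "ConnectionId", "Value": cid}],
                },
                "Period": period_seconds,
                # Minimum surfaces any DOWN (0) sample inside each period.
                "Stat": "Minimum",
            },
            "ReturnData": True,
        })
    return queries
```

With boto3, pass the result as `MetricDataQueries` to `boto3.client("cloudwatch").get_metric_data(...)` together with `StartTime` and `EndTime`. Each entry in `MetricDataResults` carries the connection ID in its `Label`.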
Check Optical Signal Levels
Verify optical power levels to rule out physical layer issues before investigating LACP.
CloudWatch Metrics:
- ConnectionLightLevelRx — Receive signal strength (dBm)
- ConnectionLightLevelTx — Transmit signal strength (dBm)
Expected: Rx and Tx levels should be within the acceptable range of -14.4 to 2.50 dBm for 1G and 10G connections. If either level is out of range, the problem is at the physical layer (fiber, transceiver, or port), not in LACP.
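As a quick sanity check, you can compare light-level readings against the 1G/10G range quoted above. This sketch hard-codes that range as `MIN_DBM` and `MAX_DBM`. It does not apply to other connection speeds, and the function name is hypothetical.

```python
# Acceptable range stated in this article for 1G and 10G connections, in dBm.
MIN_DBM = -14.4
MAX_DBM = 2.50


def light_level_problems(rx_dbm: float, tx_dbm: float) -> list:
    """Describe each reading that falls outside [MIN_DBM, MAX_DBM]."""
    problems = []
    for name, value in (("Rx", rx_dbm), ("Tx", tx_dbm)):
        if value < MIN_DBM:
            problems.append(f"{name} {value} dBm is below {MIN_DBM} dBm")
        elif value > MAX_DBM:
            problems.append(f"{name} {value} dBm is above {MAX_DBM} dBm")
    return problems
```

An empty list means both readings are in range, so continue with the LACP checks. Any entry points to a physical-layer issue on that member.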
Check AWS Health Dashboard
AWS Direct Connect connections can go down due to planned or emergency maintenance.
Action:
- Check the Events section of the AWS Health Dashboard
- Look for ongoing or recently completed maintenance affecting your Direct Connect connection
Note: During maintenance, BGP sessions may move to the Idle state, and the outage can last from minutes to hours.
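If you check for maintenance programmatically, the AWS Health API's `DescribeEvents` operation accepts a filter. This is a minimal sketch that assumes Direct Connect events use the service code `DIRECTCONNECT` and that a seven-day look-back is enough. The AWS Health API also requires a Business, Enterprise On-Ramp, or Enterprise Support plan. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def dx_health_event_filter(days_back: int = 7) -> dict:
    """Build a DescribeEvents filter for recent and upcoming Direct Connect events."""
    now = datetime.now(timezone.utc)
    return {
        "services": ["DIRECTCONNECT"],
        # open = in progress, upcoming = scheduled, closed = recently completed
        "eventStatusCodes": ["open", "upcoming", "closed"],
        # Events that started within the look-back window, or start later.
        "startTimes": [{"from": now - timedelta(days=days_back)}],
    }
```

With boto3, pass the result to `boto3.client("health", region_name="us-east-1").describe_events(filter=dx_health_event_filter())`, then look for events that list your connection or LAG as an affected entity.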
Common LACP Failure Patterns
Use the AWS-side diagnosis results above to match your issue to the patterns below:
Pattern 1: Entire LAG is Down
Possible causes:
- LACP not running on the CGW member ports, so no LACPDUs are exchanged (the AWS side is always Active, so a CGW in Passive mode still negotiates)
- CGW (customer gateway) side configured in static "on" mode instead of dynamic LACP
- All member links physically down (check optical levels)
- minimumLinks threshold not met
Pattern 2: Some Member Links Down, Others Up
Possible causes:
- Specific link's physical issue (fiber, transceiver, port)
- Speed mismatch on an individual member (all connections in a LAG must have the same bandwidth)
- Inconsistent port-channel member configuration on the CGW (for example, one member with speed or LACP settings that differ from the others)
Pattern 3: LAG Flaps Intermittently
Possible causes:
- LACP timer mismatch: AWS uses Fast (1-second LACPDUs); if the CGW uses Slow (30-second), failure detection on the CGW side is delayed
- Optical signal degradation (marginal dBm levels)
- Upstream maintenance events
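To confirm flapping rather than a single outage, count UP-to-DOWN transitions in a member's ConnectionState series retrieved from CloudWatch. This sketch assumes the values are ordered oldest to newest. GetMetricData returns newest first unless you set `ScanBy="TimestampAscending"`. The function name is hypothetical.

```python
def count_flaps(values: list) -> int:
    """Count UP->DOWN transitions in a time-ordered ConnectionState series (1=UP, 0=DOWN)."""
    flaps = 0
    # Compare each sample with the one before it.
    for previous, current in zip(values, values[1:]):
        if previous >= 1 and current < 1:
            flaps += 1
    return flaps
```

Run it for each member separately. Flaps on a single member suggest a per-link cause such as marginal optics. Flaps on all members at the same time point more toward timers or maintenance.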
Pattern 4: Active Links Below Minimum (LAG Down Although Some Links Are Up)
Possible causes:
- Active member count dropped below minimumLinks setting → entire LAG goes down even if some links are physically up
- minimumLinks set higher than the number of connections actually available
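The four patterns can be approximated as a rough triage function over the AWS-side observations. This is a hypothetical sketch, not an AWS tool. `members_up` is assumed to be the number of members in the `available` state, and `flaps` is a count of UP-to-DOWN transitions. Checking for flapping first is a design choice, because a flapping LAG can look like any of the other patterns at a single point in time.

```python
def match_pattern(lag_state: str, members_up: int, members_total: int,
                  minimum_links: int, flaps: int) -> str:
    """Map AWS-side observations to the failure patterns in this article (rough triage)."""
    if flaps > 0:
        # Repeated UP/DOWN transitions: timers, marginal optics, or maintenance.
        return "Pattern 3: LAG flaps intermittently"
    if lag_state != "available":
        if 0 < members_up < minimum_links:
            # Some links are up, but not enough to satisfy minimumLinks.
            return "Pattern 4: active links below minimumLinks"
        return "Pattern 1: entire LAG is down"
    if members_up < members_total:
        return "Pattern 2: some member links down, others up"
    return "No pattern matched: LAG and all members are up"
```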
LACP Configuration Requirements for AWS Direct Connect
Verify that CGW-side configuration meets the following requirements:
LACP Mode
- AWS side is always Active
- CGW side must be Active or Passive (dynamic LACP)
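- Passive on both ends never negotiates; because the AWS side is always Active, Passive on the CGW side is sufficient
- Static mode ("on") is not supported: AWS requires LACP negotiation, so a static bundle will not come up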
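The mode rules above reduce to a small compatibility check. In this sketch, `lacp_negotiates` is a hypothetical name, and the mode strings are assumed to be `active`, `passive`, or `on`.

```python
NEGOTIATING_MODES = {"active", "passive"}


def lacp_negotiates(local_mode: str, partner_mode: str) -> bool:
    """Return True if LACP negotiation can succeed between these two modes."""
    local, partner = local_mode.lower(), partner_mode.lower()
    # Static "on" sends no LACPDUs, so no LACP negotiation takes place.
    if local not in NEGOTIATING_MODES or partner not in NEGOTIATING_MODES:
        return False
    # At least one side must be Active to start sending LACPDUs.
    return "active" in (local, partner)
```

Because the AWS side is always Active, only the CGW mode matters in practice: `lacp_negotiates("passive", "active")` is `True`, and `lacp_negotiates("on", "active")` is `False`.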
LACP Timer
- AWS uses Fast mode (1-second interval)
- Recommendation: Set CGW side to Fast mode for faster detection and recovery during link failures
- Mismatch (AWS Fast / CGW Slow) may cause delayed failover detection
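The detection delay follows from LACP's timeout rule. A partner is timed out after three LACPDU intervals pass without an LACPDU: 1 second each with Fast, 30 seconds each with Slow. The following sketch shows that arithmetic; the names are illustrative.

```python
# LACPDU transmit interval for each LACP rate, in seconds.
PERIODIC_INTERVAL_S = {"fast": 1, "slow": 30}
# A partner is timed out after three intervals without an LACPDU.
MISSED_LACPDUS = 3


def detection_time_s(rate: str) -> int:
    """Approximate time, in seconds, to declare a silent partner down at this rate."""
    return PERIODIC_INTERVAL_S[rate.lower()] * MISSED_LACPDUS
```

This gives 3 seconds with Fast and 90 seconds with Slow, which is why a Slow CGW can keep forwarding onto a failed member far longer than AWS does.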
Member Link Requirements
- All connections in a LAG must have the same bandwidth (mixing 1G and 10G is not supported)
- All connections must terminate at the same AWS Direct Connect endpoint
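AWS enforces these rules when connections are associated with a LAG, but a quick check of the `describe-lags` output can still rule them out during troubleshooting. This sketch assumes that the per-connection `awsDeviceV2` field (falling back to `awsDevice`) identifies the endpoint a member terminates on. The function name is hypothetical.

```python
def member_requirement_problems(lag: dict) -> list:
    """Check one LAG entry from describe-lags output against the member requirements."""
    members = lag.get("connections", [])
    problems = []
    bandwidths = {str(c.get("bandwidth")) for c in members}
    if len(bandwidths) > 1:
        problems.append(f"mixed bandwidths: {sorted(bandwidths)}")
    # Assumption: the AWS device name identifies the Direct Connect endpoint.
    devices = {str(c.get("awsDeviceV2") or c.get("awsDevice")) for c in members}
    if len(devices) > 1:
        problems.append(f"members on different AWS devices: {sorted(devices)}")
    return problems
```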
Minimum Links
- If the number of available connections falls below the configured minimumLinks value, the entire LAG goes down
- Review this setting when adding or removing connections from a LAG
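When planning to add or remove connections, it helps to compute how many more member failures the LAG can absorb. This sketch treats the effective minimum as at least one, because a LAG with no working members is down even when minimumLinks is 0. Both function names are hypothetical.

```python
def failure_headroom(available_links: int, minimum_links: int) -> int:
    """Member failures the LAG can absorb before going down; -1 if already down."""
    # A LAG with no working members is down even when minimumLinks is 0.
    effective_minimum = max(minimum_links, 1)
    headroom = available_links - effective_minimum
    return headroom if headroom >= 0 else -1


def can_remove(available_links: int, minimum_links: int, to_remove: int) -> bool:
    """True if removing `to_remove` connections keeps the LAG up."""
    return failure_headroom(available_links - to_remove, minimum_links) >= 0
```

For example, a LAG with 4 available connections and minimumLinks 2 has a headroom of 2. Removing 3 connections (`can_remove(4, 2, 3)`) would take it down unless you lower minimumLinks first.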
CGW-Side Verification (Vendor Commands)
Note: Specific command syntax for each vendor may vary depending on the OS version, so please refer to the vendor's official documentation.
After narrowing down the issue using AWS-side diagnosis above, use the following vendor commands to verify CGW router configuration.
Verify Interface and LAG Operational Status
Junos OS:
show interfaces terse | match lag-name
show interfaces ae0
Cisco IOS:
show etherchannel summary
Verify LACP Activity Mode and Partner State
Junos OS:
show lacp interfaces interface-name
Cisco IOS:
show lacp neighbor
show etherchannel detail
Check that:
- Actor (CGW) mode is Active or Passive
- Partner (AWS) mode shows Active
- Partner System ID is consistent across all member links
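With many member links, the partner checks are easier to apply to values copied from the vendor output than by eye. This sketch assumes you have transcribed, for each member interface, the partner mode and partner system ID into a dictionary. The keys `partner_mode` and `partner_system_id` and the function name are hypothetical, not vendor output fields.

```python
def partner_problems(members: dict) -> list:
    """members maps interface name -> {"partner_mode": str, "partner_system_id": str}."""
    problems = []
    for ifname, info in sorted(members.items()):
        # The AWS partner should always report Active.
        mode = str(info.get("partner_mode", "")).lower()
        if mode != "active":
            problems.append(f"{ifname}: partner mode {info.get('partner_mode')!r}, expected Active")
    # Every member of one LAG should see the same partner system.
    system_ids = {str(info.get("partner_system_id")) for info in members.values()}
    if len(system_ids) > 1:
        problems.append(f"partner system IDs differ across members: {sorted(system_ids)}")
    return problems
```

A differing partner system ID usually means that member is cabled to, or negotiating with, a different device than the rest of the bundle.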
Verify LACP Timer Configuration
Junos OS:
show lacp interfaces interface-name
Cisco IOS:
show lacp internal
show lacp neighbor detail
Check that the CGW's LACP periodic rate is Fast (shown as Fast or Short, depending on the vendor), matching the AWS side.
Check Optical Signal Levels (CGW Side)
Junos OS:
show interfaces diagnostics optics interface-name | match dBm | except thre
Cisco IOS:
show interfaces transceiver
