
How to troubleshoot LACP issues on a Direct Connect LAG

5-minute read
Content level: Intermediate

Learn how to troubleshoot issues that may occur when using LACP (Link Aggregation Control Protocol) with Direct Connect.

Overview

This article provides a structured approach to diagnose and resolve LACP issues on AWS Direct Connect LAG.


Resolution

What is LACP?

LACP (Link Aggregation Control Protocol) was originally defined in IEEE 802.3ad and is now specified in IEEE 802.1AX. It lets two directly connected devices negotiate grouping multiple Ethernet interfaces into a single link-layer interface. In AWS Direct Connect, a LAG (Link Aggregation Group) is a logical interface that uses LACP to aggregate multiple dedicated connections at a single AWS Direct Connect endpoint, so that you can manage them as a single connection.


AWS-Side Diagnosis

Before checking vendor equipment, narrow down the scope of the problem using the AWS Direct Connect console, the AWS CLI, and Amazon CloudWatch.

Check LAG and Connection State

AWS Console:

  1. Direct Connect Console → LAGs
  2. Verify the LAG status and the state of each member connection
  3. Check the "Minimum Links" configuration
  4. Verify that the number of connections in the "Available" state meets the Minimum Links value

AWS CLI:

aws directconnect describe-lags --lag-id dxlag-00000
aws directconnect describe-connections --connection-id dxcon-00000
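If you want to check the CLI output programmatically, the following Python sketch summarizes the JSON returned by `aws directconnect describe-lags`. It assumes the standard JSON output fields (`lags`, `lagId`, `lagState`, `minimumLinks`, `connections`, `connectionId`, `connectionState`); the `summarize_lags` helper is hypothetical and only meant for illustration.

```python
import json


def summarize_lags(describe_lags_json: str) -> list:
    """Summarize each LAG in `aws directconnect describe-lags` JSON output.

    Hypothetical helper: reports lagState, minimumLinks, how many member
    connections are "available", and which members are not.
    """
    summaries = []
    for lag in json.loads(describe_lags_json).get("lags", []):
        # Map each member connection ID to its current connectionState.
        states = {c["connectionId"]: c["connectionState"] for c in lag.get("connections", [])}
        available = sum(1 for state in states.values() if state == "available")
        minimum = lag.get("minimumLinks", 0)
        summaries.append({
            "lagId": lag["lagId"],
            "lagState": lag["lagState"],
            "minimumLinks": minimum,
            "available": available,
            "meetsMinimum": available >= minimum,
            "notAvailable": sorted(cid for cid, state in states.items() if state != "available"),
        })
    return summaries
```

For example, save the output of `aws directconnect describe-lags --lag-id dxlag-00000 --output json` to a file and pass its contents to `summarize_lags`. A `lagState` other than `available` together with `meetsMinimum` set to `False` points towards Pattern 4 below.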

CloudWatch Metrics:

  • ConnectionState: status of each member connection (1 = up, 0 = down). Expected value: 1
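To distinguish flapping from a single outage, you can pull ConnectionState datapoints with `aws cloudwatch get-metric-statistics --namespace AWS/DX --metric-name ConnectionState --dimensions Name=ConnectionId,Value=dxcon-00000 --statistics Minimum` (plus `--start-time`, `--end-time`, and `--period`) and count up-to-down transitions. This Python sketch assumes that JSON output shape (`Datapoints` entries with `Timestamp` and `Minimum`); both helper names are hypothetical.

```python
import json
from datetime import datetime


def connection_state_series(get_metric_statistics_json: str) -> list:
    """Return time-ordered ConnectionState values (1 = up, 0 = down).

    Uses the Minimum statistic, so any period that contained down time reads as 0.
    """
    points = json.loads(get_metric_statistics_json).get("Datapoints", [])
    # Datapoints are not guaranteed to be ordered, so sort by timestamp first.
    # A trailing "Z" is normalized so older Python versions can parse it.
    points.sort(key=lambda p: datetime.fromisoformat(p["Timestamp"].replace("Z", "+00:00")))
    return [int(p["Minimum"]) for p in points]


def count_down_events(series: list) -> int:
    """Count transitions from up (1) to down (0) in a time-ordered series."""
    return sum(1 for prev, cur in zip(series, series[1:]) if prev == 1 and cur == 0)
```

A count above zero while the LAG is currently up is consistent with Pattern 3 below. Because each datapoint covers a whole period, several short flaps inside one period show up as a single 0, so treat the count as a lower bound.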

Check Optical Signal Levels

Verify optical power levels to rule out physical layer issues before investigating LACP.

CloudWatch Metrics:

  • ConnectionLightLevelRx — Receive signal strength (dBm)
  • ConnectionLightLevelTx — Transmit signal strength (dBm)

Expected: Optical power levels should be within the acceptable range (-14.4 to 2.50 dBm for 1G and 10G connections). If out of range, the issue is physical — not LACP.
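The range check can be written as a small Python helper. The bounds are the 1G/10G values quoted above; the `margin_db` parameter, which flags readings close to either bound as "marginal" (relevant to Pattern 3 below), is a hypothetical threshold chosen for illustration, not an AWS value.

```python
# Acceptable optical power range for 1G and 10G connections, as quoted in this article (dBm).
MIN_DBM = -14.4
MAX_DBM = 2.50


def classify_light_level(dbm: float, low: float = MIN_DBM, high: float = MAX_DBM,
                         margin_db: float = 1.0) -> str:
    """Classify a ConnectionLightLevelRx or ConnectionLightLevelTx reading.

    "out-of-range": physical-layer problem, not LACP.
    "marginal":     inside the range but within margin_db of a bound (hypothetical threshold).
    "ok":           comfortably inside the range.
    """
    if dbm < low or dbm > high:
        return "out-of-range"
    if dbm - low < margin_db or high - dbm < margin_db:
        return "marginal"
    return "ok"
```

Run it on both the Rx and Tx values; if either returns "out-of-range", investigate the fiber, transceiver, or cross-connect before looking at LACP.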

Check AWS Health Dashboard

AWS Direct Connect connections can go down due to planned or emergency maintenance.

Action:

  • Check the Events section of the AWS Health Dashboard
  • Look for ongoing or recently completed maintenance affecting your Direct Connect connection

Note: During maintenance, BGP connections may transition to idle state, which can last from minutes to hours.


Common LACP Failure Patterns

Use the AWS-side diagnosis results above to match your issue to the patterns below:

Pattern 1: Entire LAG is Down

Possible causes:

  • Both sides set to LACP Passive (neither initiates negotiation)
  • CGW (customer gateway) side configured in static "on" mode instead of dynamic LACP mode
  • All member links physically down (check optical levels)
  • minimumLinks threshold not met

Pattern 2: Some Member Links Down, Others Up

Possible causes:

  • Specific link's physical issue (fiber, transceiver, port)
  • Speed mismatch on an individual member (all connections must have the same bandwidth)
  • Vendor-side port-channel member configuration inconsistency

Pattern 3: LAG Flaps Intermittently

Possible causes:

  • LACP timer mismatch — AWS uses Fast (1-second); if a CGW uses Slow (30-second), detection is delayed
  • Optical signal degradation (marginal dBm levels)
  • Upstream maintenance events

Pattern 4: Member Links Up but LAG Down (Active Links Below Minimum)

Possible causes:

  • Active member count dropped below the minimumLinks setting → the entire LAG goes down even if some links are physically up
  • minimumLinks set too high relative to the number of connections actually available
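The four patterns can be approximated as a triage function. This Python sketch is a heuristic under stated assumptions: `member_up` comes from ConnectionState (or `connectionState == "available"`), `lag_up` from `lagState`, and `recent_down_events` from a count of up-to-down transitions; the `LagObservation` type and `match_pattern` function are hypothetical. It only suggests where to look first; it cannot, for example, tell a Passive/Passive misconfiguration apart from other Pattern 1 causes.

```python
from dataclasses import dataclass


@dataclass
class LagObservation:
    member_up: list              # one bool per member connection
    minimum_links: int           # minimumLinks of the LAG
    lag_up: bool                 # True if lagState is "available"
    recent_down_events: int = 0  # up-to-down transitions seen recently


def match_pattern(obs: LagObservation) -> int:
    """Return the best-matching pattern number (1-4), or 0 if everything looks healthy."""
    up = sum(1 for is_up in obs.member_up if is_up)
    if not obs.lag_up:
        # Some members up, but fewer than minimumLinks.
        if 0 < up < obs.minimum_links:
            return 4
        # No members up, or members up but LACP not negotiating.
        return 1
    if obs.recent_down_events > 0:
        return 3
    if up < len(obs.member_up):
        return 2
    return 0
```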

LACP Configuration Requirements for AWS Direct Connect

Verify that CGW-side configuration meets the following requirements:

LACP Mode

  • AWS side is always Active
  • CGW side must be Active or Passive (dynamic LACP)
  • Both sides Passive → LAG will never come up
  • Static mode ("on") is not supported, because AWS requires LACP negotiation
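The mode rules reduce to two checks: neither side may use static "on" mode, and at least one side must be Active so that someone starts sending LACPDUs. A minimal Python sketch (the function name is hypothetical; the AWS default of "active" follows this article):

```python
VALID_MODES = {"active", "passive", "on"}


def lacp_modes_compatible(cgw_mode: str, aws_mode: str = "active") -> bool:
    """Return True if the CGW and AWS LACP modes can bring a Direct Connect LAG up."""
    modes = {cgw_mode.lower(), aws_mode.lower()}
    if not modes <= VALID_MODES:
        raise ValueError(f"unknown LACP mode in {sorted(modes)}")
    if "on" in modes:
        # Static mode sends no LACPDUs, and AWS requires LACP negotiation.
        return False
    # Passive only answers LACPDUs, so at least one side must be Active.
    return "active" in modes
```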

LACP Timer

  • AWS uses Fast mode (1-second interval)
  • Recommendation: Set CGW side to Fast mode for faster detection and recovery during link failures
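  • Mismatch (AWS Fast / CGW Slow) may delay failover detection: with Slow, the CGW declares the AWS side gone only after three missed LACPDUs (about 90 seconds) instead of about 3 seconds, for failures that do not bring the physical link down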
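The numbers come from the LACP timeout rule in IEEE 802.1AX: partner information expires after three LACPDU intervals without a received LACPDU. As a worked Python sketch (constant and function names are illustrative):

```python
# LACPDU transmit intervals from IEEE 802.1AX: fast (short timeout) and slow (long timeout).
PERIODIC_SECONDS = {"fast": 1, "slow": 30}
# Partner information expires after three intervals without an LACPDU.
MISSED_LACPDUS = 3


def lacp_timeout_seconds(rate: str) -> int:
    """Time a side waits before declaring its partner gone, given its configured timer."""
    return MISSED_LACPDUS * PERIODIC_SECONDS[rate.lower()]
```

Failures that take the physical link down are normally detected through loss of signal, independently of this timer; the timer matters when the link stays up but LACPDUs stop arriving.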

Member Link Requirements

  • All connections in a LAG must have the same bandwidth (mixing 1G and 10G is not supported)
  • All connections must terminate at the same AWS Direct Connect endpoint
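Both requirements can be checked against the `connections` list in `describe-lags` or `describe-connections` output. The sketch assumes the `bandwidth` field and the `awsDeviceV2` field (the Direct Connect endpoint that terminates each connection); `check_members` is a hypothetical helper.

```python
def check_members(connections: list) -> list:
    """Return problems with a LAG's member connections (empty list means consistent)."""
    problems = []
    bandwidths = {c["bandwidth"] for c in connections}
    endpoints = {c["awsDeviceV2"] for c in connections}
    if len(bandwidths) > 1:
        problems.append(f"mixed bandwidths: {sorted(bandwidths)}")
    if len(endpoints) > 1:
        problems.append(f"multiple AWS endpoints: {sorted(endpoints)}")
    return problems
```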

Minimum Links

  • If the number of active links falls below the configured minimum-links value, the entire LAG goes down
  • Review this setting when adding or removing connections from a LAG
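Before adding or removing connections, it helps to compute how many member failures the LAG can absorb. A minimal Python sketch with hypothetical helper names; it treats a minimumLinks of 0 as 1, because a LAG with no working members cannot carry traffic.

```python
def failure_headroom(available_links: int, minimum_links: int) -> int:
    """How many more members can fail before the LAG goes down (negative: already below minimum)."""
    return available_links - max(minimum_links, 1)


def safe_to_remove(available_links: int, minimum_links: int, to_remove: int) -> bool:
    """True if removing `to_remove` available members still leaves the LAG operational."""
    return failure_headroom(available_links - to_remove, minimum_links) >= 0
```

For example, with 4 available connections and minimumLinks set to 3, the headroom is 1, so removing two connections would take the whole LAG down.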

CGW-Side Verification (Vendor Commands)

Note: Specific command syntax for each vendor may vary depending on the OS version, so please refer to the vendor's official documentation.

After narrowing down the issue using AWS-side diagnosis above, use the following vendor commands to verify CGW router configuration.

Verify Interface and LAG Operational Status

Junos OS:

show interfaces terse | match lag-name
show interfaces ae0

Cisco IOS:

show etherchannel summary

Verify LACP Activity Mode and Partner State

Junos OS:

show lacp interfaces interface-name

Cisco IOS:

show lacp neighbor
show etherchannel detail

Check that:

  • Actor (CGW) mode is Active or Passive
  • Partner (AWS) mode shows Active
  • Partner System ID is consistent across all member links
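Some platforms print the LACP actor and partner state as a single hexadecimal octet (Cisco IOS `show lacp neighbor`, for example, typically shows it in the State column). The bit layout is defined in IEEE 802.1AX, and this Python sketch decodes it. The `partner_looks_healthy` rule is a heuristic that assumes, as this article states, that the AWS partner runs Active with a Fast (short) timer.

```python
# Bits of the LACP actor/partner state octet (IEEE 802.1AX), least significant bit first.
STATE_BITS = [
    "activity_active",  # bit 0: 1 = Active, 0 = Passive
    "timeout_short",    # bit 1: 1 = Short (Fast), 0 = Long (Slow)
    "aggregatable",     # bit 2
    "synchronized",     # bit 3
    "collecting",       # bit 4
    "distributing",     # bit 5
    "defaulted",        # bit 6: using default partner information
    "expired",          # bit 7: partner information has expired
]


def decode_lacp_state(state: int) -> dict:
    """Decode a one-octet LACP state value into named flags."""
    if not 0 <= state <= 0xFF:
        raise ValueError("LACP state is a single octet (0-255)")
    return {name: bool((state >> bit) & 1) for bit, name in enumerate(STATE_BITS)}


def partner_looks_healthy(partner_state: int) -> bool:
    """Heuristic: partner is Active, Fast, in sync, collecting, distributing, not defaulted/expired."""
    s = decode_lacp_state(partner_state)
    return (s["activity_active"] and s["timeout_short"] and s["synchronized"]
            and s["collecting"] and s["distributing"]
            and not s["defaulted"] and not s["expired"])
```

For example, 0x3D decodes to Active, Long (Slow) timeout, aggregatable, in sync, collecting, and distributing. A Defaulted or Expired bit indicates that LACPDUs are not being exchanged normally in at least one direction.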

Verify LACP Timer Configuration

Junos OS:

show lacp interfaces interface-name

Cisco IOS:

show lacp internal
show lacp neighbor detail

Check that the LACP periodic timer is Fast (short timeout), matching the AWS side.

Check Optical Signal Levels (CGW Side)

Junos OS:

show interfaces diagnostics optics interface-name | match dBm | except thre

Cisco IOS:

show interfaces transceiver

Related Information

AWS Expert
Published 3 months ago