Skip to content

Commit

Permalink
chassis-packet: Update arp_update script for FAILED and STALE check (s…
Browse files Browse the repository at this point in the history
…onic-net#16311)

chassis-packet: Update arp_update script for FAILED and STALE check (sonic-net#16311)

1. Fixing an issue with FAILED entry resolution retry.
Neighbor entries in arp table may sometimes enter a FAILED state when the far end is down and reports the state as follows:
2603:10e2:400:3::1 dev PortChannel19 router FAILED
While the arp_update script handles the entries for FAILED in the following format, the above was not handled due to the token location (extra router keyword at index 4):
2603:10e2:400:3::1 dev PortChannel19 FAILED

The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down.

2. Refreshing STALE entries to make sure the far end is reachable.
STALE entries for some backend ports may appear in chassis-packet when no traffic is received for a while on the port. When the far end goes down, it is expected for BFD to stop sending packets on the session for which the far end is not reachable. But as the entry is known as stale, on the Cisco chassis, BFD keeps sending packets. Refreshing the stale entry will keep active links as reachable in the neighbor table while the entries for the far end down will enter a failed state. FAILED state entries will be retired and entered reachable when far end comes back up.
  • Loading branch information
anamehra authored Sep 1, 2023
1 parent 566b5df commit f6897bb
Showing 1 changed file with 12 additions and 6 deletions.
18 changes: 12 additions & 6 deletions files/scripts/arp_update
Original file line number Diff line number Diff line change
Expand Up @@ -25,29 +25,35 @@ while /bin/true; do
for i in ${!STATIC_ROUTE_NEXTHOPS[@]}; do
nexthop="${STATIC_ROUTE_NEXTHOPS[i]}"
if [[ $nexthop == *"."* ]]; then
neigh_state=( $(ip -4 neigh show | grep -w $nexthop | tr -s ' ' | cut -d ' ' -f 3,4) )
neigh_state=$(ip -4 neigh show | grep -w $nexthop | tr -s ' ')
ping_prefix=ping
elif [[ $nexthop == *":"* ]] ; then
neigh_state=( $(ip -6 neigh show | grep -w $nexthop | tr -s ' ' | cut -d ' ' -f 3,4) )
neigh_state=$(ip -6 neigh show | grep -w $nexthop | tr -s ' ')
ping_prefix=ping6
fi
if [[ -z "${neigh_state}" ]] || [[ "${neigh_state[1]}" == "INCOMPLETE" ]] || [[ "${neigh_state[1]}" == "FAILED" ]]; then
# Check if there is an INCOMPLETE, FAILED, or STALE entry and try to resolve it again.
# STALE entries may be present if there is no traffic on a path. A far-end down event may not
# clear the STALE entry. Refresh the STALE entry to clear the table.
if [[ -z "${neigh_state}" ]] || [[ -n $(echo ${neigh_state} | grep 'INCOMPLETE\|FAILED\|STALE') ]]; then
interface="${STATIC_ROUTE_IFNAMES[i]}"
if [[ -z "$interface" ]]; then
# should never be here, handling just in case
logger "ERR: arp_update: missing interface entry for static route $nexthop"
interface=${neigh_state[0]}
continue
fi
intf_up=$(ip link show $interface | grep "state UP")
if [[ -n "$intf_up" ]]; then
pingcmd="timeout 0.2 $ping_prefix -I ${interface} -n -q -i 0 -c 1 -W 1 $nexthop >/dev/null"
eval $pingcmd
logger "arp_update: static route nexthop not resolved, pinging $nexthop on ${neigh_state[0]}"
# STALE entries may appear more often, not logging to prevent periodic syslogs
if [[ -z $(echo ${neigh_state} | grep 'STALE') ]]; then
logger "arp_update: static route nexthop not resolved ($neigh_state), pinging $nexthop on $interface"
fi
fi
fi
done

sleep 300
sleep 150
continue
fi
# find L3 interfaces which are UP, send ipv6 multicast pings
Expand Down

0 comments on commit f6897bb

Please sign in to comment.