From: Donald Sharp
Date: Wed, 30 Aug 2023 11:25:06 +0000 (-0400)
Subject: bgpd: Add peers back to peer hash when peer_xfer_conn fails
X-Git-Tag: docker/9.0.1~7^2
X-Git-Url: https://git.puffer.fish/?a=commitdiff_plain;h=refs%2Fpull%2F14309%2Fhead;p=mirror%2Ffrr.git

bgpd: Add peers back to peer hash when peer_xfer_conn fails

It was noticed that peering occasionally failed in a testbed. Upon
investigation, it was found that the peer was not in the peer hash,
and we saw these failure messages:

Aug 25 21:31:15 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: %NOTIFICATION: sent to neighbor 2001:cafe:1ead:4::4 4/0 (Hold Timer Expired) 0 bytes
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] %bgp_getsockname() failed for peer 2001:cafe:1ead:4::4 fd 27 (from_peer fd -1)
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 33554464] %Neighbor failed in xfer_conn

root@doca-hbn-service-bf3-s06-1-ipmi:/var/log/hbn/frr# vtysh -c 'show bgp peerhash' | grep 2001:cafe:1ead:4::4
root@doca-hbn-service-bf3-s06-1-ipmi:/var/log/hbn/frr#

Upon looking at the code, the peer_xfer_conn function can fail, and the
bgp_establish code will then return before adding the peer back to
the peerhash.

This is only part of the failure. The peer also appears to be in a state
where it is no longer initiating connection attempts, but that will be
fixed in a separate commit when we figure that one out.

Signed-off-by: Donald Sharp
(cherry picked from commit 6f8c927b03c454aa309b84cefccc4faa31e0c03f)
---

diff --git a/bgpd/bgp_fsm.c b/bgpd/bgp_fsm.c
index 09b35bc7e7..00aefafb3d 100644
--- a/bgpd/bgp_fsm.c
+++ b/bgpd/bgp_fsm.c
@@ -2116,6 +2116,17 @@ static enum bgp_fsm_state_progress bgp_establish(struct peer *peer)
 	peer = peer_xfer_conn(peer);
 	if (!peer) {
 		flog_err(EC_BGP_CONNECT, "%%Neighbor failed in xfer_conn");
+
+		/*
+		 * A failure of peer_xfer_conn but not putting the peers
+		 * back in the hash ends up with a situation where incoming
+		 * connections are rejected, as that the peer is not found
+		 * when a lookup is done
+		 */
+		(void)hash_get(peer->bgp->peerhash, peer, hash_alloc_intern);
+		if (other)
+			(void)hash_get(other->bgp->peerhash, other,
+				       hash_alloc_intern);
 		return BGP_FSM_FAILURE;
 	}
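
The release-then-restore pattern that the commit message describes can be
sketched in isolation. What follows is a minimal, self-contained C model and
not FRR code: toy_peer, toy_table, toy_xfer_conn and toy_establish are
hypothetical stand-ins for struct peer, the peerhash, peer_xfer_conn and
bgp_establish, and a small fixed-size array stands in for FRR's hash table.
One assumption of the model is worth calling out: because the transfer
function returns NULL on failure, the failure branch restores the entries
through pointers saved before the call rather than through the returned value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not FRR's real types. */
#define TOY_TABLE_SIZE 8

struct toy_peer {
	int id;
	struct toy_peer *doppelganger; /* paired peer, may be NULL */
};

struct toy_table {
	struct toy_peer *slot[TOY_TABLE_SIZE];
};

/* Return true if the peer is currently findable in the table. */
static bool toy_table_contains(const struct toy_table *t,
			       const struct toy_peer *p)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		if (t->slot[i] == p)
			return true;
	return false;
}

/* Insert the peer if absent; return false only if the table is full. */
static bool toy_table_add(struct toy_table *t, struct toy_peer *p)
{
	if (toy_table_contains(t, p))
		return true;
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!t->slot[i]) {
			t->slot[i] = p;
			return true;
		}
	}
	return false;
}

/* Remove the peer from the table if it is present. */
static void toy_table_release(struct toy_table *t, struct toy_peer *p)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		if (t->slot[i] == p)
			t->slot[i] = NULL;
}

/* Model of a connection transfer that yields NULL when it fails. */
static struct toy_peer *toy_xfer_conn(struct toy_peer *p, bool fail)
{
	return fail ? NULL : p;
}

/*
 * Release the peers, attempt the transfer, and put the peers back on
 * both the success path and the failure path.
 */
static bool toy_establish(struct toy_table *t, struct toy_peer *p, bool fail)
{
	struct toy_peer *orig = p; /* saved: p becomes NULL on failure */
	struct toy_peer *other = p->doppelganger;

	toy_table_release(t, orig);
	if (other)
		toy_table_release(t, other);

	p = toy_xfer_conn(orig, fail);
	if (!p) {
		/* Restore via the saved pointers so lookups keep working. */
		(void)toy_table_add(t, orig);
		if (other)
			(void)toy_table_add(t, other);
		return false;
	}

	(void)toy_table_add(t, p);
	if (other)
		(void)toy_table_add(t, other);
	return true;
}
```

In this model, calling toy_establish(&table, &peer, true) returns false but
leaves both the peer and its doppelganger findable in the table, which is the
property the fix is after.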