git.puffer.fish Git - mirror/frr.git/commitdiff
bgpd: Add peers back to peer hash when peer_xfer_conn fails 14309/head
author Donald Sharp <sharpd@nvidia.com>
Wed, 30 Aug 2023 11:25:06 +0000 (07:25 -0400)
committer Mergify <37929162+mergify[bot]@users.noreply.github.com>
Thu, 31 Aug 2023 01:13:04 +0000 (01:13 +0000)
It was noticed that peering occasionally failed in a testbed.
Upon investigation it was found that the peer was not in the
peer hash, and we saw these failure messages:

Aug 25 21:31:15 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: %NOTIFICATION: sent to neighbor 2001:cafe:1ead:4::4 4/0 (Hold Timer Expired) 0 bytes
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] %bgp_getsockname() failed for  peer 2001:cafe:1ead:4::4 fd 27 (from_peer fd -1)
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 33554464] %Neighbor failed in xfer_conn

root@doca-hbn-service-bf3-s06-1-ipmi:/var/log/hbn/frr# vtysh -c 'show bgp peerhash' | grep 2001:cafe:1ead:4::4
root@doca-hbn-service-bf3-s06-1-ipmi:/var/log/hbn/frr#

Looking at the code, the peer_xfer_conn function can fail,
and the bgp_establish code will then return before adding the
peer back to the peerhash.

This is only part of the failure.  The peer also appears to
be in a state where it is no longer initiating connection attempts,
but that will be fixed in a separate commit once we figure that one out.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 6f8c927b03c454aa309b84cefccc4faa31e0c03f)

bgpd/bgp_fsm.c

index 09b35bc7e7f92eda53abd7a6805876cc6e56a6e0..00aefafb3d993c02f568473599abfe7e20b003fc 100644
@@ -2116,6 +2116,17 @@ static enum bgp_fsm_state_progress bgp_establish(struct peer *peer)
        peer = peer_xfer_conn(peer);
        if (!peer) {
                flog_err(EC_BGP_CONNECT, "%%Neighbor failed in xfer_conn");
+
+               /*
+                * A failure of peer_xfer_conn but not putting the peers
+                * back in the hash ends up with a situation where incoming
+                * connections are rejected, as that the peer is not found
+                * when a lookup is done
+                */
+               (void)hash_get(peer->bgp->peerhash, peer, hash_alloc_intern);
+               if (other)
+                       (void)hash_get(other->bgp->peerhash, other,
+                                      hash_alloc_intern);
                return BGP_FSM_FAILURE;
        }
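
The pattern in the hunk above (put the peers back into the lookup table when the connection transfer fails) can be illustrated with a small standalone sketch. This is not FRR code: `peer_set`, `toy_peer`, `set_add`, `set_release`, `toy_xfer_conn` and `toy_establish` are hypothetical names, the array-backed set is only a stand-in for the real peer hash, and the sketch assumes the peers are taken out of the table before the transfer is attempted, as the commit message implies. Because `peer` has just tested as NULL inside the failure branch, the sketch restores the entries through a saved copy of the original pointer (`orig`, also a name introduced here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PEERS 8

struct toy_peer;

/* Toy stand-in for a peer hash: a fixed array of peer pointers. */
struct peer_set {
	struct toy_peer *slot[MAX_PEERS];
};

struct toy_peer {
	const char *host;
	struct peer_set *set;          /* table this peer is looked up in */
	struct toy_peer *doppelganger; /* paired peer, may be NULL */
	bool xfer_ok;                  /* simulates transfer success/failure */
};

bool set_contains(const struct peer_set *s, const struct toy_peer *p)
{
	for (size_t i = 0; i < MAX_PEERS; i++)
		if (s->slot[i] == p)
			return true;
	return false;
}

/* Idempotent insert: returns false only if the table is full. */
bool set_add(struct peer_set *s, struct toy_peer *p)
{
	if (set_contains(s, p))
		return true;
	for (size_t i = 0; i < MAX_PEERS; i++) {
		if (!s->slot[i]) {
			s->slot[i] = p;
			return true;
		}
	}
	return false;
}

void set_release(struct peer_set *s, struct toy_peer *p)
{
	for (size_t i = 0; i < MAX_PEERS; i++)
		if (s->slot[i] == p)
			s->slot[i] = NULL;
}

/* Hypothetical transfer step: returns the peer, or NULL on failure. */
struct toy_peer *toy_xfer_conn(struct toy_peer *p)
{
	return p->xfer_ok ? p : NULL;
}

/* Returns 0 on success, -1 on failure. */
int toy_establish(struct toy_peer *peer)
{
	struct toy_peer *orig = peer; /* survives the reassignment below */
	struct toy_peer *other = peer->doppelganger;

	assert(orig->set != NULL);

	/* Both peers leave the table while the transfer is attempted. */
	set_release(orig->set, orig);
	if (other)
		set_release(other->set, other);

	peer = toy_xfer_conn(peer);
	if (!peer) {
		/* peer is NULL here, so restore through the saved pointers. */
		(void)set_add(orig->set, orig);
		if (other)
			(void)set_add(other->set, other);
		return -1;
	}

	(void)set_add(peer->set, peer);
	return 0;
}
```

With two paired peers registered in one `peer_set`, a call to `toy_establish()` on a peer whose `xfer_ok` is false returns -1 and leaves both entries findable again, which is the property the fix is after: later lookups for incoming connections still find the peer.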