From dc83d712b14805ccb03101c655a0f377c4dfe6f5 Mon Sep 17 00:00:00 2001 From: Donald Sharp Date: Tue, 19 May 2015 18:04:12 -0700 Subject: [PATCH] When a peer that is Established goes down, it is moved into the Clearing state to facilitate clearing of the routes received from the peer - remove from the RIB, reselect best path, update/delete from Zebra and to other peers etc. At the end of this, a Clearing_Completed event is generated to the FSM which will allow the peer to move out of Clearing to Idle. The issue in the code is that there is a possibility of multiple Clearing Completed events being generated for a peer, one per AFI/SAFI. Upon the first such event, the peer would move to Idle. If other events happened (e.g., new connection got established) before the last Clearing_Completed event is received, bad things can happen. Fix to ensure only one Clearing_Completed event is generated. --- bgpd/bgp_fsm.c | 18 +++++++++++++++++- bgpd/bgp_route.c | 32 ++++++++++---------------------- 2 files changed, 27 insertions(+), 23 deletions(-) diff --git a/bgpd/bgp_fsm.c b/bgpd/bgp_fsm.c index 683d8da31a..84710c0553 100644 --- a/bgpd/bgp_fsm.c +++ b/bgpd/bgp_fsm.c @@ -911,7 +911,23 @@ bgp_fsm_change_status (struct peer *peer, int status) * (and must do so before actually changing into Deleted.. */ if (status >= Clearing) - bgp_clear_route_all (peer); + { + bgp_clear_route_all (peer); + + /* If no route was queued for the clear-node processing, generate the + * completion event here. This is needed because if there are no routes + * to trigger the background clear-node thread, the event won't get + * generated and the peer would be stuck in Clearing. Note that this + * event is for the peer and helps the peer transition out of Clearing + * state; it should not be generated per (AFI,SAFI). The event is + * directly posted here without calling clear_node_complete() as we + * shouldn't do an extra unlock. This event will get processed after + * the state change that happens below, so peer will be in Clearing + * (or Deleted). + */ + if (!peer->clear_node_queue->thread) + BGP_EVENT_ADD (peer, Clearing_Completed); + } /* Preserve old status and change into new status. */ peer->ostatus = peer->status; diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c index 47e3c72e74..b29d6ea884 100644 --- a/bgpd/bgp_route.c +++ b/bgpd/bgp_route.c @@ -3968,8 +3968,13 @@ bgp_clear_route (struct peer *peer, afi_t afi, safi_t safi, * on the process_main queue. Fast-flapping could cause that queue * to grow and grow. */ + + /* lock peer in assumption that clear-node-queue will get nodes; if so, + * the unlock will happen upon work-queue completion; other wise, the + * unlock happens at the end of this function. + */ if (!peer->clear_node_queue->thread) - peer_lock (peer); /* bgp_clear_node_complete */ + peer_lock (peer); switch (purpose) { @@ -3996,28 +4001,11 @@ bgp_clear_route (struct peer *peer, afi_t afi, safi_t safi, assert (0); break; } - - /* If no routes were cleared, nothing was added to workqueue, the - * completion function won't be run by workqueue code - call it here. - * XXX: Actually, this assumption doesn't hold, see - * bgp_clear_route_table(), we queue all non-empty nodes. - * - * Additionally, there is a presumption in FSM that clearing is only - * really needed if peer state is Established - peers in - * pre-Established states shouldn't have any route-update state - * associated with them (in or out). - * - * We still can get here in pre-Established though, through - * peer_delete -> bgp_fsm_change_status, so this is a useful sanity - * check to ensure the assumption above holds. - * - * At some future point, this check could be move to the top of the - * function, and do a quick early-return when state is - * pre-Established, avoiding above list and table scans. Once we're - * sure it is safe.. - */ + + /* unlock if no nodes got added to the clear-node-queue. */ if (!peer->clear_node_queue->thread) - bgp_clear_node_complete (peer->clear_node_queue); + peer_unlock (peer); + } void -- 2.39.5