From 7717b1837cf844f62f36c1ca4ee74b61735232a8 Mon Sep 17 00:00:00 2001 From: Daniel Walton Date: Tue, 17 Nov 2015 02:09:57 +0000 Subject: [PATCH] BGP: crash in update_subgroup_merge() Signed-off-by: Daniel Walton Reviewed-by: Donald Sharp Reviewed-by: Vivek Venkatraman Ticket: CM-8191 On my hard node I have a route to 10.0.0.0/22 via eth0, I then learn 10.0.0.8/32 from peer 10.0.0.8 with a nexthop of 10.0.0.8: superm-redxp-05# show ip bgp 10.0.0.8/32 BGP routing table entry for 10.0.0.8/32 Paths: (1 available, no best path) Not advertised to any peer 80 10.0.0.8 from r8(swp6) (10.0.0.8) Origin IGP, metric 0, localpref 100, valid, external AddPath ID: RX 0, TX 9 Last update: Thu Nov 12 14:00:00 2015 superm-redxp-05# I do a lookup for the nexthop and see that 10.0.0.8 is reachable via my eth0 10.0.0.22 so I select a bestpath and install the route. At this point my route to 10.0.0.8 is a /32 that resolves via itself, NHT sees that this is illegal and flags the nexthop as inaccessible. superm-redxp-05# show ip bgp 10.0.0.8/32 BGP routing table entry for 10.0.0.8/32 Paths: (1 available, best #1, table Default-IP-Routing-Table) Advertised to non peer-group peers: r6(swp4) r7(swp5) r8(swp6) r2(10.1.2.2) r3(10.1.3.2) r4(10.1.4.2) 80 10.0.0.8 (inaccessible) from r8(swp6) (10.0.0.8) Origin IGP, metric 0, localpref 100, invalid, external, bestpath-from-AS 80, best AddPath ID: RX 0, TX 9 Last update: Thu Nov 12 14:00:00 2015 superm-redxp-05# at which point we withdraw the route, things churn, we relearn it and go through the whole process over and over again. We end up advertising and withdrawing this route about 9 times a second!! This exposed a crash in the update-group code where we try to merge two sub-groups but the assert on adj_count fails because the timing worked out where one had advertised 10.0.0.8/32 but the other had not. NOTE: the race condition described above will be resolved via a separate patch. --- bgpd/bgp_updgrp.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/bgpd/bgp_updgrp.c b/bgpd/bgp_updgrp.c index a250a7d4bd..3ba476d086 100644 --- a/bgpd/bgp_updgrp.c +++ b/bgpd/bgp_updgrp.c @@ -998,7 +998,7 @@ update_subgroup_find (struct update_group *updgrp, struct peer_af *paf) * Returns TRUE if this subgroup is in a state that allows it to be * merged into another subgroup. */ -static inline int +static int update_subgroup_ready_for_merge (struct update_subgroup *subgrp) { @@ -1032,7 +1032,7 @@ update_subgroup_ready_for_merge (struct update_subgroup *subgrp) * Returns TRUE if the first subgroup can merge into the second * subgroup. */ -static inline int +static int update_subgroup_can_merge_into (struct update_subgroup *subgrp, struct update_subgroup *target) { @@ -1051,22 +1051,10 @@ update_subgroup_can_merge_into (struct update_subgroup *subgrp, CHECK_FLAG(target->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE)) return 0; - /* - * If there are any adv entries on the target, then its adj-out (the - * set of advertised routes) does not match that of the other - * subgrp, and we cannot merge the two. - * - * The adj-out is used when generating a route refresh to a peer in - * a subgroup. If it is not accurate, say it is missing an entry, we - * may miss sending a withdraw for an entry as part of a refresh. - */ - if (!advertise_list_is_empty (target)) - return 0; - - if (update_subgroup_needs_refresh (target)) + if (subgrp->adj_count != target->adj_count) return 0; - return 1; + return update_subgroup_ready_for_merge (target); } /* -- 2.39.5