path: root/zebra/rib.h
author Donald Sharp <sharpd@nvidia.com> 2022-10-21 07:20:44 -0400
committer Donald Sharp <sharpd@nvidia.com> 2022-10-26 15:06:23 -0400
commit 8d4665aabfba6dc2da854d6cb5cd439930c1ea76
tree cf3ddf393f4903adb50021f42ee0e18839f75283 /zebra/rib.h
parent 659800f3c1db5a7edb97e5d09a9a1817759d02e6
zebra: Fix handling of recursive routes when processing closely in time

When zebra receives routes from upper level protocols it decodes the zapi message and places the routes on the metaQ for processing. Suppose we have a route A that is already installed by some routing protocol, and a route B that has a nexthop that will be recursively resolved through A. Imagine that a route replace operation for A is going to happen from an upper level protocol at about the same time that route B is going to be installed into zebra. If these routes are received, and decoded, at about the same time there is a chance that the metaQ will contain both of them at the same time.

If the order of installation is [ B, A ], B will be resolved correctly through A and installed, then A will be processed and re-installed into the FIB. If the nexthops have changed for A then the owner of B should be notified about the change (and the owner of B can take the correct action here and decide to withdraw or re-install).

Now imagine that the order of routes received for processing on the metaQ is [ A, B ]. A will be received, processed and sent to the dataplane for reinstall. B will then be pulled off the metaQ and fail the install since A is in a `not Installed` state.

Let's loosen the restriction in nexthop resolution for B such that if the route we are dependent on is undergoing a route replace operation, the resolution is allowed to succeed. This requires zebra to track a new route state ( ROUTE_ENTRY_ROUTE_REPLACING ) that can be looked at during nexthop resolution.

I believe this is ok because A is a route replace operation, which could result in one of the following:

- route install failed. In this case B should be nht'ing and will receive the nht failure, and the upper level protocol should remove B.
- route install succeeded, no nexthop changes. In this case allowing the resolution for B is ok. NHT will not notify the upper level protocol, so no action is needed.
- route install succeeded, nexthops changed. In this case allowing the resolution for B is ok. NHT will notify the upper level protocol and it can decide to reinstall B or not based upon its own algorithm.

This set of events was found by the bgp_distance_change topotest(s). Effectively the tests were looking for the bug ( A, B order in the metaQ ) as the `correct` state. Under very heavy load, with the A, B ordering, A could be installed and fully resolved in the dataplane before B was processed (which is entirely possible).

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
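
The loosened resolution rule described in the commit message can be sketched roughly as follows. This is not the code from the commit, only a minimal illustration: `struct sketch_route`, `SKETCH_ROUTE_INSTALLED` (and its value), `sketch_can_resolve_through()` and `sketch_demo()` are hypothetical names invented here. Only `ROUTE_ENTRY_ROUTE_REPLACING` and its value 0x80 come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value as added by this commit in zebra/rib.h. */
#define ROUTE_ENTRY_ROUTE_REPLACING 0x80

/* Hypothetical "installed" flag; the value is assumed for illustration. */
#define SKETCH_ROUTE_INSTALLED 0x10

/* Hypothetical, heavily reduced stand-in for struct route_entry. */
struct sketch_route {
	uint32_t status;
};

/*
 * Hypothetical helper: may a nexthop be resolved through this route?
 * Before the change only an installed route qualified. The loosened
 * rule also accepts a route that is in the middle of a route replace.
 */
static bool sketch_can_resolve_through(const struct sketch_route *re)
{
	if (re->status & SKETCH_ROUTE_INSTALLED)
		return true;

	return (re->status & ROUTE_ENTRY_ROUTE_REPLACING) != 0;
}

/* Walk through the [ A, B ] ordering from the commit message. */
static void sketch_demo(void)
{
	struct sketch_route a = { .status = SKETCH_ROUTE_INSTALLED };

	/* A is pulled off the metaQ and sent to the dataplane for replace. */
	a.status &= ~(uint32_t)SKETCH_ROUTE_INSTALLED;
	a.status |= ROUTE_ENTRY_ROUTE_REPLACING;

	/* B is processed next and can still resolve through A. */
	assert(sketch_can_resolve_through(&a));

	/* The dataplane reports success: A is installed again. */
	a.status &= ~(uint32_t)ROUTE_ENTRY_ROUTE_REPLACING;
	a.status |= SKETCH_ROUTE_INSTALLED;
	assert(sketch_can_resolve_through(&a));
}
```

Calling `sketch_demo()` from a test harness walks through the [ A, B ] case. If the replace fails instead, the sketch would clear both flags, B would no longer resolve, and per the commit message NHT is expected to report the failure to the owner of B.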
Diffstat (limited to 'zebra/rib.h')
-rw-r--r-- zebra/rib.h 7
1 files changed, 7 insertions, 0 deletions
diff --git a/zebra/rib.h b/zebra/rib.h
index 99f52bcd4e..166500fa5c 100644
--- a/zebra/rib.h
+++ b/zebra/rib.h
@@ -158,6 +158,13 @@ struct route_entry {
* differs from the rib/normal set of nexthops.
*/
#define ROUTE_ENTRY_USE_FIB_NHG 0x40
+/*
+ * Route entries that are going to the dplane for a Route Replace
+ * let's note the fact that this is happening. This will
+ * be useful when zebra is determining if a route can be
+ * used for nexthops
+ */
+#define ROUTE_ENTRY_ROUTE_REPLACING 0x80
/* Sequence value incremented for each dataplane operation */
uint32_t dplane_sequence;