From 46a4e3455b9b1b9d48aaf8983f43867d0ad27080 Mon Sep 17 00:00:00 2001
From: Donald Sharp
Date: Mon, 4 Feb 2019 15:16:31 -0500
Subject: [PATCH] zebra: NHT was being run at least 2 times and misreporting data

With the data plane changes that were made, we are now running
nexthop tracking 2 times: once at the end of meta-queue insertion
and once at the end of receiving a bunch of data from the dataplane.

The addition of the data plane code caused flags to not be fully set
for the resolved routes (since we do not know the answer yet). This
in turn caused the nexthop tracking run after the meta-queue to think
that the route was not `good`. This would cause it to tell all
interested parties that there was no nexthop.

After the dataplane insertion we are also now running the NHT code.
That run re-resolved the nexthop correctly and correctly reported to
interested parties that there was a path again.

Example:

donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, f - failed route

K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:06:47
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:04:47
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:06:47
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:06:47
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:06:47
donna.cumulusnetworks.com(config)# ip route 4.5.6.7/32 192.168.210.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, f - failed route

K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:07:06
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:04
  *                  via 192.168.210.1, enp0s9, 00:00:04
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:07:06
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:07:06
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:07:06
donna.cumulusnetworks.com(config)#

Log files for sharp, which is watching 4.5.6.7:

2019/02/04 15:20:54.844288 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:20:54.844820 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:20:54.844836 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:20:54.844853 SHARP: Nexthop 192.168.210.1, type: 2, ifindex: 4, vrf: 0, label_num: 0

As you can see, we have received an update with no nexthops (an
invalid route) and a second update immediately after it with 2
nexthops.

What's the big deal, you say? Well, we have code in other daemons
that reacts to not having a path for a nexthop. In BGP this will
cause us to tear down the peer. In staticd we'll remove the
recursively resolved route. In pim we'll remove all paths to the
mroute. This is not desirable.

The fix is to remove the meta-queue run of nexthop tracking. Running
NHT after the data plane's notice of routes to handle is not ideal,
but we will be fixing this in the future with the nexthop group code,
which should know what nexthops are affected by a nexthop group
change.

Debug output with the fix applied:

donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, f - failed route

K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:00:46
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:02
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:00:46
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:46
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:46
donna.cumulusnetworks.com(config)# ip route 4.5.6.7/32 192.168.210.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, f - failed route

K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:00:59
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:02
  *                  via 192.168.210.1, enp0s9, 00:00:02
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:00:59
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:59
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:59

2019/02/04 15:26:20.656395 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:26:20.656440 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:26:33.688251 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:26:33.688322 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:26:33.688329 SHARP: Nexthop 192.168.210.1, type: 2, ifindex: 4, vrf: 0, label_num: 0

Signed-off-by: Donald Sharp
---
 zebra/zebra_rib.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/zebra/zebra_rib.c b/zebra/zebra_rib.c
index 995cbf9d1c..73e4b981b5 100644
--- a/zebra/zebra_rib.c
+++ b/zebra/zebra_rib.c
@@ -2152,14 +2152,6 @@ static void do_nht_processing(void)
 	}
 }
 
-/*
- * All meta queues have been processed. Trigger next-hop evaluation.
- */
-static void meta_queue_process_complete(struct work_queue *dummy)
-{
-	do_nht_processing();
-}
-
 /* Dispatch the meta queue by picking, processing and unlocking the next RN from
  * a non-empty sub-queue with lowest priority. wq is equal to zebra->ribq and
  * data
@@ -2333,7 +2325,7 @@ static void rib_queue_init(void)
 	/* fill in the work queue spec */
 	zrouter.ribq->spec.workfunc = &meta_queue_process;
 	zrouter.ribq->spec.errorfunc = NULL;
-	zrouter.ribq->spec.completion_func = &meta_queue_process_complete;
+	zrouter.ribq->spec.completion_func = NULL;
 	/* XXX: TODO: These should be runtime configurable via vty */
 	zrouter.ribq->spec.max_retries = 3;
 	zrouter.ribq->spec.hold = ZEBRA_RIB_PROCESS_HOLD_TIME;
-- 
2.39.5