From: Donald Sharp
Date: Sat, 22 Feb 2025 23:13:15 +0000 (-0500)
Subject: staticd: Fix crash because registering unknown vrf
X-Git-Url: https://git.puffer.fish/?a=commitdiff_plain;h=da0f552f5d46a2db365565de582252ee3d620038;p=mirror%2Ffrr.git

staticd: Fix crash because registering unknown vrf

With recent commit: c1adc8f1d6795124df022a36388df173d217a34e

staticd has started to crash approximately 1/10 of the time
in the static_vrf topotest

(gdb) bt
0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:44
1  __pthread_kill_internal (signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:78
2  __GI___pthread_kill (threadid=140400982256064, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3  0x00007fb1a6442476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4  0x00007fb1a6950823 in core_handler (signo=6, siginfo=0x7ffd6d832ff0, context=0x7ffd6d832ec0) at lib/sigevent.c:268
5
6  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:44
7  __pthread_kill_internal (signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:78
8  __GI___pthread_kill (threadid=140400982256064, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9  0x00007fb1a6442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x00007fb1a64287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x00007fb1a699a422 in _zlog_assert_failed (xref=0x55f7dfd3dac0 <_xref.117>, extra=0x55f7dfd30c30 "BUG: NH %pFX registered but not in hashtable") at lib/zlog.c:789
12 0x000055f7dfd1201f in static_zebra_nht_register (nh=0x55f7fd2ecd80, reg=true) at staticd/static_zebra.c:333
13 0x000055f7dfd29c9d in static_install_nexthop (nh=0x55f7fd2ecd80) at staticd/static_routes.c:299
14 0x000055f7dfd2a126 in static_fixup_vrf (vrf=0x55f7fd2333a0, stable=0x55f7fd271030, afi=AFI_IP, safi=SAFI_UNICAST) at staticd/static_routes.c:441
15 0x000055f7dfd2a2be in static_fixup_vrf_ids (vrf=0x55f7fd2333a0) at staticd/static_routes.c:494
16 0x000055f7dfd15b53 in static_vrf_enable (vrf=0x55f7fd2333a0) at staticd/static_vrf.c:124
17 0x00007fb1a696ffa5 in vrf_enable (vrf=0x55f7fd2333a0) at lib/vrf.c:325
18 0x00007fb1a6991c87 in zclient_vrf_add (cmd=33, zclient=0x55f7fd29f740, length=76, vrf_id=8) at lib/zclient.c:2701
19 0x00007fb1a6996cba in zclient_read (thread=0x7ffd6d834230) at lib/zclient.c:4764
20 0x00007fb1a696bd9b in event_call (thread=0x7ffd6d834230) at lib/event.c:2019
21 0x00007fb1a68e1a3a in frr_run (master=0x55f7fd102e10) at lib/libfrr.c:1246
22 0x000055f7dfd1081e in main (argc=7, argv=0x7ffd6d834478, envp=0x7ffd6d8344b8) at staticd/static_main.c:193

Tracking this down, the crash occurs because the nh believes that it is
already registered but the lookup fails, triggering this assert.

Looking at the code, static_fixup_vrf is changing the vrf_id. I put a
zlog_debug right before the change of the nh vrf_id and noticed that the
vrf id was UNKNOWN. So the code is attempting to register with zebra a
nexthop whose vrf is unknown (which will be ignored).

Modify the registration code to notice that the nh vrf is still UNKNOWN
and, in that case, do nothing.

Signed-off-by: Donald Sharp
---

diff --git a/staticd/static_zebra.c b/staticd/static_zebra.c
index 552dd3ee1f..3faeb3d37a 100644
--- a/staticd/static_zebra.c
+++ b/staticd/static_zebra.c
@@ -323,6 +323,10 @@ void static_zebra_nht_register(struct static_nexthop *nh, bool reg)
 
 	if (!static_zebra_nht_get_prefix(nh, &lookup.nh))
 		return;
+
+	if (nh->nh_vrf_id == VRF_UNKNOWN)
+		return;
+
 	lookup.nh_vrf_id = nh->nh_vrf_id;
 	lookup.safi = si->safi;
 
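
The effect of the early return can be illustrated with a small standalone model. This is only a sketch and not the staticd code: every `toy_` name, the array-based table, the `guard` switch and the value chosen for `TOY_VRF_UNKNOWN` are assumptions made for illustration. It also does not model how the registered flag and the table get out of sync in staticd; it only shows that a lookup keyed on an unknown vrf can miss for a nexthop that believes it is registered, and that skipping registration while the vrf is unknown avoids reaching that lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the VRF_UNKNOWN sentinel; the value here is an assumption. */
#define TOY_VRF_UNKNOWN UINT32_MAX
#define TOY_TABLE_SIZE 8

/* Hypothetical nexthop holding only the fields the guard looks at. */
struct toy_nexthop {
	uint32_t addr;      /* nexthop address, part of the tracking key */
	uint32_t nh_vrf_id; /* vrf of the nexthop, or TOY_VRF_UNKNOWN */
	bool nh_registered; /* true once the nexthop holds a tracking entry */
};

/* Hypothetical tracking entry, keyed by (addr, vrf_id). */
struct toy_nht_entry {
	bool used;
	uint32_t addr;
	uint32_t vrf_id;
};

enum toy_result {
	TOY_SKIPPED,      /* vrf still unknown, nothing was done */
	TOY_REGISTERED,   /* a new tracking entry was created */
	TOY_ALREADY,      /* registered nexthop and its entry was found */
	TOY_INCONSISTENT, /* registered flag set but no entry (the assert case) */
	TOY_TABLE_FULL    /* no free slot in the toy table */
};

/* Linear search for an entry matching the (addr, vrf_id) key. */
static struct toy_nht_entry *toy_find(struct toy_nht_entry *table,
				      uint32_t addr, uint32_t vrf_id)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		if (table[i].used && table[i].addr == addr &&
		    table[i].vrf_id == vrf_id)
			return &table[i];
	return NULL;
}

/* Register a nexthop; 'guard' enables the unknown-vrf early return. */
enum toy_result toy_nht_register(struct toy_nht_entry *table,
				 struct toy_nexthop *nh, bool guard)
{
	if (guard && nh->nh_vrf_id == TOY_VRF_UNKNOWN)
		return TOY_SKIPPED;

	if (nh->nh_registered)
		return toy_find(table, nh->addr, nh->nh_vrf_id)
			       ? TOY_ALREADY
			       : TOY_INCONSISTENT;

	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!table[i].used) {
			table[i].used = true;
			table[i].addr = nh->addr;
			table[i].vrf_id = nh->nh_vrf_id;
			nh->nh_registered = true;
			return TOY_REGISTERED;
		}
	}
	return TOY_TABLE_FULL;
}
```

In this model, a nexthop with `nh_registered` set and `nh_vrf_id == TOY_VRF_UNKNOWN` against an empty table yields `TOY_INCONSISTENT` when `guard` is false and `TOY_SKIPPED` when it is true. Once the vrf id is known, registration proceeds as usual.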