Since commit c1adc8f1d6795124df022a36388df173d217a34e, staticd has
started to crash approximately 1/10 of the time in the static_vrf
topotest:
(gdb) bt
0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:44
1 __pthread_kill_internal (signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:78
2 __GI___pthread_kill (threadid=140400982256064, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3 0x00007fb1a6442476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4 0x00007fb1a6950823 in core_handler (signo=6, siginfo=0x7ffd6d832ff0, context=0x7ffd6d832ec0) at lib/sigevent.c:268
5 <signal handler called>
6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:44
7 __pthread_kill_internal (signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:78
8 __GI___pthread_kill (threadid=140400982256064, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9 0x00007fb1a6442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x00007fb1a64287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x00007fb1a699a422 in _zlog_assert_failed (xref=0x55f7dfd3dac0 <_xref.117>,
extra=0x55f7dfd30c30 "BUG: NH %pFX registered but not in hashtable") at lib/zlog.c:789
12 0x000055f7dfd1201f in static_zebra_nht_register (nh=0x55f7fd2ecd80, reg=true) at staticd/static_zebra.c:333
13 0x000055f7dfd29c9d in static_install_nexthop (nh=0x55f7fd2ecd80) at staticd/static_routes.c:299
14 0x000055f7dfd2a126 in static_fixup_vrf (vrf=0x55f7fd2333a0, stable=0x55f7fd271030, afi=AFI_IP, safi=SAFI_UNICAST)
at staticd/static_routes.c:441
15 0x000055f7dfd2a2be in static_fixup_vrf_ids (vrf=0x55f7fd2333a0) at staticd/static_routes.c:494
16 0x000055f7dfd15b53 in static_vrf_enable (vrf=0x55f7fd2333a0) at staticd/static_vrf.c:124
17 0x00007fb1a696ffa5 in vrf_enable (vrf=0x55f7fd2333a0) at lib/vrf.c:325
18 0x00007fb1a6991c87 in zclient_vrf_add (cmd=33, zclient=0x55f7fd29f740, length=76, vrf_id=8) at lib/zclient.c:2701
19 0x00007fb1a6996cba in zclient_read (thread=0x7ffd6d834230) at lib/zclient.c:4764
20 0x00007fb1a696bd9b in event_call (thread=0x7ffd6d834230) at lib/event.c:2019
21 0x00007fb1a68e1a3a in frr_run (master=0x55f7fd102e10) at lib/libfrr.c:1246
22 0x000055f7dfd1081e in main (argc=7, argv=0x7ffd6d834478, envp=0x7ffd6d8344b8) at staticd/static_main.c:193
Tracking this down, the crash happens because the nh believes that it
is already registered but the hashtable lookup fails, triggering this
assert. Looking at the code, static_fixup_vrf is changing the nh vrf_id.
I put a zlog_debug right before the change of the nh vrf_id and noticed
that the vrf id was UNKNOWN. So the code is attempting to register with
zebra a nexthop whose vrf is unknown (which will be ignored).
Modify the code in the registration process to notice that the nh vrf is
still UNKNOWN and, in that case, do nothing.
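The guard described above can be sketched roughly as follows. This is a
standalone illustration, not the actual staticd change: the sketch_
names, the trimmed-down struct and the counter are hypothetical
stand-ins, and the sentinel value is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "vrf unknown" sentinel (value assumed). */
#define SKETCH_VRF_UNKNOWN UINT32_MAX

/* Hypothetical, trimmed-down nexthop; not the real staticd structure. */
struct sketch_nexthop {
	uint32_t nh_vrf_id;  /* vrf the nexthop resolves in */
	bool nh_registered;  /* true once tracking has been requested */
};

/* Counts the registrations that would have been sent to zebra. */
int sketch_sent_to_zebra;

/*
 * Register (reg == true) or unregister (reg == false) a nexthop.
 * Returns true if a request was issued, false if the call was a no-op.
 */
bool sketch_nht_register(struct sketch_nexthop *nh, bool reg)
{
	/* The guard: a nexthop whose vrf is still unknown is left alone. */
	if (nh->nh_vrf_id == SKETCH_VRF_UNKNOWN)
		return false;

	/* Already in the requested state, nothing to do. */
	if (nh->nh_registered == reg)
		return false;

	nh->nh_registered = reg;
	if (reg)
		sketch_sent_to_zebra++;
	return true;
}
```

With this shape, a call made while the vrf id is still unknown leaves
the registered flag untouched, so a later call with a resolved vrf id
starts from a consistent state.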
Signed-off-by: Donald Sharp <sharpd@nvidia.com>