bgpd: fix addressing information of non established outgoing sessions
When trying to connect to a BGP peer that does not respons, the 'show
bgp neighbors' command does not give any indication on the local and
remote addresses used:
> # show bgp neighbors
> BGP neighbor is 192.0.2.150, remote AS 65500, local AS 65500, internal link
> Local Role: undefined
> Remote Role: undefined
> BGP version 4, remote router ID 0.0.0.0, local router ID 192.0.2.1
> BGP state = Connect
> [..]
> Connections established 0; dropped 0
> Last reset 00:00:04, Waiting for peer OPEN (n/a)
> Internal BGP neighbor may be up to 255 hops away.
> BGP Connect Retry Timer in Seconds: 120
> Next connect timer due in 117 seconds
> Read thread: off Write thread: off FD used: 27
The addressing information (address and port) are only available
when TCP session is established, whereas this information is present
at the system level:
Add the display for outgoing BGP session, as the information in
the getsockname() API provides information for connected streams.
When getpeername() API does not give any information, use the peer
configuration (destination port is encoded in peer->port).
> # show bgp neighbors
> BGP neighbor is 192.0.2.150, remote AS 65500, local AS 65500, internal link
> Local Role: undefined
> Remote Role: undefined
> BGP version 4, remote router ID 0.0.0.0, local router ID 192.0.2.1
> BGP state = Connect
> [..]
> Connections established 0; dropped 0
> Last reset 00:00:16, Waiting for peer OPEN (n/a)
> Local host: 192.0.2.1, Local port: 46084
> Foreign host: 192.0.2.150, Foreign port: 179
bgpd: remove useless control checks about TCP connection
When attempting to get the src and destination addresses of a given
connection, the API may return the NULL pointer, but further code
in bgp_zebra_nexthop_set() already does a check about the given
pointer.
Relaxing the error code for all the returned adressing.
Fixes: 1ff9a340588a ("bgpd: bgpd-fsm-fix.patch") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
(cherry picked from commit ba7130309954fbe8d58854339ca43259149e603a)
bgpd: Set LLGR stale routes for all the paths including addpath
Without this patch we set only the first path for the route (if multiple exist)
as LLGR stale and stop doing that for the rest of the paths, which is wrong.
Donatas Abraitis [Thu, 31 Oct 2024 08:47:48 +0000 (10:47 +0200)]
zebra: Add missing new line for help string
```
-A, --asic-offload FRR is interacting with an asic underneath the linux kernel
--v6-with-v4-nexthops Underlying dataplane supports v6 routes with v4 nexthops -s, --nl-bufsize Set netlink receive buffer size
```
Fixes: 1f5611c06d1c243b42279748788f0627793ead9c ("zebra: Allow zebra cli to accept v6 routes with v4 nexthops") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 25ae643996d338b8230fb15a9064843fe85de224)
Louis Scalbert [Fri, 25 Oct 2024 15:54:07 +0000 (17:54 +0200)]
bgpd: fix display of local label in show bgp
Fix the display of the local label in show bgp.
> r1# show bgp ipv4 labeled-unicast 172.16.2.2/32
> BGP routing table entry for 172.16.2.2/32, version 2
> Local label: 16 <---- MISSING
> Paths: (1 available, best #1, table default, vrf (null))
> Advertised to non peer-group peers:
> 192.168.1.2
> 65501
> 192.168.1.2 from 192.168.1.2 (172.16.2.2)
> Origin IGP, metric 0, valid, external, best (First path received)
> Remote label: 3
> Last update: Fri Oct 25 17:55:45 2024
Fixes: 67f67ba481 ("bgpd: Drop label_ntop/label_pton functions") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit e7b3276ace65d59edb4d614158d4f2959f12f868)
pimd: allow resolving bsr via directly connected secondary address
This only matters to single hop nodes that are adjacent to the bsr. More common
with IPv6 where LL address is used in PIM as the primary address. If the BSR IP
happens to be an address on the same interface, the receiving pim router
rejects the BSR address because it expects the BSR IP to resolve via the LL address
even if we have a connected route for the same BSR IP subnet. Effectively, we want to
allow rpf to be resolved via secondary IPs with connected routes on the same interface,
and not limit them to primary addresses.
Enke Chen [Sun, 20 Oct 2024 19:25:46 +0000 (12:25 -0700)]
bgpd: allow value 0 in aigp-metric setting
The value of 0 is accepted from peers, and can also be set by the
route-map "set aigp-metric igp-metric". For coonsistency, it should
be allowed in "set aigp-metric <value>" as well.
Enke Chen [Wed, 16 Oct 2024 18:15:28 +0000 (11:15 -0700)]
bgpd: fix several issues in sourcing AIGP attribute
Fix several issues in sourcing AIGP attribute:
1) AIGP should not be set as default for a redistributed route or a
static network. It should be set by config instead.
2) AIGP sourced by "set aigp-metric igp-metric" in a route-map does
not set the correct value for a redistributed route.
3) When redistribute a connected route like loopback, the AGIP (with
value 0) is sourced by "set aigp-metric igp-metric", but the
attribute is not propagated as the attribute flag is not set.
Enke Chen [Tue, 15 Oct 2024 01:47:59 +0000 (18:47 -0700)]
tests: fix and adjust topotest/bgp_aigp
Fix and adjust the topotest post the fix for route selection with
AIGP.
When there are multiple IGP domains (OSPF in this case), the nexthop
for a BGP route with the AIGP attribute must be resolved in its own
IGP domain.
The changes in r2/bgpd.conf and r3/bgpd.conf are needed as incorrect
IGP metrics are received from NHT for the recursive nexthops. Once
the issue is resolved, the changes can be reverted.
Igor Zhukov [Fri, 4 Oct 2024 06:16:02 +0000 (13:16 +0700)]
zebra: Fix crash during reconnect
fpm_enqueue_rmac_table expects an fpm_rmac_arg* as its argument.
The issue can be reproduced by dropping the TCP session using:
ss -K dst 127.0.0.1 dport = 2620
I used Fedora 40 and frr 9.1.2 and I got the gdb backtrace:
(gdb) bt
0 0x00007fdd7d6997ea in fpm_enqueue_rmac_table (bucket=0x2134dd0, arg=0x2132b60) at zebra/dplane_fpm_nl.c:1217
1 0x00007fdd7dd1560d in hash_iterate (hash=0x21335f0, func=0x7fdd7d6997a0 <fpm_enqueue_rmac_table>, arg=0x2132b60) at lib/hash.c:252
2 0x00007fdd7dd1560d in hash_iterate (hash=0x1e5bf10, func=func@entry=0x7fdd7d698900 <fpm_enqueue_l3vni_table>,
arg=arg@entry=0x7ffed983bef0) at lib/hash.c:252
3 0x00007fdd7d698b5c in fpm_rmac_send (t=<optimized out>) at zebra/dplane_fpm_nl.c:1262
4 0x00007fdd7dd6ce22 in event_call (thread=thread@entry=0x7ffed983c010) at lib/event.c:1970
5 0x00007fdd7dd20758 in frr_run (master=0x1d27f10) at lib/libfrr.c:1213
6 0x0000000000425588 in main (argc=10, argv=0x7ffed983c2e8) at zebra/main.c:492
bgpd: Actually make ` --v6-with-v4-nexthops` it work
It was using `-v` which is actually a _version_.
Fixes: 0435b31bb8ed55377f83d0e19bc085abc3c71b44 ("bgpd: Allow bgp to specify if it will allow v6 routing with v4 nexthops") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 0495cac837ad0f6ff1082746c37e4a48c1068035)
paths key is not there for
'show bgp l2vpn evpn route rd <rd-id> mac <mac> json' uses
evpn prefix as key for each path.
Replace the evpn prefix with "paths".
This aligned with overall EVPN RIB json output like
'show bgp l2vpn evpn route json'
'show bgp l2vpn evpn route rd <> type 2 json'
Donald Sharp [Wed, 25 Sep 2024 16:09:40 +0000 (12:09 -0400)]
zebra: Correctly report metrics
Report the routes metric in IPFORWARDMETRIC1 and return
-1 for the other metrics as required by the IP-FORWARD-MIB.
inetCidrRouteMetric2 OBJECT-TYPE
SYNTAX Integer32
MAX-ACCESS read-create
STATUS current
DESCRIPTION
"An alternate routing metric for this route. The
semantics of this metric are determined by the routing-
protocol specified in the route's inetCidrRouteProto
value. If this metric is not used, its value should be
set to -1."
DEFVAL { -1 }
::= { inetCidrRouteEntry 13 }
I've included metric2 but it's the same for all of them.
Donald Sharp [Wed, 25 Sep 2024 16:06:29 +0000 (12:06 -0400)]
zebra: Fix snmp walk of zebra rib
The snmp walk of the zebra rib was skipping entries
because in_addr_cmp was replaced with a prefix_cmp
which worked slightly differently causing parts
of the zebra rib tree to be skipped.
Louis Scalbert [Thu, 12 Sep 2024 07:31:49 +0000 (09:31 +0200)]
isisd: fix rcap tlv double-free crash
A double-free crash happens when a subTLV of the "Router Capability"
TLV is not readable and a previous "Router Capability" TLV was read.
rcap was supposed to be freed later by isis_free_tlvs() ->
free_tlv_router_cap(). In 78774bbcd5 ("isisd: add isis flex-algo lsp
advertisement"), this was not the case because rcap was not saved to
tlvs->router_cap when the function returned early because of a subTLV
length issue.
Always set tlvs->router_cap to free the memory.
Note that this patch has the consequence that in case of subTLV error,
the previously read "Router Capability" subTLVs are kept in memory.
Fixes: 49efc80d34 ("isisd: Ensure rcap is freed in error case") Fixes: 78774bbcd5 ("isisd: add isis flex-algo lsp advertisement") Reported-by: Iggy Frankovic <iggyfran@amazon.com> Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit d61758140d33972c10ecbb72d0a3e528049dd8d6)
When an NHRP peer was forwarding a message, it was copying all
extensions from the originally received packet. The authentication
extension must be regenerated hop by hop per RFC2332.
This fix checks for the auth extension when copying extensions
and omits the original packet auth and instead regenerates a new auth extension.
Donatas Abraitis [Fri, 14 Jun 2024 13:33:32 +0000 (16:33 +0300)]
docker: Set ABUILD_APK_INDEX_OPTS for frr build
In build() stage of abuild, it does `apk index ...` where frr* packages
are unsigned. We don't sign them here, and thus we need to specify `--allow-untrusted`.
Donatas Abraitis [Fri, 14 Jun 2024 08:37:23 +0000 (11:37 +0300)]
docker: Set ABUILD_APK_INDEX_OPTS for libyang
In build() stage of abuild, it does `apk index ...` where libyang* packages
are unsigned. We don't sign them here, and thus we need to specify `--allow-untrusted`.
Louis Scalbert [Tue, 27 Aug 2024 16:22:27 +0000 (18:22 +0200)]
isisd: fix update link params after circuit is up
If the link-params are set when the circuit not yet up, the link-params
are never updated.
isis_link_params_update() is called from isis_circuit_up() but returns
immediately because circuit->state != C_STATE_UP. circuit->state is
updated in isis_csm_state_change after isis_circuit_up().
Do not return isis_link_params_update() if circuit->state != C_STATE_UP.
Fixes: 0fdd8b2b11 ("isisd: update link params after circuit is up") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit 6ce6b7a8564f661495fec17f3ea33eeaf9e2f48c)
Dmytro Shytyi [Thu, 8 Aug 2024 13:42:40 +0000 (15:42 +0200)]
topotest: test_bgp_snmp_bgpv4v2_notification
This test checks the bgp crash on rt2 when 2 commands
launched consequently:
T0: rr, config -> router bgp 65004 -> neighbor 192.168.12.2 password 8888
T1: rt2, snmpwalk -v 2c -c public 127.0.0.1 .1.3.6.1.4.1.7336.4.2.1
T2: test if rt2 bgp is crashed.
Louis Scalbert [Tue, 20 Aug 2024 08:33:30 +0000 (10:33 +0200)]
bgpd: fix crash at no rpki
When 'no rpki' is requested and the rtrlib RPKI object was freed, bgpd
is crashing.
RPKI is configured in VRF red.
> ip l set red down
> ip l del red
> printf 'conf\n vrf red\n no rpki' | vtysh
> Core was generated by `/usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 __pthread_kill_implementation (no_tid=0, signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:44
> 44 ./nptl/pthread_kill.c: No such file or directory.
> [Current thread is 1 (Thread 0x7fb401f419c0 (LWP 190226))]
> (gdb) bt
> #0 __pthread_kill_implementation (no_tid=0, signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:44
> #1 __pthread_kill_internal (signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:78
> #2 __GI___pthread_kill (threadid=140411103615424, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
> #3 0x00007fb4021ad476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
> #4 0x00007fb4025ce22b in core_handler (signo=11, siginfo=0x7fff831b2d70, context=0x7fff831b2c40) at lib/sigevent.c:248
> #5 <signal handler called>
> #6 rtr_mgr_remove_group (config=0x55fe8789f750, preference=11) at /build/make-pkg/output/source/DIST_RTRLIB/rtrlib/rtrlib/rtr_mgr.c:607
> #7 0x00007fb40145f518 in rpki_delete_all_cache_nodes (rpki_vrf=0x55fe8789f4f0) at bgpd/bgp_rpki.c:442
> #8 0x00007fb401463098 in no_rpki_magic (self=0x7fb40146bba0 <no_rpki_cmd>, vty=0x55fe877f5130, argc=2, argv=0x55fe877fccd0) at bgpd/bgp_rpki.c:1732
> #9 0x00007fb40145c09a in no_rpki (self=0x7fb40146bba0 <no_rpki_cmd>, vty=0x55fe877f5130, argc=2, argv=0x55fe877fccd0) at ./bgpd/bgp_rpki_clippy.c:37
> #10 0x00007fb402527abc in cmd_execute_command_real (vline=0x55fe877fd150, vty=0x55fe877f5130, cmd=0x0, up_level=0) at lib/command.c:984
> #11 0x00007fb402527c35 in cmd_execute_command (vline=0x55fe877fd150, vty=0x55fe877f5130, cmd=0x0, vtysh=0) at lib/command.c:1043
> #12 0x00007fb4025281e5 in cmd_execute (vty=0x55fe877f5130, cmd=0x55fe877fb8c0 "no rpki\n", matched=0x0, vtysh=0) at lib/command.c:1209
> #13 0x00007fb4025f0aed in vty_command (vty=0x55fe877f5130, buf=0x55fe877fb8c0 "no rpki\n") at lib/vty.c:615
> #14 0x00007fb4025f2a11 in vty_execute (vty=0x55fe877f5130) at lib/vty.c:1378
> #15 0x00007fb4025f513d in vtysh_read (thread=0x7fff831b5fa0) at lib/vty.c:2373
> #16 0x00007fb4025e9611 in event_call (thread=0x7fff831b5fa0) at lib/event.c:2011
> #17 0x00007fb402566976 in frr_run (master=0x55fe871a14a0) at lib/libfrr.c:1212
> #18 0x000055fe857829fa in main (argc=9, argv=0x7fff831b6218) at bgpd/bgp_main.c:549
Fixes: 8156765abe ("bgpd: Add `no rpki` command") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit 4e053d65f1c7edbcc3391026300388513d4c31b0)
There is also an issue when doing "rpki reset" and then "no rpki".
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Donald Sharp [Sat, 10 Aug 2024 23:43:08 +0000 (19:43 -0400)]
zebra: Ensure non-equal id's are not same nhg's
The function zebra_nhg_hash_equal is only used
as a hash function for storage of NHG's and retrieval.
If you have say two nhg's:
31 (25/26)
32 (25/26)
This function would return them as being equal. Which
of course leads to the problem when you attempt to
hash_release 32 but release 31 from the hash. Then later
when you attempt to do hash comparisons 32 has actually
been freed leaving to use after free situations and shit
goes down hill fast.
This hash is only used as part of the hash comparison
function for nexthop group storage. Since this is so
let's always return the 31/32 nhg's are not equal at all.
We possibly have a different problem where we are creating
31 and 32 ( when 31 should have just been used instead of 32 )
but we need to prevent any type of hash release problem at all.
This supercedes any other issue( that should be tracked down
on it's own ). Since you can have use after free situation
that leads to a crash -vs- some possible nexthop group duplication
which is very minor in comparison.
ospfd: fix internal ldp-sync state flags when feature is disabled
When enabling "mpls ldp-sync" under "router ospf" ospfd configures
SET_FLAG(ldp_sync_info->flags, LDP_SYNC_FLAG_IF_CONFIG) so internally knowing
that the ldp-sync feature is enabled. However the flag is not cleared when
turning of the feature using "nompls ldp-sync"!
Louis Scalbert [Fri, 28 Jun 2024 11:22:36 +0000 (13:22 +0200)]
pimd: fix crash on non-existent interface
Fix the following crash when pim options are (un)configured on an
non-existent interface.
> r1(config)# int fgljdsf
> r1(config-if)# no ip pim unicast-bsm
> vtysh: error reading from pimd: Connection reset by peer (104)Warning: closing connection to pimd because of an I/O error!
> #0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1 0x00007f70c8f32249 in core_handler (signo=11, siginfo=0x7fffff88e4f0, context=0x7fffff88e3c0) at lib/sigevent.c:258
> #2 <signal handler called>
> #3 0x0000556cfdd9b16d in lib_interface_pim_address_family_unicast_bsm_modify (args=0x7fffff88f130) at pimd/pim_nb_config.c:1910
> #4 0x00007f70c8efdcb5 in nb_callback_modify (context=0x556d00032b60, nb_node=0x556cffeeb9b0, event=NB_EV_APPLY, dnode=0x556d00031670, resource=0x556d00032b48, errmsg=0x7fffff88f710 "", errmsg_len=8192)
> at lib/northbound.c:1538
> #5 0x00007f70c8efe949 in nb_callback_configuration (context=0x556d00032b60, event=NB_EV_APPLY, change=0x556d00032b10, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1888
> #6 0x00007f70c8efee82 in nb_transaction_process (event=NB_EV_APPLY, transaction=0x556d00032b60, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:2016
> #7 0x00007f70c8efd658 in nb_candidate_commit_apply (transaction=0x556d00032b60, save_transaction=true, transaction_id=0x0, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1356
> #8 0x00007f70c8efd78e in nb_candidate_commit (context=..., candidate=0x556cffeb0e80, save_transaction=true, comment=0x0, transaction_id=0x0, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1389
> #9 0x00007f70c8f03e58 in nb_cli_classic_commit (vty=0x556d00025a80) at lib/northbound_cli.c:51
> #10 0x00007f70c8f043f8 in nb_cli_apply_changes_internal (vty=0x556d00025a80,
> xpath_base=0x7fffff893bb0 "/frr-interface:lib/interface[name='fgljdsf']/frr-pim:pim/address-family[address-family='frr-routing:ipv4']", clear_pending=false) at lib/northbound_cli.c:178
> #11 0x00007f70c8f0475d in nb_cli_apply_changes (vty=0x556d00025a80, xpath_base_fmt=0x556cfdde9fe0 "./frr-pim:pim/address-family[address-family='%s']") at lib/northbound_cli.c:234
> #12 0x0000556cfdd8298f in pim_process_no_unicast_bsm_cmd (vty=0x556d00025a80) at pimd/pim_cmd_common.c:3493
> #13 0x0000556cfddcf782 in no_ip_pim_ucast_bsm (self=0x556cfde40b20 <no_ip_pim_ucast_bsm_cmd>, vty=0x556d00025a80, argc=4, argv=0x556d00031500) at pimd/pim_cmd.c:4950
> #14 0x00007f70c8e942f0 in cmd_execute_command_real (vline=0x556d00032070, vty=0x556d00025a80, cmd=0x0, up_level=0) at lib/command.c:1002
> #15 0x00007f70c8e94451 in cmd_execute_command (vline=0x556d00032070, vty=0x556d00025a80, cmd=0x0, vtysh=0) at lib/command.c:1061
> #16 0x00007f70c8e9499f in cmd_execute (vty=0x556d00025a80, cmd=0x556d00030320 "no ip pim unicast-bsm", matched=0x0, vtysh=0) at lib/command.c:1227
> #17 0x00007f70c8f51e44 in vty_command (vty=0x556d00025a80, buf=0x556d00030320 "no ip pim unicast-bsm") at lib/vty.c:616
> #18 0x00007f70c8f53bdd in vty_execute (vty=0x556d00025a80) at lib/vty.c:1379
> #19 0x00007f70c8f55d59 in vtysh_read (thread=0x7fffff896600) at lib/vty.c:2374
> #20 0x00007f70c8f4b209 in event_call (thread=0x7fffff896600) at lib/event.c:2011
> #21 0x00007f70c8ed109e in frr_run (master=0x556cffdb4ea0) at lib/libfrr.c:1217
> #22 0x0000556cfdddec12 in main (argc=2, argv=0x7fffff896828, envp=0x7fffff896840) at pimd/pim_main.c:165
> (gdb) f 3
> #3 0x0000556cfdd9b16d in lib_interface_pim_address_family_unicast_bsm_modify (args=0x7fffff88f130) at pimd/pim_nb_config.c:1910
> 1910 pim_ifp->ucast_bsm_accept =
> (gdb) list
> 1905 case NB_EV_ABORT:
> 1906 break;
> 1907 case NB_EV_APPLY:
> 1908 ifp = nb_running_get_entry(args->dnode, NULL, true);
> 1909 pim_ifp = ifp->info;
> 1910 pim_ifp->ucast_bsm_accept =
> 1911 yang_dnode_get_bool(args->dnode, NULL);
> 1912
> 1913 break;
> 1914 }
> (gdb) p pim_ifp
> $1 = (struct pim_interface *) 0x0
Fixes: 3bb513c399 ("lib: adapt to version 2 of libyang") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit 6952bea5cdd38057bf8c0a5e9c0fbe916dc73953)
isisd: fix crash when calculating the neighbor spanning tree based on the fragmented LSP
1. When the root IS regenerates an LSP, it calls lsp_build() -> lsp_clear_data() to free the TLV memory of the first fragment and all other fragments. If the number of fragments in the regenerated LSP decreases or if no fragmentation is needed, the extra LSP fragments are not immediately deleted. Instead, lsp_seqno_update() -> lsp_purge() is called to set the remaining time to zero and start aging, while also notifying other IS nodes to age these fragments. lsp_purge() usually does not reset lsp->hdr.seqno to zero because the LSP might recover during the aging process.
2. When other IS nodes receive an LSP, they always call process_lsp() -> isis_unpack_tlvs() to allocate TLV memory for the LSP. This does not differentiate whether the received LSP has a remaining lifetime of zero. Therefore, it is rare for an LSP of a non-root IS to have empty TLVs. Of course, if an LSP with a remaining time of zero and already corrupted is received, lsp_update() -> lsp_purge() will be called to free the TLV memory of the LSP, but this scenario is rare.
3. In LFA calculations, neighbors of the root IS are traversed, and each neighbor is taken as a new root to compute the neighbor SPT. During this process, the old root IS will serve as a neighbor of the new root IS, triggering a call to isis_spf_process_lsp() to parse the LSP of the old root IS and obtain its IP vertices and neighboring IS vertices. However, isis_spf_process_lsp() only checks whether the TLVs in the first fragment of the LSP exist, and does not check the TLVs in the fragmented LSP. If the TLV memory of the fragmented LSP of the old root IS has been freed, it can lead to a null pointer access, causing the current crash.
Additionally, for the base SPT, there are only two places where the LSP of the root IS is parsed:
1. When obtaining the UP neighbors of the root IS via spf_adj_list_parse_lsp().
2. When preloading the IP vertices of the root IS via isis_lsp_iterate_ip_reach().
Both of these checks ensure that frag->tlvs is not null, and they do not subsequently call isis_spf_process_lsp() to parse the root IS's LSP. It is very rare for non-root IS LSPs to have empty TLVs unless they are corrupted LSPs awaiting deletion. If it happens, a crash will occur.
The backtrace is as follows:
(gdb) bt
#0 0x00007f3097281fe1 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f30973a2972 in core_handler (signo=11, siginfo=0x7ffce66c2870, context=0x7ffce66c2740) at ../lib/sigevent.c:261
#2 <signal handler called>
#3 0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0)
at ../isisd/isis_spf.c:898
#4 0x000055dfa805743b in isis_spf_loop (spftree=0x55dfa950eee0, root_sysid=0x55dfa950ef6c "") at ../isisd/isis_spf.c:1688
#5 0x000055dfa805784f in isis_run_spf (spftree=0x55dfa950eee0) at ../isisd/isis_spf.c:1808
#6 0x000055dfa8037ff5 in isis_spf_run_neighbors (spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:1259
#7 0x000055dfa803ac17 in isis_spf_run_lfa (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:2300
#8 0x000055dfa8057964 in isis_run_spf_with_protection (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_spf.c:1827
#9 0x000055dfa8057c15 in isis_run_spf_cb (thread=0x7ffce66c38e0) at ../isisd/isis_spf.c:1889
#10 0x00007f30973bbf04 in thread_call (thread=0x7ffce66c38e0) at ../lib/thread.c:1990
#11 0x00007f309735497b in frr_run (master=0x55dfa91733c0) at ../lib/libfrr.c:1198
#12 0x000055dfa8029d5d in main (argc=5, argv=0x7ffce66c3b08, envp=0x7ffce66c3b38) at ../isisd/isis_main.c:273
(gdb) f 3
#3 0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0)
at ../isisd/isis_spf.c:898
898 ../isisd/isis_spf.c: No such file or directory.
(gdb) p te_neighs
$1 = (struct isis_item_list *) 0x120
(gdb) p lsp->tlvs
$2 = (struct isis_tlvs *) 0x0
(gdb) p lsp->hdr
$3 = {pdu_len = 27, rem_lifetime = 0, lsp_id = "\000\000\000\000\000\001\000\001", seqno = 4, checksum = 59918, lsp_bits = 1 '\001'}
The backtrace provided above pertains to version 8.5.4, but it seems that the same issue exists in the code of the master branch as well.
I have reviewed the process for calculating the SPT based on the LSP, and isis_spf_process_lsp() is the only function that does not check whether the TLVs in the fragments are empty. Therefore, I believe that modifying this function alone should be sufficient. If the TLVs of the current fragment are already empty, we do not need to continue processing subsequent fragments. This is consistent with the behavior where we do not process fragments if the TLVs of the first fragment are empty.
Of course, one could argue that lsp_purge() should still retain the TLV memory, freeing it and then reallocating it if needed. However, this is a debatable point because in some scenarios, it is permissible for the LSP to have empty TLVs. For example, after receiving an SNP (Sequence Number PDU) message, an empty LSP (with lsp->hdr.seqno = 0) might be created by calling lsp_new. If the corresponding LSP message is discarded due to domain or area authentication failure, the TLV memory wouldn't be allocated.
Test scenario:
In an LFA network, importing a sufficient number of static routes to cause LSP fragmentation, and then rolling back the imported static routes so that the LSP is no longer fragmented, can easily result in this issue.
Use `vtysh` with this input file:
```
ip route A nh1
ip route A nh2
ip route B nh1
ip route B nh2
```
When running "ip route B" with "nh1" and "nh2", the procedure maybe is:
1) Create the two nexthops: "nh1" and "nh2".
2) Register "nh1" with `static_zebra_nht_register()`, then the states of both
"nh1" and "nht2" are set to "STATIC_SENT_TO_ZEBRA".
3) Register "nh2" with `static_zebra_nht_register()`, then only the routes with
nexthop of "STATIC_START" will be sent to zebra.
So, send the routes with the nexthop of "STATIC_SENT_TO_ZEBRA" to zebra.