Corey Siltala [Fri, 23 Aug 2024 18:04:26 +0000 (18:04 +0000)]
pimd: Fix crash in pimd
ifp->info is not always set in PIM. So add a guard here to stop
it from crashing when addresses are added to a non-PIM enabled interface
and PIM zebra debugging is enabled.
Khem Raj [Fri, 15 Mar 2024 21:34:06 +0000 (14:34 -0700)]
zebra: Mimic GNU basename() API for non-glibc library e.g. musl
musl only provides POSIX version of basename and it has also removed
providing it via string.h header [1] which now results in compile errors
with newer compilers e.g. clang-18
Donald Sharp [Tue, 20 Aug 2024 14:53:34 +0000 (10:53 -0400)]
zebra: Create Singleton nhg's without weights
Currently FRR when it has two nexthop groups:
A
nexthop 1 weight 5
nexthop 2 weight 6
nexthop 3 weight 7
B
nexthop 1 weight 3
nexthop 2 weight 4
nexthop 3 weight 5
We end up with 5 singleton nexthops and two groups:
id 69 via 192.168.119.1 dev enp13s0 scope link proto 194
id 70 via 192.168.119.2 dev enp13s0 scope link proto 194
id 71 via 192.168.119.3 dev enp13s0 scope link proto 194
id 127 via 192.168.119.1 dev enp13s0 scope link proto 194
id 128 via 192.168.119.2 dev enp13s0 scope link proto 194
id 181818168 group 69,182/70,218/71,255 proto 194
id 181818169 group 71,255/127,127/128,170 proto 194
This is not a desirable state to be in. If you have a
link flapping in the network and weights are changing
rapidly you end up with a large number of singleton
nexthops that are being used by the nexthop groups.
This fills up asic space and clutters the table.
Additionally singleton nexthops cannot have any weight
and the fact that you attempt to create a singleton
nexthop with different weights means nothing to the
linux kernel( or any asic dplane ). Let's modify
the code to always create the singleton nexthops
without a weight and then just creating the
NHG's that use the singletons with the appropriate
weight.
id 22 via 192.168.119.1 dev enp13s0 scope link proto 194
id 24 via 192.168.119.2 dev enp13s0 scope link proto 194
id 28 via 192.168.119.3 dev enp13s0 scope link proto 194
id 181818168 group 22,182/24,218/28,255 proto 194
id 181818169 group 22,153/24,204/28,255 proto 194
Donald Sharp [Tue, 20 Aug 2024 12:28:17 +0000 (08:28 -0400)]
lib, zebra: Modify nexthop_cmp to allow you to use weight or not
Currently nexthop weight is a discriminator on whether or not
a nexthop matches. There is a need to no use the weight as
part of this comparison function so let's add a boolean to
allow us to say use this or not.
Christian Hopps [Thu, 22 Aug 2024 01:05:48 +0000 (21:05 -0400)]
tests: update munet to 0.14.10
Changes:
- mutini: handle possible missed zombie cleanup leading to test hangs
- mutini: also we avoid logging in the signal handler which was causing
an exception.
Donald Sharp [Thu, 22 Feb 2024 20:53:45 +0000 (15:53 -0500)]
zebra, tests: Connected and Local routes should have a weight of 1
All routes received by zebra from upper level protocols have a weight
of 1. Let's just make everything extremely consistent in our code.
Lot's of tests needed to be fixed up to make this work.
Donald Sharp [Thu, 22 Feb 2024 21:20:37 +0000 (16:20 -0500)]
zebra: Ensure we cannot send invalid range to kernel
The linux kernel adds 1 upon receipt of a weight, if you
send a 255 it gets unhappy. Let's Limit range to 254 as
that kernel does not like sending of 255.
Dmytro Shytyi [Thu, 8 Aug 2024 13:42:40 +0000 (15:42 +0200)]
topotest: test_bgp_snmp_bgpv4v2_notification
This test checks the bgp crash on rt2 when 2 commands
launched consequently:
T0: rr, config -> router bgp 65004 -> neighbor 192.168.12.2 password 8888
T1: rt2, snmpwalk -v 2c -c public 127.0.0.1 .1.3.6.1.4.1.7336.4.2.1
T2: test if rt2 bgp is crashed.
Louis Scalbert [Tue, 20 Aug 2024 08:33:30 +0000 (10:33 +0200)]
bgpd: fix crash at no rpki
When 'no rpki' is requested and the rtrlib RPKI object was freed, bgpd
is crashing.
RPKI is configured in VRF red.
> ip l set red down
> ip l del red
> printf 'conf\n vrf red\n no rpki' | vtysh
> Core was generated by `/usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 __pthread_kill_implementation (no_tid=0, signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:44
> 44 ./nptl/pthread_kill.c: No such file or directory.
> [Current thread is 1 (Thread 0x7fb401f419c0 (LWP 190226))]
> (gdb) bt
> #0 __pthread_kill_implementation (no_tid=0, signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:44
> #1 __pthread_kill_internal (signo=11, threadid=140411103615424) at ./nptl/pthread_kill.c:78
> #2 __GI___pthread_kill (threadid=140411103615424, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
> #3 0x00007fb4021ad476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
> #4 0x00007fb4025ce22b in core_handler (signo=11, siginfo=0x7fff831b2d70, context=0x7fff831b2c40) at lib/sigevent.c:248
> #5 <signal handler called>
> #6 rtr_mgr_remove_group (config=0x55fe8789f750, preference=11) at /build/make-pkg/output/source/DIST_RTRLIB/rtrlib/rtrlib/rtr_mgr.c:607
> #7 0x00007fb40145f518 in rpki_delete_all_cache_nodes (rpki_vrf=0x55fe8789f4f0) at bgpd/bgp_rpki.c:442
> #8 0x00007fb401463098 in no_rpki_magic (self=0x7fb40146bba0 <no_rpki_cmd>, vty=0x55fe877f5130, argc=2, argv=0x55fe877fccd0) at bgpd/bgp_rpki.c:1732
> #9 0x00007fb40145c09a in no_rpki (self=0x7fb40146bba0 <no_rpki_cmd>, vty=0x55fe877f5130, argc=2, argv=0x55fe877fccd0) at ./bgpd/bgp_rpki_clippy.c:37
> #10 0x00007fb402527abc in cmd_execute_command_real (vline=0x55fe877fd150, vty=0x55fe877f5130, cmd=0x0, up_level=0) at lib/command.c:984
> #11 0x00007fb402527c35 in cmd_execute_command (vline=0x55fe877fd150, vty=0x55fe877f5130, cmd=0x0, vtysh=0) at lib/command.c:1043
> #12 0x00007fb4025281e5 in cmd_execute (vty=0x55fe877f5130, cmd=0x55fe877fb8c0 "no rpki\n", matched=0x0, vtysh=0) at lib/command.c:1209
> #13 0x00007fb4025f0aed in vty_command (vty=0x55fe877f5130, buf=0x55fe877fb8c0 "no rpki\n") at lib/vty.c:615
> #14 0x00007fb4025f2a11 in vty_execute (vty=0x55fe877f5130) at lib/vty.c:1378
> #15 0x00007fb4025f513d in vtysh_read (thread=0x7fff831b5fa0) at lib/vty.c:2373
> #16 0x00007fb4025e9611 in event_call (thread=0x7fff831b5fa0) at lib/event.c:2011
> #17 0x00007fb402566976 in frr_run (master=0x55fe871a14a0) at lib/libfrr.c:1212
> #18 0x000055fe857829fa in main (argc=9, argv=0x7fff831b6218) at bgpd/bgp_main.c:549
Kristof Provost [Thu, 15 Aug 2024 16:34:35 +0000 (18:34 +0200)]
zebra: fix loading kernel routs without netlink
Commit 605df8d44 zebra: Use zebra dplane for RTM link and addr broke loading of
kernel routes at startup for configurations without netlink.
It rearranged the startup sequence in zebra_ns_enable() to pass via
zebra_ns_startup_continue(), triggered through zebra_dplane_startup_stage()
calls. However, it neglected to make these calls in the non-netlink code path.
As a result zebra failed to load kernel routes at startup on platforms such
as FreeBSD.
Insert these calls so we run through all of the expected startup stages.
Nathan Bahr [Wed, 26 Jun 2024 17:41:45 +0000 (12:41 -0500)]
pimd, yang: Implement igmp static-group command
This will add a static IGMP group that does not rely on an underlying
socket join which sends traffic to the cpu unneccesarily. Instead, the
groups are joined directly without any IGMP interactions.
New command is under interfaces, 'ip igmp static-group ...'.
Added an alias for 'ip igmp join ...' to 'ip igmp join-group'.
Moved IGMP join groups to new yang list "join-group" and reused
the "static-group" list for the IGMP static groups.
Donald Sharp [Wed, 14 Aug 2024 14:18:41 +0000 (10:18 -0400)]
tests: Fix route_scale startup issues
Upstream CI is frequently running into a situation where
the routes are not being installed. These routes
start at the beginning and suddenly in the middle
they start working properly.
D 1.0.15.183/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.184/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.185/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D>* 1.0.15.186/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.187/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.188/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
Turning on some debugs showed that the failed installed routes are
trying to be matched against the default route. Thus implying
all the connected routes for the test are not yet successfully
installed. Let's modify the test(s) on startup to just ensure
that the connected routes are installed correctly. I am no
longer seeing the problem after this change.
The crash analysis indicates a memory item has been freed.
> #6 0x000076066a629c15 in mt_count_free (mt=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0)
> at lib/memory.c:73
> #7 mt_count_free (ptr=0x60200038b4f0, mt=0x56b57be85e00 <MTYPE_BGP_NAME>) at lib/memory.c:69
> #8 qfree (mt=mt@entry=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0) at lib/memory.c:129
> #9 0x000056b57bb09ce9 in bgp_free (bgp=<optimized out>) at bgpd/bgpd.c:4120
> #10 0x000056b57bb0aa73 in bgp_unlock (bgp=<optimized out>) at ./bgpd/bgpd.h:2513
> #11 peer_free (peer=0x62a000000200) at bgpd/bgpd.c:1313
> #12 0x000056b57bb0aca8 in peer_unlock_with_caller (name=<optimized out>, peer=<optimized out>)
> at bgpd/bgpd.c:1344
> #13 0x000076066a6dbb2c in event_call (thread=thread@entry=0x7ffc8cae1d60) at lib/event.c:2011
> #14 0x000076066a60aa88 in frr_run (master=0x613000000040) at lib/libfrr.c:1214
> #15 0x000056b57b8b2c44 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:543
Actually, the BGP_NAME item has not been used at allocation for
static->prd_pretty, and this results in reaching 0 quicker at bgp
deletion.
Fix this by reassigning MTYPE_BGP_NAME to prd_pretty.
Fixes: 16600df2c4f4 ("bgpd: fix show run of network route-distinguisher") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Mark Stapp [Wed, 14 Aug 2024 12:37:00 +0000 (08:37 -0400)]
tests: add retries to nhg tests in all_proto_startup
The all_protocol_startup topotest needs to allow for some delay
between configuring nexthop-groups and their installation. Add
some wait periods in a couple of nhg test cases.
Donatas Abraitis [Wed, 14 Aug 2024 07:16:01 +0000 (10:16 +0300)]
bgpd: Avoid use-after-free when doing `no router bgp` with auto created instances
```
==1145965==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030007159c0 at pc 0x55ade8d962d1 bp 0x7ffec4ce74c0 sp 0x7ffec4ce74b0
READ of size 8 at 0x6030007159c0 thread T0
0 0x55ade8d962d0 in no_router_bgp bgpd/bgp_vty.c:1701
1 0x7efe5aed19ed in cmd_execute_command_real lib/command.c:1002
2 0x7efe5aed1da3 in cmd_execute_command lib/command.c:1061
3 0x7efe5aed2303 in cmd_execute lib/command.c:1227
4 0x7efe5af6c023 in vty_command lib/vty.c:616
5 0x7efe5af6d2d2 in vty_execute lib/vty.c:1379
6 0x7efe5af77df2 in vtysh_read lib/vty.c:2374
7 0x7efe5af64c9b in event_call lib/event.c:1996
8 0x7efe5af03887 in frr_run lib/libfrr.c:1232
9 0x55ade8cd9850 in main bgpd/bgp_main.c:555
10 0x7efe5aa29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
11 0x7efe5aa29e3f in __libc_start_main_impl ../csu/libc-start.c:392
12 0x55ade8cdc314 in _start (/usr/lib/frr/bgpd+0x16f314)
```
Donald Sharp [Tue, 13 Aug 2024 12:58:29 +0000 (08:58 -0400)]
doc: Add doc to show sysctl setting for Sanitizers
In order to run the XXXX Sanitizers over the code as a developer
modern linux distro's require a specific sysctl. Let's document
that so that people are aware of it.
Donald Sharp [Sat, 10 Aug 2024 23:43:08 +0000 (19:43 -0400)]
zebra: Ensure non-equal id's are not same nhg's
The function zebra_nhg_hash_equal is only used
as a hash function for storage of NHG's and retrieval.
If you have say two nhg's:
31 (25/26)
32 (25/26)
This function would return them as being equal. Which
of course leads to the problem when you attempt to
hash_release 32 but release 31 from the hash. Then later
when you attempt to do hash comparisons 32 has actually
been freed leaving to use after free situations and shit
goes down hill fast.
This hash is only used as part of the hash comparison
function for nexthop group storage. Since this is so
let's always return the 31/32 nhg's are not equal at all.
We possibly have a different problem where we are creating
31 and 32 ( when 31 should have just been used instead of 32 )
but we need to prevent any type of hash release problem at all.
This supercedes any other issue( that should be tracked down
on it's own ). Since you can have use after free situation
that leads to a crash -vs- some possible nexthop group duplication
which is very minor in comparison.