Mark Stapp [Wed, 14 Aug 2024 12:37:00 +0000 (08:37 -0400)]
tests: add retries to nhg tests in all_proto_startup
The all_protocol_startup topotest needs to allow for some delay
between configuring nexthop-groups and their installation. Add
some wait periods in a couple of nhg test cases.
Donatas Abraitis [Wed, 14 Aug 2024 07:16:01 +0000 (10:16 +0300)]
bgpd: Avoid use-after-free when doing `no router bgp` with auto created instances
```
==1145965==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030007159c0 at pc 0x55ade8d962d1 bp 0x7ffec4ce74c0 sp 0x7ffec4ce74b0
READ of size 8 at 0x6030007159c0 thread T0
0 0x55ade8d962d0 in no_router_bgp bgpd/bgp_vty.c:1701
1 0x7efe5aed19ed in cmd_execute_command_real lib/command.c:1002
2 0x7efe5aed1da3 in cmd_execute_command lib/command.c:1061
3 0x7efe5aed2303 in cmd_execute lib/command.c:1227
4 0x7efe5af6c023 in vty_command lib/vty.c:616
5 0x7efe5af6d2d2 in vty_execute lib/vty.c:1379
6 0x7efe5af77df2 in vtysh_read lib/vty.c:2374
7 0x7efe5af64c9b in event_call lib/event.c:1996
8 0x7efe5af03887 in frr_run lib/libfrr.c:1232
9 0x55ade8cd9850 in main bgpd/bgp_main.c:555
10 0x7efe5aa29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
11 0x7efe5aa29e3f in __libc_start_main_impl ../csu/libc-start.c:392
12 0x55ade8cdc314 in _start (/usr/lib/frr/bgpd+0x16f314)
```
Donald Sharp [Tue, 13 Aug 2024 12:58:29 +0000 (08:58 -0400)]
doc: Add doc to show sysctl setting for Sanitizers
In order to run the XXXX Sanitizers over the code as a developer
modern linux distro's require a specific sysctl. Let's document
that so that people are aware of it.
Donald Sharp [Sat, 10 Aug 2024 23:43:08 +0000 (19:43 -0400)]
zebra: Ensure non-equal id's are not same nhg's
The function zebra_nhg_hash_equal is only used
as a hash function for storage of NHG's and retrieval.
If you have say two nhg's:
31 (25/26)
32 (25/26)
This function would return them as being equal. Which
of course leads to the problem when you attempt to
hash_release 32 but release 31 from the hash. Then later
when you attempt to do hash comparisons 32 has actually
been freed leaving to use after free situations and shit
goes down hill fast.
This hash is only used as part of the hash comparison
function for nexthop group storage. Since this is so
let's always return the 31/32 nhg's are not equal at all.
We possibly have a different problem where we are creating
31 and 32 ( when 31 should have just been used instead of 32 )
but we need to prevent any type of hash release problem at all.
This supercedes any other issue( that should be tracked down
on it's own ). Since you can have use after free situation
that leads to a crash -vs- some possible nexthop group duplication
which is very minor in comparison.
Mark Stapp [Fri, 9 Aug 2024 14:08:21 +0000 (10:08 -0400)]
isisd: fix memory handling in isis_adj_process_threeway()
The adj_process_threeway() api may call the adj_state_change()
api, which may delete the adj struct being examined. Change the
signature so that callers pass a ptr-to-ptr so that they will
see that deletion.
Donald Sharp [Thu, 8 Aug 2024 18:58:04 +0000 (14:58 -0400)]
lib: Don't print warning if not a daemon
vtysh will print out the `stupidly large FD limit` upon
every run of the program if the ulimit is set stupidly
large. Prevent this from being displayed for vtysh.
Fixes: #16516 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
tools: Fix python string escape warnings for frr-reload.py
When using a regex (or anything that uses `\?` escapes) in python, raw
strings (`r"content"`) should be used so python doesn't consume the
escapes itself. Otherwise we get either broken behavior and/or
`SyntaxWarning: invalid escape sequence '\['`
Fixes: 8916953b534f64a7545860ad5b4b36dc2544f33a ("build: fix a few python string escape warnings") Fixes: https://github.com/FRRouting/frr/issues/16522 Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Igor Ryzhov [Wed, 7 Aug 2024 22:25:02 +0000 (01:25 +0300)]
ripd: fix show run output for distribute-list
CLI show callbacks should be defined in frr_ripd_cli_info instead of
frr_ripd_info, because only the former is loaded by mgmtd and only its
callbacks are getting called for config output.
Igor Ryzhov [Wed, 7 Aug 2024 21:40:51 +0000 (00:40 +0300)]
mgmtd: don't add implicit state data when reading config from file
When mgmt reads configuration from file, it shouldn't add implicit state
data to the candidate datastore. Configuration datastores like candidate
should never store state, otherwise they fail validation.
Christian Hopps [Wed, 7 Aug 2024 13:27:57 +0000 (09:27 -0400)]
tests: wait for test client to connect before running test
Vtysh has been improved to startup very quickly this exposed a race in this
test, where the `clear ip rip...` command ran before the test client that
handles it had finished connecting to mgmtd. Add a retried check for the test
client being connected before issuing the `clear ip rip ...` test command.
Donald Sharp [Wed, 31 Jul 2024 14:37:59 +0000 (10:37 -0400)]
tests: Shorten reconnect timer when something goes wrong
When running bfd_bgp_cbit_topo3 and an intial connection
goes wrong, try to connect again as fast as possible as
that the timer is 2 minutes otherwise and the test will
never come back from it.
Donald Sharp [Wed, 31 Jul 2024 14:45:38 +0000 (10:45 -0400)]
tests: Increase route_scale timeouts
This test is frequently failing in the upstream CI. Most
log failures are stating that we expected something like
1 million routes but we have 900k+. Looks like the system
is just loaded a bit more than expected. Let's give these
tests a bit more time to complete.
David Lamparter [Sun, 21 Jul 2024 01:28:35 +0000 (18:28 -0700)]
lib/clippy: add `CMD_ELEMENT_TKN`
The command graph has its tail end nodes pointing at the
`struct cmd_element` rather than a `struct cmd_token`. This is a bit
weird to begin with, but becomes very annoying for the python bindings
where there is just no `struct cmd_element`.
Create a `CMD_ELEMENT_TKN` type for `cmd_token` instead, and replace the
tail end token in the python bindings with an instance of that.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Sun, 21 Jul 2024 01:29:51 +0000 (18:29 -0700)]
lib/clippy: improve graph node member access
Expose all of the struct members of cmd_token, and retrieve them
dynamically rather than copying them around. The problem with copying
them is that they can change as a result of merge(), and if there is an
existing wrapper object around it will not have its copy updated to
match.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
bgpd: Use bgp_session_reset_safe() for GR update all peers
It might cause this use-after-free:
```
==6523==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300058d720 at pc 0x55f3ab62ab1f bp 0x7ffe5b95a0d0 sp 0x7ffe5b95a0c8
READ of size 8 at 0x60300058d720 thread T0
#0 0x55f3ab62ab1e in bgp_gr_update_mode_of_all_peers bgpd/bgp_fsm.c:2729
#1 0x55f3ab62ab1e in bgp_gr_update_all bgpd/bgp_fsm.c:2779
#2 0x55f3ab73557e in bgp_inst_gr_config_vty bgpd/bgp_vty.c:3037
#3 0x55f3ab74db69 in bgp_graceful_restart bgpd/bgp_vty.c:3130
#4 0x7fc5539a9584 in cmd_execute_command_real lib/command.c:1002
#5 0x7fc5539a98a3 in cmd_execute_command lib/command.c:1061
#6 0x7fc5539a9dcf in cmd_execute lib/command.c:1227
#7 0x7fc553ae493f in vty_command lib/vty.c:616
#8 0x7fc553ae4e92 in vty_execute lib/vty.c:1379
#9 0x7fc553aedd34 in vtysh_read lib/vty.c:2374
#10 0x7fc553ad8a64 in event_call lib/event.c:1995
#11 0x7fc553a0c429 in frr_run lib/libfrr.c:1232
#12 0x55f3ab57b78d in main bgpd/bgp_main.c:555
#13 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#14 0x7fc55342d304 in __libc_start_main_impl ../csu/libc-start.c:360
#15 0x55f3ab5799a0 in _start (/usr/lib/frr/bgpd+0x2e19a0)
0x60300058d720 is located 16 bytes inside of 24-byte region [0x60300058d710,0x60300058d728)
freed by thread T0 here:
#0 0x7fc553eb76a8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x7fc553a2b713 in qfree lib/memory.c:130
#2 0x7fc553a0e50d in listnode_free lib/linklist.c:81
#3 0x7fc553a0e50d in list_delete_node lib/linklist.c:379
#4 0x55f3ab7ae353 in peer_delete bgpd/bgpd.c:2796
#5 0x55f3ab7ae91f in bgp_session_reset bgpd/bgpd.c:141
#6 0x55f3ab62ab17 in bgp_gr_update_mode_of_all_peers bgpd/bgp_fsm.c:2752
#7 0x55f3ab62ab17 in bgp_gr_update_all bgpd/bgp_fsm.c:2779
#8 0x55f3ab73557e in bgp_inst_gr_config_vty bgpd/bgp_vty.c:3037
#9 0x55f3ab74db69 in bgp_graceful_restart bgpd/bgp_vty.c:3130
#10 0x7fc5539a9584 in cmd_execute_command_real lib/command.c:1002
#11 0x7fc5539a98a3 in cmd_execute_command lib/command.c:1061
#12 0x7fc5539a9dcf in cmd_execute lib/command.c:1227
#13 0x7fc553ae493f in vty_command lib/vty.c:616
#14 0x7fc553ae4e92 in vty_execute lib/vty.c:1379
#15 0x7fc553aedd34 in vtysh_read lib/vty.c:2374
#16 0x7fc553ad8a64 in event_call lib/event.c:1995
#17 0x7fc553a0c429 in frr_run lib/libfrr.c:1232
#18 0x55f3ab57b78d in main bgpd/bgp_main.c:555
#19 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
previously allocated by thread T0 here:
#0 0x7fc553eb83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x7fc553a2ae20 in qcalloc lib/memory.c:105
#2 0x7fc553a0d056 in listnode_new lib/linklist.c:71
#3 0x7fc553a0d85b in listnode_add_sort lib/linklist.c:197
#4 0x55f3ab7baec4 in peer_create bgpd/bgpd.c:1996
#5 0x55f3ab65be8b in bgp_accept bgpd/bgp_network.c:604
#6 0x7fc553ad8a64 in event_call lib/event.c:1995
#7 0x7fc553a0c429 in frr_run lib/libfrr.c:1232
#8 0x55f3ab57b78d in main bgpd/bgp_main.c:555
#9 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
```
Donald Sharp [Tue, 30 Jul 2024 11:21:58 +0000 (07:21 -0400)]
zebra: Fix removal of routes on MetaQ when client goes down
It is possible that right before an upper level protocol dies
or is killed routes would be installed into zebra. These routes
could be on the Meta-Q for early route-processing. Leaving us with
a situation where the client is removed, and all it's routes that are
in the rib at that time, and then after that the MetaQ is run and the
routes are reprocessed leaving routes from an upper level daemon
post daemon going away from zebra's perspective. These routes will
be abandoned.