Donald Sharp [Tue, 4 Oct 2022 19:41:36 +0000 (15:41 -0400)]
zebra: Add ctx to netlink message parsing
Add the initial step of passing in a dplane context
to reading route netlink messages. This code
will be run in two contexts:
a) The normal pthread for reading netlink messages from
the kernel
b) The dplane_fpm_nl pthread.
The goal of this commit is too just allow a) to work
b) will be filled in in the future. Effectively
everything should still be working as it should
pre this change. We will just possibly allow
the passing of the context around( but not used )
Donald Sharp [Mon, 3 Oct 2022 17:22:22 +0000 (13:22 -0400)]
zebra: Rearrange dplane_ctx_route_init
In order for a future commit to abstract the dplane_ctx_route_init
so that the kernel can use it, let's move some stuff around
and add a dplane_ctx_route_init_basic that can be used by multiple
different paths
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
create a dplane_ctx_route_init_basic so it can be used
Volta submitted notification changes for the dplane that had a
special use case for their system. Volta is no more, the code
is not being actively developed and from talking with ex-Volta
employees there is no current plans to even maintain this code.
Wrap the special handling of nexthops that their asic-dataplane
did in a bit of code to isolate it and allow for future removal,
as that I do not actually believe anyone else is using this code.
Add a CPP_NOTICE several years into the future that will tell us
to remove the code. If someone starts using it then they will
have to notice this variable to set it and hopefully they will
see my CPP_NOTICE to come talk to us. If this is being used then
we can just remove this wrapper.
In case of BGP unnumbered, BGP fails to free the nexthop
node for peer if the interface is shutdown before
unconfiguring/deleting the BGP neighbor.
This is because, when the interface is shutdown,
peer's LL neighbor address will be cleared. Therefore,
during neighbor deletion, since the peer's neighbor
address is not available, BGP will skip freeing the
nexthop node of this peer. This results in a stale
nexthop node that points to a peer that's already
been freed.
bgpd: Free memory allocated by info_make() when hitting maximum-prefix
```
Direct leak of 112 byte(s) in 1 object(s) allocated from:
0 0x7feb66337a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
1 0x7feb660cbcc3 in qcalloc lib/memory.c:116
2 0x55cc3cba02d1 in info_make bgpd/bgp_route.c:3831
3 0x55cc3cbab4f1 in bgp_update bgpd/bgp_route.c:4733
4 0x55cc3cbb0620 in bgp_nlri_parse_ip bgpd/bgp_route.c:6111
5 0x55cc3cb79473 in bgp_update_receive bgpd/bgp_packet.c:2020
6 0x55cc3cb7c34a in bgp_process_packet bgpd/bgp_packet.c:2929
7 0x7feb6610ecc5 in thread_call lib/thread.c:2006
8 0x7feb660bfb77 in frr_run lib/libfrr.c:1198
9 0x55cc3cb17232 in main bgpd/bgp_main.c:520
10 0x7feb65ae5082 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: 112 byte(s) leaked in 1 allocation(s).
```
Trey Aspelund [Wed, 7 Dec 2022 19:57:09 +0000 (14:57 -0500)]
doc: add FRR/Linux configuration examples for EVPN
The existing EVPN documentation in bgp.rst does not provide a holistic
configuration, just examples of individual features, and doesn't give
an operator any idea of what a compatible Linux netdev configuration
might look like. This introduces evpn.rst which includes a sample
frr.conf and corresponding Linux interface config (via iproute2) that
an operator can use to setup a basic EVPN topology and model their
interface manager's config from.
This initial version of evpn.rst shows Linux netdev config for
traditional bridges (vlan_filtering=0) and traditional vxlan devices
(single VNI). Later changes to this file will cover the use of
VLAN-aware bridges (vlan_filtering=1), single VXLAN devices
(multi VNI), and eventually bonds (for EVPN-MH).
Eventually the plan is to move the existing EVPN content from bgp.rst
into evpn.rst, but for now let's get some user-facing documentation in
place for interface configs.
David Lamparter [Fri, 2 Dec 2022 10:39:56 +0000 (11:39 +0100)]
doc: introduce FRR community "accords"
The idea here is to pass "non-code agreements" through the PR review
mechanism, and have them visible in the git tree.
Two "example" (but real) accords are included, mostly to illustrate the
idea. Both of these should be non-controversial and have had some
previous discussion in random places.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donald Sharp [Wed, 7 Dec 2022 19:06:12 +0000 (14:06 -0500)]
bgpd: Prevent crash in evpn when using default vrf
The default vrf in bgp when created, ends up having the
bgp->name as NULL. This of course crashes the ilk
of `json_object_string_add` when a NULL is passed in.
Go through all the places in bgp_evpn_mh.c and fix
so that it doesn't crash.
Rafael Zalamena [Wed, 7 Dec 2022 14:48:47 +0000 (11:48 -0300)]
pimd: fix MSDP crash on unexpected TLV sizes
Increase the MSDP peer stream buffer size to handle the whole TLV
(maximum is 65KiB due to 16bit field). If the stream is not resized
there will be a crash in the read function attempting to put more than
9192 (`PIM_MSDP_SA_TLV_MAX_SIZE`) bytes.
According to the RFC 3618 Section 12 we should accept the TLV and we
should not reset the peer connection.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
There were concerns that ensuring zebra stopped last led to
problems with zebra's "-r" flag, so we'll revert that for the
time being and reconsider this area.
Donald Sharp [Mon, 5 Dec 2022 15:10:36 +0000 (10:10 -0500)]
bgpd: Change fsm to use an enum for passing state
The BGP fsm uses return codes to pass event success/fail
as well as some extra data to the bgp_event_update function.
Convert this to use a enum instead of an int to track the
changes.
Donald Sharp [Fri, 2 Dec 2022 17:59:30 +0000 (12:59 -0500)]
bgpd: When creating peer convey if it is a CONFIG_NODE or not
When actually creating a peer in BGP, tell the creation if
it is a config node or not. There were cases where the
CONFIG_NODE was being set *after* being placed into
the bgp->peerhash, thus causing collisions between the
doppelganger and the peer and eventually use after free's.
Donald Sharp [Fri, 2 Dec 2022 17:51:34 +0000 (12:51 -0500)]
bgpd: Hash release before we change the underlying hash assumptions
The bgp->peerhash is made up of the sockunion and the CONFIG_NODE
flag. If the CONFIG_NODE flag is moved around or changed then
we get into a situation where both the doppelganger and the peer
actually hash to the exact same thing. Leading to wrongful deletion
and pointers being used after freed.
Donald Sharp [Wed, 30 Nov 2022 16:49:51 +0000 (11:49 -0500)]
bgpd: Peer events should be cleaned up on shutdown
Currently bgp does not stop any events that are on the thread
system for execution on peer deletion. This is not good.
Stop those events and prevent use after free's.
Donald Sharp [Wed, 30 Nov 2022 15:41:54 +0000 (10:41 -0500)]
bgpd: When copying from src to dest do not overwrite the CONFIG_NODE
When the decision has been made to copy a peer configuration from
a peer to another peer because one is taking over. Do not automatically
set the CONFIG_NODE flag. Instead we need to handle that appropriately
when the final decision is made to transfer.
Donald Sharp [Mon, 28 Nov 2022 20:16:15 +0000 (15:16 -0500)]
bgpd: Prevent use after free of peer structure
When changing the peers sockunion structure the bgp->peer
list was not being updated properly. Since the peer's su
is being used for a sorted insert then the change of it requires
that the value be pulled out of the bgp->peer list and then
put back into as well.
Additionally ensure that the hash is always released on peer
deletion.
Lead to this from this decode in a address sanitizer run.
=================================================================
==30778==ERROR: AddressSanitizer: heap-use-after-free on address 0x62a0000d8440 at pc 0x7f48c9c5c547 bp 0x7ffcba272cb0 sp 0x7ffcba272ca8
READ of size 2 at 0x62a0000d8440 thread T0
#0 0x7f48c9c5c546 in sockunion_same lib/sockunion.c:425
#1 0x55cfefe3000f in peer_hash_same bgpd/bgpd.c:890
#2 0x7f48c9bde039 in hash_release lib/hash.c:209
#3 0x55cfefe3373f in bgp_peer_conf_if_to_su_update bgpd/bgpd.c:1541
#4 0x55cfefd0be7a in bgp_stop bgpd/bgp_fsm.c:1631
#5 0x55cfefe4028f in peer_delete bgpd/bgpd.c:2362
#6 0x55cfefdd5e97 in no_neighbor_interface_config bgpd/bgp_vty.c:4267
#7 0x7f48c9b9d160 in cmd_execute_command_real lib/command.c:949
#8 0x7f48c9ba1112 in cmd_execute_command lib/command.c:1009
#9 0x7f48c9ba1573 in cmd_execute lib/command.c:1162
#10 0x7f48c9c87402 in vty_command lib/vty.c:526
#11 0x7f48c9c87832 in vty_execute lib/vty.c:1291
#12 0x7f48c9c8e741 in vtysh_read lib/vty.c:2130
#13 0x7f48c9c7a66d in thread_call lib/thread.c:1585
#14 0x7f48c9bf64e7 in frr_run lib/libfrr.c:1123
#15 0x55cfefc75a15 in main bgpd/bgp_main.c:540
#16 0x7f48c96b009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
#17 0x55cfefc787f9 in _start (/usr/lib/frr/bgpd+0xe27f9)
0x62a0000d8440 is located 576 bytes inside of 23376-byte region [0x62a0000d8200,0x62a0000ddd50)
freed by thread T0 here:
#0 0x7f48c9eb9fb0 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0xe8fb0)
#1 0x55cfefe3fe42 in peer_free bgpd/bgpd.c:1113
#2 0x55cfefe3fe42 in peer_unlock_with_caller bgpd/bgpd.c:1144
#3 0x55cfefe4092e in peer_delete bgpd/bgpd.c:2457
#4 0x55cfefdd5e97 in no_neighbor_interface_config bgpd/bgp_vty.c:4267
#5 0x7f48c9b9d160 in cmd_execute_command_real lib/command.c:949
#6 0x7f48c9ba1112 in cmd_execute_command lib/command.c:1009
#7 0x7f48c9ba1573 in cmd_execute lib/command.c:1162
#8 0x7f48c9c87402 in vty_command lib/vty.c:526
#9 0x7f48c9c87832 in vty_execute lib/vty.c:1291
#10 0x7f48c9c8e741 in vtysh_read lib/vty.c:2130
#11 0x7f48c9c7a66d in thread_call lib/thread.c:1585
#12 0x7f48c9bf64e7 in frr_run lib/libfrr.c:1123
#13 0x55cfefc75a15 in main bgpd/bgp_main.c:540
#14 0x7f48c96b009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
Donald Sharp [Thu, 1 Dec 2022 18:12:40 +0000 (13:12 -0500)]
bgpd: Fix several use after free's in bgp for the peer
Three fixes:
a) When calling bgp_fsm_change_status with `Deleted` do
not add a new event to the peer's t_event because
we are already in the process of deleting everything
b) When bgp_stop decides to delete a peer return a notification
that it is happening to bgp_event_update so that it does not
set the peer state back to idle or do other processing.
c) bgp_event_update can cause a peer deletion, because
the peer can be deleted in the fsm function but the peer
data structure is still pointed to and used after words.
So lock the peer before entering and prevent a use after
free.
Donald Sharp [Tue, 29 Nov 2022 14:00:39 +0000 (09:00 -0500)]
bgpd: peer creation now takes care of the su
At some point in the past the peer creation was not
properly setting the su and the code had the release
and re-add when setting the su. Since peer_create
got a bit of code to handle the su properly the
need to release then add it back in is negated
so remove the code.
Donald Sharp [Sat, 3 Dec 2022 12:14:54 +0000 (07:14 -0500)]
zebra: Cleanup use after free in shutdown
On shutdown a use after free was being seen of a route table.
Basically the pointer was kept around and resent for cleanup.
Probably something needs to be unwound to make this better
in the future. Just cleaning up the use after free.
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929-=================================================================
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929:==911929==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000127a00 at pc 0x7fb9ad546f5b bp 0x7ffc3cff0330 sp 0x7ffc3 cff0328
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929-READ of size 8 at 0x606000127a00 thread T0
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #0 0x7fb9ad546f5a in route_table_free /home/sharpd/frr8/lib/table.c:103:13
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #1 0x7fb9ad546f04 in route_table_finish /home/sharpd/frr8/lib/table.c:61:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #2 0x6b94ba in zebra_ns_disable_internal /home/sharpd/frr8/zebra/zebra_ns.c:141:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #3 0x6b9158 in zebra_ns_disabled /home/sharpd/frr8/zebra/zebra_ns.c:116:9
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #4 0x7fb9ad43f0f5 in ns_disable_internal /home/sharpd/frr8/lib/netns_linux.c:273:4
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #5 0x7fb9ad43e634 in ns_disable /home/sharpd/frr8/lib/netns_linux.c:368:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #6 0x7fb9ad43e251 in ns_delete /home/sharpd/frr8/lib/netns_linux.c:330:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #7 0x7fb9ad43fbb3 in ns_terminate /home/sharpd/frr8/lib/netns_linux.c:524:3
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #8 0x54f8de in zebra_finalize /home/sharpd/frr8/zebra/main.c:232:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #9 0x7fb9ad5655e6 in thread_call /home/sharpd/frr8/lib/thread.c:2006:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #10 0x7fb9ad3d3343 in frr_run /home/sharpd/frr8/lib/libfrr.c:1198:3
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #11 0x550b48 in main /home/sharpd/frr8/zebra/main.c:476:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #12 0x7fb9acd30d09 in __libc_start_main csu/../csu/libc-start.c:308:16
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #13 0x443549 in _start (/usr/lib/frr/zebra+0x443549)
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929-
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929-0x606000127a00 is located 0 bytes inside of 56-byte region [0x606000127a00,0x606000127a38)
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929-freed by thread T0 here:
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #0 0x4bd33d in free (/usr/lib/frr/zebra+0x4bd33d)
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #1 0x7fb9ad42cc80 in qfree /home/sharpd/frr8/lib/memory.c:141:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #2 0x7fb9ad547305 in route_table_free /home/sharpd/frr8/lib/table.c:141:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #3 0x7fb9ad546f04 in route_table_finish /home/sharpd/frr8/lib/table.c:61:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #4 0x6b94ba in zebra_ns_disable_internal /home/sharpd/frr8/zebra/zebra_ns.c:141:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #5 0x6b9692 in zebra_ns_early_shutdown /home/sharpd/frr8/zebra/zebra_ns.c:164:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #6 0x7fb9ad43f228 in ns_walk_func /home/sharpd/frr8/lib/netns_linux.c:386:9
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #7 0x55014f in sigint /home/sharpd/frr8/zebra/main.c:194:2
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #8 0x7fb9ad50db99 in frr_sigevent_process /home/sharpd/frr8/lib/sigevent.c:130:6
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #9 0x7fb9ad560d07 in thread_fetch /home/sharpd/frr8/lib/thread.c:1775:4
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #10 0x7fb9ad3d332d in frr_run /home/sharpd/frr8/lib/libfrr.c:1197:9
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #11 0x550b48 in main /home/sharpd/frr8/zebra/main.c:476:2
--
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929- #7 0x7fb9acd30d09 in __libc_start_main csu/../csu/libc-start.c:308:16
./bfd_vrf_topo1.test_bfd_vrf_topo1/r2.zebra.asan.911929-
Donald Sharp [Mon, 5 Dec 2022 13:26:01 +0000 (08:26 -0500)]
vtysh: free memory given to us by readline
The rl_callback_handler_install function manual says this:
Set up the terminal for Readline I/O and display the initial expanded value of prompt.
Save the value of lhandler to use as a handler function to call when a complete line
of input has been entered. The handler function receives the text of the line as an
argument. As with readline(), the handler function should free the line when it is
finished with it.
Adding a free removes this memory leak that I am seeing with address sanitizer enabled;
SUMMARY: AddressSanitizer: 99 byte(s) leaked in 5 allocation(s).:
2022-12-05 07:50:57,231 INFO: topolog.r7: vtysh result:
Hello, this is FRRouting (version 8.5-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
Direct leak of 99 byte(s) in 5 object(s) allocated from:
#0 0x49cadd in malloc (/usr/bin/vtysh+0x49cadd)
#1 0x7fc57135d8e8 in xmalloc build/shlib/./xmalloc.c:59:10
SUMMARY: AddressSanitizer: 99 byte(s) leaked in 5 allocation(s).
The `behavior usid` command is installed under the SRv6 Locator node in
the zebra VTY. However, in the SRv6 config write function this command
is wrongly put on the same line as the `prefix X:X::X:X/M` command.
This causes a failure when an SRv6 uSID locator is configured in zebra
and `frr-reload.py` is used to reload the FRR configuration.
This commit prepends a newline character to the `behavior usid` command
in the SRv6 config write function. The output of `show running-config`
before and after this commit is shown below.
Before:
```
Building configuration...
Current configuration:
!
frr version 8.5-dev
!
segment-routing
srv6
locators
locator loc1
prefix fc00:0:1::/48 block-len 32 node-len 16 behavior usid
exit
!
exit
!
exit
!
exit
!
end
```
Ryoga Saito [Sun, 4 Dec 2022 07:51:24 +0000 (16:51 +0900)]
tests: Fix topotests for bgp_srv6l3vpn
In bgp_srv6l3vpn tests, check_ping checks reachability. However, this
function have a bug and if we set expect_connected to True, check will
pass even if all ping packets are lost. This commit fixes this issue.