Louis Scalbert [Wed, 6 Jul 2022 13:37:44 +0000 (15:37 +0200)]
zebra: delete kernel routes using an interface with no more IPv4 address
When the last IPv4 address of an interface is deleted, Linux removes
all routes using this interface without any Netlink advertisement.
Routes that have a IPv4 nexthop are correctly removed from the FRR RIB.
However, routes that only have an interface with no more IPv4 addresses
as a nexthop remains in the FRR RIB.
In this situation, among the routes that this particular interface
nexthop:
- remove from the zebra kernel routes
- reinstall the routes that have been added from FRR. It is useful when
the nexthop is for example a VRF interface.
Add related test cases in the zebra_netlink topotest.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Fri, 29 Apr 2022 17:41:57 +0000 (19:41 +0200)]
bgpd: move mp_nexthop_prefer_global boolean attribute to nh_flag
Previous commits have introduced a new 8 bits nh_flag in the attr
struct that has increased the memory footprint.
Move the mp_nexthop_prefer_global boolean in the attr structure that
takes 8 bits to the new nh_flag in order to go back to the previous
memory utilization.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> VRF r1-cust4:
> B 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) inactive, weight 1, 00:00:08
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:08
When announcing the routes to zebra, use the peer of the ultimate bgp
path info instead of the one of the first parent path info to determine
whether the route is recursive.
The result is:
> VRF r1-cust4:
> B> 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) (recursive), weight 1, 00:00:02
> * via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Tue, 26 Apr 2022 14:57:45 +0000 (16:57 +0200)]
bgpd: update route leaking when a VRF loopback is received
At bgpd startup, VRF instances are sent from zebra before the
interfaces. When importing a l3vpn prefix from another local VRF
instance, the interfaces are not known yet. The prefix nexthop interface
cannot be set to the loopback or the VRF interface, which causes setting
invalid routes in zebra.
Update route leaking when the loopback or a VRF interface is received
from zebra.
At a VRF interface deletion, zebra voluntarily sends a
ZEBRA_INTERFACE_ADD message to move it to VRF_DEFAULT. Do not update if
such a message is received. VRF destruction will destroy all the related
routes without adding codes.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Wed, 11 May 2022 15:41:36 +0000 (17:41 +0200)]
topotests: update ospf_multi_vrf_bgp_route_leak
Leaked connected routes have now the following nexthop interfaces:
- lo for routes imported from the default VRF
- or the VRF interface for routes imported from the other VRFs.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Mon, 25 Apr 2022 13:14:49 +0000 (15:14 +0200)]
bgpd: fix invalid nexthop interface on leaked routes
There is two cases where the nexthop interface is incorrect:
- Case 1: leaked routes from prefixes stated in 'network <prefix>' are
inactive because they have no nexthop IP address or interface.
- Case 2: leaked routes from 'redistribute connected' contains the
original nexthop interface.
Extract from the routing table:
> VRF r1-cust1:
> S>* 192.0.0.0/24 [1/0] via 192.168.1.2, r1-eth4, weight 1, 00:47:53
> C>* 192.168.1.0/24 is directly connected, r1-eth4, 00:44:15
> S>* 29.0.0.0/24 [1/0] is directly connected, r1-cust5 (vrf r1-cust5), weight 1, 00:00:30
>
> VRF r1-cust4:
> B 10.2.3.4/32 [20/0] is directly connected, unknown (vrf r1-cust1) inactive, weight 1, 00:00:02
> C>* 29.0.0.0/24 is directly connected, r1-cust5, 00:27:40
> B 192.0.0.0/24 [20/0] is directly connected, unknown (vrf r1-cust1) inactive, weight 1, 00:03:40
> B>* 192.168.1.0/24 [20/0] is directly connected, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
The nexthop interface is r1-eth4. It causes issue to traffic leaving
r1-cust4. The following ping to r1-eth4 local address 192.168.1.1 from
r1-cust5 local add does
not respond.
> # tcpdump -lnni r1-cust1 'icmp' &
> # ip vrf exec r1-cust4 ping -c1 192.168.1.1 -I 29.0.0.1
> PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
PING 192.168.1.1 (192.168.1.1) from 29.0.0.1 : 56(84) bytes of data.
18:49:20.635638 IP 29.0.0.1 > 192.168.1.1: ICMP echo request, id 15897, seq 1, length 64
18:49:27.113827 IP 29.0.0.1 > 29.0.0.1: ICMP host 192.168.1.1 unreachable, length 92
Fix description:
When leaking prefix from other VRFs, if the nexthop IP address is not
set in the bgp path info attribures, reset nh_ifindex to the index of
master interface of the incoming BGP instance.
The previous commit has added a routing leak update when a nexthop
update is received from zebra. It indirectly calls
bgp_find_or_add_nexthop() in which a static route triggers a nexthop
cache entry registration that triggers a nexthop update from zebra.
Do not register again the nexthop cache entry if the BGP_STATIC_ROUTE is
already set.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Fri, 29 Apr 2022 12:26:04 +0000 (14:26 +0200)]
bgpd: fix prefix VRF leaking with 'network import-check' (4/5)
If 'network import-check' is defined on the source BGP session, prefixes
that are stated in the network command cannot be leaked to the other
VRFs BGP table even if they are present in the origin VRF RIB if the
'rt import' statement is defined after the 'network <prefix>' ones.
When a prefix nexthop is updated, update the prefix route leaking. The
current state of nexthop validation is now stored in the attributes of
the bgp path info. Attributes are compared with the previous ones at
route leaking update so that a nexthop validation change now triggers
the update of destination VRF BGP table.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Thu, 28 Apr 2022 16:32:20 +0000 (18:32 +0200)]
bgpd: fix prefix VRF leaking with 'network import-check' (1/5)
If 'network import-check' is defined on the source BGP session, prefixes
that are stated in the network command cannot be leaked to the other
VRFs BGP table even if they are present in the origin VRF RIB.
Always validate the nexthop of BGP static routes (i.e. defined with the
network statement) if 'network import-check' is defined on the source
BGP session and the prefix is present in source RIB.
It fixes the issue when the 'rt import' statement is defined after the
'network' ones.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Thu, 28 Apr 2022 15:01:35 +0000 (17:01 +0200)]
bgpd: fix prefix VRF leaking with 'no network import-check'
Prefixes that are stated in the network command cannot be leaked to
the other VRFs BGP table whether or not they are present in the origin
VRF RIB.
Always validate the nexthop of BGP static routes (i.e. defined with the
network statement) if 'no network import-check' is defined on the source
BGP session.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com> Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Trey Aspelund [Thu, 15 Dec 2022 18:49:43 +0000 (18:49 +0000)]
bgpd: cleanup macip_path_list for remote macip
ASAN reported the following memleak:
```
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x4d4342 in calloc (/usr/lib/frr/bgpd+0x4d4342)
#1 0xbc3d68 in qcalloc /home/sharpd/frr8/lib/memory.c:116:27
#2 0xb869f7 in list_new /home/sharpd/frr8/lib/linklist.c:64:9
#3 0x5a38bc in bgp_evpn_remote_ip_hash_alloc /home/sharpd/frr8/bgpd/bgp_evpn.c:6789:24
#4 0xb358d3 in hash_get /home/sharpd/frr8/lib/hash.c:162:13
#5 0x593d39 in bgp_evpn_remote_ip_hash_add /home/sharpd/frr8/bgpd/bgp_evpn.c:6881:7
#6 0x59dbbd in install_evpn_route_entry_in_vni_common /home/sharpd/frr8/bgpd/bgp_evpn.c:3049:2
#7 0x59cfe0 in install_evpn_route_entry_in_vni_ip /home/sharpd/frr8/bgpd/bgp_evpn.c:3126:8
#8 0x59c6f0 in install_evpn_route_entry /home/sharpd/frr8/bgpd/bgp_evpn.c:3318:8
#9 0x59bb52 in install_uninstall_route_in_vnis /home/sharpd/frr8/bgpd/bgp_evpn.c:3888:10
#10 0x59b6d2 in bgp_evpn_install_uninstall_table /home/sharpd/frr8/bgpd/bgp_evpn.c:4019:5
#11 0x578857 in install_uninstall_evpn_route /home/sharpd/frr8/bgpd/bgp_evpn.c:4051:9
#12 0x58ada6 in bgp_evpn_import_route /home/sharpd/frr8/bgpd/bgp_evpn.c:6049:9
#13 0x713794 in bgp_update /home/sharpd/frr8/bgpd/bgp_route.c:4842:3
#14 0x583fa0 in process_type2_route /home/sharpd/frr8/bgpd/bgp_evpn.c:4518:9
#15 0x5824ba in bgp_nlri_parse_evpn /home/sharpd/frr8/bgpd/bgp_evpn.c:5732:8
#16 0x6ae6a2 in bgp_nlri_parse /home/sharpd/frr8/bgpd/bgp_packet.c:363:10
#17 0x6be6fa in bgp_update_receive /home/sharpd/frr8/bgpd/bgp_packet.c:2020:15
#18 0x6b7433 in bgp_process_packet /home/sharpd/frr8/bgpd/bgp_packet.c:2929:11
#19 0xd00146 in thread_call /home/sharpd/frr8/lib/thread.c:2006:2
```
The list itself was not being cleaned up when the final list entry was
removed, so make sure we do that instead of leaking memory.
Louis Scalbert [Mon, 14 Feb 2022 13:18:10 +0000 (14:18 +0100)]
bgpd: fix the IGP metric for best path selection on VPN import
Since the commit da0c0ef70c ("bgpd: VRF-Lite fix best path selection"),
the best path selection is made from the comparison of the attributes
of the original route i.e. the ultimate path.
The IGP metric is currently set on the child path instead of the
ultimate path (i.e. the parent path). On eBGP, the ultimate path is the
child path. However, for imported routes, the ultimate path is always
set to 0, which results in skipping the IGP metric comparison when
selecting the best path.
Set the IGP metric on the ultimate path when a BGP nexthop is added or
updated.
Fixes: da0c0ef70c ("bgpd: VRF-Lite fix best path selection") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
bgpd: Add support for flowspec prefixes in bgp_packet_mpattr_prefix_size
Currently, bgp_packet_mpattr_prefix_size (bgpd/bgp_attr.c:3978) always returns zero for Flowspec prefixes.
This is because, for flowspec prefixes, the prefixlen attribute of the prefix struct is always set to 0, and the actual length is bytes is set inside the flowspec_prefix struct instead (see lib/prefix.h:293 and lib/prefix.h:178).
Because of this, with a large number of flowspec NLRIs, bgpd ends up building update messages that exceed the maximum size and cause the peer to drop the connection (bgpd/bgp_updgrp_packet.c:L719).
The proposed change allows the bgp_packet_mpattr_prefix_size to return correct result for flowspec prefixes.
Kuldeep Kashyap [Fri, 9 Dec 2022 15:21:29 +0000 (07:21 -0800)]
tests: Topotests daemon start as per feature test
Currently topotests starts all daemons by default,
made changes to f/w so only needed daemons can
be started, daemons which are needed to tests
particular test suite.
David Lamparter [Tue, 13 Dec 2022 17:34:41 +0000 (18:34 +0100)]
accords: CLI coloring
This is pretty much a writeup of the discussion we had on the FRR weekly
call about an hour ago. I added a bunch of detail but hopefully it
still represents the overall consensus.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Rafael Zalamena [Mon, 12 Dec 2022 18:11:27 +0000 (15:11 -0300)]
ospfd: fix memory leak on SPF calculation
Fix the following problems:
- Always free vertex next hops on `vertex_parent_free`
- Signalize failure on `ospf_spf_add_parent` when parent already exists
so the caller has the chance to `free()` any allocated resources.
- Don't reuse vertex next hops without the reference count logic in
`ospf_nexthop_calculation`. Instead allocate a new copy so it can be
`free()`d later without complications
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Donald Sharp [Tue, 4 Oct 2022 19:41:36 +0000 (15:41 -0400)]
zebra: Add ctx to netlink message parsing
Add the initial step of passing in a dplane context
to reading route netlink messages. This code
will be run in two contexts:
a) The normal pthread for reading netlink messages from
the kernel
b) The dplane_fpm_nl pthread.
The goal of this commit is too just allow a) to work
b) will be filled in in the future. Effectively
everything should still be working as it should
pre this change. We will just possibly allow
the passing of the context around( but not used )
Donald Sharp [Mon, 3 Oct 2022 17:22:22 +0000 (13:22 -0400)]
zebra: Rearrange dplane_ctx_route_init
In order for a future commit to abstract the dplane_ctx_route_init
so that the kernel can use it, let's move some stuff around
and add a dplane_ctx_route_init_basic that can be used by multiple
different paths
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
create a dplane_ctx_route_init_basic so it can be used
Volta submitted notification changes for the dplane that had a
special use case for their system. Volta is no more, the code
is not being actively developed and from talking with ex-Volta
employees there is no current plans to even maintain this code.
Wrap the special handling of nexthops that their asic-dataplane
did in a bit of code to isolate it and allow for future removal,
as that I do not actually believe anyone else is using this code.
Add a CPP_NOTICE several years into the future that will tell us
to remove the code. If someone starts using it then they will
have to notice this variable to set it and hopefully they will
see my CPP_NOTICE to come talk to us. If this is being used then
we can just remove this wrapper.
Donatas Abraitis [Sun, 11 Dec 2022 19:00:19 +0000 (21:00 +0200)]
bgpd: Fix graceful-restart JSON outputs and the crash
Without this patch:
```
donatas-pc# show bgp neighbors graceful-restart json
vtysh: error reading from bgpd: Resource temporarily unavailable (11)Warning: closing connection to bgpd because of an I/O error!
donatas-pc#
```
And, invalid JSON generated when multiple neighbors exist due to json_neighbor
being freed in a loop.
In case of BGP unnumbered, BGP fails to free the nexthop
node for peer if the interface is shutdown before
unconfiguring/deleting the BGP neighbor.
This is because, when the interface is shutdown,
peer's LL neighbor address will be cleared. Therefore,
during neighbor deletion, since the peer's neighbor
address is not available, BGP will skip freeing the
nexthop node of this peer. This results in a stale
nexthop node that points to a peer that's already
been freed.
bgpd: Free memory allocated by info_make() when hitting maximum-prefix
```
Direct leak of 112 byte(s) in 1 object(s) allocated from:
0 0x7feb66337a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
1 0x7feb660cbcc3 in qcalloc lib/memory.c:116
2 0x55cc3cba02d1 in info_make bgpd/bgp_route.c:3831
3 0x55cc3cbab4f1 in bgp_update bgpd/bgp_route.c:4733
4 0x55cc3cbb0620 in bgp_nlri_parse_ip bgpd/bgp_route.c:6111
5 0x55cc3cb79473 in bgp_update_receive bgpd/bgp_packet.c:2020
6 0x55cc3cb7c34a in bgp_process_packet bgpd/bgp_packet.c:2929
7 0x7feb6610ecc5 in thread_call lib/thread.c:2006
8 0x7feb660bfb77 in frr_run lib/libfrr.c:1198
9 0x55cc3cb17232 in main bgpd/bgp_main.c:520
10 0x7feb65ae5082 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: 112 byte(s) leaked in 1 allocation(s).
```
Ryoga Saito [Tue, 6 Dec 2022 17:27:16 +0000 (02:27 +0900)]
tests: Add topotest bgp_vrf_leaking_5549_routes
To verify previous changes, this PR introduces topotest to verify
whether imported routes learnt from BGP unnumbered peers will be active
on VPN RIB and other VRF RIB.
Trey Aspelund [Wed, 7 Dec 2022 19:57:09 +0000 (14:57 -0500)]
doc: add FRR/Linux configuration examples for EVPN
The existing EVPN documentation in bgp.rst does not provide a holistic
configuration, just examples of individual features, and doesn't give
an operator any idea of what a compatible Linux netdev configuration
might look like. This introduces evpn.rst which includes a sample
frr.conf and corresponding Linux interface config (via iproute2) that
an operator can use to setup a basic EVPN topology and model their
interface manager's config from.
This initial version of evpn.rst shows Linux netdev config for
traditional bridges (vlan_filtering=0) and traditional vxlan devices
(single VNI). Later changes to this file will cover the use of
VLAN-aware bridges (vlan_filtering=1), single VXLAN devices
(multi VNI), and eventually bonds (for EVPN-MH).
Eventually the plan is to move the existing EVPN content from bgp.rst
into evpn.rst, but for now let's get some user-facing documentation in
place for interface configs.