Drop redundant function (duplicate), and reset counters for r2 instead of r1.
We check received capabilities on r2, hence we need to flush the counters on r2 too.
Fixes: d1cfd730601e5063d126ca1e78be5695fe909a77 ("tests: Check if ENHE capability can be handled dynamically") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Igor Ryzhov [Fri, 27 Dec 2024 19:33:39 +0000 (21:33 +0200)]
lib: introduce global -w option for VRF netns backend
Current -n option is only for zebra and mgmtd. All other daemons receive
the VRF backend configuration from zebra upon connection to it. This
leads to a potential race condition - daemons need to know the backend
before they start reading their config, but they can be not connected to
zebra yet at this point. As the VRF backend cannot change during runtime,
let's introduce a new global -w option for setting netns backend, to
make sure that all daemons know their VRF backend immediately after
start.
The reason for introducing a new option instead of making -n global is
that ospfd already uses -n for another purposes.
Igor Ryzhov [Fri, 27 Dec 2024 13:10:27 +0000 (15:10 +0200)]
lib, zebra: move ns context intialization to zebra
vrf->ns_ctxt is only ever used in zebra, so move its initialization to
zebra's callback. Ideally this pointer shouldn't even be a part of
library's vrf struct, and moved to zebra-specific struct, but this is
the first step.
Igor Ryzhov [Thu, 26 Dec 2024 21:08:21 +0000 (23:08 +0200)]
lib: remove VRF_BACKEND_UNKNOWN
The backend type cannot be unknown. It is configured to VRF_LITE by
default in zebra anyway, so just init to VRF_LITE in the lib and remove
the UNKNOWN type.
Donald Sharp [Tue, 14 Jan 2025 21:23:40 +0000 (16:23 -0500)]
zebra: On Nexthop install failure don't set Installation failed
Currently FRR when installing a nexthop group, the installation can fail.
The assumption with the code was that the current nexthop group was
not already installed. This leaves a problem state where if the
users of the nexthop group are removed, the nexthop group will be
removed possibly leaving a orphaned nexthop group in the data plane.
FRR on a nexthop group installation does not actually know the status
of the nexthop group in the kernel. It's possible that a earlier
version of the nexthop group is left in play. It's possible that
there is no nexthop group in the kernel at all. Leaving the
Installed flag alone allows upon Zebra removing the nexthop
group when it is removed from zebra.
Donatas Abraitis [Tue, 14 Jan 2025 16:57:45 +0000 (18:57 +0200)]
bgpd: Handle ENHE capability via dynamic capability
FRR supports dynamic capability which is useful to exchange the capabilities
without tearing down the session. ENHE capability was missed to be included
handling via dynamic capability. Let's add it too.
This was missed and asked in Slack that it would be useful.
Donald Sharp [Tue, 14 Jan 2025 20:12:32 +0000 (15:12 -0500)]
zebra: Nexthops need to be ACTIVE in some cases
Currently if you have an interface down event, Zebra
sets the nexthop(s) as !ACTIVE that use it. On
interface up events the singleton nexthops are not being
set as ACTIVE. Due to timing events it is sometimes
possible to end up with a route that is using a singleton
Change singleton nexthops to set the nexthop to ACTIVE.
This will allow the nexthop to be reinstalled appropriately
as well.
I was able to easily reproduce this using sharpd since
it does not attempt to reinstall the routes when a interface
goes up/down.
Before:
D>* 10.0.0.0/32 [150/0] via 192.168.102.34, dummy2, weight 1, 00:00:01
sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up
Chirag Shah [Mon, 13 Jan 2025 02:31:02 +0000 (18:31 -0800)]
tools: fix frr-reload for nbr deletion of no form cmds
When a bgp neighbor removed from associated to peer-group,
the neighbor is fully deleted, subsequent deletion of any
configuration related to the neighbor leads to failure
in frr-reload.
Handle any 'no neighbor ...' part of lines_to_del list
Testing:
Below first line would delete the neighbor swp1.10,
the existing code before the change handles to remove
config starts with 'neighbor swp1.10 ...' but not
'no neighbor swp1.10 ...'.
LaTex doesn't like the Unicon character `≥` which should be represented
by the special sequnce `$\leq$`. Since this is buried all in Sphinx, and
we also have a scrip that look for the character, the easiest fix is to
just use `>=` instead which works without warnings, and without asking to
ignore every warning about the use of every instance of the special char.
Christian Hopps [Sun, 12 Jan 2025 09:42:33 +0000 (09:42 +0000)]
tests: update munet to 0.15.4
- add readline and waitline functions for use with popen objects
- other non-topotest (munet native) run changes
- vm/qemu support booting cloud images (rocky, ubuntu, debian)
- native topology init commands
Enke Chen [Sat, 11 Jan 2025 22:01:36 +0000 (14:01 -0800)]
bgpd: fix churn of aggregate routes from duplicate config
Currently when an aggregate-address is configured, the existing
entry is always removed regardless whether any change is involved.
This would create unnecessary churn of the aggregate route when
the same config is applied for the aggregate address.
The fix is to check for duplicate aggregate-address config.
Rajasekar Raja [Fri, 10 Jan 2025 21:39:12 +0000 (13:39 -0800)]
zebra: Optimize invoking nhg compare func
In some cases, the old_re nhe and the newnhe is same and there is no
point in comparing them both since they are the same. Skip comparing in
such cases.
Ex:
2025/01/09 23:49:27.489020 ZEBRA: [W4Z4R-NTSMD] zebra_nhg_rib_find_nhe: => nhe 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489021 ZEBRA: [ZH3FQ-TE9NV] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 new id: 44 old id: 44
2025/01/09 23:49:27.489021 ZEBRA: [YB8HE-Z86GN] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 NEW 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489023 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489024 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489025 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
2025/01/09 23:49:27.489026 ZEBRA: [ZM3BX-HPETZ] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 OLD 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489027 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Donald Sharp [Thu, 9 Jan 2025 17:51:16 +0000 (12:51 -0500)]
bgpd: Only update peer connection information when needed
Currently bgp is repeatedly grabbing peer connection information.
This is a bit overkill. There are two situations:
a) Opening a connection to the peer
In this case, we know the remote port/address a priori and can get
the local information by just asking the OS.
b) Peer opening a connection to us.
In this case, we know the local port/address a priori and can get
the remote information by just asking the OS.
Modify the code to just grab this data at the appropriate time.
Donald Sharp [Tue, 17 Dec 2024 20:56:19 +0000 (15:56 -0500)]
bgpd: su_remote and su_local are properties of the connection
su_local and su_remote in the peer can change based upon
if we are initiating the remote connection or receiving it.
As such we need to treat it as a property of the connection.
anlan_cs [Thu, 9 Jan 2025 08:36:12 +0000 (16:36 +0800)]
ospfd: avoid the redundant timers
Since the timer thread for ```OSPF_ROUTE_AGGR_DEL``` has been created,
the subsequent "no summary-address" commands shouldn't trigger redundant timers.
Nathan Bahr [Wed, 30 Oct 2024 21:21:50 +0000 (21:21 +0000)]
pimd: Implement rpf lookup mode as a list
Add the support to store lookup modes as a sorted list.
List is non-unique and sorts mode with both lists < modes with one list < global mode (no lists).
This way, when finding the right mode, we will match a lookup using a prefix list before the global mode.
Add passing group address into all lookups (using nht cache and/or synchronous lookup).
Many areas don't have a group address, use PIMADDR_ANY if no valid group is needed.
Nathan Bahr [Wed, 30 Oct 2024 21:16:30 +0000 (21:16 +0000)]
pimd,yang: Expand rpf-lookup-mode command
Add options for group-list and source-list, both of which take a prefix list name.
The prefix list is used to determine the lookup mode for specific sources and/or groups.
Any number of lookup modes can be configured as long as the combination of group
and source list is unique.
A global lookup mode (empty group and source lists) is always added and defaults to mrib-then-urib
as it currently functions. The global lookup mode can be changed as it current exists with the command
`rpf-lookup-mode MODE`.
When determinig which mode to use, match source (and group if provided) against the lists, if they are set.
If a lookup does not specify a group, then only use lookup modes that do not have a group list defined.
A lookup by definition will have a source, so no special handling there.
Donald Sharp [Wed, 8 Jan 2025 14:42:49 +0000 (09:42 -0500)]
tests: bgp_srv6l3vpn_to_bgp_vrf3 needs more time
The test starts with checking for rib insertion
of routes that may take some time after system
startup to come up. Under heavy load this may
cause this test to just fail. Give it more time.
Donald Sharp [Thu, 9 Jan 2025 17:34:50 +0000 (12:34 -0500)]
zebra: Fix leaked nhe
During route processing in zebra, Zebra will create a nexthop
group that matches the nexthops passed down from the routing
protocol. Then Zebra will look to see if it can re-use a
nhe from a previous version of the route entry( say a interface
goes down ). If Zebra decides to re-use an nhe it was just dropping
the route entry created. Which led to nexthop group's that had
a refcount of 0 and in some cases these nexthop groups were installed
into the kernel.
Add a bit of code to see if the returned entry is not being used
and it has no reference count and if so, properly dispose of it.
Donald Sharp [Wed, 8 Jan 2025 14:41:21 +0000 (09:41 -0500)]
tests: bgp_srv6_sid_reachability should give more time
The test starts right in on check_pings with a 10 second
time out. Any type of delay on startup is going to cause
problems. Give the first check_ping significant time
for the test to be fully brought up.
Enke Chen [Thu, 9 Jan 2025 01:34:29 +0000 (17:34 -0800)]
bgpd: apply route-map for aggregate before attribute comparison
Currently when re-evaluating an aggregate route, the full attribute of
the aggregate route is not compared with the existing one in the BGP
table. That can result in unnecessary churns (un-install and then
install) of the aggregate route when a more specific route is added or
deleted, or when the route-map for the aggregate changes. The churn
would impact route installation and route advertisement.
The fix is to apply the route-map for the aggregate first and then
compare the attribute.