Yuqing Zhao [Mon, 13 Jan 2025 10:02:23 +0000 (11:02 +0100)]
staticd: Add infrastructure for SRv6
This commit adds datastructures and helper functions required to support SRv6 in staticd.
* List of locators
* List of SIDs
* Data structure to represent an SRv6 SID
* Functions to allocate/deallocate an SRv6 SID
* Functions to allocate, deallocate and lookup a locator
* Function to initialize/Cleanup SRv6
Igor Ryzhov [Fri, 27 Dec 2024 19:33:39 +0000 (21:33 +0200)]
lib: introduce global -w option for VRF netns backend
Current -n option is only for zebra and mgmtd. All other daemons receive
the VRF backend configuration from zebra upon connection to it. This
leads to a potential race condition - daemons need to know the backend
before they start reading their config, but they can be not connected to
zebra yet at this point. As the VRF backend cannot change during runtime,
let's introduce a new global -w option for setting netns backend, to
make sure that all daemons know their VRF backend immediately after
start.
The reason for introducing a new option instead of making -n global is
that ospfd already uses -n for another purposes.
Igor Ryzhov [Fri, 27 Dec 2024 13:10:27 +0000 (15:10 +0200)]
lib, zebra: move ns context intialization to zebra
vrf->ns_ctxt is only ever used in zebra, so move its initialization to
zebra's callback. Ideally this pointer shouldn't even be a part of
library's vrf struct, and moved to zebra-specific struct, but this is
the first step.
Igor Ryzhov [Thu, 26 Dec 2024 21:08:21 +0000 (23:08 +0200)]
lib: remove VRF_BACKEND_UNKNOWN
The backend type cannot be unknown. It is configured to VRF_LITE by
default in zebra anyway, so just init to VRF_LITE in the lib and remove
the UNKNOWN type.
Donald Sharp [Tue, 14 Jan 2025 21:23:40 +0000 (16:23 -0500)]
zebra: On Nexthop install failure don't set Installation failed
Currently FRR when installing a nexthop group, the installation can fail.
The assumption with the code was that the current nexthop group was
not already installed. This leaves a problem state where if the
users of the nexthop group are removed, the nexthop group will be
removed possibly leaving a orphaned nexthop group in the data plane.
FRR on a nexthop group installation does not actually know the status
of the nexthop group in the kernel. It's possible that a earlier
version of the nexthop group is left in play. It's possible that
there is no nexthop group in the kernel at all. Leaving the
Installed flag alone allows upon Zebra removing the nexthop
group when it is removed from zebra.
Donatas Abraitis [Tue, 14 Jan 2025 16:57:45 +0000 (18:57 +0200)]
bgpd: Handle ENHE capability via dynamic capability
FRR supports dynamic capability which is useful to exchange the capabilities
without tearing down the session. ENHE capability was missed to be included
handling via dynamic capability. Let's add it too.
This was missed and asked in Slack that it would be useful.
Donald Sharp [Tue, 14 Jan 2025 20:12:32 +0000 (15:12 -0500)]
zebra: Nexthops need to be ACTIVE in some cases
Currently if you have an interface down event, Zebra
sets the nexthop(s) as !ACTIVE that use it. On
interface up events the singleton nexthops are not being
set as ACTIVE. Due to timing events it is sometimes
possible to end up with a route that is using a singleton
Change singleton nexthops to set the nexthop to ACTIVE.
This will allow the nexthop to be reinstalled appropriately
as well.
I was able to easily reproduce this using sharpd since
it does not attempt to reinstall the routes when a interface
goes up/down.
Before:
D>* 10.0.0.0/32 [150/0] via 192.168.102.34, dummy2, weight 1, 00:00:01
sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up
Chirag Shah [Mon, 13 Jan 2025 02:31:02 +0000 (18:31 -0800)]
tools: fix frr-reload for nbr deletion of no form cmds
When a bgp neighbor removed from associated to peer-group,
the neighbor is fully deleted, subsequent deletion of any
configuration related to the neighbor leads to failure
in frr-reload.
Handle any 'no neighbor ...' part of lines_to_del list
Testing:
Below first line would delete the neighbor swp1.10,
the existing code before the change handles to remove
config starts with 'neighbor swp1.10 ...' but not
'no neighbor swp1.10 ...'.
LaTex doesn't like the Unicon character `≥` which should be represented
by the special sequnce `$\leq$`. Since this is buried all in Sphinx, and
we also have a scrip that look for the character, the easiest fix is to
just use `>=` instead which works without warnings, and without asking to
ignore every warning about the use of every instance of the special char.
Christian Hopps [Sun, 12 Jan 2025 09:42:33 +0000 (09:42 +0000)]
tests: update munet to 0.15.4
- add readline and waitline functions for use with popen objects
- other non-topotest (munet native) run changes
- vm/qemu support booting cloud images (rocky, ubuntu, debian)
- native topology init commands
Enke Chen [Sat, 11 Jan 2025 22:01:36 +0000 (14:01 -0800)]
bgpd: fix churn of aggregate routes from duplicate config
Currently when an aggregate-address is configured, the existing
entry is always removed regardless whether any change is involved.
This would create unnecessary churn of the aggregate route when
the same config is applied for the aggregate address.
The fix is to check for duplicate aggregate-address config.
Rajasekar Raja [Fri, 10 Jan 2025 21:39:12 +0000 (13:39 -0800)]
zebra: Optimize invoking nhg compare func
In some cases, the old_re nhe and the newnhe is same and there is no
point in comparing them both since they are the same. Skip comparing in
such cases.
Ex:
2025/01/09 23:49:27.489020 ZEBRA: [W4Z4R-NTSMD] zebra_nhg_rib_find_nhe: => nhe 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489021 ZEBRA: [ZH3FQ-TE9NV] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 new id: 44 old id: 44
2025/01/09 23:49:27.489021 ZEBRA: [YB8HE-Z86GN] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 NEW 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489023 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489024 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489025 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
2025/01/09 23:49:27.489026 ZEBRA: [ZM3BX-HPETZ] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 OLD 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489027 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Donald Sharp [Thu, 9 Jan 2025 17:51:16 +0000 (12:51 -0500)]
bgpd: Only update peer connection information when needed
Currently bgp is repeatedly grabbing peer connection information.
This is a bit overkill. There are two situations:
a) Opening a connection to the peer
In this case, we know the remote port/address a priori and can get
the local information by just asking the OS.
b) Peer opening a connection to us.
In this case, we know the local port/address a priori and can get
the remote information by just asking the OS.
Modify the code to just grab this data at the appropriate time.
Donald Sharp [Tue, 17 Dec 2024 20:56:19 +0000 (15:56 -0500)]
bgpd: su_remote and su_local are properties of the connection
su_local and su_remote in the peer can change based upon
if we are initiating the remote connection or receiving it.
As such we need to treat it as a property of the connection.
Donald Sharp [Thu, 9 Jan 2025 20:26:46 +0000 (15:26 -0500)]
zebra: Uninstall NHG in some situations
If you have this series of events:
a) Decision to install a NHG is made in zebra, enqueue to DPLANE
b) Changes to NHG are made and we remove it in the master pthread
Since this NHG is not marked as installed it is not removed
but the NHG data structure is deleted
c) DPLANE installs the NHG
In the end the NHG stays installed but ZEBRA has lost track of it.
Modify the removal code to check to see if the NHG is queued.
There are 2 cases:
a) NHG is kept around for a bit before being deleted. In this case
just see that the NHG is Queued and keep it around too.
b) NHG is not kept around and we are just removing it. In this case
check to see if it is queued and send another deletion event.
anlan_cs [Thu, 9 Jan 2025 08:36:12 +0000 (16:36 +0800)]
ospfd: avoid the redundant timers
Since the timer thread for ```OSPF_ROUTE_AGGR_DEL``` has been created,
the subsequent "no summary-address" commands shouldn't trigger redundant timers.
Nathan Bahr [Wed, 30 Oct 2024 21:21:50 +0000 (21:21 +0000)]
pimd: Implement rpf lookup mode as a list
Add the support to store lookup modes as a sorted list.
List is non-unique and sorts mode with both lists < modes with one list < global mode (no lists).
This way, when finding the right mode, we will match a lookup using a prefix list before the global mode.
Add passing group address into all lookups (using nht cache and/or synchronous lookup).
Many areas don't have a group address, use PIMADDR_ANY if no valid group is needed.
Nathan Bahr [Wed, 30 Oct 2024 21:16:30 +0000 (21:16 +0000)]
pimd,yang: Expand rpf-lookup-mode command
Add options for group-list and source-list, both of which take a prefix list name.
The prefix list is used to determine the lookup mode for specific sources and/or groups.
Any number of lookup modes can be configured as long as the combination of group
and source list is unique.
A global lookup mode (empty group and source lists) is always added and defaults to mrib-then-urib
as it currently functions. The global lookup mode can be changed as it current exists with the command
`rpf-lookup-mode MODE`.
When determinig which mode to use, match source (and group if provided) against the lists, if they are set.
If a lookup does not specify a group, then only use lookup modes that do not have a group list defined.
A lookup by definition will have a source, so no special handling there.