Igor Ryzhov [Mon, 12 Feb 2024 18:34:33 +0000 (20:34 +0200)]
mgmtd: fix be_is_client_interested
Backend "subscribe" API allows daemons to dynamically register xpaths
they are interested in. Such xpaths are not stored in hardcoded
config/oper xpath arrays so this function fails to understand that a
backend daemon is interested in them. Fix by using dynamic xpath maps
instead which store both hardcoded and dynamic xpaths.
Igor Ryzhov [Sat, 10 Feb 2024 00:11:13 +0000 (02:11 +0200)]
lib, mgmtd: rework processing of yang notifications
Currently, YANG notification processing is done using a special type of
callbacks registered in backend clients. In this commit, we start using
regular northbound infrastructure instead, because it already has a
convenient way of registering xpath-specific callbacks without the need
for creating additional structures for each necessary notification. We
also now pass a notification data to the callback, instead of a plain
JSON. This allows to use regular YANG library functions for inspecting
notification fields, instead of manually parsing the JSON.
In `qpb.h` we have a bunch of functions that make use of
`union g_addr`. `union g_addr` is defined in `nexthop.h`, which
actually is NOT included in `qpb.h`.
Igor Ryzhov [Fri, 9 Feb 2024 22:58:49 +0000 (00:58 +0200)]
lib, mgmtd: fix processing of yang notifications
Current code assumes that notification is always sent in stripped JSON
format and therefore notification xpath starts at the third symbol of
notification data. Assuming JSON is more or less fine, because this
representation is internal to FRR, but the assumption about the xpath is
wrong, because it won't work for not top-level notifications. YANG
allows to define notification as a child for some data node deep into
the tree and in this case notification data contains not only the
notification node itself, but also all its parents.
To fix the issue, parse the notification data and get its xpath from its
schema node.
> VRF r1-cust4:
> B 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) inactive, weight 1, 00:00:08
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:08
When announcing the routes to zebra, use the peer of the ultimate bgp
path info instead of the one of the first parent path info to determine
whether the route is recursive.
The result is:
> VRF r1-cust4:
> B> 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) (recursive), weight 1, 00:00:02
> * via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
ospfd: can not delete "segment-routing node-msd" when SR if off
This fixes the initial implementation of commit 7743f2f8c00 ("OSPFd: Update
Segment Routing PR following review") where it wsa not possible to remove
the "segment-routing node-msd" CLI nodes via vtysh once segment-routing got
disabled.
Closes #14910
Signed-off-by: Christian Breunig <christian@breunig.cc>
pimd: re-evaluated S,G OILs upon RP changes and for empty SG upstream oils
Topology:
TOR11 (FHR) --- LEAF-11---SPINE1 (RP)MSDP SPINE-2(RP)MSDP --- LEAF-12 -- TOR12 (LHR)
| | | | |
| -----------------------------------------------------(ECMP) |
| | | |
-----------------------------------------------------------------------(ECMP)
Issue:
In some triggers, S,G upstream is preserved even with the PP timer expiry, resulting
in S,G with NULL OILS. This could be because we create a dummy S,G upstream and
dummy channel_oif for *,G, where RPF is UNKNOWN. As a result, PIM+VXLAN traffic is never
forwarded downstream to LHR.
Fix:
when the S,G stream is running, Determine if a reevaluation of the outgoing interface
list (OIL) is required. S,G upstream should then inherit the OIL from *,G.
Donald Sharp [Thu, 8 Feb 2024 17:32:26 +0000 (12:32 -0500)]
zebra: When reinstalling a NHG, set REINSTALL flag
The current code is unsetting the fact that the
NHG is installed. It is installed but we are
reinstalling it. Let's note this in the code
appropriately as REINSTALL and not remove the
INSTALLED FLAG.
Donald Sharp [Thu, 8 Feb 2024 17:22:09 +0000 (12:22 -0500)]
zebra: Move nhg reinstallation into if_up proper
The function call in to zebra_interface_nhg_reinstall
is an action that takes place on interface up events
*not* when the connected addresses are added to
a system. this will prevent this function being
called when new connected interfaces come alive
that is an independent operation of the interface
coming up.
Donald Sharp [Thu, 8 Feb 2024 17:05:05 +0000 (12:05 -0500)]
zebra: Remove function abstraction in interface.c
zebra_if_nhg_dependents_XXX were just simple wrapper
functions that inited/deleted data structures.
These were already function calls themselves. Let's
remove the abstraction and make the code simpler.
Donald Sharp [Thu, 8 Feb 2024 16:56:40 +0000 (11:56 -0500)]
zebra: Remove ifp_nhg_XXX functions completely
These functions provided a level of abstraction that forced
us to call multiple functions when a simple data structure
change was all that is needed. Let's consolidate down
and make things a bit simpler.
Donald Sharp [Thu, 8 Feb 2024 16:35:31 +0000 (11:35 -0500)]
zebra: Installation success should not set NHG as valid
The nexthop group is marked as valid/invalid and then
installed. Not installed and then marked valid.
This is just a bit of code removed that might be covering
up other problems that need to be sorted.
Donald Sharp [Thu, 8 Feb 2024 16:30:30 +0000 (11:30 -0500)]
zebra: Use switch when handling return from dplane for nhgs
Convert the dplane results function for nhg's over to
using a switch for the result enum. Let's specifically
call out the unexpected state and also set the nexthop
group as not installed when installation fails.
When installing a NHG via dplane_nexthop_add, it can only return
REQUEST_QUEUED or REQUEST_FAILURE. There is no way SUCCESS can
be returned with the way the dplane works at this point in time.
Remove the code that attempts to set the NHE state appropriately
as it is impossible.
bgpd: fix flushing ipv6 flowspec entries when peering stops
When a BGP flowspec peering stops, the BGP RIB entries for IPv6
flowspec entries are removed, but not the ZEBRA RIB IPv6 entries.
Actually, when calling bgp_zebra_withdraw() function call, only
the AFI_IP parameter is passed to the bgp_pbr_update_entry() function
in charge of the Flowspec add/delete in zebra. Fix this by passing
the AFI parameter to the bgp_zebra_withdraw() function.
Note that using topotest does not show up the problem as the
flowspec driver code is not present and was refused. Without that,
routes are not installed, and can not be uninstalled.
Fixes: 529efa234655 ("bgpd: allow flowspec entries to be announced to zebra") Link: https://github.com/FRRouting/frr/pull/2025 Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Igor Ryzhov [Fri, 2 Feb 2024 23:15:46 +0000 (01:15 +0200)]
vtysh: remove resync workaround when exiting to config node
When exiting from a level below the config node, like `router rip`,
vtysh executes a resync by sending "end" and "conf term [file-lock]"
commands to all the daemons. As statet in the description comment, it's
done "in case one of the daemons is somewhere else". I don't think this
actually ever happens, but even if it is, it is a bug in a daemon that
needs to be fixed. This resync was okay before the introduction of
mgmtd, but now it unlocks and locks back the datastores during the
configuration reading process, which can lead to a failure which is
explained in the previous commit.
Igor Ryzhov [Fri, 2 Feb 2024 22:42:58 +0000 (00:42 +0200)]
mgmtd, vtysh: fix possible conflict when reading the config
When FRR starts, after mgmtd is initialized, backend clients connect to
it and request their config. To supply the config, mgmtd creates a
configuration transaction. At the same time, `vtysh -b` tries to read
the startup config and configure mgmtd, which also creates a
configuration transaction. If these two actions happen at the exact same
time, there's a conflict between them, because only a single
configuration translaction is allowed. Because of that, vtysh fails and
the config is completely ignored.
When starting the config reading, vtysh locks candidate and running
datastores in mgmtd. This commit adds locking of running datastore when
initializing the backend client. It allows to retry locking on the vtysh
side and read the config only when the lock is aquired instead of
failing.
This change also prevents running datastore from being changed during
initialization of backend clients. This could lead to a desynchronized
state between mgmtd and backends.
Donald Sharp [Wed, 7 Feb 2024 15:38:02 +0000 (10:38 -0500)]
lib, ospfclient, tests, vtysh: Allow for a minimum fd poll size
There exists cases where just honoring the FD_LIMIT size
as given to us by the operating system makes no sense.
Let's just make a switch to allow for this for things
like vtysh and ospfclient which will never have 1k files
open at any given time.
Fixes: #15315 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Louis Scalbert [Wed, 7 Feb 2024 12:16:26 +0000 (13:16 +0100)]
lib: fix affinity map duplicate
Fix duplicate definition of frr_affinity_map_cli_info in libfrr.so.0 and
libmgmt_be_nb.so.0
> =================================================================
> ==3860488==ERROR: AddressSanitizer: odr-violation (0x7f12c98c4d20):
> [1] size=296 'frr_affinity_map_cli_info' lib/affinitymap_cli.c:77:35
> [2] size=296 'frr_affinity_map_cli_info' lib/affinitymap_cli.c:77:35
> These globals were registered at these points:
> [1]:
> #0 0x7f12c9a36f40 in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:341
> #1 0x7f12c9585b7d in _sub_I_00099_1 (/lib/libfrr.so.0+0x185b7d)
> #2 0x7f12ca437fe1 in call_init elf/dl-init.c:72
>
> [2]:
> #0 0x7f12c9a36f40 in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:341
> #1 0x7f12c93824ed in _sub_I_00099_1 (/lib/libmgmt_be_nb.so.0+0x6f4ed)
> #2 0x7f12ca437fe1 in call_init elf/dl-init.c:72
>
> ==3860488==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
> SUMMARY: AddressSanitizer: odr-violation: global 'frr_affinity_map_cli_info' at lib/affinitymap_cli.c:77:35
> ==3860488==ABORTING
Fixes: dc6ff4c0de ("lib: convert affinity-map to mgmtd") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
David Schweizer [Wed, 24 Jan 2024 10:16:03 +0000 (11:16 +0100)]
topotests: fix snmptrap log OID parsing
Replace OID string parsing of snmptrap log files based on odd/even line
numbers with regex string search to prevent test failures in cases where
log entries don't match assumed order.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Donald Sharp [Sat, 3 Feb 2024 13:52:38 +0000 (08:52 -0500)]
doc: Add some documentation around a new pthread call
Not necessarily the correct place for this but there
is no other place and it needs to be called out and I
would rather have some documentation in place. Long
term I would like to add a bunch of frr_pthread documentation
but at this point in time it's not there. We can
re-arrange when that happens.
Donald Sharp [Mon, 5 Feb 2024 19:15:29 +0000 (14:15 -0500)]
zebra: Reorg `struct route_entry` to have important bits first
The `struct route_entry` had items that were almost never used at
the front of the data structure resulting in items that would be
loaded first into memory that were never used. Let's reorg a
tiny bit and put all the frequently used items in the first cache
line. I'm sure people will notice .000000001 speedup
There was a question in regards to how the update-source
choose the ip address for the source when using the `update-source`
command in BGP. Upon looking at the code, I was a but surprised,
so I decided to document this behavior.