Igor Ryzhov [Mon, 12 Feb 2024 18:34:33 +0000 (20:34 +0200)]
mgmtd: fix be_is_client_interested
Backend "subscribe" API allows daemons to dynamically register xpaths
they are interested in. Such xpaths are not stored in hardcoded
config/oper xpath arrays so this function fails to understand that a
backend daemon is interested in them. Fix by using dynamic xpath maps
instead which store both hardcoded and dynamic xpaths.
Igor Ryzhov [Sat, 10 Feb 2024 00:11:13 +0000 (02:11 +0200)]
lib, mgmtd: rework processing of yang notifications
Currently, YANG notification processing is done using a special type of
callbacks registered in backend clients. In this commit, we start using
regular northbound infrastructure instead, because it already has a
convenient way of registering xpath-specific callbacks without the need
for creating additional structures for each necessary notification. We
also now pass a notification data to the callback, instead of a plain
JSON. This allows to use regular YANG library functions for inspecting
notification fields, instead of manually parsing the JSON.
In `qpb.h` we have a bunch of functions that make use of
`union g_addr`. `union g_addr` is defined in `nexthop.h`, which
actually is NOT included in `qpb.h`.
Igor Ryzhov [Fri, 9 Feb 2024 22:58:49 +0000 (00:58 +0200)]
lib, mgmtd: fix processing of yang notifications
Current code assumes that notification is always sent in stripped JSON
format and therefore notification xpath starts at the third symbol of
notification data. Assuming JSON is more or less fine, because this
representation is internal to FRR, but the assumption about the xpath is
wrong, because it won't work for not top-level notifications. YANG
allows to define notification as a child for some data node deep into
the tree and in this case notification data contains not only the
notification node itself, but also all its parents.
To fix the issue, parse the notification data and get its xpath from its
schema node.
> VRF r1-cust4:
> B 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) inactive, weight 1, 00:00:08
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:08
When announcing the routes to zebra, use the peer of the ultimate bgp
path info instead of the one of the first parent path info to determine
whether the route is recursive.
The result is:
> VRF r1-cust4:
> B> 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) (recursive), weight 1, 00:00:02
> * via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
ospfd: can not delete "segment-routing node-msd" when SR if off
This fixes the initial implementation of commit 7743f2f8c00 ("OSPFd: Update
Segment Routing PR following review") where it wsa not possible to remove
the "segment-routing node-msd" CLI nodes via vtysh once segment-routing got
disabled.
Closes #14910
Signed-off-by: Christian Breunig <christian@breunig.cc>
pimd: re-evaluated S,G OILs upon RP changes and for empty SG upstream oils
Topology:
TOR11 (FHR) --- LEAF-11---SPINE1 (RP)MSDP SPINE-2(RP)MSDP --- LEAF-12 -- TOR12 (LHR)
| | | | |
| -----------------------------------------------------(ECMP) |
| | | |
-----------------------------------------------------------------------(ECMP)
Issue:
In some triggers, S,G upstream is preserved even with the PP timer expiry, resulting
in S,G with NULL OILS. This could be because we create a dummy S,G upstream and
dummy channel_oif for *,G, where RPF is UNKNOWN. As a result, PIM+VXLAN traffic is never
forwarded downstream to LHR.
Fix:
when the S,G stream is running, Determine if a reevaluation of the outgoing interface
list (OIL) is required. S,G upstream should then inherit the OIL from *,G.
bgpd: fix flushing ipv6 flowspec entries when peering stops
When a BGP flowspec peering stops, the BGP RIB entries for IPv6
flowspec entries are removed, but not the ZEBRA RIB IPv6 entries.
Actually, when calling bgp_zebra_withdraw() function call, only
the AFI_IP parameter is passed to the bgp_pbr_update_entry() function
in charge of the Flowspec add/delete in zebra. Fix this by passing
the AFI parameter to the bgp_zebra_withdraw() function.
Note that using topotest does not show up the problem as the
flowspec driver code is not present and was refused. Without that,
routes are not installed, and can not be uninstalled.
Fixes: 529efa234655 ("bgpd: allow flowspec entries to be announced to zebra") Link: https://github.com/FRRouting/frr/pull/2025 Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Igor Ryzhov [Fri, 2 Feb 2024 23:15:46 +0000 (01:15 +0200)]
vtysh: remove resync workaround when exiting to config node
When exiting from a level below the config node, like `router rip`,
vtysh executes a resync by sending "end" and "conf term [file-lock]"
commands to all the daemons. As statet in the description comment, it's
done "in case one of the daemons is somewhere else". I don't think this
actually ever happens, but even if it is, it is a bug in a daemon that
needs to be fixed. This resync was okay before the introduction of
mgmtd, but now it unlocks and locks back the datastores during the
configuration reading process, which can lead to a failure which is
explained in the previous commit.
Igor Ryzhov [Fri, 2 Feb 2024 22:42:58 +0000 (00:42 +0200)]
mgmtd, vtysh: fix possible conflict when reading the config
When FRR starts, after mgmtd is initialized, backend clients connect to
it and request their config. To supply the config, mgmtd creates a
configuration transaction. At the same time, `vtysh -b` tries to read
the startup config and configure mgmtd, which also creates a
configuration transaction. If these two actions happen at the exact same
time, there's a conflict between them, because only a single
configuration translaction is allowed. Because of that, vtysh fails and
the config is completely ignored.
When starting the config reading, vtysh locks candidate and running
datastores in mgmtd. This commit adds locking of running datastore when
initializing the backend client. It allows to retry locking on the vtysh
side and read the config only when the lock is aquired instead of
failing.
This change also prevents running datastore from being changed during
initialization of backend clients. This could lead to a desynchronized
state between mgmtd and backends.
Donald Sharp [Wed, 7 Feb 2024 15:38:02 +0000 (10:38 -0500)]
lib, ospfclient, tests, vtysh: Allow for a minimum fd poll size
There exists cases where just honoring the FD_LIMIT size
as given to us by the operating system makes no sense.
Let's just make a switch to allow for this for things
like vtysh and ospfclient which will never have 1k files
open at any given time.
Fixes: #15315 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Louis Scalbert [Wed, 7 Feb 2024 12:16:26 +0000 (13:16 +0100)]
lib: fix affinity map duplicate
Fix duplicate definition of frr_affinity_map_cli_info in libfrr.so.0 and
libmgmt_be_nb.so.0
> =================================================================
> ==3860488==ERROR: AddressSanitizer: odr-violation (0x7f12c98c4d20):
> [1] size=296 'frr_affinity_map_cli_info' lib/affinitymap_cli.c:77:35
> [2] size=296 'frr_affinity_map_cli_info' lib/affinitymap_cli.c:77:35
> These globals were registered at these points:
> [1]:
> #0 0x7f12c9a36f40 in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:341
> #1 0x7f12c9585b7d in _sub_I_00099_1 (/lib/libfrr.so.0+0x185b7d)
> #2 0x7f12ca437fe1 in call_init elf/dl-init.c:72
>
> [2]:
> #0 0x7f12c9a36f40 in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:341
> #1 0x7f12c93824ed in _sub_I_00099_1 (/lib/libmgmt_be_nb.so.0+0x6f4ed)
> #2 0x7f12ca437fe1 in call_init elf/dl-init.c:72
>
> ==3860488==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
> SUMMARY: AddressSanitizer: odr-violation: global 'frr_affinity_map_cli_info' at lib/affinitymap_cli.c:77:35
> ==3860488==ABORTING
Fixes: dc6ff4c0de ("lib: convert affinity-map to mgmtd") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
David Schweizer [Wed, 24 Jan 2024 10:16:03 +0000 (11:16 +0100)]
topotests: fix snmptrap log OID parsing
Replace OID string parsing of snmptrap log files based on odd/even line
numbers with regex string search to prevent test failures in cases where
log entries don't match assumed order.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Donald Sharp [Sat, 3 Feb 2024 13:52:38 +0000 (08:52 -0500)]
doc: Add some documentation around a new pthread call
Not necessarily the correct place for this but there
is no other place and it needs to be called out and I
would rather have some documentation in place. Long
term I would like to add a bunch of frr_pthread documentation
but at this point in time it's not there. We can
re-arrange when that happens.
Donald Sharp [Mon, 5 Feb 2024 19:15:29 +0000 (14:15 -0500)]
zebra: Reorg `struct route_entry` to have important bits first
The `struct route_entry` had items that were almost never used at
the front of the data structure resulting in items that would be
loaded first into memory that were never used. Let's reorg a
tiny bit and put all the frequently used items in the first cache
line. I'm sure people will notice .000000001 speedup
There was a question in regards to how the update-source
choose the ip address for the source when using the `update-source`
command in BGP. Upon looking at the code, I was a but surprised,
so I decided to document this behavior.
Igor Ryzhov [Sat, 3 Feb 2024 19:41:37 +0000 (21:41 +0200)]
staticd: fix nht memory leak
When a static route with a gateway nexthop is created, the nexthop is
sent to zebra for NHT, and added to a local hash. When the nexthop's VRF
is deleted from kernel, nexthop still stays in the hash. This is a
memory leak, because it is never deleted from there. Even if the VRF is
recreated in kernel, it is assigned with a new `vrf_id` so the old hash
entry is not reused, and a new one is created. To fix the issue, remove
all nexthops from the hash when the corresponding VRF is deleted, do not
add nexthops to the hash if their corresponding VRF doesn't exist in
kernel and don't add the same nexthop to the hash multiple times.
Igor Ryzhov [Sat, 3 Feb 2024 19:34:18 +0000 (21:34 +0200)]
staticd: fix processing nht updates
When processing an NHT update, we should go though all `static_vrf`
structures instead of regualar `vrf`, because some of `static_vrf` may
not have corresponding `vrf` and will miss the update.
Igor Ryzhov [Sat, 3 Feb 2024 20:27:49 +0000 (22:27 +0200)]
lib: add ietf-yang-metadata to the list of built-in modules
We don't need to manually load built-in modules. This fixes the
following warning in mgmtd:
```
YANG model "ietf-yang-metadata@*" "*@*"not embedded, trying external file
```
Igor Ryzhov [Sat, 3 Feb 2024 20:43:58 +0000 (22:43 +0200)]
mgmtd: disable lib code for config reading
mgmtd reads config files on its own, it doesn't need libfrr to do that.
The code is already skipped, because mgmtd uses `di->read_in` thread for
config reading and libfrr doesn't reschedule the thread, so this commit
just removes the dead code.