Donald Sharp [Fri, 11 Oct 2024 13:33:35 +0000 (09:33 -0400)]
fpm: Allow max fpm message size to float based on ecmp
Currently the max message size is 4k. With a 256 way
ecmp FRR is seeing message sizes that are in the
6k size. There is desire to allow this to increase as
well to 512. Since the multipath size directly effects
how big the message may be when sending the routes ecmp
let's give a bit of headroom for this value when compiling
FRR at greater sizes. Additionally since we know not everyone
is using such large ecmp, allow them to build as appropriate
for their use cases.
Donald Sharp [Fri, 11 Oct 2024 00:08:32 +0000 (20:08 -0400)]
zebra: Slow down fpm_process_queue
When the fpm_process_queue has run out of space
but has written to the fpm output buffer, schedule
it to wake up immediately, as that the write will go out
pretty much immediately, since it was scheduled first.
If the fpm_process_queue has not written to the output
buffer then delay the processing by 10 milliseconds to
allow a possibly backed up write processing to have a
chance to complete it's work.
Donald Sharp [Thu, 10 Oct 2024 20:00:08 +0000 (16:00 -0400)]
zebra: Only notify dplane work pthread when needed
The fpm_nl_process function was getting the count
of the total number of ctx's processed. This leads
to after having processed 1 context to always signal
the dataplane that there is work to do. Change the
code to only notify the dplane worker when a context
was actually added to the outgoing context queue.
Louis Scalbert [Fri, 11 Oct 2024 05:12:23 +0000 (07:12 +0200)]
bgpd: split nexthop-local unchanged peer subgroup
5bb99ccad2 ("bgpd: reset ipv6 invalid link-local nexthop") now resets
the link-local when originating and destination peers are not on the
same network segment. However, it does not work all the time.
The fix compares the 'from' and 'peer' global IPv6 address. However,
'peer' refers to one of the peers of subgroup. The subgroup may contain
peers located on different network segment.
Split nexthop-local unchanged peer subgroup by network segment.
Louis Scalbert [Wed, 9 Oct 2024 15:08:44 +0000 (17:08 +0200)]
bgpd: reset ipv6 invalid link-local nexthop
If the "nexthop-local unchanged" setting is enabled, it preserves the
IPv6 link-local nexthop from the originating peer. However, if the
originating and destination peers are not on the same network segment,
the originating peer's IPv6 link-local address will be unreachable from
the destination peer.
In such cases, reset the IPv6 link-local nexthop, even if "nexthop-local
unchanged" is set on the destination peer.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Donald Sharp [Mon, 7 Oct 2024 16:40:46 +0000 (12:40 -0400)]
*: Allow 16 bit size for nexthops
Currently FRR is limiting the nexthop count to a uint8_t not a
uint16_t. This leads to issues when the nexthop count is 256
which results in the count to overflow to 0 causing problems
in the code.
Donald Sharp [Fri, 4 Oct 2024 13:51:46 +0000 (09:51 -0400)]
zebra: Do not retry in 30 seconds on pw reachability failure
Currently the zebra pw code has setup a retry to install the
pw after 30 seconds when it is decided that reachability to
the pw is gone. This causes a failure mode where the
pw code just goes and re-installs the pw after 30 seconds
in the non-reachability case. Instead it should just be
reinstalling after reachability is restored.
Donald Sharp [Fri, 4 Oct 2024 13:38:25 +0000 (09:38 -0400)]
zebra: Move pw status settting until after we get results
Currently the pw code sets the status of the pw for install
and uninstall immediately when notifying the dplane. This
is incorrect in that we do not actually know the status at
this point in time. When we get the result is when to set
the status.
Christian Hopps [Mon, 7 Oct 2024 03:23:31 +0000 (03:23 +0000)]
lib: add flag to have libyang load internal ietf-yang-library module
Mgmtd makes use of libyang's internal ietf-yang-library module to add
support for said module to FRR management. Previously, mgmtd was loading
this module explicitly; however, that required that libyang's
`ietf-yang-library.yang` module definition file be co-located with FRR's
yang files so that it (and ietf-datastore.yang) would be found when
searched for by libyang using FRRs search path. This isn't always the
case depending on how the user compiles and installs libyang so mgmtd
was failing to run in some cases.
Instead of doing it the above way we simply tell libyang to load it's
internal version of ietf-yang-library when we initialize the libyang
context.
This required adding a boolean to a couple of the init functions which
is why so many files are touched (although all the changes are minimal).
anlan_cs [Sun, 6 Oct 2024 13:06:15 +0000 (21:06 +0800)]
isisd: fix wrong check for MT commands
```
anlan# show run
!
interface eth0
ip router isis A
exit
!
router isis A
metric-style narrow <- NOT wide
exit
!
end
anlan (config)# int eth0
anlan (config-if)# no isis topology ipv6-unicast
% Configuration failed.
Error type: validation
Error description: Multi topology IS-IS can only be used with wide metrics
```
The MT commands are mainly controlled by the binded area, not by interface.
Currently if there is any MT configuration in the area, `metric-style` must
be with the `wide` mode, this requirement is sufficient. So, the
unnecessary/wrong check for MT in the interface should be removed.
anlan_cs [Sat, 5 Oct 2024 08:43:53 +0000 (16:43 +0800)]
tools: fix some special commands for reloading pim
The issue is we can't remove all pim configurations including some
special configurations (e.g., `no ip pim bsm`) for one interface.
For one pim-disable interface, all such pim depdendent options
(including `ip pim ` and `no ip pim `) should be completely removed.
Also append `no ip multicast` for the same purpose, it is no use at present,
but for future use.
The running config:
```
interface A
ip pim
no ip pim bsm
exit
```
Reload the new config:
```
interface A
exit
```
Before:
```
2024-10-05 20:52:33,467 INFO: Executed "interface A no ip pim exit"
2024-10-05 20:52:33,482 INFO: Executed "interface A ip pim bsm exit"
```
And the pim configurations in running configuration are not removed after reloading:
```
interface A
ip pim <- Wrong
exit
```
After:
```
2024-10-05 20:56:27,489 INFO: Executed "interface A no ip pim exit"
```
And all the pim configuration are removed.
Igor Zhukov [Fri, 4 Oct 2024 06:16:02 +0000 (13:16 +0700)]
zebra: Fix crash during reconnect
fpm_enqueue_rmac_table expects an fpm_rmac_arg* as its argument.
The issue can be reproduced by dropping the TCP session using:
ss -K dst 127.0.0.1 dport = 2620
I used Fedora 40 and frr 9.1.2 and I got the gdb backtrace:
(gdb) bt
0 0x00007fdd7d6997ea in fpm_enqueue_rmac_table (bucket=0x2134dd0, arg=0x2132b60) at zebra/dplane_fpm_nl.c:1217
1 0x00007fdd7dd1560d in hash_iterate (hash=0x21335f0, func=0x7fdd7d6997a0 <fpm_enqueue_rmac_table>, arg=0x2132b60) at lib/hash.c:252
2 0x00007fdd7dd1560d in hash_iterate (hash=0x1e5bf10, func=func@entry=0x7fdd7d698900 <fpm_enqueue_l3vni_table>,
arg=arg@entry=0x7ffed983bef0) at lib/hash.c:252
3 0x00007fdd7d698b5c in fpm_rmac_send (t=<optimized out>) at zebra/dplane_fpm_nl.c:1262
4 0x00007fdd7dd6ce22 in event_call (thread=thread@entry=0x7ffed983c010) at lib/event.c:1970
5 0x00007fdd7dd20758 in frr_run (master=0x1d27f10) at lib/libfrr.c:1213
6 0x0000000000425588 in main (argc=10, argv=0x7ffed983c2e8) at zebra/main.c:492
bgpd: Actually make ` --v6-with-v4-nexthops` it work
It was using `-v` which is actually a _version_.
Fixes: 0435b31bb8ed55377f83d0e19bc085abc3c71b44 ("bgpd: Allow bgp to specify if it will allow v6 routing with v4 nexthops") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Donald Sharp [Tue, 1 Oct 2024 18:31:08 +0000 (14:31 -0400)]
lib: nexthop code should use uint16_t for nexthop counting
It's possible to specify via the cli and configure how many
nexthops that are allowed on the system. If you happen to
have > 255 then things are about to get interesting otherwise.
Donald Sharp [Mon, 30 Sep 2024 19:09:42 +0000 (15:09 -0400)]
bgpd: Cleanup multipath figuring out in bgp
Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0( meaning 1 ). As that the
bestpath path_info was never included in the count
c) The first mp_info in the list held the multipath data associated
with the multipath. As such if you were at any other node that data
was not filled in.
d) As such the mp_info's that are not first on the list basically
were just pointers to the corresponding bgp_path_info that was in
the multipath.
e) On bestpath calculation, a linklist(struct linklist *) of bgp_path_info's was
created.
f) This linklist was passed in to a comparison function that took the
old mpinfo list and compared it item by item to the linklist and
doing magic to figure out how to create a new mp_info list.
g) the old mp_info and the link list had to be memory managed and
freed up.
h) BGP_PATH_MULTIPATH is only set on non bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and if
it has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up
when done with it.
e) remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
Donald Sharp [Thu, 26 Sep 2024 14:46:23 +0000 (10:46 -0400)]
bgpd: Ensure mpath data is only on bestpath
The mpath data structure has data that is only relevant
for the first mpath in the list. It is not being used
anywhere else. Let's document that a bit more.