Mark Stapp [Mon, 25 Nov 2024 20:37:39 +0000 (15:37 -0500)]
zebra: avoid a race during FPM dplane plugin shutdown
During zebra shutdown, the main pthread and the FPM pthread can
deadlock if the FPM pthread is in fpm_reconnect(). Each pthread
tries to use event_cancel_async() to cancel tasks that may be
scheduled for the other pthread; this leads to a deadlock as
neither thread can progress.
This adds an atomic boolean that's managed as each pthread
enters and leaves the cleanup code in question, preventing the
two threads from running into the deadlock.
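A minimal sketch of how such a guard might look, using C11 `<stdatomic.h>`. The names `cleanup_busy`, `cleanup_enter()` and `cleanup_leave()` are hypothetical, and the real plugin code may structure this differently. The idea is that a pthread only cancels the other pthread's tasks with event_cancel_async() after it has claimed the flag. The other pthread then fails to claim it, skips its own cancellation, and stays free to process the pending cancel, so the two threads never wait on each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: true while one pthread is inside the shared cleanup path. */
static atomic_bool cleanup_busy = false;

/*
 * Try to claim the cleanup path. Returns true on success. Returns false if
 * the other pthread already holds it; the caller must then skip cancelling
 * the other pthread's tasks instead of waiting on it.
 */
static bool cleanup_enter(void)
{
	bool expected = false;

	/* Atomically flip false -> true; fails if the flag is already true. */
	return atomic_compare_exchange_strong(&cleanup_busy, &expected, true);
}

/* Release the cleanup path so the other pthread may enter it later. */
static void cleanup_leave(void)
{
	assert(atomic_load(&cleanup_busy));
	atomic_store(&cleanup_busy, false);
}
```

Each pthread would then wrap its cancel-the-other-side step as `if (cleanup_enter()) { /* cancel */ cleanup_leave(); }`. A compare-and-exchange is used rather than a plain store so that claiming the flag and checking it happen as one atomic step.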
Donald Sharp [Thu, 7 Sep 2023 12:06:28 +0000 (08:06 -0400)]
tests: fix max med on startup
The test is failing because on r2 we are looking for a metric of 777
on startup, but we only start looking for it after the 5-second delay
set up in the config has already expired.
Notice that the 5-second max-med delay expires at 29 seconds, but the show routes
command on r2 does not even begin until 34 seconds, long after the max-med has expired and the
test has moved on.
Let's relax the max-med timer to 30 seconds and modify the test to wait a bit longer,
both for finding the metric and for the timer to expire.
Donald Sharp [Thu, 7 Sep 2023 11:57:26 +0000 (07:57 -0400)]
tests: Fix ospfapi client to clear ospf process
Test is failing locally:
2023-09-06 18:39:56,865 DEBUG: r1: vtysh result:
Hello, this is FRRouting (version 9.1-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
r1# conf t
r1(config)# router ospf
r1(config-router)# ospf router-id 1.1.1.1
For this router-id change to take effect, use "clear ip ospf process" command
r1(config-router)#
2023-09-06 18:39:56,865 DEBUG: root: GOT LINE: 'SUCCESS: 1.0.0.0'
2023-09-06 18:39:56,866 DEBUG: root: GOT LINE: '2023-09-06 18:39:55,982 INFO: TESTER: root: Waiting for 1.1.1.1'
2023-09-06 18:39:56,867 DEBUG: root: GOT LINE: '2023-09-06 18:39:55,982 DEBUG: TESTER: root: expected '1.1.1.1' != '1.0.0.0''
2023-09-06 18:39:56,867 DEBUG: root: GOT LINE: 'waiting on notify'
Sure looks like the router-id is not allowed to change because
neighbors have already formed. If we are changing the router-id,
let's clear the process so that the change takes effect.
And finally, reload the configuration:
`python3 frr-reload.py --reload /etc/frr/frr.conf`
frr-reload returns the error below:
```
Failed to execute segment-routing srv6 no source-address 1::1 exit exit
"segment-routing -- srv6 -- no source-address 1::1 -- exit -- exit" we failed to remove this command
% Unknown command: no source-address 1::1
```
Olivier Dugeon [Sat, 23 Nov 2024 17:50:21 +0000 (18:50 +0100)]
ospfd: Correct invalid SR-MPLS output label
When OSPFd starts, there are 2 possible scenarios for Segment Routing:
1/ Routes associated with prefixes are not yet available, i.e. Segment Routing
LSAs are received before LSA Type 1. In this case, the function
ospf_sr_nhlfe_update() is triggered when a new SPF is launched. Thus, neighbors
and output labels are always synchronized with the routing table.
2/ Routes are already available, i.e. LSA Type 1 are received before Segment
Routing LSAs, in particular the Router Information LSA which contains the SRGB.
During NHLFE computation, prefixes are left with an incomplete configuration; in
particular, the SR nexthop is set to NULL. If this scenario is handled through
the function update_out_nhlfe() (triggered when the SRGB is received or modified
from a neighbor node), the output label is not correctly configured, as the
nexthop SR node associated with the prefix has been left NULL.
This patch corrects this problem by calling the function compute_nhlfe() when
the nexthop SR node associated with the prefix is NULL within the
update_out_nhlfe() function. Thus, we guarantee that the SR prefix is always
correctly configured independently of the scenario, i.e. of the order in which
the different LSAs arrive.
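The shape of the fix can be sketched as follows. The structures and the `*_sketch` functions are simplified, hypothetical stand-ins, not ospfd's real types. They model only two rules: the output label is the nexthop SR node's SRGB lower bound plus the prefix SID index (ignoring PHP and explicit-null handling), and update_out_nhlfe() must recompute the nexthop first when it was left NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SR node: only the SRGB advertised by the neighbor. */
struct sr_node_sketch {
	uint32_t srgb_lower; /* first label of the SRGB */
	uint32_t srgb_range; /* number of labels in the SRGB */
};

/* Hypothetical, simplified SR prefix. */
struct sr_prefix_sketch {
	uint32_t sid_index;             /* Prefix-SID index */
	struct sr_node_sketch *nexthop; /* SR node of the nexthop, may be NULL */
	struct sr_node_sketch *spf_nh;  /* stands in for the routing table lookup */
	uint32_t out_label;             /* 0 means "not programmed yet" */
};

/* Stand-in for compute_nhlfe(): resolve the nexthop SR node from routing. */
static int compute_nhlfe_sketch(struct sr_prefix_sketch *srp)
{
	srp->nexthop = srp->spf_nh;
	return srp->nexthop != NULL;
}

/* Stand-in for update_out_nhlfe(), called when a neighbor's SRGB changes. */
static int update_out_nhlfe_sketch(struct sr_prefix_sketch *srp)
{
	assert(srp != NULL);

	/* Scenario 2: the nexthop was left NULL, so compute it first. */
	if (srp->nexthop == NULL && !compute_nhlfe_sketch(srp))
		return 0;

	if (srp->sid_index >= srp->nexthop->srgb_range)
		return 0; /* index does not fit in the neighbor's SRGB */

	/* Out label = neighbor SRGB base + SID index. */
	srp->out_label = srp->nexthop->srgb_lower + srp->sid_index;
	return 1;
}
```

With a neighbor SRGB starting at 16000 and a SID index of 10, a prefix whose nexthop was left NULL now ends up with output label 16010 instead of staying unprogrammed.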
Donald Sharp [Fri, 22 Nov 2024 16:02:15 +0000 (11:02 -0500)]
lib, zebra: Do not have duplicate memory type problems
zebra_mpls.c uses MTYPE_NH_LABEL, which is defined in both
lib/nexthop.c and zebra/zebra_mpls.c. The usage in zebra_mpls.c
is a realloc of a label stack that was allocated under the other
definition, so the per-type memory accounting check fails and leads to a crash:
(gdb) bt
0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:44
1 __pthread_kill_internal (signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:78
2 __GI___pthread_kill (threadid=126487246404032, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3 0x0000730a1b442476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4 0x0000730a1b94fb18 in core_handler (signo=6, siginfo=0x7ffeed1e07b0, context=0x7ffeed1e0680) at lib/sigevent.c:268
5 <signal handler called>
6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:44
7 __pthread_kill_internal (signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:78
8 __GI___pthread_kill (threadid=126487246404032, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9 0x0000730a1b442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x0000730a1b4287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x0000730a1b9984f5 in _zlog_assert_failed (xref=0x730a1ba59480 <_xref.16>, extra=0x0) at lib/zlog.c:789
12 0x0000730a1b8f8908 in mt_count_free (mt=0x576e0edda520 <MTYPE_NH_LABEL>, ptr=0x576e36617b80) at lib/memory.c:74
13 0x0000730a1b8f8a59 in qrealloc (mt=0x576e0edda520 <MTYPE_NH_LABEL>, ptr=0x576e36617b80, size=16) at lib/memory.c:112
14 0x0000576e0ec85e2e in nhlfe_out_label_update (nhlfe=0x576e368895f0, nh_label=0x576e3660e9b0) at zebra/zebra_mpls.c:1462
15 0x0000576e0ec833ff in lsp_install (zvrf=0x576e3655fb50, label=17, rn=0x576e366197c0, re=0x576e3660a590) at zebra/zebra_mpls.c:224
16 0x0000576e0ec87c34 in zebra_mpls_lsp_install (zvrf=0x576e3655fb50, rn=0x576e366197c0, re=0x576e3660a590) at zebra/zebra_mpls.c:2215
17 0x0000576e0ecbb427 in rib_process_update_fib (zvrf=0x576e3655fb50, rn=0x576e366197c0, old=0x576e36619660, new=0x576e3660a590) at zebra/zebra_rib.c:1084
18 0x0000576e0ecbc230 in rib_process (rn=0x576e366197c0) at zebra/zebra_rib.c:1480
19 0x0000576e0ecbee04 in process_subq_route (lnode=0x576e368e0270, qindex=8 '\b') at zebra/zebra_rib.c:2661
20 0x0000576e0ecc0711 in process_subq (subq=0x576e3653fc80, qindex=META_QUEUE_BGP) at zebra/zebra_rib.c:3226
21 0x0000576e0ecc07f9 in meta_queue_process (dummy=0x576e3653fae0, data=0x576e3653fb80) at zebra/zebra_rib.c:3265
22 0x0000730a1b97d2a9 in work_queue_run (thread=0x7ffeed1e3f30) at lib/workqueue.c:282
23 0x0000730a1b96b039 in event_call (thread=0x7ffeed1e3f30) at lib/event.c:1996
24 0x0000730a1b8e4d2d in frr_run (master=0x576e36277e10) at lib/libfrr.c:1232
25 0x0000576e0ec35ca9 in main (argc=7, argv=0x7ffeed1e4208) at zebra/main.c:536
Clearly replacing a label stack is an operation that should be owned by
lib/nexthop.c. So let's move this function there and have
zebra_mpls.c just call it to replace the label stack.
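A minimal sketch of the ownership rule, assuming a label stack laid out as a count followed by a flexible array of labels. `struct label_stack_sketch` and `nexthop_replace_labels_sketch()` are hypothetical names, and plain realloc() stands in for FRR's typed allocators. The point is that one module both allocates and resizes the stack, so every call is accounted against a single memory type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t mpls_label_t;

/* Hypothetical label stack: a count followed by a flexible array member. */
struct label_stack_sketch {
	uint8_t num_labels;
	mpls_label_t label[];
};

/*
 * Hypothetical helper owned by the nexthop module: it both allocates
 * (old == NULL) and resizes the stack, so a single allocator sees every call.
 * On failure it returns NULL and the old stack stays valid for the caller.
 */
static struct label_stack_sketch *
nexthop_replace_labels_sketch(struct label_stack_sketch *old, uint8_t num,
			      const mpls_label_t *labels)
{
	struct label_stack_sketch *st;

	assert(labels != NULL || num == 0);
	st = realloc(old, sizeof(*st) + (size_t)num * sizeof(mpls_label_t));
	if (st == NULL)
		return NULL;
	st->num_labels = num;
	if (num > 0)
		memcpy(st->label, labels, (size_t)num * sizeof(mpls_label_t));
	return st;
}
```

zebra would then call only this helper, never realloc the stack itself, which is exactly what removes the need for a second MTYPE_NH_LABEL definition.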
Donald Sharp [Thu, 21 Nov 2024 14:26:52 +0000 (09:26 -0500)]
tests: Ensure connected routes are installed before continuing
Under high load, the ospf_instance_redistribute test can attempt
to install routes with sharpd before the connected routes have
been fully installed in zebra. Since sharpd intentionally
has no retry mechanism, the test needs to wait until the
connected routes are in place.
Donald Sharp [Thu, 21 Nov 2024 14:16:14 +0000 (09:16 -0500)]
tools: Add pim msdp show commands to support bundle
The support bundle was not gathering any msdp data
for pim at all. Let's add some commands so that we
have more data here when a support bundle is generated.
Louis Scalbert [Tue, 5 Nov 2024 16:13:38 +0000 (17:13 +0100)]
tests: add bgp_vpnv4_route_leak_basic
bgp_vrf_route_leak_basic uses "import/export vrf" commands to perform
route leaks between VRFs on the r1 router. The same result can be
achieved by using the "route-target import / export" commands.
Copy bgp_vrf_route_leak_basic to bgp_vpnv4_route_leak_basic. Change the
BGP configuration to handle the route leaks with "route-target import /
export". Change the retry timers. No other changes.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Donald Sharp [Wed, 20 Nov 2024 21:07:34 +0000 (16:07 -0500)]
bgpd: Allow bfd to work if peer known but interface address not yet
If bgp is coming up and bgp has not received the interface address yet
but bgp has knowledge about a bfd peering, allow it to set the peering
data appropriately.
Donald Sharp [Wed, 20 Nov 2024 14:22:46 +0000 (09:22 -0500)]
tests: zebra_fec_nexthop_resolution improve
a) Timers are so large that they prevent convergence within 30 seconds.
b) The same configuration does not need to be applied 60 times
when things are not working properly. Once is enough.
Donald Sharp [Wed, 20 Nov 2024 14:18:39 +0000 (09:18 -0500)]
bgpd: bgp_connect should return an `enum connect_result`
When called from bgp_start(), this function is expected
to return an `enum connect_result`. Instead, it
returns a variety of values that callers do not
really check for. Consolidate the return values into
the correct enum choices.
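For context, a non-blocking TCP connect() (the usual setup for a daemon socket, assumed here) has exactly three outcomes, which is why a three-way result type fits. This sketch shows one way to collapse connect()'s return value and errno into such an enum. The enum and function names are hypothetical and are not bgpd's actual definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical three-way result, modelled on the idea of enum connect_result. */
enum connect_result_sketch {
	CONNECT_ERROR_SKETCH,
	CONNECT_SUCCESS_SKETCH,
	CONNECT_IN_PROGRESS_SKETCH,
};

/*
 * Map the return value of a non-blocking connect() and the errno it left
 * behind onto exactly one enum value, so callers never see raw ints.
 */
static enum connect_result_sketch classify_connect(int rc, int err)
{
	assert(rc == 0 || rc == -1); /* connect() returns only 0 or -1 */

	if (rc == 0)
		return CONNECT_SUCCESS_SKETCH;
	if (err == EINPROGRESS)
		return CONNECT_IN_PROGRESS_SKETCH; /* completion is reported later */
	return CONNECT_ERROR_SKETCH;
}
```

A caller would use it as `classify_connect(connect(fd, sa, len), errno)`, and every return path then yields one of the three enum values.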
Nathan Bahr [Fri, 1 Nov 2024 19:15:52 +0000 (19:15 +0000)]
tests: PIM AutoRP tests expanded
Now with a full AutoRP implementation, we can test AutoRP in a full network setup
beginning with candidate RP announcements all the way through discovery and active RP
selection.
Nathan Bahr [Fri, 1 Nov 2024 19:14:47 +0000 (19:14 +0000)]
pimd: Implement autorp mapping agent
Fully fleshed out the AutoRP implementation, now with the AutoRP mapping agent.
This touched most of AutoRP in order to reuse common containers for each
section of AutoRP operation (Candidate RP announcement, Mapping agent, Discovery).
Guards were added to many debugs, and many more debug logs were added.
Donatas Abraitis [Tue, 19 Nov 2024 14:25:12 +0000 (16:25 +0200)]
bgpd: Disable sending ROV extended community by default
https://datatracker.ietf.org/doc/html/rfc8097 defines the ROV extended community,
but https://datatracker.ietf.org/doc/draft-ietf-sidrops-avoid-rpki-state-in-bgp
recommends against sending it by default, even to iBGP peers.
Donatas Abraitis [Mon, 18 Nov 2024 21:29:53 +0000 (23:29 +0200)]
bgpd: Optimize the way parsing communities if no community alias exists
Only do the work if at least one community alias is configured;
otherwise we don't need to spend time splitting the string and creating
a new one.
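The optimization amounts to a fast path in front of the string rewrite. In this hypothetical sketch (the alias table, `struct alias_sketch` and `community_apply_aliases()` are illustrative, not bgpd's API), the no-alias case returns immediately without tokenizing or allocating. Only when aliases exist is the community string split on spaces and rebuilt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alias table entry: community value -> configured alias. */
struct alias_sketch {
	const char *community;
	const char *alias;
};

/* Return the alias for the token tok[0..len), or NULL if none matches. */
static const char *alias_lookup(const char *tok, size_t len,
				const struct alias_sketch *tab, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strlen(tab[i].community) == len &&
		    strncmp(tab[i].community, tok, len) == 0)
			return tab[i].alias;
	return NULL;
}

/*
 * Returns NULL when the caller can keep using str as-is (no aliases are
 * configured, or allocation failed); otherwise a malloc()ed copy of str
 * with each space-separated community replaced by its alias.
 */
static char *community_apply_aliases(const char *str,
				     const struct alias_sketch *tab, size_t n)
{
	char *out = NULL;

	assert(str != NULL);
	if (n == 0)
		return NULL; /* fast path: no splitting, no new string */

	/* Pass 0 only measures the result; pass 1 writes it into out. */
	for (int pass = 0; pass < 2; pass++) {
		size_t pos = 0;
		const char *p = str;

		while (*p) {
			const char *end = strchr(p, ' ');
			size_t len = end ? (size_t)(end - p) : strlen(p);
			const char *alias = alias_lookup(p, len, tab, n);
			size_t outlen = alias ? strlen(alias) : len;

			if (out)
				memcpy(out + pos, alias ? alias : p, outlen);
			pos += outlen;
			if (!end)
				break;
			if (out)
				out[pos] = ' ';
			pos++;
			p = end + 1;
		}
		if (pass == 0) {
			out = malloc(pos + 1);
			if (out == NULL)
				return NULL;
		} else {
			out[pos] = '\0';
		}
	}
	return out;
}
```

For example, with an alias "customer" for 65001:100, the input "65001:100 65001:200" becomes "customer 65001:200"; with no aliases configured, the same call returns NULL straight away and the original string is used unchanged.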
Acee Lindem [Mon, 18 Nov 2024 17:05:31 +0000 (17:05 +0000)]
tests: Add topotest for OSPF multi-instance default origination.
This change adds a topotest for various cases of OSPF multi-instance
default origination, including cases where the criteria route is from another
instance of OSPF, as well as from the same OSPF instance (where a default
should not be originated).
Donatas Abraitis [Mon, 18 Nov 2024 09:10:05 +0000 (11:10 +0200)]
lib: Fix Lua script unit test
When building for big-endian architectures, this fails because of
long long / int casting issues. Let's use a separate integer to get the
results.
This is especially important when building the Docker images for multiple arches.
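Why a mismatched integer width only breaks on big-endian hosts can be shown in a few lines. This is an illustrative sketch, not the unit test itself. It assumes a 64-bit `long long` (the default `lua_Integer` type in Lua 5.3 and later) and a 32-bit `int`, and `decode_lua_integer_sketch()` is a hypothetical decoder. Reading a `long long` through an `int`-sized view picks up its first four bytes: these hold the low-order bits on little-endian machines but the high-order (here zero) bits on big-endian ones. Decoding into a correctly typed temporary avoids this on every architecture.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical decoder that, like a Lua integer, produces a long long. */
static void decode_lua_integer_sketch(long long *out, long long value)
{
	*out = value;
}

/*
 * What an int-sized view of a long long sees: its first sizeof(int) bytes.
 * memcpy() is used so the demonstration itself has no aliasing problems.
 */
static int int_view_of(long long v)
{
	int r;

	memcpy(&r, &v, sizeof(r));
	return r;
}

/* The fix pattern: decode into a separate long long, then narrow explicitly. */
static int decode_to_int(long long value)
{
	long long tmp;

	decode_lua_integer_sketch(&tmp, value);
	assert(tmp >= INT_MIN && tmp <= INT_MAX);
	return (int)tmp;
}
```

On a little-endian host `int_view_of(42)` happens to yield 42, which is why the problem stays hidden there; on a big-endian host it yields 0, while `decode_to_int(42)` yields 42 everywhere.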
When originating a default AS-External LSA in one OSPF instance,
origination did not work if the criteria route was installed by another OSPF
instance. This required more flexible processing of the OSPF external
route information.
Also fix a problem with the multi-instance display for "show ip ospf
<instance> database ...".
Donatas Abraitis [Fri, 15 Nov 2024 07:54:07 +0000 (09:54 +0200)]
bgpd: Validate both nexthop information (NEXTHOP and NLRI)
If we receive an IPv6 prefix, e.g. 2001:db8:100::/64, with nexthop 0.0.0.0 and
mp_nexthop fc00::2, we should not treat it as having an invalid nexthop because
of 0.0.0.0. We MUST also check the MP_REACH attribute and only then decide
whether we have at least one valid nexthop.
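The rule can be sketched as "accept the update if either nexthop is usable". This is a simplified, hypothetical check built on the standard `<netinet/in.h>` types. It treats only the unspecified address (0.0.0.0 or ::) as unusable, whereas bgpd's real validation likely covers more cases such as martian nexthops.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Simplified usability test: only the unspecified address is rejected. */
static bool v4_nexthop_usable(struct in_addr nh)
{
	return nh.s_addr != htonl(INADDR_ANY);
}

static bool v6_nexthop_usable(const struct in6_addr *nh)
{
	return !IN6_IS_ADDR_UNSPECIFIED(nh);
}

/*
 * Hypothetical check: the NEXT_HOP attribute alone must not condemn the
 * route; mp_nexthop (from MP_REACH_NLRI, NULL if absent) is considered too.
 */
static bool route_has_usable_nexthop(struct in_addr nexthop,
				     const struct in6_addr *mp_nexthop)
{
	if (mp_nexthop != NULL && v6_nexthop_usable(mp_nexthop))
		return true;
	return v4_nexthop_usable(nexthop);
}
```

With nexthop 0.0.0.0 and mp_nexthop fc00::2 from the example above, the route is accepted; it is rejected only when neither address is usable.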
Rajasekar Raja [Thu, 14 Nov 2024 07:09:55 +0000 (23:09 -0800)]
bgpd : backpressure - Fix to pop items off zebra_announce FIFO for few EVPN triggers
In cases such as 'no advertise-all-vni' and L2 VNI DELETE, we need to
pop all the VPN routes in the bgp_zebra_announce FIFO that are yet to
be processed, regardless of whether the VNI is configured or not.
NOTE: There is NO need to pop the VPN routes in two cases:
1) In free_vni_entry
- Called by bgp_free()->bgp_evpn_cleanup().
- bgp_delete() is called before bgp_free(), and it already pops all
the dests pertaining to the bgp instance being deleted.
2) In evpn_delete_vni() when the user configures "no vni", since the
withdrawal of all routes happens in the normal cycle.
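A minimal sketch of the pop step, assuming the pending announcements sit in a singly linked FIFO. `struct dest_sketch`, `struct fifo_sketch` and `fifo_pop_vpn()` are hypothetical and do not mirror bgpd's actual FIFO types. Two details are worth noting: the filter looks only at whether an entry is a VPN route, never at its VNI, and the tail pointer is repaired after removals so later appends still work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of pending zebra announcements. */
enum dest_kind_sketch { DEST_UNICAST_SKETCH, DEST_VPN_SKETCH };

/* Hypothetical pending announcement (stand-in for a bgp dest). */
struct dest_sketch {
	enum dest_kind_sketch kind;
	unsigned int vni; /* deliberately ignored when popping */
	struct dest_sketch *next;
};

/* Hypothetical FIFO with O(1) append via a tail pointer. */
struct fifo_sketch {
	struct dest_sketch *head;
	struct dest_sketch **tailp; /* points at the last next field (or head) */
	size_t count;
};

static void fifo_init(struct fifo_sketch *f)
{
	f->head = NULL;
	f->tailp = &f->head;
	f->count = 0;
}

static void fifo_push(struct fifo_sketch *f, struct dest_sketch *d)
{
	d->next = NULL;
	*f->tailp = d;
	f->tailp = &d->next;
	f->count++;
}

/* Pop every VPN entry still waiting, whatever its VNI; return how many. */
static size_t fifo_pop_vpn(struct fifo_sketch *f)
{
	struct dest_sketch **pp = &f->head;
	size_t popped = 0;

	while (*pp != NULL) {
		struct dest_sketch *d = *pp;

		if (d->kind == DEST_VPN_SKETCH) {
			*pp = d->next; /* unlink without advancing */
			d->next = NULL;
			assert(f->count > 0);
			f->count--;
			popped++;
		} else {
			pp = &d->next;
		}
	}
	f->tailp = pp; /* pp ends at the new last next field */
	return popped;
}
```

Walking a pointer-to-pointer lets the loop unlink entries at the head and in the middle with the same code, without a separate "previous node" variable.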