Mark Stapp [Wed, 30 Oct 2024 15:02:17 +0000 (11:02 -0400)]
zebra: separate zebra ZAPI server open and accept
Separate zebra's ZAPI server socket handling into two phases:
an early phase that opens the socket, and a later phase that
starts listening for client connections.
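The two phases can be illustrated with plain POSIX sockets. This is a minimal sketch that assumes a Unix-domain stream socket. The helpers zapi_server_open(), zapi_server_listen() and zapi_try_connect() are hypothetical and are not zebra's actual functions. Phase 1 only creates and binds the socket. Phase 2 calls listen() so that clients can start connecting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a Unix-domain socket address for `path`. */
static void zapi_sketch_addr(struct sockaddr_un *addr, const char *path)
{
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	snprintf(addr->sun_path, sizeof(addr->sun_path), "%s", path);
}

/* Phase 1 (hypothetical): create and bind the server socket early in
 * startup. No client can connect yet because listen() is not called. */
static int zapi_server_open(const char *path)
{
	struct sockaddr_un addr;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	zapi_sketch_addr(&addr, path);
	unlink(path); /* remove a stale socket file, if any */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Phase 2 (hypothetical): later, start accepting client connections. */
static int zapi_server_listen(int fd)
{
	return listen(fd, SOMAXCONN);
}

/* Client-side helper for demonstration: returns 0 if connect() succeeds. */
static int zapi_try_connect(const char *path)
{
	struct sockaddr_un addr;
	int rc, fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	zapi_sketch_addr(&addr, path);
	rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
	close(fd);
	return rc == 0 ? 0 : -1;
}
```

On Linux, a client that connects to the bound but not yet listening socket is refused (ECONNREFUSED) until phase 2 runs.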
Philippe Guibert [Mon, 28 Oct 2024 17:20:13 +0000 (18:20 +0100)]
topotests: fix bmp_collector handling of empty as-path
When the bgp_bmp test is configured with an additional
peer that sends an empty AS-PATH, the BMP collector stops:
> [2024-10-28 17:41:51] Finished dissecting data from ('192.0.2.1', 33922)
> [2024-10-28 17:41:52] Data received from ('192.0.2.1', 33922): length 195
> [2024-10-28 17:41:52] Got message type: <class 'bmp.BMPRouteMonitoring'>
> [2024-10-28 17:41:52] unpack_from requires a buffer of at least 2 bytes for unpacking 2 bytes at offset 0 (actual buffer size is 0)
> [2024-10-28 17:41:52] TCP session closed with ('192.0.2.1', 33922)
> [2024-10-28 17:41:52] Server shutting down on 192.0.2.10:1789
The parser attempts to read an empty AS-path and treats the segment length
value as a length in bytes, whereas RFC 4271 defines this value as the
number of ASes in the path segment. Fix this in the parser.
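The collector itself is a Python script. The following is only a C sketch of the parsing rule from RFC 4271 (section 4.3): each path segment has a 1-octet type and a 1-octet length that counts ASes, not octets. It assumes 4-octet AS numbers (RFC 6793) unless asn_size says otherwise. The function name aspath_count_ases() is hypothetical. An empty AS_PATH contains no segments, so the loop never reads a segment header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the total number of ASes in an AS_PATH attribute value, or -1 if
 * the buffer is malformed. asn_size is 4 (RFC 6793) or 2 (legacy 2-octet).
 * An empty AS_PATH (len == 0) has no segments and yields 0. */
static int aspath_count_ases(const uint8_t *buf, size_t len, size_t asn_size)
{
	size_t off = 0;
	int total = 0;

	while (off < len) {
		size_t seg_ases, seg_bytes;

		if (len - off < 2)
			return -1; /* truncated segment header */
		/* buf[off] is the segment type (AS_SET, AS_SEQUENCE, ...) */
		seg_ases = buf[off + 1]; /* number of ASes, NOT octets */
		seg_bytes = seg_ases * asn_size;
		off += 2;
		if (len - off < seg_bytes)
			return -1; /* segment claims more ASes than remain */
		total += (int)seg_ases;
		off += seg_bytes;
	}
	return total;
}
```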
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Fri, 25 Oct 2024 20:06:11 +0000 (22:06 +0200)]
topotests: bmp, create shared library for bmp
The bgp_bmp and bgp_bmp_vrf tests use similar routines
to test BMP, but these routines are duplicated. Gather the bgp_bmp
and bgp_bmp_vrf tests in a single bgp_bmp folder.
- Create a bgpbmp.py library under the bgp_bmp test folder.
- The bgp_bmp and bgp_bmp_vrf tests are renamed to bgp_bmp_1
and bgp_bmp_2.
- Maintain separate folders for config and output results. Adapt
the bgp_bmp library accordingly.
- The JSON output for the bgp_bmp_2 test has no reference to hostname.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Mon, 11 Mar 2024 10:51:55 +0000 (11:51 +0100)]
bgpd: fix use real SID in BGP nexthop tracking
When an SRv6 BGP update is received, nexthop tracking is used
to determine the reachability of the update.
> # show bgp ipv6 vpn fd00:200::/64
> Paths: (1 available, best #1)
> [..]
> 4:4::4:4 from 4:4::4:4 (4.4.4.4)
> Origin incomplete, metric 0, localpref 100, valid, internal, best (First path received)
> Extended Community: RT:52:100
> Remote label: 16
> Remote SID: 2001:db8:f4::
> Last update: Mon Mar 11 11:50:04 2024
The IPv6 address used is the "Remote SID", but this value is
incomplete. The Remote SID is the attribute value received in BGP,
while the label value carries the complementary part of the SRv6 SID
value. BGP allows this through the transposition technique, and in the
case above the incoming BGP update uses transposition.
When the SID attribute carries transposition information, use the
real SID address, rebuilt from the attribute and the label. Nexthop
tracking then uses that reconstructed address.
> # show bgp nexthop
> Current BGP nexthop cache:
> 4:4::4:4 valid [IGP metric 30], #paths 0, peer 4:4::4:4
> gate fe80::dced:1ff:fed6:878c, if ntfp3
> Last update: Mon Mar 11 11:50:02 2024
> 2001:db8:f4:1:: valid [IGP metric 0], #paths 2
> gate fe80::dced:1ff:fed6:878c, if ntfp3
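The reconstruction can be sketched as copying bits from the label back into the SID. The sketch rests on three assumptions. First, the label value is the 20-bit MPLS label. Second, the transposed bits come from its most significant bits, as described for the SRv6 SID Structure Sub-Sub-TLV in RFC 9252. Third, the transposition offset (48) and length (16) used in the test were inferred from the example output above (Remote SID 2001:db8:f4::, label 16, nexthop 2001:db8:f4:1::) and were not read from the update. The function srv6_sid_untranspose() is hypothetical and is not bgpd's implementation.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Copies `len` bits, taken most-significant-first from the 20-bit MPLS
 * label value, into the SID starting at bit `offset` (bit 0 is the most
 * significant bit of the IPv6 address). Returns -1 on invalid sizes. */
static int srv6_sid_untranspose(struct in6_addr *sid, uint32_t label,
				unsigned int offset, unsigned int len)
{
	if (len > 20 || offset + len > 128)
		return -1;
	for (unsigned int i = 0; i < len; i++) {
		unsigned int pos = offset + i;
		uint8_t mask = (uint8_t)(0x80 >> (pos % 8));
		int bit = (label >> (19 - i)) & 1;

		if (bit)
			sid->s6_addr[pos / 8] |= mask;
		else
			sid->s6_addr[pos / 8] &= (uint8_t)~mask;
	}
	return 0;
}
```

With label 16 (binary 0000 0000 0000 0001 0000), the top 16 bits are 0x0001. Placed at bit offset 48, they form the fourth 16-bit group of the address.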
Fixes: 26c747ed6c0b ("bgpd: extend make_prefix to form srv6-based prefix")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Fri, 22 Nov 2024 14:57:25 +0000 (15:57 +0100)]
topotests: bgp_evpn_rt5, add test for advertise route-map service
Use the advertise route-map command, and check that it
correctly filters out the undesirable prefixes. Conversely,
check that removing that route-map restores all prefixes.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Tue, 26 Nov 2024 13:19:34 +0000 (14:19 +0100)]
bgpd: fix use single whitespace when displaying flowspec entries
There is an extra space in the 'Displayed' line of the show bgp
command output, which should not be present.
Fix this by making the output consistent with that of the other
address families.
Fixes: a1baf9e84f71 ("bgpd: Use single whitespace when displaying show bgp summary")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Mark Stapp [Mon, 25 Nov 2024 20:37:39 +0000 (15:37 -0500)]
zebra: avoid a race during FPM dplane plugin shutdown
During zebra shutdown, the main pthread and the FPM pthread can
deadlock if the FPM pthread is in fpm_reconnect(). Each pthread
tries to use event_cancel_async() to cancel tasks that may be
scheduled for the other pthread; this leads to a deadlock, as
neither thread can make progress.
This adds an atomic boolean that each pthread updates as it
enters and leaves the cleanup code in question, preventing the
two threads from running into the deadlock.
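The following is a minimal sketch of such a guard using C11 <stdatomic.h>. The names cleanup_in_progress, cleanup_enter() and cleanup_leave() are hypothetical. The actual patch may use the flag differently, for example by checking it before calling event_cancel_async().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: true while some pthread is inside the cleanup code. */
static atomic_bool cleanup_in_progress = false;

/* Try to enter the cleanup section. Returns true if this thread now owns
 * it, or false if another thread is already there. In that case the caller
 * should skip the cross-thread cancellation that could deadlock. */
static bool cleanup_enter(void)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&cleanup_in_progress, &expected,
					      true);
}

/* Leave the cleanup section so that the other thread may enter it. */
static void cleanup_leave(void)
{
	atomic_store(&cleanup_in_progress, false);
}
```

The compare-and-exchange makes "check and set" a single atomic step, so at most one thread can be inside the guarded section at a time.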
Donald Sharp [Thu, 7 Sep 2023 12:06:28 +0000 (08:06 -0400)]
tests: fix max med on startup
The test is failing because r2 looks for a metric of 777
on startup, but the check only begins after the 5-second delay
set up in the config has already expired.
Notice that the 5-second max-med delay expires at 29 seconds, but the show routes
command on r2 does not even begin until 34 seconds, long after max-med has expired
and the test has moved on.
Let's relax the max-med timer to 30 seconds and modify the test to wait a bit longer,
both for finding the metric and for the timer to expire.
Donald Sharp [Thu, 7 Sep 2023 11:57:26 +0000 (07:57 -0400)]
tests: Fix ospfapi client to clear ospf process
The test is failing locally:
> 2023-09-06 18:39:56,865 DEBUG: r1: vtysh result:
> Hello, this is FRRouting (version 9.1-dev).
> Copyright 1996-2005 Kunihiro Ishiguro, et al.
> r1# conf t
> r1(config)# router ospf
> r1(config-router)# ospf router-id 1.1.1.1
> For this router-id change to take effect, use "clear ip ospf process" command
> r1(config-router)#
> 2023-09-06 18:39:56,865 DEBUG: root: GOT LINE: 'SUCCESS: 1.0.0.0'
> 2023-09-06 18:39:56,866 DEBUG: root: GOT LINE: '2023-09-06 18:39:55,982 INFO: TESTER: root: Waiting for 1.1.1.1'
> 2023-09-06 18:39:56,867 DEBUG: root: GOT LINE: '2023-09-06 18:39:55,982 DEBUG: TESTER: root: expected '1.1.1.1' != '1.0.0.0''
> 2023-09-06 18:39:56,867 DEBUG: root: GOT LINE: 'waiting on notify'
It looks like the router-id cannot be changed because
neighbors have already formed. When changing the router-id,
clear the OSPF process so that the change takes effect.
Donald Sharp [Tue, 24 Sep 2024 14:46:11 +0000 (10:46 -0400)]
tests: Add some test cases for snmp
Noticed that we were not really even attempting to test
large swaths of our SNMP infrastructure. Let's load
up some very simple configs for the daemons for which
FRR supports SNMP and ensure that SNMP is working to
some extent.
Donald Sharp [Fri, 20 Sep 2024 01:39:50 +0000 (21:39 -0400)]
zebra: Remove some unused functions on linux build
The functions:
if_get_flags
if_flags_update
if_flags_mangle
are never invoked in a Linux netlink build. Put an #ifdef
around those functions so that they are not included in the
Linux build, as they are not needed there.
Chirag Shah [Tue, 19 Nov 2024 20:24:30 +0000 (12:24 -0800)]
zebra: EVPN check vxlan oper up in vlan mapping
When the VLAN-VNI mapping is updated, do not set the L2VNI up event
if the associated VXLAN device is not up.
Otherwise, remote routes synced from BGP may be skipped instead of
being installed in zebra and, from there, in the kernel.
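The condition can be sketched as a simple predicate. The struct l2vni_sketch type and the function l2vni_should_signal_up() are hypothetical simplifications and are not zebra's data structures. The sketch only shows that the "L2VNI up" event now depends on the VXLAN device's operational state as well as on the mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of an L2VNI's state. */
struct l2vni_sketch {
	bool vlan_mapped;   /* VLAN-VNI mapping is present */
	bool vxlan_oper_up; /* associated VXLAN device is operationally up */
};

/* Signal "L2VNI up" only when the mapping exists AND the VXLAN device is
 * up; otherwise defer until the device comes up. */
static bool l2vni_should_signal_up(const struct l2vni_sketch *vni)
{
	return vni->vlan_mapped && vni->vxlan_oper_up;
}
```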
And finally, reload the configuration:
`python3 frr-reload.py --reload /etc/frr/frr.conf`
frr-reload returns the error below:
```
Failed to execute segment-routing srv6 no source-address 1::1 exit exit
"segment-routing -- srv6 -- no source-address 1::1 -- exit -- exit" we failed to remove this command
% Unknown command: no source-address 1::1
```
Olivier Dugeon [Sat, 23 Nov 2024 17:50:21 +0000 (18:50 +0100)]
ospfd: Correct invalid SR-MPLS output label
When ospfd starts, there are two possible scenarios for Segment Routing:
1/ Routes associated with prefixes are not yet available, i.e. Segment Routing
LSAs are received before Type 1 LSAs. In this case, the function
ospf_sr_nhlfe_update() is triggered when a new SPF is launched. Thus, neighbors
and output labels are always synchronized with the routing table.
2/ Routes are already available, i.e. Type 1 LSAs are received before the
Segment Routing LSAs, in particular the Router Information LSA, which contains
the SRGB. During NHLFE computation, prefixes are left with an incomplete
configuration; in particular, the SR nexthop is set to NULL. If this scenario
is handled through the function update_out_nhlfe() (triggered when an SRGB is
received or modified from a neighbor node), the output label is not correctly
configured, because the nexthop SR node associated with the prefix has been
left NULL.
This patch corrects this problem by calling the function compute_nhlfe() from
within update_out_nhlfe() when the nexthop SR node associated with the prefix
is NULL. This guarantees that the SR prefix is always correctly configured,
independently of the scenario, i.e. of the order in which the different LSAs
arrive.
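The fallback can be sketched with heavily simplified, hypothetical types. The structs and the *_sketch functions below are not ospfd's. The sketch assumes the usual SR-MPLS rule that the output label is the neighbor's SRGB lower bound plus the prefix SID index, and it ignores PHP and explicit-null handling. The nh parameter stands for the SR node that the routing table currently resolves as the prefix's nexthop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified SR node: only its SRGB lower bound. */
struct sr_node_sketch {
	unsigned int srgb_lower;
};

/* Hypothetical, simplified SR prefix: only what the fix touches. */
struct sr_prefix_sketch {
	unsigned int sid_index;
	struct sr_node_sketch *nexthop; /* NULL if Type 1 LSAs came first */
	unsigned int out_label;
	int computed; /* counts full computations, for illustration */
};

/* Stand-in for compute_nhlfe(): resolve the nexthop SR node and label. */
static void compute_nhlfe_sketch(struct sr_prefix_sketch *srp,
				 struct sr_node_sketch *nh)
{
	srp->nexthop = nh;
	srp->out_label = nh->srgb_lower + srp->sid_index;
	srp->computed++;
}

/* Stand-in for update_out_nhlfe(): runs when a neighbor's SRGB changes. */
static void update_out_nhlfe_sketch(struct sr_prefix_sketch *srp,
				    struct sr_node_sketch *nh)
{
	if (srp->nexthop == NULL) {
		/* Scenario 2: prefix was left incomplete, so do a full
		 * computation instead of only refreshing the label. */
		compute_nhlfe_sketch(srp, nh);
		return;
	}
	srp->out_label = srp->nexthop->srgb_lower + srp->sid_index;
}
```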
Donald Sharp [Fri, 22 Nov 2024 16:02:15 +0000 (11:02 -0500)]
lib, zebra: Do not have duplicate memory type problems
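zebra/zebra_mpls.c uses MTYPE_NH_LABEL, which is defined in both
lib/nexthop.c and zebra/zebra_mpls.c. The usage in zebra_mpls.c is a
realloc of a label stack allocated under the lib/nexthop.c definition.
Since the two definitions are distinct memory types, the allocation
count check in mt_count_free() fails, which leads to a crash: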
(gdb) bt
0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:44
1 __pthread_kill_internal (signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:78
2 __GI___pthread_kill (threadid=126487246404032, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3 0x0000730a1b442476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4 0x0000730a1b94fb18 in core_handler (signo=6, siginfo=0x7ffeed1e07b0, context=0x7ffeed1e0680) at lib/sigevent.c:268
5 <signal handler called>
6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:44
7 __pthread_kill_internal (signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:78
8 __GI___pthread_kill (threadid=126487246404032, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9 0x0000730a1b442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x0000730a1b4287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x0000730a1b9984f5 in _zlog_assert_failed (xref=0x730a1ba59480 <_xref.16>, extra=0x0) at lib/zlog.c:789
12 0x0000730a1b8f8908 in mt_count_free (mt=0x576e0edda520 <MTYPE_NH_LABEL>, ptr=0x576e36617b80) at lib/memory.c:74
13 0x0000730a1b8f8a59 in qrealloc (mt=0x576e0edda520 <MTYPE_NH_LABEL>, ptr=0x576e36617b80, size=16) at lib/memory.c:112
14 0x0000576e0ec85e2e in nhlfe_out_label_update (nhlfe=0x576e368895f0, nh_label=0x576e3660e9b0) at zebra/zebra_mpls.c:1462
15 0x0000576e0ec833ff in lsp_install (zvrf=0x576e3655fb50, label=17, rn=0x576e366197c0, re=0x576e3660a590) at zebra/zebra_mpls.c:224
16 0x0000576e0ec87c34 in zebra_mpls_lsp_install (zvrf=0x576e3655fb50, rn=0x576e366197c0, re=0x576e3660a590) at zebra/zebra_mpls.c:2215
17 0x0000576e0ecbb427 in rib_process_update_fib (zvrf=0x576e3655fb50, rn=0x576e366197c0, old=0x576e36619660, new=0x576e3660a590) at zebra/zebra_rib.c:1084
18 0x0000576e0ecbc230 in rib_process (rn=0x576e366197c0) at zebra/zebra_rib.c:1480
19 0x0000576e0ecbee04 in process_subq_route (lnode=0x576e368e0270, qindex=8 '\b') at zebra/zebra_rib.c:2661
20 0x0000576e0ecc0711 in process_subq (subq=0x576e3653fc80, qindex=META_QUEUE_BGP) at zebra/zebra_rib.c:3226
21 0x0000576e0ecc07f9 in meta_queue_process (dummy=0x576e3653fae0, data=0x576e3653fb80) at zebra/zebra_rib.c:3265
22 0x0000730a1b97d2a9 in work_queue_run (thread=0x7ffeed1e3f30) at lib/workqueue.c:282
23 0x0000730a1b96b039 in event_call (thread=0x7ffeed1e3f30) at lib/event.c:1996
24 0x0000730a1b8e4d2d in frr_run (master=0x576e36277e10) at lib/libfrr.c:1232
25 0x0000576e0ec35ca9 in main (argc=7, argv=0x7ffeed1e4208) at zebra/main.c:536
Replacing a label stack is clearly an operation that should be owned by
lib/nexthop.c. So let's move this function there and have
zebra_mpls.c just call it to replace the label stack.
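The failure mode can be illustrated with a toy per-type allocation counter. The sketch is only loosely modeled on the mt_count_free() frame in the backtrace above. The memtype_sketch type, mt_alloc() and mt_free() are hypothetical and are not FRR's lib/memory.c API. Where FRR asserts, the sketch returns false. Two memory types can share the same name and still be distinct objects, so memory allocated under one cannot be freed or reallocated under the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical memory type with a live-allocation counter. */
struct memtype_sketch {
	const char *name;
	long n_alloc;
};

static void *mt_alloc(struct memtype_sketch *mt, size_t size)
{
	void *p = malloc(size);

	if (p)
		mt->n_alloc++;
	return p;
}

/* Returns false where the real code would assert: this type has no live
 * allocations, so `p` must have been allocated under a different type.
 * The pointer is left untouched in that case. */
static bool mt_free(struct memtype_sketch *mt, void *p)
{
	if (mt->n_alloc <= 0)
		return false;
	mt->n_alloc--;
	free(p);
	return true;
}
```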