Renato Westphal [Fri, 7 Jun 2024 15:03:17 +0000 (12:03 -0300)]
tests: introduce method to update reference data in isis_tilfa_topo1
The isis_tilfa_topo1 topotest is comprehensive and contains a large
amount of reference data. One problem is that, when changes occur,
updating this reference data can be difficult.
To address this problem, this commit introduces a method to
automatically regenerate the reference data by setting the `REGEN_DATA`
environment variable.
When `REGEN_DATA` is set, the topotest regenerates reference data
from the current run instead of comparing against existing reference
data. Note that regenerated data must be manually verified for
correctness.
This commit also simplifies the reference data by replacing all diff
files with complete JSON snapshots.
Renato Westphal [Fri, 7 Jun 2024 13:41:38 +0000 (10:41 -0300)]
tests: rework isis_tilfa_topo1 to fix timing issues
In this topotest, steps 10-15 were added to test the IS-IS switchover
functionality. In short, two cases were tested: switchover after a
link down event and switchover after a BFD down event. Both cases
were tested in sequence on the same router, rt6. This involved the
following steps:
- Setting the SPF delay timer to 15 seconds
- Shutting down the eth-rt5 interface from the switch side
- Testing the post-switchover RIB and LIB (triggered by the link down
event)
- Testing the post-SPF RIB and LIB
- Bringing the eth-rt5 interface back up
- Configuring a BFD session between rt6 and rt5
- Shutting down the eth-rt5 interface from the switch side once again
- Testing the post-switchover RIB and LIB (triggered by the BFD down
event)
- Testing the post-SPF RIB and LIB
Since the time window to test the post-switchover RIB and LIB was too
narrow (10 seconds), these tests were having sporadic failures.
To resolve this problem, we can simplify the switchover test as follows:
- Setting the SPF delay timer to 60 seconds (not 15)
- Disabling "link-detect" on rt6's eth-rt5 interface
- Shutting down the eth-rt5 interface from the switch side
- On rt6, testing the post-switchover RIB and LIB (triggered by the
BFD down event)
- On rt5, testing the post-switchover RIB and LIB (triggered by the
link down event)
Notice how we can test both post-link-down and post-BFD-down switchover
cases simultaneously by having different "link-detect" configurations
on rt5 and rt6. Additionally, by using a larger SPF delay timer, the
time window to test the post-switchover RIB and LIB is much larger
and less prone to sporadic failures.
Christian Hopps [Sat, 8 Jun 2024 19:37:47 +0000 (15:37 -0400)]
ci: do apt-get update before installing required modules
- Use `uname -r` to also install specific module versions since
with github runners the running kernel can become out-dated with
the deployed packages.
Christian Hopps [Thu, 6 Jun 2024 18:08:00 +0000 (14:08 -0400)]
mgmtd: add empty notif xpath map for completeness
New back-end clients may need to add notification static allocations so
we should have it available for those users, rather than requiring the
new user delve into the mgmtd infra and modify it themselves.
Louis Scalbert [Fri, 24 May 2024 14:34:23 +0000 (16:34 +0200)]
zebra: fix show route memory consumption
When displaying a route table in JSON, a table JSON object is storing
all the prefix JSON objects containing the prefix information. This
results in excessive memory allocation for JSON objects, potentially
leading to an out-of-memory error on the machine with large routing
tables.
To Fix the memory consumption issue for the "show ip[v6] route [vrf XX]
json" command, display the prefixes one by one and free the memory of
each JSON object after it has been displayed.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Fri, 24 May 2024 15:06:59 +0000 (17:06 +0200)]
zebra: fix show route vrf all memory consumption
0e2fc3d67f ("vtysh, zebra: Fix malformed json output for multiple vrfs
in command 'show ip route vrf all json'") has been reverted in the
previous commit. Although the fix was correct, it was consuming too muca
memory when displaying large route tables.
A root JSON object was storing all the JSON objects containing the route
tables, each containing their respective prefixes in JSON objects. This
resulted in excessive memory allocation for JSON objects, potentially
leading to an out-of-memory error on the machine.
To Fix the memory consumption issue for the "show ip[v6] route vrf all
json" command, display the tables one by one and free the memory of each
JSON object after it has been displayed.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Rajesh Girada [Mon, 20 May 2024 16:34:41 +0000 (09:34 -0700)]
ospf6d: Handling Topo Change in GR-HELPER mode for max-age lsas
Description:
OSPF6 GR HELPER router should consider as TOPOCHANGE when
it receives lsas with max age and should exit from Helper.
But, it is not exiting from helper because this max age lsa is
considered as duplicated lsa since the sender uses same seq
number for max age lsa from the previous lsa update.
Currently, topo change is not considered for duplicated lsas.
So removed the duplicated check when validating TOPOCHNAGE.
Christian Hopps [Thu, 6 Jun 2024 08:50:05 +0000 (08:50 +0000)]
tests: munet: update to version 0.14.9
Topotest relevant changes:
- add support for `timeout` arg to `cmd_*()`
- handle invalid regexp in CLI commands
- fix long interface name support
Full munet changelog:
munet: 0.14.9: add support for `timeout` arg to `cmd_*()`
munet: 0.14.8: cleanup the cleanup (kill) on launch options
munet: 0.14.7: allow multiple extra commands for shell console init
munet: 0.14.6:
- qemu: gather gcda files where munet can find them
- handle invalid regexp in CLI commands
munet: 0.14.5:
- (podman) pull missing images for containers
- fix long interface name support
- add another router example
munet: 0.14.4: mutest: add color to PASS/FAIL indicators on tty consoles
munet: 0.14.3: Add hostnet node that runs it's commands in the host network namespace.
munet: 0.14.2:
- always fail mutest tests on bad json inputs
- improve ssh-remote for common use-case of connecting to host connected devices
- fix ready-cmd for python v3.11+
munet: 0.14.1: Improved host interface support.
Dave LeRoy [Wed, 5 Jun 2024 17:22:57 +0000 (10:22 -0700)]
nhrpd: cleans up shortcut cache entries on termination
nhrp_shortcut_terminate() previously was just freeing the associated AFI shortcut
RIBs and not addressing existing shortcut cache entries. This cause a use after
free issue in vrf_terminate() later in the terminate sequence
NHRP: Received signal 7 at 1717516286 (si_addr 0x1955d, PC 0x7098786912c0); aborting...
NHRP: zlog_signal+0xf5 709878ad12557fff3d992eb0 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: core_handler+0xb5 709878b0db857fff3d992ff0 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: __sigaction+0x50 7098786425207fff3d993140 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x709878600000)
NHRP: ---- signal ----
NHRP: __lll_lock_wait_private+0x90 7098786912c07fff3d9936d8 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x709878600000)
NHRP: pthread_mutex_lock+0x112 7098786980027fff3d9936e0 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x709878600000)
NHRP: _event_add_read_write+0x63 709878b1f4237fff3d993700 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: zclient_send_message+0xd4 709878b376147fff3d993770 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: nhrp_route_announce+0x1ad 5ab34d63d39d7fff3d993790 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
NHRP: nhrp_shortcut_cache_notify+0xd8 5ab34d63e7587fff3d99d4e0 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
NHRP: nhrp_cache_free+0x165 5ab34d632f257fff3d99d510 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
NHRP: hash_iterate+0x4d 709878ab949d7fff3d99d540 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: nhrp_cache_interface_del+0x37 5ab34d633eb77fff3d99d580 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
NHRP: nhrp_if_delete_hook+0x26 5ab34d6350d67fff3d99d5a0 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
NHRP: if_delete_retain+0x3d 709878abcd1d7fff3d99d5c0 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: if_delete+0x4c 709878abd87c7fff3d99d600 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: if_terminate+0x53 709878abda837fff3d99d630 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: vrf_terminate_single+0x24 709878b23c747fff3d99d670 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: nhrp_request_stop+0x34 5ab34d6368447fff3d99d690 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
NHRP: frr_sigevent_process+0x53 709878b0df537fff3d99d6a0 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: event_fetch+0x6c5 709878b204057fff3d99d6c0 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: frr_run+0xd3 709878ac81637fff3d99d840 /usr/lib/frr/libfrr.so.0 (mapped at 0x709878a00000)
NHRP: main+0x195 5ab34d6319157fff3d99d960 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
NHRP: __libc_init_first+0x90 709878629d907fff3d99d980 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x709878600000)
NHRP: __libc_start_main+0x80 709878629e407fff3d99da20 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x709878600000)
NHRP: _start+0x25 5ab34d631b657fff3d99da70 /usr/lib/frr/nhrpd (mapped at 0x5ab34d621000)
Philippe Guibert [Fri, 13 Jan 2023 14:59:52 +0000 (15:59 +0100)]
bgpd: fix labels in adj-rib-in
In a BGP L3VPN context using ADJ-RIB-IN (ie. enabled with
'soft-reconfiguration inbound'), after applying a deny route-map and
removing it, the remote MPLS label information is lost. As a result, BGP
is unable to re-install the related routes in the RIB.
Philippe Guibert [Fri, 24 Feb 2023 10:53:46 +0000 (11:53 +0100)]
topotests: add bgp test to check the ADJ-RIB-IN label value
The test is done on r2. A BGP update is received on r2, and is
filtered on r2. The RIB of r2 does not have the BGP update stored,
but the ADJ-RIB-IN is yet present. To demonstrate this, if the
inbound route-map is removed, then the BGP update should be copied
from the the ADJ-RIB-IN and added to the RIB with the label
value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com> Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Philippe Guibert [Fri, 24 Feb 2023 10:22:14 +0000 (11:22 +0100)]
topotests: add bgp test to check the ADJ-RIB-OUT label value
This test ensures that when r1 changes the label value, then
the new value is automatically propagated to remote peer.
This demonstrates that the ADJ-RIB-OUT to r2 has been correctly
updated.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com> Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Mon, 5 Feb 2024 16:05:20 +0000 (17:05 +0100)]
bgpd: check and set extra num_labels
The handling of MPLS labels in BGP faces an issue due to the way labels
are stored in memory. They are stored in bgp_path_info but not in
bgp_adj_in and bgp_adj_out structures. As a consequence, some
configuration changes result in losing labels or even a bgpd crash. For
example, when retrieving routes from the Adj-RIB-in table
("soft-reconfiguration inbound" enabled), labels are missing.
bgp_path_info stores the MPLS labels, as shown below:
To solve those issues, a solution would be to set label data to the
bgp_adj_in and bgp_adj_out structures in addition to the
bgp_path_info_extra structure. The idea is to reference a common label
pointer in all these three structures. And to store the data in a hash
list in order to save memory.
However, an issue in the code prevents us from setting clean data
without a rework. The extra->num_labels field, which is intended to
indicate the number of labels in extra->label[], is not reliably checked
or set. The code often incorrectly assumes that if the extra pointer is
present, then a label must also be present, leading to direct access to
extra->label[] without verifying extra->num_labels. This assumption
usually works because extra->label[0] is set to MPLS_INVALID_LABEL when
a new bgp_path_info_extra is created, but it is technically incorrect.
Cleanup the label code by setting num_labels each time values are set in
extra->label[] and checking extra->num_labels before accessing the
labels.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
TL;DR; When we configure `advertise-all-vni` (in this case), a new BGP instance
is created with the name vrf100, and ASN 50. Next, when we create
`router bgp 100 vrf vrf100`, we look for the BGP instance with the same name
and we found it, but ASNs are different 50 vs. 100.
Every such a new auto created instance is flagged with BGP_VRF_AUTO.
After the fix:
```
router bgp 50
!
address-family l2vpn evpn
advertise-all-vni
exit-address-family
exit
!
router bgp 100 vrf vrf100
exit
!
end
donatas.net(config)# router bgp 51
BGP is already running; AS is 50
donatas.net(config)# router bgp 50
donatas.net(config-router)# router bgp 101 vrf vrf100
BGP is already running; AS is 100
donatas.net(config)# router bgp 100 vrf vrf100
donatas.net(config-router)#
```
Georgi Valkov [Tue, 4 Jun 2024 10:35:54 +0000 (13:35 +0300)]
zebra: fix compilation with GCC14
Fixes:
zebra/zebra_netns_notify.c: In function 'zebra_ns_ready_read':
zebra/zebra_netns_notify.c:266:40: error: implicit declaration of function 'basename' [-Wimplicit-function-declaration]
266 | if (strmatch(VRF_DEFAULT_NAME, basename(netnspath))) {
| ^~~~~~~~
Fixed by including libgen.h, then since basename may modify its
parameter, allocate a copy on the stack, using strdupa, and pass the
temporary string to basename.
According to the man page for basename:
With glibc, one gets the POSIX version of basename() when
<libgen.h> is included, and the GNU version otherwise.
The POSIX version of basename may modify the contents of path,
so we should to pass a copy when calling this function.
zebra: display srv6 encapsulation source-address when configured
The 'show running-config' does not display the ipv6 source address
when a locator is not configured. Fix this by systematically displaying
the ipv6 source address.
Fixes: 6a0956169b31 ("zebra: Add encap source address to SRv6 config write function") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Christian Hopps [Fri, 31 May 2024 17:08:16 +0000 (13:08 -0400)]
ci: only run conflict check on pull-requests
This change will stop this action from running on forked repos.
Previously whenever one pushed a change to one's development branch the
action would "run but skip" which still generated an email notifications
and thus was very annoying. :)
David Ward [Sun, 2 Jun 2024 10:42:23 +0000 (06:42 -0400)]
bgpd: Adjust terminology related to DSCP
The default DSCP used for BGP connections is CS6. The DSCP value is
not part of the TCP header.
When setting the IP_TOS or IPV6_TCLASS socket options, the argument
is not the 6-bit DSCP value, but an 8-bit value for the former IPv4
Type of Service field or IPv6 Traffic Class field, respectively.
Fixes: 425bd64be847 ("bgpd: Allow bgp to control the DSCP session TOS value") Signed-off-by: David Ward <david.ward@ll.mit.edu>
Donald Sharp [Thu, 30 May 2024 19:48:33 +0000 (15:48 -0400)]
pimd: Give a clearer warning when the kernel is not compiled right
When the kernel is not compiled with mroute vrf's enabled it will
fail the call to initialize the vrf. As such let's recognize this
specific error code and output a specific warning to the operator
to help them figure this problem out.
Although the End-of-RIB marker is specified for the purpose of BGP
graceful restart, it is noted that the generation of such a marker
upon completion of the initial update would be useful for routing
convergence in general, and thus the practice is recommended.
Thus, it might be reasonable to send EoR regardless of whether the Graceful Restart
capability is received or not from the peer.
There is no reason to call `igmp_anysource_forward_stop()` inside a call to
`igmp_get_source_by_addr()`; not only it is not expected for a "get" function
to perform such an action, but also the decision to start/stop forwarding is
already handled correctly by pim outside `igmp_get_source_by_addr()`.
That call was left there from the days pim was initially imported into the sources.
The problem/crash was happening because `igmp_find_source_by_addr()` would fail to
find the group/source combo when mixing `(*, G)` and `(S, G)`. When having an existing
flow `(*, G)`, and a new `(S, G)` igmp is received, a new entry is correctly created.
`igmp_anysource_forward_stop(group)` always stops and eventually frees `(*, G)`, even
when the new igmp is `(S, G)`, leaving a bad state. I.e, the new entry for `(S, G)`
causes `(*, G)` to be deleted.
Tested the fix with multiple receivers on the same interface with several ssm and
any source senders and receivers with various combination of start/stop orders and
they all worked correctly.
tests: add a topotest to verify nhrp shortcuts in a redundant nhs topology
Contains 2 testcases. The first does a basic configuration/connectivity.
The second testcase initiates a shortcut through the primary NHS,
verifies shortcut routes are installed. Primary NHS interface brought
down and verify that the shortcut is not impacted. Finally verify that
after the shortcut expires, it is able to be re-established via a backup
NHS.
When nhrpd is shutdown via nhrp_request_stop() the shutdown
sequence was not handling the case where there are active
shortcut routes installed. The zebra client and shortcut rib
were being cleaned up before vrf_terminate() had an opportunity
to delete the active routes.
Iggy Frankovic [Thu, 30 May 2024 11:59:54 +0000 (07:59 -0400)]
ospf6d: Prevent heap-buffer-overflow with unknown type
When parsing a osf6 grace lsa field and we receive an
unknown tlv type, ospf6d was not incrementing the pointer
to get beyond the tlv. Leaving a situation where ospf6d
would parse the packet incorrectly.