Upon reconfiguration of the default instance, the prefixes are never set
into a meta queue by mq_add_handler(). They are never processed for
zebra RIB installation and announcements of update/withdraw.
Louis Scalbert [Wed, 12 Feb 2025 11:56:49 +0000 (12:56 +0100)]
bgpd: fix default instance name when un-hiding
When unconfiguring a default BGP instance with VPN SAFI configurations,
the default BGP structure remains but enters a hidden state. Upon
reconfiguration, the instance name incorrectly appears as "VIEW ?"
instead of "VRF default". And the name_pretty pointer
The name_pretty pointer is replaced by another one with the incorrect
name. This also leads to a memory leak as the previous pointer is not
properly freed.
Donald Sharp [Thu, 20 Feb 2025 19:28:15 +0000 (14:28 -0500)]
bgpd: remove dmed check not required in bestpath selection
As part of the upstream master commit (f3575f61c7 bgpd: Sort the
bgp_path_inf) the snippet of the code for dmed check condition
left out, which leads to an issue of selecting incorrect bestpath.
As an example:
During the bestpath selection local route looses to another path due
to dmed condition being hit.
The snippet of the logs:
2025/02/20 03:06:20.131441 BGP: [JW7VP-K1YVV]
[2]:[0]:[48]:[00:92:00:00:00:10](VRF default): Comparing path
27.0.0.7 flags Valid with path Static announcement flags Selected Valid Attr Changed Unsorted
2025/02/20 03:06:20.131445 BGP: [SYTDR-QV6X9] [2]:[0]:[48]:[00:92:00:00:00:10]: path 27.0.0.7 loses to path Static announcement as ES 03:44:38:39:ff:ff:02:00:00:01 is same and local
2025/02/20 03:06:20.131452 BGP: [JW7VP-K1YVV] [2]:[0]:[48]:[00:92:00:00:00:10](VRF default): Comparing path 27.0.0.8 flags Valid with path Static announcement flags Selected Valid Attr Changed Unsorted
2025/02/20 03:06:20.131456 BGP: [SYTDR-QV6X9] [2]:[0]:[48]:[00:92:00:00:00:10]: path 27.0.0.8 loses to path Static announcement as ES 03:44:38:39:ff:ff:02:00:00:01 is same and local
2025/02/20 03:06:20.131458 BGP: [WEWEC-8SE72] [2]:[0]:[48]:[00:92:00:00:00:10](VRF default): path Static announcement is the bestpath from AS 0 <<<< static is best
2025/02/20 03:06:20.131463 BGP: [Z3A78-GM3G5] bgp_best_selection: [2]:[0]:[48]:[00:92:00:00:00:10](VRF default) pi 27.0.0.7 dmed
2025/02/20 03:06:20.131467 BGP: [Z3A78-GM3G5] bgp_best_selection: [2]:[0]:[48]:[00:92:00:00:00:10](VRF default) pi 27.0.0.8 dmed
2025/02/20 03:06:20.131471 BGP: [N6CTF-2RSKS] [2]:[0]:[48]:[00:92:00:00:00:10](VRF default): After path selection, newbest is path 27.0.0.7 oldbest was Static announce
Donald Sharp [Tue, 18 Feb 2025 15:25:47 +0000 (10:25 -0500)]
bgpd: Fix another crash in orf
I was pointed at yet another crash in the orf code. I think it
stems from basicaly the same problem as the last one. Let's just
make sure that the orf_plist is handled appropriately.
Shbinging [Mon, 17 Feb 2025 06:45:05 +0000 (14:45 +0800)]
doc: correct `ip rip split-horizon` command in the RIP documentation.
The previous version incorrectly spelled the command as `ip split-horizon`. The correct command is `ip rip split-horizon`, as indicated in the code at line 675 of rip_cli.c.
Additional machine readable information can be printed via the `extra`
argument.
Example:
```python
log.debug("exit context"), extra={"line": line, "ctx_keys": ctx_keys})
log.error(f"Failed to execute command {' '.join(cmd)}", extra={"cmd": cmd})
```
Signed-off-by: Giovanni Tataranni <g.tataranni@gmail.com>
tests: Fix intermittent failures in `srv6_encap_src_addr` topotest
The `srv6_encap_src_addr` runs a vtysh command to configure the SRv6
encapsulation source address and then immediately invokes an iproute2
command to verify that zebra has set this address in the kernel. There
is no wait between the two operations and the verification is attempted
only once. If the topotest does not find the expected address it fails
immediately.
The problem is that when topotest is run on a heavyily loaded system,
it can take some time for zebra to set the address in the kernel.
In this case, when the topotest checks the kernel address right after
running the vtysh command, it doesn't find the expected address because
zebra hasn't set it yet.
This commit gives zebra some time to configure the address. It keeps to
check that the address is the expected one for about 1 minute. If after
1 minute the address is not the expected one then the test fails.
isisd: Request SRv6 locator after zebra connection
When SRv6 is enabled and an SRv6 locator is specified in the IS-IS
configuration, IS-IS may attempt to request SRv6 locator information from
zebra before the connection is fully established. If this occurs, the
request fails with the following error:
staticd: Failed to register nexthop after networking restart
Problem:
After networking restart, staticd unregistered the nexthop
but failed to register the nexthop again, which caused the
nexthop to remain inactive in zebra for static route.
Fix:
Call to static_zebra_nht_register() from static_install_path() was
removed in 3c05d53bf8defc36acdfe6e78064e068d60c649f. Adding it back
so that staticd can register the nexthop for static routes.
Testing:
After networking restart trigger on h1:
Before fix:
```
h1# show ipv6 route vrf vrf1012
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP,
T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
t - Table-Direct, Z - FRR,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF vrf1012:
S ::/0 [1/0] via 2003:7:2::1, swp1.2 inactive, weight 1, 00:00:39
K>* ::/0 [255/8192] unreachable (ICMP unreachable) (vrf default), 00:00:39
L * 2000:9:12::3/128 is directly connected, vrf1012, 00:00:39
C>* 2000:9:12::3/128 is directly connected, vrf1012, 00:00:39
C>* 2003:7:2::/125 is directly connected, swp1.2, 00:00:37
L>* 2003:7:2::3/128 is directly connected, swp1.2, 00:00:37
C>* fe80::/64 is directly connected, swp1.2, 00:00:37
h1#
```
After fix:
```
h1# show ipv6 route vrf vrf1012
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP,
T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
t - Table-Direct, Z - FRR,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF vrf1012:
S>* ::/0 [1/0] via 2003:7:2::1, swp1.2, weight 1, 00:00:15
K * ::/0 [255/8192] unreachable (ICMP unreachable) (vrf default), 00:00:17
L * 2000:9:12::3/128 is directly connected, vrf1012, 00:00:17
C>* 2000:9:12::3/128 is directly connected, vrf1012, 00:00:17
C>* 2003:7:2::/125 is directly connected, swp1.2, 00:00:15
L>* 2003:7:2::3/128 is directly connected, swp1.2, 00:00:15
```
Christian Hopps [Tue, 11 Feb 2025 07:12:06 +0000 (07:12 +0000)]
lib: nb: call child destroy CBs when YANG container is deleted
Previously the code was only calling the child destroy callbacks if the target
deleted node was a non-presence container. We now add a flag to the callback
structure to instruct northbound to perform the rescursive delete for code that
wishes for this to happen.
- Fix wrong relative path lookup in keychain destroy callback
Philippe Guibert [Thu, 30 Jan 2025 08:14:00 +0000 (09:14 +0100)]
isisd, lib: add some codepoints usually shared with other vendors
Some codepoints can not be read by interoperating with CISCO.
This is because PSP/USP flavor are used by default, and the display of
the isis output has to be adapted.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
David Schweizer [Wed, 12 Feb 2025 12:07:38 +0000 (13:07 +0100)]
lib,zebra: Allow class E prefixes in RIB
Changes allow ipv4 class E addresses and prefixes in the 240.0.0.0/4
range to be configured on interfaces, imported from the kernel routing
table and redistributed as connected routes in zebra by default.
Changes also fix routes with class E prefixes in kernel routing table
getting rejected by zebra during early daemon startup.
Drivin this change in default behavior are cloud providers (with
customers still using obsolete ipv4 protocol, i.e. Azure, AWS) running
out of ip space and abusing class E for addressing instances (announced
via BGP) over tunneling connections back to customers on premise
infrastructure.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Donald Sharp [Fri, 14 Feb 2025 12:55:09 +0000 (07:55 -0500)]
bgpd: When removing the prefix list drop the pointer
We are very very rarely seeing this crash:
0 0x7f36ba48e389 in prefix_list_apply_ext lib/plist.c:789
1 0x55eff3fa4126 in subgroup_announce_check bgpd/bgp_route.c:2334
2 0x55eff3fa858e in subgroup_process_announce_selected bgpd/bgp_route.c:3440
3 0x55eff4016488 in subgroup_announce_table bgpd/bgp_updgrp_adv.c:808
4 0x55eff401664e in subgroup_announce_route bgpd/bgp_updgrp_adv.c:861
5 0x55eff40111df in peer_af_announce_route bgpd/bgp_updgrp.c:2223
6 0x55eff3f884cb in bgp_announce_route_timer_expired bgpd/bgp_route.c:5892
7 0x7f36ba4ec239 in event_call lib/event.c:2019
8 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
9 0x55eff3e668b7 in main bgpd/bgp_main.c:557
10 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
11 0x7f36b9e2d304 in __libc_start_main_impl ../csu/libc-start.c:360
12 0x55eff3e64a30 in _start (/home/ci/cibuild.1407/frr-source/bgpd/.libs/bgpd+0x2fda30)
0x608000037038 is located 24 bytes inside of 88-byte region [0x608000037020,0x608000037078)
freed by thread T0 here:
0 0x7f36ba8b76a8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
1 0x7f36ba439bd7 in qfree lib/memory.c:131
2 0x7f36ba48d3a3 in prefix_list_free lib/plist.c:156
3 0x7f36ba48d3a3 in prefix_list_delete lib/plist.c:247
4 0x7f36ba48fbef in prefix_bgp_orf_remove_all lib/plist.c:1516
5 0x55eff3f679c4 in bgp_route_refresh_receive bgpd/bgp_packet.c:2841
6 0x55eff3f70bab in bgp_process_packet bgpd/bgp_packet.c:4069
7 0x7f36ba4ec239 in event_call lib/event.c:2019
8 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
9 0x55eff3e668b7 in main bgpd/bgp_main.c:557
10 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
previously allocated by thread T0 here:
0 0x7f36ba8b83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
1 0x7f36ba4392e4 in qcalloc lib/memory.c:106
2 0x7f36ba48d0de in prefix_list_new lib/plist.c:150
3 0x7f36ba48d0de in prefix_list_insert lib/plist.c:186
4 0x7f36ba48d0de in prefix_list_get lib/plist.c:204
5 0x7f36ba48f9df in prefix_bgp_orf_set lib/plist.c:1479
6 0x55eff3f67ba6 in bgp_route_refresh_receive bgpd/bgp_packet.c:2920
7 0x55eff3f70bab in bgp_process_packet bgpd/bgp_packet.c:4069
8 0x7f36ba4ec239 in event_call lib/event.c:2019
9 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
10 0x55eff3e668b7 in main bgpd/bgp_main.c:557
11 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Let's just stop trying to save the pointer around in the peer->orf_plist
data structure. There are other design problems but at least lets
stop the crash from possibly happening.
Fixes: #18138 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Thu, 13 Feb 2025 16:41:27 +0000 (11:41 -0500)]
tools: watchfrr should ignore frr_global_options
watchfrr is currently being started with $frr_global_options
This is problematic as that it has a entirely different cli
than the rest of the daemons and we have no plans to make
this equivalent.
Fixes: #18107 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
David Lamparter [Tue, 28 Sep 2021 12:40:23 +0000 (14:40 +0200)]
pimd: add IGMPv2/MLDv1 immediate-leave
(Somewhat) useful when dealing with an interface that has only one host
attached. Only works for IGMPv2 and MLDv1, other protocol versions have
no leave message.
Co-authored-by: David Lamparter <equinox@opensourcerouting.org> Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Donatas Abraitis [Tue, 11 Feb 2025 19:22:12 +0000 (21:22 +0200)]
zebra: Do not flush an existing vni configuration trying to remove wrong vni
Before:
```
pc.donatas.net(config)# do sh run | include vni
vni 1
pc.donatas.net(config)# no vni 2
pc.donatas.net(config)# do sh run | include vni
pc.donatas.net(config)#
```
Acee Lindem [Thu, 6 Feb 2025 18:36:06 +0000 (18:36 +0000)]
ospfd: Replace LSDB callbacks with LSA Update/Delete hooks.
Replace the LSDB callbacks with LSA update and delete hooks using the
the FRR hook mechanism. Remove redundant callbacks by placing the LSA
update and delete hooks in a single place so that deletes don't need
to be handled by the update hook. Simplify existing OSPF TE and OSPF
API Server callbacks now that there is no ambiguity or redundancy.
Also cleanup the debugging by separating out opaque-lsa debugging
from the overloaded event debugging.
Louis Scalbert [Wed, 12 Feb 2025 12:49:50 +0000 (13:49 +0100)]
bgpd: release manual vpn label on instance deletion
When a BGP instance with a manually assigned VPN label is deleted, the
label is not released from the Zebra label registry. As a result,
reapplying a configuration with the same manual label leads to VPN
prefix export failures.
Louis Scalbert [Wed, 12 Feb 2025 11:50:42 +0000 (12:50 +0100)]
bgpd: fix incorrect json in bgp_show_table_rd
In bgp_show_table_rd(), the is_last argument is determined using the
expression "next == NULL" to check if the RD table is the last one. This
helps ensure proper JSON formatting.
However, if next is not NULL but is no longer associated with a BGP
table, the JSON output becomes malformed.
Updates the condition to also verify the existence of the next bgp_dest
table.
Fixes: 1ae44dfcba ("bgpd: unify 'show bgp' with RD with normal unicast bgp show") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
pimd: Fix for data packet loss when FHR is LHR and RP
Topology:
A single router is acting as the First Hop Router (FHR), Last Hop Router (LHR), and RP.
RC and Issue:
When an upstream S,G is in join state, it sends a register message to the RP.
If the RP has the receiver, it sends a register stop message and switches to the shortest path.
When the register stop message is processed, it removes pimreg, moves to prune,
and starts the reg stop timer.
When the reg stop timer expires, PIM changes S,G state to Join Pending and sends out a NULL
register message to RP. RP receives it and fails to send Reg stop because SPT is not set at that point.
The problem is when the register stop timer pops and state is in Join Pending.
According to https://www.rfc-editor.org/rfc/rfc4601#section-4.4.1,
we need to put back the pimreg reg tunnel into the S,G mroute.
This causes data to be sent to the control plane and subsequently interrupts the line rate.
Fix:
If the router is FHR and RP to the group,
ignore SPT status and send out a register stop message back to the DR (in this context, the same router).
Donald Sharp [Thu, 6 Feb 2025 23:40:30 +0000 (18:40 -0500)]
zebra: Allow fpm_listener to continue to try to read
Currently when the fpm_listener attempts to read say X
bytes it may only get Y( which is less than X ). In this
case we should assume that the dplane_fpm_nl code is just
being slow, as that we know it is possible for it to send
a partial fpm message. Let's just loosen the constraints
a bit and allow data to flow.