Loïc Sang [Wed, 19 Jun 2024 14:19:22 +0000 (16:19 +0200)]
bgpd: avoid clearing routes for peers that were never established
Under heavy system load with many peers in passive mode and a large
number of routes, bgpd can enter an infinite loop. This occurs while
processing timeout BGP_OPEN messages, which prevents it from accepting
new connections. The following log entries illustrate the issue:
>bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0
>bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224
>bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224
... repeating
The issue occurs when bgpd handles a massive number of routes in the RIB
while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it
fails to process these packets promptly, leading the remote peer to
close the connection and resend BGP_OPEN packets.
When bgpd eventually starts processing these timeout BGP_OPEN packets,
it finds the TCP connection closed by the remote peer, resulting in
"bgp_stop()" being called. For each timeout peer, bgpd must iterate
through the routing table, which is time-consuming and causes new
incoming BGP_OPEN packets to timeout, perpetuating the infinite loop.
To address this issue, the code is modified to check if the peer has
been established at least once before calling "bgp_clear_route_all()".
This ensures that routes are only cleared for peers that had a
successful session, preventing unnecessary iterations over the routing
table for peers that never established a connection.
With this change, BGP_OPEN timeout messages may still occur, but in the
worst case, bgpd will stabilize. Before this patch, bgpd could enter a
loop where it was unable to accpet any new connections.
Chirag Shah [Wed, 19 Jun 2024 00:21:49 +0000 (17:21 -0700)]
zebra: fix evpn mh bond member proto reinstall
In case of EVPN MH bond, a member port going in
protodown state due to external reason (one case being linkflap),
frr updates the state correctly but upon manually
clearing external reason trigger FRR to reinstate
protodown without any reason code.
Fix is to ensure if the protodown reason was external
and new state is to have protodown 'off' then do no reinstate
protodown.
Ticket: #3947432
Testing:
switch:#ip link show swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9216 qdisc
pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen
1000
link/ether 1c:34:da:2c:aa:68 brd ff:ff:ff:ff:ff:ff protodown on
protodown_reason <linkflap>
switch:#ip link set swp1 protodown off protodown_reason linkflap off
switch:#ip link show swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9216 qdisc
pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen
1000
link/ether 1c:34:da:2c:aa:68 brd ff:ff:ff:ff:ff:ff
zebra, lib: add locator name in sid notify messages
In the near future, some daemons may only register SIDs. This may be
the case for the pathd daemon when creating SRv6 binding SIDs.
When a locator is getting deleted at ZEBRA level, the daemon may have
an easy way to find out the SIds to unregister to.
This commit proposes to add the locator name to the SID_SRV6_NOTIFY
message whenever possible. Only case when an allocation failure happens,
the locator will not be present. In all other places, the notify API
at procol levels has the locator name extra-parameter.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com> Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
zhou-run [Mon, 17 Jun 2024 08:45:09 +0000 (16:45 +0800)]
isisd: After the router switches IS-IS type several times, the neighbor adjacency cannot be established.
1. Router A is configured with "is-type level-1-2", while Router B is configured with "is-type level-1". Only level 1 neighbor entries are present on Router A.
2. After configuring Router B with "is-type level-2-only", both level 1 and level 2 neighbor entries exist on Router A. The state of these entries is UP, and the level 1 neighbor entry is currently aging.
3. Before the level 1 neighbor entry on Router A ages out, configuring Router B with "is-type level-1", both level 1 and level 2 neighbor entries exist on Router A. The level 2 neighbor entry is UP and will age out normally. However, the level 1 neighbor entry remains in the Initializing state, preventing the establishment of level 1 neighbor adjacency between Router A and Router B.
When the adjacency type of the link is switched in function isis_circuit_is_type_set, the function circuit_resign_level() is called to delete the old level's circuit->u.bc.lan_neighs linked list. If the old level is not level-1-2, the function circuit_commence_level() is called to create a new level's circuit->u.bc.lan_neighs linked list, but neither of these functions handle the circuit->u.bc.adjdb linked list. This leads to a situation where upon receiving hello packets again before the circuit->u.bc.adjdb linked list entries age out, the circuit->u.bc.lan_neighs linked list is not constructed based on the circuit->u.bc.adjdb linked list. As a result, the hello packets sent will consistently lack an SNPA, causing the neighbor to remain unable to establish an adjacency upon receiving the hello packets.
Donald Sharp [Fri, 14 Jun 2024 17:36:51 +0000 (13:36 -0400)]
zebra: Prevent starvation in dplane_thread_loop
When removing a large number of routes, the linux kernel can take the
cpu for an extended amount of time, leaving a situation where FRR
detects a starvation event.
Let's modify the loop in dplane_thread_loop such that after a provider
has been run, check to see if the event should yield, if so, stop
and reschedule this for the future.
Donald Sharp [Thu, 13 Jun 2024 19:30:00 +0000 (15:30 -0400)]
zebra: Use built in data structure counter
Instead of keeping a counter that is independent
of the queue's data structure. Just use the queue's
built-in counter. Ensure that it's pthread safe by
keeping it wrapped inside the mutex for adding/deleting
to the queue.
Currently, when a locator is deleted in zebra, zebra notifies only the
zclient that owns the locator.
With the introduction of SID Manager, the locator is no longer owned by
any client. Instead, the locator is owned by Zebra, and clients can
allocate and release SIDs from the locator using the ZAPI
ZEBRA_SRV6_MANAGER_GET_SID and ZEBRA_SRV6_MANAGER_RELEASE_SID.
Therefore, when a locator is removed in Zebra, we need to notify all
daemons so that they can release/uninstall the SIDs allocated by that
locator.
Previous commits introduced two new ZAPI operations,
`ZEBRA_SRV6_MANAGER_GET_SRV6_SID` and
`ZEBRA_SRV6_MANAGER_RELEASE_SRV6_SID`. These operations allow a daemon
to interact with the SRv6 SID Manager to get and release an SRv6 SID,
respectively.
This commit extends the SID Manager by adding logic to process the
requests `ZEBRA_SRV6_MANAGER_GET_SRV6_SID` and
`ZEBRA_SRV6_MANAGER_RELEASE_SRV6_SID`, and allocate/release SIDs to
requesting daemons.
Add functions to allocate/release SRv6 SIDs. SIDs can be allocated
either explicitly (allocate a specific SID) or dynamically (allocate any
available SID).
The previous commits introduced a new operation,
`ZEBRA_SRV6_MANAGER_GET_LOCATOR`, allowing a daemon to request
information about a specific SRv6 locator from the SRv6 SID Manager.
This commit extends the SID Manager to respond to a
`ZEBRA_SRV6_MANAGER_GET_LOCATOR` request and provide the requested
locator information.
Add two new ZAPI operations: `ZEBRA_SRV6_MANAGER_GET_SRV6_SID` and
`ZEBRA_SRV6_MANAGER_RELEASE_SRV6_SID`. These APIs allow a daemon to get and
release an SRv6 SID, respectively.
Add a new ZAPI operation, ZEBRA_SRV6_MANAGER_GET_LOCATOR, which allows a
daemon to request information about a specific locator from the SRv6 SID
Manager.
Add the CLI to choose the SID format of a locator. When the SID format
of a locator is changed, the SIDs allocated from that locator might no
longer be valid (for example, because the new format might involve a
different SID allocation schema). In such a case, it is necessary to
notify all the zclients so that they can withdraw/uninstall the old SIDs
that use the previous format and allocate/install/advertise the new SIDs
based on the new format.
An SRv6 block is an IPv6 prefix from which SIDs are allocated. This
commit adds support for SRv6 SID blocks. Specifically, it adds a data
structure to store information about an SRv6 block (e.g., its occupancy
status, which SIDs have been allocated and which are available, which
SID format is used for that block, etc.). It also adds some functions to
manage the block (allocate / free / lookup).
These functions will be used in the next commits to support the
allocation of SIDs from a block in the SID Manager.
Add functionalities to manage SRv6 SID formats (register / unregister /
lookup) and create two SID formats upon SRv6 Manager initialization:
`uncompressed-f4024` and `usid-f3216`.
In future commits, we will add the CLI to allow the user to choose
between the two formats.
Direct leak of 1008 byte(s) in 14 object(s) allocated from:
0 0x7f9ea0dc7d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
1 0x7f9ea034d51f in json_object_new_object (/lib/x86_64-linux-gnu/libjson-c.so.3+0x351f)
2 0x564b56d0fed6 in show_ip_ospf_interface_common ospfd/ospf_vty.c:4011
3 0x564b56d1068c in show_ip_ospf_interface ospfd/ospf_vty.c:4285
4 0x7f9ea06fe1c0 in cmd_execute_command_real lib/command.c:1002
5 0x7f9ea06fe684 in cmd_execute_command lib/command.c:1060
6 0x7f9ea06feb03 in cmd_execute lib/command.c:1227
7 0x7f9ea08415b2 in vty_command lib/vty.c:616
8 0x7f9ea0841a5d in vty_execute lib/vty.c:1379
9 0x7f9ea084b367 in vtysh_read lib/vty.c:2374
10 0x7f9ea08350cd in event_call lib/event.c:2011
11 0x7f9ea0764386 in frr_run lib/libfrr.c:1217
12 0x564b56c25b18 in main ospfd/ospf_main.c:295
13 0x7f9e9fd5bc86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 7168 byte(s) in 14 object(s) allocated from:
0 0x7f9ea0dc7d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
1 0x7f9ea0350fa4 in lh_table_new (/lib/x86_64-linux-gnu/libjson-c.so.3+0x6fa4)
Indirect leak of 1232 byte(s) in 14 object(s) allocated from:
0 0x7f9ea0dc7d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
1 0x7f9ea0350f82 in lh_table_new (/lib/x86_64-linux-gnu/libjson-c.so.3+0x6f82)
SUMMARY: AddressSanitizer: 9408 byte(s) leaked in 42 allocation(s).
***********************************************************************************
```
Fixes: e24ff4c275f0729f75be9f68d08be80ac1e0ec56 ("ospfd: Drop `interfaceIp` from `show ip ospf neigh json") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Error type: validation
Error description: YANG error(s):
Path: Data location "/frr-interface:lib/interface[name='x']/frr-isisd:isis/metric/level-1".
Error: Must condition ". < 64 or /frr-isisd:isis/instance[area-tag = current()/../../area-tag]/metric-style = 'wide' or not(/frr-isisd:isis/instance[area-tag = current()/../../area-tag]/metric-style)" not satisfied.
Path: Data location "/frr-interface:lib/interface[name='x']/frr-isisd:isis/metric/level-2".
Error: Must condition ". < 64 or /frr-isisd:isis/instance[area-tag = current()/../../area-tag]/metric-style = 'wide' or not(/frr-isisd:isis/instance[area-tag = current()/../../area-tag]/metric-style)" not satisfied
```
Add a topotest that ensures that when addpath is enabled and two
paths with same nexthop are received, they are sent to ZEBRA which
detects 'duplicate nexthop'.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Mon, 10 Jun 2024 06:38:22 +0000 (08:38 +0200)]
topotests: add API to detect if iproute2 is json capable
Some tests may want to use the json facility of iproute2 to
dump some results.
Add an internal API in lib/topotest.py that tells whether iproute2
is json capable or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 30 May 2024 13:47:11 +0000 (15:47 +0200)]
bgpd: fix do not skip paths with same nexthop
Under a setup where two BGP prefixes are available from multiple sources,
if one of the two prefixes is recursive over the other BGP prefix, then
it will not be considered as multipath. The below output shows the two
prefixes 192.0.2.24/32 and 192.0.2.21/32. The 192.0.2.[5,6,8] are the
known IP addresses visible from the IGP.
> # show bgp ipv4 192.0.2.24/32
> *>i 192.0.2.24/32 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> # show bgp ipv4 192.0.2.21/32
> *>i 192.0.2.21/32 192.0.2.5 0 100 0 i
> *=i 192.0.2.6 0 100 0 i
> *=i 192.0.2.8 0 100 0 i
The bgp best selection algorithm refuses to consider the paths to
'192.0.2.24/32' as multipath, whereas the BGP paths which use the
BGP peer as nexthop are considered multipath.
> ... has the same nexthop as the bestpath, skip it ...
Previously, this condition has been added to prevent ZEBRA from
installing routes with same nexthop:
> Here you can see the two paths with nexthop 210.2.2.2
> superm-redxp-05# show ip route 2.23.24.192/28
> Routing entry for 2.23.24.192/28
> Known via "bgp", distance 20, metric 0, best
> Last update 00:32:12 ago
> * 210.2.2.2, via swp3
> * 210.2.0.2, via swp1
> * 210.2.1.2, via swp2
> * 210.2.2.2, via swp3
> [..]
But today, ZEBRA knows how to handle it. When receiving incoming routes,
nexthop groups are used. At creation, duplicated nexthops are
identified, and will not be installed. The below output illustrate the
duplicate paths to 172.16.0.200 received by an other peer.
> r1# show ip route 172.18.1.100 nexthop-group
> Routing entry for 172.18.1.100/32
> Known via "bgp", distance 200, metric 0, best
> Last update 00:03:03 ago
> Nexthop Group ID: 75757580
> 172.16.0.200 (recursive), weight 1
> * 172.31.0.3, via r1-eth1, label 16055, weight 1
> * 172.31.2.4, via r1-eth2, label 16055, weight 1
> * 172.31.0.3, via r1-eth1, label 16006, weight 1
> * 172.31.2.4, via r1-eth2, label 16006, weight 1
> * 172.31.8.7, via r1-eth4, label 16008, weight 1
> 172.16.0.200 (duplicate nexthop removed) (recursive), weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16055, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16055, weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16006, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16006, weight 1
> 172.31.8.7, via r1-eth4 (duplicate nexthop removed), label 16008, weight 1
Fix this by proposing to let ZEBRA handle this duplicate decision.
Fixes: 7dc9d4e4e360 ("bgp may add multiple path entries with the same nexthop") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Dave LeRoy [Wed, 5 Jun 2024 19:10:11 +0000 (12:10 -0700)]
nhrpd: add cisco-authentication password support
Taking over this development from https://github.com/FRRouting/frr/pull/14788
This commit addresses 4 issues found in the previous PR
1) FRR would accept messages from a spoke without authentication when FRR NHRP had auth configured.
2) The error indication was not being sent in network byte order
3) The debug print in nhrp_connection_authorized was not correctly printing the received password
4) The addresses portion of the mandatory part of the error indication was invalid on the wire (confirmed in wireshark)
Signed-off-by: Dave LeRoy <dleroy@labn.net> Co-authored-by: Volodymyr Huti <volodymyr.huti@gmail.com>
Renato Westphal [Fri, 7 Jun 2024 15:03:17 +0000 (12:03 -0300)]
tests: introduce method to update reference data in isis_tilfa_topo1
The isis_tilfa_topo1 topotest is comprehensive and contains a large
amount of reference data. One problem is that, when changes occur,
updating this reference data can be difficult.
To address this problem, this commit introduces a method to
automatically regenerate the reference data by setting the `REGEN_DATA`
environment variable.
When `REGEN_DATA` is set, the topotest regenerates reference data
from the current run instead of comparing against existing reference
data. Note that regenerated data must be manually verified for
correctness.
This commit also simplifies the reference data by replacing all diff
files with complete JSON snapshots.
Renato Westphal [Fri, 7 Jun 2024 13:41:38 +0000 (10:41 -0300)]
tests: rework isis_tilfa_topo1 to fix timing issues
In this topotest, steps 10-15 were added to test the IS-IS switchover
functionality. In short, two cases were tested: switchover after a
link down event and switchover after a BFD down event. Both cases
were tested in sequence on the same router, rt6. This involved the
following steps:
- Setting the SPF delay timer to 15 seconds
- Shutting down the eth-rt5 interface from the switch side
- Testing the post-switchover RIB and LIB (triggered by the link down
event)
- Testing the post-SPF RIB and LIB
- Bringing the eth-rt5 interface back up
- Configuring a BFD session between rt6 and rt5
- Shutting down the eth-rt5 interface from the switch side once again
- Testing the post-switchover RIB and LIB (triggered by the BFD down
event)
- Testing the post-SPF RIB and LIB
Since the time window to test the post-switchover RIB and LIB was too
narrow (10 seconds), these tests were having sporadic failures.
To resolve this problem, we can simplify the switchover test as follows:
- Setting the SPF delay timer to 60 seconds (not 15)
- Disabling "link-detect" on rt6's eth-rt5 interface
- Shutting down the eth-rt5 interface from the switch side
- On rt6, testing the post-switchover RIB and LIB (triggered by the
BFD down event)
- On rt5, testing the post-switchover RIB and LIB (triggered by the
link down event)
Notice how we can test both post-link-down and post-BFD-down switchover
cases simultaneously by having different "link-detect" configurations
on rt5 and rt6. Additionally, by using a larger SPF delay timer, the
time window to test the post-switchover RIB and LIB is much larger
and less prone to sporadic failures.
Christian Hopps [Sat, 8 Jun 2024 19:37:47 +0000 (15:37 -0400)]
ci: do apt-get update before installing required modules
- Use `uname -r` to also install specific module versions since
with github runners the running kernel can become out-dated with
the deployed packages.
Christian Hopps [Thu, 6 Jun 2024 18:08:00 +0000 (14:08 -0400)]
mgmtd: add empty notif xpath map for completeness
New back-end clients may need to add notification static allocations so
we should have it available for those users, rather than requiring the
new user delve into the mgmtd infra and modify it themselves.
Louis Scalbert [Fri, 24 May 2024 14:34:23 +0000 (16:34 +0200)]
zebra: fix show route memory consumption
When displaying a route table in JSON, a table JSON object is storing
all the prefix JSON objects containing the prefix information. This
results in excessive memory allocation for JSON objects, potentially
leading to an out-of-memory error on the machine with large routing
tables.
To Fix the memory consumption issue for the "show ip[v6] route [vrf XX]
json" command, display the prefixes one by one and free the memory of
each JSON object after it has been displayed.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Louis Scalbert [Fri, 24 May 2024 15:06:59 +0000 (17:06 +0200)]
zebra: fix show route vrf all memory consumption
0e2fc3d67f ("vtysh, zebra: Fix malformed json output for multiple vrfs
in command 'show ip route vrf all json'") has been reverted in the
previous commit. Although the fix was correct, it was consuming too muca
memory when displaying large route tables.
A root JSON object was storing all the JSON objects containing the route
tables, each containing their respective prefixes in JSON objects. This
resulted in excessive memory allocation for JSON objects, potentially
leading to an out-of-memory error on the machine.
To Fix the memory consumption issue for the "show ip[v6] route vrf all
json" command, display the tables one by one and free the memory of each
JSON object after it has been displayed.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>