Rajasekar Raja [Fri, 4 Apr 2025 20:27:03 +0000 (13:27 -0700)]
staticd: Avoid requesting SRv6 sid from zebra when loc and sid block don't match
Currently, when the locator block and sid block differ, staticd would
still go ahead and request zebra to allocate the SID, which zebra does
if there is at least one match (from any of the locators).
Only when staticd tries to install the route does it see that the
locator block and sid block are different, and it avoids installing the
route.
Fix:
Check if the locator block and sid block match before even requesting
Zebra to allocate one.
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
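A minimal standalone sketch of the kind of check described above, purely illustrative (the function name and the raw 16-byte SID representation are assumptions, not staticd's actual API):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: returns true when the first 'block_bits' bits of the
 * configured SID value match the locator's block. */
static bool sid_block_matches_locator(const uint8_t sid[16],
				      const uint8_t locator_prefix[16],
				      uint8_t block_bits)
{
	uint8_t full_bytes = block_bits / 8;
	uint8_t rem_bits = block_bits % 8;

	if (memcmp(sid, locator_prefix, full_bytes) != 0)
		return false;
	if (rem_bits) {
		uint8_t mask = (uint8_t)(0xff << (8 - rem_bits));

		if ((sid[full_bytes] & mask) !=
		    (locator_prefix[full_bytes] & mask))
			return false;
	}
	/* Only when this returns true would staticd go on to ask zebra
	 * for a SID allocation. */
	return true;
}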
bgpd: flowspec: remove sizelimit check applied to the wrong length field (issue 18557)
Section 4.1 of RFC8955 defines how the length field of flowspec NLRIs is encoded.
The method used implies a maximum length of 4095 for a single flowspec NLRI.
However, in bgp_flowspec.c, we check the length attribute of the bgp_nlri structure against this maximum value, which is actually the *total* length of all NLRIs included in the considered MP_REACH_NLRI path attribute.
Due to this confusion, frr would reject valid announcements that contain many flowspec NLRIs when their cumulative length exceeds 4095, and close the session.
The proposed change removes that check entirely. Indeed, there is no need to check the length field of each individual NLRI, because the encoding method makes it impossible to encode a length greater than 4095.
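For reference, a small sketch of the RFC 8955 section 4.1 length encoding (standalone C, not the bgp_flowspec.c code); it shows why a single NLRI's own length field can never exceed 4095, while the bgp_nlri length covering the whole attribute easily can:

#include <stddef.h>
#include <stdint.h>

/* Decode the per-NLRI length: one octet for values below 240, otherwise
 * two octets where the high nibble is 0xf and the low 12 bits carry the
 * length, giving a hard maximum of 4095. */
static int flowspec_nlri_length(const uint8_t *p, size_t avail,
				uint16_t *nlri_len, size_t *hdr_len)
{
	if (avail < 1)
		return -1;
	if (p[0] < 0xf0) {		/* one-byte form, 0..239 */
		*nlri_len = p[0];
		*hdr_len = 1;
	} else {			/* two-byte form, 240..4095 */
		if (avail < 2)
			return -1;
		*nlri_len = ((uint16_t)(p[0] & 0x0f) << 8) | p[1];
		*hdr_len = 2;
	}
	return 0;
}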
`"` was accidentally added, and random test failures were happening.
Fixes: a4f61b78dd382c438ff4fec2fda7450ecc890edf ("tests: Check if routes are marked as stale and retained with N-bit for GR")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
G. Paul Ziemba [Sun, 30 Mar 2025 22:43:04 +0000 (15:43 -0700)]
bgpd: rfapi: track outstanding rib and import timers, free mem at exit
While here, also make "VPN SAFI clear" test wait for clear result
(tests/topotests/bgp_rfapi_basic_sanity{,_config2})
Original RFAPI code relied on the frr timer system to remember
various allocations that were supposed to be freed at future times
rather than manage a parallel database. However, if bgpd is terminated
before the timers expire, those pending allocations are marked as
memory leaks, even though they wouldn't be leaks under normal operation.
This change adds some hash tables to track these outstanding
allocations that are associated with pending timers, and uses
those tables to free the allocations when bgpd exits.
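The bookkeeping idea, as a minimal standalone sketch (a simple linked list stands in for the hash tables; names are illustrative, not the rfapi code):

#include <stdlib.h>

/* Track every allocation handed to a pending timer so that a clean
 * shutdown can walk the list and free whatever never fired. */
struct pending {
	void *data;
	struct pending *next;
};

static struct pending *pending_head;

static void pending_track(void *data)
{
	struct pending *p = malloc(sizeof(*p));

	if (!p)
		return;
	p->data = data;
	p->next = pending_head;
	pending_head = p;
}

static void pending_free_all(void)
{
	while (pending_head) {
		struct pending *p = pending_head;

		pending_head = p->next;
		free(p->data);	/* would otherwise show as a leak at exit */
		free(p);
	}
}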
Donald Sharp [Sat, 29 Mar 2025 16:02:11 +0000 (12:02 -0400)]
bgpd: Free memory associated with aspath_dup
Fix this:
==3890443== 92 (48 direct, 44 indirect) bytes in 1 blocks are definitely lost in loss record 68 of 98
==3890443== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3890443== by 0x49737B3: qcalloc (memory.c:106)
==3890443== by 0x3EA63B: aspath_dup (bgp_aspath.c:703)
==3890443== by 0x2F5438: route_set_aspath_exclude (bgp_routemap.c:2604)
==3890443== by 0x49BC52A: route_map_apply_ext (routemap.c:2708)
==3890443== by 0x2C1069: bgp_input_modifier (bgp_route.c:1925)
==3890443== by 0x2C9F12: bgp_update (bgp_route.c:5205)
==3890443== by 0x2CF281: bgp_nlri_parse_ip (bgp_route.c:7271)
==3890443== by 0x2A28C7: bgp_nlri_parse (bgp_packet.c:338)
==3890443== by 0x2A7F5C: bgp_update_receive (bgp_packet.c:2448)
==3890443== by 0x2ACCA6: bgp_process_packet (bgp_packet.c:4046)
==3890443== by 0x49EB77C: event_call (event.c:2019)
==3890443== by 0x495FAD1: frr_run (libfrr.c:1247)
==3890443== by 0x208D6D: main (bgp_main.c:557)
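The pattern at issue, as a generic hedged sketch (hypothetical types; not the bgp_routemap.c fix itself): any duplicate produced for modification must be freed by whoever no longer needs it once the result has been handed off.

#include <stdlib.h>
#include <string.h>

/* Hypothetical aspath-like object, for illustration only. */
struct path {
	char *str;
};

static struct path *path_dup(const struct path *src)
{
	struct path *copy = calloc(1, sizeof(*copy));

	if (!copy)
		return NULL;
	copy->str = src->str ? strdup(src->str) : NULL;
	return copy;
}

static void path_free(struct path *p)
{
	if (!p)
		return;
	free(p->str);
	/* Without this release, every route-map application that
	 * duplicates a path leaks one copy. */
	free(p);
}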
Donald Sharp [Fri, 28 Mar 2025 18:58:01 +0000 (14:58 -0400)]
bgpd: On shutdown, unlock table when clearing the bgp metaQ
There are some tables not being freed upon shutdown. This
is happening because the table is locked as dests
are put on the metaQ. When the MetaQ was cleared during
shutdown, the table was not being unlocked.
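A hedged sketch of that shutdown path, with hypothetical types standing in for bgp_table/bgp_dest and a plain counter standing in for the real refcounting:

#include <stddef.h>

struct table {
	int lock;		/* reference count */
};

struct dest {
	struct table *table;
	struct dest *next;
};

/* Drain a pending queue at shutdown: drop the per-dest table reference
 * that was taken when the dest was enqueued. */
static void meta_queue_drain(struct dest **head)
{
	while (*head) {
		struct dest *d = *head;

		*head = d->next;
		if (d->table)
			d->table->lock--;	/* the missing unlock */
		/* release the dest itself here as appropriate */
	}
}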
Modified the fsm state machine to attempt to not
clear routes on a peer that was not established.
The peer must also not be the peer-self; we never
want to clear the peer-self.
Donald Sharp [Thu, 27 Mar 2025 12:51:05 +0000 (08:51 -0400)]
yang: Limit eigrp to just 1 instance per vrf
Currently EIGRP has built-in yang code that expects only
1 ASN to be used per vrf. Let's prevent the operator from
putting themselves in a bad position by allowing something like
this:
router eigrp 33
....
!
router eigrp 99
...
!
`no router eigrp 99` would crash because of assumptions
made in the yang processing.
Let's just hard code that assumption into the EIGRP yang
at the moment such that it will not allow you to enter
a `router eigrp 99` instance at all.
This is purely a software limitation to prevent the code
from violating its current assumptions. I do not see
much need to support this at this point in time so I
fixed the problem this way instead of having to possibly
touch a bunch of code.
Donald Sharp [Wed, 26 Mar 2025 15:14:57 +0000 (11:14 -0400)]
tests: Modify simple_snmp_test to use frr.conf
The simple_snmp_test was not properly testing
the rip snmp code because of weirdness with mgmtd
and non-integrated configs. Modify the whole
test to use an integrated config and, voila,
ripd is talking snmp again in the test.
Donatas Abraitis [Wed, 26 Mar 2025 12:47:44 +0000 (14:47 +0200)]
tests: Use label 0x800000 instead of 0x000000 for BMP tests
Related-to: 94e2aadf7187d7d695babce21033b5bc8e454f25 ("bgpd: Set the label for MP_UNREACH_NLRI 0x800000 instead of 0x000000")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Martin Buck [Tue, 25 Mar 2025 15:53:12 +0000 (16:53 +0100)]
tests: Fix wait times in test_ospf6_gr_topo1 topotest
Increase wait times to at least the minimum wait time accepted by
topotest.run_and_expect(). Also change the poll interval to 1s; there is
no point in doing this more frequently.
Finally, slightly improve the topology diagram to also include area numbers.
Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Donatas Abraitis [Wed, 26 Mar 2025 08:30:52 +0000 (10:30 +0200)]
bgpd: Set the label for MP_UNREACH_NLRI 0x800000 instead of 0x000000
RFC8277 says:
The procedures in [RFC3107] for withdrawing the binding of a label
or sequence of labels to a prefix are not specified clearly and correctly.
=> How to Explicitly Withdraw the Binding of a Label to a Prefix
Suppose a BGP speaker has announced, on a given BGP session, the
binding of a given label or sequence of labels to a given prefix.
Suppose it now wishes to withdraw that binding. To do so, it may
send a BGP UPDATE message with an MP_UNREACH_NLRI attribute. The
NLRI field of this attribute is encoded as follows:
Upon transmission, the Compatibility field SHOULD be set to 0x800000.
Upon reception, the value of the Compatibility field MUST be ignored.
[RFC3107] also made it possible to withdraw a binding without
specifying the label explicitly, by setting the Compatibility field
to 0x800000. However, some implementations set it to 0x000000. In
order to ensure backwards compatibility, it is RECOMMENDED by this
document that the Compatibility field be set to 0x800000, but it is
REQUIRED that it be ignored upon reception.
In FRR's case, where a single label is used per prefix, we should send
0x800000, and not 0x000000.
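A minimal sketch of the transmit side described above (buffer handling simplified; this is not bgpd's stream API):

#include <stddef.h>
#include <stdint.h>

/* Per RFC 8277, the 3-octet Compatibility field in an MP_UNREACH_NLRI
 * SHOULD be 0x800000 on transmission and MUST be ignored on reception. */
static size_t put_withdraw_label_field(uint8_t *buf)
{
	buf[0] = 0x80;
	buf[1] = 0x00;
	buf[2] = 0x00;
	return 3;
}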
Donatas Abraitis [Tue, 25 Mar 2025 15:35:41 +0000 (17:35 +0200)]
tests: Check if routes are marked as stale and retained with N-bit for GR
Related-to: b7c657d4e065f310fcf6454abae1a963c208c3b8 ("bgpd: Retain the routes if we do a clear with N-bit set for Graceful-Restart")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Donald Sharp [Tue, 25 Mar 2025 14:43:14 +0000 (10:43 -0400)]
bgpd: Delay processing MetaQ in some events
If the number of peers that are being handled on
the peer connection fifo is greater than 10, that
means we have some network event going on. Let's
allow the packet processing to continue instead
of running the metaQ. This has advantages because
everything else in BGP is only run after the metaQ
is run. This includes best path processing,
installation of the route into zebra as well as
telling our peers about this change. Why does
this matter? It matters because, if we are receiving
the same route multiple times, we run best path processing
far fewer times and, consequently, we also send the route
update out and install the route far fewer times.
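The heuristic, as a minimal sketch (the threshold of 10 comes from the description above; the function name is illustrative, not bgpd's code):

#include <stdbool.h>
#include <stddef.h>

#define PEER_FIFO_BUSY_THRESHOLD 10

/* When many peer connections are waiting on the packet FIFO, keep
 * draining packets and defer the metaQ; best path processing, zebra
 * installation and peer updates then run once per batch instead of once
 * per received copy of the same route. */
static bool should_run_meta_queue(size_t peer_fifo_depth)
{
	return peer_fifo_depth <= PEER_FIFO_BUSY_THRESHOLD;
}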
Prior to this patch, with 512 peers and 5k routes,
CPU time for bgpd was 3:10 and zebra was 3:28. After
the patch, CPU time for bgpd was 0:55 and zebra was
0:25.
Here are the prior `show event cpu`:
Event statistics for bgpd:
Showing statistics for pthread default
--------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
0 20.749 33144 0 395 1 396 0 0 0 T (bgp_generate_updgrp_packets)
0 9936.199 1818 5465 43588 5466 43589 0 0 0 E bgp_handle_route_announcements_to_zebra
0 0.220 84 2 20 3 20 0 0 0 T update_subgroup_merge_check_thread_cb
0 0.058 2 29 43 29 43 0 0 0 E zclient_connect
0 17297.733 466 37119 67428 37124 67429 0 0 0 W zclient_flush_data
1 0.134 7 19 40 20 42 0 0 0 R vtysh_accept
0 151.396 1067 141 1181 142 1189 0 0 0 R vtysh_read
0 0.297 1030 0 14 0 14 0 0 0 T (bgp_routeadv_timer)
0 0.001 1 1 1 2 2 0 0 0 T bgp_sync_label_manager
2 9.374 544 17 261 17 262 0 0 0 R bgp_accept
0 0.001 1 1 1 2 2 0 0 0 T bgp_startup_timer_expire
0 0.012 1 12 12 13 13 0 0 0 E frr_config_read_in
0 0.308 1 308 308 309 309 0 0 0 T subgroup_coalesce_timer
0 4.027 105 38 77 39 78 0 0 0 T (bgp_start_timer)
0 112206.442 1818 61719 84726 61727 84736 0 0 0 TE work_queue_run
0 0.345 1 345 345 346 346 0 0 0 T bgp_config_finish
0 0.710 620 1 6 1 9 0 0 0 W bgp_connect_check
2 39.420 8283 4 110 5 111 0 0 0 R zclient_read
0 0.052 1 52 52 578 578 0 0 0 T bgp_start_label_manager
0 0.452 87 5 90 5 90 0 0 0 T bgp_announce_route_timer_expired
0 185.837 3088 60 537 92 21705 0 0 0 E bgp_event
0 48719.671 4346 11210 78292 11215 78317 0 0 0 E bgp_process_packet
Showing statistics for pthread BGP I/O thread
---------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
0 321.915 28597 11 86 11 265 0 0 0 W bgp_process_writes
515 115.586 26954 4 121 4 128 0 0 0 R bgp_process_reads
Event statistics for zebra:
Showing statistics for pthread default
--------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
0 0.109 2 54 62 55 63 0 0 0 T timer_walk_start
1 0.550 11 50 100 50 100 0 0 0 R vtysh_accept
0 112848.163 4441 25410 405489 25413 410127 0 0 0 E zserv_process_messages
0 0.007 1 7 7 7 7 0 0 0 E frr_config_read_in
0 0.005 1 5 5 5 5 0 0 0 T rib_sweep_route
1 573.589 4789 119 1567 120 1568 0 0 0 T wheel_timer_thread
347 30.848 97 318 1367 318 1366 0 0 0 T zebra_nhg_timer
0 0.005 1 5 5 6 6 0 0 0 T zebra_evpn_mh_startup_delay_exp_cb
0 5.404 521 10 38 10 70 0 0 0 T timer_walk_continue
1 1.669 9 185 219 186 219 0 0 0 R zserv_accept
1 0.174 18 9 53 10 53 0 0 0 R msg_conn_read
0 3.028 520 5 47 6 47 0 0 0 T if_zebra_speed_update
0 0.324 274 1 5 1 6 0 0 0 W msg_conn_write
1 24.661 2124 11 359 12 359 0 0 0 R kernel_read
0 73683.333 2964 24859 143223 24861 143239 0 0 0 TE work_queue_run
1 46.649 6789 6 424 7 424 0 0 0 R rtadv_read
0 52.661 85 619 2087 620 2088 0 0 0 R vtysh_read
0 42.660 18 2370 21694 2373 21695 0 0 0 E msg_conn_proc_msgs
0 0.034 1 34 34 35 35 0 0 0 E msg_client_connect_timer
0 2786.938 2300 1211 29456 1219 29555 0 0 0 E rib_process_dplane_results
Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
0 4875.670 200371 24 770 24 776 0 0 0 E dplane_thread_loop
0 0.059 1 59 59 76 76 0 0 0 E dplane_incoming_request
1 9.640 722 13 4510 15 5343 0 0 0 R dplane_incoming_read
Here are the post `show event cpu` results:
Event statistics for bgpd:
Showing statistics for pthread default
--------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
0 21297.497 3565 5974 57912 5981 57913 0 0 0 E bgp_process_packet
0 149.742 1068 140 1109 140 1110 0 0 0 R vtysh_read
0 0.013 1 13 13 14 14 0 0 0 E frr_config_read_in
0 0.459 86 5 104 5 105 0 0 0 T bgp_announce_route_timer_expired
0 0.139 81 1 20 2 21 0 0 0 T update_subgroup_merge_check_thread_cb
0 405.889 291687 1 179 1 450 0 0 0 T (bgp_generate_updgrp_packets)
0 0.682 618 1 6 1 9 0 0 0 W bgp_connect_check
0 3.888 103 37 81 38 82 0 0 0 T (bgp_start_timer)
0 0.074 1 74 74 458 458 0 0 0 T bgp_start_label_manager
0 0.000 1 0 0 1 1 0 0 0 T bgp_sync_label_manager
0 0.121 3 40 54 100 141 0 0 0 E bgp_process_conn_error
0 0.060 2 30 49 30 50 0 0 0 E zclient_connect
0 0.354 1 354 354 355 355 0 0 0 T bgp_config_finish
0 0.283 1 283 283 284 284 0 0 0 T subgroup_coalesce_timer
0 29365.962 1805 16269 99445 16273 99454 0 0 0 TE work_queue_run
0 185.532 3097 59 497 94 26107 0 0 0 E bgp_event
1 0.290 8 36 151 37 158 0 0 0 R vtysh_accept
2 9.462 548 17 320 17 322 0 0 0 R bgp_accept
2 40.219 8283 4 128 5 128 0 0 0 R zclient_read
0 0.322 1031 0 4 0 5 0 0 0 T (bgp_routeadv_timer)
0 356.812 637 560 3007 560 3007 0 0 0 E bgp_handle_route_announcements_to_zebra
Showing statistics for pthread BGP I/O thread
---------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
515 62.965 14335 4 103 5 181 0 0 0 R bgp_process_reads
0 1986.041 219813 9 213 9 315 0 0 0 W bgp_process_writes
Event statistics for zebra:
Showing statistics for pthread default
--------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
0 0.006 1 6 6 7 7 0 0 0 E frr_config_read_in
0 3673.365 2044 1797 259281 1800 261342 0 0 0 E zserv_process_messages
1 651.846 8041 81 1090 82 1233 0 0 0 T wheel_timer_thread
0 38.184 18 2121 21345 2122 21346 0 0 0 E msg_conn_proc_msgs
1 0.651 12 54 112 55 112 0 0 0 R vtysh_accept
0 0.102 2 51 55 51 56 0 0 0 T timer_walk_start
0 202.721 1577 128 29172 141 29226 0 0 0 E rib_process_dplane_results
1 41.650 6645 6 140 6 140 0 0 0 R rtadv_read
1 22.518 1969 11 106 12 154 0 0 0 R kernel_read
0 4.265 48 88 1465 89 1466 0 0 0 R vtysh_read
0 6099.851 650 9384 28313 9390 28314 0 0 0 TE work_queue_run
0 5.104 521 9 30 10 31 0 0 0 T timer_walk_continue
0 3.078 520 5 53 6 55 0 0 0 T if_zebra_speed_update
0 0.005 1 5 5 5 5 0 0 0 T rib_sweep_route
0 0.034 1 34 34 35 35 0 0 0 E msg_client_connect_timer
1 1.641 9 182 214 183 215 0 0 0 R zserv_accept
0 0.358 274 1 6 2 6 0 0 0 W msg_conn_write
1 0.159 18 8 54 9 54 0 0 0 R msg_conn_read
Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs CPU_Warn Wall_Warn Starv_Warn Type Event
0 301.404 7280 41 1878 41 1878 0 0 0 E dplane_thread_loop
0 0.048 1 48 48 49 49 0 0 0 E dplane_incoming_request
1 9.558 727 13 4659 14 5420 0 0 0 R dplane_incoming_read
Donald Sharp [Mon, 24 Mar 2025 18:11:35 +0000 (14:11 -0400)]
zebra: Limit reading packets when MetaQ is full
Currently Zebra just reads packets off the zapi
wire and stacks them up for processing in zebra
in the future. When there is significant churn
in the network, the size of zebra can grow without
bounds due to the MetaQ sizing constraints. This
ends up showing in the number of nexthops in the
system. Reducing the number of packets serviced,
so as to limit the metaQ size to the packets to process,
alleviates this problem.
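A hedged sketch of the back-pressure idea (names and limits are illustrative, not zebra's zserv code):

#include <stddef.h>

/* Bound how many zapi messages are read per pass by how much room the
 * metaQ has left, so unprocessed work cannot grow without bound during
 * churn. */
static size_t packets_to_read(size_t metaq_len, size_t metaq_limit,
			      size_t batch_max)
{
	size_t room;

	if (metaq_len >= metaq_limit)
		return 0;		/* queue full: stop reading */
	room = metaq_limit - metaq_len;
	return room < batch_max ? room : batch_max;
}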
Donald Sharp [Fri, 21 Mar 2025 11:48:50 +0000 (07:48 -0400)]
bgpd: Modify bgp to handle packet events in a FIFO
Current behavior of BGP is to have an event per connection. Given
that, on startup of BGP with a high number of neighbors, you end
up with 2 * (# of peers) events being processed. Additionally,
once BGP has selected the connection, this still only comes down
to 512 events. This number of events swamps the event system
and, in addition, delays any other work from being done in BGP at
all, because the 512 events always take precedence
over everything else. The other main events are the handling
of the metaQ (1 event), update group events (1 per update group)
and the zebra batching event. These are being swamped.
Modify the BGP code to have a FIFO of connections. As new data
comes in to read, place the connection on the end of the FIFO.
Have bgp_process_packet handle up to 100 packets spread
across the individual peers, where each peer/connection is limited
to the original quanta. During testing I noticed that withdrawal
events at very large scale were taking up to 40 seconds to process,
so I added a check for yielding to further limit the number of packets
being processed.
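A standalone sketch of that FIFO scheme under the assumptions above (100-packet batch, per-connection quanta, a yield check); the structures and function names are illustrative, not bgpd's connection fifo API:

#include <stdbool.h>
#include <stddef.h>

struct conn {
	struct conn *next;
	int (*read_one_packet)(struct conn *);	/* returns 0 when drained */
};

struct conn_fifo {
	struct conn *head, *tail;
};

static struct conn *fifo_pop(struct conn_fifo *f)
{
	struct conn *c = f->head;

	if (c) {
		f->head = c->next;
		if (!f->head)
			f->tail = NULL;
		c->next = NULL;
	}
	return c;
}

static void fifo_push(struct conn_fifo *f, struct conn *c)
{
	c->next = NULL;
	if (f->tail)
		f->tail->next = c;
	else
		f->head = c;
	f->tail = c;
}

/* Spread a bounded batch across the queued connections, each limited to
 * its quanta, and stop early if the yield check fires. */
static void process_packets(struct conn_fifo *fifo, size_t batch_max,
			    size_t per_conn_quanta, bool (*should_yield)(void))
{
	size_t handled = 0;

	while (handled < batch_max && fifo->head && !should_yield()) {
		struct conn *c = fifo_pop(fifo);
		bool more = true;
		size_t n;

		for (n = 0; n < per_conn_quanta && handled < batch_max; n++) {
			if (c->read_one_packet(c) == 0) {
				more = false;	/* connection drained */
				break;
			}
			handled++;
		}
		/* Still has pending data: requeue at the tail so peers
		 * share the batch instead of one peer monopolizing it. */
		if (more)
			fifo_push(fifo, c);
	}
}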
This change also allows BGP to be interactive again on scale
setups during initial convergence. Prior to this change, any vtysh
command entered would be delayed by tens of seconds in my setup
while BGP was doing other work.
Donald Sharp [Fri, 21 Mar 2025 14:47:14 +0000 (10:47 -0400)]
tests: Expand hold timer to 60 seconds for high_ecmp
The hold timer is 5/20. Under load with a very
large number of routes, the tests are experiencing
some issues with this. Let's just give ourselves
some headroom associated with the receiving
of packets.
Shbinging [Fri, 21 Mar 2025 02:57:28 +0000 (02:57 +0000)]
babeld: fix hello packets not sent with configured hello timer
Same issue occurring as previously addressed in https://github.com/FRRouting/frr/pull/9092. The root cause is: "Sending a Hello message before restarting the hello timer to avoid session flaps in case of larger hello interval configurations."
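The ordering that avoids the flap, as a tiny illustrative sketch (the function-pointer names are hypothetical, not babeld's):

#include <stdint.h>

typedef void (*send_hello_fn)(void);
typedef void (*timer_rearm_fn)(uint32_t interval_ms);

/* Send the Hello first, then rearm the timer with the (possibly larger)
 * configured interval, so the neighbor is refreshed before the longer
 * wait begins. */
static void hello_timer_expired(send_hello_fn send_hello,
				timer_rearm_fn rearm,
				uint32_t hello_interval_ms)
{
	send_hello();
	rearm(hello_interval_ms);
}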
Mark Stapp [Mon, 24 Mar 2025 20:53:32 +0000 (16:53 -0400)]
bgpd: fix handling of configured RTs for l2vni, l3vni
Test for existing explicit config as part of validation of
route-target configuration: allow explicit config of generic/default
AS+VNI, for example, instead of rejecting it.
Modified the bgp_fsm code to disallow the extension
of the hold time when the system is under extremely
heavy load. This was an attempt to remove the return
code, but it was too aggressive and messed up this bit
of code.
zmw12306 [Mon, 24 Mar 2025 19:32:18 +0000 (15:32 -0400)]
babeld: Hop Count must not be 0.
According to RFC 8966:
Hop Count: The maximum number of times that this TLV may be forwarded, plus 1. This MUST NOT be 0.
Signed-off-by: zmw12306 <zmw12306@gmail.com>
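A trivial sketch of the validation implied by the RFC text (illustrative only, not babeld's parser):

#include <stdbool.h>
#include <stdint.h>

/* Per RFC 8966, a Hop Count of 0 is invalid; such a TLV should be
 * rejected rather than processed or forwarded. */
static bool hop_count_valid(uint8_t hop_count)
{
	return hop_count != 0;
}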
Donald Sharp [Mon, 24 Mar 2025 12:07:02 +0000 (08:07 -0400)]
eigrpd: Cleanup memory issues on shutdown
a) EIGRP was having issues with the prefix created as part
of the topology destination. Make this just a part of the
topology data structure instead of allocating it.
b) EIGRP was not freeing up any memory associated with
the network table. Free it.
c) EIGRP was confusing zebra shutdown as part of the deletion
of the last eigrp data structure. This was inappropriate; it
should be part of the `I'm just shutting down` handling.
d) The QOBJ was not being properly freed; free it.