path: root/bgpd/bgp_fsm.c
Age, Commit message, Author
2025-04-03Merge pull request #18396 from pguibert6WIND/srv6l3vpn_to_bgp_vrf_redistributeRuss White
Add BGP redistribution in SRv6 BGP
2025-04-01Merge pull request #18450 from donaldsharp/bgp_packet_readsRuss White
Bgp packet reads conversion to a FIFO
2025-03-30bgpd: When shutting down do not clear self peersDonald Sharp
Commit e0ae285eb8beeef7b43bdadc073d8ae346eaeb6c modified the FSM to avoid clearing routes on a peer that was never established. That peer must also not be the self peer: we never want to clear the self peer. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-25bgpd: Modify bgp to handle packet events in a FIFODonald Sharp
Current behavior of BGP is to have an event per connection. On startup of BGP with a high number of neighbors, you end up with 2 * # of peers events being processed. Additionally, once BGP has selected the connection, this still only comes down to 512 events. This number of events is swamping the event system and delaying any other work from being done in BGP at all, because the 512 events are always going to take precedence over everything else. The other main events are the handling of the metaQ (1 event), update group events (1 per update group) and the zebra batching event. These are being swamped.

Modify the BGP code to have a FIFO of connections. As new data comes in to read, place the connection on the end of the FIFO. Have bgp_process_packet handle up to 100 packets spread across the individual peers, where each peer/connection is limited to the original quanta. During testing I noticed that withdrawal events at very large scale were taking up to 40 seconds to process, so I added a check for yielding to further limit the number of packets being processed.

This change also allows BGP to be interactive again on scale setups during initial convergence. Prior to this change, any vtysh command entered would be delayed by tens of seconds in my setup while BGP was doing other work.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
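The FIFO scheduling described above can be sketched as follows. This is a minimal illustration and not the bgpd implementation: `struct conn`, `struct conn_fifo`, `fifo_add_tail()`, `fifo_pop()` and `process_pass()` are hypothetical names, packet handling is reduced to decrementing a counter, and `quanta` is assumed to be greater than zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BGP connection with queued input packets. */
struct conn {
	int pending;       /* packets waiting to be processed */
	bool on_fifo;      /* true while linked on the FIFO */
	struct conn *next; /* FIFO link */
};

struct conn_fifo {
	struct conn *head;
	struct conn *tail;
};

/* Append a connection when new data arrives; no-op if already queued. */
void fifo_add_tail(struct conn_fifo *f, struct conn *c)
{
	if (c->on_fifo)
		return;
	c->next = NULL;
	c->on_fifo = true;
	if (f->tail)
		f->tail->next = c;
	else
		f->head = c;
	f->tail = c;
}

/* Remove and return the connection at the head, or NULL if empty. */
struct conn *fifo_pop(struct conn_fifo *f)
{
	struct conn *c = f->head;

	if (!c)
		return NULL;
	f->head = c->next;
	if (!f->head)
		f->tail = NULL;
	c->next = NULL;
	c->on_fifo = false;
	return c;
}

/*
 * One scheduler pass: handle at most `budget` packets in total and at
 * most `quanta` (assumed > 0) per turn of a connection. A connection
 * with leftover work goes back on the tail so other peers get a turn.
 * Returns the number of packets handled.
 */
int process_pass(struct conn_fifo *f, int budget, int quanta)
{
	int handled = 0;

	while (handled < budget) {
		struct conn *c = fifo_pop(f);
		int n;

		if (!c)
			break;
		n = c->pending < quanta ? c->pending : quanta;
		if (n > budget - handled)
			n = budget - handled;
		c->pending -= n; /* stands in for real packet handling */
		handled += n;
		if (c->pending > 0)
			fifo_add_tail(f, c);
	}
	return handled;
}
```

With a budget of 20 and a quanta of 10, a connection holding 30 packets and one holding 5 are served in turns, so the smaller peer is fully drained within the same pass instead of waiting behind the larger one.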
2025-03-25Merge pull request #18483 from donaldsharp/holdtime_mistakeDonatas Abraitis
bgpd: Fix holdtime not working properly when busy
2025-03-24Merge pull request #18447 from donaldsharp/bgp_clear_batchRuss White
Bgp clear batch
2025-03-24bgpd: Fix holdtime not working properly when busyDonald Sharp
Commit cc9f21da2218d95567eff1501482ce58e6600f54 modified the bgp_fsm code to disallow the extension of the hold time when the system is under extremely heavy load. That was an attempt to remove the return code, but it was too aggressive and broke this bit of code. Put back the behavior that was introduced in d0874d195d0127009a7d9c06920c52c95319eff9. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-24bgpd: fix check validity of a VPN SRv6 route with modified nexthopPhilippe Guibert
When exporting a VPN SRv6 route, the path may not be considered valid if the nexthop is not valid. This is the case when the 'nexthop vpn export' command is used. The below example illustrates that the VPN path to 2001:1::/64 is not selected, as the expected nexthop to find in vrf10 is the one configured:

> # show running-config
> router bgp 1 vrf vrf10
> address-family ipv6 unicast
> nexthop vpn export 2001::1
> # show bgp ipv6 vpn
> [..]
> Route Distinguisher: 1:10
> 2001:1::/64 2001::1@4 0 0 65001 i
> UN=2001::1 EC{99:99} label=16 sid=2001:db8:1:1:: sid_structure=[40,24,16,0] type=bgp, subtype=5

The analysis indicates that the 2001::1 nexthop is considered.

> 2025/03/20 21:47:53.751853 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0
> 2025/03/20 21:47:53.751855 BGP: [VWNP2-DNMFV] Found existing bnc 2001::1/128(0)(VRF vrf10) flags 0x82 ifindex 0 #paths 2 peer 0x0, resolved prefix UNK prefix
> 2025/03/20 21:47:53.751856 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is not valid (in VRF vrf10)
> 2025/03/20 21:47:53.751857 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change

Actually, to check the nexthop validity, only the source path in the VRF has the correct nexthop. Fix this by reusing the source path information instead of the current one.

> 2025/03/20 22:43:51.703521 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0
> 2025/03/20 22:43:51.703523 BGP: [VWNP2-DNMFV] Found existing bnc fe80::b812:37ff:fe13:d441/128(0)(VRF vrf10) flags 0x87 ifindex 0 #paths 2 peer 0x0, resolved prefix fe80::/64
> 2025/03/20 22:43:51.703525 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is valid (in VRF vrf10)
> 2025/03/20 22:43:51.703526 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-03-17bgpd: Print the real reason why the peer is not accepted (incoming)Donatas Abraitis
If it's suppressed due to BFD down or unspecified connection, we never know the real reason and just say "no AF activated" which is misleading. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-03-12bgpd: batch peer connection error clearingMark Stapp
When peer connections encounter errors, attempt to batch some of the clearing processing that occurs. Add a new batch object, add multiple peers to it, if possible. Do one rib walk for the batch, rather than one walk per peer. Use a handler callback per batch to check and remove peers' path-infos, rather than a work-queue and callback per peer. The original clearing code remains; it's used for single peers. Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-02-28bgpd: Convert bgp_keepalive_send to use a connectionDonald Sharp
The peer is eventually going to have an incoming and an outgoing connection. Let's send the data based upon the connection, not the peer. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-28bgpd: Add connection direction to debug logsDonald Sharp
Currently the incoming and outgoing connections mix up their logs and there is absolutely no way to tell which way is being talked about when both are operating. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-06Merge pull request #17865 from donaldsharp/coverity_2024_new_hotnessJafar Al-Gharaibeh
Coverity 2024 new hotness
2025-01-29bgpd: Do not start BGP session if BGP identifier is not setDonatas Abraitis
If we have an IPv6-only network and no IPv4 addresses at all, then by default a BGP identifier of 0.0.0.0 is created, which is treated as malformed according to RFC 6286. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
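A check of this kind can be sketched as below. The helper name `router_id_is_set()` is hypothetical and is not taken from bgpd; it only illustrates rejecting an all-zero identifier, which RFC 6286 does not allow, before a session is started.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical helper: a BGP identifier of 0.0.0.0 is not valid per
 * RFC 6286, so a session should not be started while it is unset.
 */
bool router_id_is_set(const struct in_addr *id)
{
	return id->s_addr != htonl(INADDR_ANY);
}
```

A caller would test the configured or auto-selected identifier with this helper and stay idle when it returns false.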
2025-01-19bgpd: Do not show "Waiting for OPEN" as last resetDonatas Abraitis
This is not actually a reset, so it should not be shown as the last reset under `show bgp neighbor`. Fixes: 1e91f1d1193003cb325a2bf595c8a9273740e2f0 ("bgpd: Update failed reason to distinguish some NHT scenario") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-17bgpd: Ensure ibuf count is protected by mutexDonald Sharp
Grab the count of streams in ibuf while it is protected by a mutex, since this data is written to from another pthread. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-01-10bgpd: Only update peer connection information when neededDonald Sharp
Currently bgp is repeatedly grabbing peer connection information. This is a bit overkill. There are two situations:

a) Opening a connection to the peer. In this case, we know the remote port/address a priori and can get the local information by just asking the OS.

b) Peer opening a connection to us. In this case, we know the local port/address a priori and can get the remote information by just asking the OS.

Modify the code to just grab this data at the appropriate time.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-01-10bgpd: su_remote and su_local are properties of the connectionDonald Sharp
su_local and su_remote in the peer can change based upon if we are initiating the remote connection or receiving it. As such we need to treat it as a property of the connection. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-01-10bgpd: bgp_getsockname is connection orientedDonald Sharp
Let's make it so. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-18Merge pull request #17599 from opensourcerouting/fix/reduce_default_connect_timerJafar Al-Gharaibeh
bgpd: Connect retry timer backoff
2024-12-12bgpd: When calling bgp_process, prevent infinite loopDonald Sharp
If we have this construct:

    for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) {
        ...
        bgp_process();
    }

This can induce an infinite loop. This happens because bgp_process will move the unsorted items to the top of the list for handling; as such, it is necessary to hold the next pointer to the side to actually look at each possible bgp_path_info.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
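The "hold the next pointer to the side" pattern can be sketched as follows. The types and functions here are hypothetical stand-ins: `move_to_head()` only imitates a call that reorders the list while it is being walked, and it is not the real bgp_process().

```c
#include <stddef.h>

/* Hypothetical stand-in for a path entry on a singly linked list. */
struct path {
	int id;
	struct path *next;
};

/* Imitates a callee that reorders the list: moves `p` to the head. */
void move_to_head(struct path **head, struct path *p)
{
	struct path **pp = head;

	while (*pp && *pp != p)
		pp = &(*pp)->next;
	if (!*pp || *head == p)
		return;
	*pp = p->next; /* unlink p */
	p->next = *head;
	*head = p;
}

/* Visit every entry exactly once even though the list is reordered. */
int visit_all(struct path **head)
{
	struct path *p, *next;
	int visited = 0;

	for (p = *head; p; p = next) {
		next = p->next; /* saved before the callee can change it */
		move_to_head(head, p);
		visited++;
	}
	return visited;
}
```

If the loop advanced with `p = p->next` after the call instead, a moved entry would point back at entries that were already visited, and the walk would never reach the end of the list.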
2024-12-11bgpd: Implement connect retry backoffDonatas Abraitis
Instead of starting with a fairly high retry value, let's start with a lower one and increase it with a backoff until it reaches what was the default value (120s). Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
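One way to picture such a backoff is the sketch below. The commit message does not state the growth factor or the starting value, so the doubling step and the function name `next_connect_retry()` are assumptions made for illustration; only the 120 second ceiling comes from the text above.

```c
/* Ceiling taken from the commit message: the former default of 120s. */
#define CONNECT_RETRY_MAX 120U

/*
 * Hypothetical backoff step: double the current retry interval and
 * clamp it to the ceiling. Doubling is an assumption, not the
 * documented bgpd behavior.
 */
unsigned int next_connect_retry(unsigned int current)
{
	unsigned int next = current * 2;

	return next > CONNECT_RETRY_MAX ? CONNECT_RETRY_MAX : next;
}
```

Starting from an assumed 10 seconds, successive failures would give 20, 40, 80 and then 120 seconds, after which the interval stays at the ceiling.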
2024-11-26bgpd: peer_active is connection oriented, make it soDonald Sharp
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-26bgpd: bgp_getsockname should use connectionDonald Sharp
Let's use the connection associated with the peer instead. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-26bgpd: Modify bgp_connect_in_progress_update_connection to use connectionDonald Sharp
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-26bgpd: Modify bgp_updatesockname to pass in a connectionDonald Sharp
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-26bgpd: Fix pattern of usage in bgp_notify_config_changeDonald Sharp
    if (BGP_IS_VALID_STATE_FOR_NOTIF(peer->connection->status))
        peer_notify_config_change(peer->connection);
    else
        bgp_session_reset_safe(peer, &nnode);

Let's add a bool return to peer_notify_config_change that tells the caller whether or not it should call the peer session reset. This simplifies the code a bunch.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
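The simplified call pattern might look like the sketch below. Everything here is a hypothetical stand-in rather than bgpd code: the stub types, the set of states treated as valid for a notification, and the choice that a `true` return means "notification sent, no reset needed" are all assumptions.

```c
#include <stdbool.h>

/* Hypothetical connection states and counters for illustration only. */
enum conn_state { CONN_IDLE, CONN_OPENSENT, CONN_ESTABLISHED };

struct conn_stub {
	enum conn_state state;
	int notifications_sent;
	int resets;
};

/*
 * Assumed semantics: returns true when a notification could be sent,
 * false when the caller has to reset the session itself.
 */
bool notify_config_change(struct conn_stub *c)
{
	if (c->state == CONN_OPENSENT || c->state == CONN_ESTABLISHED) {
		c->notifications_sent++;
		return true;
	}
	return false;
}

/* The caller collapses the if/else pair into a single test. */
void apply_config_change(struct conn_stub *c)
{
	if (!notify_config_change(c))
		c->resets++; /* stands in for a session reset */
}
```

The state check now lives in one place, so each call site only has to react to the boolean result.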
2024-11-26bgpd: Add `peer_notify_config_change()` functionDonald Sharp
We have about a bajillion tests of if we can notify the peer and then we send a config change notification. Let's just make a function that does this. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-24bgpd: Fix graceful-restart for peer-groupsDonatas Abraitis
It somehow slipped through that GR for peer-groups is completely broken, although it was working before. This shows, once again, that we MUST have more and more topotests. Fixes: 15403f521a12b668e87ef8961c78e0ed97c6ff92 ("bgpd: Streamline GR config, act on change immediately") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-11-20bgpd: bgp_connect should return an `enum connect_result`Donald Sharp
When this function is run by bgp_start, it is expected to return an `enum connect_result`. Instead, the function returns a variety of values that are not really being checked for. Consolidate to a correct choice. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-11bgpd: Do not try to uninstall BFD session if the peer is not establishedDonatas Abraitis
Having something like:

```
neighbor 192.168.1.222 ebgp-multihop 32
neighbor 192.168.1.222 update-source 192.168.1.5
neighbor 192.168.1.222 bfd
```

Won't work and the result is (empty):

```
$ show bfd peers
BFD Peers:
```

bgp_stop() is called in the BGP FSM multiple times (even at startup), which causes intermediate session interruption when update-source/ebgp-multihop is triggered. With this fix, the ordering does not matter and the BFD session's parameters are updated correctly.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-11-07bgpd: Set LLGR stale routes for all the paths including addpathDonatas Abraitis
Without this patch we set only the first path for the route (if multiple exist) as LLGR stale and stop doing that for the rest of the paths, which is wrong. Fixes: 1479ed2fb35f4a5ae1017201a7ee37ba2727163a ("bgpd: Implement LLGR helper mode") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-10-24bgpd: Fix wrong pthread event cancellingDonald Sharp
0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
1 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
2 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3 0x000076e399e42476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4 0x000076e39a34f950 in core_handler (signo=6, siginfo=0x76e3985fca30, context=0x76e3985fc900) at lib/sigevent.c:258
5 <signal handler called>
6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
7 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
8 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9 0x000076e399e42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x000076e399e287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x000076e39a39874b in _zlog_assert_failed (xref=0x76e39a46cca0 <_xref.27>, extra=0x0) at lib/zlog.c:789
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
13 0x000076e39a369ef6 in event_cancel_event_ready (m=0x5eda32df5e40, arg=0x5eda33afeed0) at lib/event.c:1470
14 0x00005eda0a94a5b3 in bgp_stop (connection=0x5eda33afeed0) at bgpd/bgp_fsm.c:1355
15 0x00005eda0a94b4ae in bgp_stop_with_notify (connection=0x5eda33afeed0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1610
16 0x00005eda0a979498 in bgp_packet_add (connection=0x5eda33afeed0, peer=0x5eda33b11800, s=0x76e3880daf90) at bgpd/bgp_packet.c:152
17 0x00005eda0a97a80f in bgp_keepalive_send (peer=0x5eda33b11800) at bgpd/bgp_packet.c:639
18 0x00005eda0a9511fd in peer_process (hb=0x5eda33c9ab80, arg=0x76e3985ffaf0) at bgpd/bgp_keepalives.c:111
19 0x000076e39a2cd8e6 in hash_iterate (hash=0x76e388000be0, func=0x5eda0a95105e <peer_process>, arg=0x76e3985ffaf0) at lib/hash.c:252
20 0x00005eda0a951679 in bgp_keepalives_start (arg=0x5eda3306af80) at bgpd/bgp_keepalives.c:214
21 0x000076e39a2c9932 in frr_pthread_inner (arg=0x5eda3306af80) at lib/frr_pthread.c:180
22 0x000076e399e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
23 0x000076e399f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) f 12
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
1428 assert(m->owner == pthread_self());

In this decode the attempt to cancel the connection's events from the wrong thread is causing the crash. Modify the code to create an event on the bm->master to cancel the events for the connection.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-16*: clang-SA friendly switch-enum-return-stringDavid Lamparter
clang-19's SA complains about unused initializers in "switch (enum) { return string }" style code. Use direct string return values to avoid the issue. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
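The "direct string return" style can be sketched like this. The enum and function are hypothetical examples, not code from the tree; the point is only that no local variable is initialized and then overwritten in every case.

```c
/* Hypothetical enum used only to illustrate the pattern. */
enum fsm_state_stub { ST_IDLE, ST_CONNECT, ST_ESTABLISHED };

/*
 * Each case returns its string directly, so there is no initialized
 * local that the analyzer could report as never read.
 */
const char *fsm_state_str(enum fsm_state_stub s)
{
	switch (s) {
	case ST_IDLE:
		return "Idle";
	case ST_CONNECT:
		return "Connect";
	case ST_ESTABLISHED:
		return "Established";
	}
	return "Unknown"; /* reached only for out-of-range values */
}
```

Leaving out a `default:` label keeps the compiler's missing-case warning for the enum useful, while the trailing return still covers invalid values.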
2024-09-12 bgpd: fix 'nexthop_set failed' error message often displayedPhilippe Guibert
The 'nexthop_set failed, resetting connection - intf' log message is often seen when peering with BGP peers. This message started appearing after a recent fix that extracts the IP/port information of outgoing connections when peering is not yet established. Fix this by separating the update of the socket information from the call to bgp_zebra_nexthop_set(). Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-09-06bgpd: Reduce # of iterations when doing llgrDonald Sharp
Code was scanning a table, then identifying a prefix that needed to be modified, then calling code that reran bestpath on the entire table again. If you had multiple items that needed processing, you would end up scanning and setting the entire table to be scanned multiple times. No bueno.

a) We do not need to reprocess items that are not being modified.

b) We do not need to walk the entire table multiple times; we have the data that is needed already.

Modify the code to just call bgp_process on the interesting nodes.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-22bgpd: global_gr_mode does not need to be set twiceDonald Sharp
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-07-31bgpd: Use bgp_session_reset_safe() for GR update all peersDonatas Abraitis
Using bgp_session_reset() here might cause this use-after-free:

```
==6523==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300058d720 at pc 0x55f3ab62ab1f bp 0x7ffe5b95a0d0 sp 0x7ffe5b95a0c8
READ of size 8 at 0x60300058d720 thread T0
    #0 0x55f3ab62ab1e in bgp_gr_update_mode_of_all_peers bgpd/bgp_fsm.c:2729
    #1 0x55f3ab62ab1e in bgp_gr_update_all bgpd/bgp_fsm.c:2779
    #2 0x55f3ab73557e in bgp_inst_gr_config_vty bgpd/bgp_vty.c:3037
    #3 0x55f3ab74db69 in bgp_graceful_restart bgpd/bgp_vty.c:3130
    #4 0x7fc5539a9584 in cmd_execute_command_real lib/command.c:1002
    #5 0x7fc5539a98a3 in cmd_execute_command lib/command.c:1061
    #6 0x7fc5539a9dcf in cmd_execute lib/command.c:1227
    #7 0x7fc553ae493f in vty_command lib/vty.c:616
    #8 0x7fc553ae4e92 in vty_execute lib/vty.c:1379
    #9 0x7fc553aedd34 in vtysh_read lib/vty.c:2374
    #10 0x7fc553ad8a64 in event_call lib/event.c:1995
    #11 0x7fc553a0c429 in frr_run lib/libfrr.c:1232
    #12 0x55f3ab57b78d in main bgpd/bgp_main.c:555
    #13 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #14 0x7fc55342d304 in __libc_start_main_impl ../csu/libc-start.c:360
    #15 0x55f3ab5799a0 in _start (/usr/lib/frr/bgpd+0x2e19a0)

0x60300058d720 is located 16 bytes inside of 24-byte region [0x60300058d710,0x60300058d728)
freed by thread T0 here:
    #0 0x7fc553eb76a8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
    #1 0x7fc553a2b713 in qfree lib/memory.c:130
    #2 0x7fc553a0e50d in listnode_free lib/linklist.c:81
    #3 0x7fc553a0e50d in list_delete_node lib/linklist.c:379
    #4 0x55f3ab7ae353 in peer_delete bgpd/bgpd.c:2796
    #5 0x55f3ab7ae91f in bgp_session_reset bgpd/bgpd.c:141
    #6 0x55f3ab62ab17 in bgp_gr_update_mode_of_all_peers bgpd/bgp_fsm.c:2752
    #7 0x55f3ab62ab17 in bgp_gr_update_all bgpd/bgp_fsm.c:2779
    #8 0x55f3ab73557e in bgp_inst_gr_config_vty bgpd/bgp_vty.c:3037
    #9 0x55f3ab74db69 in bgp_graceful_restart bgpd/bgp_vty.c:3130
    #10 0x7fc5539a9584 in cmd_execute_command_real lib/command.c:1002
    #11 0x7fc5539a98a3 in cmd_execute_command lib/command.c:1061
    #12 0x7fc5539a9dcf in cmd_execute lib/command.c:1227
    #13 0x7fc553ae493f in vty_command lib/vty.c:616
    #14 0x7fc553ae4e92 in vty_execute lib/vty.c:1379
    #15 0x7fc553aedd34 in vtysh_read lib/vty.c:2374
    #16 0x7fc553ad8a64 in event_call lib/event.c:1995
    #17 0x7fc553a0c429 in frr_run lib/libfrr.c:1232
    #18 0x55f3ab57b78d in main bgpd/bgp_main.c:555
    #19 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

previously allocated by thread T0 here:
    #0 0x7fc553eb83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
    #1 0x7fc553a2ae20 in qcalloc lib/memory.c:105
    #2 0x7fc553a0d056 in listnode_new lib/linklist.c:71
    #3 0x7fc553a0d85b in listnode_add_sort lib/linklist.c:197
    #4 0x55f3ab7baec4 in peer_create bgpd/bgpd.c:1996
    #5 0x55f3ab65be8b in bgp_accept bgpd/bgp_network.c:604
    #6 0x7fc553ad8a64 in event_call lib/event.c:1995
    #7 0x7fc553a0c429 in frr_run lib/libfrr.c:1232
    #8 0x55f3ab57b78d in main bgpd/bgp_main.c:555
    #9 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-07-25bgpd: Keep the last reset reason before we reset the peerDonatas Abraitis
If we send a notification, there is no point setting the last_reset, because bgp_notify_send() sets last_reset to PEER_DOWN_NOTIFY_SEND (almost everywhere). Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-07-25bgpd: Set the last_reset if we change the password alsoDonatas Abraitis
```
donatas.net(config-router)# do show ip bgp summary failed

IPv4 Unicast Summary:
BGP router identifier 1.1.1.1, local AS number 65001 VRF default vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 24 KiB of memory

Neighbor        EstdCnt DropCnt ResetTime Reason
127.0.0.1             2       2 00:02:02  Password config change (GoBGP/3.26.0)

Displayed neighbors 1
Total number of neighbors 1
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-07-24bgpd: Pass a connection struct directly for EVENT_OFF()Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-07-01bgpd: Refine restarter operation - R-bit & F-bitvivek
Introduce BGP-wide flags to denote whether BGP has started gracefully and whether GR is in progress. Use these for setting the R-bit in the GR capability, rather than a timer that is set for any new instance creation. Mark graceful restart as complete when the deferred path selection has been done and route sync with zebra, as well as deferred EOR advertisement, has been initiated. Introduce a function to check on the F-bit setting rather than just basing it on configuration. Subsequent commits will extend these functionalities. Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
2024-06-27bgpd: Streamline GR config, act on change immediatelyvivek
Streamline the BGP graceful-restart configuration at the global and peer level some more. Similar to many other neighbor capability parameters like MP and ENHE, reset the session immediately upon a change to the configuration. This will be more aligned with the transactional UI model also and will not require a separate 'clear' command to be executed. Note: Peer-group graceful-restart configuration is not yet supported. Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
2024-06-26bgpd: avoid clearing routes for peers that were never establishedLoïc Sang
Under heavy system load with many peers in passive mode and a large number of routes, bgpd can enter an infinite loop. This occurs while processing timed-out BGP_OPEN messages, which prevents it from accepting new connections. The following log entries illustrate the issue:

>bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0
>bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224
>bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224
... repeating

The issue occurs when bgpd handles a massive number of routes in the RIB while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it fails to process these packets promptly, leading the remote peer to close the connection and resend BGP_OPEN packets. When bgpd eventually starts processing these timed-out BGP_OPEN packets, it finds the TCP connection closed by the remote peer, resulting in "bgp_stop()" being called. For each timed-out peer, bgpd must iterate through the routing table, which is time-consuming and causes new incoming BGP_OPEN packets to time out, perpetuating the infinite loop.

To address this issue, the code is modified to check if the peer has been established at least once before calling "bgp_clear_route_all()". This ensures that routes are only cleared for peers that had a successful session, preventing unnecessary iterations over the routing table for peers that never established a connection.

With this change, BGP_OPEN timeout messages may still occur, but in the worst case, bgpd will stabilize. Before this patch, bgpd could enter a loop where it was unable to accept any new connections.

Signed-off-by: Loïc Sang <loic.sang@6wind.com>
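The guard described above can be sketched as follows. The struct and function names are hypothetical stand-ins, and the table walk is reduced to a counter; the only idea carried over from the commit message is that the expensive clearing step is skipped for a peer that was never established.

```c
/* Hypothetical peer stub: only the fields needed for the sketch. */
struct peer_stub {
	unsigned int established; /* times the session reached Established */
	int table_walks;          /* counts expensive route-clearing walks */
};

/* Stand-in for the stop path: clear routes only if there can be any. */
void peer_stop_stub(struct peer_stub *p)
{
	if (p->established > 0)
		p->table_walks++; /* stands in for clearing the peer's routes */
}
```

A peer whose OPEN timed out before the session came up therefore costs no routing table walk when it is stopped.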
2024-05-16bgpd: fix dynamic peer graceful restart race conditionLouis Scalbert
bgp_llgr topotest sometimes fails at step 8:

> topo: STEP 8: 'Check if we can see 172.16.1.2/32 after R4 (dynamic peer) was killed'

R4 neighbor is deleted on R2 because it fails to re-connect:

> 14:33:40.128048 BGP: [HKWM3-ZC5QP] 192.168.3.1 fd -1 went from Established to Clearing
> 14:33:40.128154 BGP: [MJ1TJ-HEE3V] 192.168.3.1(r4) graceful restart timer expired
> 14:33:40.128158 BGP: [ZTA2J-YRKGY] 192.168.3.1(r4) graceful restart stalepath timer stopped
> 14:33:40.128162 BGP: [H917J-25EWN] 192.168.3.1(r4) Long-lived stale timer (IPv4 Unicast) started for 20 sec
> 14:33:40.128168 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 172.16.1.2/32
> 14:33:40.128220 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 192.168.3.0/24
> [...]
> 14:33:41.138869 BGP: [RGGAC-RJ6WG] 192.168.3.1 [Event] Connect failed 111(Connection refused)
> 14:33:41.138906 BGP: [ZWCSR-M7FG9] 192.168.3.1 [FSM] TCP_connection_open_failed (Connect->Active), fd 23
> 14:33:41.138912 BGP: [JA9RP-HSD1K] 192.168.3.1 (dynamic neighbor) deleted (bgp_connect_fail)
> 14:33:41.139126 BGP: [P98A2-2RDFE] 192.168.3.1(r4) graceful restart stalepath timer stopped

af8496af08 ("bgpd: Do not delete BGP dynamic peers if graceful restart kicks in") forgot to modify bgp_connect_fail().

Do not delete the peer in bgp_connect_fail() if Non-Stop-Forwarding is in progress.

Fixes: af8496af08 ("bgpd: Do not delete BGP dynamic peers if graceful restart kicks in")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-05-07Merge pull request #15883 from opensourcerouting/fix/bgpd_gr_fsmRuss White
bgpd: Apply NOOP when doing negative commands for GR operations
2024-04-30bgpd: Print old/new states of graceful restart FSMDonatas Abraitis
To better debug what's going on before/after. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-04-29bgpd: fix Coverity ID 1585206Philippe Guibert
The return value of bgp_getsockname() should always be checked. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-04-15bgpd: fix addressing information of non established outgoing sessionsPhilippe Guibert
When trying to connect to a BGP peer that does not respond, the 'show bgp neighbors' command does not give any indication of the local and remote addresses used:

> # show bgp neighbors
> BGP neighbor is 192.0.2.150, remote AS 65500, local AS 65500, internal link
> Local Role: undefined
> Remote Role: undefined
> BGP version 4, remote router ID 0.0.0.0, local router ID 192.0.2.1
> BGP state = Connect
> [..]
> Connections established 0; dropped 0
> Last reset 00:00:04, Waiting for peer OPEN (n/a)
> Internal BGP neighbor may be up to 255 hops away.
> BGP Connect Retry Timer in Seconds: 120
> Next connect timer due in 117 seconds
> Read thread: off Write thread: off FD used: 27

The addressing information (address and port) is only available when the TCP session is established, whereas this information is present at the system level:

> root@ubuntu2204:~# netstat -pan | grep 192.0.2.1
> tcp 0 0 192.0.2.1:179 192.0.2.150:38060 SYN_RECV -
> tcp 0 1 192.0.2.1:46526 192.0.2.150:179 SYN_SENT 488310/bgpd

Add the display for outgoing BGP sessions, as the getsockname() API provides information for connected streams. When the getpeername() API does not give any information, use the peer configuration (destination port is encoded in peer->port).

> # show bgp neighbors
> BGP neighbor is 192.0.2.150, remote AS 65500, local AS 65500, internal link
> Local Role: undefined
> Remote Role: undefined
> BGP version 4, remote router ID 0.0.0.0, local router ID 192.0.2.1
> BGP state = Connect
> [..]
> Connections established 0; dropped 0
> Last reset 00:00:16, Waiting for peer OPEN (n/a)
> Local host: 192.0.2.1, Local port: 46084
> Foreign host: 192.0.2.150, Foreign port: 179

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-02-29bgpd: Send "Send Hold Timer Expired" on such events notificationDonatas Abraitis
This is required by the current (latest, -02) draft. IANA has registered code 8 for "Send Hold Timer Expired" in the "BGP Error (Notification) Codes" sub-registry under the "Border Gateway Protocol (BGP) Parameters" registry. https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-sendholdtimer Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
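To show where code 8 ends up on the wire, here is a minimal sketch that builds a NOTIFICATION message with no data field, following the RFC 4271 layout (16-byte marker, 2-byte length, 1-byte type, then error code and subcode). The constant names and `build_notification()` are hypothetical and are not the bgpd API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant names; the values come from RFC 4271 and IANA. */
enum { BGP_MSG_TYPE_NOTIFICATION = 3 };
enum { BGP_ERR_SEND_HOLD_TIMER_EXPIRED = 8 };
#define BGP_HEADER_LEN 19

/*
 * Write a NOTIFICATION without a data field into `buf`.
 * Returns the message length, or 0 if the buffer is too small.
 */
size_t build_notification(uint8_t *buf, size_t buflen, uint8_t code,
			  uint8_t subcode)
{
	size_t len = BGP_HEADER_LEN + 2; /* header + code + subcode */

	if (buflen < len)
		return 0;
	memset(buf, 0xff, 16);           /* marker: all ones */
	buf[16] = (uint8_t)(len >> 8);   /* length, network byte order */
	buf[17] = (uint8_t)(len & 0xff);
	buf[18] = BGP_MSG_TYPE_NOTIFICATION;
	buf[19] = code;
	buf[20] = subcode;
	return len;
}
```

Calling `build_notification(buf, sizeof(buf), BGP_ERR_SEND_HOLD_TIMER_EXPIRED, 0)` yields a 21-byte message whose error code octet is 8.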