path: root/bgpd
Age | Commit message | Author
2025-03-15 | bgpd: Fixed crash upon bgp network import-check command | Manpreet Kaur
BT:
```
3 <signal handler called>
4 0x00005616837546fc in bgp_static_update (bgp=bgp@entry=0x5616865eac50, p=0x561686639e40, bgp_static=0x561686639f50, afi=afi@entry=AFI_IP6, safi=safi@entry=SAFI_UNICAST) at ../bgpd/bgp_route.c:7232
5 0x0000561683754ad0 in bgp_static_add (bgp=0x5616865eac50) at ../bgpd/bgp_table.h:413
6 0x0000561683785e2e in no_bgp_network_import_check (self=<optimized out>, vty=0x5616865e04c0, argc=<optimized out>, argv=<optimized out>) at ../bgpd/bgp_vty.c:4609
7 0x00007fdbcc294820 in cmd_execute_command_real (vline=vline@entry=0x561686663000,
```
The program hit a segmentation fault when attempting to access pi->extra->vrfleak->bgp_orig because pi->extra->vrfleak was NULL.
```
(gdb) p pi->extra->vrfleak
$1 = (struct bgp_path_info_extra_vrfleak *) 0x0
(gdb) p pi->extra->vrfleak->bgp_orig
Cannot access memory at address 0x8
```
Added a NULL check on pi->extra->vrfleak before accessing pi->extra->vrfleak->bgp_orig to prevent the segmentation fault.
Signed-off-by: Manpreet Kaur <manpreetk@nvidia.com>
(cherry picked from commit bc1008b970541c090e36fc1d50c720df822fcb99)
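The guard described above can be sketched as follows. This is a minimal illustration only: the structs are simplified, hypothetical stand-ins for the bgpd types named in the commit message, and the helper names `path_bgp_orig` and `path_is_imported_from_other` are assumptions, not functions from the FRR tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the bgpd structures. */
struct bgp { int vrf_id; };
struct vrfleak_info { struct bgp *bgp_orig; };
struct path_extra { struct vrfleak_info *vrfleak; /* may be NULL */ };
struct path_info { struct path_extra *extra;      /* may be NULL */ };

/* Return the originating instance, or NULL if any link in the chain is absent. */
static struct bgp *path_bgp_orig(const struct path_info *pi)
{
    if (!pi || !pi->extra || !pi->extra->vrfleak)
        return NULL;
    return pi->extra->vrfleak->bgp_orig;
}

/* True only when the path carries leak data pointing at a different instance. */
static bool path_is_imported_from_other(const struct path_info *pi,
                                        const struct bgp *bgp)
{
    const struct bgp *orig = path_bgp_orig(pi);

    return orig != NULL && orig != bgp;
}
```

With `extra` present but `vrfleak` NULL, as in the gdb session above, `path_bgp_orig()` returns NULL instead of dereferencing address 0x8.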
2025-02-21 | bgpd: remove dmed check not required in bestpath selection | Donald Sharp
As part of the upstream master commit (f3575f61c7 bgpd: Sort the bgp_path_inf) the snippet of code for the dmed check condition was left out, which leads to an incorrect bestpath being selected.
As an example: during bestpath selection, the local route loses to another path because the dmed condition is hit.
The snippet of the logs:
2025/02/20 03:06:20.131441 BGP: [JW7VP-K1YVV] [2]:[0]:[48]:[00:92:00:00:00:10](VRF default): Comparing path 27.0.0.7 flags Valid with path Static announcement flags Selected Valid Attr Changed Unsorted
2025/02/20 03:06:20.131445 BGP: [SYTDR-QV6X9] [2]:[0]:[48]:[00:92:00:00:00:10]: path 27.0.0.7 loses to path Static announcement as ES 03:44:38:39:ff:ff:02:00:00:01 is same and local
2025/02/20 03:06:20.131452 BGP: [JW7VP-K1YVV] [2]:[0]:[48]:[00:92:00:00:00:10](VRF default): Comparing path 27.0.0.8 flags Valid with path Static announcement flags Selected Valid Attr Changed Unsorted
2025/02/20 03:06:20.131456 BGP: [SYTDR-QV6X9] [2]:[0]:[48]:[00:92:00:00:00:10]: path 27.0.0.8 loses to path Static announcement as ES 03:44:38:39:ff:ff:02:00:00:01 is same and local
2025/02/20 03:06:20.131458 BGP: [WEWEC-8SE72] [2]:[0]:[48]:[00:92:00:00:00:10](VRF default): path Static announcement is the bestpath from AS 0 <<<< static is best
2025/02/20 03:06:20.131463 BGP: [Z3A78-GM3G5] bgp_best_selection: [2]:[0]:[48]:[00:92:00:00:00:10](VRF default) pi 27.0.0.7 dmed
2025/02/20 03:06:20.131467 BGP: [Z3A78-GM3G5] bgp_best_selection: [2]:[0]:[48]:[00:92:00:00:00:10](VRF default) pi 27.0.0.8 dmed
2025/02/20 03:06:20.131471 BGP: [N6CTF-2RSKS] [2]:[0]:[48]:[00:92:00:00:00:10](VRF default): After path selection, newbest is path 27.0.0.7 oldbest was Static announce
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 83ad94694bc061e1ff5f43db42cba46320e0df73)
2025-02-19 | Revert "bgpd: release manual vpn label on instance deletion (backport #18121)" | Donald Sharp
2025-02-18 | Merge pull request #18145 from FRRouting/mergify/bp/stable/10.1/pr-18079 | Russ White
bgpd: Fix crash in bgp_labelpool (backport #18079)
2025-02-18 | Merge pull request #18156 from FRRouting/mergify/bp/stable/10.1/pr-18121 | Russ White
bgpd: release manual vpn label on instance deletion (backport #18121)
2025-02-15 | bgpd: fix vty output of evpn route-target AS4 | Mark Stapp
evpn route-targets are decoded in ... multiple places; at least two have a bug where the AS4 form doesn't have its AS decoded. Signed-off-by: Mark Stapp <mjs@cisco.com> (cherry picked from commit 9943a08720ccbed87cd6938791066a0de94a92c6)
2025-02-14 | bgpd: When removing the prefix list drop the pointer | Donald Sharp
We are very very rarely seeing this crash:
0 0x7f36ba48e389 in prefix_list_apply_ext lib/plist.c:789
1 0x55eff3fa4126 in subgroup_announce_check bgpd/bgp_route.c:2334
2 0x55eff3fa858e in subgroup_process_announce_selected bgpd/bgp_route.c:3440
3 0x55eff4016488 in subgroup_announce_table bgpd/bgp_updgrp_adv.c:808
4 0x55eff401664e in subgroup_announce_route bgpd/bgp_updgrp_adv.c:861
5 0x55eff40111df in peer_af_announce_route bgpd/bgp_updgrp.c:2223
6 0x55eff3f884cb in bgp_announce_route_timer_expired bgpd/bgp_route.c:5892
7 0x7f36ba4ec239 in event_call lib/event.c:2019
8 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
9 0x55eff3e668b7 in main bgpd/bgp_main.c:557
10 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
11 0x7f36b9e2d304 in __libc_start_main_impl ../csu/libc-start.c:360
12 0x55eff3e64a30 in _start (/home/ci/cibuild.1407/frr-source/bgpd/.libs/bgpd+0x2fda30)
0x608000037038 is located 24 bytes inside of 88-byte region [0x608000037020,0x608000037078)
freed by thread T0 here:
0 0x7f36ba8b76a8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
1 0x7f36ba439bd7 in qfree lib/memory.c:131
2 0x7f36ba48d3a3 in prefix_list_free lib/plist.c:156
3 0x7f36ba48d3a3 in prefix_list_delete lib/plist.c:247
4 0x7f36ba48fbef in prefix_bgp_orf_remove_all lib/plist.c:1516
5 0x55eff3f679c4 in bgp_route_refresh_receive bgpd/bgp_packet.c:2841
6 0x55eff3f70bab in bgp_process_packet bgpd/bgp_packet.c:4069
7 0x7f36ba4ec239 in event_call lib/event.c:2019
8 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
9 0x55eff3e668b7 in main bgpd/bgp_main.c:557
10 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
previously allocated by thread T0 here:
0 0x7f36ba8b83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
1 0x7f36ba4392e4 in qcalloc lib/memory.c:106
2 0x7f36ba48d0de in prefix_list_new lib/plist.c:150
3 0x7f36ba48d0de in prefix_list_insert lib/plist.c:186
4 0x7f36ba48d0de in prefix_list_get lib/plist.c:204
5 0x7f36ba48f9df in prefix_bgp_orf_set lib/plist.c:1479
6 0x55eff3f67ba6 in bgp_route_refresh_receive bgpd/bgp_packet.c:2920
7 0x55eff3f70bab in bgp_process_packet bgpd/bgp_packet.c:4069
8 0x7f36ba4ec239 in event_call lib/event.c:2019
9 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
10 0x55eff3e668b7 in main bgpd/bgp_main.c:557
11 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Let's just stop keeping the pointer around in the peer->orf_plist data structure. There are other design problems, but at least let's stop the crash from possibly happening.
Fixes: #18138
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 3d43d7b78971520854903c11b6aec23754fdca34)
2025-02-13 | bgpd: release manual vpn label on instance deletion | Louis Scalbert
When a BGP instance with a manually assigned VPN label is deleted, the label is not released from the Zebra label registry. As a result, reapplying a configuration with the same manual label leads to VPN prefix export failures.
For example, with the following configuration:
> router bgp 65000 vrf BLUE
> address-family ipv4 unicast
> label vpn export <int>
Release zebra label registry on unconfiguration.
Fixes: d162d5f6f5 ("bgpd: fix hardset l3vpn label available in mpls pool")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit d6363625c35a99933bf60c9cf0b79627b468c9f7)
# Conflicts:
# bgpd/bgpd.c
2025-02-13 | bgpd: Fix crash in bgp_labelpool | Donald Sharp
The bgp labelpool code is grabbing the vpn policy data structure. This vpn_policy has a pointer to the bgp data structure. If an item placed on the bgp label pool workqueue sits there for even a microsecond or so while the operator issues a `no router bgp...` command that corresponds to the vpn_policy bgp pointer, the workqueue will crash when it runs, because the bgp pointer has been freed and something else now owns that memory.
Modify the labelpool code to store the vrf id associated with the request on the workqueue. When the request wakes up, if the vrf id still has a bgp pointer, allow the request to continue; otherwise drop it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 14eac319e8ae9314f5270f871106a70c4986c60c)
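The idea of queueing an identifier rather than a pointer can be sketched as below. Everything here is hypothetical and simplified: `vrf_table`, `bgp_lookup_by_vrf_id_sketch`, and `struct label_request` are illustrative names, not the labelpool's real data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VRFS 8

struct bgp { int vrf_id; int labels_granted; };

/* Hypothetical registry: a slot becomes NULL once the instance is deleted. */
static struct bgp *vrf_table[MAX_VRFS];

static struct bgp *bgp_lookup_by_vrf_id_sketch(int vrf_id)
{
    if (vrf_id < 0 || vrf_id >= MAX_VRFS)
        return NULL;
    return vrf_table[vrf_id];
}

/* The queued item carries an id, not a pointer that may dangle. */
struct label_request { int vrf_id; unsigned int label; };

/* Returns 1 if the request was processed, 0 if it was dropped
 * because the owning instance no longer exists. */
static int label_request_process(const struct label_request *req)
{
    struct bgp *bgp = bgp_lookup_by_vrf_id_sketch(req->vrf_id);

    if (!bgp)
        return 0;
    bgp->labels_granted++;
    return 1;
}
```

The lookup happens at the moment the work item runs, so a deletion that occurred while the item was queued is seen as a missing entry rather than as freed memory.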
2025-02-13 | Merge pull request #18135 from FRRouting/mergify/bp/stable/10.1/pr-18120 | Donald Sharp
bgpd: fix incorrect JSON in bgp_show_table_rd (backport #18120)
2025-02-12 | bgpd: fix bfd with update-source in peer-group | Louis Scalbert
Fix BFD session not created when the peer is in update-group with the update-source option. Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-02-12 | bgpd: When bgp notices a change to shared_network inform bfd of it | Donald Sharp
When bgp is started up and reads the config in *before* it has received interface addresses from zebra, shared_network can be set to false. Later on, once bgp attempts to reconnect, it will work out shared_network again (because it has now received the data from zebra). In this case tell bfd about it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-12 | bgpd: Allow bfd to work if peer known but interface address not yet | Donald Sharp
If bgp is coming up and bgp has not received the interface address yet but bgp has knowledge about a bfd peering, allow it to set the peering data appropriately. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-12 | bgpd: Update source address for BFD session | Donatas Abraitis
If BFD is down, we should try to detect the source automatically from the given interface. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-02-12 | bgpd: Reset BGP session only if it was a real BFD DOWN event | Donatas Abraitis
Without this patch we always see a double-reset, e.g.: ``` 2024/11/04 12:42:43.010 BGP: [VQY9X-CQZKG] bgp_peer_bfd_update_source: address [0.0.0.0->172.18.0.3] to [172.18.0.2->172.18.0.3] 2024/11/04 12:42:43.010 BGP: [X8BD9-8RKN4] bgp_peer_bfd_update_source: interface none to eth0 2024/11/04 12:42:43.010 BFD: [MSVDW-Y8Z5Q] ptm-del-dest: deregister peer [mhop:no peer:172.18.0.3 local:0.0.0.0 vrf:default cbit:0x00 minimum-ttl:255] 2024/11/04 12:42:43.010 BFD: [NYF5K-SE3NS] ptm-del-session: [mhop:no peer:172.18.0.3 local:0.0.0.0 vrf:default] refcount=0 2024/11/04 12:42:43.010 BFD: [NW21R-MRYNT] session-delete: mhop:no peer:172.18.0.3 local:0.0.0.0 vrf:default 2024/11/04 12:42:43.010 BGP: [P3D3N-3277A] 172.18.0.3 [FSM] Timer (routeadv timer expire) 2024/11/04 12:42:43.010 BFD: [YA0Q5-C0BPV] control-packet: no session found [mhop:no peer:172.18.0.3 local:172.18.0.2 port:11] 2024/11/04 12:42:43.010 BFD: [MSVDW-Y8Z5Q] ptm-add-dest: register peer [mhop:no peer:172.18.0.3 local:172.18.0.2 vrf:default cbit:0x00 minimum-ttl:255] 2024/11/04 12:42:43.011 BFD: [PSB4R-8T1TJ] session-new: mhop:no peer:172.18.0.3 local:172.18.0.2 vrf:default ifname:eth0 2024/11/04 12:42:43.011 BGP: [Q4BCV-6FHZ5] zclient_bfd_session_update: 172.18.0.2/32 -> 172.18.0.3/32 (interface eth0) VRF default(0) (CPI bit no): Down 2024/11/04 12:42:43.011 BGP: [MKVHZ-7MS3V] bfd_session_status_update: neighbor 172.18.0.3 vrf default(0) bfd state Up -> Down 2024/11/04 12:42:43.011 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 172.18.0.3 6/10 (Cease/BFD Down) 0 bytes 2024/11/04 12:42:43.011 BGP: [QFMSE-NPSNN] zclient_bfd_session_update: sessions updated: 1 2024/11/04 12:42:43.011 BGP: [ZWCSR-M7FG9] 172.18.0.3 [FSM] BGP_Stop (Established->Clearing), fd 22 ``` Reset is due to the source address change. With this patch, we reset the session only if it's a _REAL_ BFD down event, which means we trigger session reset if BFD session is established earlier than BGP. 
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-02-12 | bgpd: fix incorrect json in bgp_show_table_rd | Louis Scalbert
In bgp_show_table_rd(), the is_last argument is determined using the expression "next == NULL" to check if the RD table is the last one. This helps ensure proper JSON formatting. However, if next is not NULL but is no longer associated with a BGP table, the JSON output becomes malformed. Updates the condition to also verify the existence of the next bgp_dest table. Fixes: 1ae44dfcba ("bgpd: unify 'show bgp' with RD with normal unicast bgp show") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com> (cherry picked from commit cf0269649cdd09b8d3f2dd8815caf6ecf9cdeef9)
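The effect of the "is this the last table" test on JSON output can be illustrated with a small sketch. The types and function names are hypothetical, and the sketch scans ahead for any later node that has a table, which goes slightly beyond the single `next` check the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical RD node: has_table is false when no BGP table is attached. */
struct rd_node {
    const char *rd;
    bool has_table;
    struct rd_node *next;
};

/* A node is "last" when no later node will actually be printed. */
static bool rd_is_last(const struct rd_node *n)
{
    for (const struct rd_node *it = n->next; it; it = it->next)
        if (it->has_table)
            return false;
    return true;
}

/* Emit one JSON member per RD that has a table.
 * Assumes buf is large enough for the whole output. */
static void rd_dump_json(const struct rd_node *head, char *buf, size_t len)
{
    size_t off = 0;

    off += (size_t)snprintf(buf + off, len - off, "{");
    for (const struct rd_node *n = head; n; n = n->next) {
        if (!n->has_table)
            continue;
        off += (size_t)snprintf(buf + off, len - off, "\"%s\":{}%s",
                                n->rd, rd_is_last(n) ? "" : ",");
    }
    snprintf(buf + off, len - off, "}");
}
```

Testing only `next == NULL` would leave a trailing comma after the second RD whenever a table-less node follows it, which is the malformed output the commit addresses.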
2025-02-11 | bgpd: don't reuse nexthop variable in loop/switch | David Lamparter
While the loop is currently exited in all cases after using nexthop, it is a footgun to have "nh" around to be reused in another iteration of the loop. This would leave nexthop with partial data from the previous use. Make it local where needed instead. Signed-off-by: David Lamparter <equinox@opensourcerouting.org> (cherry picked from commit ce7f5b21221f0b3557d1f4a40793230d8bc4cf02)
2025-02-06 | Revert "bgpd: Do not ignore auto generated VRF instances when deleting" | Donatas Abraitis
This reverts commit 0a923af56dbe43fdb4e9184c3525d0537740aef9.
2025-02-06 | Revert "bgpd: fix duplicate BGP instance created with unified config" | Donatas Abraitis
This reverts commit aba588dd09aa098a88ba1355798c0e784e91ebc8.
2025-02-06 | Revert "bgpd: fix import vrf creates multiple bgp instances" | Donatas Abraitis
This reverts commit 8c187fb4f838d8d8a21f8608c3a510136764b122.
2025-02-06 | Reapply "bgpd: fix duplicate BGP instance created with unified config" | Donatas Abraitis
This reverts commit daa68852a2a78acf103e8ae1127953b2870c6772.
2025-02-06 | Revert "bgpd: fix duplicate BGP instance created with unified config" | Donatas Abraitis
This reverts commit 3abd84ef5be1ef56b66f0e7617f8afab6da6c5cc.
2025-02-05 | bgpd: fix duplicate BGP instance created with unified config | Philippe Guibert
When running the bgp_evpn_rt5 setup with unified config, a memory leak of a BGP instance that is never deleted occurs.
> root@ubuntu2204hwe:~/frr/tests/topotests/bgp_evpn_rt5# cat /tmp/topotests/bgp_evpn_rt5.test_bgp_evpn/r1.asan.bgpd.1164105
>
> =================================================================
> ==1164105==ERROR: LeakSanitizer: detected memory leaks
>
> Indirect leak of 12496 byte(s) in 1 object(s) allocated from:
> #0 0x7f358eeb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> #1 0x7f358e877233 in qcalloc lib/memory.c:106
> #2 0x55d06c95680a in bgp_create bgpd/bgpd.c:3405
> #3 0x55d06c95a7b3 in bgp_get bgpd/bgpd.c:3805
> #4 0x55d06c87a9b5 in bgp_get_vty bgpd/bgp_vty.c:603
> #5 0x55d06c68dc71 in bgp_evpn_local_l3vni_add bgpd/bgp_evpn.c:7032
> #6 0x55d06c92989b in bgp_zebra_process_local_l3vni bgpd/bgp_zebra.c:3204
> #7 0x7f358e9e3feb in zclient_read lib/zclient.c:4626
> #8 0x7f358e98082d in event_call lib/event.c:1996
> #9 0x7f358e848931 in frr_run lib/libfrr.c:1232
> #10 0x55d06c60eae1 in main bgpd/bgp_main.c:557
> #11 0x7f358e229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Actually, a BGP VRF instance is created in auto mode when creating the global BGP instance for the L3 VNI, and then another BGP VRF instance is created.
Fix this by checking whether a BGP instance already exists. If it is present, and in auto mode or in hidden mode, then override the AS value.
Fixes: f153b9a9b636 ("bgpd: Ignore auto created VRF BGP instances")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-02-05 | Revert "bgpd: fix duplicate BGP instance created with unified config" | Donatas Abraitis
This reverts commit aba588dd09aa098a88ba1355798c0e784e91ebc8.
2025-02-04 | bgpd: fix add label support to EVPN AD routes | Philippe Guibert
When peering with an EVPN device from another vendor, FRR acting as a route reflector is able neither to read nor to transmit the label value. EVPN AD routes completely ignore the label value in the code, whereas some functionalities, such as evpn-vpws, are allowed to carry and propagate the label value.
Fix this by handling the label value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-02-04 | bgpd: Do not start BGP session if BGP identifier is not set | Donatas Abraitis
If we have IPv6-only network and no IPv4 addresses at all, then by default 0.0.0.0 is created which is treated as malformed according to RFC 6286. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
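A minimal sketch of the check implied above, assuming the identifier is held as a `struct in_addr`. The function names are hypothetical; the actual bgpd code path is not shown in the commit message.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* An all-zero BGP identifier (0.0.0.0) is treated as "not set". */
static bool bgp_identifier_is_set(struct in_addr router_id)
{
    return router_id.s_addr != htonl(INADDR_ANY);
}

/* Hypothetical gate: refuse to start a session without a usable identifier. */
static bool session_may_start(struct in_addr router_id)
{
    return bgp_identifier_is_set(router_id);
}
```

On an IPv6-only host with no explicit router-id, the gate keeps the session down instead of sending an OPEN that the peer would reject.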
2025-02-04 | Merge pull request #17998 from FRRouting/mergify/bp/stable/10.1/pr-17992 | Jafar Al-Gharaibeh
bgpd: fix route-distinguisher in vrf leak json cmd (backport #17992)
2025-02-04 | bgpd: fix route-distinguisher in vrf leak json cmd | Chirag Shah
For an auto-configured RD, the RD value comes as NULL; switching back to the original change covers both auto and user-configured RD values in JSON.
tor-11# show bgp vrf blue ipv4 unicast route-leak json
{
  "vrf":"blue",
  "afiSafi":"ipv4Unicast",
  "importFromVrfs":[
    "purple"
  ],
  "importRts":"10.10.3.11:6",
  "exportToVrfs":[
    "purple"
  ],
  "routeDistinguisher":"(null)", <<<<<
  "exportRts":"10.10.3.11:10"
}
Signed-off-by: Chirag Shah <chirag@nvidia.com>
(cherry picked from commit 892704d07f5286464728720648ad392b485a9966)
2025-02-04 | bgpd: Do not ignore auto generated VRF instances when deleting | Donatas Abraitis
When VRF instance is going to be deleted inside bgp_vrf_disable(), it uses a helper method that skips auto created VRF instances and that leads to STALE issue. When creating a VNI for a particular VRF vrfX with e.g. `advertise-all-vni`, auto VRF instance is created, and then we do `router bgp ASN vrf vrfX`. But when we do a reload bgp_vrf_disable() is called, and we miss previously created auto instance. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-02-04 | bgpd: fix import vrf creates multiple bgp instances | Philippe Guibert
The more often vrf green is referenced in the import bgp command, the more instances are created. In the configuration below, vrf green is referenced twice, and two BGP instances of vrf green are created.
> router bgp 99
> [..]
> import vrf green
> exit
> router bgp 99 vrf blue
> [..]
> import vrf green
> exit
> router bgp 99 vrf green
> [..]
> exit
>
> r4# show bgp vrfs
> Type Id routerId #PeersCfg #PeersEstb Name
> L3-VNI RouterMAC Interface
> DFLT 0 10.0.3.4 0 0 default
> 0 00:00:00:00:00:00 unknown
> VRF 5 10.0.40.4 0 0 blue
> 0 00:00:00:00:00:00 unknown
> VRF 6 0.0.0.0 0 0 green
> 0 00:00:00:00:00:00 unknown
> VRF 6 10.0.94.4 0 0 green
> 0 00:00:00:00:00:00 unknown
Fix this in the import command by looking for an already present bgp instance.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-02-04 | bgpd: fix duplicate BGP instance created with unified config | Philippe Guibert
When running the bgp_evpn_rt5 setup with unified config, a memory leak of a BGP instance that is never deleted occurs.
> root@ubuntu2204hwe:~/frr/tests/topotests/bgp_evpn_rt5# cat /tmp/topotests/bgp_evpn_rt5.test_bgp_evpn/r1.asan.bgpd.1164105
>
> =================================================================
> ==1164105==ERROR: LeakSanitizer: detected memory leaks
>
> Indirect leak of 12496 byte(s) in 1 object(s) allocated from:
> #0 0x7f358eeb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> #1 0x7f358e877233 in qcalloc lib/memory.c:106
> #2 0x55d06c95680a in bgp_create bgpd/bgpd.c:3405
> #3 0x55d06c95a7b3 in bgp_get bgpd/bgpd.c:3805
> #4 0x55d06c87a9b5 in bgp_get_vty bgpd/bgp_vty.c:603
> #5 0x55d06c68dc71 in bgp_evpn_local_l3vni_add bgpd/bgp_evpn.c:7032
> #6 0x55d06c92989b in bgp_zebra_process_local_l3vni bgpd/bgp_zebra.c:3204
> #7 0x7f358e9e3feb in zclient_read lib/zclient.c:4626
> #8 0x7f358e98082d in event_call lib/event.c:1996
> #9 0x7f358e848931 in frr_run lib/libfrr.c:1232
> #10 0x55d06c60eae1 in main bgpd/bgp_main.c:557
> #11 0x7f358e229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Actually, a BGP VRF instance is created in auto mode when creating the global BGP instance for the L3 VNI, and then another BGP VRF instance is created.
Fix this by checking whether a BGP instance already exists. If it is present, and in auto mode or in hidden mode, then override the AS value.
Fixes: f153b9a9b636 ("bgpd: Ignore auto created VRF BGP instances")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-02-01 | bgpd: With suppress-fib-pending ensure withdrawal is sent | Donald Sharp
When you have suppress-fib-pending turned on it is possible to end up in a situation where the prefix is not withdrawn from downstream peers.
Here is the timing that I believe is happening:
a) have 2 paths to a peer.
b) receive a withdrawal from 1 path, set BGP_NODE_FIB_INSTALL_PENDING and send the route install to zebra.
c) receive a withdrawal from the other path.
d) At this point we have a dest->flags set BGP_NODE_FIB_INSTALL_PENDING, old_select the path_info going away, new_select is NULL.
e) A bit further down we call group_announce_route() which calls the code to see if we should advertise the path. It sees the BGP_NODE_FIB_INSTALL_PENDING flag and says, nope.
f) the route is sent to zebra to withdraw, which unsets the BGP_NODE_FIB_INSTALL_PENDING.
g) This function winds up and deletes the path_info. Dest now has no path infos.
h) BGP receives the route install (from step b) and unsets the BGP_NODE_FIB_INSTALL_PENDING flag.
i) BGP receives the route removed from zebra (from step f) and unsets the flag again.
We know if there is no new_select, let's go ahead and just unset the PENDING flag to allow the withdrawal to go out at the time when the second withdrawal is received.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 4e8eda74ec7d30ba84e7f53f077f4b896728505a)
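The decision in the last paragraph can be sketched as a small state update. This is an illustration under assumptions: the struct and function names are hypothetical, and only the flag name echoes the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NODE_FIB_INSTALL_PENDING (1u << 0)

/* Hypothetical, simplified stand-ins for bgp_dest and bgp_path_info. */
struct dest_sketch { unsigned int flags; };
struct path_sketch { int id; };

/* Advertisement (including a withdrawal) is held back while install is pending. */
static bool may_announce(const struct dest_sketch *d)
{
    return !(d->flags & NODE_FIB_INSTALL_PENDING);
}

/* Returns true when an update or withdrawal may be sent to peers now. */
static bool process_selection(struct dest_sketch *d,
                              const struct path_sketch *old_select,
                              const struct path_sketch *new_select)
{
    (void)old_select;
    /* With no new best path there is nothing left to install, so there is
     * nothing to wait for: clear the flag so the withdrawal is not suppressed. */
    if (new_select == NULL)
        d->flags &= ~NODE_FIB_INSTALL_PENDING;
    return may_announce(d);
}
```

In the timeline above, step e) is where the flag would otherwise still be set; clearing it when `new_select` is NULL lets the withdrawal go out at that point.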
2025-01-28 | Revert "bgpd: Handle Addpath capability using dynamic capabilities" | Donatas Abraitis
This reverts commit 05cf9d03b345393b8d63ffe9345c42debd8362b6.
TL;DR: Handling the BGP AddPath capability dynamically is not trivial (or even possible).
When the sender is AddPath-capable and sends NLRIs encoded with an AddPath ID, and at the same time the receiver sends the AddPath capability "disable-addpath-rx" (flag update) via dynamic capabilities, both peers are out of sync about the AddPath state. The receiver already considers itself no longer AddPath-capable, so it tries to parse NLRIs as non-AddPath, while they are actually encoded as AddPath.
The AddPath capability itself (per the RFC) provides no backward-compatible mechanism for handling NLRIs that arrive mixed (AddPath + non-AddPath).
This explains why we have failures in our CI periodically.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
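The desynchronisation can be made concrete with a small parser sketch. It assumes the usual IPv4 unicast NLRI layout (length octet followed by prefix octets) with a 4-octet path identifier in front when AddPath is in use; the function name is hypothetical and this is not bgpd's parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parse one IPv4 NLRI from p. Returns the number of bytes consumed,
 * or -1 if the input is truncated or the prefix length is invalid. */
static int nlri_parse_ipv4(const uint8_t *p, size_t len, bool addpath,
                           uint32_t *path_id, uint8_t *plen, uint8_t prefix[4])
{
    size_t off = 0;
    size_t nbytes;

    *path_id = 0;
    if (addpath) {
        if (len < 4)
            return -1;
        *path_id = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        off = 4;
    }
    if (len - off < 1)
        return -1;
    *plen = p[off++];
    if (*plen > 32)
        return -1;
    nbytes = ((size_t)*plen + 7) / 8;
    if (len - off < nbytes)
        return -1;
    memset(prefix, 0, 4);
    memcpy(prefix, p + off, nbytes);
    return (int)(off + nbytes);
}
```

Given the bytes `00 00 00 01 08 0a`, parsing with `addpath` set yields path ID 1 and 10.0.0.0/8. Parsing the same bytes with `addpath` cleared consumes a single byte and reads it as a /0 prefix, so every following NLRI in the message is misaligned.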
2025-01-24 | bgpd: Fix wrong pthread event cancelling | Donald Sharp
0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
1 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
2 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3 0x000076e399e42476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4 0x000076e39a34f950 in core_handler (signo=6, siginfo=0x76e3985fca30, context=0x76e3985fc900) at lib/sigevent.c:258
5 <signal handler called>
6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
7 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
8 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9 0x000076e399e42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x000076e399e287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x000076e39a39874b in _zlog_assert_failed (xref=0x76e39a46cca0 <_xref.27>, extra=0x0) at lib/zlog.c:789
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
13 0x000076e39a369ef6 in event_cancel_event_ready (m=0x5eda32df5e40, arg=0x5eda33afeed0) at lib/event.c:1470
14 0x00005eda0a94a5b3 in bgp_stop (connection=0x5eda33afeed0) at bgpd/bgp_fsm.c:1355
15 0x00005eda0a94b4ae in bgp_stop_with_notify (connection=0x5eda33afeed0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1610
16 0x00005eda0a979498 in bgp_packet_add (connection=0x5eda33afeed0, peer=0x5eda33b11800, s=0x76e3880daf90) at bgpd/bgp_packet.c:152
17 0x00005eda0a97a80f in bgp_keepalive_send (peer=0x5eda33b11800) at bgpd/bgp_packet.c:639
18 0x00005eda0a9511fd in peer_process (hb=0x5eda33c9ab80, arg=0x76e3985ffaf0) at bgpd/bgp_keepalives.c:111
19 0x000076e39a2cd8e6 in hash_iterate (hash=0x76e388000be0, func=0x5eda0a95105e <peer_process>, arg=0x76e3985ffaf0) at lib/hash.c:252
20 0x00005eda0a951679 in bgp_keepalives_start (arg=0x5eda3306af80) at bgpd/bgp_keepalives.c:214
21 0x000076e39a2c9932 in frr_pthread_inner (arg=0x5eda3306af80) at lib/frr_pthread.c:180
22 0x000076e399e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
23 0x000076e399f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) f 12
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
1428 assert(m->owner == pthread_self());
In this decode the attempt to cancel the connection's events from the wrong thread is causing the crash. Modify the code to create an event on the bm->master to cancel the events for the connection.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-01-24 | bgpd: Fix deadlock in bgp_keepalive and master pthreads | Donald Sharp
(gdb) bt
0 futex_wait (private=0, expected=2, futex_word=0x5c438e9a98d8) at ../sysdeps/nptl/futex-internal.h:146
1 __GI___lll_lock_wait (futex=futex@entry=0x5c438e9a98d8, private=0) at ./nptl/lowlevellock.c:49
2 0x00007af16d698002 in lll_mutex_lock_optimized (mutex=0x5c438e9a98d8) at ./nptl/pthread_mutex_lock.c:48
3 ___pthread_mutex_lock (mutex=0x5c438e9a98d8) at ./nptl/pthread_mutex_lock.c:93
4 0x00005c4369c17e70 in _frr_mtx_lock (mutex=0x5c438e9a98d8, func=0x5c4369dc2750 <__func__.265> "bgp_notify_send_internal") at ./lib/frr_pthread.h:258
5 0x00005c4369c1a07a in bgp_notify_send_internal (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000', data=0x0, datalen=0, use_curr=true) at bgpd/bgp_packet.c:928
6 0x00005c4369c1a707 in bgp_notify_send (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_packet.c:1069
7 0x00005c4369bea422 in bgp_stop_with_notify (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1597
8 0x00005c4369c18480 in bgp_packet_add (connection=0x5c438e9a98c0, peer=0x5c438e9b6010, s=0x7af15c06bf70) at bgpd/bgp_packet.c:151
9 0x00005c4369c19816 in bgp_keepalive_send (peer=0x5c438e9b6010) at bgpd/bgp_packet.c:639
10 0x00005c4369bf01fd in peer_process (hb=0x5c438ed05520, arg=0x7af16bdffaf0) at bgpd/bgp_keepalives.c:111
11 0x00007af16dacd8e6 in hash_iterate (hash=0x7af15c000be0, func=0x5c4369bf005e <peer_process>, arg=0x7af16bdffaf0) at lib/hash.c:252
12 0x00005c4369bf0679 in bgp_keepalives_start (arg=0x5c438e0db110) at bgpd/bgp_keepalives.c:214
13 0x00007af16dac9932 in frr_pthread_inner (arg=0x5c438e0db110) at lib/frr_pthread.c:180
14 0x00007af16d694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
15 0x00007af16d726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
The bgp keepalive pthread deadlocks with itself, and consequently the bgp master pthread blocks when it attempts to lock the peerhash_mtx, since that mutex is also held by the keepalive pthread.
The keepalive pthread locks the peerhash_mtx in bgp_keepalives_start. Next, the connection->io_mtx mutex is locked in bgp_keepalive_send, and when it notices a problem it invokes bgp_stop_with_notify, which relocks the same mutex (and of course the relock causes it to get stuck on itself). This generates a deadlock condition.
Modify the code to hold the connection->io_mtx for as short a time as possible.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
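The "hold the mutex as briefly as possible" pattern can be sketched with plain pthreads. The struct and function names are hypothetical simplifications; only the `io_mtx` field name echoes the commit message.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified connection with a non-recursive I/O mutex. */
struct conn_sketch {
    pthread_mutex_t io_mtx;
    int obuf_count;
    int stopped;
};

/* Takes io_mtx itself, so it must never be called with io_mtx held. */
static void stop_with_notify_sketch(struct conn_sketch *c)
{
    pthread_mutex_lock(&c->io_mtx);
    c->stopped = 1;
    pthread_mutex_unlock(&c->io_mtx);
}

/* Queue one packet; stop the session if the queue exceeds limit. */
static void packet_add_sketch(struct conn_sketch *c, int limit)
{
    int over_limit;

    pthread_mutex_lock(&c->io_mtx);
    c->obuf_count++;
    over_limit = c->obuf_count > limit;
    pthread_mutex_unlock(&c->io_mtx); /* released before calling out */

    if (over_limit)
        stop_with_notify_sketch(c);
}
```

The decision is captured in a local variable while the lock is held, and the call that needs the same lock happens only after it has been released, so the thread never tries to relock a mutex it already owns.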
2025-01-21 | bgpd: Fix for local interface MAC cache issue in 'bgp mac hash' table | Krishnasamy R
Issue: During FRR restart, we fail to add some of the local interfaces' MACs to the 'bgp mac hash'. Not having the local MAC in the hash table can cause lookup issues while receiving EVPN RT-2. Currently, we have code to add the local MAC (bgp_mac_add_mac_entry) while handling interface add/up events in BGP (bgp_ifp_up/bgp_ifp_create). But the 'bgp_mac_add_mac_entry' call in bgp_ifp_create is not invoked, as it is placed under a specific check (vrf->bgp link check).
Fix: We can skip this 'vrf->bgp link existence' check, as the tenant VRF might not have a BGP instance but we still want to cache the tenant VRF local MACs. This keeps the check in bgp_ifp_create in line with bgp_ifp_up.
Ticket: #4204154
Signed-off-by: Krishnasamy R <krishnasamyr@nvidia.com>
(cherry picked from commit 016528364e686fb3b23a688707bd6ae6c5ea5f41)
2025-01-14 | bgpd: fix memory leak in bgp_aggregate_install() | Enke Chen
Potential memory leak with as-set and matching-MED-only config. Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org> (cherry picked from commit 94ca6ddfae959a08e84a7a5a070f44ddba70f156)
2025-01-14 | bgpd: apply route-map for aggregate before attribute comparison | Enke Chen
Currently when re-evaluating an aggregate route, the full attribute of the aggregate route is not compared with the existing one in the BGP table. That can result in unnecessary churn (un-install and then install) of the aggregate route when a more specific route is added or deleted, or when the route-map for the aggregate changes. The churn would impact route installation and route advertisement.
The fix is to apply the route-map for the aggregate first and then compare the attribute.
Here is an example of the churn:
debug bgp aggregate prefix 5.5.5.0/24
!
route-map set-comm permit 10
 set community 65004:200
!
router bgp 65001
 address-family ipv4 unicast
  redistribute static
  aggregate-address 5.5.5.0/24 route-map set-comm
!
Step 1:
ip route 5.5.5.1/32 Null0
Jan 8 10:28:49 enke-vm1 bgpd[285786]: [J7PXJ-A7YA2] bgp_aggregate_install: aggregate 5.5.5.0/24, count 1
Jan 8 10:28:49 enke-vm1 bgpd[285786]: [Y444T-HEVNG] aggregate 5.5.5.0/24: installed
Step 2:
ip route 5.5.5.2/32 Null0
Jan 8 10:29:03 enke-vm1 bgpd[285786]: [J7PXJ-A7YA2] bgp_aggregate_install: aggregate 5.5.5.0/24, count 2
Jan 8 10:29:03 enke-vm1 bgpd[285786]: [S2EH5-EQSX6] aggregate 5.5.5.0/24: existing, removed
Jan 8 10:29:03 enke-vm1 bgpd[285786]: [Y444T-HEVNG] aggregate 5.5.5.0/24: installed
---
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
(cherry picked from commit 22d95f4ba8444171944eab29e99dfa6087813d6f)
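The ordering described above (apply the route-map, then compare) can be sketched as follows. The attribute struct is a hypothetical two-field simplification, and `rmap_set_comm_sketch` merely mimics the `set community 65004:200` clause from the example configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified attribute set. */
struct attr_sketch { unsigned int med; unsigned int community; };

typedef void (*route_map_fn)(struct attr_sketch *);

/* Stand-in for "route-map set-comm permit 10 / set community 65004:200". */
static void rmap_set_comm_sketch(struct attr_sketch *a)
{
    a->community = (65004u << 16) | 200u;
}

static bool attr_same(const struct attr_sketch *a, const struct attr_sketch *b)
{
    return a->med == b->med && a->community == b->community;
}

/* Apply the route-map to the freshly computed attribute first, then compare
 * the result with what is already installed. Returns true only when the
 * aggregate really changed and must be re-installed. */
static bool aggregate_needs_reinstall(const struct attr_sketch *installed,
                                      struct attr_sketch candidate,
                                      route_map_fn rmap)
{
    if (rmap)
        rmap(&candidate);
    return !attr_same(installed, &candidate);
}
```

Comparing before the route-map runs makes the candidate look different from the installed route on every re-evaluation, which is the remove-then-install pattern visible in the Step 2 log lines.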
2025-01-14 | Revert "bgpd: Reinstall aggregated routes if using route-maps and it was changed" | Enke Chen
This reverts commit ee1986f1b5ae6b94b446b12e1b77cc30d8f5f46d. The fix is incomplete, and is no longer needed with the fix that applies the route-map for an aggregate and then compares the attribute.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
(cherry picked from commit 74c9d89aaf3df1b583de341169c4cb77eaa1b3b4)
2025-01-10 | bgpd: use igpmetric in bgp_aigp_metric_total() | Enke Chen
Use igpmetric from bgp_path_info in bgp_aigp_metric_total() to be consistent with all other cases, e.g., as in bgp_path_info_cmp().
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
(cherry picked from commit b89e66a3bcd5644278f34ec5899b071066e102a1)
2025-01-07 | bgpd: fix a bug in peer_allowas_in_set() | Enke Chen
Fix a bug in peer_allowas_in_set() so that the config takes effect for peer-group members. Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> (cherry picked from commit bcd10177940223d86cbcfbe1818b2a0b29e0552b)
2024-12-23 | Merge pull request #17697 from FRRouting/mergify/bp/stable/10.1/pr-17586 | Donatas Abraitis
bgpd: Validate only affected RPKI prefixes instead of a full RIB (backport #17586)
2024-12-22 | Merge pull request #17679 from FRRouting/mergify/bp/stable/10.1/pr-17675 | Jafar Al-Gharaibeh
bgpd: Fix memory leak when creating BMP connection with a source interface (backport #17675)
2024-12-22bgpd: Fix `enforce-first-as` per peer-group removalDonatas Abraitis
`no neighbor PG enforce-first-as` did not work because the flag was inherited incorrectly by the members of the peer-group. Fixes: 322462920e2a2c8b73191c6eb5157d64cf4a593e ("bgpd: Enable enforce-first-as by default") Closes: https://github.com/FRRouting/frr/issues/17702 Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-20bgpd: Validate only affected RPKI prefixes instead of a full RIBDonatas Abraitis
Before this fix, if the rpki_sync_socket_rtr socket returned EAGAIN, ALL routes in the RIB were revalidated, which consumes a lot of CPU and generates unnecessary traffic, e.g. when using BMP servers. With a full feed it would waste 50-80Mbps. Instead we should try to drain the existing pipe (the other end) and revalidate only the affected prefixes. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org> (cherry picked from commit b0800bfdf04b4fcf48504737ebfe4ba7f05268d3)
2024-12-20bgpd: fix memory leak when reconfiguring a route distinguisherPhilippe Guibert
A memory leak happens when reconfiguring an already configured route distinguisher on an L3VPN BGP instance. Fix this by freeing the previous route distinguisher. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com> (cherry picked from commit 0dd96287dda22b79ef6d7424f4e1a8dc92959f92)
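The pattern behind this kind of fix can be shown in isolation: release the previously stored value before storing the new one. The sketch below is an illustration only; `struct vpn_cfg` and `vpn_cfg_set_rd()` are hypothetical names, and plain `malloc()`/`free()` stand in for whatever allocator bgpd uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a configured RD string; not FRR's structure. */
struct vpn_cfg {
	char *rd_pretty;
};

/*
 * Store a copy of the RD string, freeing the previous copy first so that
 * reconfiguring an already configured RD does not leak it.
 * Returns 0 on success, -1 on allocation failure (old value is kept).
 */
static int vpn_cfg_set_rd(struct vpn_cfg *cfg, const char *rd)
{
	char *copy = malloc(strlen(rd) + 1);

	if (!copy)
		return -1;
	strcpy(copy, rd);

	free(cfg->rd_pretty); /* free(NULL) is a no-op on first configuration */
	cfg->rd_pretty = copy;
	return 0;
}
```

Allocating the new copy before freeing the old one keeps the previous configuration intact if the allocation fails.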
2024-12-19bgpd: Fix memory leak when creating BMP connection with a source interfaceDonatas Abraitis
Testing done with: ``` for x in $(seq 1 100000); do vtysh -c 'conf' -c 'router bgp' -c 'bmp targets test' -c 'bmp connect localhost port 123 min-retry 100 max-retry 100 source-interface lo'; done ``` Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org> (cherry picked from commit 7d19cb59cf5b129f61f3c568899343b3f031f9b4)
2024-12-17bgpd: Fix evpn bestpath calculation when path is not establishedDonald Sharp
If you have a bestpath list that looks something like this:

<local evpn mac route>
<learned from peer out swp60>
<learned from peer out swp57>

and a network event causes the peer out swp60 to leave the established state while the path_info for the destination via swp60 still exists, bestpath currently ends up with this order:

<learned from peer out swp60>
<local evpn mac route>
<learned from peer out swp57>

This causes the local evpn mac route to be deleted in zebra (wrong!). It happens because swp60 is skipped in the bestpath calculation and not considered to be a path, yet it stays at the front of the list. Modify the bestpath calculation so that, when the unsorted_list is assembled, path infos whose peer is not in an established state are also pulled into that list. Signed-off-by: Donald Sharp <sharpd@nvidia.com> (cherry picked from commit 9f88cb56dc0fe7a4ce864f672c6ca420fcd420c2)
2024-12-11bgpd: Fix bgp core with a possible Intf deleteRajasekar Raja
Although the trigger is unknown, a backtrace from internal testing shows that an interface delete can leave the peer's ifp pointer NULL; dereferencing it while installing the route leads to a crash. Skip updating the ifindex in such cases; since the nexthop is then not properly updated, BGP skips sending it to zebra.

BackTrace:
0 0x00007faef05e7ebc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007faef0598fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007faef09900dc in core_handler (signo=11, siginfo=0x7ffdde8cb4b0, context=<optimized out>) at lib/sigevent.c:274
3 <signal handler called>
4 0x00005560aad4b7d8 in update_ipv6nh_for_route_install (api_nh=0x7ffdde8cbe94, is_evpn=false, best_pi=0x5560b21187d0, pi=0x5560b21187d0, ifindex=0, nexthop=0x5560b03cb0dc, nh_bgp=0x5560ace04df0, nh_othervrf=0) at bgpd/bgp_zebra.c:1273
5 bgp_zebra_announce_actual (dest=dest@entry=0x5560afcfa950, info=0x5560b21187d0, bgp=0x5560ace04df0) at bgpd/bgp_zebra.c:1521
6 0x00005560aad4bc85 in bgp_handle_route_announcements_to_zebra (e=<optimized out>) at bgpd/bgp_zebra.c:1896
7 0x00007faef09a1c0d in thread_call (thread=thread@entry=0x7ffdde8d7580) at lib/thread.c:2008
8 0x00007faef095a598 in frr_run (master=0x5560ac7e5190) at lib/libfrr.c:1223
9 0x00005560aac65db6 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:557

(gdb) f 4
4 0x00005560aad4b7d8 in update_ipv6nh_for_route_install (api_nh=0x7ffdde8cbe94, is_evpn=false, best_pi=0x5560b21187d0, pi=0x5560b21187d0, ifindex=0, nexthop=0x5560b03cb0dc, nh_bgp=0x5560ace04df0, nh_othervrf=0) at bgpd/bgp_zebra.c:1273
1273 in bgpd/bgp_zebra.c
(gdb) p pi->peer->ifp
$26 = (struct interface *) 0x0

Ticket: #4203904
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
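A guard of the kind described above might look like the following sketch. The types and the function `nexthop_ifindex_from_peer()` are hypothetical and reduced to the fields needed here; they are not the definitions in bgpd/bgp_zebra.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the interface and peer structures. */
struct iface {
	unsigned int ifindex;
};

struct peer_s {
	struct iface *ifp; /* may become NULL after an interface delete */
};

/*
 * Copy the peer's interface index into *ifindex.
 * Returns false, leaving *ifindex untouched, when the peer has no
 * interface, so the caller can skip announcing the nexthop instead of
 * dereferencing a NULL pointer.
 */
static bool nexthop_ifindex_from_peer(const struct peer_s *peer,
				      unsigned int *ifindex)
{
	if (!peer || !peer->ifp)
		return false;

	*ifindex = peer->ifp->ifindex;
	return true;
}
```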
2024-12-05bgpd: fix unconfigure asdot neighborPhilippe Guibert
The command below is not successful on an existing asdot peer: > no neighbor 10.0.0.2 remote-as 1.1 > % Create the peer-group or interface first Handle the case where the remote-as argument can be an ASNUM. Fixes: 8079a4138d61 ("lib, bgp: add initial support for asdot format") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
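For reference, an asdot value such as `1.1` is the 32-bit AS number written as high 16 bits, a dot, then low 16 bits, so `1.1` equals 65537. The sketch below shows one way to accept both asplain and asdot input. `parse_asn()` is a hypothetical helper written for this note, not the parser used by FRR, and it does not reject reserved values such as AS 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "N" (asplain) or "H.L" (asdot) into a 32-bit AS number. */
static bool parse_asn(const char *s, uint32_t *asn)
{
	char *end;
	unsigned long high, low;

	/* Require a leading digit so strtoul() cannot accept sign or spaces. */
	if (!s || *s < '0' || *s > '9')
		return false;

	errno = 0;
	high = strtoul(s, &end, 10);
	if (errno || high > UINT32_MAX)
		return false;

	if (*end == '\0') { /* asplain */
		*asn = (uint32_t)high;
		return true;
	}

	/* asdot: both halves must fit in 16 bits */
	if (*end != '.' || high > UINT16_MAX)
		return false;

	s = end + 1;
	if (*s < '0' || *s > '9')
		return false;

	errno = 0;
	low = strtoul(s, &end, 10);
	if (errno || *end != '\0' || low > UINT16_MAX)
		return false;

	*asn = ((uint32_t)high << 16) | (uint32_t)low;
	return true;
}
```

A command handler that accepts either notation can then compare the parsed value against the configured remote-as regardless of how the user typed it.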