path: root/bgpd/bgp_mpath.c
Age | Commit message | Author
2024-10-03 | bgpd: Print debug message about reaching maximum allowed multi paths | Donatas Abraitis
Fixes: 421cf856ef86db250a86be01437d0a668b463dcc ("bgpd: Cleanup multipath figuring out in bgp") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-10-02 | bgpd: Remove unused bgp_mp_dmed_deselect function | Donald Sharp
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-01 | bgpd: Remove bgp_path_info_mpath_dequeue | Donald Sharp
This function is not doing any work. Let's remove it. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-01 | bgpd: Cleanup multipath figuring out in bgp | Donald Sharp
Currently bgp multipath has these properties:

a) mp_info may or may not be on a single path, based upon path perturbations in the past.
b) mp_info->count started counting at 0 (meaning 1), because the bestpath path_info was never included in the count.
c) The first mp_info in the list held the multipath data associated with the multipath. As such, if you were at any other node, that data was not filled in.
d) As such, the mp_info's that are not first on the list were basically just pointers to the corresponding bgp_path_info that was in the multipath.
e) On bestpath calculation, a linklist (struct linklist *) of bgp_path_info's was created.
f) This linklist was passed in to a comparison function that took the old mpinfo list, compared it item by item to the linklist, and did magic to figure out how to create a new mp_info list.
g) The old mp_info and the linklist had to be memory managed and freed up.
h) BGP_PATH_MULTIPATH is only set on non-bestpath nodes in the multipath.

This is really complicated. Let's change the algorithm to this:

a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and, if a node has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH. If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up when done with it.
e) Remove the unneeded list management associated with the linklist and the mp_list.

This greatly simplifies multipath computation for bgp and reduces memory load for large scale deployments.

    2 full feeds in work_queue_run prior:
    0 56367.471 1123 50193 493695 50362 493791 0 0 0 TE work_queue_run
    BGP multipath info : 1941844 48 110780992 1941844 110780992

    2 full feeds in work_queue_run after change:
    1 52924.931 1296 40837 465968 41025 487390 0 0 1 TE work_queue_run
    BGP multipath info : 970860 32 38836880 970866 38837120

Approximately 4 seconds of saved cpu time for convergence and ~75 MB less memory at run time.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
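The new algorithm described in the commit above (mark candidates during bestpath, then make a single pass that admits them until the limit is reached) can be sketched roughly as follows. This is a minimal illustration and not the bgpd code: `struct path`, the `PATH_*` flag values, and `mpath_update()` are hypothetical names, and the sketch assumes the bestpath itself also carries the candidate mark and counts toward the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real bgpd flag values differ. */
#define PATH_MULTIPATH     (1u << 0) /* path is part of the ECMP set */
#define PATH_MULTIPATH_NEW (1u << 1) /* candidate marked by the bestpath run */

/* Hypothetical stand-in for a bgp_path_info node. */
struct path {
	uint32_t flags;
	struct path *next;
};

/*
 * Walk the path list once. Every candidate is admitted to the ECMP set
 * until maxpaths is reached; the candidate mark is cleared on all nodes,
 * including the ones that did not fit. Returns the number of admitted paths.
 */
static unsigned int mpath_update(struct path *head, unsigned int maxpaths)
{
	unsigned int count = 0;

	assert(maxpaths >= 1);

	for (struct path *p = head; p != NULL; p = p->next) {
		int candidate = (p->flags & PATH_MULTIPATH_NEW) != 0;

		p->flags &= ~(PATH_MULTIPATH | PATH_MULTIPATH_NEW);
		if (candidate && count < maxpaths) {
			p->flags |= PATH_MULTIPATH;
			count++;
		}
	}
	return count;
}
```

Because the result lives in per-node flags, no temporary list has to be built, compared, or freed, which is the simplification the commit message points to.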
2024-10-01 | bgpd: Ensure mpath data is only on bestpath | Donald Sharp
The mpath data structure has data that is only relevant for the first mpath in the list. It is not being used anywhere else. Let's document that a bit more. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-01 | bgpd: Use CHECK_FLAG to remain consistent for mp_flags | Donald Sharp
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-06-11 | bgpd: fix do not skip paths with same nexthop | Philippe Guibert
Under a setup where two BGP prefixes are available from multiple sources, if one of the two prefixes is recursive over the other BGP prefix, then it will not be considered as multipath. The below output shows the two prefixes 192.0.2.24/32 and 192.0.2.21/32. The 192.0.2.[5,6,8] are the known IP addresses visible from the IGP.

> # show bgp ipv4 192.0.2.24/32
> *>i 192.0.2.24/32 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> # show bgp ipv4 192.0.2.21/32
> *>i 192.0.2.21/32 192.0.2.5 0 100 0 i
> *=i 192.0.2.6 0 100 0 i
> *=i 192.0.2.8 0 100 0 i

The bgp best selection algorithm refuses to consider the paths to '192.0.2.24/32' as multipath, whereas the BGP paths which use the BGP peer as nexthop are considered multipath.

> ... has the same nexthop as the bestpath, skip it ...

Previously, this condition was added to prevent ZEBRA from installing routes with the same nexthop:

> Here you can see the two paths with nexthop 210.2.2.2
> superm-redxp-05# show ip route 2.23.24.192/28
> Routing entry for 2.23.24.192/28
> Known via "bgp", distance 20, metric 0, best
> Last update 00:32:12 ago
> * 210.2.2.2, via swp3
> * 210.2.0.2, via swp1
> * 210.2.1.2, via swp2
> * 210.2.2.2, via swp3
> [..]

But today, ZEBRA knows how to handle it. When receiving incoming routes, nexthop groups are used. At creation, duplicated nexthops are identified, and will not be installed. The below output illustrates the duplicate paths to 172.16.0.200 received by another peer.

> r1# show ip route 172.18.1.100 nexthop-group
> Routing entry for 172.18.1.100/32
> Known via "bgp", distance 200, metric 0, best
> Last update 00:03:03 ago
> Nexthop Group ID: 75757580
> 172.16.0.200 (recursive), weight 1
> * 172.31.0.3, via r1-eth1, label 16055, weight 1
> * 172.31.2.4, via r1-eth2, label 16055, weight 1
> * 172.31.0.3, via r1-eth1, label 16006, weight 1
> * 172.31.2.4, via r1-eth2, label 16006, weight 1
> * 172.31.8.7, via r1-eth4, label 16008, weight 1
> 172.16.0.200 (duplicate nexthop removed) (recursive), weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16055, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16055, weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16006, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16006, weight 1
> 172.31.8.7, via r1-eth4 (duplicate nexthop removed), label 16008, weight 1

Fix this by proposing to let ZEBRA handle this duplicate decision.

Fixes: 7dc9d4e4e360 ("bgp may add multiple path entries with the same nexthop")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-04-22 | bgpd: Include IPv6 extended community into multipath considerations | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-04-22 | bgpd: Convert 32-bit to 64-bit link bandwidth variable (link_bw) | Donatas Abraitis
This is needed to implement and use larger bandwidths rather than limiting only to theoretical 34Gbps max bandwidth. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
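The 34Gbps figure in the commit above can be reproduced with a little arithmetic, assuming the 32-bit variable held the bandwidth in bytes per second (an assumption of this sketch, which the commit message does not spell out). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest rate, in bits per second, that a 32-bit bytes-per-second field can hold. */
static uint64_t u32_max_bits_per_sec(void)
{
	return (uint64_t)UINT32_MAX * 8u;
}

/* True if a rate given in bits per second fits in a 32-bit bytes-per-second field. */
static bool fits_in_u32(uint64_t bits_per_sec)
{
	return bits_per_sec / 8u <= UINT32_MAX;
}
```

Under that assumption the ceiling is 4294967295 * 8 = 34359738360 bit/s, so a 10 Gbps link fits while a 100 Gbps link does not, which is why a 64-bit variable is needed.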
2024-02-22 | bgpd: move mp_nexthop_prefer_global boolean attribute to nh_flags | Louis Scalbert
Move mp_nexthop_prefer_global boolean attribute to nh_flags. It does not currently save memory because of the packing. Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2023-11-13 | bgpd: Used %pBD instead of %pRN | Donald Sharp
Let's use the natural data structure in bgp for the prefix display instead of a bunch of places where we call a translator function. The %pBD does this and actually ensures data is correct. Also fix a few spots in bgp_zebra.c where the cast to a NULL pointer causes the catcher functionality to not work, and fix the resulting crash. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-08-08 | bgpd: bgp_path_info_extra memory optimization | Valerian_He
Even if some of the attributes in bgp_path_info_extra are not used, their memory is still allocated every time, which wastes memory. This commit deletes all unnecessary attributes and changes the optional attributes to pointer storage. Memory will only be allocated when they are actually used. After optimization, extra info related memory is reduced by about half (~400B -> ~200B). Signed-off-by: Valerian_He <1826906282@qq.com>
2023-02-09 | *: auto-convert to SPDX License IDs | David Lamparter
Done with a combination of regex'ing and banging my head against a wall. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2023-01-13 | Revert "Merge pull request #11127 from louis-6wind/bgp-leak" | Donald Sharp
This reverts commit 16aa1809e7c8caad37e8edd4e5aaac4f344bc7d3, reversing changes made to f616e716089b16d9a678846282a6ac5c55e31a56.
2022-12-16 | bgpd: move mp_nexthop_prefer_global boolean attribute to nh_flag | Louis Scalbert
Previous commits have introduced a new 8 bits nh_flag in the attr struct that has increased the memory footprint. Move the mp_nexthop_prefer_global boolean in the attr structure that takes 8 bits to the new nh_flag in order to go back to the previous memory utilization. Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2022-12-02 | bgpd: Do not print cumulated bandwidth prefixed with `u` | Donatas Abraitis
This seems just a mistake, drop `u` prefix. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2022-11-28 | bgpd: Null checking is not needed on failure | Donald Sharp
Memory allocations that fail crash the program. Checking for NULL is not going to do anything. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-05-12 | bgpd: Change single value bitfield to a bool | Donald Sharp
The maxpaths same_clusterlen value was a uint16_t with a single bit being used. No other values are being stored. Let's remove the bitfield and simplify to a bool. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-04-12 | bgpd: Fix styling, drop braces for single statement block | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2022-04-12 | bgpd: Reuse bgp_attr_set_ecommunity() for setting attribute flags | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2022-04-12 | bgpd: Reuse bgp_attr_set_[l]community() for setting attribute flags | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2022-02-25 | bgpd: Reuse get/set helpers for attr->community | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2022-02-10 | bgpd: Use get/set helpers for attr->lcommunity | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2022-02-04 | bgpd: Use bgp_attr_[sg]et_ecommunity for struct ecommunity | Donatas Abraitis
This is extra work before moving attr->ecommunity to the attra_extra struct. Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-11-19 | bgpd: VRF-Lite fix best path selection | Kantesh Mundaragi
Description: Incorrect behavior during best path selection for the imported routes. Imported routes are always treated as eBGP routes.

Change is intended for fixing the issues related to bgp best path selection for leaked routes:

- FRR does ecmp for the imported routes, even without any ecmp related config. If the same prefix is imported from two different VRFs, then we configure the route with ecmp even without any ecmp related config.
- Locally imported routes are preferred over imported eBGP routes. If there is a local route and eBGP learned route for the same prefix, if we import both the routes, imported local route is selected as best path.
- Same route is imported from multiple tenant VRFs, both imported routes point to the same VRF in nexthop.
- When the same route with same nexthop in two different VRFs is imported from those two VRFs, route is not installed as ecmp, even though we had ecmp config.
- During best path selection, while comparing the paths for imported routes, we should correctly refer to the original route i.e. the ultimate path.
- When the same route is imported from multiple VRF, use the correct VRF while installing in the FIB.
- When same route is imported from two different tenant VRFs, while comparing bgp path info as part of bgp best path selection, we should ideally also compare corresponding VRFs.

See-also: https://github.com/FRRouting/frr/files/7169555/FRR.and.Cisco.VRF-Lite.Behaviour.pdf
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
2021-11-12 | bgpd: Add vrf information to best path debugging | Donald Sharp
When debugging issues for routes in multiple VRFs, it would be extremely useful if the debug output showed which VRF we are acting on. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-08-30 | bgpd: Add `neighbor PEER link-bw-encoding-ieee` | Donatas Abraitis
This is to avoid breaking changes between existing deployments of extended community for bandwidth encoding. By default FRR uses uint32 to encode bandwidth, which is not as the draft requires (IEEE floating-point). This switch enables the required encoding per-peer. Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
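The difference between the two encodings mentioned in the commit above can be illustrated with a short sketch. It assumes the value field is four bytes wide and that the platform's `float` is an IEEE 754 single-precision number; the function names are hypothetical and are not taken from bgpd.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain integer encoding: the four bytes hold the number itself. */
static uint32_t bw_encode_uint32(uint32_t bytes_per_sec)
{
	return bytes_per_sec;
}

/*
 * IEEE encoding: the four bytes hold the bit pattern of a single-precision
 * float. memcpy() is used to reinterpret the bits without violating
 * aliasing rules.
 */
static uint32_t bw_encode_ieee(float bytes_per_sec)
{
	uint32_t bits;

	assert(sizeof(bits) == sizeof(bytes_per_sec));
	memcpy(&bits, &bytes_per_sec, sizeof(bits));
	return bits;
}
```

The same bandwidth produces two different 32-bit patterns, so both ends of a session have to agree on the encoding. That is presumably why the switch is per-peer.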
2021-06-29 | bgpd: Avoid more assignments within checks (round 2) | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-03-09 | bgpd: Convert remaining string output to our internal types | Donald Sharp
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-02-09 | *: remove more sprintf() | Quentin Young
Should be just a couple non-development, non-test occurrences of this function left now. Signed-off-by: Quentin Young <qlyoung@qlyoung.net>
2020-11-02 | bgpd: Multipath is always being allocated | Donald Sharp
The multipath arrays are always being allocated, regardless of whether we actually have multipath information for a prefix. This is because the link bandwidth code was always adding the data structure. We should not allocate multipath information unless we actually have multipath information. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2020-07-14 | *: un-split strings across lines | David Lamparter
Remove mid-string line breaks, cf. workflow doc:

  .. [#tool_style_conflicts] For example, lines over 80 characters are allowed for text strings to make it possible to search the code for them: please see `Linux kernel style (breaking long lines and strings) <https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_ and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.

Scripted commit, idempotent to running:

```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```

Signed-off-by: David Lamparter <equinox@diac24.net>
2020-06-23 | bgp: rename bgp_node to bgp_dest | Donald Sharp
This is the bulk part extracted from "bgpd: Convert from `struct bgp_node` to `struct bgp_dest`". It should not result in any functional change. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2020-03-30 | bgpd: Implement options for link bandwidth handling | vivek
Support configurable options to control how link bandwidth is handled by the receiver. The default behavior is to automatically honor the link bandwidths received and use them to perform a weighted ECMP BUT only if all paths in the multipath have associated link bandwidth; if one or more paths do not have link bandwidth, normal ECMP is performed among the multipaths. This behavior is as recommended by https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth.

The additional options available are to (a) completely ignore any link bandwidth (i.e., weighted ECMP is effectively disabled), (b) skip paths in the multipath which do not have link bandwidth and perform weighted ECMP among the other paths (if at least some paths have the bandwidth) or (c) use a default weight (value chosen is 1) for the paths which do not have link bandwidth.

The command syntax is

    bgp bestpath bandwidth <ignore|skip-missing|default-weight-for-missing>

Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
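The four behaviors described in the commit above (the default plus the three options) can be sketched as a small decision function. This is an illustration under stated assumptions, not the bgpd implementation: the enum, the function name, and the convention that a bandwidth of 0 means "no link bandwidth attached" are all hypothetical. The commit message also does not say what `default-weight-for-missing` does when no path has bandwidth; the sketch falls back to regular ECMP in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the default behavior and the three options. */
enum lb_mode {
	LB_DEFAULT,                    /* weighted only if every path has bandwidth */
	LB_IGNORE,                     /* never weighted */
	LB_SKIP_MISSING,               /* drop paths without bandwidth, weight the rest */
	LB_DEFAULT_WEIGHT_FOR_MISSING, /* paths without bandwidth get weight 1 */
};

/*
 * bw[i] == 0 means path i has no link bandwidth (a convention of this sketch).
 * On return, use[i] says whether path i stays in the multipath; the return
 * value says whether weighted ECMP should be performed.
 */
static bool lb_decide(enum lb_mode mode, const uint64_t *bw, size_t n, bool *use)
{
	size_t with_bw = 0;

	assert(bw != NULL && use != NULL);

	for (size_t i = 0; i < n; i++) {
		use[i] = true;
		if (bw[i] != 0)
			with_bw++;
	}

	switch (mode) {
	case LB_IGNORE:
		return false;
	case LB_DEFAULT:
		return n > 0 && with_bw == n;
	case LB_SKIP_MISSING:
		if (with_bw == 0)
			return false; /* nothing to weight: regular ECMP over all paths */
		for (size_t i = 0; i < n; i++)
			use[i] = bw[i] != 0;
		return true;
	case LB_DEFAULT_WEIGHT_FOR_MISSING:
		return with_bw > 0; /* assumption: no bandwidth anywhere means regular ECMP */
	}
	return false;
}
```

With three paths where only the second lacks bandwidth, the default mode yields regular ECMP over all three, `skip-missing` yields weighted ECMP over the first and third, and `default-weight-for-missing` yields weighted ECMP over all three.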
2020-03-30 | bgpd: Additional options for generating link bandwidth | vivek
Implement the code to handle the other route-map options to generate the link bandwidth, namely, to use the cumulative bandwidth or to base this on the number of multipaths. In the latter case, a reference bandwidth is internally chosen - the implementation uses a value of 1 Gbps. These additional options mean that the prefix may need to be advertised if there is a link bandwidth change, which is a new criterion. Define a new path (change) flag to support this and implement the advertisement. Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
2020-03-30 | bgpd: Install multipath routes with weights | vivek
Perform weighted ECMP if the multipaths have link bandwidth. This involves assigning weights to each of the next hops associated with the prefix based on the link bandwidth of the corresponding path as a factor of the total (cumulative) link bandwidth for the prefix. The weight values used are between 1 and 100. Weights are assigned only if all paths in the multipath have link bandwidth, otherwise any bandwidths are ignored and regular ECMP is performed. This is as recommended in https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth A subsequent commit will implement additional (user-configurable) behaviors. Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
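The weight assignment described in the commit above (each path's share of the cumulative bandwidth, mapped to a value between 1 and 100) could look roughly like this. The function name is hypothetical, and the rounding (truncate, then clamp to the 1..100 range) is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map one path's link bandwidth to a weight in 1..100, as a percentage of
 * the cumulative bandwidth of all paths for the prefix.
 */
static uint32_t lb_weight(uint64_t bw, uint64_t total)
{
	uint64_t w;

	assert(bw <= total);

	if (total == 0)
		return 1;

	/* Avoid overflow in bw * 100 for extremely large values. */
	if (bw > UINT64_MAX / 100)
		w = bw / (total / 100);
	else
		w = bw * 100 / total;

	if (w < 1)
		w = 1;   /* a path in the multipath never gets weight 0 */
	if (w > 100)
		w = 100;
	return (uint32_t)w;
}
```

For two paths with bandwidths 100 and 300, the cumulative bandwidth is 400 and the weights come out as 25 and 75.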
2020-03-30 | bgpd: Track link bandwidth during multipath calc | vivek
During multipath update, track the cumulative link bandwidth as well as update flags appropriately. Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
2020-03-30 | bgpd: Add link-bandwidth fields for multipath calc | vivek
Introduce fields in the multipath structure for link bandwidth handling. In the process, the mp_count field is changed to a uint16 as that is the value set anyway. Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
2020-03-26 | bgpd: Convert users of `rn->p` to use accessor function | Donald Sharp
Add new function `bgp_node_get_prefix()` and modify the bgp code base to use it. This is prep work for the struct bgp_dest rework. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2020-03-24 | bgpd: Make bgp_debug_bestpath take a `struct bgp_node` | Donald Sharp
Defer the grabbing of the prefix for as long as is possible. This is a long term rework of how we access the `struct bgp_node` to only use accessor functions. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2020-02-06 | bgpd: Replace bgp_flag_* to [UN]SET/CHECK_FLAG macros | Donatas Abraitis
Most of the code uses macros, thus let's keep the code unified. Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
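For readers unfamiliar with the flag macros named in the commit above, the following sketch shows the general pattern. The macro bodies are written here for illustration and may not match FRR's headers exactly; `PATH_FLAG_EXAMPLE` and `toggle_example()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative definitions of the three flag helpers. */
#define CHECK_FLAG(V, F) ((V) & (F))
#define SET_FLAG(V, F)   ((V) |= (F))
#define UNSET_FLAG(V, F) ((V) &= ~(F))

#define PATH_FLAG_EXAMPLE (1u << 3) /* hypothetical flag bit */

/* Flip one flag, leaving every other bit untouched. */
static uint32_t toggle_example(uint32_t flags)
{
	if (CHECK_FLAG(flags, PATH_FLAG_EXAMPLE))
		UNSET_FLAG(flags, PATH_FLAG_EXAMPLE);
	else
		SET_FLAG(flags, PATH_FLAG_EXAMPLE);
	return flags;
}
```

Using one set of macros everywhere keeps bit manipulation greppable and avoids a second, module-specific family of helpers doing the same thing.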
2020-02-06 | *: Replace s_addr 0 => INADDR_ANY | Donatas Abraitis
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2020-02-03 | *: don't null after XFREE; XFREE does this itself | Quentin Young
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
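The idea behind the commit above is that a free macro can clear the caller's pointer itself, which makes a follow-up `ptr = NULL;` redundant. A minimal sketch of that pattern follows. `SKETCH_XFREE`, `struct holder`, and `holder_reset()` are hypothetical; the sketch uses plain `free()` and leaves out any memory-type bookkeeping a real allocator wrapper might do.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the memory and clear the caller's pointer in one step. */
#define SKETCH_XFREE(ptr)                                                      \
	do {                                                                   \
		free(ptr);                                                     \
		(ptr) = NULL;                                                  \
	} while (0)

/* Hypothetical owner of a heap buffer. */
struct holder {
	char *buf;
};

static void holder_reset(struct holder *h)
{
	assert(h != NULL);
	SKETCH_XFREE(h->buf);
	/* No "h->buf = NULL;" needed here: the macro already did it. */
}
```

Because `free(NULL)` is a no-op, calling `holder_reset()` twice on the same object is harmless.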
2019-12-05 | bgpd: remove bgp_attr_dup | Quentin Young
yeah Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2019-10-25 | bgpd: bgp_path_info_mpath_next only returns values | Donald Sharp
Since we don't set a value from the return of bgp_path_info_mpath_next, it is impossible for this function to do anything. As such, the if statement is dead code as well. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2019-06-06 | lib,bgpd,babeld,ripngd,nhrpd,bfdd: clean up SA warnings | Mark Stapp
Clean up several SA warnings. Signed-off-by: Mark Stapp <mjs@voltanet.io>
2019-02-25 | *: compare pointer types to NULL, not 0 | Quentin Young
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-10-23 | bgpd: Fixing the signature of community_free function | Sri Mohana Singamsetty
community_free, lcommunity_free and ecommunity_free are similar functions. In most places, these three are called together. The signature of community_free is different from the other two functions. Modified the community_free API signature to align with the other two functions. There is no functional impact; this is just to avoid confusion. Testing: manual testing and show commands. Signed-off-by: Sri Mohana Singamsetty <msingamsetty@vmware.com>
2018-10-09 | bgpd: Rename various variable names to something more appropriate | Donald Sharp
ri -> pi
bi -> bpi
info -> path
info -> rmap_path (for routemap applications)
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-10-09 | bgpd: Convert binfo to path | Donald Sharp
Convert the binfo variable to path. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>