|
Clean up variable-shadowing compiler warnings.
Signed-off-by: Mark Stapp <mjs@cisco.com>
|
|
zebra: Fix reinstalling nexthops in NHGs upon interface flaps
|
|
Trigger:
Imagine a route utilizing an NHG with six nexthops (Intf swp1-swp6).
If interfaces swp1-swp4 flap, the NHG remains the same but now only
references two nexthops (swp5-swp6) instead of all six. This behavior
occurs due to how NHGs with recursive nexthops are managed within Zebra.
In the scenario below, NHG 370 has all six nexthops installed in the
kernel. However, Zebra maintains a list of recursive NHGs that NHG 370
references, i.e. Depends: (371), (372), (373), which are not directly
installed in the kernel.
- When an interface comes up, its nexthop and corresponding dependents
are installed.
- These dependents (counterparts to 371-373) are non-recursive and
are installed as well.
- However, when attempting to install the recursive ones in
zebra_nhg_install_kernel(), they resolve to the already installed
counterparts, resulting in a NO-OP.
Fix this by iterating over all dependents of the recursively resolved
NHGs and reinstalling them.
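A minimal self-contained sketch of the approach, using stand-in types
and a hypothetical nhg_install() helper rather than zebra's real
nhg_connected trees:
```
/* Sketch only -- zebra's real code walks its nhg_connected trees
 * in zebra_nhg.c; types and helper names here are stand-ins. */
#include <stddef.h>

struct nhg {
	unsigned int id;
	struct nhg **dependents;	/* NHGs that reference this one */
	size_t num_dependents;
};

/* stand-in for zebra_nhg_install_kernel() */
void nhg_install(struct nhg *nhg);

static void nhg_reinstall_with_dependents(struct nhg *resolved)
{
	/* Installing only the recursively-resolved NHG is a NO-OP
	 * when its counterpart is already in the kernel... */
	nhg_install(resolved);

	/* ...so also walk every NHG that depends on it and reinstall
	 * those, rebuilding groups like NHG 370 with all six nexthops. */
	for (size_t i = 0; i < resolved->num_dependents; i++)
		nhg_install(resolved->dependents[i]);
}
```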
Trigger: Flap swp1 to swp4
Before Fix:
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 370 proto bgp metric 20
ip -d next show
id 337 via 2000:1:0:1:0:f:0:9 dev swp6 scope link proto zebra
id 339 via 2000:1:0:1:0:e:0:9 dev swp5 scope link proto zebra
id 341 via 2000:1:0:1:0:8:0:8 dev swp4 scope link proto zebra
id 343 via 2000:1:0:1:0:7:0:8 dev swp3 scope link proto zebra
id 346 via 2000:1:0:1:0:1:0:7 dev swp2 scope link proto zebra
id 348 via 2000:1:0:1::7 dev swp1 scope link proto zebra
id 370 group 346/348/341/343/337/339 scope global proto zebra
After Trigger:
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 370 proto bgp metric 20
root@leaf-11:mgmt:/var/home/cumulus# ip -d next show
id 337 via 2000:1:0:1:0:f:0:9 dev swp6 scope link proto zebra
id 339 via 2000:1:0:1:0:e:0:9 dev swp5 scope link proto zebra
id 370 group 337/339 scope global proto zebra
After Fix:
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 432 proto bgp metric 20
ip -d next show
id 432 group 395/397/400/402/405/407 scope global proto zebra
After Trigger
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 432 proto bgp metric 20
root@leaf-11:mgmt:/var/home/cumulus# ip -d next show
id 432 group 395/397/400/402/405/407 scope global proto zebra
Ticket :#
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Fix a heap-use-after-free that causes zebra to crash even without the
address sanitizer. To reproduce:
> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l del red
> ip route del blackhole default table 100
Zebra manages routing tables for all existing Linux RT tables,
regardless of whether they are assigned to a VRF interface. When a table
is not assigned to any VRF, zebra arbitrarily assigns it to the default
VRF, even though this is not strictly accurate (the code expects this
behavior).
When an RT table is created after a VRF, zebra correctly assigns the
table to the VRF. However, if a VRF interface is assigned to an existing
RT table, zebra does not update the table owner, which remains as the
default VRF. As a result, existing routing entries remain under the
default VRF, while new entries are correctly assigned to the VRF. The
VRF mismatch is unexpected in the code and creates crashes and memory
related issues.
Furthermore, Linux does not automatically delete RT tables when they are
unassigned from a VRF. It is incorrect to delete these tables from zebra.
Instead, at VRF disabling, do not release the table but reassign it to
the default VRF. At VRF enabling, change the table owner back to the
appropriate VRF.
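A minimal sketch of the ownership hand-off, with stand-in types and
field names (the real logic lives in zebra_vrf_enable() and
zebra_vrf_disable(), which appear in the trace below):
```
/* Sketch only -- stand-in types, not zebra's actual structures. */
typedef unsigned int vrf_id_t;
#define VRF_DEFAULT 0

struct zebra_table {
	unsigned int tableid;
	vrf_id_t owner;		/* VRF currently owning this table */
};

/* VRF disable: the kernel RT table outlives the VRF interface, so
 * do not free it -- hand it back to the default VRF instead. */
static void table_on_vrf_disable(struct zebra_table *t)
{
	t->owner = VRF_DEFAULT;
}

/* VRF enable: an already-existing table must be re-owned by the
 * VRF; otherwise old entries stay under the default VRF and later
 * lookups can free the wrong table (the use-after-free below). */
static void table_on_vrf_enable(struct zebra_table *t, vrf_id_t vrf)
{
	t->owner = vrf;
}
```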
> ==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
> READ of size 1 at 0x606000154f54 thread T0
> #0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
> #1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
> #2 0x7fa32474d783 in route_node_get lib/table.c:283
> #3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
> #4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
> #5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
> #6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
> #7 0x7fa32476689c in event_call lib/event.c:1996
> #8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #9 0x55b0e4e6c32a in main zebra/main.c:526
> #10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
> #11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)
>
> 0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
> freed by thread T0 here:
> #0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
> #1 0x7fa324668d8f in qfree lib/memory.c:130
> #2 0x7fa32474c421 in route_table_free lib/table.c:126
> #3 0x7fa32474bf96 in route_table_finish lib/table.c:46
> #4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
> #5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
> #6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
> #7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
> #8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
> #9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
> #10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
> #11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
> #12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
> #13 0x7fa32476689c in event_call lib/event.c:1996
> #14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #15 0x55b0e4e6c32a in main zebra/main.c:526
> #16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
> #0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> #1 0x7fa324668c4d in qcalloc lib/memory.c:105
> #2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
> #3 0x7fa32474e73c in route_table_init lib/table.c:512
> #4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
> #5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
> #6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
> #7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
> #8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
> #9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
> #10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
> #11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
> #12 0x7fa32476689c in event_call lib/event.c:1996
> #13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #14 0x55b0e4e6c32a in main zebra/main.c:526
> #15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
|
|
Ensure that the nhg hash comparison function includes all
nexthops, including recursive-resolving nexthops.
Signed-off-by: Mark Stapp <mjs@cisco.com>
|
|
Currently in FRR, when installing a nexthop group, the installation
can fail. The code assumed that the current nexthop group was not
already installed. This leaves a problem state where, if the users of
the nexthop group are removed, the nexthop group will be removed,
possibly leaving an orphaned nexthop group in the data plane.
When a nexthop group installation fails, FRR does not actually know
the status of the nexthop group in the kernel. It's possible that an
earlier version of the nexthop group is still in play. It's possible
that there is no nexthop group in the kernel at all. Leaving the
Installed flag alone allows Zebra to remove the nexthop group from
the kernel when it is removed from zebra.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Currently, on an interface down event, Zebra sets the nexthop(s)
that use it as !ACTIVE. On interface up events the singleton
nexthops are not being set back to ACTIVE. Due to event timing it is
sometimes possible to end up with a route that is using a singleton
nexthop that was never marked ACTIVE again.
Change interface up handling to set singleton nexthops to ACTIVE.
This will allow the nexthop to be reinstalled appropriately
as well.
I was able to easily reproduce this using sharpd since
it does not attempt to reinstall the routes when an interface
goes up/down.
Before:
D>* 10.0.0.0/32 [150/0] via 192.168.102.34, dummy2, weight 1, 00:00:01
sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up
D> 10.0.0.0/32 [150/0] (350) via 192.168.102.34, dummy2 inactive, weight 1, 00:00:10
After code change:
D>* 10.0.0.0/32 [150/0] (73) via 192.168.102.34, dummy2, weight 1, 00:00:14
sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up
D>* 10.0.0.0/32 [150/0] (73) via 192.168.102.34, dummy2, weight 1, 00:00:21
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
In some cases, the old_re nhe and the newnhe are the same object, and
there is no point in comparing them since they are the same. Skip the
comparison in such cases (a sketch follows the log below).
Ex:
2025/01/09 23:49:27.489020 ZEBRA: [W4Z4R-NTSMD] zebra_nhg_rib_find_nhe: => nhe 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489021 ZEBRA: [ZH3FQ-TE9NV] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 new id: 44 old id: 44
2025/01/09 23:49:27.489021 ZEBRA: [YB8HE-Z86GN] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 NEW 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489023 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489024 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489025 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
2025/01/09 23:49:27.489026 ZEBRA: [ZM3BX-HPETZ] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 OLD 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489027 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
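A minimal sketch of the early exit, with an opaque stand-in type (not
zebra's exact code):
```
#include <stdbool.h>

struct nhg_hash_entry;		/* opaque here */

static bool nhe_needs_compare(const struct nhg_hash_entry *old_nhe,
			      const struct nhg_hash_entry *new_nhe)
{
	/* Same pointer means same NHG (id 44 twice in the log above):
	 * skip the per-nexthop comparison entirely. */
	return old_nhe != new_nhe;
}
```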
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
|
|
If you have this series of events:
a) A decision to install an NHG is made in zebra; it is enqueued to
the DPLANE.
b) Changes to the NHG are made and we remove it in the master pthread.
Since this NHG is not marked as installed, it is not removed from the
kernel, but the NHG data structure is deleted.
c) The DPLANE installs the NHG.
In the end the NHG stays installed in the kernel, but zebra has lost
track of it.
Modify the removal code to check whether the NHG is queued. There are
2 cases:
a) The NHG is kept around for a bit before being deleted. In this case,
just note that the NHG is queued and keep it around too.
b) The NHG is not kept around and we are just removing it. In this case,
check whether it is queued and send another deletion event.
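A self-contained sketch of the two cases, with stand-in flags instead
of zebra's CHECK_FLAG() machinery and its queued flag:
```
#include <stdbool.h>

struct nhg {
	bool queued;	/* install request still sitting in the dplane */
	bool installed;
};

void nhg_send_delete(struct nhg *nhg);	/* stand-in for the dplane op */

void nhg_remove(struct nhg *nhg, bool keep_around)
{
	if (keep_around) {
		/* case a: entry is retained for a while; the queued
		 * flag keeps it alive until the dplane answers */
		return;
	}

	/* case b: removing immediately; if an install is still queued,
	 * send another deletion event so the kernel copy is not
	 * orphaned when the queued install lands. */
	if (nhg->queued || nhg->installed)
		nhg_send_delete(nhg);
}
```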
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
During route processing, zebra will create a nexthop group that
matches the nexthops passed down from the routing protocol. Then
zebra will look to see if it can re-use an nhe from a previous
version of the route entry (say, an interface goes down). If zebra
decided to re-use an nhe, it simply dropped the entry created for
the route. This led to nexthop groups that had a refcount of 0, and
in some cases these nexthop groups were installed into the kernel.
Add a bit of code to see if the returned entry is not being used and
has no reference count, and if so, properly dispose of it.
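A sketch of the disposal check, with hypothetical helper names:
```
/* Sketch only -- zebra's real entry carries a refcnt and flags;
 * these helpers are stand-ins for illustration. */
struct nhg_hash_entry;

unsigned int nhe_refcnt(const struct nhg_hash_entry *nhe);	/* stand-in */
void nhe_free(struct nhg_hash_entry *nhe);			/* stand-in */

static void maybe_dispose(struct nhg_hash_entry *reused,
			  struct nhg_hash_entry *created)
{
	/* If we kept the old nhe, the one built for this route is
	 * unreferenced -- free it instead of leaking a refcnt-0
	 * group that could end up installed in the kernel. */
	if (created != reused && nhe_refcnt(created) == 0)
		nhe_free(created);
}
```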
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Upon if_down, we don't reset the valid flag for dependents or unset
the INSTALLED flag.
So when it's time for the NHG to be deleted (routes dereferenced),
zebra deletes it since the refcnt goes to 0, but a stale NHG remains
in the kernel.
Ticket :#4200788
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
|
|
Kernel routes are wrongly selected even when the nexthop interface is
link-down. Use `ip link set dev <interface> down` on the other box to
set this box's nexthop interface link-down. The kernel routes will be
kept as `linkdown` in the kernel, but they still have an active
nexthop in `zebra`.
Add three changes/commits for kernel routes in this PR:
1) An active nexthop must be on an operative interface.
2) Don't uninstall the kernel routes from `zebra` even with no active
nexthops.
(This doesn't affect the kernel routes' deletion via kernel netlink
messages.)
3) Update the kernel routes when the nexthop interface comes back up.
Before (while the nexthop interface is link-down):
```
K>* 3.3.3.3/32 [0/0] via 88.88.88.1, enp2s0, weight 1, 00:00:14
```
After (while the nexthop interface is link-down, with all three changes):
```
K 3.3.3.3/32 [0/0] via 88.88.88.1, enp2s0 inactive, weight 1, 00:00:07
```
This commit is the 1st change:
Improve the judgment of an "active" nexthop to be more accurate: an
active nexthop must be on an operative interface.
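A sketch of the tightened check; if_is_operative() is the real libfrr
helper (admin up and link up), while the wrapper around it is
illustrative:
```
#include <stdbool.h>

struct interface;
extern int if_is_operative(const struct interface *ifp);

static bool nexthop_ifp_usable(const struct interface *ifp)
{
	/* Admin-up alone is not enough: a link-down interface must
	 * not produce an ACTIVE nexthop. */
	return ifp && if_is_operative(ifp);
}
```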
Signed-off-by: anlan_cs <anlan_cs@126.com>
|
|
Currently FRR is limiting the nexthop count to a uint8_t rather than
a uint16_t. This leads to issues when the nexthop count is 256, which
causes the count to overflow to 0, creating problems in the code.
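A tiny self-contained demonstration of the wrap-around:
```
/* A uint8_t nexthop counter wraps to 0 at 256, making a full ECMP
 * group look empty; a uint16_t counts it correctly. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint8_t narrow = 0;
	uint16_t wide = 0;

	for (int i = 0; i < 256; i++) {
		narrow++;
		wide++;
	}
	printf("uint8_t count: %u, uint16_t count: %u\n", narrow, wide);
	/* prints: uint8_t count: 0, uint16_t count: 256 */
	return 0;
}
```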
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
NHG reuse on interface down/up
|
|
zebra_nhg_install_kernel takes a route type. We don't
know it at that particular spot but we should not be passing
in `true`. Let's use ZEBRA_ROUTE_MAX to indicate we do not
know, so that the correct thing is done.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
The previous commit modified zebra to reinstall the singleton
nexthops for a nexthop group when an interface comes back up.
Now let's modify zebra to attempt to reuse the nexthop group
when this happens and the upper level protocol resends the
route down with it. Only match if the protocol, the instance,
and the nexthop groups all match.
Here is the new behavior:
eva(config)# do show ip route 9.9.9.9/32
Routing entry for 9.9.9.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:08 ago
* 192.168.99.33, via dummy1, weight 1
* 192.168.100.33, via dummy2, weight 1
* 192.168.101.33, via dummy3, weight 1
* 192.168.102.33, via dummy4, weight 1
eva(config)# do show ip route nexthop-group 9.9.9.9/32
% Unknown command: do show ip route nexthop-group 9.9.9.9/32
eva(config)# do show ip route 9.9.9.9/32 nexthop-group
Routing entry for 9.9.9.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:54 ago
Nexthop Group ID: 57
* 192.168.99.33, via dummy1, weight 1
* 192.168.100.33, via dummy2, weight 1
* 192.168.101.33, via dummy3, weight 1
* 192.168.102.33, via dummy4, weight 1
eva(config)# exit
eva# conf
eva(config)# int dummy3
eva(config-if)# shut
eva(config-if)# no shut
eva(config-if)# do show ip route 9.9.9.9/32 nexthop-group
Routing entry for 9.9.9.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:08 ago
Nexthop Group ID: 57
* 192.168.99.33, via dummy1, weight 1
* 192.168.100.33, via dummy2, weight 1
* 192.168.101.33, via dummy3, weight 1
* 192.168.102.33, via dummy4, weight 1
eva(config-if)# exit
eva(config)# exit
eva# exit
sharpd@eva ~/frr1 (master) [255]> ip nexthop show id 57
id 57 group 37/43/50/58 proto zebra
sharpd@eva ~/frr1 (master)> ip route show 9.9.9.9/32
9.9.9.9 nhid 57 proto 196 metric 20
nexthop via 192.168.99.33 dev dummy1 weight 1
nexthop via 192.168.100.33 dev dummy2 weight 1
nexthop via 192.168.101.33 dev dummy3 weight 1
nexthop via 192.168.102.33 dev dummy4 weight 1
sharpd@eva ~/frr1 (master)>
Notice that we no longer create a bunch of new
nexthop groups.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
If an interface down event caused a nexthop group to remove
one of its entries in the kernel, have it be reinstalled
when the interface comes back up. Mark the nexthop as
usable.
new behavior:
eva# show nexthop-group rib 181818168
ID: 181818168 (sharp)
RefCnt: 1
Uptime: 00:00:23
VRF: default(bad-value)
Valid, Installed
Depends: (35) (38) (44) (51)
via 192.168.99.33, dummy1 (vrf default), weight 1
via 192.168.100.33, dummy2 (vrf default), weight 1
via 192.168.101.33, dummy3 (vrf default), weight 1
via 192.168.102.33, dummy4 (vrf default), weight 1
eva# conf
eva(config)# int dummy3
eva(config-if)# shut
eva(config-if)# do show nexthop-group rib 181818168
ID: 181818168 (sharp)
RefCnt: 1
Uptime: 00:00:44
VRF: default(bad-value)
Depends: (35) (38) (44) (51)
via 192.168.99.33, dummy1 (vrf default), weight 1
via 192.168.100.33, dummy2 (vrf default), weight 1
via 192.168.101.33, dummy3 (vrf default) inactive, weight 1
via 192.168.102.33, dummy4 (vrf default), weight 1
eva(config-if)# no shut
eva(config-if)# do show nexthop-group rib 181818168
ID: 181818168 (sharp)
RefCnt: 1
Uptime: 00:00:53
VRF: default(bad-value)
Valid, Installed
Depends: (35) (38) (44) (51)
via 192.168.99.33, dummy1 (vrf default), weight 1
via 192.168.100.33, dummy2 (vrf default), weight 1
via 192.168.101.33, dummy3 (vrf default), weight 1
via 192.168.102.33, dummy4 (vrf default), weight 1
eva(config-if)# exit
eva(config)# exit
eva# exit
sharpd@eva ~/frr1 (master) [255]> ip nexthop show id 181818168
id 181818168 group 35/38/44/51 proto 194
sharpd@eva ~/frr1 (master)>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
The current code, when a link is set down, just marks the nexthop
group as not properly set up. This leaves situations where, when an
interface goes down and show output is requested, we see incorrect
state. The same is true for anything that checks those flags at that
point in time.
Modify the interface down nexthop group code to mark the nexthops
appropriately (and I mean set the appropriate flags) and to allow a
`show ip route` command to actually display what is going on with
the nexthops.
eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
Known via "sharp", distance 150, metric 0, best
Last update 00:00:06 ago
* 192.168.44.33, via dummy1, weight 1
* 192.168.45.33, via dummy2, weight 1
sharpd@eva:~/frr1$ sudo ip link set dummy2 down
eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
Known via "sharp", distance 150, metric 0, best
Last update 00:00:12 ago
* 192.168.44.33, via dummy1, weight 1
192.168.45.33, via dummy2 inactive, weight 1
Notice that the 1.0.0.0/32 route now correctly
displays the state of its nexthop group entry.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Currently the FRR code will receive both kernel and
connected routes that do not actually have an underlying
nexthop group at all. Zebra turns around and creates
a `matching` nexthop hash entry and installs it.
For connected routes, this will create 2 singleton
nexthops in the dplane per interface (v4 and v6).
For kernel routes it would just create 1 singleton
nexthop that might be used or not.
This is bad because the dplane has a limited amount
of space available for nexthop entries and if you
happen to have a large number of interfaces then
all of a sudden you have 2x(# of interfaces) singleton
nexthops.
Let's modify the code to delay creation of these singleton
nexthops until they have been used by something else in the
system.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
A blackhole nexthop, according to the Linux kernel,
can be v4 or v6. A v4 blackhole nexthop cannot be
used on a v6 route, but a v6 blackhole nexthop can
be used with a v4 route. Convert all blackhole
singleton nexthops to v6 and just use that,
possibly reducing the number of active nexthops by one.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
zebra: be consistent about v6 nexthops for v4 routes
|
|
Currently, when FRR has two nexthop groups:
A
nexthop 1 weight 5
nexthop 2 weight 6
nexthop 3 weight 7
B
nexthop 1 weight 3
nexthop 2 weight 4
nexthop 3 weight 5
We end up with 5 singleton nexthops and two groups:
ID: 181818168 (sharp)
RefCnt: 1
Uptime: 00:04:52
VRF: default
Valid, Installed
Depends: (69) (70) (71)
via 192.168.119.1, enp13s0 (vrf default), weight 182
via 192.168.119.2, enp13s0 (vrf default), weight 218
via 192.168.119.3, enp13s0 (vrf default), weight 255
ID: 181818169 (sharp)
RefCnt: 1
Uptime: 00:02:08
VRF: default
Valid, Installed
Depends: (71) (127) (128)
via 192.168.119.1, enp13s0 (vrf default), weight 127
via 192.168.119.2, enp13s0 (vrf default), weight 170
via 192.168.119.3, enp13s0 (vrf default), weight 255
id 69 via 192.168.119.1 dev enp13s0 scope link proto 194
id 70 via 192.168.119.2 dev enp13s0 scope link proto 194
id 71 via 192.168.119.3 dev enp13s0 scope link proto 194
id 127 via 192.168.119.1 dev enp13s0 scope link proto 194
id 128 via 192.168.119.2 dev enp13s0 scope link proto 194
id 181818168 group 69,182/70,218/71,255 proto 194
id 181818169 group 71,255/127,127/128,170 proto 194
This is not a desirable state to be in. If you have a link flapping
in the network and weights are changing rapidly, you end up with a
large number of singleton nexthops being used by the nexthop groups.
This fills up ASIC space and clutters the table. Additionally,
singleton nexthops cannot carry any weight, and attempting to create
singleton nexthops with different weights means nothing to the Linux
kernel (or any ASIC dplane). Let's modify the code to always create
the singleton nexthops without a weight and then just create the
NHGs that use the singletons with the appropriate weight (a sketch
of the weight scaling follows the output below).
ID: 181818168 (sharp)
RefCnt: 1
Uptime: 00:00:32
VRF: default
Valid, Installed
Depends: (22) (24) (28)
via 192.168.119.1, enp13s0 (vrf default), weight 182
via 192.168.119.2, enp13s0 (vrf default), weight 218
via 192.168.119.3, enp13s0 (vrf default), weight 255
ID: 181818169 (sharp)
RefCnt: 1
Uptime: 00:00:14
VRF: default
Valid, Installed
Depends: (22) (24) (28)
via 192.168.119.1, enp13s0 (vrf default), weight 153
via 192.168.119.2, enp13s0 (vrf default), weight 204
via 192.168.119.3, enp13s0 (vrf default), weight 255
id 22 via 192.168.119.1 dev enp13s0 scope link proto 194
id 24 via 192.168.119.2 dev enp13s0 scope link proto 194
id 28 via 192.168.119.3 dev enp13s0 scope link proto 194
id 181818168 group 22,182/24,218/28,255 proto 194
id 181818169 group 22,153/24,204/28,255 proto 194
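The installed weights above are consistent with normalizing each
configured weight against the group's maximum so the largest becomes
255. A sketch of that inference (not the verbatim zebra code):
```
#include <stdint.h>

/* Scale a configured weight so the group's largest weight maps to
 * 255, matching the output above: 5/6/7 -> 182/218/255 and
 * 3/4/5 -> 153/204/255. */
static uint8_t normalize_weight(uint32_t weight, uint32_t max_weight)
{
	return (uint8_t)((weight * 255u) / max_weight);
}
```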
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
The function zebra_nhg_hash_equal is only used
as a hash function for storage of NHG's and retrieval.
If you have say two nhg's:
31 (25/26)
32 (25/26)
This function would return them as being equal, which of course
leads to a problem when you attempt to hash_release 32 but actually
release 31 from the hash. Later, when you attempt to do hash
comparisons, 32 has actually been freed, leading to use-after-free
situations, and shit goes downhill fast.
This hash is only used as part of the hash comparison function for
nexthop group storage. Since that is so, let's always report that
the 31/32 nhg's are not equal at all.
We possibly have a different problem where we are creating 31 and 32
(when 31 should have just been used instead of 32), but we need to
prevent any type of hash release problem at all. This supersedes any
other issue (which should be tracked down on its own), since a
use-after-free situation that leads to a crash is far worse than
some possible nexthop group duplication, which is very minor in
comparison.
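A sketch of the idea, with a simplified entry type (zebra's real
comparison walks more state than this):
```
#include <stdbool.h>
#include <stdint.h>

struct nhe {
	uint32_t id;
	/* ... nexthops elided ... */
};

static bool nhe_hash_equal(const struct nhe *a, const struct nhe *b)
{
	/* Entries with different IDs must never compare equal, even
	 * when their nexthops match: 31 (25/26) != 32 (25/26). */
	if (a->id != b->id)
		return false;

	/* ... existing nexthop-by-nexthop comparison ... */
	return true;
}
```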
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Treat TYPE_IPV6 and TYPE_IPV6_IFINDEX nexthops the same way when
processing v4 (RFC 5549) routes.
Signed-off-by: Mark Stapp <mjs@cisco.com>
|
|
If using weighted ECMP, the weight of a non-recursive next-hop should
be inherited from the recursive next-hop.
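A minimal sketch of the inheritance, using a stand-in nexthop type:
```
/* Stand-in type -- not zebra's struct nexthop. */
struct nexthop {
	unsigned char weight;
};

/* When a recursive nexthop is resolved, copy its configured weight
 * onto the resolved, non-recursive nexthop: the kernel only ever
 * sees the resolved one. */
static void inherit_weight(const struct nexthop *recursive,
			   struct nexthop *resolved)
{
	resolved->weight = recursive->weight;
}
```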
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
|
|
In zebra_interface_nhg_reinstall zebra is checking that the
nhg is a singleton and not a blackhole nhg. This was originally
done by checking that the nexthop is a NEXTHOP_TYPE_IFINDEX,
NEXTHOP_TYPE_IPV4_IFINDEX or NEXTHOP_TYPE_IPV6_IFINDEX. This
excluded NEXTHOP_TYPE_IPV4 and NEXTHOP_TYPE_IPV6, both of which
can be received and maintained from the upper level protocol
when a route is being recursively resolved.
If we have gotten to this point in zebra_interface_nhg_reinstall,
the nexthop group has already been installed at least once
and we *know* that it is actually a valid nexthop. What the
test is really trying to do is ensure that we are not reinstalling
a blackhole nexthop group (which is not even possible to be
here, by the way, but safety first!). So let's test for that
instead.
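A sketch of the replacement test; NEXTHOP_TYPE_BLACKHOLE is the real
lib/nexthop.h value, the rest is simplified:
```
#include <stdbool.h>

enum nexthop_types_t {
	NEXTHOP_TYPE_BLACKHOLE = 6,	/* other types elided */
};

struct nexthop {
	enum nexthop_types_t type;
};

/* Instead of whitelisting the three *_IFINDEX types (which wrongly
 * excluded NEXTHOP_TYPE_IPV4/IPV6), reject only what must never be
 * reinstalled here: a blackhole. */
static bool nexthop_reinstallable(const struct nexthop *nh)
{
	return nh->type != NEXTHOP_TYPE_BLACKHOLE;
}
```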
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
These changes improve code maintainability and readability.
Signed-off-by: sri-mohan1 <sri.mohan@samsung.com>
|
|
The current code is unsetting the fact that the
NHG is installed. It is installed, but we are
reinstalling it. Let's note this in the code
appropriately as REINSTALL and not remove the
INSTALLED flag.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
These functions provided a level of abstraction that forced
us to call multiple functions when a simple data structure
change was all that is needed. Let's consolidate down
and make things a bit simpler.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
The nexthop group should be marked as valid/invalid and then
installed, not installed and then marked valid.
This removes a bit of code that might have been covering
up other problems that need to be sorted out.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Convert the dplane results function for nhg's over to
using a switch for the result enum. Let's specifically
call out the unexpected state and also set the nexthop
group as not installed when installation fails.
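A sketch of the switch-based handling with simplified names (zebra's
real enum is enum zebra_dplane_result):
```
enum dplane_result {
	DPLANE_REQUEST_QUEUED,
	DPLANE_REQUEST_SUCCESS,
	DPLANE_REQUEST_FAILURE,
};

struct nhg {
	unsigned int installed : 1;
};

static void nhg_dplane_result(struct nhg *nhg, enum dplane_result res)
{
	switch (res) {
	case DPLANE_REQUEST_SUCCESS:
		nhg->installed = 1;
		break;
	case DPLANE_REQUEST_FAILURE:
		/* installation failed: make our state reflect it */
		nhg->installed = 0;
		break;
	case DPLANE_REQUEST_QUEUED:
		/* unexpected at result time -- call it out explicitly
		 * instead of silently falling through */
		break;
	}
}
```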
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Basically the same function two times. Let's consolidate.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
When installing a NHG via dplane_nexthop_add, it can only return
REQUEST_QUEUED or REQUEST_FAILURE. There is no way SUCCESS can
be returned with the way the dplane works at this point in time.
Remove the code that attempts to set the NHE state appropriately
as it is impossible.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
This is not complicated code: if zebra is allocating a new one,
zebra does not need to inform the operator about the process
during debugs.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
When debugging NHG detail there is a whole bunch
of lines surrounding the nexthop group. Let's
clean these up since they are extremely chatty and
spawn several lines.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
There is a goto statement that would be better served
with a break statement. Let's try to minimize this
in the code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
*: Introduce Local Host Routes to FRR
|
|
Display a specific log message when the rt_nhe parameter is not
set at all.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
|
|
Replace several switch blocks that contain every dplane opcode
with simpler sets of if()s. In these cases the code only
uses a couple of opcodes.
Signed-off-by: Mark Stapp <mjs@labn.net>
|
|
Create Local routes in FRR:
S 0.0.0.0/0 [1/0] via 192.168.119.1, enp39s0, weight 1, 00:03:46
K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:03:51
O 192.168.119.0/24 [110/100] is directly connected, enp39s0, weight 1, 00:03:46
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:03:51
L>* 192.168.119.224/32 is directly connected, enp39s0, 00:03:51
O 192.168.119.229/32 [110/100] via 0.0.0.0, enp39s0 inactive, weight 1, 00:03:46
C>* 192.168.119.229/32 is directly connected, enp39s0, 00:03:46
Create ability to redistribute local routes.
Modify tests to support this change.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Zebra currently does a longest prefix match when
resolving nexthops for a prefix. This is typically
an ok thing to do but fails in several specific scenarios.
If a nexthop matches a route that is not usable, nexthop
resolution just gives up and refuses to use that particular
route. For example, suppose zebra currently has a covering prefix,
say 10.0.0.0/8, and at about the same time it receives a
10.1.0.0/16 (a more specific than the /8) and another
route A whose nexthop is 10.1.1.1. Imagine the 10.1.0.0/16
is processed enough to know we want to install it and the
prefix is sent to the dataplane for installation (it is queued),
and then route A is processed: nexthop resolution will fail
and route A will be left in limbo as uninstallable.
Let's modify the nexthop resolution code in zebra such that
if a nexthop's most specific match is unusable, we continue looking
up the table until we get to the 0.0.0.0/0 route (if it's even
installed). If we find a usable route for the nexthop, accept it
and use it.
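A self-contained sketch of the relaxed walk, with stand-in
parent/usability helpers (route_node is the real lib/table.h type):
```
struct route_node;

extern struct route_node *rnode_parent(struct route_node *rn); /* stand-in */
extern int rnode_usable(const struct route_node *rn);          /* stand-in */

static struct route_node *resolve_nexthop(struct route_node *match)
{
	/* Previously: an unusable most-specific match meant giving up.
	 * Now: keep walking toward less-specific prefixes, possibly
	 * all the way to 0.0.0.0/0, until a usable route is found. */
	while (match && !rnode_usable(match))
		match = rnode_parent(match);

	return match;
}
```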
The bgp_default_originate topology test is frequently failing
with this exact problem:
B>* 0.0.0.0/0 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
B 1.0.1.17/32 [200/0] via 192.168.0.1 inactive, weight 1, 00:00:21
B>* 1.0.2.17/32 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
C>* 1.0.3.17/32 is directly connected, lo, 00:02:00
B>* 1.0.5.17/32 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
B>* 192.168.0.0/24 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
B 192.168.1.0/24 [200/0] via 192.168.1.1 inactive, weight 1, 00:00:21
C>* 192.168.1.0/24 is directly connected, r2-r1-eth0, 00:02:00
C>* 192.168.2.0/24 is directly connected, r2-r3-eth1, 00:02:00
B>* 192.168.3.0/24 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
B 198.51.1.1/32 [200/0] via 192.168.0.1 inactive, weight 1, 00:00:21
B>* 198.51.1.2/32 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
Notice that the 1.0.1.17/32 route is inactive, but the nexthop
192.168.0.1 is covered by both the 192.168.0.0/24 prefix (longest match)
*and* the 0.0.0.0/0 route (shortest match). Looking at the logs,
the 1.0.1.17/32 route was not being installed because the matching
route was not in a usable state, which is because the 192.168.0.0/24
route was in the process of being installed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Also:
- replace all /* fallthrough */ comments with the portable fallthrough;
pseudo keyword to accommodate both gcc and clang
- add missing break; statements as required by older versions of gcc
- cleanup some code to remove unnecessary fallthrough
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
|
|
This reverts commit 1642a68d60cfade4b2fce72aaef58dea700c65c3.
|
|
zebra: be more careful removing 'installed' flag from nhgs
|
|
When interface addresses change, we examine nhgs associated
with the interface in case they need to be reinstalled. As
part of that, we may need to reinstall ecmp nhgs that use the
interface being examined - but not always.
Signed-off-by: Mark Stapp <mjs@labn.net>
|
|
bgpd: add basic support of BGP Link-State RFC7752
|
|
Extend zebra and lib to use multiple SRv6 seg SIDs, and keep one
seg SID for bgpd and sharpd.
Note: bgpd and sharpd compilation relies on the lib and zebra files,
i.e. if we separated lib, zebra, bgpd, or sharpd into different
commits, this would not compile.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
|
|
Register BGP Link-State AFI/SAFI values from RFC7752.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
|