Donald Sharp [Thu, 11 Aug 2022 01:49:37 +0000 (21:49 -0400)]
ospfd: When a neighbor goes down clear the oi->obuf if we can
When a neighbor goes down on an interface and that interface
has no more neighbors in a viable state where packets should
be being sent, then let's clear up the oi->obuf associated
with the interface the neighbor is on.
Donald Sharp [Wed, 10 Aug 2022 20:15:29 +0000 (16:15 -0400)]
ospfd: Increase packets sent at one time in ospf_write
ospf_write pulls an interface off the ospf->oi_write_q
then writes one packet and places it back on the queue,
keeping track of the first one sent. Then it will
stop sending packets even if we get back to the first
interface written too but before we have sent the full
pkt_count. I do not believe this makes a whole bunch
of sense and is very restrictive of how much data can
be sent especially if you have a limited number of peers
but large amounts of data. Why be so restrictive?
Quentin Young [Fri, 11 Jun 2021 23:01:42 +0000 (19:01 -0400)]
bgpd: don't adv conditionally withdrawn routes
If we have conditional advertisement enabled, and conditionally withdrew
some prefixes, and then we do a 'clear bgp', those routes were getting
advertised again, and then withdrawn the next time the conditional
advertisement scanner executed.
When we go to advertise check the prefix against the conditional
advertisement status so we don't do that.
Quentin Young [Tue, 15 Jun 2021 23:49:19 +0000 (19:49 -0400)]
bgpd: apply cond-adv policy to update group
The new outbound filter to apply conditional advertisement policy was
not working properly due to complications with update groups. The two
routemaps were properly copied into the update group peer filter but not
the conditional advertisement state.
Signed-off-by: Quentin Young <qlyoung@nvidia.com> Signed-off-by: Mark Stapp <mstapp@nvidia.com>
anlan_cs [Tue, 9 Aug 2022 00:41:22 +0000 (20:41 -0400)]
ospf6d: fix missing cost change
After all needed interfaces ( for example: interface "a1", vrf "vrf1", and
"a1" is binded to "vrf1" ) are ready/created, then restart/start frr. zebra
at startup will call `netlink_interface()` to process all interfaces and notify
all clients, but its calling `get_iflink_speed()` maybe fails for unexpected
order of the coming interfaces: when processing "a1", "vrf1" maybe is unknown
at that time. `if_zebra_speed_update()` timer is introduced to deal with this
order problem.
Currently only ospfd and ospf6d deal with this speed change to recalculated
route cost. ospfd can deal with this change, but ospf6d will wrongly missed it.
Since both `ipv6 ospf6 cost COST` and `auto-cost reference-bandwidth COST` are
not set, cost of this ospf6 interface should be calculated with interface
speed, but it is wrongly kept to `10`, which is based on interface speed being
`0` for it missed speed change. Further, ECMP function becomes invalid after
restart frr, beacuse some ospf6 interfaces of one ECMP are wrongly with cost
`10`.
To avoid missing, recalculate cost for ospf6 interfaces based on potentially
changed speed.
Donald Sharp [Tue, 9 Aug 2022 14:23:35 +0000 (10:23 -0400)]
zebra: System routes should be processed the same time as kernel
For whatever reason. ZEBRA_ROUTE_SYSTEM routes were being processed
last. Since a system route is just another kernel route type. Let's
just switch it to be processed the same time as kernel routes.
Donald Sharp [Tue, 9 Aug 2022 13:22:43 +0000 (09:22 -0400)]
zebra: Explicitly call out the correct queue name
There were more than a few places where the NHG meta
queue was not being explicitly called out. Let's
be consistent and use the same nomenclature as much
as possible when talking about metaQ's.
Donald Sharp [Mon, 8 Aug 2022 19:41:42 +0000 (15:41 -0400)]
staticd: When changing the underlying nh ensure it is reinstalled
There exists some nexthop attributes that are not the unique
part of the nexthop and if you change the static route
with those values then the route is not being updated
in zebra with the new values:
Example of brokenness:
eva# conf
eva(config)# ip route 1.2.3.9/32 192.168.119.1 enp39s0 label 16020
eva(config)# do show ip route 1.2.3.9
Routing entry for 1.2.3.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:05 ago
* 192.168.119.1, via enp39s0, label 16020, weight 1
eva(config)# ip route 1.2.3.9/32 192.168.119.1 enp39s0 label 16040
eva(config)# do show ip route 1.2.3.9
Routing entry for 1.2.3.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:12 ago
* 192.168.119.1, via enp39s0, label 16020, weight 1
Fixed behavior:
eva# conf
eva(config)# ip route 1.2.3.10/32 192.168.119.1 enp39s0 label 16020
eva(config)# do show ip route 1.2.3.10
Routing entry for 1.2.3.10/32
Known via "static", distance 1, metric 0, best
Last update 00:00:04 ago
* 192.168.119.1, via enp39s0, label 16020, weight 1
eva(config)# ip route 1.2.3.10/32 192.168.119.1 enp39s0 label 16040
eva(config)# do show ip route 1.2.3.10
Routing entry for 1.2.3.10/32
Known via "static", distance 1, metric 0, best
Last update 00:00:01 ago
* 192.168.119.1, via enp39s0, label 16040, weight 1
I've gone through most of the items in staticd that can change
the nexthop Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Mon, 8 Aug 2022 16:32:15 +0000 (12:32 -0400)]
zebra: Don't install connected routes multiple times into FRR
When moving an interface between vrf's we do not need
to install the connected routes multiple times.
eva# show ip route vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF BLUE:
C>* 4.5.6.7/32 is directly connected, dummy7, 00:00:10
VRF default:
K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:00:10
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:00:10
eva# exit
sharpd@eva ~/f/t/topotests (multiple_connected_installs)> sudo ip link add GREEN type vrf table 11000
sharpd@eva ~/f/t/topotests (multiple_connected_installs)> sudo ip link set GREEN up
sharpd@eva ~/f/t/topotests (multiple_connected_installs)> sudo ip link set dummy7 master GREEN
sharpd@eva ~/f/t/topotests (multiple_connected_installs)> vtysh
Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
eva# show ip route vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF GREEN:
C>* 4.5.6.7/32 is directly connected, dummy7, 00:00:05
VRF default:
K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:01:03
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:01:03
eva# exit
sharpd@eva ~/f/t/topotests (multiple_connected_installs)> sudo ip link set dummy7 nomaster
sharpd@eva ~/f/t/topotests (multiple_connected_installs)> sudo vtysh -c "show ip route vrf all"
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
Donald Sharp [Tue, 28 Jun 2022 18:58:55 +0000 (14:58 -0400)]
zebra: Notice when an interface is turned on w/ mpls and enable mpls subsystem
Currently when FRR starts up it queries the kernel to see if mpls is turned on.
If not FRR does not enable zebra's mpls subsection. If at a later time mpls
is turned on, let's notice that an interface now is enabled for mpls( thus
implying that all the bits and bobs in the kernel are now setup properly ).
a) convert mpls_enabled to a bool
b) abstract a new function zebra_mpls_turned_on and call it
when FRR notices that an interface now has mpls enabled.
c) mpls_processq_init cannot fail, so actually notice that
and don't have special code to detect a failure.
New results:
sharpd@eva ~> vtysh -c "show zebra"
OS Linux(5.10.0-12-amd64)
ECMP Maximum 128
v4 Forwarding On
v6 Forwarding On
MPLS Off
EVPN Off
Kernel socket buffer size 90000000
VRF l3mdev Available
ASIC offload Unavailable
RA Compiled in
RFC 5549 BGP is not using
Kernel NHG Available
v4 All LinkDown Routes Off
v4 Default LinkDown Routes Off
v6 All LinkDown Routes Off
v6 Default LinkDown Routes Off
v4 All MC Forwarding On
v4 Default MC Forwarding Off
v6 All MC Forwarding On
v6 Default MC Forwarding Off
Route Route Neighbor LSP LSP
VRF Installs Removals Updates Installs Removals
default 26 7 0 0 0
<turn on mpls_iptunnel and mpls_router modules in the kernel and then do this>:
sharpd@eva ~> sudo sysctl -w net.mpls.conf.enp39s0.input=1
[sudo] password for sharpd:
net.mpls.conf.enp39s0.input = 1
sharpd@eva ~> vtysh -c "show zebra"
OS Linux(5.10.0-12-amd64)
ECMP Maximum 128
v4 Forwarding On
v6 Forwarding On
MPLS On
EVPN Off
Kernel socket buffer size 90000000
VRF l3mdev Available
ASIC offload Unavailable
RA Compiled in
RFC 5549 BGP is not using
Kernel NHG Available
v4 All LinkDown Routes Off
v4 Default LinkDown Routes Off
v6 All LinkDown Routes Off
v6 Default LinkDown Routes Off
v4 All MC Forwarding On
v4 Default MC Forwarding Off
v6 All MC Forwarding On
v6 Default MC Forwarding Off
I am doing this work because FRR keeps having operators not know about how
to properly use mpls. Let's make FRR behave a bit better in this weird edge
case.
Donald Sharp [Sat, 6 Aug 2022 01:40:11 +0000 (21:40 -0400)]
bfdd: Some interfaces don't have mac addresses
When an interface does not have a mac address, don't
try to retrieve the mac address ( for it to just fail ).
Example interface:
sharpd@eva [2]> ip link show tun100
21: tun100@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ipip 192.168.119.224 peer 192.168.119.120
Let's just notice that there is a NOARP flag and abort the call.
Fixes: #11733 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Thu, 4 Aug 2022 11:05:46 +0000 (07:05 -0400)]
zebra: Fix memory leaks and use after frees in nhg's on shutdown
Fixup both memory leaks as well as use after free's in nhg's
on shutdown.
This approach is effectively just iterating through all the
hash items and directly just freeing the memory instead
of handling ref counts or cross references.
Donald Sharp [Tue, 2 Aug 2022 19:43:46 +0000 (15:43 -0400)]
zebra: When saving nhg for later stop processing
Commit 35729f38fa5713b introduced the idea of
holding a nexthop group for a small amount of time
before removing it from the system. When this code
was introduced the nexthop group entry was saved
and a timer started, except instead of stopping
processing at that point in time, zebra was
continuing on and deleting nexthop group entries
that that entry depended on as well. This
should not be done until the timer pops.
Fixes: #11596 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
RFC 4760 states we SHOULD ignore the NEXT_HOP attribute for BGP Update
messages carrying only MP_REACH_NLRI attributes. Thus we should use the
Network Address of Next Hop field of the MP_REACH_NLRI as the nexthop.
Instead of always looking for BGP_ATTR_NEXT_HOP, this commit ensures:
1) we set mp_nexthop_len to BGP_ATTR_NHLEN_IPV4 for v4 bgp_static routes
2) we check mp_nexthop_len when choosing the nexthop to use for nht
3) we check mp_nexthop_len when choosing the nexthop to send to zebra
4) we check mp_nexthop_len when picking the nexthop to shown by vtysh
Reported-by: Binon Gorbutt <binon@aervivo.com> Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Pdoijode [Thu, 4 Aug 2022 18:28:33 +0000 (11:28 -0700)]
bgpd: addition of vxlanFlooding field to show output
Instead of changing the value of 'BUM flooding' field in
'show bgp l2vpn evpn vni' vty and JSON command from
'Head-end replication' to 'enabled', adding a new field named
'vxlanFlooding' to 'show bgp l2vpn evpn vni' vty and JSON output.
This is done to maintain backward compatibility.'BUM flooding' field
in vty and JSON output will be deprecated later.
Move the logic to check the mp_nexthop_len against v6 lengths into its
own macro so we can apply that logic elsewhere on its own without always
checking for presence of BGP_ATTR_NEXT_HOP.
pimd, pim6d: Send register msg via register socket
The problem here is when the same node is FHR as well as RP,
then the node keeps on sending the register packet.
Register-stop is not sent as well.
This problem has occurred because the RP is the same node
and there is no socket created on loopback interface, so the
packet is never send out and never received back on the same
node, so register recv could not be processed on the node and
hence no register-stop is sent.
Since register packets are unicast packets, its better to handle
the send of register packet via a separate register socket.
This fixes the problem mentioned above as well.
Trey Aspelund [Thu, 4 Aug 2022 01:43:31 +0000 (01:43 +0000)]
bgpd: fix show bgp l2vpn evpn route rd crashes
bgpd was crashing every time `show bgp l2vpn evpn route rd` was issued
with an RD that didn't match "all". This was introduced by 9b01d289883
which changed how argv_find() is handled in various vtysh commands, but
the new changes forgot a "!". So let's re-add the "!".
Before:
```
ub20# show bgp l2vpn evpn route rd 399672:100
vtysh: error reading from bgpd: Resource temporarily unavailable (11)Warning: closing connection to bgpd because of an I/O error!
ub20#
ub20# show bgp l2vpn evpn route rd 399672:100 mac 11:11:11:11:11:11
vtysh: error reading from bgpd: Resource temporarily unavailable (11)Warning: closing connection to bgpd because of an I/O error!
ub20#
```
bgpd: Add `show bgp access-list` command to filter routes by ACL
The same as with prefix-list/route-maps/etc.
```
donatas-pc# show ip access-list spine
ZEBRA:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
BGP:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
PIM:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
BABELD:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
donatas-pc# show bgp ipv4 unicast access-list
ACCESSLIST_NAME Access-list name
spine
donatas-pc# show bgp ipv4 unicast access-list spine
BGP table version is 9, local router ID is 172.17.0.3, vrf id 0
Default local pref 100, local AS 1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 200.200.200.200/32
enp3s0 0 0 65000 3456 ?
Displayed 1 routes and 10 total paths
donatas-pc#
```
anlan_cs [Mon, 1 Aug 2022 07:30:07 +0000 (03:30 -0400)]
zebra: fix bond down for evpn-mh
The test case is with `redirect-off` in evpn multi-homing environment:
```
evpn mh redirect-off
```
After the environment is setup, do the following steps:
1) Let one member of ES learn one mac:
```
2e:52:bb:bb:2f:46 dev ae1 vlan 100 master bridge0 static
```
Now everything is ok and the mac can be synced to other ES peers.
2) Shutdown bond1. At this time, zebra will get three netlink messages,
not one as current code expected. Like:
```
e4:f0:04:89:b6:46 dev vxlan10030 vlan 30 master bridge0 static <-A
e4:f0:04:89:b6:46 dev vxlan10030 nhid 536870913 self extern_learn <-B
e4:f0:04:89:b6:46 dev vxlan10030 vlan 30 self <-C
```
With A), zebra will wrongly remove this mac again:
```
ZEBRA: dpAdd remote MAC e4:f0:04:89:b6:46 VID 30
ZEBRA: Add/update remote MAC e4:f0:04:89:b6:46 intf vxlan10030(26) VNI 10030 flags 0xa01 - del local
ZEBRA: Send MACIP Del f None MAC e4:f0:04:89:b6:46 IP (null) seq 0 L2-VNI 10030 ESI - to bgp
```
With C), zebra will wrongly add this mac again:
```
ZEBRA: Rx RTM_NEWNEIGH AF_BRIDGE IF 26 VLAN 30 st 0x2 fl 0x2 MAC e4:f0:04:89:b6:46 nhg 0
ZEBRA: dpAdd remote MAC e4:f0:04:89:b6:46 VID 30
```
zebra should skip the two messages with `vid`. Otherwise, it will send many
*wrong* messages to bgpd, and the logic is wrong.
`nhg/dst` is in 2nd message without `vid`, it is useful to call
`zebra_evpn_add_update_local_mac()`. But it will fail with "could not find EVPN"
warning for no `vid`, can't call `zebra_evpn_add_update_local_mac()`:
With B):
```
ZEBRA: Rx RTM_NEWNEIGH AF_BRIDGE IF 26 st 0x2 fl 0x12 MAC e4:f0:04:89:b6:46 nhg 536870913
ZEBRA: dpAdd local-nw-MAC e4:f0:04:89:b6:46 VID 0
ZEBRA: Add/Update MAC e4:f0:04:89:b6:46 intf ae1(18) VID 0, could not find EVPN
```
Here, we can get `vid` from vxlan interface instead of from netlink message.
In summary, `zebra_vxlan_dp_network_mac_add()` will process the three messages
wrongly expecting only one messsage, so its logic is wrong. Just skip the two
unuseful messages with `vid`.