tests: increase ping attempts in EVPN MH to fix failures on ARM
There are two fixes to handle slow convergence on ARM -
1. Ping on every re-try attempt to account for initial packet loss
2. Handle incomplete show outputs gracefully
When a local ES is in LACP bypass state BGP doesn't advertise
reachability to it i.e. the Type-1/EAD-per-ES routes and Type-4
route for the ES is not advertised. This is the equivalent of
oper-down handling.
zebra: flush macs linked to the bond when it moves out of bypass
When a ES-bond is in bypass state MACs learnt on it are linked to the
access port instead of the ES. When LACP converges on the bond it moves
out of bypass and the MACs previously learnt on it are flushed to force
a re-learn on new traffic.
zebra: link local MACs to destination port for efficient lacp-bypass processing
When an ES-bond comes out of bypass FRR needs to flush the local MACs learnt
while the bond was in bypass. To do that efficiently local MACs are linked
to the dest-access port. This only happens if the access-port is in
LACP-bypass or if it is non-ES.
Feature overview:
=================
A 802.3ad bond can be setup to allow lacp-bypass. This is done to enable
servers to pxe boot without a LACP license i.e. allows the bond to go oper
up (with a single link) without LACP converging.
If an ES-bond is oper-up in an "LACP-bypass" state MH treats it as a non-ES
bond. This involves the following special handling -
1. If the bond is in a bypass-state the associated ES is placed in a
bypass state.
2. If an ES is in a bypass state -
a. DF election is disabled (i.e. assumed DF)
b. SPH filter is not installed.
3. MACs learnt via the host bond are advertised with a zero ESI.
When the ES moves out of "bypass" the MACs are moved from a zero-ESI to
the correct non-zero id. This is treated as a local station move.
Implementation:
===============
When (a) an ES is detached from a hostbond or (b) an ES-bond goes into
LACP bypass zebra deletes all the local macs (with that ES as destination)
in the kernel and its local db. BGP re-sends any imported MAC-IP routes
that may exist with this ES destination as remote routes i.e. zebra can
end up programming a MAC that was perviously local as remote pointing
to a VTEP-ECMP group.
When an ES is attached to a hostbond or an ES-bond goes
LACP-up (out of bypss) zebra again deletes all the local macs in the
kernel and its local db. At this point BGP resends any imported MAC-IP
routes that may exist with this ES destination as sync routes i.e.
zebra can end up programming a MAC that was perviously remote
as local pointing to an access port.
Igor Ryzhov [Wed, 27 Jan 2021 23:41:07 +0000 (02:41 +0300)]
ospfd: don't rely on instance existence in vty
Store instance index at startup and use it when processing vty commands.
The instance itself may be created and deleted by the user in runtime
using `[no] router ospf X` command.
lynne [Tue, 23 Feb 2021 17:07:28 +0000 (12:07 -0500)]
ospf6d: fix display of unknown LSAs in show ipv6 ospf6 database command
When an unknown LSA is in the database and the user issues the
"show ipv6 ospf6 database" command there is a crash. The code currently
doesn't properly handle display of unknown LSAs.
Igor Ryzhov [Tue, 16 Feb 2021 09:57:01 +0000 (12:57 +0300)]
lib: register dependency between control plane protocol and vrf nb nodes
When the control plane protocol is created, the vrf structure is
allocated, and its address is stored in the northbound node.
The vrf structure may later be deleted by the user, which will lead to
a stale pointer stored in this node.
Instead of this, allow daemons that use the vrf pointer to register the
dependency between the control plane protocol and vrf nodes. This will
guarantee that the nodes will always be created and deleted together, and
there won't be any stale pointers.
Donald Sharp [Thu, 18 Feb 2021 00:58:19 +0000 (19:58 -0500)]
lib: Reduce getrusage/monotime for event handling
When handling a large number of events at one time
FRR will call monotime and getrusage 2 times for each
event. With this change modify the code to change
this to (X events / 2) + 1 calls of getrusage and monotime
David Schweizer [Mon, 22 Feb 2021 09:31:57 +0000 (10:31 +0100)]
tests: JSON comparison command for LabN topotests
The changes add the "jsoncmp_pass" and the "jsoncmp_fail" commands to
compare VTY shell's JSON output to an expected JSON object during
topotests using the LabN testing framework. This helps to eliminate
false negative test results (i.e. due to routes beeing out of order
after convergence or cosmetic changes in VTY shell's text output).
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
zebra: fix problem with SVI MAC not being sent to BGP
For MH the SVI MAC is advertised to prevent flooding of ARP replies.
But because of a bug the SVI MAC was being added to the zebra database
but not sent to bgpd for advertising.
zebra: drop the SVI MAC cleanup done as a part of interface delete
As a part of FRR shutdown interfaces are force flushed (in an arbitary
order). Interfaces are already down at that point i.e. resources like
SVI-MAC have already been released. Attempting to clean it up again
as a part of the force-flush was resulting in access of freed up memory -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
==26457== Thread 1:
==26457== Invalid read of size 8
==26457== at 0x1AE6B0: zebra_evpn_acc_bd_svi_set (zebra_evpn_mh.c:606)
==26457== by 0x1B1460: zebra_evpn_if_cleanup (zebra_evpn_mh.c:1040)
==26457== by 0x13CA69: if_zebra_delete_hook (interface.c:244)
==26457== by 0x48A0E34: hook_call_if_del (if.c:59)
==26457== by 0x48A0E34: if_delete_retain (if.c:290)
==26457== by 0x48A2F94: if_delete (if.c:313)
==26457== by 0x48A3169: if_terminate (if.c:1217)
==26457== by 0x48E0024: vrf_delete (vrf.c:254)
==26457== by 0x48E0024: vrf_delete (vrf.c:225)
==26457== by 0x48E02FE: vrf_terminate (vrf.c:551)
==26457== by 0x1442E1: sigint (main.c:203)
==26457== by 0x1442E1: sigint (main.c:141)
==26457== by 0x48CF862: quagga_sigevent_process (sigevent.c:103)
==26457== by 0x48DD324: thread_fetch (thread.c:1404)
==26457== by 0x48A926A: frr_run (libfrr.c:1122)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(gdb) bt
(gdb) fr 5
1037 zebra/zebra_evpn_mh.c: No such file or directory.
(gdb) p zif->ifp->name
$2 = "vlan131", '\000' <repeats 12 times>
(gdb) p zif->link->info
$5 = (void *) 0x1
(gdb) p/x zif->ifp->flags
$7 = 0x1002
(gdb)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Chirag Shah [Tue, 24 Nov 2020 20:22:49 +0000 (12:22 -0800)]
zebra: prevent crash in evpn if cleanup
zebra crash is seen while cleaning up evpn interface
during shutdown event.
evpn interface clean up is called from vrf_delete callback
(gdb) frame 4
(is_up=false, br_zif=0x0, vlan_zif=0x557f31fb36f0) at zebra/zebra_evpn_mh.c:614
614 zebra/zebra_evpn_mh.c: No such file or directory.
(gdb) p tmp_br_zif
$1 = (struct zebra_if *) 0x0
(gdb) p vlan_zif->link
$2 = (struct interface *) 0x557f31fb2d40
(gdb) p vlan_zif->link->info
$3 = (void *) 0x0
(gdb) p zebra_if->ifp->name
No symbol "zebra_if" in current context.
(gdb) p vlan_zif->ifp->name
$4 = "peerlink-3.4094\000\000\000\000"
zebra: changes to advertise SVI mac by default if evpn-mh is enabled
Added support for advertising SVI MAC if EVPN-MH is enabled.
In the case of EVPN MH arp replies from an attached server can be sent to
the ES-peer. To prevent flooding of the reply the SVI MAC needs to be
advertised by default.
Note:
advertise-svi-ip could have been used as an alternate way to advertise
SVI MAC. However that config cannot be turned on if SVI IPs are
re-used (which is done to avoid wasting IP addresses in a subnet).
zebra: fix problem with SVI IP being advertised even if disabled
SVI IP is being advertised unconditionally i.e. even if disabled (and
that is the default config). This can be problematic when the SVI address
is re-used across racks.
Added the user config condition in all the relevant places where the
SVI advertisement is triggered.
Philippe Guibert [Fri, 19 Feb 2021 13:17:05 +0000 (14:17 +0100)]
bgpd: add attribute-unchanged attribute to flowspec
flowspec address family can now use attribute-unchanged attribute.
This parameter is necessary when it comes to play with
route-server-client, as that latter command forces to change
attribute-unchanged nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
lynne [Wed, 17 Feb 2021 14:21:43 +0000 (09:21 -0500)]
ospf6d: Update logs that indicate why ospf6 adjacency is not coming up.
Add more details to these logs to help make it easier to determine why
ospf6 adjacency is not coming up. Also make these logs show up without
having to turn on debug logging, again making it easier to debug the
misconfiguration.
Mark Stapp [Thu, 18 Feb 2021 14:37:09 +0000 (09:37 -0500)]
lib: don't awaken from poll for every timer
Only ask the event-loop poll() to awaken if a newly-added timer
actually might have changed the required timeout. Also compute
timer deadline outside of mutex locks.
Donald Sharp [Thu, 18 Feb 2021 11:55:29 +0000 (06:55 -0500)]
bgpd: Fix crash when we don't have a nexthop
Recent changes to allow bgpd to handle v6 LL slightly
differently in the nexthop tracking code has not
interacted well with the blackhole nexthop change
for peers. Modify the code to do the right thing
Pat Ruddy [Sun, 14 Feb 2021 12:09:55 +0000 (12:09 +0000)]
bgpd, lib: add oid2in6_addr utility and use it
The existing code was using the oid2in_addr API to copy IPv6
addresses passing an IPv6 length. Create a utility to do this
properly and avoid annoying coverity with type checking.
Donald Sharp [Thu, 17 Dec 2020 14:46:30 +0000 (09:46 -0500)]
bgpd: Switch LL nexthop tracking to be interface based
bgp is currently registering v6 LL as nexthops to be tracked
from zebra. This presents several problems.
a) zebra does not properly track multiple prefixes that match
the same route properly at this point in time.
b) BGP was receiving nexthops that were just incorrect because
of (a).
c) When a nexthop changed that really didn't affect the v6 LL
we were responding incorrectly because of this
Modify the code such that bgp nexthop tracking notices that
we are trying to register a v6 LL. When we do so, shortcut
and watch interface up/down events for this v6 LL and do
the work when an interface goes up / down for this type
of tracking.
Igor Ryzhov [Wed, 17 Feb 2021 12:06:20 +0000 (15:06 +0300)]
staticd: fix vrf enabling
When enabling the VRF, we should not install the nexthops that rely on
non-existent VRF.
For example, if we have route "1.1.1.0/24 2.2.2.2 vrf red nexthop-vrf blue",
and VRF red is enabled, we should not install it if VRF blue doesn't exist.
Igor Ryzhov [Wed, 17 Feb 2021 11:19:40 +0000 (14:19 +0300)]
staticd: fix nexthop creation and installation
Currently, staticd creates a VRF for the nexthop it is trying to install.
Later, when this nexthop is deleted, the VRF stays in the system and can
not be deleted by the user because "no vrf" command doesn't work for this
VRF because it was not created through northbound code.
There is no need to create the VRF. Just set nh_vrf_id to VRF_UNKNOWN
when the VRF doesn't exist.
Donald Sharp [Tue, 16 Feb 2021 20:54:08 +0000 (15:54 -0500)]
zebra: use AF_INET for protocol family
When looking up the conversion from kernel protocol to
internal protocol family make sure we use the correct
AF_INET( what the kernel uses ) instead of AFI_IP (which
is what FRR uses ).
Routes from OSPF will show up from the kernel as OSPF6 instead of
OSPF. Which will cause mayhem
Ticket: CM-33306 Signed-off-by: Donald Sharp <sharpd@nvidia.com>