Olivier Dugeon [Tue, 26 May 2020 12:51:02 +0000 (14:51 +0200)]
lib: Add Link State Database
Define new models for Link State Database a.k.a TED
and functions to manipulate the new database as well as exchange Link State
information through ZAPI Opaque message.
G. Paul Ziemba [Sat, 19 Dec 2020 19:41:05 +0000 (11:41 -0800)]
tests: topotests/lib/lutil.py: add timestamp to log of test start
Timestamps in test logs are needed for correlation with messages in
routing protocol log files. Vox populi indicates preference for
timestamp at beginning of line.
Duncan Eastoe [Tue, 22 Dec 2020 21:23:43 +0000 (21:23 +0000)]
zebra: zebra2proto() handle kernel/connect type
When dplane_fpm_nl is used the "Please add this protocol(n) to proper
rt_netlink.c handling" debug message is emitted for any route of type
kernel or connected.
This severely reduces performance of dplane_fpm_nl when large numbers
of these routes are present in the RIB.
The messages are not observed when using the original fpm module since
this uses a custom function, netlink_proto_from_route_type().
zebra2proto() now returns RTPROT_KERNEL for ZEBRA_ROUTE_CONNECT and
ZEBRA_ROUTE_KERNEL. This should only impact dplane_fpm_nl's use of
the common netlink routines since these routes generally ignored via
checking of RSYSTEM_ROUTE().
zebra: skip EVI setup if an ES is applied to a pseudo interface
zebra maintains pseudo interface for hanging off user config after
the interface is deleted in the kernel. If an user tried to config
an ES against such an interface zebra would crash with the following
call stack -
at zebra/zebra_evpn_mh.c:2095
sysmac=sysmac@entry=0x55cfbadd3160) at zebra/zebra_evpn_mh.c:2258
at zebra/zebra_evpn_mh.c:3222
argv=<optimized out>, es_lid_str=<optimized out>, es_lid=1, no=0x0, vty=0x55cfbaf4c7b0)
at zebra/zebra_evpn_mh.c:3222
argv=<optimized out>) at ./zebra/zebra_evpn_mh_clippy.c:202
vty=vty@entry=0x55cfbaf4c7b0, cmd=cmd@entry=0x0, filter=FILTER_RELAXED)
at lib/command.c:1073
zebra: accept bgp remote mac-ip update if the higher-seq-local mac is not bgp-ready
If a local-MAC or local-neigh is not active locally it is not sent to BGP.
At this point if BGP rxes a remote route it accepts it and installs in
zebra. Zebra was rejecting BGP's update if it had a higher seq local (inactive)
entry. This would result in bgp and zebra falling out of sync.
In some cases zebra would delete the local-inactive entries in sometime (as
a part of the dplane/kernel garbage collection). This would leave zebra
with missing remote entries (which were still present in bgpd).
This change allows lower-seq BGP updates to overwrite zebra's local entry if
that entry happens to be local-inactive.
Note: This logic was already in use for sync-mac-ip updates. Extended the
same logic to remote-mac-ip updates.
zebra: clean zevpn references in the access bd database when the VNI is deleted
When an VNI was deleted as a part of FRR/zebra shutdown the zevpn entry
was being freed without removing its reference in the access vlan
entry (i.e. without clearing the VLAN->VNI mapping) used by MH.
If a netlink/dp notification is rxed for a neigh without the peer-sync
flag FRR re-installs the entry with the right flags. This change is
needed to handle cases where the dataplane and FRR may fall out of
sync because of neigh learning on the network ports (i.e. via
the VxLAN).
Ticket: CM-30693
The problem was found during VM mobility "torture" tests where 100s
of extended VM moves were done.
zebra: fix a problem with local MAC pointing to a remote ES
If a remote MAC update is rxed from BGP with a lower sequence number than
the local one zebra ignores the MAC update. This typically happens if
there is a race condition (where updates are in flight from zebra to BGP).
There was a bug in zebra because of which the dest ES was being updated
before this check. This left the local MAC pointing to a remote ES.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Relevant Dumps:
===============
root@leaf21:mgmt:~# net show evpn mac vni 101101 mac 00:93:00:00:00:01
MAC: 00:93:00:00:00:01
ESI: 03:00:00:00:77:01:03:00:00:0d
Intf: - VLAN: 101
Sync-info: neigh#: 1 peer-proxy
Local Seq: 3 Remote Seq: 0
Neighbors:
21.1.13.1 Active
root@leaf21:mgmt:~# net sho evpn es
Type: L local, R remote, N non-DF
ESI Type ES-IF VTEPs
03:00:00:00:77:01:02:00:00:0c R - 6.0.0.10,6.0.0.11
03:00:00:00:77:01:03:00:00:0d R - 6.0.0.10,6.0.0.11,6.0.0.12
03:00:00:00:77:01:04:00:00:0e R - 6.0.0.10,6.0.0.11,6.0.0.12,6.0.0.13
03:00:00:00:77:02:02:00:00:16 LR bondP2-H2 6.0.0.15
03:00:00:00:77:02:03:00:00:17 LR bondP2-H3 6.0.0.15,6.0.0.16
03:00:00:00:77:02:04:00:00:18 LR bondP2-H4 6.0.0.15,6.0.0.16,6.0.0.17
root@leaf21:mgmt:~#
Relevant logs:
===============
2020/07/29 15:41:27.110846 ZEBRA: Recv MACIP ADD VNI 101101 MAC 00:93:00:00:00:01 IP 21.1.13.1 flags 0x0 seq 2 VTEP 0.0.0.0 ESI 03:00:00:00:77:01:03:00:00:0d from bgp
2020/07/29 15:41:27.110867 ZEBRA: Ignore remote MACIP ADD VNI 101101 MAC 00:93:00:00:00:01 IP 21.1.13.1 as existing MAC has higher seq 3 flags 0x401
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
zebra: advertise stale neighs if EVPN-MH is not enabled
With EVPN-MH, Type-2 routes are also used for MAC-IP syncing between
ES peers so a change was done to only treat REACHABLE local neigh
entries as local-active and advertise them as Type-2 routes i.e. STALE
neigh entries are no longer advertised as Type-2s.
This however exposed some unexpected problems with MLAG where a
secondary reboot followed by a primary reboot left a lot of neighs
in STALE state (on the primary) resulting in them not being
advertised. And remote routed traffic to those hosts being
blackholed in a sym-IRB setup.
This commit is a workaround to fix the regression (it doesn't fix
the underlying problems with entries not becoming REACHABLE; which
maybe a day-1 problem). The workaround is to continue advertising
STALE neighbors if EVPN-MH is not enabled.
GalaxyGorilla [Sat, 19 Dec 2020 21:06:24 +0000 (21:06 +0000)]
pathd: un-guard clippy files
The relevant clippy machinery in python/makevars.py assumes to get
'raw' Makefile text containing all `clippy_scan` variables. If those
files in the `clippy_scan` variable are later on used in the
compilation process does not matter.
Donald Sharp [Fri, 18 Dec 2020 19:22:09 +0000 (14:22 -0500)]
lib: Fix dependency of match types in route-map code
Route-maps contain a hash of hash's that contain the
container type name ( say community or access list or whatever )
and then it has a hash of route-maps that this maps too
Suppose you have this:
!
frr version 7.3.1
frr defaults traditional
hostname eva
log stdout
!
debug route-map
!
router bgp 239
neighbor 192.168.161.2 remote-as external
!
address-family ipv4 unicast
neighbor 192.168.161.2 route-map foo in
exit-address-family
!
bgp community-list standard 7000:40002 permit 7000:40002
bgp community-list standard 7000:40002 permit 7000:40003
!
route-map foo deny 20
match community 7000:40002
!
route-map foo permit 10
!
line vty
!
end
You have a community hash which has an
7000:40002 entry
This entry has a hash of routemaps that are referencing it. In this above
example it would have `foo` as the single entry.
Given the above config if you do this:
eva# conf
eva(config)# route-map foo deny 20
eva(config-route-map)# match community 7000:4003
eva(config-route-map)#
We would expect the `7000:40002` community hash to no longer have
a reference to the `foo` routemap. Instead we see the code doing this:
2020/12/18 13:47:12 BGP: bgpd 7.3.1 starting: vty@2605, bgp@<all>:179
2020/12/18 13:47:47 BGP: Add route-map foo
2020/12/18 13:47:47 BGP: Route-map foo add sequence 10, type: permit
2020/12/18 13:47:57 BGP: Route-map foo add sequence 20, type: deny
2020/12/18 13:48:05 BGP: Adding dependency for filter 7000:40002 in route-map foo
2020/12/18 13:48:05 BGP: route_map_print_dependency: Dependency for 7000:40002: foo
2020/12/18 13:48:41 BGP: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 192.168.161.2 in vrf default
2020/12/18 13:49:19 BGP: Deleting dependency for filter 7000:4003 in route-map foo
2020/12/18 13:49:19 BGP: Adding dependency for filter 7000:4003 in route-map foo
2020/12/18 13:49:19 BGP: route_map_print_dependency: Dependency for 7000:4003: foo
Note how the code attempts to remove the dependency for `7000:4003` instead of the
dependency for `7000:40002`. Then we create a new hash for `7000:4003` and then
install the routemap name in it.
This is wrong. We should remove the `7000:40002` dependency and then install
a dependency for `7000:4003`.
Stephen Worley [Thu, 17 Dec 2020 21:16:26 +0000 (16:16 -0500)]
zebra: derive rule family from src->dst->ipv4
Derive the rule family from src if available, otherwise
dst if available, otherwise assume ipv4. We only support
ipv4/ipv6 currently so it we cant tell from the src/dst
it must be ipv4 and likely a dsfield match.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Sebastien Merle [Fri, 16 Oct 2020 14:55:51 +0000 (16:55 +0200)]
pathd: Add optional support for PCEP to pathd
This new dynamic module makes pathd behave as a PCC for dynamic candidate path
using the external library pcpelib https://github.com/volta-networks/pceplib .
The candidate paths defined as dynamic will trigger computation requests to the
configured PCE, and the PCE response will be used to update the policy.
It supports multiple PCE. The one with smaller precedence will be elected
as the master PCE, and only if the connection repeatedly fails, the PCC will
switch to another PCE.
Duncan Eastoe [Fri, 18 Dec 2020 14:35:21 +0000 (14:35 +0000)]
zebra: reduce atomic ops in fpm_process_queue()
Maintain the count of contexts which have been processed in a local
variable, and perform a single atomic update after we have consumed
all queued contexts.
Generally this results in at least one less atomic operation per
context.
Duncan Eastoe [Fri, 18 Dec 2020 14:29:36 +0000 (14:29 +0000)]
zebra: local var in fpm_process_queue() sched cond
Don't use an atomic operation to determine whether fpm_process_queue()
needs to be re-scheduled. Instead we can simply use a local variable
to determine if we stopped processing because we ran out of buffers.
In the case where we would have re-scheduled due to new context objects
in the queue (enqueued after we stopped processing), fpm_nl_process()
will schedule us (or will have done already).
This new daemon manages Segment-Routing Traffic-Engineering
(SR-TE) Policies and installs them into zebra. It provides
the usual yang support and vtysh commands to define or change
SR-TE Policies.
In a nutshell SR-TE Policies provide the possibility to steer
traffic through a (possibly dynamic) list of Segment Routing
segments to the endpoint of the policy. This list of segments
is part of a Candidate Path which again belongs to the SR-TE
Policy. SR-TE Policies are uniquely identified by their color
and endpoint. The color can be used to e.g. match BGP
communities on incoming traffic.
There can be multiple Candidate Paths for a single
policy, the active Candidate Path is chosen according to
certain conditions of which the most important is its
preference. Candidate Paths can be explicit (fixed list of
segments) or dynamic (list of segment comes from e.g. PCEP, see
below).
Configuration example:
segment-routing
traffic-eng
segment-list SL
index 10 mpls label 1111
index 20 mpls label 2222
!
policy color 4 endpoint 10.10.10.4
name POL4
binding-sid 104
candidate-path preference 100 name exp explicit segment-list SL
candidate-path preference 200 name dyn dynamic
!
!
!
There is an important connection between dynamic Candidate
Paths and the overall topic of Path Computation. Later on for
pathd a dynamic module will be introduced that is capable
of communicating via the PCEP protocol with a PCE (Path
Computation Element) which again is capable of calculating
paths according to its local TED (Traffic Engineering Database).
This dynamic module will be able to inject the mentioned
dynamic Candidate Paths into pathd based on calculated paths
from a PCE.
Reduce code in the critical sections of fpm_nl_process() and
fpm_process_queue() to the bare minimum - basically only enqueue
and dequeue operations on the shared ctxqueue.
Donald Sharp [Thu, 17 Dec 2020 21:49:20 +0000 (16:49 -0500)]
bgpd: Remove awful test of strmatch + get_afi_safi_str
Remove awful test of a strmatch against a call to get_afi_safi_str.
These are the easy ones as that the real decision point is/was
underneath this test. This is just duplicate expensive testing.
Stephen Worley [Thu, 17 Dec 2020 21:14:38 +0000 (16:14 -0500)]
pbrd: disallow ipv4/ipv6 mismatching in match src/dst
Disallow mismatching of ipv4/ipv6 matching in src/dst.
Doesn't make a lot of sense to allow this based on how
IP Headers work. The kernel does not allow it at all
obviously.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Donald Sharp [Tue, 15 Dec 2020 19:21:56 +0000 (14:21 -0500)]
lib, vtysh: Modify start/end configuration commands to be more hidden
There exists a world where some people have put `end` in their
configuration. Then vtysh will command search for it and find
it and then bad things happen.
Ticket: CM-32665 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When a new ES is created it is held in a non-DF state for 3 seconds
as specified by RFC7432. This allows the switch time to import
the Type-4 routes from the peers. And the peers time to rx the new
Type-4 route.
zebra: restart start-up delay timer when the first uplink comes up
When all the uplinks go down the VTEP is disconnected from the
VxLAN overlay and this was handled by proto-downing the ES bonds. When
the uplinks come up again we need to re-enable the ES bonds but that
needs to be done after a delay to allow the EVPN network to converge.
And that is done by firing off the startup-delay timer on first
uplink-up.
zebra: re-sync protodown state with the dplane on new ES add
1. When a bond is associated with an ES we may need to re-sync
the dplane protodown state (which maybe stale/set by some other
app).
2. Also change the uplink state display to avoid confusion with
protodown reason code (both used to show uplink-up).
protodown state is a combination of the dplane and zebra states.
protodown reason is maintained exclusively by zebra. Display this
information on two separate lines to make that ownership clearer.
Also display n/a for bonds as the dplane doesn't support protodowning
the bond device.