Donald Sharp [Tue, 15 Dec 2020 19:21:56 +0000 (14:21 -0500)]
lib, vtysh: Modify start/end configuration commands to be more hidden
There exists a world where some people have put `end` in their
configuration. Then vtysh will command search for it and find
it and then bad things happen.
Ticket: CM-32665 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When a new ES is created it is held in a non-DF state for 3 seconds
as specified by RFC7432. This allows the switch time to import
the Type-4 routes from the peers. And the peers time to rx the new
Type-4 route.
zebra: restart start-up delay timer when the first uplink comes up
When all the uplinks go down the VTEP is disconnected from the
VxLAN overlay and this was handled by proto-downing the ES bonds. When
the uplinks come up again we need to re-enable the ES bonds but that
needs to be done after a delay to allow the EVPN network to converge.
And that is done by firing off the startup-delay timer on first
uplink-up.
zebra: re-sync protodown state with the dplane on new ES add
1. When a bond is associated with an ES we may need to re-sync
the dplane protodown state (which maybe stale/set by some other
app).
2. Also change the uplink state display to avoid confusion with
protodown reason code (both used to show uplink-up).
protodown state is a combination of the dplane and zebra states.
protodown reason is maintained exclusively by zebra. Display this
information on two separate lines to make that ownership clearer.
Also display n/a for bonds as the dplane doesn't support protodowning
the bond device.
zebra: re-sync protodown state when a port/mbr is linked to an ES-bond
The code for this was already there but was not kicking in because of a
zebra local reason-code dup check. Even if the reason-code is the same,
if the dplane and zebra disagree about the protodown state zebra will
need to re-program the dplane.
Fixed a couple of spelling errors in the protodown logs to make greps
easy.
At source have match vni plus set statement in route-map.
Validate the origin of the route's outbound correctly sets
the 'set' statment based on match vni filter.
At origin:
route-map RM-EVPN-TE-Matches permit 10
match evpn vni 4001
set large-community 10:10:119
Receiving end:
Route [5]:[0]:[24]:[78.41.1.0] VNI 4001
5550
27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
Origin incomplete, metric 0, valid, external, bestpath-from-AS 5550, best (First path received)
Extended Community: RT:5550:4001 ET:8 Rmac:00:02:00:00:00:4d
Large Community: 10:10:119 <--- Large community stamped
Last update: Thu Dec 10 22:19:26 2020
Duncan Eastoe [Fri, 11 Dec 2020 11:07:59 +0000 (11:07 +0000)]
zebra: routes stuck with 'q' when using dplane FPM
New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.
After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.
Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.
The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.
This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.
An un-related RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will then no longer be
marked as queued by zebra. However this new FIB update might itself
then fall victim to the same scenario!
We can observe the above behaviour in these detailed dplane logs.
11:24:47 zebra[7282]: dplane: incoming new work counter: 2
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:47 zebra[7282]: dplane provider 'Kernel': processing
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main
2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.
11:24:58 zebra[7282]: dplane: incoming new work counter: 2
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:58 zebra[7282]: dplane provider 'Kernel': processing
11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which sounds good as that is what we en-queued.
However, there is an outstanding context from earlier, so there is still outstanding
work.
Indeed the new 5.5.5.5/32 route is marked as queued:
O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19
This remains the case until we trigger a FIB update by installation of the
(eg.) 10.10.10.10/32 route:
11:26:41 zebra[7282]: dplane: incoming new work counter: 2
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:26:41 zebra[7282]: dplane provider 'Kernel': processing
11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS
We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.
As expected, the 5.5.5.5/32 route is no longer marked as queued:
O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06
But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:
C>q 10.10.10.10/32 is directly connected, lo, 00:26:05
Duncan Eastoe [Fri, 11 Dec 2020 11:03:53 +0000 (11:03 +0000)]
zebra: dplane API to get provider output q length
Returns the current number of (completed) contexts in the provider's
output queue (dp_ctx_out_q), allowing access to this data from the
provider itself.
Donald Sharp [Thu, 3 Dec 2020 20:48:59 +0000 (15:48 -0500)]
bgpd: Add global `bgp suppress-fib-pending` command
On top of the recent `bgp suppress-fib-pending which
was at a BGP_NODE level, add this command at the CONFIG_NODE
level as well and allow the command to apply to all instances
of bgp running.
Hiroki Shirokura [Thu, 10 Dec 2020 00:33:29 +0000 (09:33 +0900)]
zebra: unexpose label-manager util-funcs as static
Following functions which is a piece of label-maanager implementation
isn't called from out side of its file. And all lines of label-manager
are coded on zebra/label_manager.c at this time. So these functions
should be unexposed.
Renato Westphal [Sun, 6 Dec 2020 00:44:41 +0000 (21:44 -0300)]
ldpd: fix printfrr format specifiers in the child processes
In ldpd, the child processes send IPC messages to the main process to
perform logging in their behalf (access to the file descriptor used
for logging needs to be serialized). This commit fixes a problem
that was preventing the printfrr format specifiers from working in
the child processes, since vsnprintf() was being used instead of
vsnprintfrr() before sending the log messages to the parent process.
Karen Schoener [Tue, 8 Dec 2020 14:44:27 +0000 (09:44 -0500)]
isisd, ospfd: IGPs detect LDP down via zapi client close message
When ldp-sync is configured, IGPs take action if the LDP process goes down.
Currently, IGPs detect the LDP process is down if they do not receive a
periodic 'hello' message from LDP within 1 second.
Intermittently, this heartbeat mechanism causes false topotest failures.
When the failure occurs, LDP is busy receiving messages from zebra for a
few seconds. During this time, LDP does not send the expected periodic
message.
With this change, IGPs detect LDP down via zapi client close message.
zebra: anticipate zns creation at vrf creation when backend is vrf-lite
in the case the namespace pointer is already available, feed it at vrf
creation. this prevents from crashing if the netlink parsing already
began, and the vrf-lite is not enabled yet.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Following functions is using writen to dispatch message
into socket, but another function uses zserv_send_message.
This commit does tiny unification for zapi's socket messaging.
Kaushik [Mon, 9 Nov 2020 15:51:29 +0000 (07:51 -0800)]
ospf6d : Code refactoring for route redistribution.
1. Created new ospf6_redist structure.
2. Moved the 'route_map' structure from structure 'ospf6' to
structure 'ospf6_redist'.
3. Added new message type OSPF6_REDISTRIBUTE.
4. Added the placeholder for metric option in structure ospf6_redist
for redistribute.
5. Added few API's for route redistribute lookup, add & del.
kuldeepkash [Tue, 8 Dec 2020 15:50:16 +0000 (15:50 +0000)]
tests: Enhanced auto-rd verification as per changes done in #7652
1. As per recent changes done in PR #7652, we have modified the auto-rd verification logic link: https://github.com/FRRouting/frr/pull/7652 Signed-off-by: kuldeepkash <kashyapk@vmware.com>
Donald Sharp [Sat, 5 Dec 2020 20:34:59 +0000 (15:34 -0500)]
bgpd, zebra: Add ability for bgp to send AS-Path information to zebra
Add a bit of code to allow bgp to send the AS-Path associated with
the route being installed to zebra so it can be displayed and
used as part of the `show ip route A` command in zebra.
eva# show ip route 20.0.0.0/11
Routing entry for 20.0.0.0/11
Known via "bgp", distance 20, metric 0, best
Last update 00:00:00 ago
* 192.168.161.1, via enp39s0, weight 1
AS-Path: 60000 64539 15096 6939 8075
Donald Sharp [Sat, 5 Dec 2020 00:40:22 +0000 (19:40 -0500)]
lib: Add encode/decode of opaque data
Add a bit of code that allows for opaque data to be
sent from an upper level protocol to zebra. This is just
pass through data that will be used as part of displaying
useful data about a route in a `show ip route` command
in future commits.
Chirag Shah [Wed, 2 Dec 2020 00:02:36 +0000 (16:02 -0800)]
bgpd: fix distance for aggregate route
bgp aggregate address installs route with self peer which
can have peer->su of unspecifed type.
bgp_distance_apply bailed out as it fails to parse
sockunion2hostprefix for af type unspec.
Donald Sharp [Fri, 4 Dec 2020 13:01:31 +0000 (08:01 -0500)]
bgpd: Let's actually track if the nh was updated
In bgp_zebra_announce when iterating over multipath
we were checking to ensure that the nexthop was updated
but never initially clearing the nh_updated variable.
Thus leading to a situation where we could crash.