Reuben Dowle [Mon, 7 Dec 2020 03:35:13 +0000 (16:35 +1300)]
nhrpd: Cleanup resources when interface is deleted
Currently when an interface is deleted from configuration, associated
resources are not freed. This causes memory leaks and crashes.
To reproduce this issue:
* Connect to a DMVPN hub
* Outside of frr, delete the underlying GRE interface
* Use 'no interface xxx' to delete the interface containing nhrp configurations
Gaurav Goyal [Fri, 27 Nov 2020 02:41:09 +0000 (15:41 +1300)]
nhrpd: Only create one child sa
In some circumstances, especicially when GRE tunnel interface does not exist,
repeated child sa requests are sent. Prevent this by only sending another
request if the child sa does not exist
Gaurav Goyal [Sun, 1 Nov 2020 23:11:19 +0000 (12:11 +1300)]
nhrpd: Create route to private spoke-spoke network correctly
Currently when the first traffic to a private network causes a shortcut, an
on-link route to the private network is created on the gre interface, along
with the cache entry.
When connecting to a second IP in the same network, the kernel tries to resolve
the public IP for this private network via query to NHRP. nhrpd sees no entry
in the cache, so the packet is dropped.
The fix to this solution can be instead of creating an on-link route, create an
off-link route to private network, with the next-hop being the remote tunnel's
gre IP address.
Timo Teräs [Sun, 19 Jul 2020 15:07:31 +0000 (18:07 +0300)]
nhrpd: change ipsec SA count to 32-bit
Under certain misconfigurations, the SA count can be unusually high
and wrap 8-bit counter. That leads to premature free, and crash.
Make the count 32-bit to avoid crash in these rare conditions.
Amol Lad [Sat, 17 Oct 2020 05:32:19 +0000 (11:02 +0530)]
nhrpd: Set correct prefix length in nhrp registration
RFC2332 section 5.2.1 states (regarding the uniqueness bit) that:
Note that when this bit is set in an NHRP Registration Request, only a
single CIE may be specified in the NHRP Registration Request and that
CIE must have the Prefix Length field set to 0xFF. the prefix length is
the widest acceptable destination protocol address prefix. However, if
"Uniqueness" bit is set then it must be 255
This patch implements this requirement, which fixes interoperability with Cisco
NHRP hub routers.
Duncan Eastoe [Fri, 11 Dec 2020 11:07:59 +0000 (11:07 +0000)]
zebra: routes stuck with 'q' when using dplane FPM
New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.
After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.
Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.
The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.
This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.
An un-related RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will then no longer be
marked as queued by zebra. However this new FIB update might itself
then fall victim to the same scenario!
We can observe the above behaviour in these detailed dplane logs.
11:24:47 zebra[7282]: dplane: incoming new work counter: 2
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:47 zebra[7282]: dplane provider 'Kernel': processing
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main
2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.
11:24:58 zebra[7282]: dplane: incoming new work counter: 2
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:58 zebra[7282]: dplane provider 'Kernel': processing
11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which sounds good as that is what we en-queued.
However, there is an outstanding context from earlier, so there is still outstanding
work.
Indeed the new 5.5.5.5/32 route is marked as queued:
O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19
This remains the case until we trigger a FIB update by installation of the
(eg.) 10.10.10.10/32 route:
11:26:41 zebra[7282]: dplane: incoming new work counter: 2
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:26:41 zebra[7282]: dplane provider 'Kernel': processing
11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS
We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.
As expected, the 5.5.5.5/32 route is no longer marked as queued:
O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06
But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:
C>q 10.10.10.10/32 is directly connected, lo, 00:26:05
Duncan Eastoe [Fri, 11 Dec 2020 11:03:53 +0000 (11:03 +0000)]
zebra: dplane API to get provider output q length
Returns the current number of (completed) contexts in the provider's
output queue (dp_ctx_out_q), allowing access to this data from the
provider itself.
Hiroki Shirokura [Thu, 10 Dec 2020 00:33:29 +0000 (09:33 +0900)]
zebra: unexpose label-manager util-funcs as static
Following functions which is a piece of label-maanager implementation
isn't called from out side of its file. And all lines of label-manager
are coded on zebra/label_manager.c at this time. So these functions
should be unexposed.
Renato Westphal [Sun, 6 Dec 2020 00:44:41 +0000 (21:44 -0300)]
ldpd: fix printfrr format specifiers in the child processes
In ldpd, the child processes send IPC messages to the main process to
perform logging in their behalf (access to the file descriptor used
for logging needs to be serialized). This commit fixes a problem
that was preventing the printfrr format specifiers from working in
the child processes, since vsnprintf() was being used instead of
vsnprintfrr() before sending the log messages to the parent process.
Karen Schoener [Tue, 8 Dec 2020 14:44:27 +0000 (09:44 -0500)]
isisd, ospfd: IGPs detect LDP down via zapi client close message
When ldp-sync is configured, IGPs take action if the LDP process goes down.
Currently, IGPs detect the LDP process is down if they do not receive a
periodic 'hello' message from LDP within 1 second.
Intermittently, this heartbeat mechanism causes false topotest failures.
When the failure occurs, LDP is busy receiving messages from zebra for a
few seconds. During this time, LDP does not send the expected periodic
message.
With this change, IGPs detect LDP down via zapi client close message.
zebra: anticipate zns creation at vrf creation when backend is vrf-lite
in the case the namespace pointer is already available, feed it at vrf
creation. this prevents from crashing if the netlink parsing already
began, and the vrf-lite is not enabled yet.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Following functions is using writen to dispatch message
into socket, but another function uses zserv_send_message.
This commit does tiny unification for zapi's socket messaging.
Kaushik [Mon, 9 Nov 2020 15:51:29 +0000 (07:51 -0800)]
ospf6d : Code refactoring for route redistribution.
1. Created new ospf6_redist structure.
2. Moved the 'route_map' structure from structure 'ospf6' to
structure 'ospf6_redist'.
3. Added new message type OSPF6_REDISTRIBUTE.
4. Added the placeholder for metric option in structure ospf6_redist
for redistribute.
5. Added few API's for route redistribute lookup, add & del.
kuldeepkash [Tue, 8 Dec 2020 15:50:16 +0000 (15:50 +0000)]
tests: Enhanced auto-rd verification as per changes done in #7652
1. As per recent changes done in PR #7652, we have modified the auto-rd verification logic link: https://github.com/FRRouting/frr/pull/7652 Signed-off-by: kuldeepkash <kashyapk@vmware.com>
Donald Sharp [Sat, 5 Dec 2020 20:34:59 +0000 (15:34 -0500)]
bgpd, zebra: Add ability for bgp to send AS-Path information to zebra
Add a bit of code to allow bgp to send the AS-Path associated with
the route being installed to zebra so it can be displayed and
used as part of the `show ip route A` command in zebra.
eva# show ip route 20.0.0.0/11
Routing entry for 20.0.0.0/11
Known via "bgp", distance 20, metric 0, best
Last update 00:00:00 ago
* 192.168.161.1, via enp39s0, weight 1
AS-Path: 60000 64539 15096 6939 8075
Donald Sharp [Sat, 5 Dec 2020 00:40:22 +0000 (19:40 -0500)]
lib: Add encode/decode of opaque data
Add a bit of code that allows for opaque data to be
sent from an upper level protocol to zebra. This is just
pass through data that will be used as part of displaying
useful data about a route in a `show ip route` command
in future commits.
Chirag Shah [Wed, 2 Dec 2020 00:02:36 +0000 (16:02 -0800)]
bgpd: fix distance for aggregate route
bgp aggregate address installs route with self peer which
can have peer->su of unspecifed type.
bgp_distance_apply bailed out as it fails to parse
sockunion2hostprefix for af type unspec.
Donald Sharp [Fri, 4 Dec 2020 13:01:31 +0000 (08:01 -0500)]
bgpd: Let's actually track if the nh was updated
In bgp_zebra_announce when iterating over multipath
we were checking to ensure that the nexthop was updated
but never initially clearing the nh_updated variable.
Thus leading to a situation where we could crash.
Karen Schoener [Thu, 3 Dec 2020 16:23:59 +0000 (11:23 -0500)]
isisd, ospfd: increase timeout to fix intermittent LDP Sync test failure
Currently, IGPs are coded to receive a 'hello' message from LDP every second.
Intermittently, LDP Sync topotests are failing because the IGPs fail to
receive this 'hello' message every second.
When the LDP Sync topotests fail, LDP logs show that LDP is processing
zapi messages for 1-2 seconds.
This is a shortterm fix, in order to prevent CI pipeline failures.
The longterm fix is in progress.