kuldeepkash [Tue, 8 Dec 2020 15:50:16 +0000 (15:50 +0000)]
tests: Enhanced auto-rd verification as per changes done in #7652
1. As per recent changes done in PR #7652, we have modified the auto-rd verification logic link: https://github.com/FRRouting/frr/pull/7652 Signed-off-by: kuldeepkash <kashyapk@vmware.com>
Chirag Shah [Wed, 2 Dec 2020 00:02:36 +0000 (16:02 -0800)]
bgpd: fix distance for aggregate route
bgp aggregate address installs route with self peer which
can have peer->su of unspecifed type.
bgp_distance_apply bailed out as it fails to parse
sockunion2hostprefix for af type unspec.
Donald Sharp [Fri, 4 Dec 2020 13:01:31 +0000 (08:01 -0500)]
bgpd: Let's actually track if the nh was updated
In bgp_zebra_announce when iterating over multipath
we were checking to ensure that the nexthop was updated
but never initially clearing the nh_updated variable.
Thus leading to a situation where we could crash.
Karen Schoener [Thu, 3 Dec 2020 16:23:59 +0000 (11:23 -0500)]
isisd, ospfd: increase timeout to fix intermittent LDP Sync test failure
Currently, IGPs are coded to receive a 'hello' message from LDP every second.
Intermittently, LDP Sync topotests are failing because the IGPs fail to
receive this 'hello' message every second.
When the LDP Sync topotests fail, LDP logs show that LDP is processing
zapi messages for 1-2 seconds.
This is a shortterm fix, in order to prevent CI pipeline failures.
The longterm fix is in progress.
ckishimo [Tue, 24 Nov 2020 14:53:22 +0000 (06:53 -0800)]
ospfd: fix cosmetic show ip ospf when NSSA
When executing the following command to change the NSSA translator role
from OSPF_NSSA_ROLE_ALWAYS to OSPF_NSSA_ROLE_NEVER
r2(config-router)# area 1 nssa translate-never
During the time the `ospf_abr_nssa_check_status()` function is not executed,
we are in a situation where the role is OSPF_NSSA_ROLE_NEVER (just configured)
but the NSSATranslatorState is still ENABLED
During this time the output of "show ip ospf" displays the following:
r2# show ip ospf
Area ID: 0.0.0.1 (NSSA)
Shortcutting mode: Default, S-bit consensus: no
Number of interfaces in this area: Total: 1, Active: 1
It is an NSSA configuration.
Elected NSSA/ABR performs type-7/type-5 LSA translation.
We are an ABR and Number of fully adjacent neighbors in this area: 1 (**)
Basically the case TranslatorState=ENABLED && TranslatorRole=ROLE_NEVER is not
covered in `ospf_vty.c`
This PR adds the case TranslatorState=ENABLED and TranslatorRole=ROLE_NEVER
which should only happen for a small period of time
Donald Sharp [Wed, 2 Dec 2020 12:26:33 +0000 (07:26 -0500)]
tests: Wait for convergence first
The test_bgp_multi_vrf_topo2.py script had a bunch
of places where it would change an interface status
or add delete routes that would affect bgp convergence
but it was never ensuring that convergence had happened
before the test verified the bgp rib. I believe this
was leading to many intermittant ci failures in
testing for other PR's to be accepted. Modify
the code to wait for bgp convergence if we just
made a change to the topology
Donald Sharp [Tue, 1 Dec 2020 20:37:03 +0000 (15:37 -0500)]
ospfd: Set Curr_mtu to when we get the mtu
Currently if you start ospfd, bring up neighbors and then issue
a tcpdump on a interface ospf is peering over, this causes the neighbor
relationship to be restarted:
root@spectrum301(mlx-4600c-01):mgmt:~# tcpdump -i vlan402
2020-11-13T21:25:38.059671+00:00 spectrum301 ospfd[29953]: AdjChg: Nbr 202.0.0.3(default) on vlan402:200.0.3.1: Full -> Deleted (KillNbr)
2020-11-13T21:25:38.065520+00:00 spectrum301 ospfd[29953]: ospfTrapNbrStateChange: trap sent: 200.0.3.2 now Deleted/DROther
2020-11-13T21:25:38.065922+00:00 spectrum301 ospfd[29953]: ospfTrapIfStateChange: trap sent: 200.0.3.1 now Down
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan402, link-type EN10MB (Ethernet), capture size 262144 bytes
21:25:38.072330 IP 200.0.3.1 > igmp.mcast.net: igmp v3 report, 1 group record(s)
2020-11-13T21:25:38.080430+00:00 spectrum301 ospfd[29953]: ospfTrapIfStateChange: trap sent: 200.0.3.1 now Point-To-Point
2020-11-13T21:25:38.080654+00:00 spectrum301 ospfd[29953]: SPF Processing Time(usecs): 9734
2020-11-13T21:25:38.080829+00:00 spectrum301 ospfd[29953]: SPF Time: 6422
2020-11-13T21:25:38.080991+00:00 spectrum301 ospfd[29953]: InterArea: 1572
2020-11-13T21:25:38.081152+00:00 spectrum301 ospfd[29953]: Prune: 67
2020-11-13T21:25:38.081329+00:00 spectrum301 ospfd[29953]: RouteInstall: 1396
2020-11-13T21:25:38.081548+00:00 spectrum301 ospfd[29953]: Reason(s) for SPF: N, S, ABR, ASBR
21:25:38.092510 IP 200.0.3.1 > ospf-all.mcast.net: OSPFv2, Hello, length 44
This is happening because the curr_mtu is not being properly stored. It was being set
on interface creation( but we have not actually read in the mtu part of the interface data, so
it is still 0 ).
Modify the code to store the curr_mtu at a point in interface creation *After* we have read
in interface data.
Ticket: CM-32276 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Igor Ryzhov [Wed, 2 Dec 2020 00:36:10 +0000 (03:36 +0300)]
ospf: fix instance initialization when using multi-instance mode
OSPF instance initialization was moved from "router ospf" vty command to
ospf_get function some time ago but the same thing must be done in
ospf_get_instance function used when multi-instance mode is enabled.
Duncan Eastoe [Thu, 26 Nov 2020 17:34:09 +0000 (17:34 +0000)]
ospfd: vlink auth type not shown in running config
The following virtual-link configuration was not represented in the
running configuration:
area <area> virtual-link <ip> authentication [null|message-digest]
zebra: debug logs to detect incorrect mac deletions
A MAC entry cannot be deleted while a neigh is referencing it. It seems
there is some race condition where this may be happening. The log is
to help identify those cases.
zebra: allocate one nexthop id per-VTEP instead of one per-ES-VTEP
This is an optimization to reduce the number of L2 nexthops. A
l2 or fdb nexthop simply provides the dataplane with a nexthop ip-
torm-12:mgmt:~# ip nexthop
id 268435461 via 27.0.0.20 scope link fdb
id 268435463 via 27.0.0.20 scope link fdb
id 268435465 via 27.0.0.20 scope link fdb
So there is no need to allocate a nexthop per-ES/per-VTEP. There
can be 100+ ESs per-VTEP so this change cuts the scale down by a
factor of 100.
zebra: support for slow-failover of local MACs on an ES
When a local ES flaps there are two modes in which the local
MACs are failed over -
1. Fast failover - A backup NHG (ES-peer group) is programmed in the
dataplane per-access port. When a local ES flaps the MAC entries
are left unaltered i.e. pointing to the down access port. And the
dataplane redirects traffic destined to the oper-down access port
via the backup NHG.
2. Slow failover - This mode needs to be turned on to allow dataplanes
not capable of re-directing traffic. In this mode local MAC entries
on a down local ES are re-programmed to point to the ES-peers'
NHG. And vice-versa i.e. when the ES comes up the MAC entries
are re-programmed with the access port as dest.
Fast failover is on by default. Slow failover can be enabled via the
following config -
evpn mh redirect-off
zebra: on local mac add from the dplane a re-install maybe need as static
As a part of extended MM handing a MAC can be updated from local
to remote while being referenced by SYNC neighs (this is really a
temporary/small window). During this window if the MAC transitions
back to local again we need to re-inforce the previous SYNC flags
(based on the sync-neigh count) as subsequent SYNC updates to the
MAC will be de-duped and ignored.
zebra: set inactive bit when zebra re-installs the MAC on dplane del
When a local mac is deleted by the dataplane zebra can re-install it
if the MAC is a SYNC MAC (learned from ES peers). The "local_inactive"
bit must be set as a part of the re-install to prevent zebra turning
around and advertising the MAC as locally active.
Also fixed up some debug logs in the slow-fail path to include the VNI.
zebra: remove FDB entries before de-activating a L2-NHG
NHG is activated i.e. programmed in the dataplane only if there
are active-VTEPs associated with it. When a NHG is de-activated
all the remote-mac entries associated with it need to be removed
before the NHG is removed.
The lookup for non default VRFs was always using a tableId; if not
provided, we were defaulting to RT_TABLE_MAIN. This is fine for the
default VRF but not for others. As a result, the command was silently
failing for non-default VRFs unless we also specified the correct tableId.
Fix this by only performing the lookup using the tableId if it is
provided; else use zebra_vrf_table.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Stephen Worley [Tue, 1 Dec 2020 17:04:30 +0000 (12:04 -0500)]
zebra: make a couple NHG errors debugs
A couple NHG messages we were logging as errors are a bit spammy
in usecases where you routinely add/remove interfaces (VM heavy
deployments). Its not really an error a user cares about and
more for a developer to know what went wrong after the fact so
it makes more sense for these to be under a debug rather than
an error since seeing them does not implicitly mean error during
those usecases.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Donald Sharp [Tue, 1 Dec 2020 12:57:45 +0000 (07:57 -0500)]
pimd: Remove pim_version.c it is never used
The pim_version.[c|h] files are never used and we are getting
warnings about PIM_VERSION changing pointer sizes from
newer versions of the compiler. I see no reason to keep this
Javier Garcia [Tue, 1 Dec 2020 11:28:39 +0000 (12:28 +0100)]
tools: Fix run folder permissions
In the case of some linux distros the /var/run dir is mounted
with tmpfs so in every reboot it's removed.
Then the frrcommon.sh will recreate it without 'x' perm
So no pid file cannot be created in /var/run/frr
Rafael Zalamena [Tue, 1 Dec 2020 11:01:37 +0000 (08:01 -0300)]
bfdd: session specific command type checks
Replace the unclear error message:
```
% Failed to edit configuration.
YANG error(s):
Schema node not found.
YANG path: /frr-bfdd:bfdd/bfd/sessions/single-hop[dest-addr='192.168.253.6'][interface=''][vrf='default']/minimum-ttl
```
With:
```
frr(config-bfd-peer)# minimum-ttl 250
% Minimum TTL is only available for multi hop sessions.
! or
frr(config-bfd-peer)# echo
% Echo mode is only available for single hop sessions.
frr(config-bfd-peer)# echo-interval 300
% Echo mode is only available for single hop sessions.
```
Reported-by: Trae Santiago Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Philippe Guibert [Mon, 23 Nov 2020 14:29:40 +0000 (14:29 +0000)]
bgpd: add peer description for each afi/safi line in show summary
For each afi/safi of 'show bgp summary', display the peer description
each time needed. This information is useful, for instance in the case
of a device connected with multiple peers.
The topotest all_protocol_startup is changed accordingly.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Donald Sharp [Tue, 1 Dec 2020 00:37:53 +0000 (19:37 -0500)]
zebra: Reduce warn -> debug
During times of network trauma and when we are at large network scale
the process_remote_macip_add function can issue a zlog_warn for
a common occurrence. Modify the code to be a debug statement.
This behavior is the same now as the process_remote_macip_del function
Mark Stapp [Mon, 30 Nov 2020 21:42:18 +0000 (16:42 -0500)]
zebra: add an api to process/clean the pending dplane queue
Add an api that allows a caller in the zebra main pthread to
process the queue of pending dplane updates. The caller supplies
a function to call to test each pending context. Selected
contexts are dequeued, and freed without being processed.
Igor Ryzhov [Mon, 30 Nov 2020 15:50:51 +0000 (18:50 +0300)]
vtysh: fix incorrect memory statistics
As code comment states, 1 count of MTYPE_COMPLETION is leaked for each
autocompleted token. Let's manually decrement the counter before passing
the pointer to readline.
Donald Sharp [Sat, 28 Nov 2020 20:35:18 +0000 (15:35 -0500)]
ospfd: Restore POINTOMULTIPOINT to working order
Commit: 1d376ff539508f336cb5872c5592b780e3db180b removed
the code to find nexthops for the POINTOMULTIPOINT and
replaced it with a generic bit of code that was
supposed to handle both POINTOPOINT and POINTOMULTIPOINT
the problem is that the ospf rfc states that the
network mask on point to multipoint should be /32
which will not allow you to properly do a prefix match
on it against the network.
Restore original behavior as much as possible and leave
the new POINTOPOINT code alone.
Fixes: #7624 Signed-off-by: Donald Sharp <sharpd@nvidia.com>