This is specifically needed to allow pim-evpn mroutes in the MLAG setup -
(36.0.0.11, 239.1.1.100) Iif: peerlink.4094 Oifs: uplink-1, peerlink.4094
I could have gone the other way and disabled PIM_ENFORCE_LOOPFREE_MFC but
that opens the door too wide. Relaxing the checks for mlag-specific mroutes
seemed like the safer choice.
This commit provides the infrastructure to relax checks on a per-mroute
basis.
pimd: provide a mechanism to pin the IIF for an SG entry
In the case of vxlan origination entries IIF is set to -
1. lo for single VTEPs
2. MLAG-ISL for VTEPs multihomed via MLAG.
This commit creates the necessary infrastructure by -
1. allowing the IIF to be set statically (without RPF lookup)
2. and by preventing next-hop-tracking registration
PS: Note that I have skipped additional checks in pim_upstream_del
intentionally i.e. an attempt will be made to remove nexthop-tracking
for the upstream entry (with STATIC_IIF) which will fail because of the
up-entry not being in the nh's hash table. Ideally we should maintain
a nh pointer in the up-entry to prevent this unnecessary processing.
In the abscence of that I wanted to avoid spraying STATIC_IIF checks
all over.
pimd: provide an api to force stop kat on an upstream entry
In the case of pim vxlan we create and keep upstream entries alive
in the abscence of traffic. So we need a mechanism to purge entries
abruptly on vxlan SG delete without having to wait for the entry
to age out.
These are again just the infrastructure changes needed for it.
pimd: provide an upstream control to prevent KAT expiry
For vxlan BUM MDTs we prime the pump and register the local-VTEP-ip
as source even before the first BUM packet is rxed. This commit provides
the infrastructure changes needed for that.
zebra sends (S, G) and (*, G) entries for BUM mcast groups to pimd. This
commit includes the changes to handle the notifications and trigger the
creation of (S, G) base cache in pimd.
pimd: initial infrastructure to maintain VxLAN SG database
These entries will be used over the subsequent commits for
1. vxlan-tunnel-termination handling - setup MDT to rx VxLAN encapsulated
BUM traffic.
2. vxlan-tunnel-origination handling - register local-vtep-ip as a
multicast source.
lib, zebra: changes to propagate vxlan mcast SG entries to pimd
These updates act as triggers to pimd to -
1. join the MDT for rxing VxLAN encapsulated BUM traffic
2. register the local-vtep-ip as a source for the MDT
zebra: maintain mcast tunnel origination and termination SG entries
Each multicast tunnel is associated with a -
1. Tunnel origination mroute that is used for forwarding the
VxLAN encapsulated flow -
S - local VTEP-IP
G - BUM mcast-group
2. And a tunnel termination entry -
S - * (any remote VTEP)
G - BUM mcast-group
Multiple L2 VNIs can share the same BUM mcast group (and local-VTEP-IP).
Zebra maintains an mcast (SG) hash table to pass this info to pimd for
subsequent MDT setup.
zebra: install flood FDB entry only if the remote VTEP asked for HER
Remote VTEPs advertise the flood mode via IMET and the ingress VTEP
needs to perform head-end-replication of BUM packets to it only if the
PMSI tunnel type is set to ingress-replication. If a type-3 route is not
rxed or rxed with a mode other than ingress-replication we can skip
installation of the flood fdb entry for that L2-VNI. In that case the
remote VTEP is either not interested in BUM traffic or is using a
"static-config" based replication mode like PIM.
Sample output with HER -
=======================
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
Remote VTEPs for this VNI:
27.0.0.8 flood: HER
root@TORS1:~#
Sample output with PIM-SM -
=========================
root@TORS2:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
Remote VTEPs for this VNI:
27.0.0.7 flood: -
root@TORS2:~#
bgpd: propagate flood mode to zebra based on the tunnel-type in the IMET route
IMET/type-3 routes are used by VTEPs to advertise the flood mode for BUM
traffic via the PMSI tunnel attribute. If a type-3 route is not rxed from
a remote-VTEP we default to "no-head-end-rep" for that remote-VTEP. In such
cases static-config such as PIM is likely used for BUM flooding.
bgpd: suppress IMET route generation if flood mode is PIM-SM
IMET route is optional if the flood mode is PIM-SM and serves
no functional purpose. So this change limits type-3 route generation
to flood-mode=head-end-replication.
If PIM-SM if used for BUM flooding the multicast group address can be
configured per-vxlan-device. BGP receives this config from zebra via
the L2 VNI add/update.
zebra: header changes for l2 vni bum-mcast-grp handling
The multicast group ip address for BUM traffic is configurable per-l2-vni.
One way to configure that is to setup a vxlan device that per-l2-vni and
specify the address against that vxlan device -
root@TORS1:~# vtysh -c "show interface vx-1000" |grep -i vxlan
Interface Type Vxlan
VxLAN Id 1000 VTEP IP: 27.0.0.15 Access VLAN Id 1000 Mcast 239.1.1.100
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep Mcast
Mcast group: 239.1.1.100
root@TORS1:~#
Don Slice [Mon, 15 Apr 2019 18:27:00 +0000 (18:27 +0000)]
zebra: stop sending invalid nexthops to clients
Found that zebra_rnh_apply_nht_rmap would set the
NEXTHOP_FLAG_ACTIVE if not blocked by the route-map, even
if the flag was not active prior to the check. This fix
changes the flag used to denote the nexthop is filtered so
that proper active state can be retained. Additionally,
found two cases where we would send invalid nexthops via
send_client, which would also cause this crash. All three
fixed in this commit.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Donald Sharp [Tue, 16 Apr 2019 13:07:12 +0000 (09:07 -0400)]
zebra: Double check is not necessary in nexthop_active_update
The nexthop_active_update command looks at each individual
nexthop and decides if it has changed. If any nexthop
has changed we will set the re->status to ROUTE_ENTRY_CHANGED
and ROUTE_ENTRY_NEXTHOPS_CHANGED.
Additionally the test for old_nh_num != curr_active
makes no sense because suppose we have several events
we are processing at the same time and a total ecmp
of 16 but 14 are active at the start and 14 are active
at the end but different interfaces are up or down.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 16 Apr 2019 12:37:24 +0000 (08:37 -0400)]
lib, zebra: Remove unused flag
The NEXTHOP_FLAG_FILTERED went away when we started treating
static routes like every other route in the system. This was
a special case for handling static route code that just didn't
get finished cleaning up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 16 Apr 2019 12:09:56 +0000 (08:09 -0400)]
zebra: nexthop_active_update does not need set
We are effectively calling nexthop_active_update() on every
route entry being processed for installation at least 2 times.
This is a bit ridiculous. We need to resolve the nexthops
when we know a route has changed in some manner, so do so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Thu, 18 Apr 2019 00:47:44 +0000 (20:47 -0400)]
tests: bgp_l3vpn_to_bgp_vrf were bailing to quickly
The tests are not coming up consistently on my test box. Add a bit of wait
time to test to allow normal bgp when the first attempt doesn't come up.
Especially since bgp timeouts are 120 seconds with non datacenter compiles.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Certain operations, like removing non-presence containers or
modifying list keys, are not considered to be valid from the
perspective of the northbound layer. This is because we want to
implement a minimum set of northbound configuration callbacks and
use them to process all possible configuration changes.
The removal of a np-container [1], for example, can be processed by
calling the "delete" callback of all of its child nodes (recursion
is used for np-container child nodes). Similarly, the modification
of a list key can be processed as if the corresponding list entry
was removed and readded with updated key values. This strategy saves
us the burden of implementing lots of extra configuration callbacks.
That said, the nb_operation_is_valid() function shouldn't be used
for anything other than checking which callbacks are valid for
which YANG nodes. Using it in the nb_candidate_edit() function
is inappropriate as we want as much flexibility as possible when
editing a candidate configuration. We should allow CLI commands,
for example, to remove np-containers (the northbound layer will then
figure out which callbacks need to be called when this candidate
is committed). Remove the check.
[1] We can't do the same for presence containers since they have a
"create" callback associated with them.
lib: move zlog() prototype back to the public logging API
zlog() should be part of the public logging API as it's useful in
the cases where the logging priority isn't known at compile time
(i.e. it depends on a variable).
lib: don't initialize the northbound database in the unit tests
Move call to nb_db_init() from nb_init() to frr_init() so that only
the FRR daemons will initialize the northbound database. This should
fix a few warnings when running some unit tests.
David Lamparter [Tue, 16 Apr 2019 19:33:06 +0000 (21:33 +0200)]
ospfd: make ECMP nexthop order deterministic
The order of ECMP nexthops currently depends on whatever order the
pqueue code returns the vertices in, which is essentially random since
they compare as equal. While this shouldn't cause issues normally, it
is nondeterministic and causes the ldp-topo1 test to fail when the
ordering comes up different. Also, nondeterministic behaviour is not a
nice thing to have here in general.
Just sort by nexthop address; realistic numbers of ECMP nexthops should
hopefully not make this a performance issue. (Also, nexthops should be
hot in the caches here.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donald Sharp [Wed, 17 Apr 2019 03:15:56 +0000 (23:15 -0400)]
ospf6d: listhead returns a listnode *
The ospf6_route_get_first_nh_index function call calls
listhead which returns a (listnode *) but we are casting
it to a (struct ospf6_nexthop *) and away we go.
Fixes: #4142
Found By: Kwind Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Quentin Young [Fri, 29 Mar 2019 23:24:08 +0000 (19:24 -0400)]
bgpd: add support for MD5 auth on listen ranges
Co-authored-by: Donald Sharp <sharpd@cumulusnetworks.com> Co-authored-by: Quentin Young <qlyoung@cumulusnetworks.com> Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Mon, 1 Apr 2019 18:16:54 +0000 (18:16 +0000)]
lib: add support for extended TCP MD5 auth
MD5 auth on TCP is supported for prefixes in recent versions of Linux;
add complementary support for FRR.
This is a reworked version of Donald's commit to keep library
compatibility and obviate the need for changes in daemons that don't
need to support this themselves.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
doc: update libyang build instructions to enable compiler optimizations
libyang defaults CMAKE_BUILD_TYPE to "Debug", which disables compiler
optimizations. We should instruct our users to build libyang in the
"Release" mode so that compiler optimizations are enabled and they
can benefit from the associated performance improvements.
Dmitrii Turlupov [Wed, 27 Mar 2019 12:27:36 +0000 (15:27 +0300)]
yang: priority of isis commands in interface configuration
Move down the "circuit-type" leaf in the isisd YANG module so that
"ip[v6] router isis" will be the first commands displayed in the
running configuration.
Nitin Soni [Wed, 10 Apr 2019 06:13:51 +0000 (23:13 -0700)]
bgpd: new show cmd - bgp l2vpn evpn route detail
This command is added to provide detailed information. It will be
useful in troubleshooting as we will be able to dump all detailed info
using a single command.
"show bgp l2vpn evpn route [detail] ...". Additional filtering
can be done by providing type of the route.
Command will display the detailed content for all rd and macs-ip as
displayed by "show bgp l2vpn evpn route rd <> mac <>" for a single
rd, mac, ip from the global bgp routing table.
Ticket: CM-24397 Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by:
Testing-Done:
F. Aragon [Wed, 10 Apr 2019 17:08:50 +0000 (19:08 +0200)]
zebra: pseudowire event recovery (DoS fix)
When having a route recovery, because of the route installation
cycling and the next hop label check, it could happen that the PW
never gets recovered. The original code shows the intention of retrying,
but the code was missing. The fix includes the call to the timer programming
the recovery attempt.
Example for reproducing the issue:
|P1| <-> |P2| <-> |P3|
- Being P1, P2, P3 nodes, using IS-IS as IGP, and having a pseudowire
betwen P1 and P3 (P1, P2, P3 having configured LDP daemons).
- After 60 seconds, kill the IS-IS daemon in P2.
- Wait 30 seconds
- Launch again the IS-IS daemon in P2
- The bug/issue is that after P1 <-> P3 recovering connectivity sometimes
the PW is not recovered because the reason explained in the first paragraph.
Chirag Shah [Tue, 9 Apr 2019 19:30:15 +0000 (12:30 -0700)]
zebra: avoid removing node twice from rb_tree
In zebra terminate path, the node was attempted to remove
twice from the RB_TREE table. This lead to a crash during
zebra shutdown zebra_router_free_table already calls RB_REMOVE
to remove a node from rb tree table.
siginfo=0x7fffd9134a30, context=<optimized out>) at lib/sigevent.c:249
rbt=<optimized out>, t=<optimized out>) at lib/openbsd-tree.c:226
t=0x56296965ff50 <zebra_router_table_head_RB_INFO>) at lib/openbsd-tree.c:383
rbt=rbt@entry=0x562969669bd0 <zrouter+16>, elm=elm@entry=0x56296afcf810)
at lib/openbsd-tree.c:393
(elm=0x56296afcf810, head=0x562969669bd0 <zrouter+16>) at zebra/zebra_router.h:46
Singned-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Donald Sharp [Tue, 9 Apr 2019 17:20:32 +0000 (13:20 -0400)]
pimd: Add JoinDesired(S,G) to deciding to set spt bit
The decision for Update_SPTbit(S,G, iif) includes a test
for JoinDesired(S,G) in section 4.2.2. When we were deciding
to update the spt bit we were not taking this into account.
This commit fixes this issue.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Mon, 8 Apr 2019 18:37:00 +0000 (14:37 -0400)]
pimd: Update state when receiving S,G join when in S,G RPT Prune state
When we receive a S,G join and the ifchannel is in S,G RPT Prune state,
pim should transition the ifchannel state to JOIN and transition the
pim_upstream state for the S,G stream.
Ticket: CM-24513 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Nitin Soni [Thu, 28 Mar 2019 05:49:03 +0000 (22:49 -0700)]
bgpd: new show cmd - bgp l2vpn evpn route vni all detail
This command is added to provide detailed information. It will be
useful in troubleshooting as we will be able to dump all detailed info
using a single command.
"net show bgp evpn route vni <all|id> [detail]". Additional filtering
can be done by providing vtep ip.
Command will display the detailed content for all vni and macs as
displayed by "net show bgp evpn route vni <> mac <> ip <>" for a single
vni, mac, ip.
Ticket: CM-24397 Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by:
Testing-Done:
Stephen Worley [Tue, 9 Apr 2019 14:35:11 +0000 (10:35 -0400)]
zebra: Fix late memset of pbr rule in rule_netlink
We were memsetting zebra_pbr_rule struct after
we had already put some information in it. Also updated
the init of the struct to use braces instead of a
memset.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Quentin Young [Mon, 8 Apr 2019 17:05:45 +0000 (17:05 +0000)]
ospfd: fix behavior of +/-metric
OSPFD uses -1 as a sentinel value for uninitialized metrics. When
applying a route map with a +/-metric to redistributed routes, we were
using -1 as our base value to increment or decrement on, which meant
that if you set e.g. +10, you would end up with a redistributed route of
metric 9.
This patch also removes an off-by-one sanity check that would cause a
set metric +1 or set metric 0 to result in a metric value of 20 :-)
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Donald Sharp [Sun, 7 Apr 2019 00:08:34 +0000 (20:08 -0400)]
bgpd: Remove extra alloc function bgp_path_info_new
The bgp_path_info_new function whenever it was called
pretty much duplicated the info_make function call. So
convert over to using it and remove the bgp_path_info_new
function so people are not tempted.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Sat, 6 Apr 2019 23:56:06 +0000 (19:56 -0400)]
ospfd: rn may be null
rn is not set the first time through the do {} while (); loop
As such we need to protect against it from being null( although
highly unlikely to ever happen given the ospf code base.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>