This is a basic setup and test of evpn-pim.
Create a vxlan device ensure that pim notices this
and setups the appropriate groups and sends them
to the RP.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. show ip pim mlag summary
provides MLAG session information and stats
2. show ip pim mlag upstream
displays the upstream entries synced across the MLAG switches
ipmr-lo is an internally added device used for multicast vxlan tunnel
termination. This device is not expected to be managed by the admin
however in the case it is accidentally shut we need to be able handle
it by recovering when it is "no shut" again.
This string is used in some logging for e.g. in zclient_read -
>>>>>>>>>>>>>>>>>>>>>>>>>>
if (zclient_debug)
zlog_debug("zclient 0x%p command %s VRF %u",
(void *)zclient, zserv_command_string(command),
vrf_id);
>>>>>>>>>>>>>>>>>>>>>>>>>>
pimd: remove peerlink_rif from the orig-mroute OIL when it is oper down
In an anycast VTEP setup the peerlink_rif is added as a static OIF
to the originating mroute (bypassing the pim state machine). This is
needed to ensure both MLAG switches rx a copy of encapsulated BUM flow.
We were not handling link state changes on this static OIF resulting
in the wrong vifi being used in the OIL (because of vifi re-allocation).
This commit re-acts to oper state changes by deleting the OIF on link
down and re-adding it on link up.
pimd: stop overloading SRC_IGMP upstream for vxlan local membership
A local membership is created on the vxlan termination device ipmr-lo. This
is done to -
1. Pull multicast vxlan tunnel traffic to the VTEP for termination by
triggering JoinDesired on the BUM multicast group.
2. Include the OIF in the mroute to signal to the dataplane component
that flow needs to be vxlan terminated.
Earlier we were overloading the PIM_UPSTREAM_FLAG_MASK_SRC_IGMP for
this local membership creation but that is creating confusion both in
the state machine and in the show outputs. To avoid that we use the
more apparent PIM_UPSTREAM_FLAG_MASK_SRC_VXLAN_TERM. With this change -
1. We get LHR functionality for VXLAN_TERM mroutes
2. OIF is populated with PIM_OIF_FLAG_PROTO_PIM only
pimd: force update inherited OIL when vxlan local membership is created
When local member is added the (*, G) entry may already be in a JOINED
state. In that case the OIL is not updated i.e. pim_channel_add_oif is
not happening for ipmr-lo. Because of this the traffic associated with
the multicast vxlan tunnel is pulled down to the VTEP but not terminated
by the kernel.
This change force updates the OIL anytime ipmr-lo is added or removed
as a local member.
pimd: increase RPF metric via the peerlink_rif by plus-10
The RPF cost is incremented by 10 if the RPF interface is the peerlink-rif.
This is used to force the MLAG switch with the lowest cost to the RPF
to become the MLAG DF. If a switch has to go via the peerlink-rif to get
to the RP or source it simplly cannot be the designated forwarder.
pimd: inherit MLAG DF role from the parent (*, G) entry
DF election is only run for (*,G) entries i.e. election is skipped
for (S,G) entries that are setup as a result of SPT switchover. (S,G)
entries inherit the DF role from the parent (*,G) entry. So the DF is
responsible for terminating all sources associated with a group.
pim: DF election for tunnel termination mroutes in an anycast-VTEP setup
1. Upstream entries associated with tunnel termination mroutes are
synced to the MLAG peer via the local MLAG daemon.
2. These entries are installed in the peer switch (via an upstream
ref flag).
3. DF (Designated Forwarder) election is run per-upstream entry by both
the MLAG switches -
a. The switch with the lowest RPF cost is the DF winner
b. If both switches have the same RPF cost the MLAG role is
used as a tie breaker with the MLAG primary becoming the DF
winner.
4. The DF winner terminates the multicast traffic by adding the tunnel
termination device to the OIL. The non-DF suppresses the termination
device from the OIL.
Note: Before the PIM-MLAG interface was available hidden config was
used to test the EVPN-PIM functionality with MLAG. I have removed the
code to persist that config to avoid confusion. The hidden commands are
still available.
Channel with the MLAG daemon is setup on the first VxLAN BUM MDT or
pim-mlag AA SVI.
This channel is used for -
1. rxing MLAG status status updates (peer state, role etc.)
2. for syncing active-active upstream entries with the peer MLAG
switch.
Kishore Aramalla [Mon, 10 Feb 2020 19:38:27 +0000 (11:38 -0800)]
bgpd: RFC compliance wrt invalid RMAC, GWIP, ESI and VNI
A route where ESI, GW IP, MAC and Label are all zero at the same time SHOULD
be treat-as-withdraw.
Invalid MAC addresses are broadcast or multicast MAC addresses. The route
MUST be treat-as-withdraw in case of an invalid MAC address.
As FRR support Ethernet NVO Tunnels only.
Route will be withdrawn when ESI, GW IP and MAC are zero or Invalid MAC
Test cases:
1) ET-5 route with valid RMAC extended community
2) ET-5 route no RMAC extended community
3) ET-5 route with Multicast MAC in RMAC extended community
4) ET-5 route with Broadcast MAC in RMAC extended community
Donald Sharp [Tue, 11 Feb 2020 00:25:52 +0000 (19:25 -0500)]
bgpd: Update failed reason to distinguish some NHT scenarios
Current failed reasons for bgp when you have a peer that
is not online yet is `Waiting for NHT`, even if NHT has
succeeded. Add some code to differentiate this.
eva# show bgp ipv4 uni summ failed
BGP router identifier 192.168.201.135, local AS number 3923 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
192.168.44.1 0 0 never Waiting for NHT
192.168.201.139 0 0 never Waiting for Open to Succeed
Total number of neighbors 2
eva#
eva# show bgp nexthop
Current BGP nexthop cache:
192.168.44.1 invalid, peer 192.168.44.1
Must be Connected
Last update: Mon Feb 10 19:05:19 2020
Ameya Dharkar [Thu, 6 Feb 2020 21:47:43 +0000 (13:47 -0800)]
bgpd: EVPN crash because of incorrect nexthop for IPv6 prefix
RCA:
When we install IPv6 prefix imported from EVPN RT-5 in vrf, nexthop of the IPv6
route should be IPv4 mapped IPv6 address. In function
install_evpn_route_entry_in_vrf, we generate a new attribute with IPv4 mapped
IPv6 nexthop, but we use parent->attr while creating the actual route.
Thus, Ipv4 nexthop is assigned to this route.
Because of this incorrect nexthop, we observed a crash in function
update_ipv6nh_for_route_install.
Fix:
Pass the new attribute with Ipv4 mapped Ipv6 nexthop to
bgp_create_evpn_bgp_path_info
Rafael Zalamena [Mon, 30 Sep 2019 13:34:49 +0000 (10:34 -0300)]
lib: implement route map northbound
Based on the route map old CLI, implement the route map handling using
the exported functions.
Use a curry-like programming pattern avoid code repetition when
destroying match/set entries. This is needed by other daemons that
implement custom route map functions and need to pass to lib their
specific destroy functions.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Rafael Zalamena [Fri, 27 Sep 2019 22:32:10 +0000 (19:32 -0300)]
yang: update route map model
Important changes:
* Rename top container `route-map` to `lib`;
* Rename list `instance` to `route-map`;
* Move route map repeated data to list `entry`;
* Use interface reference instead of typedef'ed string;
* Remove some zebra specific route map conditions;
* Protect `tag` set value with `when "./action = 'tag'"`;
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Stephen Worley [Tue, 4 Feb 2020 16:35:31 +0000 (11:35 -0500)]
doc: indicate non-support for dynamic pbr maps
Indicate in the doc clearly that we do not support the dynamic
creation of pbr maps for sub-interfaces when the master has
an interface. Such as in vlans. It must be explicitly added.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Donald Sharp [Mon, 13 Jan 2020 21:11:46 +0000 (16:11 -0500)]
zebra: nexthop groups vrf's are only a function of namespaces
Nexthop groups as a whole do not make sense to have a vrf'ness
As that you can have a arbitrary number of nexthops that point
to separate vrf's.
Modify the code to make this distinction, by clearly delineating
the line between the nhg and the nexthop a bit better.
Nexthop groups having a vrf_id only make sense if you are using
network namespaces to represent them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>