Donald Sharp [Wed, 7 Dec 2016 13:15:16 +0000 (08:15 -0500)]
pimd: Modify pim_upstream_new behavior
Modify pim_usptream_new to auto create the pim
channel oil. Why? Because there exists situations
where we have upstream state and we are attempting
to set the inherited_olist on it and we have
not created the channel oil yet. When that
happens we end up with mroutes that are
being meanies.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 6 Dec 2016 18:52:14 +0000 (13:52 -0500)]
pimd: Correctly associate channel oil with correct IIF
There exists situations where PIM stores a S,G channel oil
and doesn't delete it. When a new upstream comes in for
it we were not ensuring that the IIF for the S,G channel oil
was correct.
Ticket: CM-13908 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 6 Dec 2016 15:08:09 +0000 (10:08 -0500)]
pimd: Clarify pim_mroute_[add|del] function debugging
When debugging is turned on for 'debug mroute' we
are unable to tell when a mroute has been added
or deleted from the mrib. Notice when we
do it and who called it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Mon, 5 Dec 2016 12:39:59 +0000 (07:39 -0500)]
pimd: Stop sending Register under certain situations
When the switch in question is both a FHR and the RP
for the received multicast group stream. When we go
to send the NULL register to the RP( ourselves )
do not send it if we have not seen packets for
that stream in time greater than PIM_KEEPALIVE_PERIOD
and I_am_RP(G).
Ticket: CM-13880 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Sat, 3 Dec 2016 00:12:35 +0000 (16:12 -0800)]
pimd: reset packet size on tcp connection reset
If we were in the middle of a partial read when the tcp connection is
reset we were clearing the buffers but not the packet size. This can be
problematic when the connection is re-established.
There is no easy way to repro and test this without scale (and a timing
pattern that is hard to predict). So this change is mostly untested.
Donald Sharp [Fri, 2 Dec 2016 01:32:03 +0000 (20:32 -0500)]
pimd: Modify the Prune Pending Timer Pop to not assert
So there exist conditions where we can start the Prune
Pending Timer and receive other packets that cause
us to not stop the pending timer. This was
due to a missread of the state machine.
Additionally when the prune pending timer pops and
we are not in prune pending state, note the fact
and move on with our life instead of crashing and burning
Ticket: CM-13851 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by:
anuradhak [Fri, 2 Dec 2016 05:01:34 +0000 (21:01 -0800)]
pimd: Drop local SA reference when the upstream SG is deleted
This is done irrespective of the reason for del and is intended as a
catchall for cases (unclear which ones) where the RP can drop the SG
without KAT expiry.
anuradhak [Thu, 1 Dec 2016 23:46:36 +0000 (15:46 -0800)]
pimd: restart the ka timer after the sa adv timer
To avoid unnecessary ka activity in the network. When the SA
advertisment timer fires we build SA TLVs and send them to peers. As a
part of this tx we were also restarting the ka timer to avoid
unnecessary ka generation in the next 60 seconds. However because the
adv timer was restarted after tx (i.e. after ka restart) ka timer would
always endup firing just before the adv timer.
Donald Sharp [Tue, 29 Nov 2016 15:49:00 +0000 (10:49 -0500)]
pimd: Do not allow deletion of output interface.
There exists conditions where PIM will have it's
upstream route removed and an unreachable route
is installed that points out the downstream
interface. This unreachable route is removed
from bgp as soon as it's path selection algorithim
works properly, but pim has already deleted
the oif and never puts it back in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 30 Nov 2016 00:19:51 +0000 (19:19 -0500)]
pimd: Lower Hello sent to be immediate
There exists situations where an interface flaps
and routing recovers and we attempt to install
an upstream but since we have no neighbor out
that interface still. Let's cause the hello
to go out immediately for the 3.1 release
to allow mrouting to recover in this situation.
We will need to revisit this issue after
we have proper nexthop tracking in place
Ticket: CM-13185 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Donald Sharp [Tue, 29 Nov 2016 13:14:35 +0000 (08:14 -0500)]
pimd: Only check to see if current rp is 'right' or not.
When a new rp is entered, pim is looking at all rp's and failing the check if
any of the RP's have no path to the RP, instead of the one that was
just entered being wrong.
Ticket: CM-12623 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Donald Sharp [Tue, 29 Nov 2016 00:05:09 +0000 (19:05 -0500)]
pimd: Actually expedite a hello
When we get a new neighbor for an interface, we need
to send a hello out that interface in some situations.
At this time we were tracking this by the pim_ifstat_hello_sent
value but not reseting it when we received a new neighbor.
Ticket: CM-13769 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
anuradhak [Mon, 28 Nov 2016 18:12:21 +0000 (10:12 -0800)]
pim-nexthop: mroute and pim-upstream rpf are falling out of sync.
Currently the mroute-IIF and upstream RPF-IIF/neigh are resolved separately.
This must change i.e. be merged together for a couple of reasons -
1. In the case of ECMP we will use a load-share mechanism (based on G or
SG) to pick an RPF neighbor in 3.2.1 (to use the load-sharing cap of
anycast-RP). Using a different resolution mechanism for mroute-IIF will
simply not work.
2. In a non-CLOS topology it is actually possible to have routers that
do not participate in PIM. In this case the tree will be set up using
different routers than the ones chosen for the mroute IIF. And traffic
will not be forwarded.
This change is however too big for 3.2.0. So to handle CM-13714 I have
simply forced rpf update on neigh add which fixes the specific problem
seen on link flap in a clos (it is not very efficient but traffic
recovers).
In problem state -
(jessie-30-dev-switch-amd64-sbuild)root@spine-1:/home/cumulus# ip mr
(0.0.0.0, 225.1.1.1) Iif: lo Oifs: swp3 lo
(20.0.11.253, 225.1.1.1) Iif: swp1 Oifs: swp3
(jessie-30-dev-switch-amd64-sbuild)root@spine-1:/home/cumulus# vtysh -c
"show ip pim upstream"
Iif Source Group State Uptime JoinTimer
RSTimer KATimer RefCnt
lo * 225.1.1.1 Joined 00:08:44 00:00:15
--:--:-- --:--:-- 1
swp2 20.0.11.253 225.1.1.1 Joined 00:08:35 00:00:56
--:--:-- 00:02:59 1
(jessie-30-dev-switch-amd64-sbuild)root@spine-1:/home/cumulus#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Wed, 23 Nov 2016 21:35:45 +0000 (13:35 -0800)]
pim-nexthop: set the correct nexthop address in the rpf info.
When a nexthop lookup is done we can get an ECMP output. But not all
nexthops are pim neighbors. If for this reason PIM chose a nexthop other
than the first the rpf info was not being set correctly i.e.
nexthop ip was still the one associated with the first path but
interface was set to the one associated with second path.
This problem is seen on a link flap in the CLOS topology.
Ticket: CM-13714 Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Sat, 19 Nov 2016 16:20:26 +0000 (11:20 -0500)]
pimd: Cleanup igmp read socket
With the change over to using the kernel upcall for igmp messages,
we need to add in a read thread for the igmp socket to drain
the igmp socket's receive queue.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Sat, 19 Nov 2016 00:19:26 +0000 (16:19 -0800)]
pim-anycast-rp: Support in BGP unnumbered networks.
Anycast rp requires multiple ip addresses on the lo. If PIM is used in
an unnumbered BGP network it picks one of the lo addresses as the
pim-primary for the swp interfaces. But if the anycast IP is picked up
by both sides pim nbr will never converge. So a static "use-source" config
is provided to allow the administrator to force the the hello source to the
unique IP address.
Sample output:
=============
dell-s6000-04(config-if)# do show running-config pimd
>>>>>> SNIPPED >>>>>>>>>>>>>>>>>
interface lo
ip pim sm
ip pim use-source 100.1.1.5
!
>>>>>> SNIPPED >>>>>>>>>>>>>>>>>
dell-s6000-04(config-if)# do show ip pim interface lo
Interface : lo
State : up
Use Source : 100.1.1.5
Address : 100.1.1.5 (primary)
100.1.1.100
>>>>>> SNIPPED >>>>>>>>>>>>>>>>>
dell-s6000-04(config-if)# do show ip pim interface lo json
{
"lo":{
"name":"lo",
"state":"up",
"address":"100.1.1.5",
"index":1,
"lanDelayEnabled":true,
"useSource":"100.1.1.5",
"secondaryAddressList":[
"100.1.1.100"
],
>>>>>> SNIPPED >>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Fri, 18 Nov 2016 17:12:27 +0000 (09:12 -0800)]
pim-anycast-rp: Add limited support for secondary addresses.
Anycast requires that the lo interface be associated with multiple
addresses. One is the anycast IP address (which is the same on all RPs
participating in RP redundancy) and the second is the unique IP address
that will be used as the router id by routing protocols.
To accomodate that we maintain a list of secondary addresses per-pim iface
and allow any of them to be the RP address. This lets the I_am_RP macro
succeed on anycast RPs.
Note that the support is limited i.e. we don't actually advertise a
secondary list to the neighbors. This is assuming the anycast IP will never
be used as a router id i.e. will never be an RPF neighbor.
Sample output:
==============
dell-s6000-04# sh ip pim interface lo
Interface : lo
State : up
Address : 100.1.1.1 (primary)
100.1.1.2
100.1.1.3
100.1.2.1
>>>>>>> SNIP >>>>>>>>>>>>>>>
dell-s6000-04# sh ip pim interface lo json
{
"lo":{
"name":"lo",
"state":"up",
"address":"100.1.1.1",
"index":1,
"lanDelayEnabled":true,
"secondaryAddressList":[
"100.1.1.2",
"100.1.1.3",
"100.1.2.1"
],
>>>>>>> SNIP >>>>>>>>>>>>>>>
dell-s6000-04#sh ip pim rp-info
RP address group/prefix-list OIF I am RP
100.1.2.1 224.0.0.0/4 lo yes
dell-s6000-04#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 18 Nov 2016 00:57:10 +0000 (19:57 -0500)]
pimd: Set pim socket receive buffer to a larger value
There exists situations where we can receive data
faster than we can process it. Make the buffer
large enough to catch these situations for
the pim sockets.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Tue, 15 Nov 2016 23:39:11 +0000 (15:39 -0800)]
pim-kat: changed kat handling to match rfc-4601 more closely.
1. This is needed to layout the MSDP macros for determining what SAs are
originated by a MSDP speaker.
2. We no longer let the kat timer expire on an active flow. Activity
counters/lastuse is polled via a wheel for every SG entry. If new
activity is detected the keepalive timer is started and SPT bit set.
A SRC_STREAM reference is also created for the entry if one doesn't
already exist.
3. If KAT actually expires it means the flow is no longer active. At
this point we stop advertising the SA to MSDP peers. We also pull
the SRC_STREAM reference (deleting the entry if there are no other
references).
PS: Checking counters on KAT expiry will come in the next change.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 16 Nov 2016 19:38:14 +0000 (14:38 -0500)]
pimd: Increase kernel socket rcvbuf size.
We are receiving notifications from the linux
kernel that we are filling up the receive buffer
for upcalls into pimd. Let's increase the size
to something a bit bigger.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 15 Nov 2016 00:54:36 +0000 (19:54 -0500)]
pimd: Fix crash in pim_rp_show_information
When a 'show ip pim rp-info' is issued shortly
after a restart/start, pim will crash because
nexthop information has not been fully resolved
and the outgoing interface is NULL.
Ticket: CM-13567 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Sat, 12 Nov 2016 13:39:44 +0000 (05:39 -0800)]
pim-msdp: part-4: cli cleanup
1. Add support for mesh-group based configuration that is easy to apply
via automation. The older per-peer configuartion is temporarily hidden
and will be cleaned up later.
Sample config -
ip msdp mesh-group cumulus source 100.1.1.4
ip msdp mesh-group cumulus member 100.1.1.5
ip msdp mesh-group cumulus member 100.1.1.6
2. Added support for detail peer and sa-cache displays. Along with
filter keys.
3. Add json output for all the msdp displays.
With this commit basic support for anycast-RP with MSDP (in numbered
network is complete). Unnumbered support will be added separately.
Ticket: CM-13306
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Sat, 12 Nov 2016 02:21:49 +0000 (21:21 -0500)]
pimd: Removing impossible code
We've already allocated the mp data structure
and am using hash_alloc_intern for the hash_get
function. This will return the passed
in data structure. There is no possibility
of mp being NULL.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 11 Nov 2016 18:57:49 +0000 (13:57 -0500)]
lib: Fix thread_execute_crash
With the change to have thread_get fill inthe ->hist
pointer, thread_execute was missed and it
needs to fill in the .hist pointer for the
dummy thread created.
I'm not really sure why we need to call a
thread_execute on a function. When we
could, you know, just call the bloody
thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Tue, 8 Nov 2016 18:34:31 +0000 (10:34 -0800)]
pim-msdp: part-3: use SA cache for setting up SPTs
1. Added a new MSDP source reference flag for creating (S,G) entries
based on the SA-cache. The RFC recommends treating as SA like rxing
a (S, G) join (which is a bit different then treating like a traffic
stream).
2. SA-SPT is only setup if we are RP for the group and a corresponding
(*,G) exists with a non-empty OIL.
3. When an SA is moved we need to let the SPT live if it is active (this
change will come in a subsequent CL).
Testing done:
1. SA first; SPT setup whenever (*, G) comes around.
2. (*, G) first. As soon as SA is added SPT is setup.
3. (*, G) del with valid SA entries around.
Ticket: CM-13306
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
vivek [Fri, 11 Nov 2016 02:49:43 +0000 (18:49 -0800)]
zebra: Perform safe walk of RIB entries in rib_process()
There is a scenario where a RIB entry is unlinked and freed during RIB
processing. However, the walk of the entries is not being performed in
a safe manner. Fix the code to do this correctly.
Donald Sharp [Thu, 10 Nov 2016 18:58:40 +0000 (13:58 -0500)]
lib: Slight Optimization of thread handling.
This commit does these things:
1) Make thread_add_unuse own the setting of THREAD_UNUSED.
2) Move thread->hist finding to to thread_get.
We are storing the thread->hist even when the thread
is on the unused. This means that we check to see
if the funcname or func have changed and we get new
history. Else we've probably just retrieved the last
unused which has the same func/funcanme. This is
a common practice to do THREAD_OFF/THREAD_ON in
quick succession.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
Donald Sharp [Tue, 8 Nov 2016 20:26:48 +0000 (15:26 -0500)]
pimd: Add Handler for Receive (*,G) join for (S,G,rpt)
According to Figure 5( Downstream per-interface (S,G,rpt)
state when we receive a (*,G) we need to move (S,G,rpt)
children of the (*,G) into different states. This
implements that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
anuradhak [Mon, 31 Oct 2016 19:29:17 +0000 (12:29 -0700)]
pim-msdp: part-2: SA cache support
This commit includes -
1. Maintaining SA cache with local and remote entries.
2. Local SA entries - there are two cases where we pick up these -
- We are RP and got a source-register from the FHR.
- We are RP and FHR and learnt a new directly connected source on a
DR interface.
3. Local entries are pushed to peers immediately on addition and
periodically. An immediate push is also done when peer session is
established.
4. Remote SA entries - from other peers in the mesh group and passed
peer-RPF checks.
5. Remote entries are aged out. No other way to del them
currently. In the future we may add a knob to flush entries on
peer-down.
Testing done -
Misc topologies with CL routers plus basic interop with another vendor (
we can process their SA updates and they ours).
Sample output -
root@rp:~# vtysh -c "show ip msdp sa"
Source Group RP Uptime
33.1.1.1 239.1.1.2 local 00:02:34
33.1.1.1 239.1.1.3 local 00:02:19
44.1.1.1 239.1.1.4 100.1.3.1 00:01:12
44.1.1.1 239.1.1.5 100.1.3.1 00:00:55
root@rp:~#
Ticket: CM-13306
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>