Donald Sharp [Thu, 11 Jun 2015 16:11:12 +0000 (09:11 -0700)]
The CHANGED flag may be set for a route (RIB entry) due to change in
interface or nexthop status. However, this route may not be selected as
the best and may not be the prior best. The flag needs to be reset
after evaluating the route as not doing so may prevent future nexthop
validation for this route.
Donald Sharp [Thu, 11 Jun 2015 16:11:12 +0000 (09:11 -0700)]
If the nexthop is only resolved over a default route and that is not
explicitly allowed, don't treat it as a change for routes using this
nexthop, unless the resolution has really changed.
Donald Sharp [Thu, 11 Jun 2015 16:11:12 +0000 (09:11 -0700)]
Zebra: Implement route replace semantics.
Zebra currently performs a delete followed by add when a route needs to be
modified. Change this to use the replace semantics of netlink so that the
operation can possibly be atomic.
Donald Sharp [Thu, 11 Jun 2015 16:11:12 +0000 (09:11 -0700)]
Zebra: Don't resolve nexthops over default route unless explicitly allowed.
Ensure that resolution of a nexthop using a default route is not done in the
nexthop validation/update code in zebra_rib.c also. This is an addition to
the zebra-nht-no-default.patch which made the checks only in the NHT code. In
the case of scenarios like interface down, this nexthop update code will kick
in first to update the route before the NHT code comes into play; without the
additional fix, this code could incorrectly resolve the nexthop over a default
route, even when disallowed by the administrator.
Donald Sharp [Thu, 11 Jun 2015 16:11:12 +0000 (09:11 -0700)]
When an interface is disabled, a Cumulus kernel patch causes route deletes to
be issued to Quagga. Quagga will in turn try to re-add the route(s) back to
the kernel and this will result in an error back from the kernel. This change
is to make sure these error messages are not logged by default. Subsequent
changes will cleanup this handling (to address CM-4577).
Donald Sharp [Wed, 20 May 2015 23:55:57 +0000 (16:55 -0700)]
vtysh-integrated-fix.patch
Lost config when switching back and forth between 'service integrated-vtysh-config'.
Also it was possible to have config files not be read in if they were not generated.
Ticket: CM-6011, CM-6033
Reviewed By: Daniel Walton <dwalton@cumulusnetworks.com>
Testing Done: See bugs
Donald Sharp [Wed, 20 May 2015 14:00:02 +0000 (07:00 -0700)]
Fixup 'force' -vs- 'all' compile issue
Our code implemented 'force' for a keyword while quagga mainline implemented 'all'.
This fixups the #define usage that was missed that came in during one of the patch
files. This is a compile only testing
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:46:10 +0000 (18:46 -0700)]
bgpd-ttl-fix.patch
BGP: Fix MINTTL and IPV6_MINHOPCOUNT
The #defines for IP_MINTTL and IPV6_MINHOPCOUNT need to be handled
correctly as part of the configure.ac code. Instead of hard coding
the values directly in the code
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:45:53 +0000 (18:45 -0700)]
bgpd-ebgp-multihop-fix.patch
BGP: Fix EBGP multihop transitions correctly
Since BGP connection setup has migrated to using NHT to decide when to bring a
session up, we have to handle ebgp multihop transitions correctly to ensure NHT
registrations are correctly handled.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:29:17 +0000 (18:29 -0700)]
debian: modify quagga pkg depend on cl-utilities pkg
Modified debian/control file to list as a dependency the cl-utilities package
as the cl-utilities package provides start-stop-monitor which is needed by
quagga to be monitored correctly.
Basically ZEBRA_INTERFACE_LINKDETECTION is set to on by default now.
Virtual links are failing to identify as up because of this code change.
Modify ospf to set the flag as appropriate
Donald Sharp [Wed, 20 May 2015 01:29:16 +0000 (18:29 -0700)]
bgpd-hostname-cap.patch
bgpd: Exchange hostname capability and display hostnames in outputs
This patch adds a hostname capability. The node's hostname and
domainname are exchanged in the new capability and used in show command
outputs based on a knob enabled by the user. The hostname and domainname
can be a maximum of 64 chars long, each.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:29:16 +0000 (18:29 -0700)]
quagga: quagga-debian-upgrade.patch
debian: The upgrade is failing due to missing files Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by:
Donald Sharp [Wed, 20 May 2015 01:29:16 +0000 (18:29 -0700)]
quagga: quagga-startup-fds.patch
Setup default number of filedescriptors allowed in quagga defaults and ulimit calls Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by:
Donald Sharp [Wed, 20 May 2015 01:29:14 +0000 (18:29 -0700)]
ospfd: ospfd-warnings.patch
Remove compile warnings for the ospfd/ospf6d directory Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by:
Donald Sharp [Wed, 20 May 2015 01:04:25 +0000 (18:04 -0700)]
bgpd: bgpd-no-as.patch
bgp: Fixup of the remote-as command to allow user to not have to enter an actual as number Signed-off-by: Donald Sharp<sharpd@cumulusnetworks.com>
Reviewed-by:
Donald Sharp [Wed, 20 May 2015 01:04:20 +0000 (18:04 -0700)]
If a route-map is used on a neighbor default-originate statement we need to dynamically add/del the default route if the permit/deny result of the route-map changes.
Donald Sharp [Wed, 20 May 2015 01:04:20 +0000 (18:04 -0700)]
bgpd-nht-import-check-fix.patch
BGP: Fix network import check use with NHT instead of scanner
When next hop tracking was implemented and the bgp scanner was eliminated,
the "network import-check" command got broken. This patch fixes that
issue. NHT is used to not just track nexthops, but also the static routes
that are announced as part of BGP's network command. The routes are
registered only when import-check is enabled. To optimize performance,
we register static routes only when import-check is enabled.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:04:19 +0000 (18:04 -0700)]
During connection setup, there may be two connections in progress for a BGP
peer - one initiated by the local system and the other initiated by the peer.
Enhance key debug logs to also print the socket file descriptor so that it is
clear which events pertain to which connection.
Donald Sharp [Wed, 20 May 2015 01:04:18 +0000 (18:04 -0700)]
When a peer is unbound from its peer-group, in some situations the peer is
deleted while in other situations, the peer continues to exist but its
global flags have all been reset. This is incorrect, particularly for the
CONFIG_NODE flag as other parts of the code depend on this flag being set
for a configured peer. This patch ensures that the correct flags still
remain set for the peer after unbind from its peer-group.
Donald Sharp [Wed, 20 May 2015 01:04:17 +0000 (18:04 -0700)]
The retry of BGP connection after expiry of connect retry timer was
broken by some earlier patches. Instead of staying in Connect state
after reattempting the connection, the state used to go back to Idle
and then try to connect. This patch fixes this error.
Donald Sharp [Wed, 20 May 2015 01:04:16 +0000 (18:04 -0700)]
Zebra: Don't resolve routes over default for nexthop tracking
Resolving routes over the default route for NHT can lead to all sorts
of problems. So, we explicitly exclude resolving routes for NHT over the
default route. A knob is provided to allow the route to be resolved over
the default in case of special circumstances.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:04:16 +0000 (18:04 -0700)]
Zebra: Ensure we compare prefix and NHs when checking if NH changed
In nexthop tracking, the code currently compares the nexthop state of the
resolved_route for a prefix with the previous nexthop state. However, if
the resolved route itself changes, we can end up comparing the RIBs of
unrelated prefixes and assuming that nothing has changed. To fix this, we
need to store and compare the new resolved route with the previously
resolved route. If this has changed, assume the NH associated with a route
has changed.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:04:15 +0000 (18:04 -0700)]
Zebra: Static NHT fixes
When NHT calls rib_process() to be invoked for a prefix, the RIB has already
been marked as having NH changes. The first call to nexthop_active_update
clears this flag and attempts to re-determine if there are any NH changes for
a prefix. However, when the NH is recurisve, this fails. Furthermore, since
NHT has already determined that this RIB has NH changes, there's no need to
ascertain that again. The original patch used static route as the proxy to
skip this call which was incorrect since rib_process can be invoked for
static routes for reasons other than NHT. So, this patch removes the check
for static route and directly checks if the NH changed flag has been set.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:04:15 +0000 (18:04 -0700)]
ospfd: ospf_cli_fixes
ospf: Fix cli issues with timers throttle spf and no ip ospf authentication...
When entering no timers throttle spf there was no way to specify the delay, hold
time and max hold time so the command was rejected. This is useful for automated
processes that take currently entered cli to remove the cli.
When entering no ip ospf authentication most forms of the command were being
ignored, this fixes that as well.
Signed-off-by: Donald Sharp <sharpd at cumulusnetworks.com>
Reviewed-by:
Donald Sharp [Wed, 20 May 2015 01:04:14 +0000 (18:04 -0700)]
When an incoming connection is received from a neighbor that is configured but
is not activated for any address-family, the connection is accepted without
taking further action. This causes the connection to hang in OpenSent on the
neighbor and can in turn delay the connection setup. Fix to reject incoming
connections when there is no address-family activated for the neighbor.
Donald Sharp [Wed, 20 May 2015 01:04:13 +0000 (18:04 -0700)]
initd-status.patch
Add support for service quagga status.
As per LSB initscript status code definitions, support is added for
querying status of quagga. All daemons supposed to have been enabled, will
be checked as running and if any one of them is found to be not running, the
appropriate status code is returned.
Note that if watchquagga is running, a status indicating a problem maybe a
trasient problem because watchquagga will start back an unresponsive or dead
process.
http://refspecs.linuxbase.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
Donald Sharp [Wed, 20 May 2015 01:04:13 +0000 (18:04 -0700)]
zebra-rtadv-suppress-default-config.patch
Zebra: Suppress displaying default config as part of running config
Quagga doesn't display default config as part of the running config, only
what is different from the default. However, in the case of rtadv, every
link displays the default "ipv6 nd suppress-ra" as part of running config.
This patch fixes that.
Donald Sharp [Wed, 20 May 2015 01:04:12 +0000 (18:04 -0700)]
When a peer that is Established goes down, it is moved into the Clearing
state to facilitate clearing of the routes received from the peer - remove
from the RIB, reselect best path, update/delete from Zebra and to other
peers etc. At the end of this, a Clearing_Completed event is generated to
the FSM which will allow the peer to move out of Clearing to Idle.
The issue in the code is that there is a possibility of multiple Clearing
Completed events being generated for a peer, one per AFI/SAFI. Upon the
first such event, the peer would move to Idle. If other events happened
(e.g., new connection got established) before the last Clearing_Completed
event is received, bad things can happen.
Fix to ensure only one Clearing_Completed event is generated.
Donald Sharp [Wed, 20 May 2015 01:04:12 +0000 (18:04 -0700)]
When unexpected events are received, do not silently transition to Idle
state through bgp_ignore() as that may not do required cleanup. Instead,
define a new event handler to handle such cases, which will go through
bgp_stop(). A similar change is also done to handle the case where an
event handler fails.
Also add a couple of variables to keep track of events for a peer.
Donald Sharp [Wed, 20 May 2015 01:04:11 +0000 (18:04 -0700)]
initd-reload.patch
init.d: Add reload option
Add an option to apply only modifications to running configuration from the
specified configuration file. The default modification file is
/etc/quagga/Quagga.conf. A new script, quagga-reload.py, has been added to
the tools directory.
Donald Sharp [Wed, 20 May 2015 01:04:11 +0000 (18:04 -0700)]
vtysh-add-mark-cmd.patch
VTYSH: Add support for marking a file with appropriate end of context
To support applying only differences to the existing config, this patch
enables supplying the appropriate end markers to a provided file (or
stdin). By end markers, I mean, adding "end" and "exit-address-family"
at the appropriate places in the configuration to ease finding the
differences with the running configuration.
Donald Sharp [Wed, 20 May 2015 01:04:10 +0000 (18:04 -0700)]
Zebra: Fix multiple RNH deletes
The code is structured in a way that ends up invoking zebra_delete_rnh()
multiple times which can lead to crashes and asserts. This patch fixes
the issue by setting a flag when an RNH structure is being deleted and
ignores any further attempts to delete the structure.
Donald Sharp [Wed, 20 May 2015 01:04:10 +0000 (18:04 -0700)]
Zebra: Add onlink attribute even for recursive routes
When a route is resolved recursively, and the recursively resolved nexthop
has the onlink attribute, the route is not programmed with the nexthop with
the onlink attribute. This patch addresses that.
Donald Sharp [Wed, 20 May 2015 01:04:09 +0000 (18:04 -0700)]
BGP: Fix update-groups commands to match neighbors
show update-groups summary was mislabeled. What it displays is not a summary
at all, but the detailed info about all update-groups. Furthermore, there
was no way to get detailed info about a specific subgroup.
This patch renames "show * update-groups summary" to "show * update-groups"
and adds an option to see the info specific to a subgroup only. It also
validates the subgroup-id.
show * update-groups summary will be added separately.
Donald Sharp [Wed, 20 May 2015 01:04:08 +0000 (18:04 -0700)]
Cleanup some code related to NHT.
When BGP connection setup was moved to rely on nexthop tracking, a few silly
bugs were introduced.
- bgp_connect_check() was called unnecessarily which resulted in false
positives which resulted in log messages indicating an error and the FSM
was unnecessarily reset.
- When routes to peer disappeared, and the peer was not directly connected,
the session was not immediately torn down, but only on hold timer expiry.
- When NHT indicated that route to session IP addr was available, the previous
state was not reset and as a result, connect retry timer had to expire
before a reconnection was attempted.
- connected check MUST be enabled only for EBGP non-multihop sessions and
only if disbale-connected-check option is not enabled.
Donald Sharp [Wed, 20 May 2015 01:04:07 +0000 (18:04 -0700)]
Changing router-id inline isnt handled correctly in the current implementation.
At the minimum, the OSPF_LSA_SELF logic isnt foolproof, and it may hit assert
in ospf_refresh_unregister_lsa on a router-id change.
Once OSPF has created and flooded LSAs, its not a good idea to change
router-id inline. Tying it to restart has at least two benefits:
- Implementation can remain sane by not having to re-adjust neighbors and LSAs,
based on the new router-id.
- Works as a deterrent for the user to not meddle with the router-id unless
really needed.
Donald Sharp [Wed, 20 May 2015 01:04:04 +0000 (18:04 -0700)]
Remove incorrect call to delete NHT for a route added via "network" command.
When a route is announced in BGP via "network" command, we also register its
next hop with NHT code to allow of updates when the nexthop changes. When this
route is deleted via "no network" command, we incorrectly make a second call to
unregister the NHT tracking associated with this route. This causes a crash.
Fix that.
Donald Sharp [Wed, 20 May 2015 01:04:00 +0000 (18:04 -0700)]
Ensure that during event-driven route-map processing, the peer status is
considered, if required. Attempting to do certain processing while the
peer is not Established can lead to errors.
Donald Sharp [Wed, 20 May 2015 01:03:59 +0000 (18:03 -0700)]
If on-shutdown is configured to a large value and 'service quagga restart'
is executed, then the init.d/quagga script doesnt wait more than 120 seconds
for the daemon do stop, worse, it goes ahead and starts the new daemon
regardless. This can result into two ospfd processes running on the same config.
Which leads to many issues including but not limited to high cpu usage.
Thats because the two processes are mixing packets on adjancencies thus
causing churn on the box and network.
As long as OSPF is able to reliably send the max-metric router-lsa before
exiting thats mostly good enough for this purpose anyways.
As a solution to this situation, bringing the maximum configurable value of
the on-shutdown timer below the maximum retry to stop a daemon in init.d/quagga
Notes: This may not be an upstreamable patch, still we needed to find
a solution for init.d/quagga and this command this co-exist.
Donald Sharp [Wed, 20 May 2015 01:03:55 +0000 (18:03 -0700)]
bgpd-ensure-fast-eor-send.patch
BGP: Ensure EOR is always sent immediately after all prefixes have been adv.
Its possible that EOR send is delayed until the next KeepAlive timer fires.
This can happen when the send update iteration precisely matches the last
update packet sent. After this since there are no more updates to be sent,
no write thread is setup, but there's still the EOR to be sent. Therefore,
EOR is not sent right away causing some neighbors to not exit RO mode and
delaying convergence overall. This patch ensures that EOR is sent at the end
of all updates on startup.
Signed-off-by: Vivek Venkataraman <vivek@cumulusnetworks.com> Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>