Don Slice [Wed, 10 Feb 2016 17:53:21 +0000 (09:53 -0800)]
zebra: fix interface lookup for vrf configuration
Ticket:CM-9073
Reviewed By: sharpd
Testing Done:Manual, see ticket
Changed logic when "interface swpxx <vrf foo>" entered so that:
1. it matches when the command is entered without a vrf but the interface already exists in a vrf.
2. If the command is entered with a vrf name that is different than is defined by the kernel, the command is rejected.
3. If the call is made from other than the vty session, believe the new information and update the vrf accordingly.
Donald Sharp [Wed, 10 Feb 2016 12:30:56 +0000 (07:30 -0500)]
debian: Add Systemd integration to control files
Ticket:
Reviewed By: inprogress
Testing Done: minimal. Built, installed, started a few services.
This is in progress testing. quagga.service tries to start and stop
all the routing daemons. There is no check for whether they are enabled
via /etc/quagga/daemons (yet).
As installed, nothing is enabled (won't start on system boot or install).
The -A 127.0.0.1 is now in /etc/default/quagga, and picked up from there
by all routing daemons.
MAX_FDS is in all the service files for now as LimitNOFILE. Users who
need to modified the number of fd's will use e.g.
the file /etc/systemd/system/bgpd.service.d/maxfds.conf to override
bgpd.service contents
[Service]
LimitNOFILE=2048
MAX_INSTANCES isn't implemented yet.
reload isn't implemented yet (it should be possible via ExecReload
in the services, just not done yet).
The init.d file is removed.
All of the daemons are started without the -d/--daemonize option, and
use Type=simple rather than forking in the services file, to use the
systemd daemonizing.
All the daemons were set to have a 1m start time, and restart up to 3
times in 3 minutes, and for now, are only restart on-abnormal, not always
(we'll likely want the latter, but testing is easier with abnormal).
Also use tmpfiles.d to create /run/quagga
For now, we leave dh_installinit, even though it creates unneeded
update-rc.d calls, and causes lintian complaints about init.d files
that aren't present, so that it installs files like etc/default/quagga.
It also runs the tmpfiles.d commands for us, so we need to add those to
postinst if we dummy it out to fix the update-rc.d lines being added
(and lintian complaints).
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 10 Feb 2016 13:15:42 +0000 (08:15 -0500)]
*: Modify protocols to have systemd integration
Modify the daemons to integrate with systemd, if it is enabled via configure,
and to notify systemd that they are running/stopping and to send watch
notifications.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 3 Feb 2016 19:44:56 +0000 (14:44 -0500)]
debian: Remove 'Do you want to stop Quagga' Question
During the upgrade process of quagga, the user is asked
if they would like to stop quagga. There is no point in
asking this question. The fact that you are upgrading
means you are willing for a service interruption.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 3 Feb 2016 02:11:40 +0000 (21:11 -0500)]
config: Remove unused library check
configure.ac is looking for the setproctitle library,
which while it might be useful, we never call setproctitle
or any other function that the library might expose.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Mon, 1 Feb 2016 17:56:42 +0000 (09:56 -0800)]
zebra: Add the 'struct zebra_ns' data structure
This commit adds the 'struct zebra_ns' data structure.
We are not currently using it. But pretty much
everything after this commit in zebra depends on it.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Evgeny Uskov [Wed, 13 Jan 2016 10:58:00 +0000 (13:58 +0300)]
bgpd: Fix buffer overflow error in bgp_dump_routes_func
Now if the number of entries for some prefix is too large, multiple TABLE_DUMP_V2 records are created.
In the previous version in such situation bgpd crashed with SIGABRT.
Donald Sharp [Tue, 26 Jan 2016 14:57:17 +0000 (06:57 -0800)]
bgpd: Fix 'show bgp ipv4 vpnv4 statistics' cli
When attempting to use the 'show bgp ipv4 vpnv4 statistics' cli, the safi
choosen is BGP_MPLS_LABELED_VPN which is #defined to 128. The afi/safi
combination is fed to bgp->rib, which limits the size of the safi to BGP_SAFI_MAX
which is #defined to 5. The correct value to use is BGP_MPLS_VPN
The bgp code differentiates between the actual safi value for BGP_MPLS_LABELED_VPN
used defined by RFC 4364, to a internal SAFI value used to limit array size.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
vivek [Fri, 22 Jan 2016 18:56:48 +0000 (10:56 -0800)]
BGP: Rework iteration of peer_af_array
While processing references to the macro PEERAF_FOREACH(), aggressive loop
optimization by gcc 4.9.x (probably 4.8 and greater) was resulting in the
generated code not checking on the index as well as eliminating some code.
This was leading to a dereference of invalid memory when a BGP peer came up.
The fix is to scrap this convoluted macro. Two other changes done are to
eliminate overloading of "afindex" and make the loop iterator an integer.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Dave Olson <olson@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Ticket: CM-8889
Reviewed By: CCR-4018
Testing Done: Verified failure scenario
Note: This code was added as part of update-groups implementation; when
upstreaming update-groups, this patch should also be included.
Donald Sharp [Fri, 22 Jan 2016 15:46:08 +0000 (10:46 -0500)]
lib: Allow zclient do-over of connect on initial attempt
When a protocol is attempting to connect to the zebra daemon
through it's socket. If the inital attempt fails, give it a
few more attempts before giving up and leaving the daemon in
a bizarre state.
This problem was found by Ashley Penney, and Ashley was of
immense help in debugging and testing the fix for this issue.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Tested-by: Ashley Penney <apenney@ntoggle.com>
Donald Sharp [Thu, 14 Jan 2016 14:19:29 +0000 (09:19 -0500)]
ospf6d: Fix double increment of Sequence Number
When OSPF6 is creating the header for the ROUTER LSA type
if the packet being sent has interface information to add
to the data, the Sequence Number is at least double incremented.
This change moves the header creation to outside the loop over
all interfaces in the area. Additionally the header is created
at the bottom of the function now.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Mon, 18 Jan 2016 17:44:52 +0000 (09:44 -0800)]
bgpd: Fix work-quanta to be a reasonable value
The work-quanta that a user can specify is ~4billion. If a user
specifies such a large value this translates into processing 4billion
outgoing packets before moving onto the next interface. This makes
no sense. Reduce the value of allowed work quanta's to be between
1 and 10000.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Don Slice <dslice@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Daniel Walton [Thu, 14 Jan 2016 15:25:32 +0000 (15:25 +0000)]
BGP: ebgp-multihop should accept a value up to 255
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-8788
Donald Sharp [Wed, 13 Jan 2016 18:49:50 +0000 (10:49 -0800)]
doc, vtysh: Fixup of history handling
This fix does two things:
1) If the ${HOME}/.history_quagga file does not exist, create it
for history storing.
2) Allow vtysh -c "..." commands to be stored in history file
as well
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don Slice [Wed, 13 Jan 2016 14:02:46 +0000 (06:02 -0800)]
lib:removed onmatch next and onmatch goto from route-map deny
Ticket: CM-7566
Reviewed By: Daniel Walton, Donald Sharp
Testing Done: Manual testing - see bug
Since on a route-map deny clause, the route-map will end on match, the on-match next and on-match goto statements are meaningless and confusing. Removed them.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
vivek [Fri, 8 Jan 2016 06:14:38 +0000 (22:14 -0800)]
BGP: Only accept prefixes for negotiated address families
When handling a received Update message, only process and store the
prefixes if the corresponding address family has been negotiated with
the peer. Prior to this change, the receive processing only checked
whether the address family was locally configured, trusting to the peer
to not advertise prefixes for an address family that has not been
negotiated. Most implementations conform to this but a misbehavior could
result in processing and memory overhead.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Ticket: CM-5594
Reviewed By: CCR-3946
Testing Done: Sanity test (good case)
vivek [Fri, 8 Jan 2016 06:00:03 +0000 (22:00 -0800)]
BGP: Ignore unexpected values in ENHE capability
Silently ignore (without sending a Notification) unexpected values
of AFI, SAFI or Nexthop AFI received in the Extended Next Hop Encoding
capability (defined in RFC 5549). While this RFC only defines certain
values as allowed, that may be changed by a future spec.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Ticket: CM-5975
Reviewed By: CCR-3947
Testing Done: test_fuzz 1.11, 1.12 and 1.13
Donald Sharp [Wed, 16 Dec 2015 01:24:26 +0000 (17:24 -0800)]
debian: Remove unnecessary dependency on cl-utilties
The cl-utilities dependency were causing issues in two situations:
A) The cl-utilities package name has been changed but the quagga
cmaster branch was being built on two different branches, one
with the old name, one with the new name
B) People installing quagga on non-cumulus switches were experiencing
issues due to cl-utilities not being installed. This was especially
true if they built quagga from our source code. We only need
cl-utilities for the startt-stop-daemon wrapper so that we could
have jdoo watch watchquagga. This is not a big deal if people are
missing this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Daniel Walton [Fri, 11 Dec 2015 21:12:56 +0000 (21:12 +0000)]
Quagga: make check is broken with addpath changes
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-8472
The first as-path is what the original as-path for the test, the second
is what the as-path should look like once all confed SETs and SEQs have
been removed. At the time the test was written quagga did not correctly
remove confed SETs and SEQs but whoever wrote the test didn't notice
this and assumed that the behavior they were seeing was correct so used
that output to populate the second as-path.
Now that we do correctly remove the confed parts these tests fail. So
the fix is to update the second as-path for these two tests so that they
no longer contain any confed SETs/SEQs.
vivek [Wed, 9 Dec 2015 19:01:21 +0000 (11:01 -0800)]
zebra: Reorganize NHT code
NextHop Tracking (NHT) is a significant function introduced into Quagga
by Cumulus. Initially intended for tracking BGP nexthops, this has been
extended subsequently to also cater to nexthops for static routes, BGP
peer reachability tracking and BGP route tracking for routes to be
imported into BGP.
This patch reorganizes the code a bit to make it easier to follow and
maintain. No functional changes introduced.
vivek [Wed, 9 Dec 2015 00:55:43 +0000 (16:55 -0800)]
Zebra: Schedule RIB processing based on trigger event
Currently, when RIB processing is initiated (i.e., by calling rib_update()),
all routes are queued for processing. This is not desirable in all situations
because, sometimes the protocol may have an alternate path. In addition,
with NHT tracking nexthops, there are situations when NHT should be kicked
off first and that can trigger subsequent RIB processing.
This patch addresses this by introducing the notion of a trigger event. This
is only for the situation when the entire RIB is walked. The current triggers
- based on when rib_update() is invoked - are "interface change" and "route-
map change". In the former case, only the relevant routes are walked and
scheduled, in the latter case, currently all routes are scheduled for
processing.
Note: The initial defect in this area was CM-7420. This was addressed in
2.5.4 with an interim change that only walked static routes upon interface
down. The change was considered a bit risky to do for interface up etc. Also,
this did not address scenarios like CM-7662. The current fix addresses CM-7662.
vivek [Tue, 8 Dec 2015 23:04:48 +0000 (15:04 -0800)]
Zebra: Eliminate unnecessary del-add upon static route addition
When static routes are added, they get processed and potentially installed
in the RIB once. Subsequently, NHT is invoked and ends up scheduling the
route for processing again because this is the first time the nexthop is
resolved for NHT. This used to result in a del-add earlier (as noted in
the defect), but is a replace now. This change eliminates the unnecessary
replace by ensuring NHT is invoked first if the static route has a nexthop
that will be tracked by NHT.
Donald Sharp [Tue, 8 Dec 2015 17:08:46 +0000 (17:08 +0000)]
bgpd: Modify maxpaths cli's to use MULTIPATH_NUM for range
Modify the various maxpath commands to use MULTIPATH_NUM
as the upper limit of allowed max paths in BGP. There
is no point in allowing a number of maximum paths greater
than what Quagga is compiled for.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 4 Dec 2015 17:34:42 +0000 (12:34 -0500)]
zebra: Remove STATIC_XXX_IFNAME and use _IFINDEX
When we get a static route through an interface convert the interface
name to an ifindex and pass it through to zebra_rib.c. zebra_rib.c
should not care about the ifname.
This code change will allow us to collapse some of the NEXTHOP_XXX types.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Daniel Walton [Mon, 30 Nov 2015 21:19:39 +0000 (21:19 +0000)]
BGP: 'neighbor swpX interface peer-group FOO' is needed to simplify swpX
configuration
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Ticket: CM-8321
This allows the user to configure the peer-group as an option for the
"neighbor swpX interface" command.
{code}
!
router bgp 100
neighbor FOO peer-group
neighbor FOO remote-as external
neighbor swp1 interface
neighbor swp2 interface v6only
neighbor swp3 interface peer-group FOO
neighbor swp4 interface v6only peer-group FOO
!
address-family ipv4 unicast
neighbor FOO activate
neighbor swp1 activate
neighbor swp2 activate
exit-address-family
!
{code}
Note that if the user configures
{code}
neighbor swp5 interface
neighbor swp5 peer-group FOO
{code}
We will display that as "neighbor swp5 interface peer-group FOO". It
did not seem worth tracking that the peer-group was entered via two
lines instead of one.
Donald Sharp [Fri, 20 Nov 2015 19:36:46 +0000 (11:36 -0800)]
Quagga: vrf_id not being set correctly
Several routing protocols use the zapi_ipv[4|6] api to talk to
zebra. There are some instances where the api.vrf_id was not
being set. Since the practice is to declare the api structure
on the stack, the data inside is not being set to 0. As
such random vrf_id values were being passed to zebrad
causing rage and confusion.
Ticket: CM-8287 Reviewed-by: CCR-3841
Testing: Test suites no longer crashing and burning
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Daniel Walton [Fri, 20 Nov 2015 18:53:50 +0000 (18:53 +0000)]
BGP: Remove the requirement to rebind a peer to its peer-group under the address-family.
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-3868
NOTE: Many of the ibgp peers are not up in the 'show ip bgp summ' output below. This
is because ospf was disabled at the time. The main thing to look for is whether or
not all of the correct peers are listed based on their 'activate' status.
superm-redxp-05# show ip bgp summ
BGP router identifier 10.0.0.1, local AS number 10
BGP table version 4200
RIB entries 2399, using 281 KiB of memory
Peers 8, using 129 KiB of memory
Peer groups 2, using 112 bytes of memory
Example where the peer-group is inactive but a member of the peer-group is active:
==================================================================================
superm-redxp-05(config)# router bgp 10
superm-redxp-05(config-router)# address-family ipv4 unicast
superm-redxp-05(config-router-af)# neighbor 10.0.0.3 activate
superm-redxp-05(config-router-af)# no neighbor IBGP activate
superm-redxp-05(config-router-af)#
superm-redxp-05(config-router-af)# neighbor 10.0.0.4 activate
superm-redxp-05(config-router-af)# end
superm-redxp-05# show run
router bgp 10
bgp router-id 10.0.0.1
no bgp default ipv4-unicast
no bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
bgp route-map delay-timer 1
neighbor EBGP peer-group
neighbor EBGP advertisement-interval 1
neighbor IBGP peer-group
neighbor IBGP remote-as 10
neighbor IBGP update-source 10.0.0.1
neighbor IBGP advertisement-interval 1
neighbor 10.0.0.2 peer-group IBGP
neighbor 10.0.0.3 peer-group IBGP
neighbor 10.0.0.4 peer-group IBGP
neighbor 20.1.1.6 remote-as 20
neighbor 20.1.1.6 peer-group EBGP
neighbor 20.1.1.7 remote-as 20
neighbor 20.1.1.7 peer-group EBGP
neighbor 40.1.1.2 remote-as 40
neighbor 40.1.1.2 peer-group EBGP
neighbor 40.1.1.6 remote-as 40
neighbor 40.1.1.6 peer-group EBGP
neighbor 40.1.1.10 remote-as 40
neighbor 40.1.1.10 peer-group EBGP
!
address-family ipv4 unicast
neighbor EBGP activate
neighbor IBGP next-hop-self
neighbor 10.0.0.4 activate
exit-address-family
!
superm-redxp-05# show ip bgp summ
BGP router identifier 10.0.0.1, local AS number 10
BGP table version 4200
RIB entries 2399, using 281 KiB of memory
Peers 8, using 129 KiB of memory
Peer groups 2, using 112 bytes of memory
Daniel Walton [Fri, 20 Nov 2015 18:43:33 +0000 (18:43 +0000)]
BGP crash in group_announce_route_walkcb
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-8295
This is a regression from the addpath-tx commit. We loop over the adj_out list
but the adj can be freed within the body of the loop so we need to note adj->next
prior to the free.
vivek [Fri, 20 Nov 2015 18:34:50 +0000 (10:34 -0800)]
Zebra: Handle IPv6 address status during initialization
At init time, Zebra queries the kernel for all interfaces. At this time,
an IPv6 address may exist on an interface but IPv6 DAD has not completed,
or it could be that DAD has failed. Zebra should examine the flags on
the address and act accordingly. Otherwise, it may end up with addresses
and routes which are not actually valid in the kernel, and this may lead
to, for example, BGP attempting neighbor connections on an interface on
which the source IPv6 address is not yet valid.
vivek [Fri, 20 Nov 2015 16:48:32 +0000 (08:48 -0800)]
Zebra: Cleanup RIB debugs
Some of the changes include:
- ensuring IPv6 addresses are printed correctly
- say 'updating' or 'deleting' etc. only when that is actually done
- say 'queuing' or 'dequeuing' only when that is actually done
- print useful info for 'detailed' debug - that now subsumes 'rib queue'
- delete various useless logs
- VRF-specific - print VRF id in RIB debugs prior to prefix
(e.g., 4:37.1.1.0/28)
Donald Sharp [Fri, 20 Nov 2015 15:10:47 +0000 (07:10 -0800)]
Quagga: Fixup decision about what an unnumbered interface is
This Change modifies what zebra thinks is an unnumbered interface.
If the interface is not a loopback and the prefixlength for the
interface is 32 than consider this an unnumbered interface.
Ticket: CM-8016
Reviewed by: CCR-3827
Testing: Full Regression Suites
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
vivek [Thu, 19 Nov 2015 20:48:02 +0000 (12:48 -0800)]
Zebra: Fix replace route for uninstall scenario
When a Quagga route that is currently installed is superceded by a
kernel route (connected or static route to same destination), the
Quagga route is not uninstalled from the kernel. Fix by ensuring
this case is handled correctly.
vivek [Thu, 19 Nov 2015 20:22:55 +0000 (12:22 -0800)]
Zebra: Implement route replace for IPv6
Zebra currently performs a delete followed by add when a route needs to be
modified. Change this to use the replace semantics of netlink so that the
operation can possibly be atomic.
Note: This patch handles IPv6 routes, IPv4 already performs a replace.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Ticket: CM-5597
Reviewed By: CCR-3407
Testing Done: Manual testing of various scearnios (Vivek, Satish)
Note: This is an import of patch zebra-ipv6-route-replace.patch from 2.5-br.
Donald Sharp [Wed, 18 Nov 2015 23:36:04 +0000 (15:36 -0800)]
Quagga: Fixup cli and json keyword
The json keyword was being read incorrectly.
Basically some commands read a variable # of arguments
and in ospf the command values were being placed into
argc and argv. With a variable # of arguments their
existed a possibility that less arguments would be read
from the cli than were being tested for in the command function
handler. This caused core dumps in some situations.
All code to read to decide to use the json keyword has
been centralized through a function and all code
converted to use it, irrelevant if it exhibited the bug
Ticket: CM-8278
Reviewed by: CCR-3830
Testing: OSPF no longer crashes and all other test suites still run
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
vivek [Tue, 17 Nov 2015 21:57:56 +0000 (13:57 -0800)]
BGP: Handle router-id correctly in config and display.
BGP is currently displaying the in-use router-id in the config. This is
conditional on a CONFIG flag, however, that flag is set even when there
is no configured router-id and the router-id learnt from Zebra is in-use.
The CONFIG flag for router-id is redundant since there is a separate
variable for the configured value, so use that and deprecate the CONFIG
flag. This also makes BGP behave like OSPF.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Ticket: CM-8077, CM-8220
Reviewed By: CCR-3793
Testing Done: Manual verification (in 2.5-br)
Note: Imported from 2.5-br patch bgpd-fix-router-id-config-display.patch
Donald Sharp [Tue, 17 Nov 2015 16:13:23 +0000 (08:13 -0800)]
Quagga: Set MULTIPATH_NUM to 64 when user specifies 0 from cli
The code has tests to see if the MULTIPATH_NUM == 0 and to
treat it like the user has entered 'Maximum PATHS'.
This 0 is treated as 64 internally. Remove this dependency
and setup MULTIPATH_NUM to 64 when --enable-multipath=0 from
the configure cli.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Daniel Walton [Tue, 17 Nov 2015 02:09:57 +0000 (02:09 +0000)]
BGP: crash in update_subgroup_merge()
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-8191
On my hard node I have a route to 10.0.0.0/22 via eth0, I then learn
10.0.0.8/32 from peer 10.0.0.8 with a nexthop of 10.0.0.8:
superm-redxp-05# show ip bgp 10.0.0.8/32
BGP routing table entry for 10.0.0.8/32
Paths: (1 available, no best path)
Not advertised to any peer
80
10.0.0.8 from r8(swp6) (10.0.0.8)
Origin IGP, metric 0, localpref 100, valid, external
AddPath ID: RX 0, TX 9
Last update: Thu Nov 12 14:00:00 2015
superm-redxp-05#
I do a lookup for the nexthop and see that 10.0.0.8 is reachable via my
eth0 10.0.0.22 so I select a bestpath and install the route. At this
point my route to 10.0.0.8 is a /32 that resolves via itself, NHT sees
that this is illegal and flags the nexthop as inaccessible.
superm-redxp-05# show ip bgp 10.0.0.8/32
BGP routing table entry for 10.0.0.8/32
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to non peer-group peers:
r6(swp4) r7(swp5) r8(swp6) r2(10.1.2.2) r3(10.1.3.2) r4(10.1.4.2)
80
10.0.0.8 (inaccessible) from r8(swp6) (10.0.0.8)
Origin IGP, metric 0, localpref 100, invalid, external, bestpath-from-AS 80, best
AddPath ID: RX 0, TX 9
Last update: Thu Nov 12 14:00:00 2015
superm-redxp-05#
at which point we withdraw the route, things churn, we relearn it and go
through the whole process over and over again. We end up advertising and
withdrawing this route about 9 times a second!!
This exposed a crash in the update-group code where we try to merge two sub-groups
but the assert on adj_count fails because the timing worked out where one had
advertised 10.0.0.8/32 but the other had not.
NOTE: the race condition described above will be resolved via a separate patch.
Donald Sharp [Mon, 16 Nov 2015 20:48:07 +0000 (12:48 -0800)]
Zebra: Remove reliance on NEXTHOP_TYPE_IPV4_ONLINK
Zebra already knows if an interface is unnumbered or not. This
is communicated to OSPF.
OSPF would only send a NEXTHOP_TYPE_IPV4_ONLINK *if* the path
was unnumbered, which it learns from Zebra.
As such, Have OSPF use the normal NEXTHOP_TYPE_IPV4_IFINDEX
type for unnumbered paths. In Zebra, if the ifindex recieved
is unnumbered then assume that the link is NEXTHOP_FLAG_ONLINK.
Ticket: CM-8145 Reviewed-by: CCR-3771
Testing: See bug
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>