Donald Sharp [Wed, 20 May 2015 00:40:42 +0000 (17:40 -0700)]
bgpd-scale-update-delay-packing.patch
ISSUE:
During startup, BGP update prefix packing wasnt optimal and route installation
was found to be spread over.
SOLUTION:
With this patch, update-delay post processing is serialized to achieve:
a. better peer update packing
(which helps in reducing total number of BGP update packets)
b. installation of the resulting routes in zebra as close to each others
as possible.
(which can help zebra batch its processing and updates to Kernel better)
Donald Sharp [Wed, 20 May 2015 00:40:41 +0000 (17:40 -0700)]
bgpd: bgpd-ibgp-policy-out-allow-mods.patch
BGPd: Allow route-map policy modifications to also affect route reflectors.
By default, attribute modification via route-map policy out is ignored on
reflected routes. This patch provides an option to allow this modification
to occur. Once enabled, it affects all reflected routes.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:37 +0000 (17:40 -0700)]
bgpd: bgpd-mrai.patch
BGP: Event-driven route announcement taking into account min route advertisement interval
ISSUE
BGP starts the routeadv timer (peer->t_routeadv) to expire in 1 sec
when a peer is established. From then on, the timer expires
periodically based on the configured MRAI value (default: 30sec for
EBGP, 5sec for IBGP). At the expiry, the write thread is triggered
that takes the routes from peer's sync FIFO (adj-rib-out) and sends
UPDATEs. This has a few drawbacks:
(1) Delay in new route announcement: Even when the last UPDATE message
was sent a while back, the next route change will necessarily have
to wait for routeadv expiry
(2) CPU usage: The timer is always armed. If the operator chooses to
configure a lower value of MRAI (zero second is a preferred choice
in many deployments) for better convergence, it leads to high CPU
usage for BGP process, even at the times of no network churn.
PATCH
Make the route advertisement event-driven - When routes are added to
peer's sync FIFO, check if the routeadv timer needs to be adjusted (or
started). Conversely, do not arm the routeadv timer unconditionally.
The patch also addresses route announcements during read-only mode
(update-delay). During read-only mode operation, the routeadv timer
is not started. When BGP comes out of read-only mode and all the
routes are processed, the timer is started for all peers with zero
expiry, so that the UPDATEs can be sent all at once. This leads to
(near-)optimal UPDATE packing.
Finally, the patch makes the "max # packets to write to peer socket at
a time" configurable. Currently it is hard-coded to 10. The command is
at the top router-bgp mode and is called "write-quanta <number>". It
is a useful convergence parameter to tweak.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:36 +0000 (17:40 -0700)]
bgpd: bgpd-peer-outq.patch
BGP: Show more meaningful outq value in 'show ip bgp summary' output.
'outq' field in 'show ip bgp sum' displays the number of formatted packets
to a peer. Since the route announcement follows an input-buffered pattern
(i.e. adj-rib-out is a separate queue of routes per peer and packets are
formatted from the routes at the time of TCP write), the outq field doesn't
show any interesting data worth watching.
The patch is to display the adj-rib-out queue depth instead.
Donald Sharp [Wed, 20 May 2015 00:40:36 +0000 (17:40 -0700)]
bgpd: bgpd-fix-ipv6-afi-parser-node.patch
BGPd: Make ipv6 unicast/multicast address-family work
In the absence of this patch, attempting to type "address-family ipv6 unicast"
would result in an "Ambiguous command" error and in the case of
"address-family ipv6 multicast", the command would silently fail, without the
prompt dropping into the address-family mode.
The cause is how the parse tree is constructed for ipv6 address family. There
was an error in extract.pl.in script and in vtysh.c files which assumed that
there was only address family ipv6 command, without unicast or multicast and
so the command was failing.
Donald Sharp [Wed, 20 May 2015 00:40:35 +0000 (17:40 -0700)]
zebra: zebra-use-fixed-metric-cost.patch
Zebra: Use a fixed route metric when populating kernel
The route metric is not used by the Linux kernel and is irrelevant to
the forwarding decision made by the kernel. Metric is a parameter used
only by a routing protocol to compute best path(s) and to communicate this
info to its peers. Consequently, there is no value in pushing the metric
provided by a protocol daemon to the kernel.
There is a significant advantage, at least on the Linux kernel, in pushing
a constant metric with a route populated by zebra. The metric is used as a
priority field in the kernel and modifying the metric due to say topology
changes causes multiple routes to be inserted into the kernel, with differing
priorities instead of replacing the existing one. This prevents us from
using replace semantic when a route changes.
So, this patch pushes a constant metric with a route populated by zebra.
Donald Sharp [Wed, 20 May 2015 00:40:34 +0000 (17:40 -0700)]
bgpd: bgpd-table-map.patch
COMMAND:
table-map <route-map-name>
DESCRIPTION:
This feature is used to apply a route-map on route updates from BGP to Zebra.
All the applicable match operations are allowed, such as match on prefix,
next-hop, communities, etc. Set operations for this attach-point are limited
to metric and next-hop only. Any operation of this feature does not affect
BGPs internal RIB.
Supported for ipv4 and ipv6 address families. It works on multi-paths as well,
however, metric setting is based on the best-path only.
IMPLEMENTATION NOTES:
The route-map application at this point is not supposed to modify any of BGP
route's attributes (anything in bgp_info for that matter). To achieve that,
creating a copy of the bgp_attr was inevitable. Implementation tries to keep
the memory footprint low, code comments do point out the rationale behind a
few choices made.
bgp_zebra_announce() was already a big routine, adding this feature would
extend it further. Patch has created a few smaller routines/macros whereever
possible to keep the size of the routine in check without compromising on the
readability of the code/flow inside this routine.
For updating a partially filtered route (with its nexthops), BGP to Zebra
replacement semantic of the next-hops serves the purpose well. However, with
this patch there could be some redundant withdraws each time BGP announces a
route thats (all the nexthops) gets denied by the route-map application.
Handling of this case could be optimized by keeping state with the prefix and
the nexthops in BGP. The patch doesn't optimizing that case, as even with the
redundant withdraws the total number of updates to zebra are still be capped
by the total number of routes in the table.
Donald Sharp [Wed, 20 May 2015 00:40:33 +0000 (17:40 -0700)]
bgpd: bgpd-update-delay.patch
COMMAND:
'update-delay <max-delay in seconds> [<establish-wait in seconds>]'
DESCRIPTION:
This feature is used to enable read-only mode on BGP process restart or when
BGP process is cleared using 'clear ip bgp *'. When applicable, read-only mode
would begin as soon as the first peer reaches Established state and a timer
for <max-delay> seconds is started.
During this mode BGP doesn't run any best-path or generate any updates to its
peers. This mode continues until:
1. All the configured peers, except the shutdown peers, have sent explicit EOR
(End-Of-RIB) or an implicit-EOR. The first keep-alive after BGP has reached
Established is considered an implicit-EOR.
If the <establish-wait> optional value is given, then BGP will wait for
peers to reach establish from the begining of the update-delay till the
establish-wait period is over, i.e. the minimum set of established peers for
which EOR is expected would be peers established during the establish-wait
window, not necessarily all the configured neighbors.
2. max-delay period is over.
On hitting any of the above two conditions, BGP resumes the decision process
and generates updates to its peers.
Default <max-delay> is 0, i.e. the feature is off by default.
This feature can be useful in reducing CPU/network used as BGP restarts/clears.
Particularly useful in the topologies where BGP learns a prefix from many peers.
Intermediate bestpaths are possible for the same prefix as peers get established
and start receiving updates at different times. This feature should offer a
value-add if the network has a high number of such prefixes.
IMPLEMENTATION OBJECTIVES:
Given this is an optional feature, minimized the code-churn. Used existing
constructs wherever possible (existing queue-plug/unplug were used to achieve
delay and resume of best-paths/update-generation). As a result, no new
data-structure(s) had to be defined and allocated. When the feature is disabled,
the new node is not exercised for the most part.
Donald Sharp [Wed, 20 May 2015 00:40:32 +0000 (17:40 -0700)]
bgpd: bgpd-restart-bit-fix.patch
ISSUE:
Quagga BGP doesn't send or use the restart-bit via the Graceful-Restart(GR)
capability. GR capability implementation isn't complete as per the RFC.
PATCH:
Patch uses BGP instance creation as the beginning of the startup period,
and 'restart_time' is taken as the startup period. As a result, BGP will
set the restart bit in the GR capability of the OPEN messages during the
startup period.
As an indication of quagga implementation's capability of sending End-Of-RIB,
helping a restarting neighbor, quagga BGP will now send global GR capability
irrespective of the graceful-restart config in BGP and the address-family
specific GR capability will be sent only if the GR config is present.
Forwarding bit is not set assuming its not preserved.
Donald Sharp [Wed, 20 May 2015 00:40:32 +0000 (17:40 -0700)]
ospfd: ospfv2-fix-interface-mode-cmd.patch
SYMPTOM:
Interface mode OSPF area configuration is not retained after restarting quagga.
Example -
quagga(config)# interface swp49
quagga(config-if)# ip ospf area 0.0.0.0
quagga# sh run
<snip>
interface swp49
ip ospf area 0.0.0.0
ipv6 nd suppress-ra
link-detect
!
quagga# write memory
* Restart quagga at this point*
quagga# sh run
<snip>
interface swp49
ipv6 nd suppress-ra
link-detect
!
ISSUE:
The issue is that the interface mode commands can reach the OSPF process even
before 'router ospf' command that initializes the default OSPF instance, this
is not getting handled properly in OSPF process.
FIX:
Initialize the default OSPF instance during OSPF process initializations, which
is before 'router ospf' command is received in OSPF process. So, when interface
mode command is received, it is guaranteed to have ospf instance to work with.
Other way could be to call ospf_get() instead of ospf_lookup() while processing
the config command callbacks, although OSPF needs to have at least one instance
structure anyways, therefore calling it unconditionally in OSPF initializations
should be fine too.
There could be more elaborate fix(es) possible to handle this, like adding some
ordering mechanism for commands as they are read by a process, or storing the
received command and applying it after the commands its dependent upon are
processed. For the issue at hand, initializing the default instance in main()
serves the purpose well.
Donald Sharp [Wed, 20 May 2015 00:40:31 +0000 (17:40 -0700)]
cluster-id length equality for multipath
A fat tree topology running IBGP gets into two issues with anycast address
routing. Consider the following topology:
R9 R10
x x
R3 R4 R7 R8
x x
R1 R2 R5 R6
| | | |
10/8 10/8 10/8 S
Let's remind ourselves of BGP decision process steps:
1. Highest Local Preference
2. Shortest AS Path Length
3. Lowest Origin Type
4. Lowest MED (Multi-Exit Discriminator)
5. Prefer External to Internal
6. Closest Egress (Lowest IGP Distance)
7. Tie Breaking (Lowest-Router-ID)
8. Tie Breaking (Lowest-cluster-list length)
9. Tie Breaking (Lowest-neighbor-address)
Without any policies, steps 1-6 will almost always evaluate identically for
all paths received on any router in the above topology. Let's assume that
the router-ids follow the following inequality: R1 < R2 < R5 < R6. Owing to
the 7th step above, all routers will now choose R1's path as the best. This
is undesirable. As an example, traffic from S to 10/8 will follow the path
S -> R6 -> R7 -> R9 -> R4 -> R2 -> 10/8 instead of S -> R6 -> R7 -> R5 -> 10/8.
Furthermore, once R7 (& R8) chooses R1's path as the best, it would withdraw
its path learned through (R5, R6) from (R9, R10). This leads to inefficient
load balancing - e.g. R9 can't do ECMP across all available egresses -
(R1, R2, R5).
The patch addresses these issues by noting that that cluster list is always
carried along with the routes and its length is a good indicator of IBGP
hops. It thus makes sense to compare that as an extension to metric after
step 6. That automatically ensures correct multipath computation.
Unfortunately a partial deployment of this in a generic topology (note:
fat-tree/clos topologies work fine) may lead to potential loops. It needs
to be looked into.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:45 +0000 (17:24 -0700)]
Add set ipv6 next-hop peer-address command.
IPv4 has the ability to specify the peer address with the keyword peer-address.
IPv6 mandates the use of a specific global or local address only in setting the
next-hop in routemaps. This makes it cumbersome to configure some large networks
with BGP and IPv6. This patch fixes that deficiency.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:45 +0000 (17:24 -0700)]
IPv6 multipath is broken in BGP if nexthop contains only global address.
IPv6 always uses both nextop IPv6 address and ifIndex in sending routes down to
zebra. In cases where only the global IPv6 address is present in the nexthop
information, the existing code doesn't set the ifIndex. An example of such a
case is when a route-map isused with "set ipv6 next-hop" and only global
address is specified. This code causes the ifIndex to be determined and
set thereby fixing the multipath programming.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:44 +0000 (17:24 -0700)]
When an LSA is flushed we need to update the timestamps for them. This
allows for the node to give the neighbor sufficient time to send back
an acknowledgement before retransmission kicks in.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com> Reviewed-by: James Li <jli@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:44 +0000 (17:24 -0700)]
Section 16.0 of rfc2328 (OSPF) specifies that the short-path
calculation to a node should be constructed with the sum of all path
costs (metrics) to the node (pretty simple huh). There is a usage of
metric typified by the "max-metric router-lsa" command in many
networking stacks that allows a router to gracefully "remove" itself
from a topology by advertising the maximum value of metric in it's
router LSAs (16 bits of "1"). In this case, the router will continue
to forward any traffic sent to it while these "max-metric" LSAs are
propagated through the network; at which point, the router can be
taken out of service.
The correct handling of this in ospfd would use this metric as part of
the calculation, disuading other routers from using it for transit
traffic (assuming a better path exits). Unfortunately, the ospfd
behavior is to remove these links from the SPF calculation. This
patch changes the behavior to omit this exception handling.
Signed-off-by: JR Rivers <jrrivers@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:43 +0000 (17:24 -0700)]
enable autoreconf so that Makefile.in is regenerated in the cumulus build.
This is necessary for the added .c files and modified Makefile.am files
in our patches.
Donald Sharp [Wed, 20 May 2015 00:24:43 +0000 (17:24 -0700)]
This patch enables support for multipath for IPV6. The nexthop information
from the protocols have ifindices and nexthop addresses in two different
structures. This patch combines them to ensure that the correct APIs can
be called. Also, given that IPV6 Linux implementation does not support the
rta_XXX APIs for multipath, the communication with the kernel is in terms
of a single nh/ifindex pair.
Donald Sharp [Tue, 19 May 2015 23:36:05 +0000 (16:36 -0700)]
ospfd-spf-stats.patch
Compute and display SPF execution statistics
Detailed SPF statistics, all around time spent executing various pieces of SPF
such as the SPF algorithm itself, installing routes, pruning unreachable networks
etc.
Reason codes for firing up SPF are:
R - Router LSA, N - Network LSA, S - Summary LSA, ABR - ABR status change,
ASBR - ASBR Status Change, AS - ASBR Summary, M - MaxAge
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: JR Rivers <jrrivers@cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com> Reviewed-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Donald Sharp [Tue, 19 May 2015 23:33:52 +0000 (16:33 -0700)]
zebra-enable-link-detect-by-default.patch
zebra: Set link-detect on by default
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt at cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma at cumulusnetworks.com>
Donald Sharp [Tue, 19 May 2015 23:31:06 +0000 (16:31 -0700)]
conditional-quagga-pdf.patch
The building of quagga.pdf requires the convert program out of the imagemagick
package. Getting this to run correctly in the scratchbox2 environment is
painful. Conditionally generate documentation during native compilation.
David Lamparter [Tue, 1 Jul 2014 14:14:05 +0000 (16:14 +0200)]
lib: unset ZEBRA_IFA_PEER if no dst addr present (BZ#801)
On OpenBSD, carp interfaces claim to be PtP interfaces with a 0.0.0.0/0
peer address. We process those in zebra and try to send them to
clients, at which point they get encoded as all-0. The client code,
however, decodes that to a NULL pointer instead of 0.0.0.0. This later
turns into a SEGV when CONNECTED_PREFIX sees that ZEBRA_IFA_PEER is set
and tries to access the peer prefix.
This is a band-aid fix for stable/0.99.23, a long-term solution needs
some conceptual improvements on the entire thing.
(The usefulness of a PtP-to-0.0.0.0/0 is a separate question; at this
point dropping the peer prefix seems the least intrusive solution.)
Reported-by: Laurent Lavaud <laurent.lavaud@ladtech.fr> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Mon, 18 Aug 2014 16:05:25 +0000 (18:05 +0200)]
isisd: type mix-up in 28a8cfc "don't require IPv4"
Whoops, these are in6_addrs, not prefix_ipv6... funnily enough, it does the
right thing either way, if it compiles, which it only does on Linux because
IN6_IS_ADDR_LINKLOCAL contains a cast to the right type. On BSD there is no
such cast, hence it explodes on trying to compile, trying to access struct
members of in6_addrs while operating on prefix_ipv6...
Fixes: 28a8cfc ("isisd: don't require IPv4 for adjacency") Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
John Glotzer [Mon, 4 Aug 2014 19:39:23 +0000 (19:39 +0000)]
bgpd: memmove needed in community_del_val
In bgpd/bgp_community_del_val memcpy is used for potentially overlapping
regions which is *not* safe. It may "work" in some cases but is not
guaranteed to work in all cases. The case that I saw fail was on an
x86_64 architecture with the number of bytes being moved/copied equal to
8.
The way the code is written the uint32_t pointers will always differ by
1, which is equivalent to a memcpy/memmove of regions that are 4 bytes
away from one another. So the code failed while copying an 8 byte region
to an address that is 4 bytes lower i.e. overlapping regions.
Interestingly, the same architecture had no problems with a 12 byte
copy.
When the code failed the communities were [200,300,400] and a call was
made to delete the 200 community. The result of this was an array that
looked like [400,400] which was uniquified to [400]. Of course the
expected result should have been [300, 400].
One additional point - in our production environment memmove would not
*link* without including <string.h> but in an isolated quagga git repo
this #include does not seem to be required and I see memmove is used in
vtysh.c without this #include either.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Timo Teräs [Tue, 29 Jul 2014 09:41:55 +0000 (09:41 +0000)]
zebra: fix struct msghdr initializers
struct msghdr field orders are not strictly specified in POSIX.
Improve portability by using designated initializer. This fixes
build against musl c-library where struct msghdr is POSIX
compliant (Linux kernel and glibc definitions are non-conforming).
As the result is also more readable, struct iovec initilizers
were also converted.
Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Timo Teräs [Tue, 29 Jul 2014 09:41:54 +0000 (09:41 +0000)]
*: fix detection and usage of sys/cdefs.h
This header is non-standard (though present on many systems) and
there is no standard for what it should or should not define.
Remove it where it is not really needed. But add also a configure
check, so it can be used if available but otherwise fallback to
defining the needed macroes.
Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Sun, 6 Jul 2014 20:33:48 +0000 (22:33 +0200)]
tests/bgpd: don't hardcode error number (fix f57000c)
f57000c ("bgpd: don't send NOTIFY twice for malformed attrs") introduces
BGP_ATTR_PARSE_ERROR_NOTIFYPLS as additional error code that implies the
caller should sent a NOTIFY and convert it to BGP_ATTR_PARSE_ERROR.
Sadly, the latter was hardcoded in bgp_mp_attr_test.c, which now didn't
consider the new value to be an error.
Make the testcase treat all nonzero values as error without discern.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Lu Feng [Wed, 25 Jun 2014 07:43:15 +0000 (07:43 +0000)]
ripd: use only one constant for derivation
RIP_MAX_RTE is defined in ripd.h as 25 but is in fact the
result of a formula. More over it is not used in the code:
the code itself includes the fomula. This makes it un-clear
for maintenance.
Signed-off-by: Feng Lu <lu.feng@6wind.com> Reviewed-by: Alain Ritoux <alain.ritoux@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Tue, 3 Jun 2014 23:00:51 +0000 (01:00 +0200)]
bgpd: fix memory leak on malformed attribute
When bgp_attr_parse returns BGP_ATTR_PARSE_ERROR, it may already have
parsed and allocated some attributes before hitting that error. Free
the attr's data before returning.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Tue, 3 Jun 2014 22:59:01 +0000 (00:59 +0200)]
bgpd: fix double free after extcommunity set (BZ#799)
The route-map extcommunity set code was incorrectly assuming that it
owns the intern'd struct ecommunity reference. In reality, the intern'd
reference belongs to bgp_update_receive() and we're not supposed to
touch it in the route-map code.
Instead, like all the other set commands, we use a on-heap but
non-intern'd ecommunity to set the new value. This is then either
intern'd in bgp_update_main/_rsclient() through bgp_attr_intern(), or
free'd through bgp_attr_flush().
This fixes Bugzilla #799, which is that bgpd otherwise crashes with a
double free. The ecommunity got unintern'd first in the route-map set
command, then in bgp_update_receive().
Debugged-by: Milan Kocian <milon@wq.cz> Reported-by: Florian S <florian@herrenlohe.de> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Tue, 3 Jun 2014 22:54:58 +0000 (00:54 +0200)]
bgpd: fix some bgp_update_main() attribute leaks
bgp_update_main() wasn't doing anything to release attribute values
set from route maps for two of its error paths. To fix, pull up the
appropriate cleanup from further down and apply it here.
bgp_update_rsclient() doesn't have the issue since it immediately
does bgp_attr_intern() on the results from bgp_{export,import}_modifier.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Wed, 4 Jun 2014 04:53:35 +0000 (06:53 +0200)]
*: nuke ^L (page feed)
Quagga sources have inherited a slew of Page Feed (^L, \xC) characters
from ancient history. Among other things, these break patchwork's
XML-RPC API because \xC is not a valid character in XML documents.
Nuke them from high orbit.
Patches can be adapted simply by:
sed -e 's%^L%%' -i filename.patch
(you can type page feeds in some environments with Ctrl-V Ctrl-L)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Tue, 3 Jun 2014 16:42:25 +0000 (18:42 +0200)]
build: Quagga 0.99.23-rc1
this is not a full release version, so neither release notes nor
documentation are updated yet. Also, signing the tag with my private
GPG key instead of the Quagga one.
Lu Feng [Wed, 19 Feb 2014 09:05:05 +0000 (09:05 +0000)]
isisd: ignore the unrecognized TLVs
When processing LSPDUs, the unrecognized TLVs/sub-TLVs should be
silently ignored.
In parse_tlvs(), ISIS_WARNING is returned once an unrecognized TLV
exists. It breaks the processing in lsp_authentication_check() and
lsp_update_data(). So remove it.
Signed-off-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Timo Teräs [Thu, 24 Apr 2014 06:40:33 +0000 (06:40 +0000)]
lib: remove redundant and incorrect sys/fcntl.h include
POSIX defines <fcntl.h>, <sys/fcntl.h> is the same thing. However,
it should not be used as it's existence can depend on C-library
implementation. E.g. musl gives warning if <sys/fcntl.h> is used.
Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
bgpd: fix crash when allowas-in is done on inactive peer
When allowas-in is changed on a peer that is not up, BGP would crash
trying to do route_refresh. If peer is not up, there is no need
to do notification or send.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
bgpd: efficient NLRI packing for AFs != ipv4-unicast
ISSUE:
Currently, for non-ipv4-unicast address families where prefixes are
encoded in MP_REACH/MP_UNREACH attributes, BGP ends up sending one
prefix per UPDATE message. This is quite inefficient. The patch
addresses the issue.
PATCH:
We introduce a scratch buffer in the peer structure that stores the
MP_REACH/MP_UNREACH attributes for non-ipv4-unicast families. This
enables us to encode multiple prefixes. In the end, the two buffers
are merged to create the UPDATE packet.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
[DL: removed no longer existing bgp_packet_withdraw prototype] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
While announcing a path to a peer, the code currently compares the path's
next-hop with the peer's router-id. This can lead to problems as the router
IDs are unique only within an AS. Suppose AS 1 sends route with next-hop
10.1.1.1. It is possible that the speaker has an established BGP peering
with a router in AS 2 with router ID 10.1.1.1. The route will not be
advertised to that peer in AS 2.
The patch removes this check.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
readline 6.3 removes some old deprecated funnily-named types. This
updates vtysh to use the new types so it builds again.
Reported-by: Joel Teichroeb <klusark@archlinux.invalid>
References: https://bugs.archlinux.org/task/39495 Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Michal Sekletar [Fri, 16 May 2014 14:13:43 +0000 (14:13 +0000)]
zebra: raise the privileges before calling socket()
Because of recent changes when creating AF_NETLINK socket, kernel will
cache capabilities of the caller and if file descriptor is used or
otherwise handed to another process it will check that current user has
necessary capabilities to use the socket. Hence we need to ensure we
have necessary capabilities when creating the socket and at the time we
use the socket.
Milan Kocian [Fri, 18 Oct 2013 07:59:38 +0000 (07:59 +0000)]
bgpd: Fix condition allowas-in in rsclient code
Currently when you set neighbour's 'allowas-in' option on route server side
you get redistribution of the prefixes from this neighbour's table into all
neighbour's tables which have the same AS number. I think that wanted behaviour
is to allow import prefixes from neighbour's tables with the same AS num
into neighbour which has 'allowas-in' option set.
Signed-off-by: Milan Kocian <milon@wq.cz> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Traditionally, ttl-security feature has been associated with EBGP
sessions as those identify directly connected external peers. The
GTSM RFC (rfc 5082) does not make any restrictions on type of
peering. In fact, it is beneficial to support ttl-security for both
EBGP and IBGP sessions. Specifically, in data centers, there are
directly connected IBGP peerings that will benefit from the protection
ttl-security provides.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
[DL: function refactoring split out into previous 2 patches. changes:
- bgp_set_socket_ttl(): ret type int -> void
- is_ebgp_multihop_configured(): stripped peer == NULL check
- comments/whitespace] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Mon, 19 May 2014 21:15:02 +0000 (23:15 +0200)]
bgpd: factor out eBGP multihop check
The check for an eBGP multihop configuration is unwieldy; factor it out
into a separate function.
[DL: originally by Dinesh G Dutt <ddutt@cumulusnetworks.com>,
split off from the next commit] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Mon, 19 May 2014 20:52:04 +0000 (22:52 +0200)]
bgpd: factor out TTL setting
TTL/min TTL are set from both bgp_accept() and bgp_connect(). Factor
them out so the following change to enable iBGP GTSM becomes more
readable.
[DL: originally by Dinesh G Dutt <ddutt@cumulusnetworks.com>,
split off from the next commit] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
1. When an interface goes down, the zclient callbacks are invoked
in the following order: (a) address_delete() that removes the
connected address list: ifp->connected, (b) interface_down()
that performs "fast external fallover" operation. The operation
relies on ifp->connected to look for peers that should be brought
down. That's a cyclic dependency.
2. 'ttl-security' configuration handler sets peer->ttl to
MAXTTL (so that BGP packets are sent with TTL=255, as per the
requirement of ttl-security). This, however, is incompatible
with 'fast external fallover' as the fallover operation checks
for (ttl == 1) to determine directly connected peers.
3. The current fallover operation does not work for IPv6 address family.
PATCH
1. The patch removes the dependency on 'ifp->connected' list for fast
fallover. The peer already contains a nexthop structure that reflects
the peering address. The nexthop structure has a pointer to the
interface (ifp) that peering address resolves to. Everytime the TCP
connection succeeds, the ifp is updated. The patch uses this ifp in
the interface_down() callback for a match for the peers that should be
brought down.
2. The evaluation for directly connected peering is enhanced as
'peer->ttl == 1' OR 'peer->gtsm_hops == 1'. Thus a ttl-security
configuration on the peer with one hop is directly connected and
should be brought down under 'fast external fallover'.
3. Because of fix (1), IPv6 address family works automatically.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Vipin Kumar [Thu, 9 Jan 2014 00:31:22 +0000 (00:31 +0000)]
bgpd: increase TCP socket buffer size
BGP does not respond fairly in high scale. As the number of BGP peers
and prefixes increase, triggers like interface flaps which lead to BGP
peer flaps, cause blockage in bgp_write.
BGP does handle the cases of TCP socket buffer full by queuing a write
event back, there is no functional issue there as such. Still,
increasing the peer socket buffer size should help reduce event queueing
in BGP.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
[DL: patch split, this is item 3.] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Vipin Kumar [Thu, 9 Jan 2014 00:31:22 +0000 (00:31 +0000)]
bgpd: fix O_NONBLOCK on outgoing connects
BGP was setting sockets to be non-blocking only for the accepted passive
peers. As a fix, setting the BGP sockets to be non-blocking even for
the active peers.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
[DL: patch split, this is item 1.] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
bgpd: send notify in OpenSent when stopping manually
The issue it fixes is that the notification message is not sent to a
second peer when bgp is stopped manually.
According to BGP RFC4271, section 8.2.2, regarding the FSM transitions,
in OpenSent state:
If a ManualStop event (Event 2) is issued in the OpenSent state, the
local system:
* sends the NOTIFICATION with a Cease,
* sets the ConnectRetryTimer to zero,
* releases all BGP resources,
* drops the TCP connection,
* sets the ConnectRetryCounter to zero, and
* changes its state to Idle.
I've added a check for OpenSent state when the notification is sent from
the functions which are called from the CLI commands which
directly/indirectly stop/restart BGP.
Acked-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The output of "show ip bg" does not show whether and which routes are
installed as multipath routes along the best route:
BGP table version is 0, local router ID is 10.10.100.209
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*>i1.0.0.0/24 10.10.100.1 1 111 0 15169 i
* i 10.10.100.2 1 111 0 15169 i
* i 10.10.100.3 1 111 0 65100 15169 i
This patch adds a new status code that is showing exactly which routes
are used as multipath:
BGP table version is 0, local router ID is 10.10.100.209
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*>i1.0.0.0/24 10.10.100.1 1 111 0 15169 i
*=i 10.10.100.2 1 111 0 15169 i
* i 10.10.100.3 1 111 0 65100 15169 i
The inconsistency in the status code legend ("i - internal" vs. "i internal")
inherent from old IOS was fixed. It had to be touched anyways.
Signed-off-by: Boian Bonev <bbonev at ipacct.com>
[DL: rewrap long line, clean whitespace in same chunk] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
bgpd: track correct originator-id in reflected routes
ISSUE:
Suppose route1 and route2 received from route-reflector-client1 and client2
respectively have identical attributes. The current logic of creating the
adj-rib-out for a peer threads the 'adv' structures for both routes against
the same attribute. This results in 'bgp_update_packet()' to pack those
routes in the same UPDATE message with one attr structure formatted. The
originator-id is thus set according to the first route's received router id.
This is incorrect.
PATCH:
Fix bgp_announce_check() function to set the originator-id in the
advertising attr structure. Also, fix the attribute hash function and
compare function to consider originator-id. Otherwise attributes where all
fields except the originator-id are identical get merged into one memory
location.
Signed-off-by: Pradosh Mohapatra <pmohapat at cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma at cumulusnetworks.com> Reviewed-by: Ken Yin <kyin at cumulusnetworks.com>
[DL: whitespace changes dropped] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Compute multipath in BGP based on AS_PATH hop count match. If the knob
is turned on, it is not required to have an exact match of AS_PATHs
(provided other multipath conditions are met, of course).
Signed-off-by: Pradosh Mohapatra <pmohapat at cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt at cumulusnetworks.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Christian Franke [Mon, 28 Apr 2014 11:42:20 +0000 (11:42 +0000)]
ospfd: add debug messages for router lsa-generation
Add log messages to lsa_link_broadcast_set so it becomes more
apparent why a particular broadcast interface was added as
transit or stub interface.
Signed-off-by: Christian Franke <chris@opensourcerouting.org> Acked-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
ospfd: For an ABR, ensure the right LSID is MaxAge'd
PROBLEM:
Accurate garbage collection of maxage LSAs. The global OSPF structure has
a maxage_lsa tree - the key to the tree is <ls-id, adv-router> tuple. Suppose
the ABR has multiple areas and has originated some intra-area LSAs. The
key for all those LSAs is the same. The code then ends up in a state where
all but the first LSA do not get cleaned up from the areas' LSDB. A subsequent
event would readvertise those LSAs.
PATCH:
Since the LSA is going to stick around till it actually gets cleaned up by
the maxage_walker, make the LSA pointer as the key. Each distinct LSA that
gets maxage'd then gets added to the tree and will get cleaned up correctly.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
[CF: Use CHAR_BIT; use uintptr_t; use sizeof(field) instead of sizeof(type)] Signed-off-by: Christian Franke <chris@opensourcerouting.org>
[DL: this must remain a temporary fix! needs to be redone after 0.99.23] Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Christian Franke [Mon, 28 Apr 2014 08:04:59 +0000 (08:04 +0000)]
ospfd: clarify indentation and comments in ospf_lsa_maxage_delete
Signed-off-by: Christian Franke <chris@opensourcerouting.org> Acked-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Christian Franke [Mon, 28 Apr 2014 08:04:58 +0000 (08:04 +0000)]
ospfd: fix a reference counting issue introduced by commit 4de8bf0011
Commit 4de8bf0011 added a return statement to a loop iterating over a
route_table. That loop uses route_top/route_next.
As commit 4de8bf0011 failed to add a route_node_unlock before the
return statement, a reference is leaked when this codepath is taken.
Signed-off-by: Christian Franke <chris@opensourcerouting.org> Acked-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Lu Feng [Fri, 21 Feb 2014 08:11:15 +0000 (08:11 +0000)]
ospfd: check the LS-Ack's recentness instead of only comparing the #seq
ISSUE:
RTA(DR)-----(BackupDR)RTB
RTA advertises a new LSA to RTB, and then flushes the LSA (with setting
the age of the LSA to MaxAge) within 1 second. Then the LSA is deleted
from RTA, while it still exists on RTB with non-MaxAge and can not be
flushed any more.
FIX:
The reason can be explained in below:
a) RTA -- new LSA, #seq=1 --> RTB (RTB will send the delayed Ack in 1s)
b) RTA -- MaxAge LSA, #seq=1 --> RTB (RTB discards it for the MIN_LS_ARRIVAL)
c) RTA <-- Ack for the new LSA, #seq=1 -- RTB (RTA accepts it)
In the step c), ospf_ls_ack() compares the #seq of the entry in the LS-Ack
with that of local MaxAge LSA. The #seq of the two entries are same. So
the Ack is accepted and the LSA is removed from the retransmit-list (while
it should not).
In RFC2328, section 13.7. Receiving link state acknowledgments:
o If the acknowledgment is for the same instance that is <==
contained on the list, remove the item from the list and
examine the next acknowledgment. Otherwise:
where "same instance" does not mean the same #seq. We must call
ospf_lsa_more_recent() to check whether the two instances are same.
Signed-off-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: Christian Franke <chris@opensourcerouting.org> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Christian Franke [Thu, 11 Jul 2013 07:56:29 +0000 (07:56 +0000)]
ospfd: don't allow to set network type on loopback interfaces
OSPFd only allocates some stub information for loopback interfaces.
This causes a crash when the interface state machine is started on
that interface by configuring a different network type.
It doesn't make much sense to configure the network type of a loopback
interface, therefore, just forbid it.
See also bugzilla #670.
Signed-off-by: Christian Franke <chris@opensourcerouting.org> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Christian Franke [Wed, 10 Jul 2013 11:56:18 +0000 (11:56 +0000)]
ospfd: run DR election prior to LSA regeneration
The results from DR election are used when constructing router-LSAs.
E.g. they are used to determine whether a broadcast interface should
be added with a link type of stub interface or transit interface.
Therefore, we should run DR election prior before regenerating LSAs.
Signed-off-by: Christian Franke <chris@opensourcerouting.org> Acked-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Ken Williams [Tue, 15 Apr 2014 02:23:11 +0000 (02:23 +0000)]
zebra: Change the mechanism for comparing route ID's.
The current format uses subtraction of two ints. Unfortunately, the
subtraction method does not work for all combinations of numbers.
For example, the with numbers represented by 10.x.x.x and 192.x.x.x,
10.x.x.x - 192.x.x.x will yield a very large positive number indicating
that 10.x.x.x is larger.
Signed-off-by: Ken Williams <kenneth.j.williams@intel.com> Acked-by: Feng Lu <lu.feng@6wind.com> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>