- etc/init.d/quagga is modified to support creating separate ospf daemon
process for each instance. Each individual instance is monitored by
watchquagga just like any protocol daemons.(requires initd-mi.patch).
- Vtysh is modified to able to connect to multiple daemons of the same
protocol (supported for OSPF only for now).
- ospfd is modified to remember the Instance-ID that its invoked with. For
the entire life of the process it caters to any command request that
matches that instance-ID (unless its a non instance specific command).
Routes/messages to zebra are tagged with instance-ID.
- zebra route/redistribute mechanisms are modified to work with
[protocol type + instance-id]
- bgpd now has ability to have multiple instance specific redistribution
for a protocol (OSPF only supported/tested for now).
- zlog ability to display instance-id besides the protocol/daemon name.
- Changes in other daemons are to because of the needed integration with
some of the modified APIs/routines. (Didn’t prefer replicating too many
separate instance specific APIs.)
- config/show/debug commands are modified to take instance-id argument
as appropriate.
Guidelines to start using multi-instance ospf
---------------------------------------------
The patch is backward compatible, i.e for any previous way of single ospf
deamon(router ospf <cr>) will continue to work as is, including all the
show commands etc.
To enable multiple instances, do the following:
1. service quagga stop
2. Modify /etc/quagga/daemons to add instance-ids of each desired
instance in the following format:
ospfd=“yes"
ospfd_instances="1,2,3"
assuming you want to enable 3 instances with those instance ids.
3. Create corresponding ospfd config files as ospfd-1.conf, ospfd-2.conf
and ospfd-3.conf.
4. service quagga start/restart
5. Verify that the deamons are started as expected. You should see
ospfd started with -n <instance-id> option.
ps –ef | grep quagga
With that /var/run/quagga/ should have ospfd-<instance-id>.pid and
ospfd-<instance-id>/vty to each instance.
6. vtysh to work with instances as you would with any other deamons.
7. Overall most quagga semantics are the same working with the instance
deamon, like it is for any other daemon.
NOTE:
To safeguard against errors leading to too many processes getting invoked,
a hard limit on number of instance-ids is in place, currently its 5.
Allowed instance-id range is <1-65535>
Once daemons are up, show running from vtysh should show the instance-id
of each daemon as 'router ospf <instance-id>’ (without needing explicit
configuration)
Instance-id can not be changed via vtysh, other router ospf configuration
is allowed as before.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:41 +0000 (18:03 -0700)]
initd: initd-mi.patch
Support Multi-Instance protocol daemons initd
OSPFd is the first of the multi-instance daemons. This patch allows the
starting, stopping, restarting and monitoring of multiple instances of
the same protocol daemon.
Multiple instances are specified in the daemons file using a new variable:
ospfd_instances="1,2"
Absence of this variable means ospfd will start in legacy, single instance
mode. The original "ospfd=yes" line is still required.
Daemons are started with the "-n <instance>" option. Each daemon is named
"<daemon>-<instance>", for example "ospfd-1", "ospfd-2" etc. Similarly,
pid files are ospfd-1.pid and vty files are named ospfd-1.vty.
We're also introducing a new file, /etc/default/quagga to store the
default value for the maximum instances associated with a daemon.
watchquagga and others are unmodified and everything else just works once
this code is in place.
The code has been enhanced to support restarting watchquagga with only the
updated daemons when an individual daemon is stopped or started. For example,
without this patch, stopping just bgpd would terminate watchquagga even if
ospfd and zebra are still running. Similarly, starting just bgpd when ospfd
and zebra are running wouldn't update watchquagga to include bgpd. Furthermore,
when the daemons file is modified and a daemon is no longer deemed necessary
and quagga restarted, the daemon is not killed. For example, switching
ospfd=yes to ospfd=no and restarting the quagga will leave ospfd daemon
running. This case is also fixed with this patch.
However, adding a new instance to the ospfd_instances file and starting
just that instance will start just that instance and add it to watchquagga.
Similarly, a single instance maybe stopped or restarted.
Caveat emptor: With multi-instance daemons, stopping a single instance and then
starting a different instance will cause all instances to be monitored by
watchquagga i.e. all instances will be restarted, if necessary.
Signed-off-by: Dinesh G Dutt <ddutt at cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:41 +0000 (18:03 -0700)]
ospf6d: ospfv3-show-cmds-instance-check-fix.patch
SYMPTOM:
If some of the ospfv3 commands like 'show ipv6 ospf6 route' are executed
with ospf6d daemon running but before having any ospfv3 configuration, then
ospf6d crash is seen.
ISSUE:
There are a few show commands, which are (unlike others) not checking if
ospf6 instance is initialized already.
FIX:
Add the missing checks, by using OSPF6_CMD_CHECK_RUNNING() in the commands
where its needed and not yet used.
Donald Sharp [Wed, 20 May 2015 01:03:40 +0000 (18:03 -0700)]
ospf6d: ospfv3-setsocket-retry.patch
SYMPTOM:
With quagga running on Linux, 'ifdown <if-name>' followed by 'ifup <ifname>
can cause OSPFv3 to not receive Hello packets on the interface.
ISSUE:
Operating System's interface IPv6 readiness may not be guaranteed at the
time of interface-up event. Thats because the ipv6 components in an OS may
also be listening to the same interface-up event that (in this case) is
relayed to OSPFv3.
In this failure case, setsockopt with option IPV6_JOIN_GROUP on the interface
returned EINVAL.
Error logs -
OSPF6: Zebra Interface state change: swp1 index 3 flags 11043 metric 1 mtu 1500
OSPF6: Interface Event swp1: [InterfaceUp]
OSPF6: Network: setsockopt (20) on ifindex 3 failed: Invalid argument
FIX:
To take care of this possible race condition, any address-family related
setting should be retried. Given it's a rare condition and window of this
race should be short, the patch adds a limited retry mechanism for the
IPV6 membership setting on the socket.
Donald Sharp [Wed, 20 May 2015 00:58:12 +0000 (17:58 -0700)]
Overhual BGP debugs
Summary of changes
- added an option to enable keepalive debugs for a specific peer
- added an option to enable inbound and/or outbound updates debugs for a specific peer
- added an option to enable update debugs for a specific prefix
- added an option to enable zebra debugs for a specific prefix
- combined "deb bgp", "deb bgp events" and "deb bgp fsm" into "deb bgp neighbor-events". "deb bgp neighbor-events" can be enabled for a specific peer.
- merged "deb bgp filters" into "deb bgp update"
- moved the per-peer logging to one central log file. We now have the ability to filter all verbose debugs on a per-peer and per-prefix basis so we no longer need to keep log files per-peer. This simplifies troubleshooting by keeping all BGP logs in one location. The use
r can then grep for the peer IP they are interested in if they wish to see the logs for a specific peer.
- Changed "show debugging" in isis to "show debugging isis" to be consistent with all other protocols. This was very confusing for the user because they would type "show debug" and expect to see a list of debugs enabled across all protocols.
- Removed "undebug" from the parser for BGP. Again this was to be consisten with all other protocols.
- Removed the "all" keyword from the BGP debug parser. The user can now do "no debug bgp" to disable all BGP debugs, before you had to type "no deb all bgp" which was confusing.
The new parse tree for BGP debugging is:
deb bgp as4
deb bgp as4 segment
deb bgp keepalives [A.B.C.D|WORD|X:X::X:X]
deb bgp neighbor-events [A.B.C.D|WORD|X:X::X:X]
deb bgp nht
deb bgp updates [in|out] [A.B.C.D|WORD|X:X::X:X]
deb bgp updates prefix [A.B.C.D/M|X:X::X:X/M]
deb bgp zebra
deb bgp zebra prefix [A.B.C.D/M|X:X::X:X/M]
Donald Sharp [Wed, 20 May 2015 00:58:12 +0000 (17:58 -0700)]
Changes to improve BGP convergence time:
- Schedule write thread for advertisements and withdraws only if corresponding
FIFOs are growing and/or upon work_queue getting fully processed.
- Set non-default yield time for the main work_queue, as the default value
of 10ms results in yielding after processing very few nodes.
- Remove unnecessary scheduling of write thread when update packet is formed.
- If MRAI is 0, don't start a timer unnecessarily, directly schedule write
thread.
- Some debugs.
Donald Sharp [Wed, 20 May 2015 00:58:10 +0000 (17:58 -0700)]
Some small enhancements to thread and workqueue libraries in zebra:
- Allow work queues to specify the yield duration for corresponding background thread
- Support using specified yield duration in thread yielding
- During work queue processing, if using a single list element with a meta-queue
(like done in Zebra), do not exit after each element is processed, instead
update the next-node upon a WQ_REQUEUE so that the WQ processing continues
and is terminated by the yield logic.
- Enhance work queue debug output
Donald Sharp [Wed, 20 May 2015 00:47:25 +0000 (17:47 -0700)]
bgpd-delete-route-on-invalid-nh.patch
BGPd: Delete the route from the kernel when a valid NH changes to invalid NH
A route has been announced by a BGP peer with a valid NH and has been
populated into the kernel. Now, if the NH announced changes (say via routemap)
to an invalid NH, the route is marked as inactive/inaccessible inside Quagga,
but is not deleted from the kernel. This patch fixes that issue.
The problem is caused by BGP losing the old valid NH and using the new, invalid
NH to delete the now-inaccessible route. However, the kernel/zebra has the
route using the old NH and so they reject the delete. Fix involves not sending
the invalid NH when its the only NH. Things worked fine if the route had BGP
multipath.
Donald Sharp [Wed, 20 May 2015 00:47:24 +0000 (17:47 -0700)]
zebra-set-src-routemap.patch
Honor setting source via route map and pushing that to the kernel.
With recursive routes, the ability to set the source IP address of a route
via a routemap has been broken. This patch fixes that.
To allow route map to set a source and then to unapply the route map and
have the source be taken out, I've introduced a new field in the nexthop
data structure called rmap_src. This field is zero'd before invoking the
route map apply function.
Today, no protocol daemon specifies the src in its route update to zebra.
If that happens, I didn't want to stomp on it and so have left the src
field intact instead of reusing that for the routemap to play with.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:47:22 +0000 (17:47 -0700)]
zebra: zebra-client-info-detail.patch
Zebra: Gather and display detailed info about clients of Zebra
The display of zebra client info is rather paltry: just the name and the FD.
For troubleshooting and general helpfulness, its useful to gather more info
about each client and display that. This patch does just that.
Donald Sharp [Wed, 20 May 2015 00:47:20 +0000 (17:47 -0700)]
zebra: zebra-nht-routemap.patch
Zebra: Add route-map support for Next Hop Tracking
It is sometimes useful to restrict the resolution of recursive routes
to only specific via's. For example, in some configurations resolving
a route through a default route is not acceptable.
This patch adds a new route-map attach point, to zebra's next-hop-tracking
server. Whenever NHT is considering sending notification of a route
resolution, it applies a specified route-map and only if it passes, is the
NHT reachable message sent to the appropriate client protocol (BGP, OSPF etc.).
If the route-map filters the resolution, then a withdraw is sent to the
client protocol.
The route-map is sent the ip address of the route via which the resolution is
happening as well as the valid NHs associated with that route.
We also add support for matching on IP addr prefix len and source protocol
to ensure that resolution happens only via a very specific route.
Donald Sharp [Wed, 20 May 2015 00:46:33 +0000 (17:46 -0700)]
Add support for route tags
Credit
------
A huge amount of credit for this patch goes to Piotr Chytla for
their 'route tags support' patch that was submitted to quagga-dev
in June 2007.
Documentation
-------------
All ipv4 and ipv6 static route commands now have a "tag" option
which allows the user to set a tag between 1 and 65535.
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag ?
<1-65535> Tag value
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag 40
quagga(config)#
quagga# show ip route 1.1.1.1/32
Routing entry for 1.1.1.1/32
Known via "static", distance 1, metric 0, tag 40, best
* 10.1.1.1, via swp1
quagga#
The route-map parser supports matching on tags and setting tags
!
route-map MATCH_TAG_18 permit 10
match tag 18
!
!
route-map SET_TAG_22 permit 10
set tag 22
!
BGP and OSPF support:
- matching on tags when redistribing routes from the RIB into BGP/OSPF.
- setting tags when redistribing routes from the RIB into BGP/OSPF.
BGP also supports setting a tag via a table-map, when installing BGP
routes into the RIB.
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:47 +0000 (17:40 -0700)]
bgpd: bgpd-route-map-match-interface.patch
BGP: Add match interface support to BGP route-map.
Currently, BGP route maps don't support interface match. This is a problem
for commands such as redistribite connected that cannot exclude routes from
specific interfaces (such as mgmt interfaces).
Donald Sharp [Wed, 20 May 2015 00:40:45 +0000 (17:40 -0700)]
Summary: Test effect of route-map on received/advertised routes
This patch adds the ability to see the effect of applying a route-map on
the routes received or advertised from or to a neighbor. This effect can
be seen without actually affecting the current state. If the result seen
is what is desired, then the user can actually apply the route-map.
Currently, the application acts on route-map in or out and on unsuppress
maps.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:45 +0000 (17:40 -0700)]
bgpd: bgpd-event-driven-route-map-updates.patch
BGP: Reprocess the trigger points when an attached route map changes
Currently, modifications to route maps do not affect already processed
routes; they only affect new route updates. This patch addresses this
limitation.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:44 +0000 (17:40 -0700)]
ptm-integration.patch
Integrates Prescriptive Topology Module(ptm) into quagga.
If this module is enabled, link ups are notified only after the link is verified
as being connected to the neighbor specified. The neighbor specification and
checking is done by the ptm daemon.
Donald Sharp [Wed, 20 May 2015 00:40:43 +0000 (17:40 -0700)]
iquagga-faster-compile.patch
Avoid cleaning up the source tree and running reconf every time. Allows
recompilation of only those files that have been modified since last
run. Relies on the existence of config.status file to decide if we've
run the first time or subsequent times.
'administrative' takes effect from the time of the config until the config is
removed.
'on-startup' is effective only at the startup time for the given '<period>'
after the first peer is established.
'<max-med-value>' is used as the MED value to be sent out when the max-med
is effective. Default max-med value is 4294967294.
NOTE:
When max-med is active, MED is changed only in the outgoing attributes to the
peers, it doesn't modify any MED specific state of the attributes in BGP on
the local node.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:42 +0000 (17:40 -0700)]
bgpd-scale-update-delay-packing.patch
ISSUE:
During startup, BGP update prefix packing wasnt optimal and route installation
was found to be spread over.
SOLUTION:
With this patch, update-delay post processing is serialized to achieve:
a. better peer update packing
(which helps in reducing total number of BGP update packets)
b. installation of the resulting routes in zebra as close to each others
as possible.
(which can help zebra batch its processing and updates to Kernel better)
Donald Sharp [Wed, 20 May 2015 00:40:41 +0000 (17:40 -0700)]
bgpd: bgpd-ibgp-policy-out-allow-mods.patch
BGPd: Allow route-map policy modifications to also affect route reflectors.
By default, attribute modification via route-map policy out is ignored on
reflected routes. This patch provides an option to allow this modification
to occur. Once enabled, it affects all reflected routes.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:37 +0000 (17:40 -0700)]
bgpd: bgpd-mrai.patch
BGP: Event-driven route announcement taking into account min route advertisement interval
ISSUE
BGP starts the routeadv timer (peer->t_routeadv) to expire in 1 sec
when a peer is established. From then on, the timer expires
periodically based on the configured MRAI value (default: 30sec for
EBGP, 5sec for IBGP). At the expiry, the write thread is triggered
that takes the routes from peer's sync FIFO (adj-rib-out) and sends
UPDATEs. This has a few drawbacks:
(1) Delay in new route announcement: Even when the last UPDATE message
was sent a while back, the next route change will necessarily have
to wait for routeadv expiry
(2) CPU usage: The timer is always armed. If the operator chooses to
configure a lower value of MRAI (zero second is a preferred choice
in many deployments) for better convergence, it leads to high CPU
usage for BGP process, even at the times of no network churn.
PATCH
Make the route advertisement event-driven - When routes are added to
peer's sync FIFO, check if the routeadv timer needs to be adjusted (or
started). Conversely, do not arm the routeadv timer unconditionally.
The patch also addresses route announcements during read-only mode
(update-delay). During read-only mode operation, the routeadv timer
is not started. When BGP comes out of read-only mode and all the
routes are processed, the timer is started for all peers with zero
expiry, so that the UPDATEs can be sent all at once. This leads to
(near-)optimal UPDATE packing.
Finally, the patch makes the "max # packets to write to peer socket at
a time" configurable. Currently it is hard-coded to 10. The command is
at the top router-bgp mode and is called "write-quanta <number>". It
is a useful convergence parameter to tweak.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:36 +0000 (17:40 -0700)]
bgpd: bgpd-peer-outq.patch
BGP: Show more meaningful outq value in 'show ip bgp summary' output.
'outq' field in 'show ip bgp sum' displays the number of formatted packets
to a peer. Since the route announcement follows an input-buffered pattern
(i.e. adj-rib-out is a separate queue of routes per peer and packets are
formatted from the routes at the time of TCP write), the outq field doesn't
show any interesting data worth watching.
The patch is to display the adj-rib-out queue depth instead.
Donald Sharp [Wed, 20 May 2015 00:40:36 +0000 (17:40 -0700)]
bgpd: bgpd-fix-ipv6-afi-parser-node.patch
BGPd: Make ipv6 unicast/multicast address-family work
In the absence of this patch, attempting to type "address-family ipv6 unicast"
would result in an "Ambiguous command" error and in the case of
"address-family ipv6 multicast", the command would silently fail, without the
prompt dropping into the address-family mode.
The cause is how the parse tree is constructed for ipv6 address family. There
was an error in extract.pl.in script and in vtysh.c files which assumed that
there was only address family ipv6 command, without unicast or multicast and
so the command was failing.
Donald Sharp [Wed, 20 May 2015 00:40:35 +0000 (17:40 -0700)]
zebra: zebra-use-fixed-metric-cost.patch
Zebra: Use a fixed route metric when populating kernel
The route metric is not used by the Linux kernel and is irrelevant to
the forwarding decision made by the kernel. Metric is a parameter used
only by a routing protocol to compute best path(s) and to communicate this
info to its peers. Consequently, there is no value in pushing the metric
provided by a protocol daemon to the kernel.
There is a significant advantage, at least on the Linux kernel, in pushing
a constant metric with a route populated by zebra. The metric is used as a
priority field in the kernel and modifying the metric due to say topology
changes causes multiple routes to be inserted into the kernel, with differing
priorities instead of replacing the existing one. This prevents us from
using replace semantic when a route changes.
So, this patch pushes a constant metric with a route populated by zebra.
Donald Sharp [Wed, 20 May 2015 00:40:34 +0000 (17:40 -0700)]
bgpd: bgpd-table-map.patch
COMMAND:
table-map <route-map-name>
DESCRIPTION:
This feature is used to apply a route-map on route updates from BGP to Zebra.
All the applicable match operations are allowed, such as match on prefix,
next-hop, communities, etc. Set operations for this attach-point are limited
to metric and next-hop only. Any operation of this feature does not affect
BGPs internal RIB.
Supported for ipv4 and ipv6 address families. It works on multi-paths as well,
however, metric setting is based on the best-path only.
IMPLEMENTATION NOTES:
The route-map application at this point is not supposed to modify any of BGP
route's attributes (anything in bgp_info for that matter). To achieve that,
creating a copy of the bgp_attr was inevitable. Implementation tries to keep
the memory footprint low, code comments do point out the rationale behind a
few choices made.
bgp_zebra_announce() was already a big routine, adding this feature would
extend it further. Patch has created a few smaller routines/macros whereever
possible to keep the size of the routine in check without compromising on the
readability of the code/flow inside this routine.
For updating a partially filtered route (with its nexthops), BGP to Zebra
replacement semantic of the next-hops serves the purpose well. However, with
this patch there could be some redundant withdraws each time BGP announces a
route thats (all the nexthops) gets denied by the route-map application.
Handling of this case could be optimized by keeping state with the prefix and
the nexthops in BGP. The patch doesn't optimizing that case, as even with the
redundant withdraws the total number of updates to zebra are still be capped
by the total number of routes in the table.
Donald Sharp [Wed, 20 May 2015 00:40:33 +0000 (17:40 -0700)]
bgpd: bgpd-update-delay.patch
COMMAND:
'update-delay <max-delay in seconds> [<establish-wait in seconds>]'
DESCRIPTION:
This feature is used to enable read-only mode on BGP process restart or when
BGP process is cleared using 'clear ip bgp *'. When applicable, read-only mode
would begin as soon as the first peer reaches Established state and a timer
for <max-delay> seconds is started.
During this mode BGP doesn't run any best-path or generate any updates to its
peers. This mode continues until:
1. All the configured peers, except the shutdown peers, have sent explicit EOR
(End-Of-RIB) or an implicit-EOR. The first keep-alive after BGP has reached
Established is considered an implicit-EOR.
If the <establish-wait> optional value is given, then BGP will wait for
peers to reach establish from the begining of the update-delay till the
establish-wait period is over, i.e. the minimum set of established peers for
which EOR is expected would be peers established during the establish-wait
window, not necessarily all the configured neighbors.
2. max-delay period is over.
On hitting any of the above two conditions, BGP resumes the decision process
and generates updates to its peers.
Default <max-delay> is 0, i.e. the feature is off by default.
This feature can be useful in reducing CPU/network used as BGP restarts/clears.
Particularly useful in the topologies where BGP learns a prefix from many peers.
Intermediate bestpaths are possible for the same prefix as peers get established
and start receiving updates at different times. This feature should offer a
value-add if the network has a high number of such prefixes.
IMPLEMENTATION OBJECTIVES:
Given this is an optional feature, minimized the code-churn. Used existing
constructs wherever possible (existing queue-plug/unplug were used to achieve
delay and resume of best-paths/update-generation). As a result, no new
data-structure(s) had to be defined and allocated. When the feature is disabled,
the new node is not exercised for the most part.
Donald Sharp [Wed, 20 May 2015 00:40:32 +0000 (17:40 -0700)]
bgpd: bgpd-restart-bit-fix.patch
ISSUE:
Quagga BGP doesn't send or use the restart-bit via the Graceful-Restart(GR)
capability. GR capability implementation isn't complete as per the RFC.
PATCH:
Patch uses BGP instance creation as the beginning of the startup period,
and 'restart_time' is taken as the startup period. As a result, BGP will
set the restart bit in the GR capability of the OPEN messages during the
startup period.
As an indication of quagga implementation's capability of sending End-Of-RIB,
helping a restarting neighbor, quagga BGP will now send global GR capability
irrespective of the graceful-restart config in BGP and the address-family
specific GR capability will be sent only if the GR config is present.
Forwarding bit is not set assuming its not preserved.
Donald Sharp [Wed, 20 May 2015 00:40:32 +0000 (17:40 -0700)]
ospfd: ospfv2-fix-interface-mode-cmd.patch
SYMPTOM:
Interface mode OSPF area configuration is not retained after restarting quagga.
Example -
quagga(config)# interface swp49
quagga(config-if)# ip ospf area 0.0.0.0
quagga# sh run
<snip>
interface swp49
ip ospf area 0.0.0.0
ipv6 nd suppress-ra
link-detect
!
quagga# write memory
* Restart quagga at this point*
quagga# sh run
<snip>
interface swp49
ipv6 nd suppress-ra
link-detect
!
ISSUE:
The issue is that the interface mode commands can reach the OSPF process even
before 'router ospf' command that initializes the default OSPF instance, this
is not getting handled properly in OSPF process.
FIX:
Initialize the default OSPF instance during OSPF process initializations, which
is before 'router ospf' command is received in OSPF process. So, when interface
mode command is received, it is guaranteed to have ospf instance to work with.
Other way could be to call ospf_get() instead of ospf_lookup() while processing
the config command callbacks, although OSPF needs to have at least one instance
structure anyways, therefore calling it unconditionally in OSPF initializations
should be fine too.
There could be more elaborate fix(es) possible to handle this, like adding some
ordering mechanism for commands as they are read by a process, or storing the
received command and applying it after the commands its dependent upon are
processed. For the issue at hand, initializing the default instance in main()
serves the purpose well.
Donald Sharp [Wed, 20 May 2015 00:40:31 +0000 (17:40 -0700)]
cluster-id length equality for multipath
A fat tree topology running IBGP gets into two issues with anycast address
routing. Consider the following topology:
R9 R10
x x
R3 R4 R7 R8
x x
R1 R2 R5 R6
| | | |
10/8 10/8 10/8 S
Let's remind ourselves of BGP decision process steps:
1. Highest Local Preference
2. Shortest AS Path Length
3. Lowest Origin Type
4. Lowest MED (Multi-Exit Discriminator)
5. Prefer External to Internal
6. Closest Egress (Lowest IGP Distance)
7. Tie Breaking (Lowest-Router-ID)
8. Tie Breaking (Lowest-cluster-list length)
9. Tie Breaking (Lowest-neighbor-address)
Without any policies, steps 1-6 will almost always evaluate identically for
all paths received on any router in the above topology. Let's assume that
the router-ids follow the following inequality: R1 < R2 < R5 < R6. Owing to
the 7th step above, all routers will now choose R1's path as the best. This
is undesirable. As an example, traffic from S to 10/8 will follow the path
S -> R6 -> R7 -> R9 -> R4 -> R2 -> 10/8 instead of S -> R6 -> R7 -> R5 -> 10/8.
Furthermore, once R7 (& R8) chooses R1's path as the best, it would withdraw
its path learned through (R5, R6) from (R9, R10). This leads to inefficient
load balancing - e.g. R9 can't do ECMP across all available egresses -
(R1, R2, R5).
The patch addresses these issues by noting that that cluster list is always
carried along with the routes and its length is a good indicator of IBGP
hops. It thus makes sense to compare that as an extension to metric after
step 6. That automatically ensures correct multipath computation.
Unfortunately a partial deployment of this in a generic topology (note:
fat-tree/clos topologies work fine) may lead to potential loops. It needs
to be looked into.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:45 +0000 (17:24 -0700)]
Add set ipv6 next-hop peer-address command.
IPv4 has the ability to specify the peer address with the keyword peer-address.
IPv6 mandates the use of a specific global or local address only in setting the
next-hop in routemaps. This makes it cumbersome to configure some large networks
with BGP and IPv6. This patch fixes that deficiency.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:45 +0000 (17:24 -0700)]
IPv6 multipath is broken in BGP if nexthop contains only global address.
IPv6 always uses both nextop IPv6 address and ifIndex in sending routes down to
zebra. In cases where only the global IPv6 address is present in the nexthop
information, the existing code doesn't set the ifIndex. An example of such a
case is when a route-map isused with "set ipv6 next-hop" and only global
address is specified. This code causes the ifIndex to be determined and
set thereby fixing the multipath programming.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:44 +0000 (17:24 -0700)]
When an LSA is flushed we need to update the timestamps for them. This
allows for the node to give the neighbor sufficient time to send back
an acknowledgement before retransmission kicks in.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com> Reviewed-by: James Li <jli@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:44 +0000 (17:24 -0700)]
Section 16.0 of rfc2328 (OSPF) specifies that the short-path
calculation to a node should be constructed with the sum of all path
costs (metrics) to the node (pretty simple huh). There is a usage of
metric typified by the "max-metric router-lsa" command in many
networking stacks that allows a router to gracefully "remove" itself
from a topology by advertising the maximum value of metric in it's
router LSAs (16 bits of "1"). In this case, the router will continue
to forward any traffic sent to it while these "max-metric" LSAs are
propagated through the network; at which point, the router can be
taken out of service.
The correct handling of this in ospfd would use this metric as part of
the calculation, disuading other routers from using it for transit
traffic (assuming a better path exits). Unfortunately, the ospfd
behavior is to remove these links from the SPF calculation. This
patch changes the behavior to omit this exception handling.
Signed-off-by: JR Rivers <jrrivers@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:24:43 +0000 (17:24 -0700)]
enable autoreconf so that Makefile.in is regenerated in the cumulus build.
This is necessary for the added .c files and modified Makefile.am files
in our patches.
Donald Sharp [Wed, 20 May 2015 00:24:43 +0000 (17:24 -0700)]
This patch enables support for multipath for IPV6. The nexthop information
from the protocols have ifindices and nexthop addresses in two different
structures. This patch combines them to ensure that the correct APIs can
be called. Also, given that IPV6 Linux implementation does not support the
rta_XXX APIs for multipath, the communication with the kernel is in terms
of a single nh/ifindex pair.
Donald Sharp [Tue, 19 May 2015 23:36:05 +0000 (16:36 -0700)]
ospfd-spf-stats.patch
Compute and display SPF execution statistics
Detailed SPF statistics, all around time spent executing various pieces of SPF
such as the SPF algorithm itself, installing routes, pruning unreachable networks
etc.
Reason codes for firing up SPF are:
R - Router LSA, N - Network LSA, S - Summary LSA, ABR - ABR status change,
ASBR - ASBR Status Change, AS - ASBR Summary, M - MaxAge
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: JR Rivers <jrrivers@cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma@cumulusnetworks.com> Reviewed-by: Ayan Banerjee <ayan@cumulusnetworks.com>
Donald Sharp [Tue, 19 May 2015 23:33:52 +0000 (16:33 -0700)]
zebra-enable-link-detect-by-default.patch
zebra: Set link-detect on by default
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt at cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma at cumulusnetworks.com>
Donald Sharp [Tue, 19 May 2015 23:31:06 +0000 (16:31 -0700)]
conditional-quagga-pdf.patch
The building of quagga.pdf requires the convert program out of the imagemagick
package. Getting this to run correctly in the scratchbox2 environment is
painful. Conditionally generate documentation during native compilation.
David Lamparter [Tue, 1 Jul 2014 14:14:05 +0000 (16:14 +0200)]
lib: unset ZEBRA_IFA_PEER if no dst addr present (BZ#801)
On OpenBSD, carp interfaces claim to be PtP interfaces with a 0.0.0.0/0
peer address. We process those in zebra and try to send them to
clients, at which point they get encoded as all-0. The client code,
however, decodes that to a NULL pointer instead of 0.0.0.0. This later
turns into a SEGV when CONNECTED_PREFIX sees that ZEBRA_IFA_PEER is set
and tries to access the peer prefix.
This is a band-aid fix for stable/0.99.23, a long-term solution needs
some conceptual improvements on the entire thing.
(The usefulness of a PtP-to-0.0.0.0/0 is a separate question; at this
point dropping the peer prefix seems the least intrusive solution.)
Reported-by: Laurent Lavaud <laurent.lavaud@ladtech.fr> Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Mon, 18 Aug 2014 16:05:25 +0000 (18:05 +0200)]
isisd: type mix-up in 28a8cfc "don't require IPv4"
Whoops, these are in6_addrs, not prefix_ipv6... funnily enough, it does the
right thing either way, if it compiles, which it only does on Linux because
IN6_IS_ADDR_LINKLOCAL contains a cast to the right type. On BSD there is no
such cast, hence it explodes on trying to compile, trying to access struct
members of in6_addrs while operating on prefix_ipv6...
Fixes: 28a8cfc ("isisd: don't require IPv4 for adjacency") Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
John Glotzer [Mon, 4 Aug 2014 19:39:23 +0000 (19:39 +0000)]
bgpd: memmove needed in community_del_val
In bgpd/bgp_community_del_val memcpy is used for potentially overlapping
regions which is *not* safe. It may "work" in some cases but is not
guaranteed to work in all cases. The case that I saw fail was on an
x86_64 architecture with the number of bytes being moved/copied equal to
8.
The way the code is written the uint32_t pointers will always differ by
1, which is equivalent to a memcpy/memmove of regions that are 4 bytes
away from one another. So the code failed while copying an 8 byte region
to an address that is 4 bytes lower i.e. overlapping regions.
Interestingly, the same architecture had no problems with a 12 byte
copy.
When the code failed the communities were [200,300,400] and a call was
made to delete the 200 community. The result of this was an array that
looked like [400,400] which was uniquified to [400]. Of course the
expected result should have been [300, 400].
One additional point - in our production environment memmove would not
*link* without including <string.h> but in an isolated quagga git repo
this #include does not seem to be required and I see memmove is used in
vtysh.c without this #include either.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>