Donald Sharp [Wed, 20 May 2015 01:03:53 +0000 (18:03 -0700)]
Fixing a couple of issues with ospf6_route_remove () routine.
When a route_node has multiple ospf6_routes under it (common subnet case),
then the current implementation has an issue in adjusting the route_node->info
on a ospf6_route_remove() call.
The main reason is that it ends up using exact match to determine if the next
ospf6_route belongs to the same route_node or not. Fixing that part to use
rnode (the existing back-pointer to the route_node) from the ospf6_route to
determine that.
Also fixing some of the walks to turn them safe so that the route deletion is
fine.
Donald Sharp [Wed, 20 May 2015 01:03:52 +0000 (18:03 -0700)]
Process and/or announce existing routes when a prefix-list, distribute-
list or filter-list is applied (added or removed) against a neighbor or
peer group. This makes the behavior inline with other configuration changes
such as add or remove of route-map against a neighbor or change of other
settings such as next-hop-self or as-override.
Donald Sharp [Wed, 20 May 2015 01:03:51 +0000 (18:03 -0700)]
LA (local-address) bit related inter-op fix.
As per the RFC, when the NU bit is set, prefix should be ignored.
However, the code is currently ignoring prefix with LA bit too.
Fixing that part.
In future, we should also set LA bit for the loopback addresses. Not doing this
part right away, as quagga wont be backward compatible with its own previous
releases. Maybe after a release or so, we should start setting LA bit too.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:51 +0000 (18:03 -0700)]
Ensure that routes from a peer are not considered for best path
comparison if the peer is not in an Established state. There can
be a window between a peer being deleted and the background
thread that actually clears the routes (marks them as "removed")
runs during which best path may run. If this path selection
compared two prefixes all the way down to peer IP addresses and
one of these two peers had just been deleted, that peer would
not have its sockunion structures, especially su_remote, resulting
in a BGPD exception.
Donald Sharp [Wed, 20 May 2015 01:03:50 +0000 (18:03 -0700)]
Make OSPF compliant to the last sentence of this section in RFC 2328
9.5 Sending Hello packets
Hello packets are sent out each functioning router interface.
They are used to discover and maintain neighbor
relationships.[6] On broadcast and NBMA networks, Hello Packets
are also used to elect the Designated Router and Backup
Designated Router.
The format of an Hello packet is detailed in Section A.3.2. The
Hello Packet contains the router's Router Priority (used in
choosing the Designated Router), and the interval between Hello
Packets sent out the interface (HelloInterval). The Hello
Packet also indicates how often a neighbor must be heard from to
remain active (RouterDeadInterval). Both HelloInterval and
RouterDeadInterval must be the same for all routers attached to
a common network. The Hello packet also contains the IP address
mask of the attached network (Network Mask). On unnumbered
point-to-point networks and on virtual links this field should
be set to 0.0.0.0.
Donald Sharp [Wed, 20 May 2015 01:03:50 +0000 (18:03 -0700)]
When internal operations are performed (e.g., best-path selection, next-hop
change processing etc.) that refer to the BGP instance, the correct BGP
instance must be referenced and not the default BGP instance. The default
BGP instance is the first instance on the instance list. In a scenario
where one BGP instance is deleted (through operator action such as a
"no router bgp" command) and another instance exists or is created, there
may still be events in-flight that need to be processed against the
deleted instance. Trying to process these against the default instance
is erroneous. The calls to bgp_get_default() must be limited to the user
interface (vtysh) context.
Donald Sharp [Wed, 20 May 2015 01:03:49 +0000 (18:03 -0700)]
bgpd: Disable connected check for next hop on eBGP peers
In the data center, in conjunction with next hop propagation for features
such as announcing VIP routes to load balancers and such, it is desired to
disable the connected route check even on ebgp peers with TTL of 1. This
patch is used to disable the check for all peers instead of the peer by
peer check that is currently supported. Furthermore, the existing
disable-connected-check is different from how Cisco implements this feature.
So, we add this new flag to avoid reliance on the existing flag.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:49 +0000 (18:03 -0700)]
BGP: Use the new value of dynamic capability in Open
The value for dynamic capability used in BGP open during capability
negotiation is a deprecated value. Thus, interop with other systems
is broken. This patch fixes that by advertising both the old and new
values. This ensures interop with older versions of quagga and other
non-quagga systems.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:49 +0000 (18:03 -0700)]
bgpd: Add route-map support for set ip next-hop unchanged
In the data center, where load balancers are announced as VIPs, and eBGP
is used as the routing protocol, this feature is required to ensure that
VIP announcements can be made from anywhere the operator sees fit.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com> Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:47 +0000 (18:03 -0700)]
This patch adds support for allowing BGP to create and bring up neighbor
sessions dynamically. The operator configures a range of neighbor addresses
to which peering is allowed. The ranges are configured as subnets and
multiple ranges are allowed. Each range is associated with a peer-group
so that additional parameters can be configured.
BGP neighbor sessions are dynamically created when connections are initiated
by remote neighbors whose addresses fall within a configured range. The
sessions are deleted when the BGP connection terminates.
A limit on the number of neighbors allowed from each range of addresses
can be specified.
IPv4 and IPv6 peering is supported. Over the peering, any of the address
families configured for the peer-group can be negotiated.
Donald Sharp [Wed, 20 May 2015 01:03:47 +0000 (18:03 -0700)]
BGP: Add dynamic update group support
This patch implements the 'update-groups' functionality in BGP. This is a
function that can significantly improve BGP performance for Update generation
and resultant network convergence. BGP Updates are formed for "groups" of
peers and then replicated and sent out to each peer rather than being formed
for each peer. Thus major BGP operations related to outbound policy
application, adj-out maintenance and actual Update packet formation
are optimized.
BGP update-groups dynamically groups peers together based on configuration
as well as run-time criteria. Thus, it is more flexible than update-formation
based on peer-groups, which relies on operator configuration.
[Note that peer-group based update formation has been introduced into BGP by
Cumulus but is currently intended only for specific releases.]
Donald Sharp [Wed, 20 May 2015 01:03:45 +0000 (18:03 -0700)]
Per AFI redist registrations
The problem is that zclient->redist[ZEBRA_ROUTE_MAX] used for storing a
client’s redist state, has no address-family qualification. This means
a client can only store its interest in a protocol (connected, static etc.),
but cant choose IPv4 or ipv6 with that. This hindered implementation on
client sides to manage redistribution of ipv4 and ipv6 both.
BGP's redistribution of protocols like connected/static is one such place.
One fix could be to overload this and flap the redist connection each time
any new afi is added for redist, but that may have side-effects on the
existing afi redist.
The cleaner way is to modify redist data-structure to also take AFI, and adjust
routines that deal with it, so that a client can register for a protocol
redistribution based on the AFI. BGP already maintains redistribution state
based on afi and protocol (bgp->redist[AFI_MAX][ZEBRA_ROUTE_MAX]). This patch
takes care of filling up the gap in zclient/zserv redistribution state to
also use AFI qualification.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:44 +0000 (18:03 -0700)]
During best path selection, if one of the candidates is a stale entry, do not
perform the neighbor address comparison as that information is invalid for
the stale entry. Attempting to perform the comparison results in a bgpd
exception.
Donald Sharp [Wed, 20 May 2015 01:03:43 +0000 (18:03 -0700)]
ISSUE:
LSAcks (for directed acks) are being sent to neighbor's unicast address.
RFC 2328 says:
"The IP destination address for the packet is selected as
follows. On physical point-to-point networks, the IP
destination is always set to the address AllSPFRouters"
Fix is to unconditionally set the destination address for LSAcks over
point-to-point links as AllSPFRouters. Quagga OSPF already has similar
change for OSPF DBD, LSUpdate and LSrequest packets.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:42 +0000 (18:03 -0700)]
zebra-redistribute-table.patch
Zebra: Redistribute routes from non-main kernel table to main.
This can be the basis for many interesting features such as variations
of redistribute ARP, using zebra as the RIB in the presence of multiple
routing protocol stacks etc. The code only supports IPv4 for now, but
the infrastructure is in place for IPv6.
Usage:
There is a new route type introduced by this model: TABLE. Routes
imported from alternate kernel tables will have their protocol type set to
TABLE.
Routes from alternate kernel tables MUST be first imported into the main
table via "ip import-table <table id>". They can then be redistributed via
a routing protocol via the "redistribute table" command. Each imported table
can an optional administrative distance specified. In Zebra, a route with a
lower distance is chosen over routes with a higher distance. So, distance
is how the user can choose to prioritize routes from a particular table over
routes from other tables or routes learnt another way in zebra.
Route maps for imported tables are specified via "ip protocol" command in
zebra. Route maps for redistributed routes within a routing protocol are
subject to the route map options supported by the protocol. The
"match source-protocol" option in route maps can match against "table"
to filter routes learnt from alternate kernel routing tables.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
- etc/init.d/quagga is modified to support creating separate ospf daemon
process for each instance. Each individual instance is monitored by
watchquagga just like any protocol daemons.(requires initd-mi.patch).
- Vtysh is modified to able to connect to multiple daemons of the same
protocol (supported for OSPF only for now).
- ospfd is modified to remember the Instance-ID that its invoked with. For
the entire life of the process it caters to any command request that
matches that instance-ID (unless its a non instance specific command).
Routes/messages to zebra are tagged with instance-ID.
- zebra route/redistribute mechanisms are modified to work with
[protocol type + instance-id]
- bgpd now has ability to have multiple instance specific redistribution
for a protocol (OSPF only supported/tested for now).
- zlog ability to display instance-id besides the protocol/daemon name.
- Changes in other daemons are to because of the needed integration with
some of the modified APIs/routines. (Didn’t prefer replicating too many
separate instance specific APIs.)
- config/show/debug commands are modified to take instance-id argument
as appropriate.
Guidelines to start using multi-instance ospf
---------------------------------------------
The patch is backward compatible, i.e for any previous way of single ospf
deamon(router ospf <cr>) will continue to work as is, including all the
show commands etc.
To enable multiple instances, do the following:
1. service quagga stop
2. Modify /etc/quagga/daemons to add instance-ids of each desired
instance in the following format:
ospfd=“yes"
ospfd_instances="1,2,3"
assuming you want to enable 3 instances with those instance ids.
3. Create corresponding ospfd config files as ospfd-1.conf, ospfd-2.conf
and ospfd-3.conf.
4. service quagga start/restart
5. Verify that the deamons are started as expected. You should see
ospfd started with -n <instance-id> option.
ps –ef | grep quagga
With that /var/run/quagga/ should have ospfd-<instance-id>.pid and
ospfd-<instance-id>/vty to each instance.
6. vtysh to work with instances as you would with any other deamons.
7. Overall most quagga semantics are the same working with the instance
deamon, like it is for any other daemon.
NOTE:
To safeguard against errors leading to too many processes getting invoked,
a hard limit on number of instance-ids is in place, currently its 5.
Allowed instance-id range is <1-65535>
Once daemons are up, show running from vtysh should show the instance-id
of each daemon as 'router ospf <instance-id>’ (without needing explicit
configuration)
Instance-id can not be changed via vtysh, other router ospf configuration
is allowed as before.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com> Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:41 +0000 (18:03 -0700)]
initd: initd-mi.patch
Support Multi-Instance protocol daemons initd
OSPFd is the first of the multi-instance daemons. This patch allows the
starting, stopping, restarting and monitoring of multiple instances of
the same protocol daemon.
Multiple instances are specified in the daemons file using a new variable:
ospfd_instances="1,2"
Absence of this variable means ospfd will start in legacy, single instance
mode. The original "ospfd=yes" line is still required.
Daemons are started with the "-n <instance>" option. Each daemon is named
"<daemon>-<instance>", for example "ospfd-1", "ospfd-2" etc. Similarly,
pid files are ospfd-1.pid and vty files are named ospfd-1.vty.
We're also introducing a new file, /etc/default/quagga to store the
default value for the maximum instances associated with a daemon.
watchquagga and others are unmodified and everything else just works once
this code is in place.
The code has been enhanced to support restarting watchquagga with only the
updated daemons when an individual daemon is stopped or started. For example,
without this patch, stopping just bgpd would terminate watchquagga even if
ospfd and zebra are still running. Similarly, starting just bgpd when ospfd
and zebra are running wouldn't update watchquagga to include bgpd. Furthermore,
when the daemons file is modified and a daemon is no longer deemed necessary
and quagga restarted, the daemon is not killed. For example, switching
ospfd=yes to ospfd=no and restarting the quagga will leave ospfd daemon
running. This case is also fixed with this patch.
However, adding a new instance to the ospfd_instances file and starting
just that instance will start just that instance and add it to watchquagga.
Similarly, a single instance maybe stopped or restarted.
Caveat emptor: With multi-instance daemons, stopping a single instance and then
starting a different instance will cause all instances to be monitored by
watchquagga i.e. all instances will be restarted, if necessary.
Signed-off-by: Dinesh G Dutt <ddutt at cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 01:03:41 +0000 (18:03 -0700)]
ospf6d: ospfv3-show-cmds-instance-check-fix.patch
SYMPTOM:
If some of the ospfv3 commands like 'show ipv6 ospf6 route' are executed
with ospf6d daemon running but before having any ospfv3 configuration, then
ospf6d crash is seen.
ISSUE:
There are a few show commands, which are (unlike others) not checking if
ospf6 instance is initialized already.
FIX:
Add the missing checks, by using OSPF6_CMD_CHECK_RUNNING() in the commands
where its needed and not yet used.
Donald Sharp [Wed, 20 May 2015 01:03:40 +0000 (18:03 -0700)]
ospf6d: ospfv3-setsocket-retry.patch
SYMPTOM:
With quagga running on Linux, 'ifdown <if-name>' followed by 'ifup <ifname>
can cause OSPFv3 to not receive Hello packets on the interface.
ISSUE:
Operating System's interface IPv6 readiness may not be guaranteed at the
time of interface-up event. Thats because the ipv6 components in an OS may
also be listening to the same interface-up event that (in this case) is
relayed to OSPFv3.
In this failure case, setsockopt with option IPV6_JOIN_GROUP on the interface
returned EINVAL.
Error logs -
OSPF6: Zebra Interface state change: swp1 index 3 flags 11043 metric 1 mtu 1500
OSPF6: Interface Event swp1: [InterfaceUp]
OSPF6: Network: setsockopt (20) on ifindex 3 failed: Invalid argument
FIX:
To take care of this possible race condition, any address-family related
setting should be retried. Given it's a rare condition and window of this
race should be short, the patch adds a limited retry mechanism for the
IPV6 membership setting on the socket.
Donald Sharp [Wed, 20 May 2015 00:58:12 +0000 (17:58 -0700)]
Overhual BGP debugs
Summary of changes
- added an option to enable keepalive debugs for a specific peer
- added an option to enable inbound and/or outbound updates debugs for a specific peer
- added an option to enable update debugs for a specific prefix
- added an option to enable zebra debugs for a specific prefix
- combined "deb bgp", "deb bgp events" and "deb bgp fsm" into "deb bgp neighbor-events". "deb bgp neighbor-events" can be enabled for a specific peer.
- merged "deb bgp filters" into "deb bgp update"
- moved the per-peer logging to one central log file. We now have the ability to filter all verbose debugs on a per-peer and per-prefix basis so we no longer need to keep log files per-peer. This simplifies troubleshooting by keeping all BGP logs in one location. The use
r can then grep for the peer IP they are interested in if they wish to see the logs for a specific peer.
- Changed "show debugging" in isis to "show debugging isis" to be consistent with all other protocols. This was very confusing for the user because they would type "show debug" and expect to see a list of debugs enabled across all protocols.
- Removed "undebug" from the parser for BGP. Again this was to be consisten with all other protocols.
- Removed the "all" keyword from the BGP debug parser. The user can now do "no debug bgp" to disable all BGP debugs, before you had to type "no deb all bgp" which was confusing.
The new parse tree for BGP debugging is:
deb bgp as4
deb bgp as4 segment
deb bgp keepalives [A.B.C.D|WORD|X:X::X:X]
deb bgp neighbor-events [A.B.C.D|WORD|X:X::X:X]
deb bgp nht
deb bgp updates [in|out] [A.B.C.D|WORD|X:X::X:X]
deb bgp updates prefix [A.B.C.D/M|X:X::X:X/M]
deb bgp zebra
deb bgp zebra prefix [A.B.C.D/M|X:X::X:X/M]
Donald Sharp [Wed, 20 May 2015 00:58:12 +0000 (17:58 -0700)]
Changes to improve BGP convergence time:
- Schedule write thread for advertisements and withdraws only if corresponding
FIFOs are growing and/or upon work_queue getting fully processed.
- Set non-default yield time for the main work_queue, as the default value
of 10ms results in yielding after processing very few nodes.
- Remove unnecessary scheduling of write thread when update packet is formed.
- If MRAI is 0, don't start a timer unnecessarily, directly schedule write
thread.
- Some debugs.
Donald Sharp [Wed, 20 May 2015 00:58:10 +0000 (17:58 -0700)]
Some small enhancements to thread and workqueue libraries in zebra:
- Allow work queues to specify the yield duration for corresponding background thread
- Support using specified yield duration in thread yielding
- During work queue processing, if using a single list element with a meta-queue
(like done in Zebra), do not exit after each element is processed, instead
update the next-node upon a WQ_REQUEUE so that the WQ processing continues
and is terminated by the yield logic.
- Enhance work queue debug output
Donald Sharp [Wed, 20 May 2015 00:47:25 +0000 (17:47 -0700)]
bgpd-delete-route-on-invalid-nh.patch
BGPd: Delete the route from the kernel when a valid NH changes to invalid NH
A route has been announced by a BGP peer with a valid NH and has been
populated into the kernel. Now, if the NH announced changes (say via routemap)
to an invalid NH, the route is marked as inactive/inaccessible inside Quagga,
but is not deleted from the kernel. This patch fixes that issue.
The problem is caused by BGP losing the old valid NH and using the new, invalid
NH to delete the now-inaccessible route. However, the kernel/zebra has the
route using the old NH and so they reject the delete. Fix involves not sending
the invalid NH when its the only NH. Things worked fine if the route had BGP
multipath.
Donald Sharp [Wed, 20 May 2015 00:47:24 +0000 (17:47 -0700)]
zebra-set-src-routemap.patch
Honor setting source via route map and pushing that to the kernel.
With recursive routes, the ability to set the source IP address of a route
via a routemap has been broken. This patch fixes that.
To allow route map to set a source and then to unapply the route map and
have the source be taken out, I've introduced a new field in the nexthop
data structure called rmap_src. This field is zero'd before invoking the
route map apply function.
Today, no protocol daemon specifies the src in its route update to zebra.
If that happens, I didn't want to stomp on it and so have left the src
field intact instead of reusing that for the routemap to play with.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:47:22 +0000 (17:47 -0700)]
zebra: zebra-client-info-detail.patch
Zebra: Gather and display detailed info about clients of Zebra
The display of zebra client info is rather paltry: just the name and the FD.
For troubleshooting and general helpfulness, its useful to gather more info
about each client and display that. This patch does just that.
Donald Sharp [Wed, 20 May 2015 00:47:20 +0000 (17:47 -0700)]
zebra: zebra-nht-routemap.patch
Zebra: Add route-map support for Next Hop Tracking
It is sometimes useful to restrict the resolution of recursive routes
to only specific via's. For example, in some configurations resolving
a route through a default route is not acceptable.
This patch adds a new route-map attach point, to zebra's next-hop-tracking
server. Whenever NHT is considering sending notification of a route
resolution, it applies a specified route-map and only if it passes, is the
NHT reachable message sent to the appropriate client protocol (BGP, OSPF etc.).
If the route-map filters the resolution, then a withdraw is sent to the
client protocol.
The route-map is sent the ip address of the route via which the resolution is
happening as well as the valid NHs associated with that route.
We also add support for matching on IP addr prefix len and source protocol
to ensure that resolution happens only via a very specific route.
Donald Sharp [Wed, 20 May 2015 00:46:33 +0000 (17:46 -0700)]
Add support for route tags
Credit
------
A huge amount of credit for this patch goes to Piotr Chytla for
their 'route tags support' patch that was submitted to quagga-dev
in June 2007.
Documentation
-------------
All ipv4 and ipv6 static route commands now have a "tag" option
which allows the user to set a tag between 1 and 65535.
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag ?
<1-65535> Tag value
quagga(config)# ip route 1.1.1.1/32 10.1.1.1 tag 40
quagga(config)#
quagga# show ip route 1.1.1.1/32
Routing entry for 1.1.1.1/32
Known via "static", distance 1, metric 0, tag 40, best
* 10.1.1.1, via swp1
quagga#
The route-map parser supports matching on tags and setting tags
!
route-map MATCH_TAG_18 permit 10
match tag 18
!
!
route-map SET_TAG_22 permit 10
set tag 22
!
BGP and OSPF support:
- matching on tags when redistribing routes from the RIB into BGP/OSPF.
- setting tags when redistribing routes from the RIB into BGP/OSPF.
BGP also supports setting a tag via a table-map, when installing BGP
routes into the RIB.
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:47 +0000 (17:40 -0700)]
bgpd: bgpd-route-map-match-interface.patch
BGP: Add match interface support to BGP route-map.
Currently, BGP route maps don't support interface match. This is a problem
for commands such as redistribite connected that cannot exclude routes from
specific interfaces (such as mgmt interfaces).
Donald Sharp [Wed, 20 May 2015 00:40:45 +0000 (17:40 -0700)]
Summary: Test effect of route-map on received/advertised routes
This patch adds the ability to see the effect of applying a route-map on
the routes received or advertised from or to a neighbor. This effect can
be seen without actually affecting the current state. If the result seen
is what is desired, then the user can actually apply the route-map.
Currently, the application acts on route-map in or out and on unsuppress
maps.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:45 +0000 (17:40 -0700)]
bgpd: bgpd-event-driven-route-map-updates.patch
BGP: Reprocess the trigger points when an attached route map changes
Currently, modifications to route maps do not affect already processed
routes; they only affect new route updates. This patch addresses this
limitation.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:44 +0000 (17:40 -0700)]
ptm-integration.patch
Integrates Prescriptive Topology Module(ptm) into quagga.
If this module is enabled, link ups are notified only after the link is verified
as being connected to the neighbor specified. The neighbor specification and
checking is done by the ptm daemon.
Donald Sharp [Wed, 20 May 2015 00:40:43 +0000 (17:40 -0700)]
iquagga-faster-compile.patch
Avoid cleaning up the source tree and running reconf every time. Allows
recompilation of only those files that have been modified since last
run. Relies on the existence of config.status file to decide if we've
run the first time or subsequent times.
'administrative' takes effect from the time of the config until the config is
removed.
'on-startup' is effective only at the startup time for the given '<period>'
after the first peer is established.
'<max-med-value>' is used as the MED value to be sent out when the max-med
is effective. Default max-med value is 4294967294.
NOTE:
When max-med is active, MED is changed only in the outgoing attributes to the
peers, it doesn't modify any MED specific state of the attributes in BGP on
the local node.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:42 +0000 (17:40 -0700)]
bgpd-scale-update-delay-packing.patch
ISSUE:
During startup, BGP update prefix packing wasnt optimal and route installation
was found to be spread over.
SOLUTION:
With this patch, update-delay post processing is serialized to achieve:
a. better peer update packing
(which helps in reducing total number of BGP update packets)
b. installation of the resulting routes in zebra as close to each others
as possible.
(which can help zebra batch its processing and updates to Kernel better)
Donald Sharp [Wed, 20 May 2015 00:40:41 +0000 (17:40 -0700)]
bgpd: bgpd-ibgp-policy-out-allow-mods.patch
BGPd: Allow route-map policy modifications to also affect route reflectors.
By default, attribute modification via route-map policy out is ignored on
reflected routes. This patch provides an option to allow this modification
to occur. Once enabled, it affects all reflected routes.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:37 +0000 (17:40 -0700)]
bgpd: bgpd-mrai.patch
BGP: Event-driven route announcement taking into account min route advertisement interval
ISSUE
BGP starts the routeadv timer (peer->t_routeadv) to expire in 1 sec
when a peer is established. From then on, the timer expires
periodically based on the configured MRAI value (default: 30sec for
EBGP, 5sec for IBGP). At the expiry, the write thread is triggered
that takes the routes from peer's sync FIFO (adj-rib-out) and sends
UPDATEs. This has a few drawbacks:
(1) Delay in new route announcement: Even when the last UPDATE message
was sent a while back, the next route change will necessarily have
to wait for routeadv expiry
(2) CPU usage: The timer is always armed. If the operator chooses to
configure a lower value of MRAI (zero second is a preferred choice
in many deployments) for better convergence, it leads to high CPU
usage for BGP process, even at the times of no network churn.
PATCH
Make the route advertisement event-driven - When routes are added to
peer's sync FIFO, check if the routeadv timer needs to be adjusted (or
started). Conversely, do not arm the routeadv timer unconditionally.
The patch also addresses route announcements during read-only mode
(update-delay). During read-only mode operation, the routeadv timer
is not started. When BGP comes out of read-only mode and all the
routes are processed, the timer is started for all peers with zero
expiry, so that the UPDATEs can be sent all at once. This leads to
(near-)optimal UPDATE packing.
Finally, the patch makes the "max # packets to write to peer socket at
a time" configurable. Currently it is hard-coded to 10. The command is
at the top router-bgp mode and is called "write-quanta <number>". It
is a useful convergence parameter to tweak.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com> Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Donald Sharp [Wed, 20 May 2015 00:40:36 +0000 (17:40 -0700)]
bgpd: bgpd-peer-outq.patch
BGP: Show more meaningful outq value in 'show ip bgp summary' output.
'outq' field in 'show ip bgp sum' displays the number of formatted packets
to a peer. Since the route announcement follows an input-buffered pattern
(i.e. adj-rib-out is a separate queue of routes per peer and packets are
formatted from the routes at the time of TCP write), the outq field doesn't
show any interesting data worth watching.
The patch is to display the adj-rib-out queue depth instead.
Donald Sharp [Wed, 20 May 2015 00:40:36 +0000 (17:40 -0700)]
bgpd: bgpd-fix-ipv6-afi-parser-node.patch
BGPd: Make ipv6 unicast/multicast address-family work
In the absence of this patch, attempting to type "address-family ipv6 unicast"
would result in an "Ambiguous command" error and in the case of
"address-family ipv6 multicast", the command would silently fail, without the
prompt dropping into the address-family mode.
The cause is how the parse tree is constructed for ipv6 address family. There
was an error in extract.pl.in script and in vtysh.c files which assumed that
there was only address family ipv6 command, without unicast or multicast and
so the command was failing.
Donald Sharp [Wed, 20 May 2015 00:40:35 +0000 (17:40 -0700)]
zebra: zebra-use-fixed-metric-cost.patch
Zebra: Use a fixed route metric when populating kernel
The route metric is not used by the Linux kernel and is irrelevant to
the forwarding decision made by the kernel. Metric is a parameter used
only by a routing protocol to compute best path(s) and to communicate this
info to its peers. Consequently, there is no value in pushing the metric
provided by a protocol daemon to the kernel.
There is a significant advantage, at least on the Linux kernel, in pushing
a constant metric with a route populated by zebra. The metric is used as a
priority field in the kernel and modifying the metric due to say topology
changes causes multiple routes to be inserted into the kernel, with differing
priorities instead of replacing the existing one. This prevents us from
using replace semantic when a route changes.
So, this patch pushes a constant metric with a route populated by zebra.
Donald Sharp [Wed, 20 May 2015 00:40:34 +0000 (17:40 -0700)]
bgpd: bgpd-table-map.patch
COMMAND:
table-map <route-map-name>
DESCRIPTION:
This feature is used to apply a route-map on route updates from BGP to Zebra.
All the applicable match operations are allowed, such as match on prefix,
next-hop, communities, etc. Set operations for this attach-point are limited
to metric and next-hop only. Any operation of this feature does not affect
BGPs internal RIB.
Supported for ipv4 and ipv6 address families. It works on multi-paths as well,
however, metric setting is based on the best-path only.
IMPLEMENTATION NOTES:
The route-map application at this point is not supposed to modify any of BGP
route's attributes (anything in bgp_info for that matter). To achieve that,
creating a copy of the bgp_attr was inevitable. Implementation tries to keep
the memory footprint low, code comments do point out the rationale behind a
few choices made.
bgp_zebra_announce() was already a big routine, adding this feature would
extend it further. Patch has created a few smaller routines/macros whereever
possible to keep the size of the routine in check without compromising on the
readability of the code/flow inside this routine.
For updating a partially filtered route (with its nexthops), BGP to Zebra
replacement semantic of the next-hops serves the purpose well. However, with
this patch there could be some redundant withdraws each time BGP announces a
route thats (all the nexthops) gets denied by the route-map application.
Handling of this case could be optimized by keeping state with the prefix and
the nexthops in BGP. The patch doesn't optimizing that case, as even with the
redundant withdraws the total number of updates to zebra are still be capped
by the total number of routes in the table.
Donald Sharp [Wed, 20 May 2015 00:40:33 +0000 (17:40 -0700)]
bgpd: bgpd-update-delay.patch
COMMAND:
'update-delay <max-delay in seconds> [<establish-wait in seconds>]'
DESCRIPTION:
This feature is used to enable read-only mode on BGP process restart or when
BGP process is cleared using 'clear ip bgp *'. When applicable, read-only mode
would begin as soon as the first peer reaches Established state and a timer
for <max-delay> seconds is started.
During this mode BGP doesn't run any best-path or generate any updates to its
peers. This mode continues until:
1. All the configured peers, except the shutdown peers, have sent explicit EOR
(End-Of-RIB) or an implicit-EOR. The first keep-alive after BGP has reached
Established is considered an implicit-EOR.
If the <establish-wait> optional value is given, then BGP will wait for
peers to reach establish from the begining of the update-delay till the
establish-wait period is over, i.e. the minimum set of established peers for
which EOR is expected would be peers established during the establish-wait
window, not necessarily all the configured neighbors.
2. max-delay period is over.
On hitting any of the above two conditions, BGP resumes the decision process
and generates updates to its peers.
Default <max-delay> is 0, i.e. the feature is off by default.
This feature can be useful in reducing CPU/network used as BGP restarts/clears.
Particularly useful in the topologies where BGP learns a prefix from many peers.
Intermediate bestpaths are possible for the same prefix as peers get established
and start receiving updates at different times. This feature should offer a
value-add if the network has a high number of such prefixes.
IMPLEMENTATION OBJECTIVES:
Given this is an optional feature, minimized the code-churn. Used existing
constructs wherever possible (existing queue-plug/unplug were used to achieve
delay and resume of best-paths/update-generation). As a result, no new
data-structure(s) had to be defined and allocated. When the feature is disabled,
the new node is not exercised for the most part.
Donald Sharp [Wed, 20 May 2015 00:40:32 +0000 (17:40 -0700)]
bgpd: bgpd-restart-bit-fix.patch
ISSUE:
Quagga BGP doesn't send or use the restart-bit via the Graceful-Restart(GR)
capability. GR capability implementation isn't complete as per the RFC.
PATCH:
Patch uses BGP instance creation as the beginning of the startup period,
and 'restart_time' is taken as the startup period. As a result, BGP will
set the restart bit in the GR capability of the OPEN messages during the
startup period.
As an indication of quagga implementation's capability of sending End-Of-RIB,
helping a restarting neighbor, quagga BGP will now send global GR capability
irrespective of the graceful-restart config in BGP and the address-family
specific GR capability will be sent only if the GR config is present.
Forwarding bit is not set assuming its not preserved.