Donald Sharp [Thu, 27 Aug 2020 19:42:16 +0000 (15:42 -0400)]
zebra: When we get a rib deletion event be smarter
When we get a rib deletion event and we already have
that particular route node in the queue to be reprocessed,
just note that someone from kernel land has done us dirty
and allow it to be cleaned up by normal processing
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Thu, 27 Aug 2020 19:00:55 +0000 (15:00 -0400)]
zebra: When shutting down an interface immediately notify about rnh
Imagine a situation where a interface is bouncing up/down.
The interface comes up and daemons like pbr will get a nht
tracking callback for a connected interface up and will install
the routes down to zebra. At this same time the interface can
go down. But since zebra is busy handling route changes ( from pbr )
it has not read the netlink message and can get into a situation
where the route resolves properly and then we attempt to install
it into the kernel( which is rejected ). If the interface
bounces back up fast at this point, the down then up netlink
message will be read and create two route entries off the connected
route node. Zebra will then enqueue both route entries for future processing.
After this processing happens the down/up is collapsed into an up
and nexthop tracking sees no changes and does not inform any upper
level protocol( in this case pbr ) that nexthop tracking has changed.
So pbr still believes the nexthops are good but the routes are not
installed since pbr has taken no action.
Fix this by immediately running rnh when we signal a connected
route entry is scheduled for removal. This should cause
upper level protocols to get a rnh notification for the small
amount of time that the connected route was bouncing around like
a madman.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Jakub Urbańczyk [Thu, 27 Aug 2020 19:41:37 +0000 (21:41 +0200)]
zebra: fix netlink batching
It was wrongly assumed that the kernel is replying in batches when multiple
requests fail. The kernel sends one error message at a time, so we can
simply keep reading data from the socket as long as possible.
Donald Sharp [Thu, 27 Aug 2020 01:38:47 +0000 (21:38 -0400)]
zebra: When we fail, actually note the failure
During testing it was noticed that routes were considered
installed by zebra, but the kernel did not have the route.
Upon close debugging of the rib it was noticed that FRR
was turning a dplane_ctx_route_init into a success and
FRR was now in a bad state.
We were receiving a notification from the kernel that the route was deleted and deciding
that we needed to reinstall it. At that point in time when it got into the dplane
handlers to convert it to the dplane pthread, the dplane decided to drop the request
convert it too a success and not do anything.
This code change removes the conversion from this failure to success and
notifies the upper level about it. After this change the default route
to table 10012 is now properly marked as rejected:
root@mlx-2700-07:mgmt:/var/log/frr# vtysh -c "show ip route table 10012"
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
Renato Westphal [Mon, 24 Aug 2020 18:27:15 +0000 (15:27 -0300)]
isisd: add abiliy to compute the reverse shortest path tree
RFC 7490 says:
"The reverse SPF computes the cost from each remote node to root. This
is achieved by running the normal SPF algorithm but using the link
cost in the direction from the next hop back towards root in place of
the link cost in the direction away from root towards the next hop".
Support for reverse SPF will be necessary later as it's one of the
algorithms used to compute R-LFA/TI-LFA repair paths.
Renato Westphal [Mon, 24 Aug 2020 17:46:36 +0000 (14:46 -0300)]
tests, isisd: add IS-IS SPF unit tests
Now that the IS-IS SPF code is more modular, write some unit tests
for it.
This commit includes a new test program called "test_isis_spf" which
can load any test topology (there are 13 different ones available)
and run SPF on any desired node. In the future this same test program
and topologies will also be used to test reverse SPF and TI-LFA.
The "test_common.c" file contains helper functions used to parse the
topology descriptions from "test_topologies.c" into LSP databases
that can be used as an input to the SPF code.
This commit also introduces the F_ISIS_UNIT_TEST flag which is used
to prevent the IS-IS code from scheduling any event when running
under the context of an unit test.
Renato Westphal [Sun, 23 Aug 2020 03:22:32 +0000 (00:22 -0300)]
isisd: make the SPF code more modular
The goal of modularizing the SPF code is to make it possible for
isisd to run SPF in the behalf of other nodes in the network, which
is going to be necessary later when implementing the R-LFA/TI-LFA
solutions. On top of that, a modularized SPF opens the door for
much needed unit testing.
Summary of the changes:
* Change the isis_spf_preload_tent() function to use the local LSP
as an input (as per the ISO specification) instead of populating
the TENT based on the list of local interfaces;
* Introduce the "isis_spf_adj" structure to represent an SPF
adjacency. SPF adjacencies are inferred from the LSPDB, different
from normal adjacencies formed using IIH messages;
* Introduce the F_SPFTREE_NO_ROUTES flag to control whether the
SPT should create routes or not;
* Introduce the F_SPFTREE_NO_ADJACENCIES flag to specify whether
IS-IS adjacency information is available or not. When running SPF
in the behalf of other nodes, or under the context of an unit test,
no adjacency information will be present.
* On isis_area_create(), move some code around so that the area's isis
backpointer is set as early as possible.
Renato Westphal [Sun, 23 Aug 2020 02:24:06 +0000 (23:24 -0300)]
isisd: introduce command to display IS-IS routes
Introduce the "show isis route" command to display the routes
associated to an SPF tree. Different from the "show ip route" command,
"show isis route" displays the L1 and L2 routes separately (and not
the best routes only).
Renato Westphal [Fri, 21 Aug 2020 00:44:27 +0000 (21:44 -0300)]
isisd: minor cleanup
* Bring back some consts that were removed;
* Replace ALL_LIST_ELEMENTS by ALL_LIST_ELEMENTS_RO whenever
possible;
* Fix some CLI return values;
* Remove some unnecessary initializations.
Renato Westphal [Fri, 21 Aug 2020 00:27:56 +0000 (21:27 -0300)]
isisd: introduce two LSP iteration functions
Iterating over all IP or IS reachability information from a given
LSP isn't a trivial task. That information is scattered throughout
different TLV types, and which ones need to be used depend on
multiple variables (e.g. the SPF tree address family, MT-ID,
etc). This not to mention that an LSP might consist of multiple
fragments.
Introduce the following two LSP iteration function to facilitate
obtaining IP/IS reachability information from a given LSP:
* isis_lsp_iterate_ip_reach()
* isis_lsp_iterate_is_reach()
These functions will be used extensively by the upcoming TI-LFA
code.
Nathan Bahr [Mon, 24 Aug 2020 18:52:51 +0000 (13:52 -0500)]
pimd: fix IGMP querier election
Match by exact address rather than by prefix match to
determine if we generated the IGMPP query. Othwerwise
we will be ignoring IGMP queries coming from other
hosts on the same subnet.
Nathan Bahr [Wed, 19 Aug 2020 19:42:07 +0000 (14:42 -0500)]
pimd: fix IGMP source address on transmit
IGMP queries should contain the source address of the IGMP socket
they are being sent from.
Added binding the IGMP sockets to their specific source, otherwise
interfaces with multiple addresses will send multiple queries using
the same source, which is determined by the kernel.
Nathan Bahr [Wed, 19 Aug 2020 19:26:41 +0000 (14:26 -0500)]
pimd: fix IGMP receive handling
IGMP packets received from a source that does not match the subnet
of any configured addresses on the receive interface should be
ignored.
Also, find and use the correct IGMP socket object for the received
IGMP packet.
Mark Stapp [Tue, 25 Aug 2020 14:52:17 +0000 (10:52 -0400)]
tests: fix router stop logic
Change the public router stop method to always do a two-phase
shutdown - once without waiting and a second time with a wait.
Ordinary callers need to use this approach when stopping routers.
Move the detailed internal details to a private method that tests
should not call directly.
bgpd: reset session if ebgp-multihop is set and no session established
If you configure eBGP on loopbacks, you might miss setting the
ebgp-multihop option. Given that, the session will not be established
because of this. Now, the session is in Active state. When you update
your config afterwards and set the ebgp-multihop option to the
appropriate value, the session will still be in Active state. In fact,
it will be stuck in Active state and only services restart will help.
With this change, when set the ebgp-multihop option and no session was
established, reset the session.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
bgpd: withdraw default route when route-map has no match
If you advertise a default route (via default-originate) only if some
prefix is present in the BGP RIB (route-map specified) and this prefix
becomes unavailable, the default route keeps being advertised.
With this change, when we iterate over the BGP RIB to check if we can
advertise the default route, skip unavailable prefixes.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
David Schweizer [Mon, 24 Aug 2020 16:16:49 +0000 (18:16 +0200)]
bgpd: alias for bgp no shutdown cmd
* Reverted back to using an ALIAS definition for the negated bgp
shutdown command with a concatenated message string.
* Unified cli command descriptions for bgp shutdown commands.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
lib, tools: fix reloading of key sub-context in key chains
When you add a key chain in the RIP configuration file and reload the
configurations via the frr-reload.py script, the script will fail and
the key chain will not appear in the running configuration. The reason
is that frr-reload.py doesn't recognize key as a sub-context.
David Schweizer [Mon, 24 Aug 2020 06:12:16 +0000 (08:12 +0200)]
bgpd: additional no bgp shutdown cli command
* Added a "no bgp shutdown message MSG..." cli command for ease of use
with copy/paste. Because of current limitations with DEFPY/ALIAS and
the message string concatenation, a new command instead of an ALIAS
had to be implemented.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Philippe Guibert [Tue, 22 Oct 2019 07:21:28 +0000 (09:21 +0200)]
bgpd: fill in local ecommunity context with ecom unit length
because the same extended community can be used for storing ipv6 and
ipv4 et communities, the unit length must be stored. do not forget to
set the standard value in bgp evpn.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Mon, 21 Oct 2019 09:12:25 +0000 (11:12 +0200)]
bgpd: fallback proto icmp/v6 to appropriate l3 filter
if match protocol is icmp, then this protocol will be filtered with afi
= ipv4. however, if afi = ipv6, then the icmp protocol will fall back to
icmpv6.
note that this patch has also been done to simplify the policy routing,
as BGP will only handle TCP/UDP/ICMP(v4 or v6) protocols.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Mon, 21 Oct 2019 09:05:44 +0000 (11:05 +0200)]
bgpd: limit policy routing with flowlabel, fragment, and prefix offset
the following 3 options are not supported in current implementation of
policy routing. for that, inform the user that the flowspec entry is
invalid when attempting to use :
- prefix offset with src, or dst ipv6 address ( see [1])
- flowlabel value - limitation due to [0]
- fragment ( implementation not done today).
Philippe Guibert [Thu, 17 Oct 2019 14:11:57 +0000 (16:11 +0200)]
bgpd: support for flowspec interface list per address-family
in addition to ipv4 flowspec, ipv6 flowspec address family can configure
its own list of interfaces to monitor. this permits filtering the policy
routing only on some interfaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 17 Oct 2019 14:08:16 +0000 (16:08 +0200)]
bgpd: support for bgp ipv6 ext community, and flowspec redirect ipv6
rfc 5701 is supported. it is possible to configure in bgp vpn, a list of
route target with ipv6 external communities to import. it is to be noted
that this ipv6 external community has been developed only for matching a
bgp flowspec update with same ipv6 ext commmunity.
adding to this, draft-ietf-idr-flow-spec-v6-09 is implemented regarding
the redirect ipv6 option.
Practically, under bgp vpn, under ipv6 unicast, it is possible to
configure : [no] rt6 redirect import <IPV6>:<AS> values.
An incoming bgp update with fs ipv6 and that option matching a bgp vrf,
will be imported in that bgp vrf.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Wed, 16 Oct 2019 09:07:41 +0000 (11:07 +0200)]
bgp, zebra: add family attribute to ipset and iptable context
in order to create appropriate policy route, family attribute is stored
in ipset and iptable zapi contexts. This commit also adds the flow label
attribute in iptables, for further usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Wed, 16 Oct 2019 08:05:36 +0000 (10:05 +0200)]
bgpd: support for redirect ipv6 simpson method
this commit supports [0] where ipv6 address is encoded in nexthop
attribute of nlri, and not in bgp redirect ip extended community. the
community contains only duplicate information or not.
Adding to this, because an action or a rule needs to apply to either
ipv4 or ipv6 flow, modify some internal structures so as to be aware of
which flow needs to be filtered. This work is needed when an ipv6
flowspec rule without ip addresses is mentioned, we need to know which
afi is served. Also, this work will be useful when doing redirect VRF.
[0] draft-simpson-idr-flowspec-redirect-02.txt
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Wed, 16 Oct 2019 06:44:20 +0000 (08:44 +0200)]
bgpd, lib: support for flow_label flowspec type
in ipv6 flowspec, a new type is defined to be able to do filtering rules
based on 20 bits flow label field as depicted in [0]. The change include
the decoding by flowspec, and the addition of a new attribute in policy
routing rule, so that the data is ready to be sent to zebra.
The commit also includes a check on fragment option, since dont fragment
bit does not exist in ipv6, the value should always be set to 0,
otherwise the flowspec rule becomes invalid.
Philippe Guibert [Wed, 16 Oct 2019 06:42:30 +0000 (08:42 +0200)]
bgpd: ipv6 flowspec address decoding and validation
as per [0], ipv6 adress format introduces an ipv6 offset that needs to
be extracted too. The change include the validation, decoding for
further usage with policy-routing and decoding for dumping.
Philippe Guibert [Mon, 14 Oct 2019 16:02:22 +0000 (18:02 +0200)]
bgpd: flowspec code support for ipv6
until now, the assumption was done in bgp flowspec code that the
information contained was an ipv4 flowspec prefix. now that it is
possible to handle ipv4 or ipv6 flowspec prefixes, that information is
stored in prefix_flowspec attribute. Also, some unlocking is done in
order to process ipv4 and ipv6 flowspec entries.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Tue, 15 Oct 2019 13:01:39 +0000 (15:01 +0200)]
lib: add family attribute for flowspec prefix structure
to recognize whether a flowspec prefix has been carried out by
ipv4 flowspec or ipv6 flowspec ( actually, the hypothesis is that only
ipv4 flowspec is supported), then a new attribute should contain the
family value: AF_INET or AF_INET6. That value will be further used in
the BGP flowspec code.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Sarita Patra [Fri, 21 Aug 2020 06:33:09 +0000 (23:33 -0700)]
bgpd: Fix BGP session stuck in OpenConfirm state
Issue:
1. Initially BGP start listening to socket.
2. Start timer expires and BGP tries to connect to peer and moved
to Idle->connect (lets say peer datastructre X)
3. Connect for X succeeds and hence moved from idle ->connect with
FD-x.
4. A incoming connection is accepted and a new peer datastructure Y
is created with FD-y moves from idle->Active state.
5. Peer datastercture Y FD-y sends out OPEN and moves to
Active->Opensent state.
6. Peer datastrcture Y FD-y receives OPEN and moved from Opensent->
Openconfirm state.
7. Meanwhile on peer datastrcture X FD-x sends out a OPEN message
and moved from connect->Opensent.
8. For peer datastrcture Y FD-y keep alive is received and it is
moved from OpenConfirm->Established.
9. In this case peer datastructure Y FD-y is a accepted connection
so we try to copy all its parameter to peer datastructure X and
delete Y.
10. During this process TCP connection for the accepted connection
(FD-y) goes down and hence get remote address and port fails.
11. With this failure bgp_stop function for both peer datastrure X
and peer datastructure Y is called.
12. By this time all the parameters include state for datastrcture
for X and Y are exchanged. Peer Y FD-y when it entered this
function had state OpenConfirm still which has been moved to peer
datastrcture X.
13. In bgp_stop it will stop all the timers and take action only if
peer is in established state. Now that peer datastrcture X and Y
are not in established state (in this function) it will simply
close all timers and close the socket and assigns socket for both
the peer datastrcture to -1.
14. Peer datastrcture Y will be deleted as it is a datastrcture created
due to accept of connection where as peer datastrcture X will be held
as it is created with configuration.
15. Now peer datastrcture X now holds a state of OpenConfirm without any
timers running.
16. With this any new incoming connection will never be able to establish
as there is config connection X which is stuck in OpenConfirm.
Fix:
While transferring the peer datastructure Y FD-y (accepted connection)
to the peer datastructure X, if TCP connection for FD-y goes down, then
1. Call fsm event bgp_stop for X (do cleanup with bgp_stop and move the
state to Idle) and
2. Call fsm event bgp_stop for Y (do cleanup with bgp_stop and gets deleted
since it is an accept connection).
Sarita Patra [Fri, 21 Aug 2020 06:29:08 +0000 (23:29 -0700)]
bgpd: Don't stop hold timer in OpenConfirm State
Issue:
1. Initially BGP start listening to socket.
2. Start timer expires and BGP tries to connect to peer and moved
to Idle->connect (lets say peer datastructre X)
3. Peer datastrcture Y FD-X receives OPEN and moved from Opensent->
Openconfirm state and start the hold timer.
4. In the OpenConfirm state, the hold timer is stopped. So peer X
waits for Keepalive message from peer. If the Keepalive message
is not received, then it will be in OpenConfirm state for
indefinite time.
5. Due to this it neither close the existing connection nor it will
accept any connection from peer.
Fix:
In the OpenConfirm state, don't stop the hold timer.
1. Upon receipt of a neighbor’s Keepalive, the state is moved to
Established.
2. But If the hold timer expires, a stop event occurs, the state
is moved to Idle.
This is as per RFC.
Chirag Shah [Thu, 20 Aug 2020 19:09:53 +0000 (12:09 -0700)]
*: record transaction based on control flag
In case of config rollback is enabled,
record northbound transaction based on a control flag.
The actual frr daemons would set the flag to true via
nb_init from frr_init.
This will allow test daemon to bypass recording
transacation to db.
Mark Stapp [Thu, 20 Aug 2020 18:50:38 +0000 (14:50 -0400)]
lib: zapi nexthop sort fixes
The sorting for zapi nexthops in zapi routes needs to match
the sorting of nexthops done in zebra. Ensure all zapi_nexthop
attributes are included in the sort.
Renato Westphal [Wed, 19 Aug 2020 23:33:40 +0000 (20:33 -0300)]
lib: adapt plugin to use new Sysrepo version
Sysrepo recently underwent a complete rewrite, where some substantial
architectural changes were made (the most important one being the
extinction of the sysrepod daemon). While most of the existing API
was preserved, quite a few backward-incompatible changes [1] were
introduced (mostly simplifications). This commit adapts our sysrepo
northbound plugin to those API changes in order for it to be compatible
with the latest Sysrepo version.
Additional notes:
* The old Sysrepo version is EOL and not supported anymore.
* The new Sysrepo version requires libyang 1.x.