Philippe Guibert [Thu, 11 Oct 2018 16:37:01 +0000 (18:37 +0200)]
bgpd: allow vrf validity and bgp vrf import/export, when zebra is off
if zebra is not started, then vrf identifiers are not available. This
prevents import/exportation to be available. This commit permits having
import/export available, even when zebra is not started.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Mitch Skiba [Wed, 9 May 2018 23:10:02 +0000 (23:10 +0000)]
bgpd: Re-use TX Addpath IDs where possible
The motivation for this patch is to address a concerning behavior of
tx-addpath-bestpath-per-AS. Prior to this patch, all paths' TX ID was
pre-determined as the path was received from a peer. However, this meant
that any time the path selected as best from an AS changed, bgpd had no
choice but to withdraw the previous best path, and advertise the new
best-path under a new TX ID. This could cause significant network
disruption, especially for the subset of prefixes coming from only one
AS that were also communicated over a bestpath-per-AS session.
The patch's general approach is best illustrated by
txaddpath_update_ids. After a bestpath run (required for best-per-AS to
know what will and will not be sent as addpaths) ID numbers will be
stripped from paths that no longer need to be sent, and held in a pool.
Then, paths that will be sent as addpaths and do not already have ID
numbers will allocate new ID numbers, pulling first from that pool.
Finally, anything left in the pool will be returned to the allocator.
In order for this to work, ID numbers had to be split by strategy. The
tx-addpath-All strategy would keep every ID number "in use" constantly,
preventing IDs from being transferred to different paths. Rather than
create two variables for ID, this patch create a more generic array that
will easily enable more addpath strategies to be implemented. The
previously described ID manipulations will happen per addpath strategy,
and will only be run for strategies that are enabled on at least one
peer.
Finally, the ID numbers are allocated from an allocator that tracks per
AFI/SAFI/Addpath Strategy which IDs are in use. Though it would be very
improbable, there was the possibility with the free-running counter
approach for rollover to cause two paths on the same prefix to get
assigned the same TX ID. As remote as the possibility is, we prefer to
not leave it to chance.
This ID re-use method is not perfect. In some cases you could still get
withdraw-then-add behaviors where not strictly necessary. In the case of
bestpath-per-AS this requires one AS to advertise a prefix for the first
time, then a second AS withdraws that prefix, all within the space of an
already pending MRAI timer. In those situations a withdraw-then-add is
more forgivable, and fixing it would probably require a much more
significant effort, as IDs would need to be moved to ADVs instead of
paths.
Mitch Skiba [Thu, 17 May 2018 19:06:08 +0000 (19:06 +0000)]
lib: Implement an allocator for 32 bit ID numbers
This commit introduces lib/id_alloc, which has facilities for both an ID number
allocator, and less efficient ID holding pools. The pools are meant to be a
temporary holding area for ID numbers meant to be re-used, and are implemented
as a linked-list stack.
The allocator itself is much more efficient with memory. Based on sizeof
values on my 64 bit desktop, the allocator requires around 155 KiB per
million IDs tracked.
IDs are ultimately tracked in a bit-map split into many "pages." The
allocator tracks a list of pages that have free bits, and which sections
of each page have free IDs, so there isn't any scanning required to find
a free ID. (The library utility ffs, or "Find First Set," is generally a
single CPU instruction.) At the moment, totally empty pages will not be
freed, so the memory utilization of this allocator will remain at the
high water mark.
The initial intended use case is for BGP's TX Addpath IDs to be pulled
from an allocator that tracks which IDs are in use, rather than a free
running counter. The allocator reserves ID #0 as a sentinel value for
an invalid ID numbers, and BGP will want ID #1 reserved as well. To
support this, the allocator allows for IDs to be explicitly reserved,
though be aware this is only practical to use with low numbered IDs
because the allocator must allocate pages in order.
Rafael Zalamena [Sat, 3 Nov 2018 22:08:33 +0000 (19:08 -0300)]
bfdd: fix BGP unnumbered peer setup
The session key uses the scope id to figure out which interface we are
using with that link-local address, so if we don't set it when
registering a session we'll end up with multiple IPv6 sessions.
Olivier Dugeon [Fri, 26 Oct 2018 17:12:41 +0000 (19:12 +0200)]
OSPF: Add support to multi-area to Router Info.
Router Information needs to specify the area ID when flooding scope is set to
AREA. However, this authorize only one AREA. Thus, Area Border Router (ABR) are
unable to flood Router Information Opaque LSA in all areas they are belongs to.
The path implies that the area ID is no more necessary for the command
'router-info area'. It remains suported for compatibility, but mark as
deprecated. Documentation has been updated accordingly.
Donald Sharp [Tue, 6 Nov 2018 20:55:36 +0000 (15:55 -0500)]
bgpd: Late registration of Extended Nexthop should allow RA's to happen
When we have a late registration of the Extended Nexthop capability
for BGP and the peer already has nexthop information stored, go
through and enable RA on the important interfaces.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
watchfrr: Echo statements are blocking the execution of frr script
1) Certain echo statements present in the script before/after SSD process
restart are causing the FRR script to hang. This is breaking the frr script
functionality for start/stop/restart. Removed such echo statements.
Tests:
1. Multiple start, stop, restart
2. Multiple restarts/kill of same process.
Signed-off-by: Sri Mohana Singamsetty <msingamsetty@vmware.com>
Renato Westphal [Sat, 3 Nov 2018 17:27:33 +0000 (15:27 -0200)]
doc: update libyang build instructions
The --with-yangmodelsdir and --with-libyang-pluginsdir build-time options
pertain to FRR so they shouldn't be placed along with the libyang build
instructions. Move these instructions to where they belong to avoid
confusion.
Donald Sharp [Tue, 30 Oct 2018 19:12:07 +0000 (15:12 -0400)]
pimd: Send 1 on all systems for MRT_INIT
When sending a sockoption for MRT_INIT, *bsd requires that
the data passed in must be 1. While linux does not, the
code was sending in a positive value that was causing issues
on *bsd of protocol not supported.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 5 Oct 2018 15:31:29 +0000 (11:31 -0400)]
bgpd: Allow registration of nexthops after zebra connection
If we attempt to register nexthops before we have the zebra
connection, they will not be installed. After we have noticed
that we are up, re-install them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).
Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
will NOT be accepted if the remote seq was not reset in the previous step.
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit - f32ea5c07 - zebra: act on kernel notifications for remote neighbors
The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh
The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.
If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP. If its own route is not the best one, it
will withdraw the route.
]
To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.
Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.
If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.
At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.
To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.