David Lamparter [Wed, 15 Aug 2018 20:59:31 +0000 (22:59 +0200)]
build: non-recursive doc + parallel sphinx
Sphinx actually does work with a parallel build, if the doctree creation
is a separate step (which the other builds will then just read
unmodified.) This can be done with the "dummy" target.
This also adds "-j6" to sphinx-build and adds a "--disable-doc-html"
switch on ./configure to turn on/off building HTML docs separately.
Also, HTML docs are now installed by "make install" to
/usr/share/doc/frr/html.
Signed-off-by: David Lamparter <equinox@diac24.net>
Martin Winter [Mon, 8 Oct 2018 12:32:57 +0000 (05:32 -0700)]
FRRouting Release 6.0
Major Changes since 5.0:
- Staticd: New daemon responsible for management of static routes
- ISISd: Implement dst-src routing as per draft-ietf-isis-ipv6-dst-src-routing
- BFDd: new daemon for BFD (Bidrectional Forwarding Detection). Responsiblei
for notifying link changes to make routing protocols converge faster.
- various bug fixes
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
Donald Sharp [Wed, 3 Oct 2018 16:27:57 +0000 (12:27 -0400)]
lib: Include compiler.h as early as is possible in the build
The compiler.h header provides us with some useful macro's
that we are using in the system. We do not know exactly
where the CPP_NOTICE and CPP_WARN macros are used but
they can move around. Place this header early in the
build then.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Mon, 17 Sep 2018 13:18:40 +0000 (09:18 -0400)]
zebra: Send correct default vrf tableid for MROUTE stats
So the linux kernel uses the RT_TABLE_MAIN for the table
id used for ip routing. The multicast routing tables use
RT_TABLE_DEFAULT. We changed the internal code of zebra_vrf
a few months back to use RT_TABLE_MAIN as the tableid to
use. This caused the pim sg stats to stop working because
of the kernel bug where it uses a different table
for ip routing and ip multicast.
Put a bit of a special case in to do the right thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Mon, 17 Sep 2018 17:58:59 +0000 (13:58 -0400)]
pimd: Actually create vif's in non-integrated config
The startup of a non-integrated config was not properly
allowing for startup to create the vif when we have
not learned about the interface we are trying to configure
at this point in time. Actually notice when we are
trying to create a pimreg device or not to properly
notice when to attempt to create the vif or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Mon, 10 Sep 2018 14:19:03 +0000 (10:19 -0400)]
bgpd: Honor origin change in bgp aggregates
When the origin changed we must honor and update the aggregate
to the peer. This code adds a bit of code to the bgp_aggregate_info_same
code to see if the origin has changed and to indicate that it has.
Fixes: #2993 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 29 Aug 2018 02:45:06 +0000 (22:45 -0400)]
staticd: Fix mixup in vrf translations
When we store the nexthop for ref-counting, keep
track of the nexthop vrf_id as well. This will allow
us to track the nexthop per vrf!
Additionally when we get the callback from zebra about
a nexthop update, iterate over all static routes to
see if the nexthop we are getting a callback is
one we are concerned about.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 28 Aug 2018 12:50:16 +0000 (08:50 -0400)]
pimd: Add some more useful data to debug output
End user was seeing this debug but we are not giving
the user enough information to debug this on his own.
Add a tiny bit of extra information that could point
the user to solving the problem for themselves.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Christian Franke [Sat, 25 Aug 2018 15:50:03 +0000 (17:50 +0200)]
watchfrr: fix global restart
watchfrr needs to handle a SIGCHLD also when it calls a global restart
command. Before this patch, it would lead to the following behavior:
15:44:28: zebra state -> down : unexpected read error: Connection reset by peer
15:44:33: Forked background command [pid 6392]: /usr/sbin/frr.init watchrestart all
15:44:53: Warning: restart all child process 6392 still running after 20 seconds, sending signal 15
15:44:53: waitpid returned status for an unknown child process 6392
15:44:53: background (unknown) process 6392 terminated due to signal 15
15:45:13: Warning: restart all child process 6392 still running after 40 seconds, sending signal 9
15:45:33: Warning: restart all child process 6392 still running after 60 seconds, sending signal 9
15:45:53: Warning: restart all child process 6392 still running after 80 seconds, sending signal 9
15:46:13: Warning: restart all child process 6392 still running after 100 seconds, sending signal 9
15:46:33: Warning: restart all child process 6392 still running after 120 seconds, sending signal 9
15:46:53: Warning: restart all child process 6392 still running after 140 seconds, sending signal 9
This is obviously incorrect and can be fixed by comparing the pid to
the global restart object as well.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Donald Sharp [Sat, 25 Aug 2018 00:42:45 +0000 (20:42 -0400)]
staticd: refcount the nht add/removal
When we add / remove a nexthop that we need to track,
keep track of the number of times we have done this
for each nexthop. Consequently keep track of the
number of available nexthops, so that we can
just install new routes when we get one
that uses a pre-existing nexthop. Deletion of
nexthops is done on refcount going to 0.
Removal of routes is handled elsewhere for removal.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Thu, 23 Aug 2018 20:05:02 +0000 (16:05 -0400)]
zebra: When registering a nexthop, we do not always need to re-eval
The code prior to this change, was allowing clients to register
for nexthop tracking. Then zebra would look up the rnh and
send to that particular client any known data. Additionally
zebra was blindly re-evaluating the rnh for every registration.
This leads to interesting behavior in that all people registered
for that nexthop will get callbacks even if nothing changes.
Modify the code to know if we have evaluated the rnh or not
and if so limit the re-evaluation to when absolutely necessary
This is of particular importance to do because of nht callbacks
for protocols cause those protocols to do not insignificant
work and as more protocols are registering for nht callbacks
we will cause more work than is necessary.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
David Lamparter [Sat, 25 Aug 2018 00:12:42 +0000 (02:12 +0200)]
doc/user: add protocols vs. platform table
A nicely-formatted colorful table of all our daemons and target OS'.
Based off & intended to replace / extend
https://github.com/FRRouting/frr/wiki/Features-and-Kernel-Support
Signed-off-by: David Lamparter <equinox@diac24.net>
Chirag Shah [Fri, 24 Aug 2018 22:15:36 +0000 (15:15 -0700)]
ospfd: interface speed change during intf add
The problem is seen where speed mismatch caused ECMP route
not being reflected with correct number paths (NHs).
During cold boot, some interface speed updated by zebra as
part of one shot timer and triggers interface add to clients.
In this case, ospf already have created interface (bond interface),
but speed was not updated, trigger to do interface speed change
as part of interface add, which will trigger all Router LSA to
use updated speed into cost calculation.
Ticket:CM-22170
Testing Done:
Bring up CLOS config with Spine and leafs. Leaf have CLAG pair,
with same VRR ip address.
At spine one of the bond connecting to leaf node was having
higher speed than the paired device, With this fix, at spine (DUT)
bond interface speed is equal from all peer nodes.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Donald Sharp [Fri, 24 Aug 2018 19:21:04 +0000 (15:21 -0400)]
bgpd: Fix CONFDATE to 2019 for a couple of items.
While perusing CONFDATE I noticed that we had a couple
CONFDATE 201805, which we were not picking up( for other
reasons and fixed in a different PR ). But upon investigation
of these I noticed that the commits where in 201805, so these
CONFDATES should be in 2019
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 24 Aug 2018 14:56:15 +0000 (10:56 -0400)]
doc, lib, zebra: Remove deprecated encode and decode functionality
The ZEBRA_IPV4_ROUTE_[ADD|DELETE] and ZEBRA_IPV6_ROUTE_[ADD|DELETE] functionality
has been deprecated for a year now, let's remove this code from the system.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 24 Aug 2018 14:49:20 +0000 (10:49 -0400)]
zebra: Remove unmaintained and uncompilable code
The zebra/client_main.c code is not being maintained or used.
Remove from system. Especially since the encode/decode
zapi functionality it `purports` to be testing is deprecated
and now being removed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Quentin Young [Wed, 22 Aug 2018 20:05:04 +0000 (20:05 +0000)]
bgpd: fix rpki exit command
If a command returns a nonzero exit status and VTYSH has a corresponding
command, VTYSH will skip executing its own version. If this happens in a
command that changes CLI nodes we get node desynchronization.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Donald Sharp [Thu, 23 Aug 2018 00:59:46 +0000 (20:59 -0400)]
lib: Limit depth of unused thread list
The master->unused list was unbounded during normal operation.
A full BGP feed on my machine left 11k threads on the unused
list, taking up over 2mb of data. This seemed a bit excessive,
reduce to a limit of 10.
Also fix a crash that this exposed where we assumed that a thread
structure was not deleted.
Future committers can make this configurable? or modify
the value to something better for their system. I am
dubious of the value of this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Thu, 16 Aug 2018 17:59:27 +0000 (13:59 -0400)]
lib: Remove default case statement from a enum driven switch
We are using a enum to drive a switch statement and we have
a default case statement that can never be entered because
we know all the enum states have been covered. Remove it
from the code as that it cannot happen.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Thu, 16 Aug 2018 17:51:13 +0000 (13:51 -0400)]
lib: Remove smux option for snmp
The smux.c code has not been able to compile for 2+ years
and no-one has noticed. Additionally net-snmp has marked
smux integration as deprecated for quite some time as well.
Since no-one has noticed and it's been broken and smux integration
is deprecated let's just remove this from the code base.
From looking at the code, it sure looks like SNMP could use
a decent cleanup.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Thu, 16 Aug 2018 16:21:25 +0000 (12:21 -0400)]
bgpd: convert zlog_warns to debugs or errors
Several zlog_warns were being used to tell the end
user that bgp had detected a bug. These all look like information
added during development that can be noted as debugs or logged
as an error situation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Thu, 23 Aug 2018 01:05:29 +0000 (21:05 -0400)]
lib: Seperate out Poll data from thread memory statistics
We were storing Poll data for the read and write
memory information in MTYPE_THREAD, so a show run
would not be able to show actual amount of memory
associated with the `struct thread`.
Remove unnecessary NULL checks on malloc.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Chirag Shah [Fri, 17 Aug 2018 20:38:06 +0000 (13:38 -0700)]
zebra: mark router flag for remote neigh update
Handle Remote Neigh entry state change from Router to Host.
Remote MAC-IP update may not continue EVPN NA Extended community,
Zebra need to accomodate if router_flag change for existing neigh
and install with or without Router Flag (R-bit).
Testing:
Have locally run MAC/IP (neigh entry) with R-bit set,
Checke on remote VTEP 'show bgp evpn route ...mac ip' and
'show evpn arp-cache ...' contians router flag.
Change host to remove R-bit, which locally learnt entry removes
Router flag. This results in remote vtep to remove R-bit from
neigh entry.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Chirag Shah [Wed, 15 Aug 2018 20:32:03 +0000 (13:32 -0700)]
zebra: check router_flag change for neigh update
Neigh update can have router_flag change, from unset to set and
viceversa. This is the case where MAC, IP and VLAN are same but
entry's flag moved from R to not R bit and reverse case.
Router flag change needs to trigger bgpd to inform all evpn peers
to remove from the evpn route.
Testing Done:
Send GARP with and without R bit from host and validate neigh entry
and evpn neigh and mac-ip route entry in zebra and bgpd.
Check Peer VTEP evpn route entry where router flag is (un)set.
With R-bit
Route [2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
VNI 1001
Imported from
27.0.0.16:5:[2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
4435 5551
27.0.0.16 from MSP1(uplink-1) (27.0.0.9)
Origin IGP, valid, external, bestpath-from-AS 4435, best
Extended Community: RT:5551:1001 ET:8 ND:Router
Flag
AddPath ID: RX 0, TX 1261
Last update: Wed Aug 15 20:52:14 2018
Without R-bit
Route [2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
VNI 1001
Imported from
27.0.0.16:5:[2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
4435 5551
27.0.0.16 from MSP2(uplink-2) (27.0.0.10)
Origin IGP, valid, external, bestpath-from-AS 4435, best
Extended Community: RT:5551:1001 ET:8
AddPath ID: RX 0, TX 1263
Last update: Wed Aug 15 20:53:10 2018
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Chirag Shah [Tue, 14 Aug 2018 00:07:22 +0000 (17:07 -0700)]
zebra: mark router flag for neigh update
The neigh update can come prior to mac add update.
In this case, the mac will be auto created for the vni.
set router flag to local neigh update for mac with auto flag.
The neigh update will be informed to bgpd once local mac is learnt.
Unset router flag if the neigh update comes without the router flag
for an existing neigh entry.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Chirag Shah [Wed, 1 Aug 2018 01:45:39 +0000 (18:45 -0700)]
bgpd: check existing l3vni for any l2vni creation
Scan all bgp vrf instances and respective L3VNI against the VNI which is being configured.
Ticket:CM-21859
Testing Done:
Configure l3vni,
try to configure same vni as l2vni under router bgp, address-family
l2vpn evpn.
The configuration is rejected.
show evpn vni
VNI Type VxLAN IF # MACs # ARPs # Remote VTEPs Tenant VRF
4001 L3 vx-4001 0 0 n/a vrf1
root [Wed, 22 Aug 2018 23:08:26 +0000 (16:08 -0700)]
bgpd: Fix memory leak show ip bgp json
Root Cause: In the function bgp_show_table(), we are creating a
json object and a json array with the same name as “json_paths”.
First it will create a json object variable "json_paths" pointing
to the memory allocated for the json object. Then it will create
a json array for each bap node rn (if rn->info is available) with
the same name as json_paths. Because of this, json_paths which was
pointing to the memory allocated for the json object earlier, now
will be overwritten with the memory allocated for the json array.
As per the existing code, at the end of each iteration loop of bgp
node, it will deallocate the memory used by the json array and
assigned NULL to the variable json_paths. Since we don’t have the
pointer pointing to the memory allocated for json object, will be
not able to de-allocate the memory, which is a memory leak here.
Fix: Removing this json object since it is never getting used in
this function.
Testing: Reproduced the memory leak with valgrind.
With the fix, memory leak gets resolved and checked with valgrind.
David Lamparter [Wed, 22 Aug 2018 15:39:12 +0000 (17:39 +0200)]
doc/developer: logging guide
This roughly outlines when to use which logging function. It's
certainly something to have to point people to, so they get things nice
and right - and so we get at least somewhat consistent behaviour for the
user.
Signed-off-by: David Lamparter <equinox@diac24.net>
In some situations rtrlib does not release the locks for its internal
data structures before calling a callback. This can lead to deadlocks
when a lot of routes must be revalidated because the sync socket buffer
will fill up and block the rtrlib thread. The bgpd main thread then
waits for rtrlibs internal locks to be released indefinitely.
This is fixed by using nonblocking sockets instead of blocking ones and
setting a flag to revalidate everything, if it would block.
kssoman [Wed, 22 Aug 2018 12:00:15 +0000 (05:00 -0700)]
bgpd : Change of options in redistribute command does not get applied
* Added parameter in bgp_redistribute_set() to indicate change
in redistribute option
* If there is change, call bgp_redistribute_unreg() to withdraw routes