Use dh_installinit capabilities to install frr.tmpfile
The debian/frr.conf was manually installed as systemd-tmpfiles configuration,
but the dh_installinit now has capability to install it automatically if named
debian/frr.tmpfile.
Donald Sharp [Thu, 24 Sep 2020 11:52:20 +0000 (07:52 -0400)]
lib: Tell the compiler we don't care about the return code
When calling yang_snodes_iterate_subtree we don't care about
the return code. So explicitly say we don't care so that
SA tools can be on the same page as us.
Donald Sharp [Thu, 24 Sep 2020 11:42:51 +0000 (07:42 -0400)]
zebra: Don't ignore setsockopt return
When attempting to limit the amount of data sent from the kernel
to FRR, some kernels we can run against may not have this ability
in which case the setsockopt will fail. Notice that in the log.
This problem was reported by the sanitizer -
=================================================================
==24764==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000115c8 at pc 0x55cb9cfad312 bp 0x7fffa0552140 sp 0x7fffa0552138
READ of size 8 at 0x60d0000115c8 thread T0
#0 0x55cb9cfad311 in zebra_evpn_remote_es_flush zebra/zebra_evpn_mh.c:2041
#1 0x55cb9cfad311 in zebra_evpn_es_cleanup zebra/zebra_evpn_mh.c:2234
#2 0x55cb9cf6ae78 in zebra_vrf_disable zebra/zebra_vrf.c:205
#3 0x7fc8d478f114 in vrf_delete lib/vrf.c:229
#4 0x7fc8d478f99a in vrf_terminate lib/vrf.c:541
#5 0x55cb9ceba0af in sigint zebra/main.c:176
#6 0x55cb9ceba0af in sigint zebra/main.c:130
#7 0x7fc8d4765d20 in quagga_sigevent_process lib/sigevent.c:103
#8 0x7fc8d4787e8c in thread_fetch lib/thread.c:1396
#9 0x7fc8d4708782 in frr_run lib/libfrr.c:1092
#10 0x55cb9ce931d8 in main zebra/main.c:488
#11 0x7fc8d43ee09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
#12 0x55cb9ce94c09 in _start (/usr/lib/frr/zebra+0x8ac09)
=================================================================
Donald Sharp [Wed, 23 Sep 2020 17:04:20 +0000 (13:04 -0400)]
zebra: Ensure that message received from mlag will fit
If we receive a message that is greater than our buffer
size we are in a situation where both the read and write
buffers are fubar'ed beyond the end. Assert when we notice
this fact.
Ticket: CM-31576 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Wed, 23 Sep 2020 16:26:13 +0000 (12:26 -0400)]
zebra: modify mlag code to only need 1 stream when generating data
The normal pattern of writing the type/length at the beginning
of the packet was not being quite followed. Modify the mlag
code to respect the proper way of doing things and get rid
of a stream_new and copy.
zebra: stop neigh hold timer when the neigh is deleted
The neigh hold timer was firing after the neigh was deleted resulting
in the following crash -
[
at ./zebra/zebra_evpn_neigh.h:155
at zebra/zebra_evpn_neigh.c:447
at lib/thread.c:1578
at zebra/main.c:488
]
zebra: changes for configuring mac and neigh holdtime
When an ES peer withdraws a MAC-IP route we hold the entry for N seconds
to allow an external daemon (neighmgr) to establish host reachability
independent of the peer. Add config commands to allow the user to set
this holdtime (N).
Donald Sharp [Wed, 23 Sep 2020 00:47:33 +0000 (20:47 -0400)]
zebra: Move debug information gathering to inside guard
Let's not make the entire `depend_finds` function pay
for the data gathering needed for the debug. There
are numerous other places in the code that check
the NEXTHOP_FLAG_RECURSIVE and do the same output.
Donald Sharp [Fri, 18 Sep 2020 19:47:27 +0000 (15:47 -0400)]
lib, zebra: Add ability to read kernel notice of TRAP/OFFLOAD
The linux kernel is getting RTM_F_TRAP and RTM_F_OFFLOAD for
kernel routes that have an underlying asic offload. Write the
code to receive these notifications from the linux kernel and
to store that data for display about the routes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 18 Sep 2020 19:41:19 +0000 (15:41 -0400)]
zebra: Add basic knowledge of asic offload available
Some linux kernels are starting to support the idea of knowledge
about the underlying asic. Add a boolean that we can set/unset
to track whether or not we think the router has this functionality
available.
Donald Sharp [Wed, 9 Sep 2020 03:59:18 +0000 (23:59 -0400)]
*: Remove solaris from FRR
The Solaris code has gone through a deprecation cycle. No-one
has said anything to us and worse of all we don't have any test
systems running Solaris to know if we are making changes that
are breaking on Solaris. Remove it from the system so
we can clean up a bit.
Igor Ryzhov [Mon, 21 Sep 2020 12:35:56 +0000 (15:35 +0300)]
lib: fix regcomp error processing
* use actual error code instead of "false"
* add missing new line
Before:
```
nfware# show interface | include (a]
% Regex compilation error: Success% Bad regexp '(a]'
% Unknown command: show interface | include (a]
```
After:
```
nfware# show interface | include (a]
% Regex compilation error: Unmatched ( or \(
% Bad regexp '(a]'
% Unknown command: show interface | include (a]
```
zebra: simplify and optimize vrf display in show ip route
In all outputs (text and json): simplify and optimize the vrf name
display, use the vrf_id_to_name() handler.
Note: vrf_id_to_name() has a safeguard system that prevents from
crashing when the vrf cannot be found because it changed in some
(unexpected) manner, it returns "n/a".
Note: "vrf n/a" will now be displayed instead of "vrf UNKNOWN" in this
case, like in most other frr components.
This safeguard was missing for show ip route json, so this
optimization also fixes a potential crash.
Variable "show ip route" commands invoke the same helper
(do_show_ip_route), potentially several times.
When asking to dump a non-default vrf, all vrfs or all tables, the
output is messy, the header summarizing abbreviations is repeated
several times, excess line feeds appear, the default table of default
VRF is concatenated to the previous table output...
Normalize the output:
- whatever the case, display the common header at most once, if there
is at least an entry to dump.
- when using a "vrf all" or "table all" command, prepend a line with
the VRF and table (even for the default vrf or table).
- when dumping a specific vrf or table, prepend a line with the VRF
and table.
Example (vrf all)
=================
router# show ip route vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:24:09
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:24:09
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:00:26
VRF private:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:00:29
C>* 10.125.0.0/24 is directly connected, loop0, 00:00:42
Example (main vrf)
==================
router# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:24:41
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:24:41
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:00:58
Example (specific vrf)
======================
router# show ip route vrf private
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF private:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:01:23
C>* 10.125.0.0/24 is directly connected, loop0, 00:01:36
Example (all tables)
====================
router# show ip route table all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:01:51
VRF main table 254:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:25:34
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:25:34
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:01:51
Example (all vrf, all table)
============================
router# show ip route table all vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:02:15
VRF main table 254:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:25:58
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:25:58
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:02:15
VRF private table 254:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:02:18
C>* 10.125.0.0/24 is directly connected, loop0, 00:02:31
Example (specific table)
========================
router# show ip route table 200
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:05:26
would end up leaving the 4.4.4.4/32 address on the interface under
freebsd.
This all boils down to the fact that the interface is not
considered connected yet we have a destination. If the
destination is the same and we are not connected ignore
it on freebsd.
I am sure there are other fun scenarios that someone
will have to squirrel out.
bgpd: Implement BGP-wide configuration for graceful shutdown
Add support for a BGP-wide setting to enter and exit graceful shutdown.
This will apply to all BGP peers across all BGP instances. Per-instance
configuration is disallowed if the BGP-wide setting is in effect.
1. Ospf dead-interval will be set as 4 times of hello-interval, incase
if it is not set by using "ip ospf dead-interval <dead-val>".
2. On resetting hello-interval using "no ip ospf hello-interval" the
dead interval and hello due will be changed accordingly.
Donald Sharp [Fri, 18 Sep 2020 11:14:55 +0000 (07:14 -0400)]
lib: Remove debug associated with vrf_get
The vrf_get function is called throughout the code base
so much so that when you turn on vrf debugging it eclipses
everything else to a degree that is completely unreasonable.
Quentin Young [Thu, 17 Sep 2020 19:46:55 +0000 (15:46 -0400)]
tools: fix vtysh failure error handling
Based on the current code, I think the intent was to gracefully handle
vtysh failures and print a useful error message. Barriers in the way of
that:
- Despite reading the results of subprocess.communicate(), there won't
be anything there, because we aren't passing subprocess.PIPE as stdin
and stderr when calling subprocess.Popen()
- Despite catching subprocess.TimeoutExpired, if we were to actually hit
this case frr-reload.py would just crash because it's calling
.communicate() on an unbound process variable, probably a copy-paste
error
- Aside from that, building a kwargs dict to pass to a function that
contains something if something else is not None and nothing if it is,
is pointless when we could just pass the thing itself
Net result is that if vtysh fails to read an frr.conf due to syntax
errors, instead of crashing with a traceback, we actually handle the
error condition, log the problem and vtysh's output, and exit. Actually
we were printing the failed line just by chance because stderr wasn't
captured from the subprocess and I guess showed up as part of systemd's
error capturing or something, but the traceback did a good job of
obscuring that with useless noise.
Old:
frrinit.sh[32183]: * Started watchfrr
frrinit.sh[32183]: line 20: % Unknown command: eee
frrinit.sh[32183]: Traceback (most recent call last):
frrinit.sh[32183]: File "/usr/lib/frr/frr-reload.py", line 1316, in <module>
frrinit.sh[32183]: newconf.load_from_file(args.filename)
frrinit.sh[32183]: File "/usr/lib/frr/frr-reload.py", line 231, in load_from_file
frrinit.sh[32183]: file_output = self.vtysh.mark_file(filename)
frrinit.sh[32183]: File "/usr/lib/frr/frr-reload.py", line 146, in mark_file
frrinit.sh[32183]: % (child.returncode, stderr))
frrinit.sh[32183]: __main__.VtyshException: vtysh (mark file) exited with status 2:
frrinit.sh[32183]: None
New:
frrinit.sh[30090]: * Started watchfrr
frrinit.sh[30090]: vtysh failed to process new configuration: vtysh (mark file) exited with status 2:
frrinit.sh[30090]: line 20: % Unknown command: eee
Quentin Young [Thu, 17 Sep 2020 16:38:12 +0000 (12:38 -0400)]
bgpd: rename bgp_fsm_event_update
This function is poorly named; it's really used to allow the FSM to
decide the next valid state based on whether a peer has valid /
reachable nexthops as determined by NHT or BFD.
In a topology like R1 -- R2 -- R5, with R2 being NSSA ABR and R5 being
ASBR redistributing external routes, the ABR R2 will translate type-7
LSA into type-5 and advertise to the backbone. In the current implementation
R2 is also advertising a type-4 LSA when there is no need.
RFC 3101: "...NSSA's border routers never originate Type-4 summary-LSAs
for the NSSA's AS boundary routers, since Type-7 AS-external-LSAs are
never flooded beyond the NSSA's border..."