Igor Ryzhov [Wed, 14 Oct 2020 20:01:49 +0000 (23:01 +0300)]
isisd: fix check for area-tag modification
Interface area-tag is not supposed to be modified once defined, but the
necessary check is currently broken, because the circuit is never in
init_circ_list if the area-tag is already configured for the interface.
Donald Sharp [Tue, 13 Oct 2020 12:16:15 +0000 (08:16 -0400)]
ospfd: Prevent crash if transferring config amongst instances
If we enter:
int eth0
ip ospf area 0
ip ospf 10 area 0
!
This will crash ospf. Prevent this from happening.
OSPF instances:
a) Cannot be mixed with non-instance
b) Are their own process.
Since in a multi-instance world ospf instances are their own process,
when an ospf process receives an instance command we must remove
our config (if present) and allow the new config to be active
in the new process. The problem here is that if you have not
done a `router ospf` above, the lookup of the ospf pointer will
fail and we will just crash. Put some code in to prevent a crash
in this case.
Igor Ryzhov [Tue, 13 Oct 2020 11:03:42 +0000 (14:03 +0300)]
ospfd: fix "no ip ospf area"
This commit fixes the following behavior:
```
nfware(config)# interface enp2s0
nfware(config-if)# ip ospf area 0
nfware(config-if)# no ip ospf area 0
% [ospfd]: command ignored as it targets an instance that is not running
```
We should be able to use the command without configuring the instance.
Igor Ryzhov [Tue, 20 Oct 2020 19:43:31 +0000 (22:43 +0300)]
ospf6d: fix crash on message receive
OSPF6 daemon starts listening on its socket and reading messages right
after the initialization before the ospf6 router is created. If any
message is received, ospf6d crashes because ospf6_receive doesn't
NULL-check ospf6 pointer.
Fix this by opening the socket and reading messages only after the
creation of ospf6 router.
Mark Stapp [Fri, 16 Oct 2020 21:37:09 +0000 (17:37 -0400)]
zebra: support multiple connected subnets on an interface
[7.5 version]
We support configuration of multiple addresses in the same
subnet on a single interface: make sure that zebra supports
multiple instances of the corresponding connected route.
Donald Sharp [Fri, 16 Oct 2020 17:51:52 +0000 (13:51 -0400)]
zebra: Fix use after free in debug path
When zebra is running with debugs turned on there
is a use after free reported by the address sanitizer:
2020/10/16 12:58:02 ZEBRA: rib_delnode: (0:254):4.5.6.16/32: rn 0x60b000026f20, re 0x6080000131a0, removing
2020/10/16 12:58:02 ZEBRA: rib_meta_queue_add: (0:254):4.5.6.16/32: queued rn 0x60b000026f20 into sub-queue 3
=================================================================
==3101430==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000011d28 at pc 0x555555705ab6 bp 0x7fffffffdab0 sp 0x7fffffffdaa8
READ of size 8 at 0x608000011d28 thread T0
#0 0x555555705ab5 in re_list_const_first zebra/rib.h:222
#1 0x555555705b54 in re_list_first zebra/rib.h:222
#2 0x555555711a4f in process_subq_route zebra/zebra_rib.c:2248
#3 0x555555711d2e in process_subq zebra/zebra_rib.c:2286
#4 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320
#5 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291
#6 0x7ffff7450e9c in thread_call lib/thread.c:1581
#7 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
#8 0x55555561a578 in main zebra/main.c:455
#9 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
#10 0x5555555e3429 in _start (/usr/lib/frr/zebra+0x8f429)
0x608000011d28 is located 8 bytes inside of 88-byte region [0x608000011d20,0x608000011d78)
freed by thread T0 here:
#0 0x7ffff768bb6f in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.6+0xa9b6f)
#1 0x7ffff739ccad in qfree lib/memory.c:129
#2 0x555555709ee4 in rib_gc_dest zebra/zebra_rib.c:746
#3 0x55555570ca76 in rib_process zebra/zebra_rib.c:1240
#4 0x555555711a05 in process_subq_route zebra/zebra_rib.c:2245
#5 0x555555711d2e in process_subq zebra/zebra_rib.c:2286
#6 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320
#7 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291
#8 0x7ffff7450e9c in thread_call lib/thread.c:1581
#9 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
#10 0x55555561a578 in main zebra/main.c:455
#11 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
previously allocated by thread T0 here:
#0 0x7ffff768c037 in calloc (/lib/x86_64-linux-gnu/libasan.so.6+0xaa037)
#1 0x7ffff739cb98 in qcalloc lib/memory.c:110
#2 0x555555712ace in zebra_rib_create_dest zebra/zebra_rib.c:2515
#3 0x555555712c6c in rib_link zebra/zebra_rib.c:2576
#4 0x555555712faa in rib_addnode zebra/zebra_rib.c:2607
#5 0x555555715bf0 in rib_add_multipath_nhe zebra/zebra_rib.c:3012
#6 0x555555715f56 in rib_add_multipath zebra/zebra_rib.c:3049
#7 0x55555571788b in rib_add zebra/zebra_rib.c:3327
#8 0x5555555e584a in connected_up zebra/connected.c:254
#9 0x5555555e42ff in connected_announce zebra/connected.c:94
#10 0x5555555e4fd3 in connected_update zebra/connected.c:195
#11 0x5555555e61ad in connected_add_ipv4 zebra/connected.c:340
#12 0x5555555f26f5 in netlink_interface_addr zebra/if_netlink.c:1213
#13 0x55555560f756 in netlink_information_fetch zebra/kernel_netlink.c:350
#14 0x555555612e49 in netlink_parse_info zebra/kernel_netlink.c:941
#15 0x55555560f9f1 in kernel_read zebra/kernel_netlink.c:402
#16 0x7ffff7450e9c in thread_call lib/thread.c:1581
#17 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
#18 0x55555561a578 in main zebra/main.c:455
#19 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-use-after-free zebra/rib.h:222 in re_list_const_first
This is happening because we are using the dest pointer after a call into
rib_gc_dest. In process_subq_route, we call rib_process() and if the
dest is deleted dest pointer is now garbage. We must reload the
dest pointer in this case.
Trey Aspelund [Mon, 12 Oct 2020 19:39:11 +0000 (15:39 -0400)]
bgpd: fix show bgp neighbor routes for labeled-unicast
bgp_show_neighbor_route() was rewriting safi from LU to uni
before checking if the peer was enabled for LU. This resulted
in the peer's address-family check looking for unicast, which
would always fail for LU peers since unicast + LU are
mutually-exclusive SAFIs.
This moves this safi reassignment after the peer AFI check,
ensuring that the peer's address-family check looks for LU
while the call to bgp_show() still uses uni.
before:
spine01# show bgp ipv4 unicast neighbors 1.1.1.1 routes
% No such neighbor or address family
spine01# show bgp ipv4 labeled-unicast neighbors 1.1.1.1 routes
% No such neighbor or address family
after:
spine01# show bgp ipv4 unicast neighbors 1.1.1.1 routes
% No such neighbor or address family
spine01# show bgp ipv4 label neighbors 1.1.1.1 routes
BGP table version is 1, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 11.11.11.11/32   1.1.1.1                  0             0 1 i
Displayed 1 routes and 1 total paths
Donald Sharp [Mon, 12 Oct 2020 14:36:37 +0000 (10:36 -0400)]
bgpd: Correctly calculate threshold being reached
if (pcount > (pcount * peer->max_threshold[afi][safi] / 100 ))
is always true. So the very first route received will always
trigger the warning. We actually want the warning to happen
when we hit the threshold.
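The arithmetic can be sketched as follows, in Python rather than bgpd's C.
The names old_check, new_check, max_prefixes and threshold_pct are
hypothetical, and the assumption is that the warning should fire once the
received count exceeds the configured percentage of the maximum-prefix limit.

```python
def old_check(pcount: int, threshold_pct: int) -> bool:
    # Compares the count against a percentage of itself, so it is true
    # for any positive count whenever threshold_pct is below 100.
    return pcount > pcount * threshold_pct // 100


def new_check(pcount: int, max_prefixes: int, threshold_pct: int) -> bool:
    # Compares the count against a percentage of the configured limit
    # (max_prefixes is an assumed name for that limit).
    return pcount > max_prefixes * threshold_pct // 100
```

With a limit of 100 and a threshold of 75, old_check() already fires on the
first route, while new_check() stays quiet until the 76th.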
Igor Ryzhov [Fri, 9 Oct 2020 12:14:58 +0000 (15:14 +0300)]
rip(ng)d: fix interfaces cleaning
rip(ng)d_instance_disable unlinks the vrf from the instance which means
that rip(ng)_interfaces_clean never works, because rip(ng)->vrf is
always NULL there. This leads to the crash #6477.
Clean interfaces before disabling the instance to fix the issue.
vdhingra [Fri, 9 Oct 2020 16:23:14 +0000 (09:23 -0700)]
staticd: To set the default value of blackhole type correctly
When a nexthop is allocated, the default value of the blackhole type
was not getting set, which leads to the problem below. The default
value should be in sync with the default value in the yang model.
c t
ip route 131.1.1.0/24 Null0
do show running-config
...
!
ip route 131.1.1.0/24 blackhole
!
end
Igor Ryzhov [Thu, 8 Oct 2020 16:23:08 +0000 (19:23 +0300)]
isisd: fix incorrect vrf lookups
Lookup in C_STATE_NA must be made before the new circuit creation, or it
will be leaked if the isis instance is not found. All other lookups are
unnecessary - we just need to remember the previously used instance.
Martin Buck [Tue, 29 Sep 2020 21:07:40 +0000 (23:07 +0200)]
ospf6d: Fix flooding of old copies of self-originated LSAs
When receiving old copies (e.g. originated before the local ospf6d was
restarted) of supposedly self-originated LSAs which we previously tried to
flush from the network (by setting them to MaxAge), neither flood them nor
add them to our LSDB. Instead, keep the MaxAge version until we actually
(re-)originate them.
Possible fix for #7030. Testcase in #7168
(tests/topotests/ospf6-dr-no-netlsa-bug7030).
Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Donald Sharp [Thu, 1 Oct 2020 18:58:37 +0000 (14:58 -0400)]
zebra: Make connected routes their own entry on the meta_q
During quick ifdown / ifup events from the linux kernel there
exists a situation where a prefix that has both a kernel route
and a static route can be queued up on the meta-q. Suppose the static
route points at a connected route for nexthop resolution
and we receive a series of quick up/down events *after* the
static route and kernel route are queued up for rib reprocessing.
Since the static route and kernel route are queued on meta-q 1
and the connected route is also on meta-q 1, the connected route
can be resolved after the static route fails to resolve,
leaving the static route in an unresolved state.
Add a new queue level and put connected routes on their own level,
since they are the fundamental building blocks of pretty much
all the other routes.
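A minimal sketch of the idea, in Python rather than zebra's C: sub-queues
are drained in order, so giving connected routes their own earlier level
means they are processed before routes that resolve over them. The class
name, level names and level numbers here are hypothetical and are not
zebra's actual queue layout.

```python
from collections import deque

# Hypothetical sub-queue order: the lower index is drained first.
SUBQ_CONNECTED = 0
SUBQ_KERNEL_STATIC = 1


class MetaQueue:
    def __init__(self, levels: int = 2) -> None:
        self.subq = [deque() for _ in range(levels)]

    def add(self, level: int, prefix: str) -> None:
        # Queue a prefix at most once per level.
        if prefix not in self.subq[level]:
            self.subq[level].append(prefix)

    def process_one(self):
        # Always serve the lowest-numbered non-empty sub-queue first.
        for level, queue in enumerate(self.subq):
            if queue:
                return level, queue.popleft()
        return None
```

Even if the static route is queued first, the connected prefix comes out
of process_one() ahead of it.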
Donald Sharp [Wed, 30 Sep 2020 21:55:44 +0000 (17:55 -0400)]
zebra: When processing route_entries ignore unusable routes
When zebra is processing routes to determine what to send
to the rib, suppose we have two routes: (a) a route processed
earlier, none of whose nexthops were active, and (b)
a route that has good nexthops but a worse admin distance.
rib_process would not relook at (a)'s nexthops because
the ROUTE_ENTRY_CHANGED flag was not set, and (a) would
win when compared to (b) because its admin distance
was better, leaving us in a state where we would
attempt and fail to install route (a) because it
was not valid.
Modify the code to consider the number of nexthops
we have as a determiner if we can use the route.
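A rough illustration of the selection rule, in Python rather than zebra's
C. RouteEntry and select_best are hypothetical names, and the real
rib_process() weighs more than distance; the sketch only shows that an
entry with no usable nexthops is skipped before distances are compared.

```python
from dataclasses import dataclass


@dataclass
class RouteEntry:
    name: str
    distance: int
    active_nexthops: int


def select_best(entries):
    # Drop entries with no usable nexthop, then prefer the lowest distance.
    usable = [entry for entry in entries if entry.active_nexthops > 0]
    return min(usable, key=lambda entry: entry.distance, default=None)
```

Route (a) with a better distance but zero active nexthops no longer beats
route (b).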
Donald Sharp [Wed, 30 Sep 2020 21:26:02 +0000 (17:26 -0400)]
zebra: Prevent uninstall attempts when new entry is not happy
In rib_process_update_fib, the function is sent two route entries:
the old (previously installed) and the new (the one to install).
When the function detects that the new entry is unusable because
the number of nexthops that are usable for that route is 0,
we uninstall the old route. The problem here is that
we should not attempt to uninstall any route that is
not owned by FRR. Modify the code to not attempt
this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bfdd: Make new multihop peer if local-address is unique
Previously if there were two multihop peers created that had the same
peer address but different local addresses then the second peer to be
created would be merged with the first one and neither would be able to
be deleted. This was due to an issue in the function bfd_key_lookup().
When the second peer was created its key would be sent into the lookup
function and would reach the last section, even though it shouldn't
have. A check has been placed around the section so that it will not be
entered if a peer is multihop.
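A simplified sketch of the guard, in Python rather than bfdd's C. BfdKey
and key_lookup are hypothetical, and the relaxed fallback shown here only
stands in for the "last section" of bfd_key_lookup(), whose real matching
rules are not reproduced.

```python
from typing import NamedTuple, Optional


class BfdKey(NamedTuple):
    peer: str
    local: str
    multihop: bool


def key_lookup(sessions: dict, key: BfdKey) -> Optional[str]:
    # An exact match on the full key always wins.
    if key in sessions:
        return sessions[key]
    # Hypothetical relaxed fallback: match on the peer address only.
    # It is skipped for multihop keys, so peers that differ only in
    # their local address stay separate sessions.
    if not key.multihop:
        for known, session in sessions.items():
            if known.peer == key.peer and not known.multihop:
                return session
    return None
```

A second multihop peer with the same peer address but a different local
address now misses the lookup and can be created as its own session.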
Stephen Worley [Wed, 23 Sep 2020 18:17:15 +0000 (14:17 -0400)]
pbrd: use bool for pbr_send_pbr_map() return val
Use a bool as the return val for pbr_send_pbr_map() to make
the code a bit more readable. We don't expect there to be a need
for values other than true or false anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Stephen Worley [Thu, 17 Sep 2020 19:34:36 +0000 (15:34 -0400)]
pbrd: cleanup pbr ifp info if not sent to zebra
Properly cleanup the pbr interface data if nothing actually
gets sent to zebra, since we will never get the callback
notification from zapi to issue final deletion.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Stephen Worley [Thu, 17 Sep 2020 19:32:01 +0000 (15:32 -0400)]
pbrd: add return val for pbr_send_pbr_map()
Add a return val so the caller can know if something was actually sent to
zebra here. Some things need to be cleaned up by the caller
if we aren't getting a callback from zapi.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Chirag Shah [Sun, 27 Sep 2020 21:09:43 +0000 (14:09 -0700)]
zebra: avoid duplication node in l3vni l2vni-list
An l2vni flap leads to duplicate entry creation
in the l3vni's l2vni-list.
Use list sorted add with no duplicates.
root@TORC11:mgmt:~# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
State: Up
...
L2 VNIs: 1000 1000 1000 0 0 1002
root@TORC11:mgmt:~# ip link set down vx-1002
root@TORC11:mgmt:~# ip link set up vx-1002
root@TORC11:mgmt:~# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
State: Up
...
L2 VNIs: 1000 1000 1000 0 0 1002 1002
Ticket:CM-31545
Reviewed By:
Testing Done:
With Fix:
After multiple flaps, the VNI count remained the same.
root@TORC11:mgmt:~# ip link set down vx-1002
root@TORC11:mgmt:~# ip link set up vx-1002
root@TORC11:mgmt:~# ip link set down vx-1002
root@TORC11:mgmt:~# ip link set up vx-1002
root@TORC11:mgmt:~# net show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
State: Up
...
L2 VNIs: 1000 1002
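The "sorted add with no duplicates" behaviour can be sketched like this,
in Python rather than the C list library. sorted_add_nodup is a
hypothetical name used only for this illustration.

```python
import bisect


def sorted_add_nodup(items: list, value: int) -> bool:
    # Find the insertion point that keeps the list sorted.
    pos = bisect.bisect_left(items, value)
    if pos < len(items) and items[pos] == value:
        return False  # already present, nothing is added
    items.insert(pos, value)
    return True
```

Re-adding VNI 1002 after every flap leaves the list unchanged.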
Donald Sharp [Tue, 29 Sep 2020 11:54:35 +0000 (07:54 -0400)]
zebra: Make nexthop_active check use the same debug
When debugging why a route was not successfully installed into the
rib, it would be preferable that the end user only have to turn
on `debug zebra rib detail` as that is what we have been telling
people to do for the last couple of years. Consolidate *back*
to this.
An adjacency should be removed when the holdtimer expires, but if the
system is overloaded we may end up doing it late. In the meanwhile vtysh
will display an incorrect value in the show isis neighbor output, due to
an overflow of the unsigned variable used to display the Holdtime, e.g.:
pe1# show isis neighbor
Area test:
System Id   Interface  L  State  Holdtime              SNPA
Spirent-1   2.201      1  Down   26                    2020.2020.2020
Spirent-1   2.203      1  Up     21                    2020.2020.2020
Spirent-1   2.204      1  Up     18446744073709551615  2020.2020.2020
Spirent-1   2.207      1  Up     18446744073709551615  2020.2020.2020
Spirent-1   2.208      1  Up     18446744073709551615  2020.2020.2020
Spirent-1   2.209      1  Up     0                     2020.2020.2020
Spirent-1   2.210      1  Up     18446744073709551615  2020.2020.2020
pe2         12.200     1  Up     30                    2020.2020.2020
Guard against that by printing an "Expiring" message instead.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
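The wrap-around can be reproduced with a small sketch, in Python rather
than isisd's C. The function names are hypothetical, and the guard shown
(compare before subtracting) is one way to avoid the wrap, not necessarily
the exact check used in the fix.

```python
U64_MASK = (1 << 64) - 1


def holdtime_unsigned(expire_at: int, now: int) -> int:
    # Emulates unsigned 64-bit subtraction: an overdue timer wraps around.
    return (expire_at - now) & U64_MASK


def format_holdtime(expire_at: int, now: int) -> str:
    # Compare first, so an overdue timer is never subtracted.
    if now > expire_at:
        return "Expiring"
    return str(expire_at - now)
```

A timer that is one second late yields 18446744073709551615 from
holdtime_unsigned(), and "Expiring" from format_holdtime().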
Donald Sharp [Thu, 24 Sep 2020 11:42:51 +0000 (07:42 -0400)]
zebra: Don't ignore setsockopt return
We attempt to limit the amount of data sent from the kernel
to FRR, but some kernels we can run against may not have this ability,
in which case the setsockopt will fail. Note that failure in the log.
This problem was reported by the sanitizer -
=================================================================
==24764==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000115c8 at pc 0x55cb9cfad312 bp 0x7fffa0552140 sp 0x7fffa0552138
READ of size 8 at 0x60d0000115c8 thread T0
#0 0x55cb9cfad311 in zebra_evpn_remote_es_flush zebra/zebra_evpn_mh.c:2041
#1 0x55cb9cfad311 in zebra_evpn_es_cleanup zebra/zebra_evpn_mh.c:2234
#2 0x55cb9cf6ae78 in zebra_vrf_disable zebra/zebra_vrf.c:205
#3 0x7fc8d478f114 in vrf_delete lib/vrf.c:229
#4 0x7fc8d478f99a in vrf_terminate lib/vrf.c:541
#5 0x55cb9ceba0af in sigint zebra/main.c:176
#6 0x55cb9ceba0af in sigint zebra/main.c:130
#7 0x7fc8d4765d20 in quagga_sigevent_process lib/sigevent.c:103
#8 0x7fc8d4787e8c in thread_fetch lib/thread.c:1396
#9 0x7fc8d4708782 in frr_run lib/libfrr.c:1092
#10 0x55cb9ce931d8 in main zebra/main.c:488
#11 0x7fc8d43ee09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
#12 0x55cb9ce94c09 in _start (/usr/lib/frr/zebra+0x8ac09)
=================================================================
Donald Sharp [Wed, 23 Sep 2020 17:04:20 +0000 (13:04 -0400)]
zebra: Ensure that message received from mlag will fit
If we receive a message that is greater than our buffer
size we are in a situation where both the read and write
buffers are fubar'ed beyond the end. Assert when we notice
this fact.
Ticket: CM-31576
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Wed, 23 Sep 2020 16:26:13 +0000 (12:26 -0400)]
zebra: modify mlag code to only need 1 stream when generating data
The normal pattern of writing the type/length at the beginning
of the packet was not being quite followed. Modify the mlag
code to respect the proper way of doing things and get rid
of a stream_new and copy.
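The single-buffer pattern can be sketched as below, in Python rather than
the C stream API. The header layout (16-bit type, 16-bit length) and the
name build_message are assumptions for illustration and are not the mlag
wire format.

```python
import struct

# Hypothetical header: 16-bit type followed by 16-bit body length.
HEADER = struct.Struct("!HH")


def build_message(msg_type: int, payload: bytes) -> bytes:
    buf = bytearray(HEADER.size)  # reserve room for the header first
    buf += payload                # append the body to the same buffer
    # Go back and fill in type and length now that the body size is known.
    HEADER.pack_into(buf, 0, msg_type, len(buf) - HEADER.size)
    return bytes(buf)
```

Because the header slot is reserved up front, no second buffer and no copy
are needed.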
zebra: stop neigh hold timer when the neigh is deleted
The neigh hold timer was firing after the neigh was deleted resulting
in the following crash -
[
at ./zebra/zebra_evpn_neigh.h:155
at zebra/zebra_evpn_neigh.c:447
at lib/thread.c:1578
at zebra/main.c:488
]
Donald Sharp [Wed, 23 Sep 2020 00:47:33 +0000 (20:47 -0400)]
zebra: Move debug information gathering to inside guard
Let's not make the entire `depend_finds` function pay
for the data gathering needed for the debug. There
are numerous other places in the code that check
the NEXTHOP_FLAG_RECURSIVE and do the same output.
zebra: simplify and optimize vrf display in show ip route
In all outputs (text and json): simplify and optimize the vrf name
display, use the vrf_id_to_name() handler.
Note: vrf_id_to_name() has a safeguard that prevents a crash
when the vrf cannot be found because it changed in some
(unexpected) manner; it returns "n/a".
Note: "vrf n/a" will now be displayed instead of "vrf UNKNOWN" in this
case, like in most other frr components.
This safeguard was missing for show ip route json, so this
optimization also fixes a potential crash.
Various "show ip route" commands invoke the same helper
(do_show_ip_route), potentially several times.
When asking to dump a non-default vrf, all vrfs or all tables, the
output is messy, the header summarizing abbreviations is repeated
several times, excess line feeds appear, the default table of default
VRF is concatenated to the previous table output...
Normalize the output:
- whatever the case, display the common header at most once, if there
is at least an entry to dump.
- when using a "vrf all" or "table all" command, prepend a line with
the VRF and table (even for the default vrf or table).
- when dumping a specific vrf or table, prepend a line with the VRF
and table.
Example (vrf all)
=================
router# show ip route vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:24:09
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:24:09
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:00:26
VRF private:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:00:29
C>* 10.125.0.0/24 is directly connected, loop0, 00:00:42
Example (main vrf)
==================
router# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:24:41
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:24:41
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:00:58
Example (specific vrf)
======================
router# show ip route vrf private
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF private:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:01:23
C>* 10.125.0.0/24 is directly connected, loop0, 00:01:36
Example (all tables)
====================
router# show ip route table all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:01:51
VRF main table 254:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:25:34
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:25:34
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:01:51
Example (all vrf, all table)
============================
router# show ip route table all vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:02:15
VRF main table 254:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:25:58
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:25:58
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:02:15
VRF private table 254:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:02:18
C>* 10.125.0.0/24 is directly connected, loop0, 00:02:31
Example (specific table)
========================
router# show ip route table 200
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:05:26
In a topology like R1 -- R2 -- R5, with R2 being NSSA ABR and R5 being
ASBR redistributing external routes, the ABR R2 will translate type-7
LSA into type-5 and advertise to the backbone. In the current implementation
R2 is also advertising a type-4 LSA when there is no need.
RFC 3101: "...NSSA's border routers never originate Type-4 summary-LSAs
for the NSSA's AS boundary routers, since Type-7 AS-external-LSAs are
never flooded beyond the NSSA's border..."
Igor Ryzhov [Mon, 21 Sep 2020 12:35:56 +0000 (15:35 +0300)]
lib: fix regcomp error processing
* use actual error code instead of "false"
* add missing new line
Before:
```
nfware# show interface | include (a]
% Regex compilation error: Success% Bad regexp '(a]'
% Unknown command: show interface | include (a]
```
After:
```
nfware# show interface | include (a]
% Regex compilation error: Unmatched ( or \(
% Bad regexp '(a]'
% Unknown command: show interface | include (a]
```
1. The OSPF dead-interval will be set to 4 times the hello-interval
if it is not set explicitly using "ip ospf dead-interval <dead-val>".
2. On resetting the hello-interval using "no ip ospf hello-interval", the
dead interval and hello due time will be changed accordingly.
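A small sketch of rule 1, in Python rather than ospfd's C. The function
name is hypothetical and the 10 second default hello-interval is an
assumption made for this example.

```python
from typing import Optional

DEFAULT_HELLO = 10   # seconds; assumed default for this sketch
DEAD_MULTIPLIER = 4


def effective_dead_interval(hello: Optional[int], dead: Optional[int]) -> int:
    # An explicitly configured dead-interval always wins.
    if dead is not None:
        return dead
    # Otherwise derive it from the hello-interval (or its default).
    return DEAD_MULTIPLIER * (hello if hello is not None else DEFAULT_HELLO)
```

Passing hello=None models "no ip ospf hello-interval": the derived
dead-interval falls back along with it.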
The "set metric" command wasn't processing metric additions and
subtractions (using + and -) correctly. Fix those problems.
Also, remove the "+metric" and "-metric" options since they don't
work and don't make any sense (they could be interpreted as unitary
increments/decrements but that was never supported).
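The intended handling of the three argument forms can be sketched as
follows, in Python rather than the route-map C code. apply_set_metric is a
hypothetical name, and clamping a subtraction at zero is an assumption of
this sketch.

```python
def apply_set_metric(current: int, arg: str) -> int:
    # Split an optional leading sign from the digits.
    sign, digits = (arg[0], arg[1:]) if arg[:1] in ("+", "-") else ("", arg)
    if not digits.isdigit():
        # A bare sign, or "+metric" / "-metric", is rejected.
        raise ValueError(f"invalid metric: {arg!r}")
    value = int(digits)
    if sign == "+":
        return current + value
    if sign == "-":
        return max(current - value, 0)  # clamping at zero is an assumption
    return value  # a plain number replaces the metric
```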
When the ASBR stops announcing a prefix into the NSSA area, the
type-7 LSA is removed from the area. However, the ABR keeps refreshing
the type-5 LSA in its LSDB while removing the type-7 LSA, so routers
outside the area do not get an update.
With this change the type-5 LSA is flushed from the LSDB and the
change is announced to routers outside the area.
Donald Sharp [Fri, 18 Sep 2020 11:14:55 +0000 (07:14 -0400)]
lib: Remove debug associated with vrf_get
The vrf_get function is called throughout the code base
so much so that when you turn on vrf debugging it eclipses
everything else to a degree that is completely unreasonable.
Quentin Young [Thu, 17 Sep 2020 19:46:55 +0000 (15:46 -0400)]
tools: fix vtysh failure error handling
Based on the current code, I think the intent was to gracefully handle
vtysh failures and print a useful error message. Barriers in the way of
that:
- Despite reading the results of subprocess.communicate(), there won't
be anything there, because we aren't passing subprocess.PIPE as stdout
and stderr when calling subprocess.Popen()
- Despite catching subprocess.TimeoutExpired, if we were to actually hit
this case frr-reload.py would just crash because it's calling
.communicate() on an unbound process variable, probably a copy-paste
error
- Aside from that, building a kwargs dict to pass to a function that
contains something if something else is not None and nothing if it is,
is pointless when we could just pass the thing itself
Net result is that if vtysh fails to read an frr.conf due to syntax
errors, instead of crashing with a traceback, we actually handle the
error condition, log the problem and vtysh's output, and exit. Actually
we were printing the failed line just by chance because stderr wasn't
captured from the subprocess and I guess showed up as part of systemd's
error capturing or something, but the traceback did a good job of
obscuring that with useless noise.
Old:
frrinit.sh[32183]: * Started watchfrr
frrinit.sh[32183]: line 20: % Unknown command: eee
frrinit.sh[32183]: Traceback (most recent call last):
frrinit.sh[32183]: File "/usr/lib/frr/frr-reload.py", line 1316, in <module>
frrinit.sh[32183]: newconf.load_from_file(args.filename)
frrinit.sh[32183]: File "/usr/lib/frr/frr-reload.py", line 231, in load_from_file
frrinit.sh[32183]: file_output = self.vtysh.mark_file(filename)
frrinit.sh[32183]: File "/usr/lib/frr/frr-reload.py", line 146, in mark_file
frrinit.sh[32183]: % (child.returncode, stderr))
frrinit.sh[32183]: __main__.VtyshException: vtysh (mark file) exited with status 2:
frrinit.sh[32183]: None
New:
frrinit.sh[30090]: * Started watchfrr
frrinit.sh[30090]: vtysh failed to process new configuration: vtysh (mark file) exited with status 2:
frrinit.sh[30090]: line 20: % Unknown command: eee
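The points above can be illustrated with a short, generic sketch. This is
not the frr-reload.py code: run_checked and report_failure are
hypothetical helpers, and the command being run is left to the caller.

```python
import subprocess
import sys


def run_checked(cmd, timeout=30.0):
    # Capture both streams; without PIPE, communicate() returns (None, None).
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # proc is bound here because Popen() is called outside the try block.
        proc.kill()
        stdout, stderr = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            "command exited with status %d:\n%s" % (proc.returncode, stderr)
        )
    return stdout


def report_failure(exc):
    # Log the problem and return an exit code instead of leaking a traceback.
    print("failed to process new configuration: %s" % exc, file=sys.stderr)
    return 1
```

A caller would wrap run_checked() in try/except RuntimeError and pass the
exception to report_failure(), so the child's stderr ends up in the log
message rather than behind a traceback.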
Donald Sharp [Wed, 16 Sep 2020 21:48:15 +0000 (17:48 -0400)]
bgpd: Avoid memset when tip hash is empty
The tip hash is only used when we are dealing with
evpn. In bgp_nexthop_self we are doing a memset
irrespective of whether we will ever find data. Yes,
hash_lookup will return pretty quickly.
Modify the code to avoid doing a memset in the case
where the tip hash is empty, since we know we'll
never find anything. With full BGP feeds this
small memset does take some time.
lib: simplify handling of the sysrepo startup configuration
In the new Sysrepo, all SR_EV_ENABLED notifications are followed by
SR_EV_DONE notifications (assuming no errors occur), so there's no
need to special case the SR_EV_ENABLED event anymore (e.g. do full
transactions in one step).
While here, add a few more guarded debug messages to facilitate
troubleshooting.
lib: fix handling of deleted nodes in the sysrepo plugin
Make the sysrepo plugin ignore the deletion of configuration
nodes that don't exist anymore instead of logging an error and
rejecting the changes. This is necessary because Sysrepo delivers
delete notifications for all nodes of a deleted data tree instead
of delivering a single delete notification of the top-level subtree
node (which would suffice for the northbound layer).
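A toy model of the tolerant delete, in Python rather than the plugin's C.
The flat xpath-to-value dict and the name apply_delete are assumptions
made only for this sketch.

```python
def apply_delete(config: dict, xpath: str) -> bool:
    # Deleting a subtree removes every node under it, so later delete
    # notifications for its children find nothing left to remove.
    doomed = [p for p in config if p == xpath or p.startswith(xpath + "/")]
    for path in doomed:
        del config[path]
    # False means "already gone"; the caller treats that as a no-op,
    # not as an error that rejects the change.
    return bool(doomed)
```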
From Sysrepo's documentation:
"Note: do not use fork() after creating a connection. Sysrepo
internally stores PID of every created connection and this way a
mismatch of PID and connection is created".
Introduce a new "frr_very_late_init" hook in libfrr that is only
called after the daemon is forked (when the '-d' option is used)
and after the configuration is read. This way we can initialize
the sysrepo plugin correctly even when the daemon is daemonized,
and after the Sysrepo CLI commands are processed (only "debug
northbound client sysrepo" for now).
Currently, when the is-type of an area is changed and its circuits resign,
we are not resetting the DIS flag. Consequently, if the area type is reverted
we are not running the DR election and not regenerating the pseudonode LSP.
Also adding event debug logs for circuit commence/resign.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Don Slice [Thu, 10 Sep 2020 12:40:28 +0000 (12:40 +0000)]
bgpd: correct community-list replace logic
A problem was reported where entering an existing community list
sequence number with new community information would delete the
entire community list. This commit fixes the replace logic to do
the right thing.
Ticket: CM-30555
Signed-off-by: Don Slice <dslice@nvidia.com>
Donald Sharp [Fri, 11 Sep 2020 17:05:55 +0000 (13:05 -0400)]
pbrd: Ensure rule is installed on interface up
If we are experiencing an interface that is bouncing
very fast and the last operation that we experienced
was an ifdown, we will send rule deletions associated
with that interface. If we have not received notification
that the rule was removed *but* we immediately get another
ifup notification, when we go to install the rule we
decide that it's not ready to send down again,
since we still think it is installed.
Force the rule installation when we have an interface up
event.
Ticket: CM-31042
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Thu, 10 Sep 2020 15:31:39 +0000 (11:31 -0400)]
bgpd, lib, pbrd, zebra: Pass by ifname
When installing rules pass by the interface name across
zapi.
This is being changed because we have a situation where
if you quickly create/destroy ephemeral interfaces under
linux the upper level protocol may be trying to add
a rule for an interface that does not quite exist
at the moment. Since ip rules actually want the
interface name (to handle just this sort of situation),
convert over to passing the interface name and storing
it and using it in zebra.
Ticket: CM-31042
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
lib: fix crashes with leafrefs that point to non-implemented modules
Whenever libyang loads a module that contains a leafref, it will
also implicitly load the module of the referring node if it's
not loaded already. That makes sense as otherwise it wouldn't be
possible to validate the leafref value correctly.
The problem is that loading a module implicitly violates the
assumption of the northbound layer that all loaded modules
are implemented (i.e. they have a northbound node associated
to each schema node). This means that loading a module that
isn't implemented can lead to crashes as the "priv" pointer
of schema nodes is no longer guaranteed to be valid. To fix this
problem, add a few null checks to ignore data nodes associated
to non-implemented modules.
The side effect of this change is harmless. If a daemon receives
configuration it doesn't support (e.g. BFD peers on staticd),
that configuration will be stored but otherwise ignored. This can
only happen when using a northbound client like gRPC, as the CLI
will never send to a daemon a command it doesn't support. This
minor problem should go away in the long run as FRR migrates to
a centralized management model, at which point the YANG-modeled
configuration of all daemons will be maintained in a single place.
Finally, update some daemons to stop implementing YANG modules
they don't need to (i.e. revert 1b741a01c and a74b47f5).
Donald Sharp [Fri, 11 Sep 2020 12:27:28 +0000 (08:27 -0400)]
pimd: Warn when we try to build MAXVIFS > 256
We use the pim mroute socket for kernel notifications of events.
Currently this is limited to 8 bits of data. There are patches
coming down the pike in kernel land to allow this to expand.
Rather than fix this and all the other places we assume MAXVIFS < 256
in the pim code right now, leave a land mine for the developer
doing this work to point them in the right direction.