Don Slice [Fri, 28 Sep 2018 15:55:39 +0000 (15:55 +0000)]
bgpd: solve issue entering aggregate twice
Problem reported that frr-relaod.py was not installing an aggregate
properly. Problem was actually that frr-reload.py does the command
twice, and the second time the aggregate command was entered, it would
appear in the config but the aggregate was removed from the bgp table
and not advertised to peers. Solved by noticing when an aggregate
was marked for deletion (info_invalid) and allowing the re-entry if
the old one was being removed.
Ticket: CM-22509 Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Donald Sharp [Tue, 25 Jun 2019 04:30:11 +0000 (00:30 -0400)]
pimd: Dissallow query to be received from a non-connected source
When we receive an igmp query on a interface, ensure that the
source address of the packet is connected to the incoming
interface. This will prevent a meanie from crafting a igmp
packet with a source address less than ours and causing
us to suspend query activities.
Fixes: #1692 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 28 Nov 2018 23:46:36 +0000 (18:46 -0500)]
bgpd: interface based peers should automatically override it's peer group
When a interface based peer is setup and if it is part of a peer
group we should ignore this and just use the PEER_FLAG_CAPABILITY_ENHE
no matter what.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
David Lamparter [Sun, 9 Jun 2019 23:35:04 +0000 (01:35 +0200)]
tools: retain sanity when reloading under systemd
Without this, we end up restarting watchfrr with the systemd watchdog
non-functional & tripped a bit later. Also, if watchfrr is in the
"control" cgroup, systemd 232 will kill it. (241 apparently doesn't.
Can't find anything about this in systemd's ChangeLog though.)
Rafael Zalamena [Wed, 23 Jan 2019 12:25:30 +0000 (10:25 -0200)]
ospf6d: keep track of the socket set thread
When using the timer to set the socket multicast options, keep track
of the thread pointer. If we lose the thread reference we might have
situations where multicast is enabled when it should be disabled and
vice versa.
Mark Stapp [Tue, 14 May 2019 17:59:53 +0000 (13:59 -0400)]
zebra: [6.0] remove vrf LSPs when vrf is deleted
Try to remove any LSPs associated with a vrf when the vrf is
deleted. The vrf code was calling a helpful zebra_mpls api,
but that api was basically a no-op for vrfs other than
the default.
Quentin Young [Tue, 7 May 2019 20:14:44 +0000 (20:14 +0000)]
*: 6.0.3 release
* bgpd: Fix 'show bgp ipv4/ipv6 neighbors' to show only v4 or v6 neighbors
* bgpd: Fix display issue when showing labeled-unicast routes
* bgpd: Fix incorrect # peers in 'show bgp ipv6 summary' output
* bgpd: Fix issue with remote-private-as in combination with local-as
* bgpd: Fix memory error when prepending to AS-path
* bgpd: Improve error handling when using maximum-prefix
* ldpd: Fix startup permissions error on OpenBSD
* ldpd: add support for FreeBSD IP_BINDANY
* ospfd: Fix incorrect display of millisecond time values
* tools: Fix incorrect systemd dependencies causing failure to start on boot
* vtysh: Fix unnecessary reconnection under multi-instance OSPF
* watchfrr: Fix multi-instance support when using new init script
* zebra: Fix a display bug in 'show ip route ... json'
* zebra: Fix compilation issue on OpenBSD
* zebra: Fix issue with missed selection of system-sourced routes
* zebra: Fix race condition in label manager
* zebra: Reliability improvements to pseudowire route recovery
* zebra: Tweak metric values for macvlan devices
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
F. Aragon [Wed, 10 Apr 2019 17:08:50 +0000 (19:08 +0200)]
zebra: pseudowire event recovery (DoS fix)
When having a route recovery, because of the route installation
cycling and the next hop label check, it could happen that the PW
never gets recovered. The original code shows the intention of retrying,
but the code was missing. The fix includes the call to the timer programming
the recovery attempt.
Example for reproducing the issue:
|P1| <-> |P2| <-> |P3|
- Being P1, P2, P3 nodes, using IS-IS as IGP, and having a pseudowire
betwen P1 and P3 (P1, P2, P3 having configured LDP daemons).
- After 60 seconds, kill the IS-IS daemon in P2.
- Wait 30 seconds
- Launch again the IS-IS daemon in P2
- The bug/issue is that after P1 <-> P3 recovering connectivity sometimes
the PW is not recovered because the reason explained in the first paragraph.
F. Aragon [Fri, 5 Apr 2019 13:26:14 +0000 (15:26 +0200)]
zebra: label manager race condition fix
This fix covers the case where two or more events are processed but only one
becoming effective. E.g. when mixing a synchronous label request from a LDP
deamon and an asynchronous request from a BGP daemon it could happen to the
BGP having the label chunk, but the LDP stuck waiting for the response.
Donald Sharp [Fri, 29 Mar 2019 02:08:37 +0000 (22:08 -0400)]
bfdd, nhrpd, pimd: When deleting an interface clean up
When we delete an interface, we need to set the interface
ifindex to an internal value so that we don't end up in
a state where the re-addition of the same ifindex, due to
a rename operation, causes an infinite loop.
Fixes:#4007 Fix-Suggested-by: Saravanan K Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
while labeled_unicast routes should be fetched in the
unicast table, we cannot set the safi to SAFI_UNICAST
else the peer afc checks and subgroup retrieval will fail
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Donald Sharp [Mon, 11 Mar 2019 13:39:19 +0000 (09:39 -0400)]
zebra: System routes sometimes can not be properly selected
System Routes if received over the netlink bus in a
specific pattern that causes an update operation for that
route in zebra can leave the dest->selected_fib pointer NULL,
while having the ZEBRA_FLAG_SELECTED flag set. Specifically
one way to achieve this is to do this:
`ip addr del 4.5.6.7/32 dev swp1 ; ip addr add 4.5.6.7/32 dev swp1 metric 9`
Why is this a big deal?
Because nexthop tracking is looking at ZEBRA_FLAG_SELECTED to
know if we can use a route, while nexthop active checking uses
dest->selected_fib.
So imagine we have bgp registering a nexthop. nexthop tracking in
the above case will be able to choose the 4.5.6.7/32 route
if that is what the nexthop is, due to the ZEBRA_FLAG_SELECTED being
properly set. BGP then allows the peers connection to come up and we
install routes with a 4.5.6.7 nexthop. The rib processing for route
installation will then look at the 4.5.6.7 route see no
dest->selected_fib and then start walking up the tree to resolve
the route. In our case we could easily hit the default route and be
unable to resolve the route. Which then becomes inactive in the
rib so we never attempt to install it.
This commit fixes this problem because when the rib_process decides
that we need to update the fib( ie replace old w/ new ), the
replacement with new was not setting the `dest->selected_fib` pointer
to the new route_entry, when the route was a system route.
Ticket: CM-24203 Signed-off-by: Donald Sharp <sharpd@cumulusnetworkscom>
Donald Sharp [Wed, 5 Dec 2018 20:12:50 +0000 (15:12 -0500)]
zebra: `show ip route A.B.C.D json` would only show last route entry
The `show ip route A.B.C.D json` command was only displaying
the last route entry looked at and we would drop the data
associated with other route entries. This fixes the issue:
robot# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.
Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
> - selected route, * - FIB route
VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36
zebra: set connected route metric based on the devaddr metric
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
inet6 2001:aa:1::2/64 scope global
valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
inet6 2001:aa:1::a/64 metric 1024 scope global
valid_lft forever preferred_lft forever
root@TORC11:~#
The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.
Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.
Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
Known via "connected", distance 0, metric 1024, vrf vrf1
Last update 11:30:56 ago
* directly connected, vlan1002-v0
Routing entry for 2001:aa:1::/64
Known via "connected", distance 0, metric 0, vrf vrf1, best
Last update 11:30:56 ago
* directly connected, vlan1002
Donatas Abraitis [Mon, 25 Feb 2019 19:16:02 +0000 (21:16 +0200)]
bgpd: Add peer action for PEER_FLAG_IFPEER_V6ONLY flag
peer_flag_modify() will always return BGP_ERR_INVALID_FLAG because
the action was not defined for PEER_FLAG_IFPEER_V6ONLY flag.
```
global PEER_FLAG_IFPEER_V6ONLY = 16384;
global BGP_ERR_INVALID_FLAG = -2;
probe process("/usr/lib/frr/bgpd").statement("peer_flag_modify@/root/frr/bgpd/bgpd.c:3975")
{
if ($flag == PEER_FLAG_IFPEER_V6ONLY && $action->type == 0)
printf("action not found for the flag PEER_FLAG_IFPEER_V6ONLY\n");
}
probe process("/usr/lib/frr/bgpd").function("peer_flag_modify").return
{
if ($return == BGP_ERR_INVALID_FLAG)
printf("return BGP_ERR_INVALID_FLAG\n");
}
```
produces:
action not found for the flag PEER_FLAG_IFPEER_V6ONLY
return BGP_ERR_INVALID_FLAG
Kiran Kella [Fri, 8 Feb 2019 07:25:25 +0000 (12:55 +0530)]
bgpd: 'show bgp [ipv4|ipv6] neighbors' displays all af neighbors
- Display only ipv4 neighbors when 'show bgp ipv4 neighbors' command is
issued.
- Display only ipv6 neighbors when 'show bgp ipv6 neighbors'
command is issued.
- Take the address family of the peer address into account, while
displaying the neighbors.
Signed-off-by: Kiran Kella <kiran.kella@broadcom.com>
David Lamparter [Mon, 18 Feb 2019 23:27:45 +0000 (00:27 +0100)]
tools: fix new init script wrt. multi-instance
TBH when I looked at watchfrr I didn't see any MI support and hence
assumed this just didn't work to begin with. However, it actually does
(transparently to watchfrr, by just using "ospfd-1" as daemon name.)
So, fix this up and make it work again.
(Also remove 2 extraneous \n in messages.)
Signed-off-by: David Lamparter <equinox@diac24.net>
Donald Sharp [Wed, 21 Nov 2018 21:13:25 +0000 (16:13 -0500)]
vtysh: Don't attempt to reconnect the non-instanced ospf process
When running ospf instances we should not attempt to reconnect
the default ospf instance on running a command.
This commit should be targeted enough because in the case
of normal operation we connect to everything we should
and only set the VTYSH_WAS_ACTIVE flag for those we
truly have lost connection too.
Before:
donna.cumulusnetworks.com# config t
donna.cumulusnetworks.com(config)# router ospf 100
Warning: connecting to ospfd...failed!
donna.cumulusnetworks.com(config-router)#
After:
donna.cumulusnetworks.com# conf t
donna.cumulusnetworks.com(config)# router ospf 100
donna.cumulusnetworks.com(config-router)# end
donna.cumulusnetworks.com#
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Note that sysinit.target does not depend on any network* service or
target.
In other words, unless there is a service that requires
network-online.service, even if FRR is enabled it will not be started.
Therefore network-online.target is the wrong unit to have in WantedBy=,
as it is not always started.
This patch updates our service file so that it is properly started by
the system when enabled, delayed until networking is up, and if possible
delayed until after NetworkManager, systemd-networkd or any other
networking configuration manager has finished performing its tasks -
i.e. after network-online.target.
After these changes our new dependency graph looks like this:
David Lamparter [Thu, 24 Jan 2019 17:17:40 +0000 (18:17 +0100)]
watchfrr: build in defaults for -r/-s/-k
There's no good reason to not have these options default to the
installation path of tools/watchfrr.sh. Doing so allows us to ditch
watchfrr_options from daemons/daemons.conf completely.
Fixes: #3652 Signed-off-by: David Lamparter <equinox@diac24.net>
David Lamparter [Wed, 23 Jan 2019 13:15:52 +0000 (14:15 +0100)]
vtysh: fix pager compatibility handling
I just straight up forgot checking VTYSH_PAGER at startup, and the
"terminal paginate" command is only installed to VIEW_NODE so it can't
be processed from vtysh.conf in CONFIG_NODE...
Signed-off-by: David Lamparter <equinox@diac24.net>