Donald Sharp [Mon, 26 Apr 2021 13:34:41 +0000 (09:34 -0400)]
zebra: Reduce per vrf memory usage from hash table creation
When creating a large number of VRFs we create a fairly
large number of hash tables per VRF. Reduce memory usage on
startup, and make it possible to identify the table these
allocations come from.
Donald Sharp [Mon, 26 Apr 2021 13:24:48 +0000 (09:24 -0400)]
zebra: Reduce size of vni hash tables to a more reasonable start size
We create 2 hash tables per VNI in zebra. Once we start to
scale the number of VNIs we start to see some serious memory
usage in zebra. Let's reduce the memory usage at startup
when running with a large number of VNIs.
bgpd: changing graceful-restart parameters should not be considered as error
vtysh will return an informational message to the user that changing any
graceful-restart related parameter will require a peer reset. This should
not be treated as an error message (resulting in a return code of 1) but
rather as simple information to the user.
This fixes GitHub issue https://github.com/FRRouting/frr/issues/8403
$ vtysh -c configure -c 'router bgp 100' -c 'bgp graceful-restart'
Graceful restart configuration changed, reset all peers to take effect
$ echo $?
0
Signed-off-by: Christian Poessinger <christian@poessinger.com>
Donald Sharp [Fri, 23 Apr 2021 18:31:41 +0000 (14:31 -0400)]
bgpd: Consolidate dampening show run output with the rest of that code
For whatever reason the dampening show run code was outside the normal
loop of code that handles the afi/safi portion. Consolidate it into
the rest of the normal code.
Donald Sharp [Mon, 8 Feb 2021 16:48:40 +0000 (11:48 -0500)]
lib: Remove dead code
The distribute_list_init command is not used and is setup
code that will never be used because it makes assumptions about
how distribute-lists work that are fundamentally incorrect.
Donald Sharp [Mon, 8 Feb 2021 14:54:31 +0000 (09:54 -0500)]
lib: Abstract parsing of distribute lists
Abstract the parsing of distribute lists so that we
don't have as much cut-n-paste code.
This is a setup commit for future work. In effect,
current distribute-list handling is all kinds of messed up:
a) eigrp and babel both attempt to use distribute-lists, but they
just plain don't work.
b) `distribute-list` is only sent to rip. `ipv6 distribute-list`
is sent to ripngd. If you use `distribute-list` under `router ripng`
it sends the command to rip, but ripd is in the wrong mode and it
never works.
c) Should ripngd care about v4 and v6 specific distribute-lists?
This dichotomy was added for babel, but babel has been broken
in this respect since day 1 (see a).
All in all we need to unwind this whole mess. Make distribute-list
commands specific to the daemons (so that we can be in the right
sub-mode). But the parsing is going to be the same across all
daemons, so let's provide that functionality in `lib/distribute.c`.
The check to validate large-community against UINT_MAX was added for
both standard and expanded community. However, it needs to be
applied only to the standard community.
Donald Sharp [Sat, 17 Apr 2021 22:01:53 +0000 (18:01 -0400)]
ospfd: Do not use `case default` for switches that have enum
Found a couple spots where FRR was using `case default` when
using a switch over an enum. In this case we *must* enumerate
all states as part of the switch.
Problem Statement:
=================
In a scale setup, BGP sessions start flapping.
RCA:
====
In a virtualized environment there are multiple places where the
MTU needs to be set. If the MTU is not set properly in some of these
places, there is a chance that BGP packets get fragmented, and
in a scale setup this leads to BGP session flaps.
Fix:
====
A new TCP option is provided as part of this implementation.
It can be configured per neighbor and sets the TCP
max segment size. The user needs to derive the path MTU between the BGP
neighbors and set that value as part of the tcp-mss setting.
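Deriving a segment size from a known path MTU is simple arithmetic. A minimal sketch, assuming fixed-size headers with no IP or TCP options (20-byte IPv4, 40-byte IPv6, 20-byte TCP). The function name `mss_from_path_mtu` is hypothetical and is not part of FRR:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed header only (no extension headers)
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_from_path_mtu(path_mtu: int, ipv6: bool = False) -> int:
    """Return the largest TCP payload that fits in one packet of path_mtu bytes."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    mss = path_mtu - ip_header - TCP_HEADER
    if mss <= 0:
        raise ValueError(f"path MTU {path_mtu} is too small for the headers")
    return mss


if __name__ == "__main__":
    print(mss_from_path_mtu(1500))             # IPv4 peer
    print(mss_from_path_mtu(1500, ipv6=True))  # IPv6 peer
```

Any TCP options in use on the session reduce the usable payload further, so this gives an upper bound rather than an exact value.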
2. Running config
frr# show running-config
router bgp 100
neighbor 198.51.100.2 tcp-mss 150 => new entry
neighbor 2001:DB8::2 tcp-mss 400 => new entry
3. Show command
frr# show bgp neighbors 198.51.100.2
BGP neighbor is 198.51.100.2, remote AS 100, local AS 100, internal link
Hostname: frr
Configured tcp-mss is 150, synced tcp-mss is 138 => new display
4. Show command json output
frr# show bgp neighbors 2001:DB8::2 json
{
"2001:DB8::2":{
"remoteAs":100,
"bgpTimerKeepAliveIntervalMsecs":60000,
"bgpTcpMssConfigured":400, => new entry
"bgpTcpMssSynced":388, => new entry
Risk:
=====
Low - This is a config driven feature and it sets the max segment
size for the TCP session between BGP peers.
Tests Executed:
===============
Manual testing was done with a three-router topology.
1. Executed basic config and unconfig scenarios
2. Verified if the config is updated in running config
during config and no config operation
3. Verified the show command output in both CLI format and
JSON format.
4. Verified if TCP SYN messages carry the max segment size
in their initial packets.
5. Verified the behaviour during clear bgp session.
6. Did a packet capture to see if the new segment size
takes effect.
Donald Sharp [Mon, 3 May 2021 23:53:12 +0000 (19:53 -0400)]
zebra: Allow redistribution for routes selected
The current code behaves inconsistently when redistributing routes.
Suppose you have a kernel route that is being read with a distance
of 255:
eva# show ip route kernel
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/100] via 192.168.161.1, enp39s0, 00:06:39
K>* 4.4.4.4/32 [255/8192] via 192.168.161.1, enp39s0, 00:01:26
eva#
If you already have redistribution turned on for kernel routes,
you will be notified of the 4.4.4.4/32 route. If you turn
on kernel route redistribution after the 4.4.4.4/32 route
has been read by zebra, you will never learn of it.
There is no need to look for infinite distance in the redistribution
code. Either we are selected or not. In other words, non-kernel routes
with a 255 distance are never installed, so the checks were pointless.
So let's just remove the distance checking and tell interested parties
about the 255 kernel route if it exists.
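The effect of dropping the distance check can be sketched as follows. This is an illustration only, not zebra's code: the `Route` class and both function names are hypothetical, and the "old" check is a simplified model of the distance test described above.

```python
from dataclasses import dataclass

DISTANCE_INFINITY = 255


@dataclass
class Route:
    prefix: str
    distance: int
    selected: bool


def redistributable_old(route: Route) -> bool:
    # Simplified model of the removed check: a selected route was
    # skipped when it carried the infinite distance.
    return route.selected and route.distance != DISTANCE_INFINITY


def redistributable_new(route: Route) -> bool:
    # After the change only selection matters.
    return route.selected


# The two kernel routes from the output above.
routes = [
    Route("0.0.0.0/0", 0, True),
    Route("4.4.4.4/32", 255, True),
]

if __name__ == "__main__":
    print([r.prefix for r in routes if redistributable_old(r)])
    print([r.prefix for r in routes if redistributable_new(r)])
```

With the old predicate the 4.4.4.4/32 route is filtered out even though it is selected; with the new one both routes are reported.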
isisd: link protection optional fallback in ti-lfa
The current implementation of TI-LFA computes link-protecting
repair paths (even when node protection is enabled) to have repair
paths to all destinations when no node-protecting repair has been
found. This may be desired or not. E.g. the link-protecting paths
may use the protected node and be, therefore, useless if the node
fails. Also, computing link-protecting repairs incurs extra
calculations.
With this patch, when node protection is enabled, link-protecting
repair paths are only computed if "link-fallback" is specified in
the configuration, per interface and per IS-IS level.
Donald Sharp [Mon, 19 Apr 2021 23:23:45 +0000 (19:23 -0400)]
zebra: Allow one connected route per network mask on an interface
Currently FRR reads interface state from the kernel and
creates a connected route per address on an interface. If
you have multiple addresses in the same network
on an interface, just create 1 connected route for them:
sharpd@eva:/tmp/topotests$ vtysh -c "show int dummy302"
Interface dummy302 is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
vrf: default
index 3279 metric 0 mtu 1500 speed 0
flags: <UP,BROADCAST,RUNNING,NOARP>
Type: Ethernet
HWaddr: aa:4a:ed:95:9f:18
inet 10.4.1.1/24
inet 10.4.1.2/24 secondary
inet 10.4.1.3/24 secondary
inet 10.4.1.4/24 secondary
inet 10.4.1.5/24 secondary
inet6 fe80::a84a:edff:fe95:9f18/64
Interface Type Other
Interface Slave Type None
protodown: off
sharpd@eva:/tmp/topotests$ vtysh -c "show ip route connected"
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
C>* 10.4.1.0/24 is directly connected, dummy302, 00:10:03
C>* 192.168.161.0/24 is directly connected, enp39s0, 00:10:03
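The idea of collapsing several addresses into one connected route can be sketched with the Python standard library. This is an illustration of the behavior shown above, not zebra's implementation; the function name `connected_networks` is hypothetical.

```python
import ipaddress


def connected_networks(addresses):
    """Collapse interface addresses to unique connected networks, in first-seen order."""
    seen = []
    for addr in addresses:
        # ip_interface() keeps the host bits; .network masks them off.
        network = ipaddress.ip_interface(addr).network
        if network not in seen:
            seen.append(network)
    return [str(network) for network in seen]


# IPv4 addresses configured on dummy302 in the output above.
dummy302 = [
    "10.4.1.1/24",
    "10.4.1.2/24",
    "10.4.1.3/24",
    "10.4.1.4/24",
    "10.4.1.5/24",
]

if __name__ == "__main__":
    print(connected_networks(dummy302))
```

Addresses that share a prefix but differ in mask length still yield separate networks, which matches the "per network mask" wording of the commit title.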
ldpd: make allowing broken-lsps to be installed with pop operation configurable
If LDP is misconfigured in a setup and the router has LSPs with no remote
label, this code installs the LSP with a pop instruction of the top-level
label so the packet can be forwarded using IP. This is a best-effort
attempt to deliver labeled IP packets to their final destination instead of
dropping them. If this config is turned off the code will only install
LSPs that have a valid remote label.
ospf6d: Send Link LSAs when interface priority is changed
As per the ospfv3 conformance test 24.3:
SETUP: Configure DIface-0 with priority set to <hprty>.
ANVL: Establish full adjacency with DUT for neighbor Rtr-0-A on DIface-0.
DUT: Exchange all the <OSPF-DD> packets, during adjacency establishment.
ANVL: Verify that the received <OSPF-DD> packets contain:
• one header of Link-LSA, originated by DUT.
ANVL: Send <OSPF-LSR> packet from neighbor Rtr-0-A to DIface-0 containing:
• One Request Tuple for Link-LSA originated by DUT.
ANVL: Listen (for up to 2 * <RxmtInterval> seconds) on DIface-0.
DUT: Send <OSPF-LSU> packet.
ANVL: Verify that the received <OSPF-LSU> packet contains:
• one Link-LSA, originated by DUT, contains:
  • Rtr Pri field set to <hprty>.
----------
When the interface priority is changed, Link LSAs should be transmitted
with the priority set.
When the link priority changes, the drbdr algorithm is called, which
can change the state of the interface. But if the state does not
change, then Link LSAs are not transmitted.
This PR fixes this issue. If the state changes, Link LSAs
will be transmitted anyway. With this change they are also
transmitted when the state does not change.
David Lamparter [Tue, 16 Mar 2021 10:03:44 +0000 (11:03 +0100)]
lib: rework how we "override" assert()
The previous method, using zassert.h and hoping nothing includes
assert.h (which, on glibc at least, just does "#undef assert" and puts
its own definition in...) was fragile - and actually broke undetected.
Just provide our own assert.h and control overriding by putting it in a
separate directory to add to the include path (or not.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donald Sharp [Sun, 2 May 2021 11:39:36 +0000 (07:39 -0400)]
lib: Provide some better error handling for operator
When an operator encounters a situation where the number
of open FDs is greater than what we have been configured
to legitimately handle via ulimit or the `--limit-fds` command
line option, abort with a message that lets them
debug and figure out what is going on.
Fixes: #8596
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
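The kind of check described in this commit can be sketched as below. This is not the FRR code: the function name `check_fd_within_limit` and the message wording are hypothetical, and the sketch raises an exception where the commit describes aborting.

```python
def check_fd_within_limit(fd: int, limit: int) -> None:
    """Raise a descriptive error if fd cannot be handled under the configured limit."""
    if fd >= limit:
        # Name both the offending FD and the limit so the operator
        # can see what to adjust.
        raise RuntimeError(
            f"FD {fd} is not below the configured limit of {limit}; "
            "check the process FD limit or the --limit-fds option"
        )


if __name__ == "__main__":
    check_fd_within_limit(10, 1024)  # fine, returns silently
    try:
        check_fd_within_limit(2048, 1024)
    except RuntimeError as err:
        print(err)
```

On Unix the effective limit could be read with `resource.getrlimit(resource.RLIMIT_NOFILE)`; it is passed in explicitly here to keep the sketch deterministic.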
Donald Sharp [Fri, 30 Apr 2021 23:24:40 +0000 (19:24 -0400)]
bgpd: Delay setting peer data until after decision to allow open
Delay setting local data about a remote peer until after BGP
has decided to allow an open connection to proceed.
Modifying local peer data structures based upon what is
received from a peer should not be done until after BGP
has decided that the open is allowed to proceed.
Donald Sharp [Tue, 27 Apr 2021 11:15:26 +0000 (07:15 -0400)]
zebra: Allow interface up events to read speed
Initially the speed of an interface was read
upon interface creation, and reading continued until the speed of the link
settled down to a single value. The speed of an interface
can also change when a new optic is inserted that
changes the speed, in which case FRR would see an interface
down (optic removal) and then an interface up (optic insertion).
In this case FRR would not treat this as an event that changed
the speed. Let's expand the checking a bit more.
When enabling TI-LFA the forward SPF for neighbors adjacent to the
PLR is computed. Later, when computing the PQ spaces, the reverse
SPF trees for those adjacent neighbors affected by the protected
interface are computed.
When node protection is enabled, TI-LFA link protection is run
immediately afterwards to compute repairs in case no
node-protecting backup path exists. In this second run, the
existing code tries to compute the reverse SPF tree for the same
node, without freeing the SPF tree of the prior run.
This patch fixes this by not computing the reverse SPF again, thus
avoiding a memory leak and an unnecessary SPF run.
Philippe Guibert [Fri, 12 Mar 2021 13:32:53 +0000 (14:32 +0100)]
zebra: collect gre information and push it when needed
- gre keys are collected and stored locally.
- when gre source set is requested, and the link interface
configured is different, the gre information collected is
pushed in the query, namely source ip or gre keys if present.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 11 Mar 2021 14:33:41 +0000 (15:33 +0100)]
zebra: new dplane action to set gre link interface
This action is initiated by nhrp and was stubbed when
moving to zebra. Now, a netlink request is built to set
the link interface of a gre interface if that gre interface
does not already have a link interface.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Fri, 20 Dec 2019 10:10:34 +0000 (11:10 +0100)]
nhrpd: redirect netlink gre with zebra
As zebra has a new api for the gre get and set gre source commands,
the netlink gre get and netlink gre source function calls are redirected to zebra
by using the zapi interface.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Fri, 20 Dec 2019 09:34:01 +0000 (10:34 +0100)]
zebra: add 3 new gre commands, and enforce synchro mechanism
3 new gre commands are available:
- GRE_GET to permit a daemon to retrieve gre information.
- GRE_UPDATE is the reply message from zebra to the daemon. As it is a
synchronous request, the reply expected for GRE_GET has to match the vrf id
for which the gre information is requested. This has an impact on the label
manager, with a change in APIs.
- SET_GRE_SOURCE. This command will be stubbed for now, assuming that
the gre interface is set accordingly by an external script.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 19 Dec 2019 17:33:56 +0000 (18:33 +0100)]
zebra: storage of gre information in zebra layer
zebra is able to get information about gre tunnels.
A zebra_gre file is created to handle hooks, but is not yet used.
Also, a debug zebra gre command is added to enable gre traces.
The zebra_gre file is meant for complementary actions that may be needed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 29 Apr 2021 10:02:47 +0000 (12:02 +0200)]
zebra: at startup, fix links on all namespaces
When zebra has its vrf backend mapped to namespaces, the polling
of interfaces fixes up all linkages of interfaces. This
was not done for non-default namespaces. Do it for the other namespaces as well.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>