Quentin Young [Mon, 26 Apr 2021 22:59:48 +0000 (18:59 -0400)]
bgpd: avoid allocating very large stack buffer
As pointed out on code review of BGP extended messages, increasing the
maximum BGP message size has the consequence of growing the dynamically
sized stack buffer up to 650K. While unlikely to exceed modern stack
sizes it is still unreasonably large. Remedy this with a heap buffer.
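
A minimal sketch of the stack-to-heap change described above, under stated assumptions: `example_process_message` and `EXAMPLE_MAX_MSG_SIZE` are hypothetical names, and plain `malloc()`/`free()` stand in for whatever allocator the daemon actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit, chosen only to illustrate a large message size. */
#define EXAMPLE_MAX_MSG_SIZE 65535

/*
 * Copy `len` bytes of a message into a scratch buffer and return the sum
 * of its bytes. The scratch buffer lives on the heap, so its size does
 * not grow the stack frame. Returns -1 on a bad length or on allocation
 * failure.
 */
long example_process_message(const uint8_t *msg, size_t len)
{
	if (len > EXAMPLE_MAX_MSG_SIZE)
		return -1;

	/* Previously this could have been: uint8_t scratch[len]; */
	uint8_t *scratch = malloc(len ? len : 1);
	if (scratch == NULL)
		return -1;

	memcpy(scratch, msg, len);

	long sum = 0;
	for (size_t i = 0; i < len; i++)
		sum += scratch[i];

	free(scratch);
	return sum;
}
```

The trade-off is one allocation per call in exchange for a small, fixed-size stack frame regardless of the message limit.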
rgirada [Fri, 19 Feb 2021 04:15:40 +0000 (20:15 -0800)]
lib: Routemap is not getting applied upon changing the routemap action
Description:
This looks broken after the NB changes in routemap. When a routemap
action is modified from permit to deny, the new action is expected
to be applied to the filtered routes before the action in the
routemap data structure has been changed. Currently this is
not handled by the corresponding northbound API.
Quentin Young [Wed, 23 Sep 2020 19:31:52 +0000 (15:31 -0400)]
bgpd, zebra: encode ip addr len as uint16
This is always a 16 bit unsigned value.
- signed int is the wrong type to use
- encoding a signed int as a uint32 is bad practice
- decoding a signed int encoded as a uint32 into a uint16 is bad
practice
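
As an illustration of keeping the type consistent on both sides of the wire, here is a small sketch that writes and reads a 16-bit unsigned length in network byte order. The helpers `example_put_u16` and `example_get_u16` are hypothetical and do not use FRR's own stream API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit unsigned value in network byte order.
 * Returns the number of bytes written. */
size_t example_put_u16(uint8_t *buf, uint16_t value)
{
	buf[0] = (uint8_t)(value >> 8);
	buf[1] = (uint8_t)(value & 0xff);
	return 2;
}

/* Read back a 16-bit unsigned value written by example_put_u16(). */
uint16_t example_get_u16(const uint8_t *buf)
{
	return (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
}
```

Because encoder and decoder share the same `uint16_t` type, no sign conversion or silent truncation can occur between them.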
Igor Ryzhov [Tue, 27 Apr 2021 23:52:58 +0000 (02:52 +0300)]
tests: fix topotest polling log
The current log prints the maximum wait time, which is not actually
correct because it doesn't include the command execution time. We usually
see a "failed after X seconds" log with X being far longer than this maximum.
Igor Ryzhov [Fri, 23 Apr 2021 18:36:12 +0000 (21:36 +0300)]
bgpd: fix bgp_get_vty return values
There are multiple problems:
- commit ef7c53e2 introduced a new return value 2 which broke things,
because a lot of code treats non-zero return as an error,
- there is an incorrect error returned when AS number mismatches.
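
The first problem can be illustrated with a sketch of one possible convention; this is not necessarily how the commit resolves it. Zero means success, a negative value means error, and "newly created" is reported through an out-parameter instead of a positive return code. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { EXAMPLE_OK = 0, EXAMPLE_ERR_AS_MISMATCH = -1 };

struct example_instance {
	unsigned int as;
	bool in_use;
};

/*
 * Look up or create the single instance stored in `slot`.
 * Returns EXAMPLE_OK on success and sets `*created` to tell the caller
 * whether a new instance was made. Returns a negative error when an
 * instance exists with a different AS number.
 */
int example_get_instance(struct example_instance *slot, unsigned int as,
			 bool *created)
{
	*created = false;

	if (slot->in_use) {
		if (slot->as != as)
			return EXAMPLE_ERR_AS_MISMATCH;
		return EXAMPLE_OK;
	}

	slot->as = as;
	slot->in_use = true;
	*created = true;
	return EXAMPLE_OK;
}
```

Callers that only test for a non-zero return keep working, and callers that care about creation read the flag.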
Fredi Raspall [Thu, 18 Feb 2021 22:45:08 +0000 (23:45 +0100)]
ldpd: defer register for info until configured
Instead of registering to receive default-VRF information and routes
when first connected to zebra, defer the registration until some ldp
configuration is entered.
This avoids redistributing IPv4/IPv6 routes to ldpd when not needed.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Donald Sharp [Sat, 24 Apr 2021 03:50:31 +0000 (23:50 -0400)]
bgpd: Prevent race condition loss of config
If BGP is partway through reading in a config file for a neighbor,
*and* the neighbor is coming up, *and* we have a doppelganger, there
is a race condition: when we transfer the config from the doppelganger
to the config peer, we overwrite later config, because we are copying
the config data from the doppelganger peer (which was captured at the
start of initiation of the peering).
From what I can tell the peer->af_flags variable is to hold configuration
flags for the local peer. The doppelganger should never overwrite this.
Igor Ryzhov [Fri, 23 Apr 2021 22:32:53 +0000 (01:32 +0300)]
tests: fix bfd-bgp-cbit-topo3 test
This test is completely incorrect on test_bfd_loss_intermediate step.
It shuts down the interface and then "waits" for the BGP session to
fail. But instead of actually waiting, it compares the output of "show bfd
peers" with the "up" state. As it does this comparison right after the
interface shutdown, the BFD session has not yet failed and the comparison
is always successful, except in very rare cases when the command takes a
lot of time to execute (due to heavy load on the CI system, I suppose).
David Lamparter [Fri, 23 Apr 2021 13:17:07 +0000 (15:17 +0200)]
ldpd: set `frr_is_after_fork` in lde/ldpe
These subprocesses don't use frr_config_fork(), so frr_is_after_fork is
never set. While the frr_pthread stuff isn't currently used there, set
the flag anyway to avoid future headaches.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donald Sharp [Thu, 22 Apr 2021 19:04:15 +0000 (15:04 -0400)]
isisd: Remove warnings and add some data to debugs for isis_csm.c
Running isis without enabling it on all interfaces results
in a bunch of warn messages in the log about circuit state
changes. These warn messages also didn't tell the end user
which interface was causing them. Since the operator cannot
do anything with these warn messages, nor should they in the
vast array of normal operations, modify the code to use event
debugging and turn the warns into debugs.
Additionally add some information to clue the operator
in on which interface we are talking about.
The code processing an NHT update was only resetting the BGP_NEXTHOP_VALID
flag, so labeled nexthops were considered valid even if there was no
nexthop. Reset the flag in response to the update, and also make the
isvalid_nexthop functions a little more robust by checking the number
of nexthops.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Stephen Worley [Thu, 22 Apr 2021 21:21:12 +0000 (17:21 -0400)]
zebra: handle gracefulRS/retain with proto NHGs
Properly handle refcounting of Proto-owned NHGs when
zebra is operating under graceful restart and retain
conditions.
We have an extra refcnt of 1 we keep for proto-owned NHGs to
indicate the upper level proto has created and owns it.
When we are reading these in from the kernel, we need to set them
to 1 as appropriate. Without this, we fail in the assert() during
zebra_nhg_proto_add() after the owning daemon resends the NHG
and the refcnts are off by one.
Also add in the same logic we use for routes when sweeping with
respect to uptimes.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Donald Sharp [Thu, 22 Apr 2021 19:47:37 +0000 (15:47 -0400)]
tests: Remove kill_mininet_router_process
This function kills all processes that happen to have the same
name as FRR processes, and it was only ever used in the setup.
Setup should not be used to kill old runs. That should be a
separate process.
Igor Ryzhov [Thu, 22 Apr 2021 12:24:49 +0000 (15:24 +0300)]
lib: remove enabled flag for bfd sessions
Currently this flag is only helpful in an extremely rare situation when
the BFD session registration was unsuccessful and after that zebra is
restarted. Let's remove this flag to simplify the API. If we ever want
to solve the problem of unsuccessful registration/deregistration, this
can be done using internal flags, without API modification.
Also add an error log to help the user understand why the BFD session
is not working.
David Lamparter [Thu, 22 Apr 2021 10:10:27 +0000 (12:10 +0200)]
lib: hard-fail creating threads before fork()
Creating any threads before we fork() into the background (if `-d` is
given) is an extremely dangerous footgun; the threads are created in
the parent and terminated when that exits.
This is extra dangerous because while testing, you'd often run the
daemon in foreground without `-d`, and everything works as expected.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
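
A rough sketch of such a guard, under stated assumptions: every name is hypothetical, the thread start routine is passed in as a plain callback so the example stays portable, and the guard returns an error where a real daemon might abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Set once the process has finished forking into the background. */
static bool example_after_fork;

void example_set_after_fork(void)
{
	example_after_fork = true;
}

/*
 * Refuse to start a thread until the fork has happened. `spawn` stands
 * in for the real thread creation call. Returns -1 when called too
 * early, otherwise whatever `spawn` returns.
 */
int example_spawn_guarded(int (*spawn)(void *), void *arg)
{
	if (!example_after_fork)
		return -1;
	return spawn(arg);
}

/* Stand-in "thread creation" that just counts how often it ran. */
int example_fake_spawn(void *arg)
{
	*(int *)arg += 1;
	return 0;
}
```

Failing at the creation call makes the mistake visible in foreground runs too, where the threads would otherwise appear to work.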
David Lamparter [Thu, 22 Apr 2021 11:18:19 +0000 (13:18 +0200)]
lib: add frr_config_pre hook
... for any initialization that needs to run after forking, but that
would be racy if it were just scheduled on the thread_master (since the
config load is also just a thread callback, ordering would be undefined
for another scheduled thread callback.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Mark Stapp [Thu, 1 Apr 2021 15:56:30 +0000 (11:56 -0400)]
zebra: include inner labels with recursive backups
When capturing backup nexthops with recursive resolution,
ensure that inner labels from the recursive nexthop are
included in each backup (as they are with the resolving
primary nexthops).
David Lamparter [Thu, 8 Apr 2021 11:35:09 +0000 (13:35 +0200)]
lib: correctly exit CLI nodes on file config load
The (legacy) code for reading split configs tries to execute config
commands in parent nodes, but doesn't call the node_exit function when
it goes up to a parent node. This breaks BGP RPKI setup (and extended
syslog, which is in the next commit.)
Doing this correctly is a slight bit involved since the node_exit
callbacks should only be called if the command is actually executed on a
parent node.
Signed-off-by: David Lamparter <equinox@diac24.net>
David Lamparter [Sat, 10 Apr 2021 19:02:06 +0000 (21:02 +0200)]
lib: fix possible assert() fail in zlog_fd()
If the last message in a batched logging operation isn't printed due to
priority, this skips the code that flushes prepared messages through
writev() and can trigger the assert() at the end of zlog_fd().
Since any logmsg above info priority triggers a buffer flush, running
into this situation requires a log file target configured for info
priority, at least 1 message of info priority buffered, a debug message
buffered after that, and then a buffer flush (explicit or due to buffer
full).
I haven't seen this chain of events happen in the wild, but it needs
fixing anyway.
Signed-off-by: David Lamparter <equinox@diac24.net>
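
The failure pattern can be sketched with a simplified model; this is not the actual zlog_fd() code. Messages are batched, and the flush check has to sit outside the priority filter so that a skipped final message still flushes what was prepared before it. All names are hypothetical, and a counter stands in for the writev() call.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_BATCH 4

/* syslog-style priority: a lower number is more severe. */
struct example_msg {
	int prio;
};

/*
 * "Write" all messages with prio <= max_prio in batches and return how
 * many were written. No prepared message may remain once the loop ends.
 */
size_t example_write_batch(const struct example_msg *msgs, size_t n,
			   int max_prio)
{
	size_t pending = 0, written = 0;

	for (size_t i = 0; i < n; i++) {
		if (msgs[i].prio <= max_prio)
			pending++; /* stands in for filling one iovec slot */

		/* Checked for every message, including skipped ones. */
		if (pending > 0 && (pending == EXAMPLE_BATCH || i + 1 == n)) {
			written += pending; /* stands in for writev() */
			pending = 0;
		}
	}

	assert(pending == 0);
	return written;
}
```

If the flush check were nested inside the priority test, an info message followed by a final debug message would leave `pending` at 1 and trip the closing assert.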
David Lamparter [Wed, 21 Apr 2021 09:54:48 +0000 (11:54 +0200)]
build: properly split CFLAGS from AC_CFLAGS
`CFLAGS` is a "user variable", not intended to be controlled by
configure itself. Let's put all the "important" stuff in AC_CFLAGS and
only leave debug/optimization controls in CFLAGS.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>