Donald Sharp [Thu, 2 Feb 2023 19:13:12 +0000 (14:13 -0500)]
bgpd: Don't try to recursively hold peer io mutex
BGP was modified in a0b937de428e14e869b8541f0b7810113d619c2e
to grab the peer->io_mtx before validating the header to ensure
that the input Queue was not being modified by anyone else at that
moment in time. Unfortunately validate_header can detect a problem
and attempt to relock the mutex, which deadlocks. This deadlock in
the bgp_io pthread is the lone deadlock at first, eventually though
bgp attempts to write another packet to the peer( say when the
it's time to send the next packet ) and the main pthread of bgpd
becomes deadlocked and then the whole bgpd process is stuck at that
point in time leaving us dead in the water.
The point of locking the mutex earlier was to ensure that the input
Queue wasn't being modified by anyone else, (Say reading off it )
as that we wanted to ensure that we don't hold more packets then necessary.
Let's grab the mutex long enough to look at the input Q size, this
ensure that we have room and then we can validate_header and do the right
thing from there. We'll need to lock the mutex when we actually move it
into the input Q as well.
Fixes: #12725 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Thu, 2 Feb 2023 15:40:07 +0000 (10:40 -0500)]
bgpd: Convert evpn output to not pretty print json
Commit: 3cdb03fba7b40240fb38469a12b7b05a11043e09
changed the vty_json output to not be pretty printing.
The previous commit in the tree added vty_json_no_pretty
let's use that instead
Donald Sharp [Thu, 2 Feb 2023 15:28:19 +0000 (10:28 -0500)]
lib, bgpd: Add ability to specify that some json output should not be pretty
Initial commit: 23b2a7ef524c9fe083b217c7f6ebaec0effc8f52
changed the json output of `show bgp <afi> <safi> json` to
not have pretty print because when under a situation where
there are a bunch of routes with a large scale ecmp show
output was taking forever and this commit cut 2 minutes out
of vtysh run time.
When upgrading to latest version the long run time was noticed
due to testing. Let's add back this functionality such that
FRR can have reduced run times with vtysh when it's really
needed.
Donatas Abraitis [Tue, 24 Jan 2023 08:32:13 +0000 (10:32 +0200)]
bgpd: Set attr to NULL when passing NLRI_UPDATE with treat-as-withdraw
Before this patch, we always passed `struct attr` for NLRI_UPDATE, but if we
have a situation with treat-as-withdraw (for example: malformed attribute, or
using a command like `neighbor path-attribute treat-as-withdraw`) the route
MUST be withdrawn form the BGP table.
Hence, we MUST pass attr as NULL, in this case we already have this check
under NLRI_ATTR_ARG() macro, just reuse it properly.
Donald Sharp [Thu, 19 Jan 2023 20:46:31 +0000 (15:46 -0500)]
tests: zebra_rib remove a sleep
The test was sometimes failing around the sleep(4) for
waiting for the routes to be installed. Instead of blindly
sleeping let's check to see that the routes are actually
there in zebra and then continue on.
Donald Sharp [Mon, 30 Jan 2023 18:53:44 +0000 (13:53 -0500)]
zebra: Send nht resolved entry up to concerned protocols in all cases
There existed the idea, from Volta, that a nexthop group would not have
the same nexthops installed -vs- what FRR actually sent down. The
dplane would notify you.
The flag ROUTE_ENTRY_USE_FIB_NHG flag was being used
to control which set was being sent up to concerned parties
in nexthop tracking. Put this flag behind the wall and
do not necessarily set it when we receive a data plane
notification about a route being installed or not.
Fixes: #12706 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Donald Sharp [Mon, 30 Jan 2023 21:02:23 +0000 (16:02 -0500)]
bgpd: bgp_update and bgp_withdraw never return failures
These two functions always return 0. As such any and all
tests against this make no sense. Remove the return 0
to a void and follow the chain, logically, to remove all
the dead code.
Donald Sharp [Sat, 28 Jan 2023 14:32:34 +0000 (09:32 -0500)]
babeld: During intf startup, ignore address already in use
When listening on a multicast group. No need to actually
fail the operation when it's already being used.
Let's not treat the Address already in use error message
as one that is stopping everything from working. Especially
since multiple interface events cause this to happen.
Without this, if config is read in before full connection
to zebra, babel will never establish neighbors.
Trey Aspelund [Wed, 25 Jan 2023 18:25:20 +0000 (13:25 -0500)]
bgpd: move tunnel-ip comparison into handler
Moves the old/new IP comparison into handle_tunnel_ip_change instead of
expecting the caller to do the check on their own.
Also changes handle_tunnel_ip_change to return void since it only ever
returned 0 in all cases.
Trey Aspelund [Wed, 25 Jan 2023 18:07:43 +0000 (13:07 -0500)]
bgpd: only unimport routes if tunnel-ip changes
When processing a new local VNI, we were always walking the global EVPN
table to look for routes that needed to be removed due to a martian
nexthop change (specifically a tunnel-ip change).
Since the martian TIP table is global (all VNIs) + the walk is also in
the global table (all VNIs), we can trust that any new TIP from any VNI
would result in routes getting removed from the global table and
unimported from all live (L2)VNIs.
i.e.
The only time this update is actionable is if we are adding/removing an
IP from the martian TIP table, and we do not need to walk the table for
normal refcount adjustments.
David Lamparter [Thu, 26 Jan 2023 13:53:47 +0000 (14:53 +0100)]
*: no-warn pragmas for non-const format strings
We do use non-constant/literal format strings in a few places for more
or less valid reasons; put `ignored "-Wformat-nonliteral"` around those
so we can have the warning enabled for everywhere else.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
1. Renamed "gates" to "nexthops"
2. Displaying afi of the nexthops being dispalyed in place of
"nexthops" JSON object in the old JSON output
3. Calling show_route_nexthop_helper() and show_nexthop_json_helper()
instead of print_nh() inorder to keeps the fields in "nexthops"
JSON object in sync with "nexthops" JSON object of
"show nexthop-group rib json".
Updated vtysh:
r1# show ip nht
192.168.0.2
resolved via connected
is directly connected, r1-eth0 (vrf default)
Client list: static(fd 28)
192.168.0.4
resolved via connected
is directly connected, r1-eth0 (vrf default)
Client list: static(fd 28)
David Lamparter [Mon, 6 Apr 2020 17:28:56 +0000 (19:28 +0200)]
debian: make cross-compile work
This allows e.g. "sbuild --host=arm64" to build packages for other
architectures on, say, fat amd64 servers. As a side effect, the Debian
build uses a separate builddir, which helps noting issues on that front.
David Lamparter [Tue, 24 Jan 2023 17:17:05 +0000 (18:17 +0100)]
yang: fix race condition in embedmodel.py mkdir
Parallel build may be executing another copy of embedmodel.py at the
same time, with both getting "False" on the isdir check, and then both
trying to mkdir - one of which will error out.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Tue, 24 Jan 2023 16:45:13 +0000 (17:45 +0100)]
build: consistently mkdir -p output for redirect
When running the build in a separate build directory, redirecting output
into a file can error out if the directory does not exist yet. Some
places already had `mkdir -p` calls, but not all.
Make all occurences of this consistently use `@$(MKDIR_P)`.
(Extension of PR #12575 to catch more places.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Daemons like staticd already implement nexthop zapi (un)registration. b7ca809d1c ("lib: BFD automatic source selection") has implemented a
concurrent nexthop (un)registration. Some nexthop could be unregistred
by the bfd whereas they were still needed by the daemon.
Let the deamons deal with nexthop zapi (un)registration.