bgpd: Do not try to redistribute routes if we are shutting down
When switching `router bgp`, `no router bgp` and doing redistributions, we should
ignore this action, otherwise memory leak happens:
```
Indirect leak of 400 byte(s) in 2 object(s) allocated from:
0 0x7f81b36b3a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
1 0x7f81b327bd2e in qcalloc lib/memory.c:105
2 0x55f301d28628 in bgp_node_create bgpd/bgp_table.c:92
3 0x7f81b3309d0b in route_node_new lib/table.c:52
4 0x7f81b3309d0b in route_node_set lib/table.c:61
5 0x7f81b330be0a in route_node_get lib/table.c:319
6 0x55f301ce89df in bgp_redistribute_add bgpd/bgp_route.c:8907
7 0x55f301dac182 in zebra_read_route bgpd/bgp_zebra.c:593
8 0x7f81b334dcd7 in zclient_read lib/zclient.c:4179
9 0x7f81b331d702 in event_call lib/event.c:1995
10 0x7f81b325d597 in frr_run lib/libfrr.c:1213
11 0x55f301b94b12 in main bgpd/bgp_main.c:505
12 0x7f81b2b57082 in __libc_start_main ../csu/libc-start.c:308
```
Useful to have it for datacenter profile only, disabled for traditional.
If the peer is not established or established, but has no description set,
we will show the FRR version instead, which is kinda handy to have instead of
nothing.
Zhiyuan Wan [Mon, 3 Apr 2023 08:21:15 +0000 (16:21 +0800)]
ospfd: Support show intra-area network type in 'show ip ospf route' command
User can now use 'show ip ospf route detail' command to distinguish
intra-area stub network and transit network.
Transit network will be displayed as 'N T prefix ...'.
NOTICE: Json output format has been changed, intra-area transit networks
will have a new attribute 'transit' and value is 'true'.
And 'adv' (means advertise router) change to 'advertisedRouter'.
Example output:
bsp-debianrt-exp1# show ip ospf route detail
Codes: N - network T - transitive
IA - inter-area E - external route
D - destination R - router
============ OSPF network routing table ============
N T 10.0.0.0/24 [32] area: 0.0.0.0
via 192.168.124.67, ens192
adv 10.0.0.5
N 10.0.30.0/24 [33] area: 0.0.0.0
via 192.168.124.67, ens192
adv 10.0.0.5
...
Donald Sharp [Mon, 17 Jul 2023 14:00:32 +0000 (10:00 -0400)]
zebra: Further handle route replace semantics
When an upper level protocol is installing a route X that needs to be
route replaced and at the same time the same or another protocol installs a
different route that depends on route X for nexthop resolution can leave
us with a state where the route is not accepted because zebra is still
really early in the route replace semantics ( route X is still on the work
Queue to be processed ) then the dependent route would not be installed.
This came up in the bgp_default_originate test cases frequently.
Further extendd the ROUTE_ENTR_ROUTE_REPLACING flag to cover this case
as well. This has come up because the early route processing queueing
that was implemented late last year.
Christian Hopps [Sat, 15 Jul 2023 04:04:05 +0000 (00:04 -0400)]
tests: backtraces/cores now fail tests
Previously we just raised an error for the test file with possibly all tests
passing. Now we fail any test that produces new backtraces or cores. This
just works a lot better with analysis and even CI.
Also be less verbose to the console after failure runs, just show error
level and above logs. The log files still capture all levels (DEBUG).
Christian Hopps [Fri, 14 Jul 2023 22:21:55 +0000 (18:21 -0400)]
vtysh: track and fix file-lock use in the workaround from 2004
There's a workaround in the code from a bug from back in 2004, it ends
and re-enters config mode anytime an `exit` is done from a level below
the top-level config node (e.g., from a `router isis` node). We need to
re-enter config mode with or without a lock according to how we actually
entered it to begin with.
Christian Hopps [Fri, 14 Jul 2023 20:13:54 +0000 (16:13 -0400)]
lib: mgmtd: only clear pending for the in-progress command
The lock/unlocks are being done short-circuit so they are never pending;
however, the handling of the unlock notification was always resuming the command
if pending was set. In all cases pending is set for another command. For example
implicit commit locks then when notified its done unlocks which was clearing the
set-config pending flag and resuming that command incorrectly.
Donald Sharp [Fri, 14 Jul 2023 16:14:20 +0000 (12:14 -0400)]
bgpd: Prevent use after free
When running bgp_always_compare_med, I am frequently seeing a crash
After running with valgrind I am seeing this and a invalid write
immediately after this as well.
==311743== Invalid read of size 2
==311743== at 0x4992421: route_map_counter_decrement (routemap.c:3308)
==311743== by 0x35664D: peer_route_map_unset (bgpd.c:7259)
==311743== by 0x306546: peer_route_map_unset_vty (bgp_vty.c:8037)
==311743== by 0x3066AC: no_neighbor_route_map (bgp_vty.c:8081)
==311743== by 0x49078DE: cmd_execute_command_real (command.c:990)
==311743== by 0x4907A63: cmd_execute_command (command.c:1050)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
==311743== by 0x1F7655: main (bgp_main.c:505)
==311743== Address 0x9ec2180 is 64 bytes inside a block of size 120 free'd
==311743== at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==311743== by 0x495A1BA: qfree (memory.c:130)
==311743== by 0x498D412: route_map_free_map (routemap.c:748)
==311743== by 0x498D176: route_map_add (routemap.c:672)
==311743== by 0x498D79B: route_map_get (routemap.c:857)
==311743== by 0x499C256: lib_route_map_create (routemap_northbound.c:102)
==311743== by 0x49702D8: nb_callback_create (northbound.c:1234)
==311743== by 0x497107F: nb_callback_configuration (northbound.c:1578)
==311743== by 0x4971693: nb_transaction_process (northbound.c:1709)
==311743== by 0x496FCF4: nb_candidate_commit_apply (northbound.c:1103)
==311743== by 0x496FE4E: nb_candidate_commit (northbound.c:1136)
==311743== by 0x497798F: nb_cli_classic_commit (northbound_cli.c:49)
==311743== by 0x4977B4F: nb_cli_pending_commit_check (northbound_cli.c:88)
==311743== by 0x49078C1: cmd_execute_command_real (command.c:987)
==311743== by 0x4907B44: cmd_execute_command (command.c:1068)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
==311743== by 0x1F7655: main (bgp_main.c:505)
==311743== Block was alloc'd at
==311743== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==311743== by 0x495A068: qcalloc (memory.c:105)
==311743== by 0x498D0C8: route_map_new (routemap.c:646)
==311743== by 0x498D128: route_map_add (routemap.c:658)
==311743== by 0x498D79B: route_map_get (routemap.c:857)
==311743== by 0x499C256: lib_route_map_create (routemap_northbound.c:102)
==311743== by 0x49702D8: nb_callback_create (northbound.c:1234)
==311743== by 0x497107F: nb_callback_configuration (northbound.c:1578)
==311743== by 0x4971693: nb_transaction_process (northbound.c:1709)
==311743== by 0x496FCF4: nb_candidate_commit_apply (northbound.c:1103)
==311743== by 0x496FE4E: nb_candidate_commit (northbound.c:1136)
==311743== by 0x497798F: nb_cli_classic_commit (northbound_cli.c:49)
==311743== by 0x4977B4F: nb_cli_pending_commit_check (northbound_cli.c:88)
==311743== by 0x49078C1: cmd_execute_command_real (command.c:987)
==311743== by 0x4907B44: cmd_execute_command (command.c:1068)
==311743== by 0x490801F: cmd_execute (command.c:1217)
==311743== by 0x49C5535: vty_command (vty.c:551)
==311743== by 0x49C7459: vty_execute (vty.c:1314)
==311743== by 0x49C97D1: vtysh_read (vty.c:2223)
==311743== by 0x49BE5E2: event_call (event.c:1995)
==311743== by 0x494786C: frr_run (libfrr.c:1204)
Effectively the route_map that is being stored has been freed already
but we have not cleaned up properly yet. Go through and clean the
code up by ensuring that the pointer actually exists instead of trusting
it does when doing the decrement operation.
Philippe Guibert [Thu, 11 May 2023 15:11:14 +0000 (17:11 +0200)]
bgpd: upon if event, evaluate bnc with matching nexthop
In BGP, when an interface event is detected or triggered,
the BNC that have a next-hop that matches the interface
are not evaluated.
The paths attached to the bnc context are evaluated in the
following situation:
- In the up event case, if at least one next-hop interface
matched the event interface.
- In the down event case, if there is no alternate next-hop
that does not use the event interface.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Thu, 11 May 2023 15:10:58 +0000 (17:10 +0200)]
bgpd: rename bnc->ifindex to bnc->ifindex_ipv6_ll
This commit changes the 'ifindex' name of the bnc structure.
As it is used only to handle ipv6 link local addresses, let
us use the 'ifindex_ipv6_ll' naming to avoid any confusions
with the ifindex value of the resolved next-hops of the bnc
structure.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
lib: fix on-match when added to existing route-map entry
Currently, "on-match (next|goto)" only works if already present in a
route-map entry when the route-map is applied to the routes. However, if
the command is added to an existing route-map entry, the route-map is
not reapplied to the routes in order to accommodate the changes. And
service restart is needed. The problem is that setting the command
doesn't signal about the change to the listener (i.e. to a routing
daemon).
With this fix, signal to the listener about addition of "on-match
(next|goto)" to a route-map entry.
Signed-off-by: Alexander Chernavin <achernavin@netgate.com>
anlan_cs [Mon, 22 May 2023 10:32:23 +0000 (18:32 +0800)]
zebra: Fix wrong vrf change procedure
Currently the vrf change procedure for the deleted interface is after
its deletion, it causes problem for upper daemons.
Here is the problem of `bgp`:
After deletion of one **irrelevant** interface in the same vrf, its
`ifindex` is set to 0. And then, the vrf change procedure will send
"ZEBRA_INTERFACE_DOWN" to `bgpd`.
Normally, `bgp_nht_ifp_table_handle()` should igore this message for
no correlation. However, it wrongly matched `ifindex` of 0, and removed
the related routes for the down `bnc`.
Adjust the location of the vrf change procedure to fix this issue.
Donald Sharp [Wed, 5 Jul 2023 13:28:50 +0000 (09:28 -0400)]
ospf6d: Convert ospf6_lsa_unlock to a better api
Make the ospf6_lsa_unlock take the same parameters
that the ospf_lsa_unlock does to make it consistent
and to also ensure that no-one can make the mistake
of getting the pointer cleared up.
Farid Mihoub [Tue, 7 Mar 2023 09:59:14 +0000 (10:59 +0100)]
topotests: add basic bgp bmp test
Test BMP messages logging.
Configure the BMP monitoring policy, add and withdraw IPv4/v6 prefixes.
Check if the received messages coincide with the monitored router BGP
session activity.
anlan_cs [Wed, 17 Aug 2022 07:26:24 +0000 (03:26 -0400)]
zebra: remove unnecessary check for default vrf
The default vrf is generally non-NULL, except when shutdown. So, most
of the time it is not necessary to check if it is NULL, we should
remove the useless checks for it.
Searched them with exact match:
```
grep -rI "zebra_vrf_lookup_by_id(VRF_DEFAULT)" | wc -l
31
```
Mark Stapp [Wed, 28 Jun 2023 19:33:57 +0000 (15:33 -0400)]
tests: add route-install test using NHRP tunnel
Add a test-case to the NHRP test that installs routes over the
NHRP tunnel endpoint routes. This confirms that zebra will use
NHRP routes when validating incoming routes from other daemons
(sharpd in this test).
Mark Stapp [Wed, 28 Jun 2023 12:11:41 +0000 (08:11 -0400)]
zebra: use NHRP routes as valid in nexthop check
Treat NHRP-installed routes as valid, as if they were
CONNECTED routes, when checking candidate routes'
nexthops for validity. This allows use of NHRP by an
IGP, for example, that doesn't normally want recursive
nexthop resolution.
Donald Sharp [Mon, 10 Jul 2023 14:40:38 +0000 (10:40 -0400)]
bgpd: Fix table manager to use the synchronous client
bgp_zebra_tm_connect calls bgp_zebra_get_table_range which
just used the global zclient. Which of course still had
us exposing the global zclient to read and drop important
data from zebra. This fixes commit 787c61e03c760ffdb422bfc44c72d83fb451e0c8
Donald Sharp [Mon, 10 Jul 2023 14:40:38 +0000 (10:40 -0400)]
bgpd: Fix table manager to use the synchronous client
bgp_zebra_tm_connect calls bgp_zebra_get_table_range which
just used the global zclient. Which of course still had
us exposing the global zclient to read and drop important
data from zebra. This fixes commit 787c61e03c760ffdb422bfc44c72d83fb451e0c8
David Schweizer [Mon, 10 Jul 2023 13:46:05 +0000 (15:46 +0200)]
tests: fix BGP delayopen timer expiration test
The changes allow the test to correctly pass in case the connection
between two peers is be established in less than 0.5 seconds after the
delayopen timer expires.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Donald Sharp [Mon, 3 Jul 2023 20:12:16 +0000 (16:12 -0400)]
tests: bgp_flowspec expand timings
Attempt to set the hold time in the bgp flowspec exabgp
config. In addition it was noticed that upstream bgp_flowspec
tests are still not negotiating peering within the time frame
specified. This is because the first tcp packet is missed
and no keepalive/hold time are negotiated and exabgp will
not attempt a reconnect for quite some time. Make this
test slower when things go south.
Donald Sharp [Sat, 1 Jul 2023 17:30:12 +0000 (13:30 -0400)]
tests: isis_tilfa_topo1 fails sometimes due to insufficient time
The isis_tilfa_topo1 test is failing because insufficient time was
given for isis to converge on the system under system load. Extend
the time and decrease the hello-interval timers to give it more
of a chance to converge.
Donald Sharp [Sat, 1 Jul 2023 17:27:52 +0000 (13:27 -0400)]
tests: zebra_rib route-map run times fixup
I have a local test run where the sharp route-map usage
was being checked for 5 seconds. I saw usages going up
for each 1 second check and the 5th one was at 497 out
of 500. Looks like the system was really loaded. Let's
give it more time to coalesce under heavy load.