Quentin Young [Wed, 19 Feb 2020 21:52:00 +0000 (16:52 -0500)]
zebra: handle gr weirdness under libFuzzer
Two workarounds here. The #ifndef around assert(0) is to get around a
bug, in which a client that connects, announces GR capability,
disconnects, reconnects then sends anything other than a ZAPI hello will
hit the assert. GR resync is done in zread_hello(), so if a reconnecting
client doesn't send a hello then GR will notice that it's received a
disonnect for the same client twice in a row and assert. This is a bug,
GR should be able to handle that.
The rest of the code works around GR having a timer-based memory free.
Since we've disabled all thread.h code to increase determinism and avoid
mutex locks and other weirdness, clients repeatedly announcing GR
capability messages will result in a soft memleak.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Fri, 17 Jan 2020 15:23:24 +0000 (10:23 -0500)]
bgpd: clean up some libfuzzer / afl stuff
- Unify the paths a bit more
- Switch to maintaining the peer and not deleting it after each packet;
this increased coverage in zebra a lot, might help here
- Free processed packets, in libfuzzer case so we dont leak them
- Add check that size is at least BGP_HEADER_SIZE, validate_header()
assumes it
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Mon, 13 Jan 2020 20:25:02 +0000 (15:25 -0500)]
tweak configure options for --enable-libfuzzer
* Compile (but don't link) with libfuzzer support for all daemons - this
is -fsanitize=fuzzer-no-link
* Remove forcing ASAN for libfuzzer - better to control which sanitizer
you want with --enable-<sanitizer>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Mon, 13 Jan 2020 20:20:01 +0000 (15:20 -0500)]
bgpd: cleanup suboptimal AFL paths after libFuzzer
The AFL path through LLVMFuzzerTestOneInput is running a bit slow
because we are initializing BGP twice. Fix this. Also, since we know at
compile time whether we need to create a peer (libFuzzer) or use one
created already (AFL fork() case) we can save a branch in the hot path,
so let's do that.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Sat, 11 Jan 2020 03:20:33 +0000 (22:20 -0500)]
enable libfuzzer for bgpd
Wow that was painful
libFuzzer replaces main(), so while we can compile with
-fsanitize=fuzzer, we can't link with it unless we have a way to
undefine main(). So I've added a #define, FUZZING_LIBFUZZER, that
daemons who want to support libfuzzer need to guard their main() with.
This also means we can't use the SAN_FLAGS automake variable, since that
is included in both AM_CFLAGS and AM_LDFLAGS, to add -fsanitize=fuzzer
to. We need new daemon specific flags. Actually, we can add
-fsanitize=fuzzer-no-link to SAN_FLAGS, but we need daemon specific
LDFLAGS so we can control who links with -fsanitize=fuzzer. Aside from
replacing main(), LibFuzzer.a also needs to link to a user-defined
function called `LLVMFuzzerTestOneInput` which is the fuzzer entrypoint.
For bgpd, because libfuzzer is in-process, we now need to actuall clean
up after ourselves each fuzz run to avoid leaking memory. We also can't
touch global state. This also means we run slower because we have to
create and destroy a peer struct every iteration.
Finally I've almost certainly broken afl for now, will fix later.
Quentin Young [Sat, 11 Jan 2020 03:20:33 +0000 (22:20 -0500)]
enable libfuzzer for bgpd
Wow that was painful
libFuzzer replaces main(), so while we can compile with
-fsanitize=fuzzer, we can't link with it unless we have a way to
undefine main(). So I've added a #define, FUZZING_LIBFUZZER, that
daemons who want to support libfuzzer need to guard their main() with.
This also means we can't use the SAN_FLAGS automake variable, since that
is included in both AM_CFLAGS and AM_LDFLAGS, to add -fsanitize=fuzzer
to. We need new daemon specific flags. Actually, we can add
-fsanitize=fuzzer-no-link to SAN_FLAGS, but we need daemon specific
LDFLAGS so we can control who links with -fsanitize=fuzzer.
Also, compiling with libfuzzer also requires that you define a function
named LLVMFuzzerTestOneInput(). So I defined a stub version in libfrr.c
and added a macro to undefine it for daemons who actually implement it.
Now that I write it down this probably isn't necessary at all given the
previous paragraph. I think that function is only checked for at link
time.
For bgpd, because libfuzzer is in-process, we now need to actuall clean
up after ourselves each fuzz run to avoid leaking memory. We also can't
touch global state. This also means we run slower because we have to
create and destroy a peer struct every iteration.
Finally I've almost certainly broken afl for now, will fix later.
Quentin Young [Thu, 2 Jan 2020 07:02:31 +0000 (02:02 -0500)]
bgpd: enable deferred forkserver mode
Having narrowed down the issue with deferred mode to capability privs,
and having disabled those, we can now use afl deferred mode for...10x
performance gainz?!?!? zomg!
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Thu, 12 Dec 2019 19:53:06 +0000 (14:53 -0500)]
ospfd allow fuzzing LSUPD, LSACK, LSREQ packets
- Anything except HELLO wants a neighbor created, so do that
- Skip some unnecessary stuff
- Most stuff checks the LSDB and returns early, so skip those
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Thu, 21 Nov 2019 04:35:44 +0000 (23:35 -0500)]
bgpd: add checks performed by i/o code
Getting some false positives from AFL because we aren't performing
checks that are performed by the I/O thread before the packet processor
is ever invoked. Add those checks.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Fri, 12 Nov 2021 19:45:36 +0000 (14:45 -0500)]
doc: update & clarify language in process arch doc
There was a historical blurb at the top of the process architecture
document that in several instances caused some confusion regarding
whether or not FRR supports multithreading. Remove this paragraph and
replace it with a summary of the page contents.
Donald Sharp [Fri, 12 Nov 2021 15:52:03 +0000 (10:52 -0500)]
tests: Ensure BGP has had time to import routes through the vpn
Currently I get bgp_instance_del-test as well as bgp_l3vpn_to_bgp_vrf
failures every ~3-4 runs when under a 40 parallel run with micronet.
Examination of the failure and passing cases always leads to the
failures showing convergence of bgp bestpath immediately after
the show commands to ensure that the routes are there.
Modify the code to look for the fact that the vrf has
converged from routes being passed around across vrf's
and ensure that bestpath has run on them.
Igor Ryzhov [Fri, 12 Nov 2021 16:32:06 +0000 (19:32 +0300)]
bgpd: fix source-address for BFD sessions when using update-source IFNAME
When "update-source IFNAME" is used for the neighbor, p->update_source
is set to NULL, so we can't use it as a source address and should use
the address from p->su_local.
Donald Sharp [Thu, 11 Nov 2021 18:25:35 +0000 (13:25 -0500)]
ospfd: Prevent use after free on shutdown
Running ospf_topo_vrf1 leads us to this valgrind issue:
==2386518== Invalid read of size 8
==2386518== at 0x4971520: route_top (table.c:401)
==2386518== by 0x181F08: ospf_interface_bfd_apply (ospf_bfd.c:126)
==2386518== by 0x182069: ospf_interface_disable_bfd (ospf_bfd.c:158)
==2386518== by 0x18BF51: ospf_del_if_params (ospf_interface.c:557)
==2386518== by 0x18C584: ospf_if_delete_hook (ospf_interface.c:712)
==2386518== by 0x490CA0B: hook_call_if_del (if.c:61)
==2386518== by 0x490D1F3: if_delete_retain (if.c:286)
==2386518== by 0x490D337: if_delete (if.c:309)
==2386518== by 0x490CDED: if_destroy_via_zapi (if.c:200)
==2386518== by 0x49940A9: zclient_interface_delete (zclient.c:2237)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518== Address 0x5df39a0 is 0 bytes inside a block of size 56 free'd
==2386518== at 0x48399AB: free (vg_replace_malloc.c:538)
==2386518== by 0x492A03E: qfree (memory.c:141)
==2386518== by 0x4970C6F: route_table_free (table.c:141)
==2386518== by 0x4970A36: route_table_finish (table.c:61)
==2386518== by 0x18C543: ospf_if_delete_hook (ospf_interface.c:708)
==2386518== by 0x490CA0B: hook_call_if_del (if.c:61)
==2386518== by 0x490D1F3: if_delete_retain (if.c:286)
==2386518== by 0x490D337: if_delete (if.c:309)
==2386518== by 0x490CDED: if_destroy_via_zapi (if.c:200)
==2386518== by 0x49940A9: zclient_interface_delete (zclient.c:2237)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518== Block was alloc'd at
==2386518== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==2386518== by 0x4929EFC: qcalloc (memory.c:116)
==2386518== by 0x49709F8: route_table_init_with_delegate (table.c:53)
==2386518== by 0x49717F4: route_table_init (table.c:528)
==2386518== by 0x18C328: ospf_if_new_hook (ospf_interface.c:659)
==2386518== by 0x490C97D: hook_call_if_add (if.c:60)
==2386518== by 0x490CE85: if_create_name (if.c:223)
==2386518== by 0x490DF32: if_get_by_name (if.c:622)
==2386518== by 0x4993F73: zclient_interface_add (zclient.c:2186)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518==
Fix the ordering to do the individual node tree cleanup after we delete
the data we care about.
Donald Sharp [Thu, 11 Nov 2021 14:39:52 +0000 (09:39 -0500)]
*: use compiler.h MIN/MAX macros instead of everyone having one
We had various forms of min/max macros across multiple daemons
all of which duplicated what we have in compiler.h. Convert
everyone to use the `correct` ones
Table manager is allocated when the VRF is created, but it is freed when
the VRF is disabled. When this VRF is re-enabled, zebra ends up with
table manager being NULL pointer and it crashes on any dereference.
Table manager should be freed when the VRF is deleted, not when it's
disabled.
Igor Ryzhov [Wed, 3 Nov 2021 09:40:40 +0000 (12:40 +0300)]
isisd: use time_t for last update and last flap
These variables are only assigned with time() which returns time_t.
This should also fix occasional CI build failures because of comparisons
of signed and unsigned integers.
Igor Ryzhov [Fri, 5 Nov 2021 22:22:07 +0000 (01:22 +0300)]
lib: fix vrf deletion when the last interface is deleted
Currently, we automatically delete an inactive VRF when its last
interface is deleted. This code introduces a couple of crashes because
of the following problems:
- vrf_delete is called before calling if_del hook, so daemons may try to
dereference an ifp->vrf pointer which is freed
- in if_terminate, we continue to use the VRF in the loop condition
after the last interface is deleted
This check is needed only when the interface is deleted by the user,
because if the interface is deleted by the system, VRF must still exist
in the system. Move the check to appropriate places to fix crashes.
Igor Ryzhov [Sat, 6 Nov 2021 00:36:48 +0000 (03:36 +0300)]
zebra: fix netns deletion
We don't receive interface down/delete notifications from kernel when a
netns is deleted. Therefore we have to manually replicate the necessary
actions, otherwise interfaces are kept in the system with stale pointers
to the deleted netns.
Igor Ryzhov [Thu, 4 Nov 2021 12:11:18 +0000 (15:11 +0300)]
lib: remove confusing bfd TTL API
There are two APIs to control the expected number of hops for a BFD
session – `bfd_sess_set_mininum_ttl` and `bfd_sess_set_hop_count`.
The former is very confusing, as it takes an expected TTL in the
BFD packet which is actually a protocol internal value. The latter is
simple and straightforward – it takes an expected number of hops, which
is always 1 for single-hop and >1 for multi-hop.
As the former API is not used anywhere, just remove it to avoid any
confusion.
David Lamparter [Thu, 11 Nov 2021 10:57:59 +0000 (11:57 +0100)]
build: work around AC_PROG_LEX deprecation warning
`AC_PROG_LEX without either yywrap or noyywrap is obsolete`, says
autoconf 2.70. Sadly, there is no transition window for this, in 2.69
the macro takes no arguments.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>