David Lamparter [Thu, 2 Apr 2020 19:16:04 +0000 (21:16 +0200)]
bgpd, ospfd, ospf6d: long is not bool :(
... Oops ...
(for context, the defaults code originally didn't have a dedicated
"bool" variant and just used long for bools... I derp'd this when
adding bool as a separate case :( )
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David Lamparter <equinox@diac24.net>
Stephen Worley [Wed, 1 Apr 2020 19:31:40 +0000 (15:31 -0400)]
zebra: free unhashable (dup) NHEs via ID table cleanup
Free unhashable (duplicate NHEs from the kernel) via ID table
cleanup. Since the NHE ID hash table contains extra entries,
that's the one we need to be calling zebra_nhg_hash_free()
on, otherwise we will never free the unhashable NHEs.
This was found via a memleak:
==1478713== HEAP SUMMARY:
==1478713== in use at exit: 10,267 bytes in 46 blocks
==1478713== total heap usage: 76,810 allocs, 76,764 frees, 3,901,237 bytes allocated
==1478713==
==1478713== 208 (88 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 41
==1478713== at 0x483BB1A: calloc (vg_replace_malloc.c:762)
==1478713== by 0x48E35E8: qcalloc (memory.c:110)
==1478713== by 0x451CCB: zebra_nhg_alloc (zebra_nhg.c:369)
==1478713== by 0x453DE3: zebra_nhg_copy (zebra_nhg.c:379)
==1478713== by 0x452670: nhg_ctx_process_new (zebra_nhg.c:1143)
==1478713== by 0x4523A8: nhg_ctx_process (zebra_nhg.c:1234)
==1478713== by 0x452A2D: zebra_nhg_kernel_find (zebra_nhg.c:1294)
==1478713== by 0x4326E0: netlink_nexthop_change (rt_netlink.c:2433)
==1478713== by 0x427320: netlink_parse_info (kernel_netlink.c:945)
==1478713== by 0x432DAD: netlink_nexthop_read (rt_netlink.c:2488)
==1478713== by 0x41B600: interface_list (if_netlink.c:1486)
==1478713== by 0x457275: zebra_ns_enable (zebra_ns.c:127)
Repro with:
ip next add id 1 blackhole
ip next add id 2 blackhole
valgrind /usr/lib/frr/zebra
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
lynne [Sun, 29 Mar 2020 17:47:36 +0000 (13:47 -0400)]
ldpd: fixing host-only configuration filter.
There is configuration in LDP to only create labels for
host-routes. If the user remove this configuration the code
was not readvertising non-host routes to it's LDP neighbors.
The issue is the same in reverse also. If the user adds this
configuration on an active LDP session the non-host routes were
not withdrawn.
Donald Sharp [Tue, 31 Mar 2020 11:55:17 +0000 (07:55 -0400)]
ospf6d: Recent changes in our build cause const to be respected
We are seeing this crash:
New LWP 7673]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/ospf6d -d -F datacenter -M snmp -A ::1'.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
vtysh=vtysh@entry=0) at lib/command.c:1288
(gdb)
The command entered is `debug ospf6 lsa inter-router examin`. Code
inspection leads us to the fact that FRR is declaring the data as
const but we are attempting to modify it, causing the crash.
Remvoe the const of this set/get and let things work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 31 Mar 2020 22:38:01 +0000 (18:38 -0400)]
tests: More cbit extensions
We are still seeing cbit test failures in the ci system. I am
gonna try extending the timeout a bit more as that 8 seconds
doesn't seem to be long enough.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Quentin Young [Mon, 30 Mar 2020 18:47:15 +0000 (14:47 -0400)]
bgpd: display ingress packet queue size
In the past, we always displayed the number of buffered ingress packets
as zero because there was no packet buffering in the input path and
therefore never any queue size to report. They're buffered now so we can
display something meaningful instead of 0.
Also change the inq / outq lookups to be atomic, since they can be
modified elsewhere. These should still compile down to an unfenced word
read but it's good to be explicit.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Mon, 30 Mar 2020 18:28:10 +0000 (14:28 -0400)]
lib: defer grpc plugin initialization to post fork
When using the GRPC northbound plugin, initialization occurs at the
frr_late_init hook. This is called before fork() when daemonizing (using
-d). Because the GRPC library internally creates threads, this means our
threads go away in the child process, so GRPC doesn't work when used
with -d. Rectify this situation by deferring plugin init to after fork
by scheduling a task on the threadmaster, since those are executed by
the child.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Quentin Young [Mon, 30 Mar 2020 18:25:37 +0000 (14:25 -0400)]
watchfrr: change some messages from errors to info
When watchfrr starts up, it first tries to connect to daemons. This is
expected to fail if we are just starting up FRR, but we log it as an
error, and it shows up red in journalctl. Similarly when we fork
background commands that is also logged as an error. This is scaring
users, let's change these to info.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
It was pointed out that we are not properly passing the nexthop
through and instead we were replacing the nexthop as a Route Server
with our own.
https://tools.ietf.org/html/rfc4456#section-4
10. Implementation Considerations
Care should be taken to make sure that none of the BGP path
attributes defined above can be modified through configuration when
exchanging internal routing information between RRs and Clients and
Non-Clients. Their modification could potentially result in routing
loops.
In addition, when a RR reflects a route, it SHOULD NOT modify the
following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED.
Their modification could potentially result in routing loops.
Modify the code such that when FRR is instructed to act as a
Route-Server to pass through the nexthop.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Mark Stapp [Fri, 13 Mar 2020 20:52:53 +0000 (16:52 -0400)]
zebra: don't include backup nhs in main nhe dependency tree
We don't want to install backup nexthops - yet - as part of the
nexthop-id-based kernel interactions on netlink platforms. Avoid
mixing backup and primary nexthops in the tree of dependencies
in the ecmp cases.
Mark Stapp [Tue, 24 Dec 2019 19:22:03 +0000 (14:22 -0500)]
zebra: add per-nexthop backup index
Use a backup index in a nexthop directly (if it has a backup
nexthop); revise the zebra nhe/nhg code; revise zapi route
decoding to match; revise the dataplane route datastructs.
Refactor some of the rib_add_multipath code to be prepared to
be called with an nhe, carrying nexthop and (possibly) backup
info together.
saravanank [Fri, 27 Mar 2020 00:36:37 +0000 (17:36 -0700)]
vtysh: Crash during show running-config
RCA:
when client is killed, show running-config command crashes vtysh.
vtysh_client_config function temporarily makes vty->of which is standard output file
pointer to null inorder to suppress output to user.
This call further tries to communicate with each client and when the client
is terminated, socket call fails and hits the exception path to print the
connection has failed using vty_out.
vty_out crashes because vtysh_client_config has temporarily made vty->of
pointer to NULL to supress o/p to user.
Fix:
vty_out function should check for the sanity of vty->of pointer.
If it doesn't exist, this must have hit exception path, so use the
vty->saved_of if exists.
Signed-off-by: Saravanan K <saravanank@vmware.com>
Stephen Worley [Tue, 28 Jan 2020 20:20:18 +0000 (15:20 -0500)]
zebra: add debug for duplicate NH in dataplane array conversion
When we find a nexthop ID thats a duplicate in the code that converts
NHG rb trees into a flat list of nexthop IDs for the dataplane,
output a debug message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Stephen Worley [Tue, 28 Jan 2020 19:33:10 +0000 (14:33 -0500)]
zebra: don't add ID to kernel nh_grp if not installed/queued
When we transform the nexthop group rb trees into a flat
array of IDs to send into the dataplane code (zebra_nhg_nhe2grp),
don't put an ID in there that has not been in installed or is
not currently queued to be installed into the dataplane.
Otherwise, if some of the nexthops fail to install, we will
still try to create a group with them and then the entire group
will fail.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Stephen Worley [Tue, 28 Jan 2020 00:36:01 +0000 (19:36 -0500)]
zebra: handle NHG in NHG dataplane group conversion
We were not properly handling the case of a NHG inside of
another NHG when converting the rb tree of a multilevel NHG
into a flat list of IDs. When constructing, we call the function
zebra_nhg_nhe2grp_internal() recursively so that the rare
case of a group within a group is handled such that its
singleton nexthops are appended to the grp array of IDs
we send to the dataplane code.
Ex)
1:
-> 2:
-> 3
-> 4
->5:
->6
becomes this:
1:
->3
->4
->6
when its sent to the dataplane code for final kernel installation.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Stephen Worley [Thu, 26 Mar 2020 14:39:16 +0000 (10:39 -0400)]
zebra: remove unnecessary `cmd =` check
In the netlink code for determining whether to set
a src on the route, we check if the cmd=NEW_ROUTE
but its not possible for this to ever be anything
but a new route since we do a goto skip further up
if its a DEL_ROUTE cmd.
So remove this unnecessary check.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Donatas Abraitis [Thu, 26 Mar 2020 14:06:34 +0000 (16:06 +0200)]
bgpd: Show that prefix is malformed if aggregated by 0
Show if this malformed under `show [ip] bgp <prefix>`:
```
eva# sh ip bgp 103.79.124.0/22
BGP routing table entry for 103.79.124.0/22
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.201.136
64539 15096 6939 7545 7545 136001, (aggregated by 0(malformed) 0.0.0.0)
192.168.201.136 from 192.168.201.136 (192.168.201.136)
Origin IGP, valid, external, best (First path received)
Last update: Thu Mar 26 10:02:07 2020
```
once again, for both hello-multiplier and hello-interval
the order in which the number and level were shown in the
cli_show methods was inverted compared to the vtysh command,
which created issues with frr-reload.py.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
David Lamparter [Tue, 24 Mar 2020 18:15:04 +0000 (19:15 +0100)]
*: remove line breaks from log messages
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Tue, 10 Dec 2019 16:23:25 +0000 (17:23 +0100)]
sharpd: add "logpump" to bulk test logging
This just generates log messages in bulk for testing logging backend
performance. It's in sharpd so the full "context" of being in a daemon
is available (e.g. different logging configs, parallel load in the main
thread.)
Signed-off-by: David Lamparter <equinox@diac24.net>