Donald Sharp [Tue, 10 Sep 2019 23:48:21 +0000 (19:48 -0400)]
tests: Add admin distance 255 static routes
Add a couple of test cases to ensure that admin distance of
255 actually causes the route to be accepted by zebra but
not installed into the linux kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Wed, 11 Sep 2019 03:16:01 +0000 (23:16 -0400)]
tests: Fix topotests due to json error
Recent commit: 5fba22485b added a new topotest that used
an older version of FRR that referenced some json code
that was changed in between when the PR was submitted
and when it got in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The Pim RFC does not appear to state any length requirements
of pim, other than the checksum must be correct.
Certain vendors are sending extra data at the end of a pim assert
message. This while not explicitly against the rules was a bit
of surprise to pim when we threw the assert message on the floor
for being too long.
Modify the test to see if length left will allow us to read
the 8 bytes of data that we need. If it is sufficient for
that allow the packet to be used.
Fixes: #4957 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Chirag Shah [Tue, 27 Aug 2019 21:41:00 +0000 (14:41 -0700)]
bgpd: clear l3vni prefix-only flag upon deletion
When L3vni is created with prefix-only flag,
the flag is set at bgp vrf instance level.
In the case of bgp instance is non auto created,
means user configured instance (i.e 'router bgp x vrf <name>')
Upon deletion of l3vni, clear the prefix-only flag from
bgp vrf instance.
Donald Sharp [Fri, 6 Sep 2019 12:46:27 +0000 (08:46 -0400)]
tests: Ensure we wait 1 bgp timeout period before declaring failure
The lib/bgp.py test code is bringing up neighbors and clearing them
to test that things are working appropriately. The problem we have
is that we are only waiting 30 seconds for declaration of failure.
In a high load system packets can be lost and as such the initial
convergence may not happen. Modify the test to wait for 1 retry
window test period before declaring failure.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Commit eaf6705d7a fixed a problem caused by configuration changes
coming from the kernel. The fix consisted of regenerating the
candidate configuration before every configuration command (when
using the non-transactional CLI mode). There's no need, however,
to regenerate the candidate when it's identical to the running
configuration. Since the northbound keeps track of the version
of each configuration, we can use that information to prevent
regenerating the candidate configuration when that is not necessary.
Mark Stapp [Thu, 5 Sep 2019 16:58:58 +0000 (12:58 -0400)]
zebra: avoid using zebra datastructs in evpn dataplane path
Some netlink-facing code used for evpn/vxlan programming was
being run in the dataplane pthread, but accessing zebra core
datastructs. Move some additional data into the dataplane
context, and use it in the netlink path instead.
Donald Sharp [Thu, 5 Sep 2019 16:30:26 +0000 (12:30 -0400)]
ospfd: Remove flog_warn for a situation user can never do anything with
When OSPF receives a Database description packet and is in
`Down`, `Attempt` or `2-Way` state we are creating a warning
for the end user.
rfc2328 states(10.6):
Down - The packet should be rejected
Attempt - The packet should be rejected
2-Way - The packet should be ignored
I cannot find any instructions in the rfc to state what the operational
difference is between rejected and ignored. Neither can I figure
out what FRR expects the end user to do with this information.
I can see this information being useful if we encounter a bug
down the line and we have gathered a bunch of data. As such
let's modify the code to remove the flog_warn and convert
the message to a debug level message that can be controlled by
appropriate debug statements.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Stephen Worley [Wed, 4 Sep 2019 16:38:56 +0000 (12:38 -0400)]
staticd: Re-send/Remove routes on interface events
We were not processing interface up/down events for device only
static routes. This patch looks up the ifp and then calls
the same API we are using for interface add/remove events.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
bgpd: Fixes to error message printed for failed peerings
There was a silly bug introduced when the command to show failed sessions
was added. A missing "," caused the wrong error message to be printed.
Debugging this led down a path that:
- Led to discovering one more error message that needed to be added
- Providing the error code along with the string in the JSON output
to allow programs to key off numbers rather than strings.
- Fixing the missing ","
- Changing the error message to "Waiting for Peer IPv6 LLA" to
make it clear that we're waiting for the link local addr.
Signed-off-by: Dinesh G Dutt <5016467+ddutt@users.noreply.github.com>
David Lamparter [Fri, 21 Jun 2019 08:58:02 +0000 (10:58 +0200)]
lib: add frr_with_mutex() block-wrapper
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited. This adds a bit of safety against
forgetting the unlock in error paths & co. and makes the code a slight
bit more readable.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
upon vrf disable, an event informs bfd daemon that the vrf contexts
should be removed. in the case a vrf backend is netns based, all sockets
opened under that netns have to be closed. otherwise it is impossible
for the system to completely close the network namespace. that implies
that some interfaces may not be deleted, and may not be given back to
default vrf.
PR=65291 Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com> Acked-by: Julien Floret <julien.floret@6wind.com>
Donald Sharp [Tue, 27 Aug 2019 11:45:02 +0000 (07:45 -0400)]
lib: Cleanup return codes to use enum values
A couple functions in routemap.c were returning
0/1 that were being mapped into the appropriate
enum values on the calling functions to check return
values. This matches the return values to the actual
enum for future readability.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
bgpd: Add Established and Dropped counts to JSON output of bgp summary
Based on a suggestion by Donald Sharp, this patch adds the counts of the
number of times a BGP peering session has transitioned from Estd->NotEstd
and from NotEstd->Estd to the JSON output only of the
"show [ip] bgp [vrf <vrf>] summary" command. The idea is that even if the
current session is well and up, but a sessions has trasnitionined in and
out of Estd state multiple times, its worth noting that. We cannot change
the non-JSON output as easily, and so this command only addresses the JSON
part for now. The fields added are the ones that were provided only as part
of the "show bgp neighbor" command.
Signed-off-by: Dinesh G Dutt <5016467+ddutt@users.noreply.github.com>
David Lamparter [Mon, 2 Sep 2019 18:56:57 +0000 (20:56 +0200)]
zebra/fpm: deprecation warning for protobuf
We agreed on this several weeks ago on the weekly call, I just forgot to
actually put it in a PR...
A call for any Protobuf FPM users to raise their hand came up empty on
both the mailing list as well as Slack. Let's see if this gets any
response. If not, it'll be time to remove Protobuf FPM.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Mon, 2 Sep 2019 18:52:56 +0000 (20:52 +0200)]
build: only build without libcap on request
Linux FRR builds without libcap are massively slow due to the
signal-based UID/GID synchronization across threads. This disables the
automatic fallback to build without libcap; it can still be requested
with "--disable-capabilities" but if the option isn't given in either
direction and we can't find libcap that's an error now.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
circuit deletion was being enforced by sending a fake IF_DOWN_FROM_Z
event for the circuit interface. This created a problem when the
circuit was enabled again, since isisd internal state machine was
expecting to see an IF_UP_FROM_Z that never came, as the interface
had not actually gone down.
As a consequence, disabling + re-enabling isis on an interface or
area would leave interfaces in a CONFIG state, and adjacencies were
not restored. Fix this by following the state machine and simply
disabling circuits rather than attempting to delete them forcefully.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Dinesh G Dutt [Sat, 31 Aug 2019 16:24:49 +0000 (16:24 +0000)]
bgpd: Add a new command to only show failed peerings
In a data center, having 32-128 peers is not uncommon. In such a situation, to find a
peer that has failed and why is several commands. This hinders both the automatability of
failure detection and the ease/speed with which the reason can be found. To simplify this
process of catching a failure and its cause quicker, this patch does the following:
1. Created a new function, bgp_show_failed_summary to display the
failed summary output for JSON and vty
2. Created a new function to display the reset code/subcode. This is now used in the
failed summary code and in the show neighbors code
3. Added a new variable failedPeers in all the JSON outputs, including the vanilla
"show bgp summary" family. This lists the failed session count.
4. Display peer, dropped count, estd count, uptime and the reason for failure as the
output of "show bgp summary failed" family of commands
5. Added three resset codes for the case where we're waiting for NHT, waiting for peer
IPv6 addr, waiting for VRF to init.
This also counts the case where only one peer has advertised an AFI/SAFI.
The new command has the optional keyword "failed" added to the classical summary command.
The changes affect only one existing output, that of "show [ip] bgp neighbors <nbr>". As
we track the lack of NHT resolution for a peer or the lack of knowing a peer IPv6 addr,
the output of that command will show a "waiting for NHT" etc. as the last reset reason.
This patch includes update to the documentation too.
Signed-off-by: Dinesh G Dutt <5016467+ddutt@users.noreply.github.com>
Donald Sharp [Fri, 30 Aug 2019 20:14:38 +0000 (16:14 -0400)]
ospfd: Cleanup oi->obuf to always be created
This looks like a finish up of the partial cleanup that
ocurred at some point in time in the past. When we
alloc oi also always alloc the oi->obuf. When we delete
oi always delete the oi->obuf right before.
This cleans up a bunch of code to be simpler and hopefully
easier to follow.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Fri, 30 Aug 2019 10:03:09 +0000 (06:03 -0400)]
ospfd: Do not turn on write thread unless we have something in it
I am rarely seeing this crash:
r2: ospfd crashed. Core file found - Backtrace follows:
[New LWP 32748]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/ospfd'.
Program terminated with signal SIGABRT, Aborted.
2019-08-29 15:59:36,149 ERROR: assert failed at "test_ospf_sr_topo1/test_memory_leak":
Which translates to this code:
node = listhead(ospf->oi_write_q);
assert(node);
oi = listgetdata(node);
assert(oi);
So if we get into ospf_write without anything on the oi_write_q
we are stopping the program.
This is happening because in ospf_ls_upd_queue_send we are calling
ospf_write. Imagine that we have a interface already on the on_write_q
and then ospf_write handles the packet send for all functions. We
are not clearing the t_write thread and we are popping and causing
a crash.
Additionally modify OSPF_ISM_WRITE_ON(O) to not just blindly
turn on the t_write thread. Only do so if we have data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
ospfd: Remove redundant asserts
assert(oi) is impossible all listgetdata(node) directly proceeding
it already asserts here, besides a node cannot be created
with a null pointer!
If list_isempty is called directly before the listhead call
it is impossilbe that we do not have a valid pointer here.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
David Lamparter [Wed, 17 Jul 2019 13:24:28 +0000 (15:24 +0200)]
bgpd: add timestamp to bgp_adj_in
If we reject a received update in a filter, it never turns into a
bgp_path_info but stays in adj_in. For that case, we don't have any
timestamp for the update.
Currently, this isn't visible anywhere; BMP will make use of this
timestamp (and we can add a CLI option if we want.)
Signed-off-by: David Lamparter <equinox@diac24.net>