Ameya Dharkar [Mon, 22 Jun 2020 23:38:48 +0000 (16:38 -0700)]
bgpd: Incorrect auto-RT formed when L3VNI is not configured
We use ASN:VNI format to calculate auto RT for L3VNI.
When L3VNI is not configured and the configured RT is deleted, an incorrect
auto-RT value is generated because the VRF VNI is 0.
Fix:
Do not configure auto-RT if L3VNI is not configured.
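A minimal sketch of the guard described in the fix, written in Python rather than bgpd's C. The helper name auto_rt_for_l3vni and the string form of the RT are hypothetical; only the ASN:VNI format and the "VNI is 0" condition come from the commit message.

```python
from typing import Optional


def auto_rt_for_l3vni(asn: int, l3vni: int) -> Optional[str]:
    """Return an auto-derived RT in ASN:VNI form, or None without an L3VNI.

    Hypothetical helper: a VNI of 0 is assumed to mean "L3VNI not configured".
    """
    if l3vni == 0:
        # No L3VNI configured: deriving an RT here would produce "ASN:0",
        # which is the incorrect value the commit describes. Skip it.
        return None
    return f"{asn}:{l3vni}"
```

With these assumptions, auto_rt_for_l3vni(65000, 4001) gives "65000:4001", while a VNI of 0 gives no auto-RT at all.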
Donald Sharp [Mon, 15 Jun 2020 14:35:50 +0000 (10:35 -0400)]
bgpd: Allow extending peer timeout in rare case
Currently the I/O pthread handles incoming/outgoing data
communication with all peers. There is no attempt at modifying
the hold timers. Its sole goal is to read/write data to appropriate
channels. All this data is handled as *events* on the master pthread
in BGP. The problem is that if the master pthread is extremely busy
then any packet read that would be treated as a keepalive event may
happen after the hold timer pops, due to the way thread events are handled
in lib/thread.c.
In a last-gasp attempt, if we notice that we have incoming data
to process on the input queue, slightly delay the hold timer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
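The decision described above can be sketched as follows. This is an illustration in Python, not the bgpd code: the function name hold_timer_action, the returned tuples, and the one-second default extension are all assumptions made for the example.

```python
from typing import Tuple


def hold_timer_action(input_queue_len: int,
                      extension_s: float = 1.0) -> Tuple[str, float]:
    """Decide what to do when a peer's hold timer pops.

    Hypothetical helper. input_queue_len is the number of packets the
    I/O pthread has read for this peer that the master pthread has not
    processed yet. extension_s is an assumed, small re-arm delay.
    """
    if input_queue_len > 0:
        # Unprocessed input may contain a keepalive; delay slightly
        # instead of tearing the session down.
        return ("rearm", extension_s)
    # Nothing pending from the peer: the hold timer expiry is genuine.
    return ("expire", 0.0)
```

The point of the design is that the extension only applies when data has already arrived, so a peer that has truly gone silent still times out.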
The refactored frr-reload.py is adding 'no-header' to the
'show running' command of vtysh, but if a daemon is specified
the no-header option should only be added after the daemon name.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
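A sketch of the ordering rule, assuming a hypothetical helper show_running_args; it is not taken from frr-reload.py, and the command words simply follow the wording of the commit message.

```python
from typing import List, Optional


def show_running_args(daemon: Optional[str] = None) -> List[str]:
    """Build the words of the 'show running' command (hypothetical helper)."""
    words = ["show", "running"]
    if daemon:
        # The daemon name must come directly after 'show running'.
        words.append(daemon)
    # 'no-header' always goes last, after the daemon name if there is one.
    words.append("no-header")
    return words
```

Joining the result with spaces gives "show running no-header" without a daemon and "show running bgpd no-header" when daemon is "bgpd".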
Jakub Urbańczyk [Sat, 13 Jun 2020 11:31:13 +0000 (13:31 +0200)]
zebra: more clean-ups in netlink code
* Use nl_attr_add32 instead of nl_attr_add where it is possible.
* Move common code from build_singlepath() and build_multipath()
to separate function.
Jakub Urbańczyk [Mon, 8 Jun 2020 21:37:26 +0000 (23:37 +0200)]
zebra: clean up netlink api
* Rename netlink utility functions like addattr to be less ambiguous
* Replace rta_attr_* functions with nl_attr_* since they introduced
inconsistencies in the code
* Add helper functions for adding rtnexthop struct to the Netlink
message
Marcel Röthke [Tue, 24 Mar 2020 13:36:04 +0000 (14:36 +0100)]
bgpd: preinitialize rtrlib tr structures
The tr_*_config structs were previously not preinitialized because
every field is initialized explicitly. But future rtrlib versions will
introduce additional fields. Preinitialising the entire struct will
ensure forward compatibility.
Mark Stapp [Thu, 11 Jun 2020 15:16:02 +0000 (11:16 -0400)]
*: have daemons call frr_fini() at termination
Fix a number of library and daemon issues so that daemons can
call frr_fini() during normal termination. Without this,
temporary logging files are left behind in /var/tmp/frr/.
Donald Sharp [Thu, 11 Jun 2020 13:47:15 +0000 (09:47 -0400)]
tests: After clear give it more than 90 seconds to come up
Error Message seen:
2020-06-11 14:00:35,288 ERROR: assert failed at "test_ebgp_ecmp_topo2/test_ecmp_after_clear_bgp[redist_static]": Testcase test_ecmp_after_clear_bgp[redist_static] : Failed
Error: TIMEOUT!! BGP is not converged in 30 seconds for router r3
assert 'TIMEOUT!! BGP is not converged in 30 seconds for router r3' is True
If a retry for a failed connection is 120 seconds we should wait slightly
longer than a retry session, which this clear test was not doing.
Especially since we know our topotests are lossy on data under load.
Apparently I changed this earlier to 90 seconds, but a retry window
is 120. Not sure wtf I was thinking
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
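The reasoning about timeouts can be sketched as a generic polling helper. The names wait_for, RETRY_WINDOW_S and CONVERGENCE_TIMEOUT_S are hypothetical and are not topotest APIs; the 120-second retry window is from the commit message, and the 10-second margin is an assumption.

```python
import time
from typing import Callable

RETRY_WINDOW_S = 120                          # retry window named in the commit
CONVERGENCE_TIMEOUT_S = RETRY_WINDOW_S + 10   # assumed margin past one retry


def wait_for(check: Callable[[], bool],
             timeout_s: float,
             interval_s: float = 1.0) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A convergence check would then be called as wait_for(bgp_converged, CONVERGENCE_TIMEOUT_S), where bgp_converged stands for whatever test the caller uses, so that one full connection retry fits inside the wait.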
Don Slice [Mon, 8 Jun 2020 18:05:40 +0000 (18:05 +0000)]
bgpd: remove extcommunity attribute on leaked route if empty
Problem reported where bgp sessions were being torn down for ibgp
peers with the reason being optional attribute error. Found that
when a route was leaked, the RTs were stripped but the actual
EXTCOMMUNITY attribute was not cleared so an empty ecommunity
attribute stayed in the bgp table and was sent in updates.
Ticket: CM-30000
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
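A small Python sketch of the behaviour described above, using a plain dict as a stand-in for a route's attributes. The function strip_rts, the "ecommunity" key, and the string encoding of communities are assumptions for illustration, not bgpd data structures.

```python
from typing import Dict, List, Set


def strip_rts(attrs: Dict[str, List[str]],
              rts: Set[str]) -> Dict[str, List[str]]:
    """Remove the given RTs from a route's extended communities.

    Hypothetical helper: if no extended community remains, the whole
    attribute is dropped instead of being left behind empty.
    """
    out = dict(attrs)
    remaining = [c for c in out.get("ecommunity", []) if c not in rts]
    if remaining:
        out["ecommunity"] = remaining
    else:
        # An empty attribute must not stay in the table or be advertised.
        out.pop("ecommunity", None)
    return out
```

The input dict is copied, so the caller's attributes are left untouched; only the returned copy loses the attribute when it becomes empty.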
Donald Sharp [Thu, 11 Jun 2020 03:12:34 +0000 (23:12 -0400)]
tests: Add some scale tests to ensure things work
Add some basic route scale tests to ensure that we can
install a large number of routes. Also grab some timings
so that we can keep track and see if anything substantially
changes over time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Chirag Shah [Fri, 29 May 2020 04:44:37 +0000 (21:44 -0700)]
yang: redefine nexthop operational model
Separate out nexthop config and operational models.
nexthop-group config model has list of nexthop-groups
whereas operational nexthop group is single entity
underneath list of nexthops.
The common code is factored into grouping to use among
config and operational model.
nexthop operational model caters to RIB operational model.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Mark Stapp [Mon, 8 Jun 2020 20:38:36 +0000 (16:38 -0400)]
sharpd,zebra: unicast support for zapi messages
Distinguish between unicast and broadcast opaque messages
in zebra handler code. Add cli and internal api changes to
have sharpd send unicast opaque messages. Add opaque cli
commands to the sharp user doc.
Before the last commit, it was possible under some
circumstances to call isis_circuit_af_set on a circuit
with a NULL area, e.g. if the circuit was deconfigured
due to a validation error. While this should not happen
now, let's add an explicit check to avoid crashing if
a regression is introduced.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
If we are not able to bring a circuit up due to some config
issue, e.g. a low MTU compared to the area lsp-mtu, we should
not remove the configuration, as this will push us out of sync
with the YANG state and create more issues down the line.
Instead, keeping the circuit state at C_STATE_CONF should be
sufficient.
For the specific case of the MTU mismatch above, this also means
that when we receive a new IF_UP_FROM_Z when the MTU is changed
we will be able to bring the circuit up as we should.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Jakub Urbańczyk [Sun, 24 May 2020 17:03:25 +0000 (19:03 +0200)]
zebra: convert ip rule installation to use dplane thread
* Implement new dataplane operations
* Convert existing code to use dataplane context object
* Modify function preparing netlink message to use dataplane
context object
Jakub Urbańczyk [Sun, 24 May 2020 16:44:36 +0000 (18:44 +0200)]
zebra: prepare dplane to deal with pbr rules
This commit is the first step to convert IP rule installation to
use dplane thread.
* Add dataplane's internal representation of a pbr rule
* Add dplane stats related to rules
* Introduce a new type of dplane operation
Mark Stapp [Thu, 4 Jun 2020 17:11:35 +0000 (13:11 -0400)]
lib,zebra,sharpd: modify opaque zapi message to support unicast
Start modifying the OPAQUE zapi message to include optional
unicast destination zapi client info. Add a 'decode' api and
opaque msg struct to encapsulate that optional info.
Quentin Young [Wed, 10 Jun 2020 04:20:04 +0000 (00:20 -0400)]
docker: don't fail on chown /etc/frr
If we can chown /etc/frr then fine, but there are circumstances where we
won't be able to - for instance, if running FRR in Kubernetes where
/etc/frr/* is actually a virtual filesystem.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
There are some paths, e.g. when an established neighbor
sends us hellos with a different IS level, where we go
from adj_state UP to INIT. In such cases we might not
update our SPFs or the circuit state, as the state change
function was only testing for the UP and DOWN cases.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
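One way to read the fix, sketched in Python: treat any transition into or out of UP as significant, instead of looking only at whether the new state is UP or DOWN. The AdjState enum and needs_update function are hypothetical and only illustrate the idea; they are not the isisd state change function.

```python
from enum import Enum


class AdjState(Enum):
    DOWN = 0
    INIT = 1
    UP = 2


def needs_update(old: AdjState, new: AdjState) -> bool:
    """True when SPF and circuit state should be refreshed (assumed rule).

    Any change that enters or leaves UP counts, which covers UP -> INIT
    as well as the UP and DOWN cases.
    """
    return old != new and AdjState.UP in (old, new)
```

Under this assumed rule, needs_update(AdjState.UP, AdjState.INIT) is True, whereas a check that only tests for a new state of UP or DOWN would skip that transition.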
When a peer is bound to a peer-group, the GR flags set on the
peer are over-written.
Update the GR flags for the peer after it has been bound to a
peer-group.