Mark Stapp [Thu, 26 Mar 2020 18:11:56 +0000 (14:11 -0400)]
lib: support replacement in the nexthop-group cli
Use more limited matching logic so that nexthops within a
nexthop-group are unique based only on vrf, type, and gateway.
Treat configuration of a nexthop that matches an existing
nexthop as a replace operation.
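A minimal sketch of the matching rule described above: identity is vrf, type and gateway only, so configuring a nexthop that matches an existing one overwrites it instead of adding a second entry. The struct layout, field names and fixed-size group are hypothetical simplifications, not FRR's actual nexthop types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop types for this sketch only. */
enum nh_type { NH_IFINDEX, NH_IPV4, NH_IPV6, NH_BLACKHOLE };

struct nexthop {
	uint32_t vrf_id;
	enum nh_type type;
	uint32_t gateway; /* IPv4 gateway as a plain integer (simplified) */
	uint32_t ifindex; /* not part of the identity */
	uint8_t weight;   /* not part of the identity */
};

#define NHG_MAX 8

struct nexthop_group {
	struct nexthop nh[NHG_MAX];
	size_t count;
};

/* Two nexthops are "the same" if vrf, type and gateway match. */
static bool nh_same_identity(const struct nexthop *a, const struct nexthop *b)
{
	return a->vrf_id == b->vrf_id && a->type == b->type &&
	       a->gateway == b->gateway;
}

/* Add nh, or replace an existing entry with the same identity.
 * Returns false only if the group is full. */
static bool nhg_add_or_replace(struct nexthop_group *nhg,
			       const struct nexthop *nh)
{
	for (size_t i = 0; i < nhg->count; i++) {
		if (nh_same_identity(&nhg->nh[i], nh)) {
			nhg->nh[i] = *nh; /* replace in place */
			return true;
		}
	}
	if (nhg->count == NHG_MAX)
		return false;
	nhg->nh[nhg->count++] = *nh;
	return true;
}
```

Configuring the same gateway twice with a different interface or weight therefore leaves one entry carrying the newer attributes.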
zebra should only check whether a get_chunk operation succeeded
when processing the response, rather than inside the get_chunk
call itself. Splitting the request and response hooks was done
precisely to allow for asynchronous calls to an external label
manager; in this case, the requested chunk is not necessarily
going to be available at request time.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
YANG constraints enforced by the northbound callbacks require that
the maximum lifetime be >= (refresh interval + 300). When we are
moving from one config to another through frr-reload.py, we issue
a number of vtysh -c commands ('no lsp-refresh-interval level-1 500',
'no max-lsp-lifetime level-1 1000'), which reset these parameters to their
default values, respectively 900 and 1200. Depending on the actual
values in the current config, the order in which these commands are sent
might be the wrong one, in that we hit an invalid intermediate state and
make vtysh (and by extension frr-reload.py) return an error.
As a workaround, let's add a one-liner command that sets all these
inter-related parameters in one go, and make isisd display them as a
single line too, so that the diff will be computed as a single command.
The old individual commands are kept to ensure backwards compatibility.
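A worked sketch of why the order matters, using the values from the example commands (refresh 500, lifetime 1000) and the defaults quoted above (900 and 1200). The per-field setters are hypothetical stand-ins for individual vtysh commands that each validate the intermediate state; set_both() models the combined one-liner that validates only the final state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Defaults quoted in the commit message. */
#define DEFAULT_REFRESH 900
#define DEFAULT_LIFETIME 1200

struct lsp_timers {
	uint32_t refresh;  /* lsp-refresh-interval, seconds */
	uint32_t lifetime; /* max-lsp-lifetime, seconds */
};

/* The constraint: lifetime >= refresh + 300. */
static bool timers_valid(const struct lsp_timers *t)
{
	return t->lifetime >= t->refresh + 300;
}

/* Hypothetical per-command setters: reject an invalid intermediate state. */
static bool set_refresh(struct lsp_timers *t, uint32_t refresh)
{
	struct lsp_timers next = *t;

	next.refresh = refresh;
	if (!timers_valid(&next))
		return false;
	*t = next;
	return true;
}

static bool set_lifetime(struct lsp_timers *t, uint32_t lifetime)
{
	struct lsp_timers next = *t;

	next.lifetime = lifetime;
	if (!timers_valid(&next))
		return false;
	*t = next;
	return true;
}

/* Combined setter: only the final state has to be valid. */
static bool set_both(struct lsp_timers *t, uint32_t refresh, uint32_t lifetime)
{
	struct lsp_timers next = { refresh, lifetime };

	if (!timers_valid(&next))
		return false;
	*t = next;
	return true;
}
```

Starting from {500, 1000}, resetting the refresh interval first fails (900 + 300 > 1000), while resetting the lifetime first and then the refresh interval succeeds; the single combined command sidesteps the ordering problem entirely.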
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Quentin Young [Sun, 5 Apr 2020 21:11:25 +0000 (17:11 -0400)]
bgpd: fix multiple bugs with cluster_list attrs
Multiple issues, mostly causing use-after-free (UAF) bugs but possibly
other, more subtle problems.
- Cluster lists were the only attributes whose pointers were not being
NULL'd when freed, resulting in heap UAF
- When performing an insert into the cluster hash, our temporary struct
used for hash_get() was inconsistent with our hash keying and
comparison functions. In the case of a zero length cluster list, the
->length field is 0 and the ->list field is NULL. When performing an
insert, we set the ->list field regardless of whether the length is 0.
This resulted in the two cluster lists hashing equal but not comparing
equal. Later, when removing one of them from the hash before freeing
it, the key matched and the comparison succeeded (because ->list was
set to NULL *after* the search but *before* inserting into the hash),
so we would sometimes release the duplicate copy of the struct and
then free the one that remained in the hash table. Later accesses
constitute UAF. This is fixed by making sure the fields used for the
existence check match what is actually inserted into the hash when
that check fails.
This patch also makes cluster_unintern static, because it should be.
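A minimal sketch of the key/compare mismatch and its fix. The struct, the toy hash function and cluster_lookup_key() are hypothetical simplifications (FRR's real hashing differs); the point is that the temporary lookup struct must be normalized exactly like the stored one, so a zero-length list always carries list == NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified cluster list: length is in bytes, list holds 4-byte IDs. */
struct cluster_list {
	int length;
	uint32_t *list;
};

/* Toy hash over the bytes covered by length (zero bytes -> hash 0). */
static unsigned int cluster_hash_key(const struct cluster_list *c)
{
	const unsigned char *p = (const unsigned char *)c->list;
	unsigned int h = 0;

	for (int i = 0; i < c->length; i++)
		h = h * 31 + p[i];
	return h;
}

/* Compare length and content; a NULL list only equals another NULL list. */
static bool cluster_hash_cmp(const struct cluster_list *a,
			     const struct cluster_list *b)
{
	if (a->length != b->length)
		return false;
	if (a->list == NULL || b->list == NULL)
		return a->list == b->list;
	return memcmp(a->list, b->list, (size_t)a->length) == 0;
}

/* Build the lookup key the same way the stored entry is built:
 * a zero-length list never carries a pointer. */
static struct cluster_list cluster_lookup_key(int length, uint32_t *list)
{
	struct cluster_list tmp;

	tmp.length = length;
	tmp.list = length ? list : NULL;
	return tmp;
}
```

A raw key of {0, non-NULL} hashes the same as the stored {0, NULL} but compares unequal, which is the inconsistency described above; the normalized key both hashes and compares equal.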
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
lib: consolidate flexible array hack in a single place
Old gcc versions (< 5.x) have a bug that prevents C99 flexible
arrays from working properly on shared libraries.
We already have a hack in place to work around this problem, but it
needs to be replicated in every declaration of a frr_yang_module_info
variable within libfrr. This clearly isn't a good solution if we
consider that many more libfrr YANG modules are about to come in
the future.
This commit introduces a different workaround that operates within
the northbound layer itself, such that implementers of libfrr YANG
modules won't need to worry about this problem anymore.
lib, tools: silence harmless warnings in the northbound tools
Our two northbound tools don't have embedded YANG modules like the
other FRR binaries. As such, ly_ctx_set_module_imp_clb() shouldn't be
called when the YANG subsystem is being initialized by a northbound
tool. To make that possible, add a new "embedded_modules" parameter
to the yang_init() function to control whether libyang should look
for embedded modules or not.
With this fix, "gen_northbound_callbacks" and "gen_yang_deviations"
won't emit "YANG model X not embedded, trying external file"
warnings anymore.
David Lamparter [Thu, 2 Apr 2020 19:16:04 +0000 (21:16 +0200)]
bgpd, ospfd, ospf6d: long is not bool :(
... Oops ...
(for context, the defaults code originally didn't have a dedicated
"bool" variant and just used long for bools... I derp'd this when
adding bool as a separate case :( )
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@diac24.net>
Stephen Worley [Wed, 1 Apr 2020 19:31:40 +0000 (15:31 -0400)]
zebra: free unhashable (dup) NHEs via ID table cleanup
Free unhashable (duplicate NHEs from the kernel) via ID table
cleanup. Since the NHE ID hash table contains extra entries,
that's the one we need to be calling zebra_nhg_hash_free()
on, otherwise we will never free the unhashable NHEs.
This was found via a memleak:
==1478713== HEAP SUMMARY:
==1478713== in use at exit: 10,267 bytes in 46 blocks
==1478713== total heap usage: 76,810 allocs, 76,764 frees, 3,901,237 bytes allocated
==1478713==
==1478713== 208 (88 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 41
==1478713== at 0x483BB1A: calloc (vg_replace_malloc.c:762)
==1478713== by 0x48E35E8: qcalloc (memory.c:110)
==1478713== by 0x451CCB: zebra_nhg_alloc (zebra_nhg.c:369)
==1478713== by 0x453DE3: zebra_nhg_copy (zebra_nhg.c:379)
==1478713== by 0x452670: nhg_ctx_process_new (zebra_nhg.c:1143)
==1478713== by 0x4523A8: nhg_ctx_process (zebra_nhg.c:1234)
==1478713== by 0x452A2D: zebra_nhg_kernel_find (zebra_nhg.c:1294)
==1478713== by 0x4326E0: netlink_nexthop_change (rt_netlink.c:2433)
==1478713== by 0x427320: netlink_parse_info (kernel_netlink.c:945)
==1478713== by 0x432DAD: netlink_nexthop_read (rt_netlink.c:2488)
==1478713== by 0x41B600: interface_list (if_netlink.c:1486)
==1478713== by 0x457275: zebra_ns_enable (zebra_ns.c:127)
Repro with:
ip next add id 1 blackhole
ip next add id 2 blackhole
valgrind /usr/lib/frr/zebra
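A minimal sketch of the ownership rule behind the fix: every NHE lives in the ID table, but unhashable duplicates never make it into the hash table, so only a walk over the ID table reaches them all. The two fixed-size arrays and the live counter are hypothetical stand-ins for zebra's real hash tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct nhe {
	unsigned int id;
	bool unhashable; /* kernel duplicate: present only in the ID table */
};

#define TBL_MAX 16

struct tables {
	struct nhe *by_id[TBL_MAX];   /* every NHE, keyed by ID */
	size_t n_id;
	struct nhe *by_hash[TBL_MAX]; /* only hashable NHEs (borrowed pointers) */
	size_t n_hash;
};

static int live_nhes; /* crude allocation counter for the sketch */

static void tables_add(struct tables *t, unsigned int id, bool unhashable)
{
	struct nhe *e = calloc(1, sizeof(*e));

	assert(e != NULL && t->n_id < TBL_MAX);
	e->id = id;
	e->unhashable = unhashable;
	live_nhes++;
	t->by_id[t->n_id++] = e;
	if (!unhashable)
		t->by_hash[t->n_hash++] = e;
}

/* Free through the ID table, which is the superset; the hash table
 * only holds pointers owned by the ID table, so it is just emptied. */
static void tables_cleanup(struct tables *t)
{
	for (size_t i = 0; i < t->n_id; i++) {
		free(t->by_id[i]);
		live_nhes--;
	}
	t->n_id = 0;
	t->n_hash = 0;
}
```

Walking by_hash instead would miss the duplicate blackhole NHE from the repro, which matches the "definitely lost" block in the valgrind output.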
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
lynne [Sun, 29 Mar 2020 17:47:36 +0000 (13:47 -0400)]
ldpd: fixing host-only configuration filter.
There is configuration in LDP to only create labels for
host-routes. If the user removed this configuration, the code
did not readvertise non-host routes to its LDP neighbors.
The reverse case was broken too: if the user added this
configuration on an active LDP session, the non-host routes were
not withdrawn.
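A minimal sketch of the behaviour the fix aims for: when the host-only filter is toggled, each prefix's old and new eligibility are compared, and the difference decides whether to advertise or withdraw. The prefix struct and the action enum are hypothetical, not ldpd's actual types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum addr_family { FAM_V4, FAM_V6 };

/* Hypothetical minimal prefix: only family and length matter here. */
struct route_prefix {
	enum addr_family family;
	uint8_t prefixlen;
};

static bool is_host_route(const struct route_prefix *p)
{
	return p->prefixlen == (p->family == FAM_V4 ? 32 : 128);
}

static bool should_advertise(const struct route_prefix *p, bool host_only)
{
	return !host_only || is_host_route(p);
}

enum label_action { LBL_NONE, LBL_ADVERTISE, LBL_WITHDRAW };

/* Re-evaluate one prefix after the filter changes. */
static enum label_action on_filter_change(const struct route_prefix *p,
					  bool old_host_only,
					  bool new_host_only)
{
	bool before = should_advertise(p, old_host_only);
	bool after = should_advertise(p, new_host_only);

	if (before == after)
		return LBL_NONE;
	return after ? LBL_ADVERTISE : LBL_WITHDRAW;
}
```

Removing the filter yields LBL_ADVERTISE for non-host routes, and adding it yields LBL_WITHDRAW for them; host routes are unaffected either way.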
Donald Sharp [Tue, 31 Mar 2020 11:55:17 +0000 (07:55 -0400)]
ospf6d: Recent changes in our build cause const to be respected
We are seeing this crash:
[New LWP 7673]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/ospf6d -d -F datacenter -M snmp -A ::1'.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
vtysh=vtysh@entry=0) at lib/command.c:1288
(gdb)
The command entered is `debug ospf6 lsa inter-router examin`. Code
inspection shows that FRR declares the data as const but then
attempts to modify it, causing the crash.
Remove the const from this set/get and let things work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Donald Sharp [Tue, 31 Mar 2020 22:38:01 +0000 (18:38 -0400)]
tests: More cbit extensions
We are still seeing cbit test failures in the CI system. I am
going to try extending the timeout a bit more, as 8 seconds
doesn't seem to be long enough.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
vivek [Sat, 28 Mar 2020 19:15:01 +0000 (12:15 -0700)]
tests: Add tests for BGP link-bandwidth and weighted ECMP
Implement tests to verify BGP link-bandwidth and weighted ECMP
functionality. These tests validate one of the primary use cases for
weighted ECMP (a.k.a. unequal-cost multipath) using BGP link-bandwidth:
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz
The included tests are:
Test #1: Test BGP link-bandwidth advertisement based on number of multipaths
Test #2: Test cumulative link-bandwidth propagation
Test #3: Test weighted ECMP - multipath with next hop weights
Test #4: Test weighted ECMP rebalancing upon change (link flap)
Test #5: Test weighted ECMP for a second anycast IP
Test #6: Test paths with and without link-bandwidth - receiver should resort to regular ECMP
Test #7: Test different options for processing link-bandwidth on the receiver
vivek [Tue, 24 Mar 2020 22:00:56 +0000 (15:00 -0700)]
bgpd: Ensure RMAC extended community is unique
The BGP Router MAC extended community should be unique and not occur
multiple times. In a VRF-to-VRF route-leak scenario where EVPN routes
from a source VRF are leaked into the target VRF and then injected
back into EVPN from the target VRF, the resulting route had more than
one RMAC. With this fix, the resulting route will have only the
target VRF's RMAC.
vivek [Tue, 24 Mar 2020 21:58:42 +0000 (14:58 -0700)]
bgpd: Allow generating EVPN type-5 routes with existing extended community
The EVPN advertise route-map may generate extended communities for an IPv4
or IPv6 route injected into EVPN as type-5. If so, accept them and add
the EVPN extended communities to the existing ones.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
vivek [Tue, 24 Mar 2020 21:38:37 +0000 (14:38 -0700)]
bgpd: Implement options for link bandwidth handling
Support configurable options to control how link bandwidth is handled
by the receiver. The default behavior is to automatically honor the
received link bandwidths and use them to perform weighted ECMP, BUT only
if all paths in the multipath have associated link bandwidth; if one or
more paths do not have link bandwidth, normal ECMP is performed among
the multipaths. This behavior is as recommended by
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth.
The additional options available are to (a) completely ignore any link
bandwidth (i.e., weighted ECMP is effectively disabled), (b) skip paths
in the multipath which do not have link bandwidth and perform weighted
ECMP among the other paths (if at least some paths have the bandwidth)
or (c) use a default weight (value chosen is 1) for the paths which
do not have link bandwidth.
The command syntax is
bgp bestpath bandwidth <ignore|skip-missing|default-weight-for-missing>
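A minimal sketch of how the four behaviours could classify the paths of one multipath set, with a bandwidth of 0 meaning "no link bandwidth". The enums and lb_classify() are hypothetical names for illustration; the rules follow the description above (all-or-nothing by default, ignore, skip-missing, default weight of 1 for missing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum lb_mode {
	LB_DEFAULT,                   /* weighted only if every path has bw */
	LB_IGNORE,                    /* never weighted */
	LB_SKIP_MISSING,              /* drop paths without bw */
	LB_DEFAULT_WEIGHT_FOR_MISSING /* weight 1 for paths without bw */
};

enum lb_path { LBP_EQUAL, LBP_USE_BW, LBP_EXCLUDE, LBP_DEFAULT_WEIGHT };

/* Classify each path; returns true if weighted ECMP applies. */
static bool lb_classify(const uint64_t *bw, size_t n, enum lb_mode mode,
			enum lb_path *out)
{
	size_t with_bw = 0;

	for (size_t i = 0; i < n; i++)
		if (bw[i])
			with_bw++;

	/* At least one path must carry bw; the default mode needs all. */
	bool weighted = mode != LB_IGNORE && with_bw > 0 &&
			(mode != LB_DEFAULT || with_bw == n);

	for (size_t i = 0; i < n; i++) {
		if (!weighted)
			out[i] = LBP_EQUAL;
		else if (bw[i])
			out[i] = LBP_USE_BW;
		else if (mode == LB_SKIP_MISSING)
			out[i] = LBP_EXCLUDE;
		else
			out[i] = LBP_DEFAULT_WEIGHT;
	}
	return weighted;
}
```

With bandwidths {100, 300, missing}, the default and ignore modes fall back to plain ECMP, skip-missing excludes the third path, and default-weight-for-missing keeps it with weight 1.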
vivek [Tue, 24 Mar 2020 21:25:56 +0000 (14:25 -0700)]
bgpd: Announce cumulative link bandwidth to EBGP peers
When announcing ourselves as the next hop (e.g., to EBGP peers), if the
best path has the link bandwidth extended community and it is transitive,
change the value of the link bandwidth to the cumulative downstream
bandwidth (sum of the link bandwidths of all our multipaths) as this
makes the most sense. It is also implied by
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Of course, do
not override the link bandwidth if it has been specified by policy.
Note: Transitive extended communities will be automatically passed along
to EBGP peers; this commit updates the announced value to the most
appropriate one.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
vivek [Tue, 24 Mar 2020 20:57:44 +0000 (13:57 -0700)]
bgpd: Additional options for generating link bandwidth
Implement the code to handle the other route-map options to generate
the link bandwidth, namely, to use the cumulative bandwidth or to
base this on the number of multipaths. In the latter case, a reference
bandwidth is chosen internally; the implementation uses a value of
1 Gbps.
These additional options mean that the prefix may need to be advertised
if there is a link bandwidth change, which is a new criterion. Define a
new path (change) flag to support this and implement the advertisement.
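A minimal sketch of the two generated values, expressed in bytes per second (the unit the extended community carries). The cumulative value is the sum of the multipaths' link bandwidths; for num-multipaths this sketch assumes the value is the multipath count times the 1 Gbps reference mentioned above. Function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 Gbps reference bandwidth, in bytes per second (assumption: scaled
 * per multipath). */
#define REF_BW_BYTES_PER_SEC (1000000000ULL / 8)

/* "cumulative": sum of the link bandwidths of all multipaths. */
static uint64_t lb_cumulative(const uint64_t *mp_bw, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += mp_bw[i];
	return sum;
}

/* "num-multipaths": multipath count times the reference bandwidth. */
static uint64_t lb_num_multipaths(size_t n)
{
	return (uint64_t)n * REF_BW_BYTES_PER_SEC;
}
```

Either value changes when a multipath is added or lost, which is why a bandwidth change can now trigger a re-advertisement.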
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
vivek [Tue, 24 Mar 2020 20:53:09 +0000 (13:53 -0700)]
bgpd: Ensure link bandwidth extcommunity is not repeated
The BGP link bandwidth extended community must not be repeated. If the
attribute already carries this and the route-map specifies a new value,
the implementation will honor the policy configuration and overwrite
the existing values.
vivek [Tue, 24 Mar 2020 20:50:20 +0000 (13:50 -0700)]
bgpd: Ability to add/update unique extended communities
Certain extended communities cannot be repeated. An example is the
BGP link bandwidth extended community. Enhance the extended community
add function to ensure uniqueness, if requested.
Note: This commit does not change the lack of uniqueness for any of
the already-supported extended communities. Many of them such as the
BGP route target can obviously be present multiple times. Others like
the Router's MAC should most probably be present only once. The portions
of the code which add these may already be structured such that duplicates
do not arise.
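A minimal sketch of an "add, unique if requested" operation on 8-byte extended communities: with the unique flag set, an existing value with the same type and subtype octets is overwritten instead of a second copy being appended. The struct and ecom_add() are hypothetical; keying on the raw type octet means transitive and non-transitive variants are treated as different here, which is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECOM_SIZE 8
#define ECOM_MAX 8

struct ecom {
	uint8_t val[ECOM_MAX][ECOM_SIZE];
	size_t size;
};

/* Add eval; if unique is set, overwrite an entry with the same
 * type/subtype. Returns true if the set changed. */
static bool ecom_add(struct ecom *e, const uint8_t *eval, bool unique)
{
	for (size_t i = 0; i < e->size; i++) {
		if (memcmp(e->val[i], eval, ECOM_SIZE) == 0)
			return false; /* exact duplicate: nothing to do */
		if (unique && e->val[i][0] == eval[0] &&
		    e->val[i][1] == eval[1]) {
			memcpy(e->val[i], eval, ECOM_SIZE); /* overwrite */
			return true;
		}
	}
	if (e->size == ECOM_MAX)
		return false;
	memcpy(e->val[e->size++], eval, ECOM_SIZE);
	return true;
}
```

Route targets (type 0x00, subtype 0x02) are added without the flag and may repeat; a link bandwidth value (subtype 0x04) added with the flag replaces the previous one, matching the policy-overwrite behaviour described in the preceding commits.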
vivek [Tue, 24 Mar 2020 19:25:28 +0000 (12:25 -0700)]
bgpd: Install multipath routes with weights
Perform weighted ECMP if the multipaths have link bandwidth. This involves
assigning weights to each of the next hops associated with the prefix based
on the link bandwidth of the corresponding path as a fraction of the total
(cumulative) link bandwidth for the prefix. The weight values used are
between 1 and 100. Weights are assigned only if all paths in the multipath
have link bandwidth, otherwise any bandwidths are ignored and regular
ECMP is performed. This is as recommended in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
A subsequent commit will implement additional (user-configurable) behaviors.
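A minimal sketch of the weight calculation, assuming straightforward proportional scaling (bandwidth * 100 / total, floored, with a minimum of 1). The exact rounding used by bgpd is not stated here, so treat lb_weights() as a hypothetical illustration of the all-or-nothing rule and the 1..100 range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scale each path's bandwidth to a weight in [1, 100] relative to the
 * total. Returns false (fall back to plain ECMP) if any path lacks bw. */
static bool lb_weights(const uint64_t *bw, size_t n, uint32_t *weight)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (bw[i] == 0)
			return false; /* one missing bandwidth disables weights */
		total += bw[i];
	}
	if (total == 0)
		return false; /* empty multipath set */

	for (size_t i = 0; i < n; i++) {
		/* bytes/s values are far below 2^57, so *100 cannot overflow */
		uint64_t w = bw[i] * 100 / total;

		weight[i] = w ? (uint32_t)w : 1; /* never drop a path to 0 */
	}
	return true;
}
```

Bandwidths in the ratio 1:1:2 give weights 25, 25 and 50; a very small path next to a large one is clamped up to 1 rather than disappearing.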
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
vivek [Tue, 24 Mar 2020 19:19:37 +0000 (12:19 -0700)]
bgpd: Add link-bandwidth fields for multipath calc
Introduce fields in the multipath structure for link bandwidth handling.
In the process, the mp_count field is changed to a uint16 as that is the
value set anyway.
vivek [Tue, 24 Mar 2020 18:50:44 +0000 (11:50 -0700)]
bgpd: Add link bandwidth route-map commands
Implement route-map option to set the link-bandwidth extended
community. The command is of the form:
set extcommunity bandwidth <(1-26214400)|cumulative|num-multipaths>
[non-transitive]
The options available are to specify the actual bandwidth value in
Mbps, base it on the cumulative downstream bandwidth or base it on
the number of multipaths. The last option is based on
https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz. Further,
in alignment with the use case described in this IETF draft, the
extended community is encoded as transitive by default. There is an
option available to specify that it should be non-transitive.
The link-bandwidth itself is carried in bytes per second as specified in
https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth
Note: This commit only handles the processing for bandwidth specified
as a value; subsequent commits will handle the processing of the other
options.
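A minimal sketch of turning a configured Mbps value into the 8-octet extended community: type octet 0x00 (two-octet AS specific, transitive) or 0x40 when non-transitive, subtype 0x04, a 2-octet AS, then the bandwidth in bytes per second as an IEEE 754 single-precision float in network byte order. The helper names and the choice of AS value are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ECOM_TYPE_AS2 0x00        /* two-octet AS specific, transitive */
#define ECOM_NON_TRANSITIVE 0x40  /* transitive bit set = non-transitive */
#define ECOM_SUBTYPE_LINK_BW 0x04

/* Route-map values are in Mbps; the community carries bytes per second. */
static uint64_t mbps_to_bytes_per_sec(uint32_t mbps)
{
	return (uint64_t)mbps * 1000 * 1000 / 8;
}

static void encode_link_bw(uint8_t out[8], uint16_t as,
			   uint64_t bytes_per_sec, bool non_transitive)
{
	float f = (float)bytes_per_sec;
	uint32_t bits;

	memcpy(&bits, &f, sizeof(bits)); /* assumes IEEE 754 binary32 float */
	out[0] = ECOM_TYPE_AS2 | (non_transitive ? ECOM_NON_TRANSITIVE : 0);
	out[1] = ECOM_SUBTYPE_LINK_BW;
	out[2] = (uint8_t)(as >> 8);
	out[3] = (uint8_t)(as & 0xff);
	out[4] = (uint8_t)(bits >> 24); /* float in network byte order */
	out[5] = (uint8_t)(bits >> 16);
	out[6] = (uint8_t)(bits >> 8);
	out[7] = (uint8_t)bits;
}
```

For example, 1000 Mbps becomes 125,000,000 bytes/s, whose float encoding is 0x4CEE6B28; the upper bound of the command, 26214400 Mbps, corresponds to 3,276,800,000,000 bytes/s.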
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Quentin Young [Mon, 30 Mar 2020 18:47:15 +0000 (14:47 -0400)]
bgpd: display ingress packet queue size
In the past, we always displayed the number of buffered ingress packets
as zero because there was no packet buffering in the input path and
therefore never any queue size to report. They're buffered now so we can
display something meaningful instead of 0.
Also change the inq / outq lookups to be atomic, since they can be
modified elsewhere. These should still compile down to an unfenced word
read but it's good to be explicit.
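A minimal sketch of an explicit atomic read of a queue counter that another thread updates, using C11 <stdatomic.h>. Relaxed ordering is what "unfenced" means here: on common targets such as x86-64 the load typically compiles to a plain word read. The struct and function names are hypothetical, not bgpd's actual FIFO API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical queue counter shared between the I/O and main threads. */
struct fifo_counter {
	_Atomic size_t count;
};

/* Writer side (e.g. the I/O thread enqueuing a packet). */
static void fifo_note_push(struct fifo_counter *q)
{
	atomic_fetch_add_explicit(&q->count, 1, memory_order_relaxed);
}

/* Reader side (e.g. a "show" command): a relaxed, unfenced snapshot
 * that is fine for display purposes. */
static size_t fifo_count_snapshot(struct fifo_counter *q)
{
	return atomic_load_explicit(&q->count, memory_order_relaxed);
}
```

The snapshot may be stale by the time it is printed, which is acceptable for a display counter; what the atomic load rules out is a torn or compiler-elided read.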
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>