Chirag Shah [Wed, 26 Jan 2022 23:48:44 +0000 (15:48 -0800)]
bgpd: copy asn of src vrf to evpn type5 route
When a unicast route from source vrf is imported into
evpn as type5 route, prepend the asn of the source vrf into
type5 asn path list.
The condition of evpn type5 prefix path info is present but
source vrf route has been cleared via clear command. In this
case existing path info needs to rewrite the source vrf asn.
prepends asn of the source vrf, but the further condition
for existing path attribute for the same route needs to prepend
source vrf asn.
Ticket: #2943080
Testing:
Before fix:
r4# clear ip bgp vrf overlay prefix 0.0.0.0/0
Route Distinguisher: 128.117.243.209:4
*> [5]:[0]:[0]:[0.0.0.0]
203.0.113.1 0 0 194 ? <--- 64512 is missing
ET:8 RT:64532:104001 Rmac:06:ec:bf:59:e8:93
After fix:
r4# clear ip bgp vrf overlay prefix 0.0.0.0/0
Route's source vrf bgp output containing ASN 64512:
r4# show bgp vrf overlay
BGP table version is 2, local router ID is 128.117.243.209, vrf id 10
Default local pref 100, local AS 64512
...
Trey Aspelund [Fri, 5 Nov 2021 15:47:57 +0000 (11:47 -0400)]
bgpd: only import specific route-types into EVIs
Prior to this we were only filtering EVPN routes from the import logic
if they were not route-type 1/2/3/5, which allowed things like RT-5s to
be imported into an L2VNI/MAC-VRF. This adds additional logic to ensure
routes are only imported into EVIs where they make sense.
No more nonsensical route importing!
Donald Sharp [Mon, 18 Oct 2021 17:49:35 +0000 (13:49 -0400)]
zebra: Fix another ships in the night issue with WFI
Effectively When bgp would send a route update down
to zebra and immediately after that a asic update
from the kernel was read. Zebra would choose the
asic update and drop the bgp update leaving us in
a state where bgp was not used as the true source.
Modify the code so that in rib_multipath_nhe
we notice that we have an unprocessed route update
from bgp. And if so just drop this kernel update
about an older version of the route since it is
no longer needed.
Ticket: 2722533 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Quentin Young [Wed, 11 Aug 2021 18:11:02 +0000 (14:11 -0400)]
watchfrr: increase restart timer 20s -> 90s
This commit:
"tools: run `vtysh -b` once for all-startup"
changed things so that `vtysh -b` is run after all daemons have started
up instead of doing it for each daemon as they are started up. This
results in one long `vtysh -b`, which for large configs and many daemons
(in the case I saw, 4 daemons and 30,000 line config) can exceed the 20
second timer watchfrr uses to kill "hung" background tasks.
Shouldn't be any harm to increasing this to 90 seconds to give us some
leeway while still making sure we kill anything truly misbehaving.
pimd: skip init of mlag roles based on the zebra capabilities message
Looks like the cap setting was added for testing mlag via zebra test cli
to config the mlag role. However it is interfering with the valid state
updates rxed from the MLAG daemon based on timing (in some cases the
MLAG state changes are rxed before the capabilities).
Donald Sharp [Thu, 10 Jun 2021 14:54:11 +0000 (10:54 -0400)]
zebra: Reduce memory usage of streams for encoding packets
For those packets that we are not sending 16k of data, but something
far less than 256 bytes. Reduce those stream sizes we allocate
to something much more reasonable.
vivek [Wed, 9 Jun 2021 20:56:22 +0000 (13:56 -0700)]
bgpd: Check L3VNI status before announcing default
Check that the L3VNI is "up" before taking action to announce or
withdraw the EVPN type-5 default based on configuration. Otherwise,
there can be timing conditions where a EVPN type-5 default route
gets announced without a VNI and with invalid route targets.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Ticket: #2684144
Reviewed By: Chirag Shah
Testing Done:
1. Rerun failed test multiple times successfully
2. Some manual testing
3. precommit and partial evpn-smoke
vivek [Wed, 10 Jun 2020 05:15:33 +0000 (22:15 -0700)]
zebra: Reset MAC's remote sequence number appropriately
When a MAC gets deleted but associated neighbors remain, the MAC is
kept in the zebra MAC database as an internal ("auto") entry. When
this happens, reset the MAC's remote sequence number. This ensures that
when the host with the MAC later comes up behind a remote VTEP, the
local switch accepts the MAC and installs it into the bridge FDB and
we don't end up in a situation where remote MACs are not installed
into the bridge FDB.
This fix is a corollary of CM-22753 and is this time done for local
MACs upon delete.
Note: Commit is marked Cumulus-only because I need to evalute more
comprehensive changes before upstreaming it.
Ticket: CM-29581
Reviewed By: As above
Testing Done:
1. Multiple rounds of manual testing
2. Two rounds of evpn-smoke, 1 round of precommit
Corey Siltala [Thu, 14 Nov 2024 19:08:28 +0000 (13:08 -0600)]
tests: Add basic multicast boundary test
Add simple test to show filtering of IGMP joins using new "ip multicast
boundary" filtering with access-lists, include test of existing prefix-
list based "ip multicast boundary oil" command.
Add new interface command ip multicast boundary ACCESSLIST4_NAME. This
allows filtering on both source and group using the extended access-list
syntax vs. group-only as with the existing "ip multicast boundary oil"
command, which uses prefix-lists. If both are configured, the prefix-
list is evaluated first. The default behavior for both prefix-lists and
access-lists remains "deny", so the prefix-list must have a terminating
"permit" statement in order to also evaluate against the access-list.
The following example denies groups in range 229.1.1.0/24 and groups in
range 232.1.1.0/24 with source 10.0.20.2:
!
ip prefix-list pim-oil-plist seq 10 deny 229.1.1.0/24
ip prefix-list pim-oil-plist seq 20 permit any
!
access-list pim-acl seq 10 deny ip host 10.0.20.2 232.1.1.0 0.0.0.255
access-list pim-acl seq 20 permit ip any any
!
interface r1-eth0
ip address 10.0.20.1/24
ip igmp
ip pim
ip multicast boundary oil pim-oil-plist
ip multicast boundary pim-acl
!
Donald Sharp [Wed, 4 Dec 2024 17:22:59 +0000 (12:22 -0500)]
tests: Convert to using `neighbor X timers connect 1` for exabgp using tests
For those tests using exabgp convert them all to use `neighbor X timers
connect 1`. I have noticed that occassionally when looking at the
support files for tests run that peers are in a wait period for
reconnecting which is longer than the test is waiting to converge.
Donald Sharp [Thu, 5 Dec 2024 15:16:03 +0000 (10:16 -0500)]
bgpd: When bgp notices a change to shared_network inform bfd of it
When bgp is started up and reads the config in *before* it has
received interface addresses from zebra, shared_network can
be set to false in this case. Later on once bgp attempts to
reconnect it will refigure out the shared_network again( because
it has received the data from zebra now ). In this case
tell bfd about it.
At startup, there is no peer up message for loc-rib instance peer.
Instead, a global peer up message with address 0.0.0.0 is sent.
Such message is wrong, violates the RFC and should be dropped by
a strict collector. Actually, the peer type message sent is wrong,
and should be set to LOC-RIB peer type.
Fix this by changing the peer type of peer up message to either
loc-rib or global instance peer type.
Fixes: 035304c25a38 ("bgpd: bmp loc-rib peer up/down for vrfs") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Donald Sharp [Wed, 4 Dec 2024 21:14:34 +0000 (16:14 -0500)]
lib: Speed up reconnection attempts for zapi
Currently the zapi reconnection is once every 10 seconds
for the first 3 times and then once every 60 seconds from then
on out. We are seeing interesting behavior under loaded systems
where zebra is just slow to come up and daemons are spending a long
time waiting to connect. Let's just make things a bit more aggressive.
Change the code to attempt to reconnect once every second for 30 seconds
and then change to once every 5 seconds from then on out.
This should help with non-integrated configuration on system startup.
Donald Sharp [Wed, 4 Dec 2024 15:47:33 +0000 (10:47 -0500)]
pimd: Prevent crash of pim when auto-rp's socket is not initialized
If the socket associated with the auto-rp fails to initialize then
the memory for the auto-rp is just dropped on the floor. Additionally
any type of attempt at using the feature will just cause pimd to crash,
when the pointer is derefed. Since it is derefed all over the place
without checking.
Clearly if you cannot bind/use the socket let's allow continuation.
Fixes: #17540 Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Introduced the idea of setting the socket buffer
send/receive sizes. BSD's in general have the fun
issue of not allowing nearly as large as a size as
linux. Since the above commit was developed on linux
and not run on bsd it was never tested. Modify the
codebase to use the backoff setsockopt that we have
in the code base and use the returned values to allow
us to notice what was set and respond appropriately.
Donald Sharp [Tue, 3 Dec 2024 17:08:12 +0000 (12:08 -0500)]
lib: Fix session re-establishment
Currently if you have this sequence of events:
a) BGP starts
b) BGP reads cli that has bfd configuration
c) BGP attempts to install bfd configuration but fails because
zebra is not connected to yet
d) BGP connects to zebra
e) BGP receives resend bfd code from bfdd
f) BGP was not sending down the unsent data to bfd, never causing
the bfd session to be established.
So effectively bfd was attempting to install but failed
and then when it was asked to replay everything it decided
that the bfd information for a particular peer was actually
installed and does not need to be resent. Modify the code
such that the bfd code now tracks failed installation and
allows the resend of data to bfdd.
Mark Stapp [Wed, 30 Oct 2024 15:02:17 +0000 (11:02 -0400)]
zebra: separate zebra ZAPI server open and accept
Separate zebra's ZAPI server socket handling into two phases:
an early phase that opens the socket, and a later phase that
starts listening for client connections.
Philippe Guibert [Mon, 28 Oct 2024 17:20:13 +0000 (18:20 +0100)]
topotests: fix bmp_collector handling of empty as-path
When configuring the bgp_bmp test with an additional
peer that sends empty AS-PATH, the bmp collector is stopping:
> [2024-10-28 17:41:51] Finished dissecting data from ('192.0.2.1', 33922)
> [2024-10-28 17:41:52] Data received from ('192.0.2.1', 33922): length 195
> [2024-10-28 17:41:52] Got message type: <class 'bmp.BMPRouteMonitoring'>
> [2024-10-28 17:41:52] unpack_from requires a buffer of at least 2 bytes for unpacking 2 bytes at offset 0 (actual buffer size is 0)
> [2024-10-28 17:41:52] TCP session closed with ('192.0.2.1', 33922)
> [2024-10-28 17:41:52] Server shutting down on 192.0.2.10:1789
The parser attempts to read an empty AS-path and considers the length
value as a length in bytes, whereas RFC mentions this value as
definining the number of AS-PAths. Fix this in the parser.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Fri, 25 Oct 2024 20:06:11 +0000 (22:06 +0200)]
topotests: bmp, create shared library for bmp
The bgp_bmp and bgp_bmp_vrf tests use similar routines
to test the bmp, but are duplicates. Gather the bgp_bmp
and the bgp_bmp_vrf tests in a single bgp_bmp folder.
- Create a bgpbmp.py library under the bgp_bmp test folder
- The bgp_bmp and bgp_bmp_vrf test are renamed to bgp_bmp_1
and bgp_bmp_2 test.
- Maintain separate folder for config and output results. Adapt
the bgp_bmp library accordingly.
- The json output for bgp_bmp_2 test has no referenc to hostame.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Mon, 11 Mar 2024 10:51:55 +0000 (11:51 +0100)]
bgpd: fix use real SID in BGP nexthop tracking
When receiving an SRv6 BGP update, the nexthop tracking is used
to find out the reachability of the BGP update.
> # show bgp ipv6 vpn fd00:200::/64
> Paths: (1 available, best #1)
> [..]
> 4:4::4:4 from 4:4::4:4 (4.4.4.4)
> Origin incomplete, metric 0, localpref 100, valid, internal, best (First path received)
> Extended Community: RT:52:100
> Remote label: 16
> Remote SID: 2001:db8:f4::
> Last update: Mon Mar 11 11:50:04 2024
The IPv6 address used is the "Remote SID". Actually, this value is
incomplete. Remote SID stands for the attribute value received in BGP,
while the label value determines a complement of SRv6 SID value. The
transposition technique authorises that in BGP, and in the above case,
the incoming BGP update has used the transposition length.
When there is a transposition in the SID attribute available, use the
real SID address. The nexthop tracking will use that forged address.
> # show bgp nexthop
> Current BGP nexthop cache:
> 4:4::4:4 valid [IGP metric 30], #paths 0, peer 4:4::4:4
> gate fe80::dced:1ff:fed6:878c, if ntfp3
> Last update: Mon Mar 11 11:50:02 2024
> 2001:db8:f4:1:: valid [IGP metric 0], #paths 2
> gate fe80::dced:1ff:fed6:878c, if ntfp3
Fixes: 26c747ed6c0b ("bgpd: extend make_prefix to form srv6-based prefix") Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Philippe Guibert [Fri, 22 Nov 2024 14:57:25 +0000 (15:57 +0100)]
topotests: bgp_evpn_rt5, add test for advertise route-map service
Use the advertise route-map command, and check that it
filters out correctly the undesirable prefixes. Reversely,
check that undoing that route-map recovers all prefixes.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>