Christian Hopps [Fri, 24 Feb 2023 01:23:51 +0000 (20:23 -0500)]
lib: fix init. use of nb_context to be by value not by reference
Pass context argument by value on initialization to be clear that the
value is used/saved but not a pointer to the value. Previously the
northbound code was incorrectly holding a pointer to stack allocated
context structs.
However, the structure definition also had some musings (ifdef'd out
code) and a comment that might be taken to imply that user data could
follow the structure and thus be maintained by the code; it won't; so it
can't; so get rid of the disabled misleading code/text from the
structure definition.
The common use case worked b/c the transaction which cached the pointer
was created and freed inside a single function
call (`nb_condidate_commit`) that executed below the stack allocation.
All other use cases (grpc, confd, sysrepo, and -- coming soon -- mgmtd)
were bugs.
Donald Sharp [Thu, 23 Feb 2023 18:29:32 +0000 (13:29 -0500)]
bgpd: Flowspec overflow issue
According to the flowspec RFC 8955 a flowspec nlri is <length, <nlri data>>
Specifying 0 as a length makes BGP get all warm on the inside. Which
in this case is not a good thing at all. Prevent warmth, stay cold
on the inside.
Reported-by: Iggy Frankovic <iggyfran@amazon.com> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
David Lamparter [Fri, 6 May 2022 13:39:26 +0000 (15:39 +0200)]
pimd: don't try to check RPF for incoming SSM data
For incoming no-receiver SSM traffic, there isn't going to be a RP, much
less a RPF. We should install an MFC entry with empty oif regardless,
so we don't get swamped with further notifications.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
David Lamparter [Wed, 1 Jun 2022 07:54:31 +0000 (09:54 +0200)]
pimd: try to reinstall MFC when we get NOCACHE
Whether due to a pimd bug, some expiry, or someone just deleting MFC
entries, when we're in NOCACHE we *know* there's no MFC entry. Add an
install call to make sure pimd's MFC view aligns with the actual kernel
MFC.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donatas Abraitis [Wed, 22 Feb 2023 20:22:28 +0000 (22:22 +0200)]
bgpd: Align `show bgp ...` output with the header for wide option
Before:
```
r1# sh ip bgp wide
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* 172.16.255.254/32 192.168.2.2 0 0 (65003) i
*> 192.168.1.2 0 0 (65002) i
Displayed 1 routes and 2 total paths
r1#
```
After:
```
r1# sh ip bgp wide
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* 172.16.255.254/32 192.168.2.2 0 0 (65003) i
*> 192.168.1.2 0 0 (65002) i
Donald Sharp [Wed, 22 Feb 2023 16:38:00 +0000 (11:38 -0500)]
bgpd: Give better debug message when configuration is being read in
Sometimes bgp connections can be rejected for a variety of reasons. Give
a bit more context about what is going wrong so that the operator can
make better decisions about their network.
David Lamparter [Thu, 10 Mar 2022 12:59:26 +0000 (13:59 +0100)]
pimd: make logs useful for input drops
This path here is pretty far on top of the list of issues that operators
will run into and have to debug when setting up PIM. Make the log
messages actually tell what's going on. Also escalate some from
`debug mroute detail` to `debug mroute`.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Donatas Abraitis [Tue, 21 Feb 2023 21:10:45 +0000 (23:10 +0200)]
bgpd: Pass global ASN for confederation peers if not AS_SPECIFIED
When we specify remote-as as external/internal, we need to set local_as to
bgp->as, instead of bgp->confed_id. Before this patch, (bgp->as != *as) is
always valid for such a case because *as is always 0.
Also, append peer->local_as as CONFED_SEQ to avoid other side withdrawing
the routes due to confederation own AS received and/or malformed as-path.
Trey Aspelund [Fri, 17 Feb 2023 21:47:09 +0000 (21:47 +0000)]
lib: skip route-map optimization if !AF_INET(6)
Currently we unconditionally send a prefix through the optimized
route-map codepath if the v4 and v6 LPM tables have been allocated and
optimization has not been disabled.
However prefixes from address-families that are not IPv4/IPv6 unicast
always fail the optimized route-map index lookup, because they occur on
an LPM tree that is IPv4 or IPv6 specific.
e.g.
Even if you have an empty permit route-map clause, Type-3 EVPN routes
are always denied:
```
--config
route-map soo-foo permit 10
--logs
2023/02/17 19:38:42 BGP: [KZK58-6T4Y6] No best match sequence for pfx: [3]:[0]:[32]:[2.2.2.2] in route-map: soo-foo, result: no match
2023/02/17 19:38:42 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: [3]:[0]:[32]:[2.2.2.2], result: deny
```
There is some existing code that creates an AF_INET/AF_INET6 prefix
using the IP/prefix information from a Type-2/5 EVPN route, which
allowed only these two route-types to successfully attempt an LPM lookup
in the route-map optimization trees via the converted prefix.
This commit does 3 things:
1) Reverts to non-optimized route-map lookup for prefixes that are not
AF_INET or AF_INET6.
2) Cleans up the route-map code so that the AF check is part of the
index lookup + the EVPN RT-2/5 -> AF_INET/6 prefix conversion occurs
outside the index lookup.
3) Adds "debug route-map detail" logs to indicate when we attempt to
convert an AF_EVPN prefix into an AF_INET/6 prefix + when we fallback
to a non-optimized lookup.
Additional functionality for optimized lookups of prefixes from other
address-families can be added prior to the index lookup, similar to how
the existing EVPN conversion works today.
New behavior:
```
2023/02/17 21:44:27 BGP: [WYP1M-NE4SY] Converted EVPN prefix [5]:[0]:[32]:[192.0.2.7] into 192.0.2.7/32 for optimized route-map lookup
2023/02/17 21:44:27 BGP: [MT1SJ-WEJQ1] Best match route-map: soo-foo, sequence: 10 for pfx: 192.0.2.7/32, result: match
2023/02/17 21:44:27 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: 192.0.2.7/32, result: permit
2023/02/17 21:44:27 BGP: [WYP1M-NE4SY] Converted EVPN prefix [2]:[0]:[48]:[aa:bb:cc:00:22:22]:[32]:[20.0.0.2] into 20.0.0.2/32 for optimized route-map lookup
2023/02/17 21:44:27 BGP: [MT1SJ-WEJQ1] Best match route-map: soo-foo, sequence: 10 for pfx: 20.0.0.2/32, result: match
2023/02/17 21:44:27 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: 20.0.0.2/32, result: permit
2023/02/17 21:44:27 BGP: [KHG7H-RH4PN] Unable to convert EVPN prefix [3]:[0]:[32]:[2.2.2.2] into IPv4/IPv6 prefix. Falling back to non-optimized route-map lookup
2023/02/17 21:44:27 BGP: [MT1SJ-WEJQ1] Best match route-map: soo-foo, sequence: 10 for pfx: [3]:[0]:[32]:[2.2.2.2], result: match
2023/02/17 21:44:27 BGP: [H5AW4-JFYQC] Route-map: soo-foo, prefix: [3]:[0]:[32]:[2.2.2.2], result: permit
```
Martin Winter [Sat, 18 Feb 2023 01:16:26 +0000 (02:16 +0100)]
tests: Change bgp_gr_retained_routes to use json output of "ip route"
Depending on ip_route and kernel, the output might include a nhid
which causes the test to fail with a strict text output check.
Change to json output to avoid the issue
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
Donatas Abraitis [Fri, 17 Feb 2023 16:28:17 +0000 (18:28 +0200)]
bgpd: Deprecate BGP `internet` community
Quite a few well-known communities from IANA's list do
not receive special treatment in Cisco IOS XR, and at least one
community on Cisco IOS XR's special treatment list, internet == 0:0,
is not formally a well-known community as it is not in [IANA-WKC] (it
is taken from the Reserved range [0x00000000-0x0000FFFF]).
https://datatracker.ietf.org/doc/html/rfc8642
This is Cisco-specific command which is causing lots of questions when it
comes to debugging and/or configuring it properly, but overall, this behavior
is very odd and it's not clear how it should be treated between different
vendor implementations.
Let's deprecate it and let the operators use 0:0/0 communities as they want.
Trey Aspelund [Mon, 13 Feb 2023 22:14:16 +0000 (22:14 +0000)]
tests: update tests using 'show bgp json detail'
There were a few tests using "show bgp ... json detail" that did json
comparisons against a predefined json structure. This updates those
predefined json structures to match the new format of the output.
(new output moves path array under "paths" key and adds header keys)
Trey Aspelund [Fri, 10 Feb 2023 19:05:27 +0000 (19:05 +0000)]
bgpd: fix 'json detail' output structure
"show bgp <afi> <safi> json detail" was incorrectly displaying header
information from route_vty_out_detail_header() as an element of the
"paths" array. This corrects the behavior for 'json detail' so that a
route holds a dictionary with keys for "paths" and header info, which
aligns with how we structure the output for a specific prefix, e.g.
"show bgp <afi> <safi> <prefix> json".
Before:
```
ub20# show ip bgp json detail
{
"vrfId": 0,
"vrfName": "default",
"tableVersion": 3,
"routerId": "100.64.0.222",
"defaultLocPrf": 100,
"localAS": 1,
"routes": { "2.2.2.2/32": [
{ <<<<<<<<< should be outside the array
"prefix":"2.2.2.2/32",
"version":1,
"advertisedTo":{
"192.168.122.12":{
"hostname":"ub20-2"
}
}
},
{
"aspath":{
"string":"Local",
"segments":[
],
"length":0
},
<snip>
```
Donald Sharp [Thu, 2 Feb 2023 21:28:27 +0000 (16:28 -0500)]
lib: Fix non-use of option
Commit d7c6467ba2f55d1055babbb7fe82716ca3efdc7e added the
ability to specify non pretty printing but unfortunately
forgot to use the option variable to make the whole
thing work.
vivek [Fri, 18 Dec 2020 18:55:40 +0000 (10:55 -0800)]
bgpd: Prevent multipathing among EVPN and non-EVPN paths
Ensure that a multipath set is fully comprised of EVPN paths (i.e.,
paths imported into the VRF from EVPN address-family) or non-EVPN
paths. This is actually a condition that existed already in the code
but was not properly enforced.
This change, as a side effect, eliminates the known trigger condition
for bad or missing RMAC programming in an EVPN deployment, described
in tickets CM-29043 and CM-31222. Routes (actually, paths) in a VRF
routing table that require VXLAN tunneling to the next hop currently
need some special handling in zebra to deal with the nexthop (neigh)
and RMAC programming, and this is implemented for the entire route
(prefix), not per-path. This can lead to the bad or missing RMAC
situation, which is now eliminated by ensuring all paths in the route
are 'similar'.
The longer-term solution in CL 5.x will be to deal with the special
programming by means of explicit communication between bgpd and zebra.
This is already implemented for EVPN-MH via CM-31398. These changes
will be extended to non-MH also and the special code in zebra removed
or refined.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com> Acked-by: Trey Aspelund <taspelund@nvidia.com> Acked-by: Anuradha Karuppiah <anuradhak@nvidia.com> Acked-by: Chirag Shah <chirag@nvidia.com>
Ticket: CM-29043
Testing Done:
1. Manual testing
2. precommit on both MLX and BCM platforms
3. evpn-smoke - BCM and VX
vivek [Thu, 3 Dec 2020 04:04:19 +0000 (20:04 -0800)]
bgpd: Fix deterministic-med check for stale paths
When performing deterministic MED processing, ensure that the peer
status is not checked when we encounter a stale path. Otherwise, this
path will be skipped from the DMED consideration leading to it potentially
not being installed.
Test scenario: Consider a prefix with 2 (multi)paths. The peer that
announces the path with the winning DMED undergoes a graceful-restart.
Before it comes back up, the other path goes away. Prior to the fix, a
third router that receives both these paths would have ended up not
having any path installed to the prefix after the above events.
Karl Quan [Tue, 14 Feb 2023 15:54:59 +0000 (07:54 -0800)]
bgpd: BGP troubleshooting - Add a keyword self-originate to display only self-originated prefixes when looking at the BGP table for a given address-family
Add a keyword self-originate" to extend current CLI commands to filter out self-originated routes only
a\) CLI to show ipv4/ipv6 self-originated routes
"show [ip] bgp [afi] [safi] [all] self-originate [wide|json]"
b\) CLI to show evpn self-originated routes
"show bgp l2vpn evpn route [detail] [type <ead|macip|multicast|es|prefix|1|2|3|4|5>] self-originate [json]"
```
% ./gobgp neighbor 192.168.10.124
BGP neighbor is 192.168.10.124, remote AS 65001
BGP version 4, remote router ID 200.200.200.202
BGP state = ESTABLISHED, up for 00:01:49
BGP OutQ = 0, Flops = 0
Hold time is 3, keepalive interval is 1 seconds
Configured hold time is 90, keepalive interval is 30 seconds
Neighbor capabilities:
multiprotocol:
ipv4-unicast: advertised and received
ipv6-unicast: advertised
route-refresh: advertised and received
extended-nexthop: advertised
Local: nlri: ipv4-unicast, nexthop: ipv6
UnknownCapability(6): received
UnknownCapability(9): received
graceful-restart: advertised and received
Local: restart time 10 sec
ipv6-unicast
ipv4-unicast
Remote: restart time 120 sec, notification flag set
ipv4-unicast, forward flag set
4-octet-as: advertised and received
add-path: received
Remote:
ipv4-unicast: receive
enhanced-route-refresh: received
long-lived-graceful-restart: advertised and received
Local:
ipv6-unicast, restart time 10 sec
ipv4-unicast, restart time 20 sec
Remote:
ipv4-unicast, restart time 0 sec, forward flag set
fqdn: advertised and received
Local:
name: donatas-pc, domain:
Remote:
name: spine1-debian-11, domain:
software-version: advertised and received
Local:
GoBGP/3.10.0
Remote:
FRRouting/8.5-dev-MyOwnFRRVersion-gdc92f44a45-dirt
cisco-route-refresh: received
Message statistics:
```
Philippe Guibert [Mon, 13 Feb 2023 11:18:33 +0000 (12:18 +0100)]
bgpd: clarify when the vpnv6 nexthop length must be modified
Using a route-map to update the local ipv6 address has to be
better clarified. Actually, when a VPN SAFI is used, the nexthop
length must be changed to 48 bytes. Other cases, the length will
be 32 bytes.
Fixes: 9795e9f23465 ("bgpd: fix when route-map changes the link local
nexthop for vpnv6")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
zyxwvu Shi [Wed, 15 Feb 2023 15:55:00 +0000 (23:55 +0800)]
zebra: Fix other table inactive when ip import-table is on
In `rib_link`, if is_zebra_import_table_enabled returns
true, `rib_queue_add` will not called, resulting in other
table route node never processed. This actually should not
be dependent on whether the route is imported.
In `rib_delnode`, if is_zebra_import_table_enabled returns
true, it will use `rib_unlink` instead of enqueuing the
route node for process. There is no reason that imported
route nodes should not be reprocessed. Long ago, the
behaviour was dependent on whether the route_entry comes
from a table other than main.