Donald Sharp [Fri, 4 Nov 2022 00:39:39 +0000 (20:39 -0400)]
bgpd: Limit snmp trap for backwards state movement from established
Currently the bgp mib specifies two traps:
a) Into established state
b) transition backwards from a state
b) really is an interesting case. It means transitioning
from say established to starting over. It can also
mean when bgp is trying to connect and that fails and
the state transitions backwards.
Now let's imagine 500 peers with tight timers (say a data center)
and there is network trauma you have just created an inordinately
large number of traps for each peer.
Let's limit FRR to changing from the old status as Established
to something else. This will greatly limit the trap but it
will also be something end operators are actually interested in.
I actually had several operators say they had to write special code
to ignore all the backward state transitions that they didn't care
about.
Donald Sharp [Wed, 2 Nov 2022 17:24:48 +0000 (13:24 -0400)]
bgpd: Ensure that bgp open message stream has enough data to read
If a operator receives an invalid packet that is of insufficient size
then it is possible for BGP to assert during reading of the packet
instead of gracefully resetting the connection with the peer.
anlan_cs [Mon, 31 Oct 2022 10:14:07 +0000 (06:14 -0400)]
include: remove one unused macro
In "rtnetlink.h", four items are a group, e.g. 116/117/118/119 should be
a group. But "RTM_SETHWFLAGS" is not in use and has nothing to do with
"NEXTHOPBUCKET".
After comparing with kernel header, better remove it imo.
`srv6_locator_chunk_free()` is a wrapper around the `XFREE()` macro.
Passing a NULL pointer to `XFREE()` is safe. Therefore, checking that
the pointer passed to the `srv6_locator_chunk_free()` is not null is
unnecessary.
`srv6_locator_chunk_free()` takes care of freeing the memory allocated
for a `struct srv6_locator_chunk` and setting the
`struct srv6_locator_chunk` pointer to NULL.
It is not necessary to explicitly set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
lib, bgpd: Enhance `srv6_locator_chunk_free()` API
A programmer can use the `srv6_locator_chunk_free()` function to free
the memory allocated for a `struct srv6_locator_chunk`.
The programmer invokes `srv6_locator_chunk_free()` by passing a single
pointer to the `struct srv6_locator_chunk` to be freed.
`srv6_locator_chunk_free()` uses `XFREE()` to free the memory.
It is the responsibility of the programmer to set the
`struct srv6_locator_chunk` pointer to NULL after freeing memory with
`srv6_locator_chunk_free()`.
This commit modifies the `srv6_locator_chunk_free()` function to take a
double pointer instead of a single pointer. In this way, setting the
`struct srv6_locator_chunk` pointer to NULL is no longer the
programmer's responsibility but is the responsibility of
`srv6_locator_chunk_free()`. This prevents programmers from making
mistakes such as forgetting to set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
Donald Sharp [Thu, 13 Oct 2022 19:54:43 +0000 (15:54 -0400)]
pimd: Remove pim_br vestiges
If PIM had received a register packet with the Border Router
bit set, pimd would have crashed. Since I wrote this code
in 2015 and really have pretty much no memory of this and
no-one has ever reported this crash, let's just remove this
code.
Louis Scalbert [Wed, 19 Oct 2022 12:34:43 +0000 (14:34 +0200)]
lib: fix the default TE bandwidth
When enabling the interface link-params, a default bandwidth is assigned
to the Max, Reservable and Unreserved Bandwidth variables. If the
bandwidth is set at in the interface context, this value is used.
Otherwise, a default bandwidth value of 10 Gbps is set.
Revert the default value to 10 Mbps as it was intended in the initial
commit. 10 Mbps is a low value so that the link will not be prioritized
when computing the paths.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Donald Sharp [Thu, 27 Oct 2022 13:21:41 +0000 (09:21 -0400)]
bgpd: Clarify what NHT error message means
When waiting on a path to reach the peer, modify the debug/show
output to give a better understanding to the operator about what
they should be looking for.
Sai Gomathi N [Thu, 27 Oct 2022 09:36:00 +0000 (02:36 -0700)]
pimd: Dereference before null check
In pim_ecmp_nexthop_search: All paths that lead to this null pointer comparison already dereference the pointer earlier
There may be a null pointer dereference, or else the comparison against null is unnecessary.
Sai Gomathi N [Thu, 27 Oct 2022 08:52:31 +0000 (01:52 -0700)]
pimd: Unchecked return value
In tib_sg_oil_setup: Value returned from a function is not checked for errors before being used.
If the function returns an error value, the error value may be mistaken for a normal value.
Here, only the nexthop value is being used. So casted the return type to void.
Ryoga Saito [Thu, 27 Oct 2022 01:17:50 +0000 (10:17 +0900)]
bgpd: Fix the condition whether nexthop is changed
Given that the following topology, route server MUST not modify NEXT_HOP
attribute because route server isn't in the actual routing path. This
behavior is required to comply RFC7947
(Router A) <-(eBGP peer)-> (Route Server) <-(eBGP peer)-> (Router B)
RFC7947 says as follows:
> As the route server does not participate in the actual routing of
> traffic, the NEXT_HOP attribute MUST be passed unmodified to the route
> server clients, similar to the "third-party" next-hop
> feature described in Section 5.1.3. of [RFC4271].
However, current FRR is violating RFC7947 in some cases. If routers and
route server established BGP peer over IPv6 connection and routers
advertise ipv4-vpn routes through route server, route server will modify
NEXT_HOP attribute in these advertisements.
This is because the condition to check whether NEXT_HOP attribute should
be changed or not is wrong. We should use (afi, safi) as the key to
check, but (nhafi, safi) is actually used. This causes the RFC7947
violation.
Trey Aspelund [Wed, 26 Oct 2022 20:53:09 +0000 (20:53 +0000)]
bgpd: Check for IP-format Site-of-Origin
When deciding whether to apply "neighbor soo" filtering towards a peer,
we were only looking for SoO ecoms that use either AS or AS4 encoding.
This makes sure we also check for IPv4 encoding, since we allow a user
to configure that encoding style against the peer.
Config:
```
router bgp 1
address-family ipv4 unicast
network 100.64.0.2/32 route-map soo-foo
neighbor 192.168.122.12 soo 3.3.3.3:20
exit-address-family
!
route-map soo-foo permit 10
set extcommunity soo 3.3.3.3:20
exit
```
Before:
```
ub20# show ip bgp neighbors 192.168.122.12 advertised-routes
BGP table version is 5, local router ID is 100.64.0.222, vrf id 0
Default local pref 100, local AS 1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 2.2.2.2/32 0.0.0.0 0 100 32768 i
*> 100.64.0.2/32 0.0.0.0 0 100 32768 i
Total number of prefixes 2
```
After:
```
ub20# show ip bgp neighbors 192.168.122.12 advertised-routes
BGP table version is 5, local router ID is 100.64.0.222, vrf id 0
Default local pref 100, local AS 1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 2.2.2.2/32 0.0.0.0 0 100 32768 i
Donald Sharp [Fri, 21 Oct 2022 11:20:44 +0000 (07:20 -0400)]
zebra: Fix handling of recursive routes when processing closely in time
When zebra receives routes from upper level protocols it decodes the
zapi message and places the routes on the metaQ for processing. Suppose
we have a route A that is already installed by some routing protocol.
And there is a route B that has a nexthop that will be recursively
resolved through A. Imagine if a route replace operation for A is
going to happen from an upper level protocol at about the same time
the route B is going to be installed into zebra. If these routes
are received, and decoded, at about the same time there exists a
chance that the metaQ will contain both of them at the same time.
If the order of installation is [ B, A ]. B will be resolved
correctly through A and installed, A will be processed and
re-installed into the FIB. If the nexthops have changed for
A then the owner of B should be notified about the change( and B
can do the correct action here and decide to withdraw or re-install ).
Now imagine if the order of routes received for processing on the
metaQ is [ A, B ]. A will be received, processed and sent to the
dataplane for reinstall. B will then be pulled off the metaQ and
fail the install since A is in a `not Installed` state.
Let's loosen the restriction in nexthop resolution for B such
that if the route we are dependent on is a route replace operation
allow the resolution to suceed. This requires zebra to track a new
route state( ROUTE_ENTRY_ROUTE_REPLACING ) that can be looked at
during nexthop resolution. I believe this is ok because A is
a route replace operation, which could result in this:
-route install failed, in which case B should be nht'ing and
will receive the nht failure and the upper level protocol should
remove B.
-route install succeeded, no nexthop changes. In this case
allowing the resolution for B is ok, NHT will not notify the upper
level protocol so no action is needed.
-route install succeeded, nexthops changes. In this case
allowing the resolution for B is ok, NHT will notify the upper
level protocol and it can decide to reinstall B or not based
upon it's own algorithm.
This set of events was found by the bgp_distance_change topotest(s).
Effectively the tests were looking for the bug ( A, B order in the metaQ )
as the `correct` state. When under very heavy load, the A, B ordering
caused A to just be installed and fully resolved in the dataplane before
B is gotten to( which is entirely possible ).
David Lamparter [Tue, 4 Oct 2022 16:44:36 +0000 (18:44 +0200)]
build, vtysh: extract vtysh commands from .xref
Rather than running selected source files through the preprocessor and a
bunch of perl regex'ing to get the list of all DEFUNs, use the data
collected in frr.xref.
This not only eliminates issues we've been having with preprocessor
failures due to nonexistent header files, but is also much faster.
Where extract.pl would take 5s, this now finishes in 0.2s. And since
this is a non-parallelizable build step towards the end of the build
(dependent on a lot of other things being done already), the speedup is
actually noticeable.
Also files containing CLI no longer need to be listed in `vtysh_scan`
since the .xref data covers everything. `#ifndef VTYSH_EXTRACT_PL`
checks are equally obsolete.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Lou Berger [Fri, 21 Oct 2022 20:35:13 +0000 (20:35 +0000)]
ospf: optimization for FRR's P2MP mode
FRR implements a non-standard, but compatible approach for
sending update LSAs (it always send to 224.0.0.5) on P2MP
interfaces. This change makes it so acks are also sent to
224.0.0.5.
Since the acks are multicast, this allows an optimization
where we don't send back out the incoming P2MP interface
immediately allow time to rx multicast ack from neighbors
on the same net that rx'ed the original (multicast) update.
Wayne Morrison [Tue, 25 Oct 2022 14:45:35 +0000 (10:45 -0400)]
bgpd: fixed misaligned columns in BGP routes table
Column headers in BGP routes table are not aligned with data when
RPKI status is available. This was fixed to insert a space at the
beginning of the header and at the beginning of lines that do not
have RPKI status.
This fix requires that several testing templates be adjusted to
match the new output.
Signed-off-by: Wayne Morrison <wmorrison@netgate.com>
Manoj Naragund [Tue, 25 Oct 2022 07:43:10 +0000 (00:43 -0700)]
ospf6d: Fix for memory leak issues in ospf6.
Problem:
Multiple memory leaks in ospf6.
260 ==6637== 32 bytes in 1 blocks are definitely lost in loss record 5 of 24
261 ==6637== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
262 ==6637== by 0x4E8A1BF: qcalloc (memory.c:111)
263 ==6637== by 0x11EE27: ospf6_summary_add_aggr_route_and_blackhole (ospf6_asbr.c:2779)
264 ==6637== by 0x11EEBA: ospf6_originate_new_aggr_lsa (ospf6_asbr.c:2811)
265 ==6637== by 0x4E7C6A7: hash_clean (hash.c:325)
266 ==6637== by 0x11FA93: ospf6_handle_external_aggr_update (ospf6_asbr.c:3164)
267 ==6637== by 0x11FA93: ospf6_asbr_summary_process (ospf6_asbr.c:3386)
268 ==6637== by 0x4EB739B: thread_call (thread.c:1692)
269 ==6637== by 0x4E85B17: frr_run (libfrr.c:1068)
270 ==6637== by 0x119535: main (ospf6_main.c:228)
356 ==6637== 240 bytes in 12 blocks are indirectly lost in loss record 13 of 24
357 ==6637== at 0x4C2FE96: malloc (vg_replace_malloc.c:309)
358 ==6637== by 0x4E8A0DA: qmalloc (memory.c:106)
359 ==6637== by 0x13545C: ospf6_lsa_alloc (ospf6_lsa.c:724)
360 ==6637== by 0x1354E3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
361 ==6637== by 0x1355F2: ospf6_lsa_copy (ospf6_lsa.c:790)
362 ==6637== by 0x13B58B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
363 ==6637== by 0x13B58B: ospf6_dbdesc_recv (ospf6_message.c:1038)
364 ==6637== by 0x13B58B: ospf6_read_helper (ospf6_message.c:1838)
365 ==6637== by 0x13B58B: ospf6_receive (ospf6_message.c:1875)
366 ==6637== by 0x4EB739B: thread_call (thread.c:1692)
367 ==6637== by 0x4E85B17: frr_run (libfrr.c:1068)
368 ==6637== by 0x119535: main (ospf6_main.c:228)
RCA:
1. when the ospf6 area is being deleted, the neighbor related information
was not being cleaned up.
2. when aggr route gets deleted from rt_aggr_tbl the corrsponding summary
route attched to the aggr route was not being deleted.
Fix:
Added the ospf6_neighbor_delete in ospf6_area_delete to free the
neighbor related information and added ospf6_route_delete while
freeing external aggr route to free the summary route.
Nico Berlee [Sun, 23 Oct 2022 14:42:51 +0000 (16:42 +0200)]
vtysh: Ensure an empty string does not get printed for host/domain
vtysh show running-config is showing:
frr version 8.3.1_git
frr defaults traditional
hostname test
log file /etc/frr/frr.log informational
log timestamp precision 3
domainname
service integrated-vtysh-config
domainname should not be printed in this case at all. If the
host has no search/domainname configured, frr_reload.py
crashes on invalid config from `vtysh show running-config`
Stephen Worley [Fri, 21 Oct 2022 16:45:50 +0000 (12:45 -0400)]
bgpd,doc: limit InQ buf to allow for back pressure
Add a default limit to the InQ for messages off the bgp peer
socket. Make the limit configurable via cli.
Adding in this limit causes the messages to be retained in the tcp
socket and allow for tcp back pressure and congestion control to kick
in.
Before this change, we allow the InQ to grow indefinitely just taking
messages off the socket and adding them to the fifo queue, never letting
the kernel know we need to slow down. We were seeing under high loads of
messages and large perf-heavy routemaps (regex matching) this queue
would cause a memory spike and BGP would get OOM killed. Modifying this
leaves the messages in the socket and distributes that load where it
should be in the socket buffers on both send/recv while we handle the
mesages.
Also, changes were made to allow the ringbuffer to hold messages and
continue to be filled by the IO pthread while we wait for the Main
pthread to handle the work on the InQ.
Memory spike seen with large numbers of routes flapping and route-maps
with dozens of regex matching:
```
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: > 2GB
Holding block headers: 516 KiB
Used small blocks: 0 bytes
Used ordinary blocks: 160 MiB
Free small blocks: 3680 bytes
Free ordinary blocks: > 2GB
Ordinary blocks: 121244
Small blocks: 83
Holding blocks: 1
```
With most of it being held by the inQ (seen from the stream datastructure info here):
```
Type : Current# Size Total Max# MaxBytes
...
...
Stream : 115543 variable 26963208159707403571708768
```
With this change that memory is capped and load is left in the sockets:
RECV Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 265350 0 [fe80::4080:30ff:feb0:cee3]%veth1:36950 [fe80::4c14:9cff:fe1d:5bfd]:179 users:(("bgpd",pid=1393334,fd=26))
skmem:(r403688,rb425984,t0,tb425984,f1816,w0,o0,bl0,d61)
```
SEND Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 1275012 [fe80::4c14:9cff:fe1d:5bfd]%veth1:179 [fe80::4080:30ff:feb0:cee3]:36950 users:(("bgpd",pid=1393443,fd=27))
skmem:(r0,rb131072,t0,tb1453568,f1916,w1300612,o0,bl0,d0)
```
Signed-off-by: Stephen Worley <sworley@nvidia.com>