Rafael Zalamena [Mon, 13 Dec 2021 20:21:56 +0000 (17:21 -0300)]
bgpd: fix aggregate route AS Path attribute
Always free the locally allocated attribute not the one we are using for
return. This fixes a memory leak and a crash when AS Path is set with
route-map.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Donald Sharp [Fri, 16 Apr 2021 15:34:30 +0000 (11:34 -0400)]
bgpd, tests: Add code to handle failed installations
Currently the Wait for Install code ( bgp_suppress_fib ) does
not properly handle two states from zebra: ROUTE_INSTALL_FAILED
and BETTER_ADMIN_DISTANCE_WON. Pre this change the WFI code
would just never notify our peers about a route install failure
but more is needed. In the ROUTE_INSTALL_FAILED and the
BETTER_ADMIN_DISTANCE_WON we need to notify our peers with
a withdrawal about the route, else we will continue to
draw traffic to us when we cannot legally do so.
Why is this needed? In either case imagine that we've already
received a bgp route, installed it and sent to our peers.
In the Better admin distance won case, say a static route is installed
at this point in time we must stop advertising the route through
us since we are not installed. As such a withdrawal must be sent.
In the ROUTE_INSTALL_FAILED case, the code was not properly handling
the situation where we have Route A, it was successfully installed
and then we received a update to Route A that was attempted to be
installed but failed. In this case we also need to send a withdrawal
Finally update the bgp_suppress_fib topotest to test both of these
situations.
David Lamparter [Thu, 11 Nov 2021 16:22:59 +0000 (17:22 +0100)]
lib: shuffle around command line options
New `FRR_NO_SPLIT_CONFIG` flag for newly added daemons where we're just
rolling without split config and always expect configs to be loaded via
vtysh/integrated config.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Igor Ryzhov [Tue, 14 Dec 2021 13:28:08 +0000 (16:28 +0300)]
isisd: fix use after free
Pointers to the adjacency must be cleared only when the adjacency is
deleted. Otherwise, when the ISIS router is deleted later, the adjacency
is not deleted and a crash happens because of UAF.
Donald Sharp [Sat, 11 Dec 2021 17:05:36 +0000 (12:05 -0500)]
tests: test_ospf_lan.py is looking for a certain order enforce it
OSPF when converging will choose a DR / Backup DR based upon
who has already come up. Irrelevant of priority. As such if
under system load OSPF comes up first and elects a DR that under
normal circumstances not be the elected one due to priority
OSPF does not go back through and re-elect to keep the system
stable in this case. Tests are experiencing this:
unet> r0 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.1 99 Full/Backup 4m14s 3.780s 10.0.1.2 r0-s1-eth0:10.0.1.1 0 0 0
100.1.1.2 0 Full/DROther 4m14s 3.848s 10.0.1.3 r0-s1-eth0:10.0.1.1 0 0 0
100.1.1.3 0 Full/DROther 4m14s 3.912s 10.0.1.4 r0-s1-eth0:10.0.1.1 0 0 0
unet> r1 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.0 98 Full/DR 4m15s 3.011s 10.0.1.1 r1-s1-eth1:10.0.1.2 0 0 0
100.1.1.2 0 Full/DROther 4m19s 3.124s 10.0.1.3 r1-s1-eth1:10.0.1.2 0 0 0
100.1.1.3 0 Full/DROther 4m19s 3.188s 10.0.1.4 r1-s1-eth1:10.0.1.2 0 0 0
unet> r2 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.0 98 Full/DR 4m27s 3.483s 10.0.1.1 r2-s1-eth0:10.0.1.3 0 0 0
100.1.1.1 99 Full/Backup 4m32s 3.527s 10.0.1.2 r2-s1-eth0:10.0.1.3 0 0 0
100.1.1.3 0 2-Way/DROther 4m32s 3.660s 10.0.1.4 r2-s1-eth0:10.0.1.3 0 0 0
unet> r3 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.0 98 Full/DR 4m55s 3.786s 10.0.1.1 r3-s1-eth1:10.0.1.4 0 0 0
100.1.1.1 99 Full/Backup 4m55s 3.829s 10.0.1.2 r3-s1-eth1:10.0.1.4 0 0 0
100.1.1.2 0 2-Way/DROther 4m54s 3.897s 10.0.1.3 r3-s1-eth1:10.0.1.4 0 0 0
Modify the test to do a clear to enforce the order we are specifically looking for.
Chirag Shah [Thu, 2 Dec 2021 06:13:37 +0000 (22:13 -0800)]
tools: exit when reload fails to parse config file
frr-reload triggers restart of service in case
it fails to parse new config file and conjunction with
running config contains 'router bgp' (default bgp instnace).
When frr-reload fails to parse new config file, it fails
to build newconfig context (empty object).
Instead of bailing out it compares against the running config
context. If the running config contains default bgp instance
it thinks new config is removing default bgp instance so it
triggers frr restart.
Fix is to to bail out reload script when it fails to parse
config file.
tools: add a script to generate draft release changelog
This utility script helps in generated formatted and consistent
change log including:
1- group logs per daemon
2- standarize daemon names (lowercase, end with d)
3- capitalize all log lines
4- no merge commits
caveat: comments are assumed to be in the form
daemon-name : message
Sample Output:
```
sharpd
Follow the practice on cli design for json output
Install route supports nexthop-seg6 (step3)
Install_routes_helper support zapi_route flags (step1)
snapcraft
Add missing dependency
Add pathd to frr snap daemons
Change base to ubuntu 18.04 and libyang 2.0.7
staticd
Convert typedef to enum
Fix distance processing
Fix late initialization of blackhole type
Output config using nb callbacks instead of operational data
```
Igor Ryzhov [Wed, 17 Nov 2021 23:20:43 +0000 (02:20 +0300)]
bfdd: remove unnecessary receive timer restart
When the detection time expires, we put the session down and restart the
timer. As the comment in the code says, it's needed to zero the remote
discriminator after the second expiration.
But the RFC clearly says that this must be done on the first expiration:
bfd.RemoteDiscr
The remote discriminator for this BFD session. This is the
discriminator chosen by the remote system, and is totally opaque
to the local system. This MUST be initialized to zero. If a
period of a Detection Time passes without the receipt of a valid,
authenticated BFD packet from the remote system, this variable
MUST be set to zero.
And we actually already do it in `ptm_bfd_sess_dn`, so there's no need
to reset the timer and wait for it twice.
Frr-reload failure:
line 179: Failure to communicate[13] to bgpd, line: neighbor 10.2.1.1
remote-as external
% Peer-group member cannot override remote-as of peer-group
line 179: Failure to communicate[13] to bgpd, line: neighbor 10.2.1.2
remote-as external
% Peer-group member cannot override remote-as of peer-group
Igor Ryzhov [Fri, 13 Aug 2021 23:09:54 +0000 (02:09 +0300)]
vtysh: fix duplicated output of key chain configuration
When both ripd and eigrpd run at the same time, all key configuration in
key chain node is duplicated. This change adds a concept of nested nodes
into vtysh to fix the issue.
Igor Ryzhov [Wed, 24 Nov 2021 12:01:41 +0000 (15:01 +0300)]
bfdd: fix detection timeout update
Per RFC 5880 section 6.8.12, the use of a Poll Sequence is not necessary
when the Detect Multiplier is changed. Currently, we update the Detection
Timeout only when a Poll Sequence is terminated, therefore we ignore the
Detect Multiplier change if it's not accompanied with RX/TX timer change.
To fix the problem, we should update the Detection Timeout on every
received packet.
Stephen Worley [Mon, 29 Nov 2021 19:59:06 +0000 (14:59 -0500)]
zebra: add optional NHG ID output to `show ip ro`
Add optional NHG ID output to `show ip route` dumps. We have
this in json output already as nexthopGroupID but nice
to have the option in a normal dump as well. Not including in main
output for now to avoid breaking screen scrapers.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Donald Sharp [Tue, 30 Nov 2021 00:33:48 +0000 (19:33 -0500)]
tests: Fix Daemon Killing to actually notice when a deamon dies
Lot's of the GR topotests kill daemons in order to test code
that deals with crashing daemons. Under heavy system load
it was noticed that a kill command was sent and if told to
wait we would sleep 2 seconds send another kill command and
call it good. This was causiing issues when subsuquent
json commands would get errors like `lost connection to daemon`
as the daemon finally shut down after some time due to load.
Modify the kill the daemon function to notice that the daemon
was not actually killed and if we need to wait wait some
more time for it too happen
Donald Sharp [Mon, 29 Nov 2021 20:51:45 +0000 (15:51 -0500)]
zebra: Prevent thread usage of data after it being freed
On startup we create a thread timer event to do a rib sweep
of the system. On shutdown we never stopped this timer and
as such we have a situation where a thread event could be run
on shutdown after the data for it has been freed. Here is the
crash I am seeing:
(gdb) bt
(gdb)
Save the thread data in zebra_router and stop the thread so we don't
accidently do work on shutdown we don't mean to. In this case
it happened in our topotests with some severe system load.
Essentially we happened to kill the zebra daemon just as the
graceful_restart timer popped here.