Christian Hopps [Wed, 14 Jul 2021 11:05:29 +0000 (07:05 -0400)]
tools: improve frr-reload.py delta file creation
- Remove incorrect requirement for `service integrated-vtysh-config`
when producing a delta.
- Add `--test-reset` option which suppresses non-parseable lines from the
produced delta
- Use new features in common_config.py
With fix:
```
exit1-debian-9# sh ip bgp dampening flap-statistics
BGP table version is 22, local router ID is 10.10.10.200, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
```
```
5 0x00007fccab6fac39 in json_object_boolean_true_add (obj=<optimized out>, key=<optimized out>) at lib/json.c:70
No locals.
6 0x000055c7b8c08ae5 in route_vty_short_status_out (vty=<optimized out>, path=0x55c7bb37dcf0, p=<optimized out>, json_path=0x55c7bb3735a0)
at bgpd/bgp_route.c:8566
rpki_state = RPKI_NOT_BEING_USED
7 0x000055c7b8c22d1b in flap_route_vty_out (afi=AFI_IP, json=0x55c7bb3735a0, use_json=true, safi=SAFI_UNICAST, display=0, path=0x55c7bb37dcf0,
p=0x55c7bb37dea0, vty=0x55c7bb39e4c0) at bgpd/bgp_route.c:9600
attr = <optimized out>
bdi = 0x55c7bb377950
timebuf = '\000' <repeats 24 times>
len = <optimized out>
8 bgp_show_table (vty=0x55c7bb39e4c0, bgp=0x55c7bb316300, safi=safi@entry=SAFI_UNICAST, table=0x55c7bb314d90, type=bgp_show_type_flap_statistics,
output_arg=0x0, rd=0x0, is_last=1, output_cum=0x0, total_cum=0x0, json_header_depth=0x7ffeefd649f8, show_flags=1, rpki_target_state=RPKI_NOT_BEING_USED)
at bgpd/bgp_route.c:11110
```
With fix:
```
exit1-debian-9# sh ip bgp dampening dampened-paths
BGP table version is 16, local router ID is 10.10.10.200, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
```
Donald Sharp [Wed, 7 Jul 2021 20:52:24 +0000 (16:52 -0400)]
zebra: When passing lookup information back pass the fully resolved
In the reachability code we auto pass back the fully resolved
nexthops. Modify the ZEBRA_IPV4_NEXTHOP_LOOKUP_MRIB code
to do the exact same thing so that the zclient_lookup_nexthop
code does not need to recursively look for the data that
zebra already has.
TMUX and Screen support when running topotests inside docker. This
allows the gdb, shell and vtysh features to correctly work even when
running the tests inside docker.
Add options:
--asan-abort :: aborts the process on ASAN errors
--strace-daemons :: strace some or all daemons
Quentin Young [Fri, 14 May 2021 18:57:06 +0000 (14:57 -0400)]
bgpd: add knob to config cond-adv scanner period
Adds a knob that sets the time between loc-rib scans for conditional
advertisement.
I chose the range (5-240) because 1 second seems dumb and too easy to
hurt yourself at even moderate scale, 5 seconds you can still hurt
yourself but I could see a use case for it, and 4 minutes should be
enough for anyone (tm)
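As a rough illustration of the range described above, the sketch below checks a scanner period against the 5-240 second bounds. Only the range comes from the commit message; the constant and function names are hypothetical and are not taken from bgpd.

```python
# Hypothetical names; only the 5-240 second range comes from the commit message.
SCAN_PERIOD_MIN = 5    # seconds
SCAN_PERIOD_MAX = 240  # seconds (4 minutes)


def validate_scan_period(seconds: int) -> int:
    """Return the period unchanged if it is within range, otherwise raise."""
    if not SCAN_PERIOD_MIN <= seconds <= SCAN_PERIOD_MAX:
        raise ValueError(
            f"period must be between {SCAN_PERIOD_MIN} and {SCAN_PERIOD_MAX} seconds"
        )
    return seconds
```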
Igor Ryzhov [Mon, 12 Jul 2021 20:56:04 +0000 (23:56 +0300)]
isisd: fix processing of the attached bit
There are two problems with the current code for processing the attached
bit:
- we should process it when acting as either level-1-only or level-1-2
- we should add the default route when we don't have L2 adjacencies, not
when we don't have other routers configured on the device
Igor Ryzhov [Mon, 12 Jul 2021 20:51:27 +0000 (23:51 +0300)]
isisd: fix setting of the attached bit
Current code related to setting of the attached bit checks for existence
of L2 adjacencies in other routers configured on the device. This makes
no sense. We should check for L2 adjacencies in the same router where we
have L1 adjacencies.
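The two attached-bit fixes above can be sketched roughly as follows. This is a simplified model, assuming a per-router record of adjacency counts; the class and function names are hypothetical, and every other condition the real isisd code applies is ignored.

```python
from dataclasses import dataclass


@dataclass
class IsisRouter:
    """Hypothetical model of one IS-IS router instance configured on the device."""
    name: str
    l1_adjacencies: int = 0
    l2_adjacencies: int = 0


def should_set_attached_bit(router: IsisRouter) -> bool:
    # Look at L2 adjacencies in the same router that has the L1 adjacencies,
    # not in other routers configured on the device.
    return router.l1_adjacencies > 0 and router.l2_adjacencies > 0


def should_add_default_route(router: IsisRouter, attached_bit_seen: bool) -> bool:
    # Add the default route only when this router has no L2 adjacencies.
    return attached_bit_seen and router.l2_adjacencies == 0
```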
Igor Ryzhov [Mon, 12 Jul 2021 19:51:49 +0000 (22:51 +0300)]
ospf6d: fix freebsd mcast group issues
There's a delay in FreeBSD between issuing a command to leave a
multicast group and an actual leave. If we execute "no router ospf6" and
"router ospf6" fast enough, we can end up in a situation where the OS
performs the leave later than it performs the join, and the interface
remains without a multicast group.
Instead of counting on a one second delay, we must wait until the
interface actually leaves the group.
Philippe Guibert [Mon, 12 Jul 2021 07:22:41 +0000 (09:22 +0200)]
bgpd: associate correct nexthop when using peer link-local
When setting bgp configuration using peers referencing link local
ipv6 addresses, the bgp should be able to handle incoming bgp
connections, and find out the appropriate interface where the
connection comes from.
ipv6 link local sessions work by using bgp unnumbered interfaces
config, but it does not work if we have a shared media with
multiple potential link local ipv6 addresses on the network.
The fix consists of finding the appropriate interface when
the local configuration references a link-local ipv6 address
and the source address used references an interface. The
configuration below illustrates what can be done then:
Note: this change does not solve the ability for such a config to
create an outgoing connection to the remote peer (as the link-local
ipv6 address config does not indicate which interface to use).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
staticd: fix late initialization of blackhole type
If a static route is added to a not-yet-existing VRF, the blackhole type
is not initialized. Initialization must be done before the VRF existence
check.
Rafael Zalamena [Thu, 8 Jul 2021 17:09:20 +0000 (14:09 -0300)]
lib,ospfd,ospf6d: remove duplicated function
Move `is_default_prefix` variations to `lib/prefix.h` and make the code
use the library version instead of implementing it again.
NOTE
----
The function was split into per family versions to cover all types.
Using `union prefixconstptr` is not possible due to static analyzer
warnings which cause CI to fail.
The specific cases that would cause this failure were:
- Caller used `struct prefix_ipv4` and called the generic function.
- `is_default_prefix` with signature using `const struct prefix *` or
`union prefixconstptr`.
The compiler would complain about reading bytes outside of the memory
bounds even though it did not take into account the `prefix->family`
part.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
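To illustrate the per-family split described in the note, here is a minimal Python sketch using the standard `ipaddress` module. It is an analogy only, not the C code in `lib/prefix.h`; the function names are assumed for illustration, and "default prefix" is taken to mean 0.0.0.0/0 for IPv4 and ::/0 for IPv6.

```python
import ipaddress

# Assumed definition: the default prefix is the all-zero address with length 0.
DEFAULT_V4 = ipaddress.IPv4Network("0.0.0.0/0")
DEFAULT_V6 = ipaddress.IPv6Network("::/0")


def is_default_prefix4(prefix: ipaddress.IPv4Network) -> bool:
    """IPv4-only check, so callers never pass a mismatched family."""
    return prefix == DEFAULT_V4


def is_default_prefix6(prefix: ipaddress.IPv6Network) -> bool:
    """IPv6-only counterpart of the check above."""
    return prefix == DEFAULT_V6
```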
Christian Hopps [Fri, 9 Jul 2021 07:58:02 +0000 (03:58 -0400)]
ospf6d: fix backlink check
This code has been wrong ~ever (according to git history). There are 3
conditional blocks with the added assertion that both the LSA and the
vertex being checked can't both be network LSAs.
The third block is clearly assuming both LSA and vertex are router
LSAs b/c it is accessing the backlink and lsdesc as router lsdesc's also
making sure both are p2p links (which they would have to be to point at
each other).
The programming error here is that (A && B) == False does NOT imply !A,
but the code is written that way.
So we end up in the third block with one of the LSA or the vertex being a
network LSA rather easily (whenever that is the case and the desc isn't the
backlink being sought).
This was caught by ASAN b/c the lsdesc and backlinks are being accessed
(> 4 byte field offsets) as if they were router lsdesc's in the third
block, when in fact one of them is a network lsdesc which is only 4
bytes long -- so ASAN flags the access beyond bounds.
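The logic error described above can be checked with a small truth-table enumeration. This is a generic illustration in Python, with A and B standing for any two conditions; it is not the ospf6d code.

```python
from itertools import product

# Every truth assignment for which (A and B) is false.
cases = [(a, b) for a, b in product([False, True], repeat=2) if not (a and b)]

# A is still true in one of those cases, so "not (A and B)" does not imply "not A".
counterexamples = [(a, b) for a, b in cases if a]
print(counterexamples)  # [(True, False)]
```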
Problem: Sometimes the configured local GR state is not reflected in the
show command output or on the peer node. This is causing failures in a few
of the BGP-GR topotests.
RCA: This problem is seen when the configuration of local GR state
happens when the BGP session is in OpenSent state and moves to
Established after the configuration is complete.
When the session gets established, we move the GR state value from stub peer
to the config peer. This will result in overriding the GR state to
previous value.
Fix: The local GR state is modified only through CLI configuration and
does not change during BGP FSM transition. In this case it is not necessary
to transfer the GR state value from stub peer to config peer. This way we
can ensure that the most recent config value is always present in the peer
data structure.
Donald Sharp [Wed, 7 Jul 2021 20:00:12 +0000 (16:00 -0400)]
lib: Allow ZAPI_MESSAGE_OPAQUE_LENGTH length of data
We are sending up to ZAPI_MESSAGE_OPAQUE_LENGTH but checking
for one less. We know the data will fit in it to that size.
Also, we have asserts on the write to ensure we don't go over
it.
Fixes: #8995
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
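The off-by-one described above can be sketched as two bounds checks. The constant below is a placeholder, not the real value of ZAPI_MESSAGE_OPAQUE_LENGTH, and the function names are hypothetical.

```python
# Placeholder value for illustration; not the real ZAPI_MESSAGE_OPAQUE_LENGTH.
MAX_OPAQUE_LENGTH = 1024


def length_ok_strict(length: int) -> bool:
    """Old-style check: rejects a payload that exactly fills the buffer."""
    return length < MAX_OPAQUE_LENGTH


def length_ok_inclusive(length: int) -> bool:
    """Check matching the sender, which may send up to the full length."""
    return length <= MAX_OPAQUE_LENGTH
```

A payload of exactly `MAX_OPAQUE_LENGTH` bytes fails the strict check but passes the inclusive one, which is the mismatch between sender and receiver that the commit describes.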
David Lamparter [Wed, 7 Jul 2021 12:57:36 +0000 (14:57 +0200)]
lib: fix coverity unused result warning
There's nothing that can be done here with an error. Try to make
Coverity understand that this is intentional.
(I don't know if the `(void)` will actually fix the coverity warning,
but I don't really have a better way to figure it out beyond just
getting this merged and waiting for a result...)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
ospf6d: Fix crash in ospf6_asbr_lsa_remove at ospf6d/ospf6_asbr.c:696
Issue: Crash observed when LSAs are removed from LSDB after max age
when there is no area configured.
(gdb) bt
0 raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
1 0x00007fdb190548bc in core_handler (signo=6, siginfo=0x7ffdd2f5a470, context=<optimized out>) at lib/sigevent.c:262
2 <signal handler called>
3 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
4 0x00007fdb185ad921 in __GI_abort () at abort.c:79
5 0x00007fdb1907f199 in _zlog_assert_failed (xref=xref@entry=0x55f30902aa20 <_xref.21999>, extra=extra@entry=0x0) at lib/zlog.c:581
6 0x000055f308dc4f78 in ospf6_asbr_lsa_remove (lsa=0x55f30a7546d0, asbr_entry=0x0) at ospf6d/ospf6_asbr.c:696
7 0x000055f308dd8f0d in ospf6_lsdb_remove (lsa=0x55f30a7546d0, lsdb=lsdb@entry=0x55f30a73d300) at ospf6d/ospf6_lsdb.c:166
8 0x000055f308dd9701 in ospf6_lsdb_maxage_remover (lsdb=0x55f30a73d300) at ospf6d/ospf6_lsdb.c:376
9 0x000055f308dee724 in ospf6_maxage_remover (thread=<optimized out>) at ospf6d/ospf6_top.c:603
10 0x00007fdb1906520d in thread_call (thread=thread@entry=0x7ffdd2f5ae90) at lib/thread.c:1919
11 0x00007fdb19023e48 in frr_run (master=0x55f30a569b70) at lib/libfrr.c:1155
12 0x000055f308dc09b6 in main (argc=6, argv=0x7ffdd2f5b198, envp=<optimized out>) at ospf6d/ospf6_main.c:235
(gdb)
Steps to reproduce the issue:
1. router ospf6
2. redistribute static
3. ipv6 route 1::1/128 Null0
4. no redistribute static
5. wait for Max aged LSA to flush
6. Check DB, crash occurs.
RCA:
The crash occurred while accessing listgetdata(listhead(ospf6->area_list)).
When no area is attached to any interface, listhead(ospf6->area_list)
is NULL, so the access dereferences a NULL pointer.
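The commit message does not show the fix itself. As a general illustration of guarding against an empty list before taking its head, here is a small Python analogy; the function name is hypothetical and this is not the ospf6d change.

```python
from typing import List, Optional


def first_area(area_list: List[str]) -> Optional[str]:
    """Return the first area, or None when no area is configured."""
    if not area_list:
        # Without this guard, area_list[0] would raise IndexError,
        # loosely analogous to dereferencing a NULL list head in C.
        return None
    return area_list[0]
```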
zyxwvu Shi [Wed, 26 May 2021 02:33:55 +0000 (10:33 +0800)]
bgpd: Do not delete peer_af when deactivating peer-group.
There is no peer_af allocated in `peer_activate`. Trying to delete
the structure just results in a no-op and an error return value.
The error message "couldn't delete af structure for peer" is
unexpected.
Signed-off-by: zyxwvu Shi <shiyuchen.syc@bytedance.com>
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
tests: add OSPF graceful restart topotest
Add a new topotest that features a topology with seven routers spread
across four OSPF areas:
* 1 backbone area;
* 1 regular non-backbone area (0.0.0.1);
* 1 stub area (0.0.0.2);
* 1 NSSA area (0.0.0.3).
All routers have both GR and GR helper functionality enabled in
the configuration. The test consists of restarting each router,
one at a time, and checking that all forwarding planes (and LSDBs)
are kept intact during those restarts.
A successful run takes about three minutes to finish.
Renato Westphal [Mon, 31 May 2021 13:27:51 +0000 (10:27 -0300)]
tests: add "save_config" parameter to kill_router_daemons()
Using "write memory" to save the daemons' configurations before
restarting them can cause log files to stop working correctly. Add
a new "save_config" parameter to the kill_router_daemons() function to prevent
that from happening when saving the configurations isn't necessary.