tests: [topojson] Increase BGP convergence wait time
1. Increasing BGP convergence wait time to overcome Ubuntu 16.04 arm8 box, as
bgp neighorship is taking more time in this particular testbed.
2. Debugged bgp-ecmp-topo2 failures and here also it seems to be bgp convergence
issue, doing some enhancement in scripts to handle it
bgpd: Show the real next-hop address in addition to hostname in `show bgp`
It's hard to cope with cases when next-hop is changed/unchanged or
peers are non-direct.
It would be better to show the hostname and nexthop IP address (both)
under `show bgp` to quickly identify the source and the real next-hop
of the route.
If `bgp default show-nexthop-hostname` is toggled the output looks like:
```
spine1-debian-9# show bgp
BGP table version is 1, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* 2a02:4780::/64 fe80::a00:27ff:fe09:f8a3(exit1-debian-9)
0 0 65001 ?
spine1-debian-9# show ip bgp
BGP table version is 5, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 65002
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 10.255.255.0/24 192.168.0.1(exit1-debian-9)
0 0 65001 ?
```
Stephen Worley [Mon, 6 Jul 2020 22:20:14 +0000 (18:20 -0400)]
zebra: mark connected nh inactive if not matching ifindex
If we are asked to check if a nexthop is active and it matches a
connected route but the ifindex on it does not match the interface
with the connected route, mark as inactive. This is a bad nexthop.
Before, we would skip this check and just assume any nexthop that matches
on a connected route is valid and return here then fail during
installation. This adds a check for the IPV*_ifindex nexthop case where the
ifindex we have been sent doesn't match.
Old:
F>r 0.0.0.0/0 [200/0] via 20.0.0.2, test, weight 1, 00:00:27
r via 40.4.4.4, lo, weight 1, 00:00:27
New:
F>* 0.0.0.0/0 [200/0] via 20.0.0.2, test, weight 1, 00:00:06
* via 40.4.4.4, lo inactive, weight 1, 00:00:06
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
bgpd: Add command to show only established sessions
```
exit1-debian-9# show bgp summary
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt
192.168.0.2 4 200 10 6 0 0 0 00:00:35 8 8
2a02:4780::2 4 0 0 1 0 0 0 never Active 0
Total number of neighbors 2
exit1-debian-9# show bgp summary established
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Total number of neighbors 2
exit1-debian-9# show bgp summary failed
IPv4 Unicast Summary:
BGP router identifier 192.168.0.1, local AS number 100 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 2, using 43 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
2a02:4780::2 0 0 never Waiting for peer OPEN
Donald Sharp [Wed, 8 Jul 2020 01:48:34 +0000 (21:48 -0400)]
ospfd: allow interfaces to come up in rare situation
On startup of both zebra and ospfd. If ospfd has not
received a valid router-id *but* has received interface
data, interfaces will not be turned on in the state
machine. When ospf finally receives a valid router-id
it would never actually kick the state machine into
action for those interfaces it has been configured for.
Modify ospf on router id changes, *if* the old
router id was INADDR_ANY *and* the interface is
operative *and* the oi->state is ISM_Down, give
it the old kick in the patooeys
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Mark Stapp [Tue, 2 Jun 2020 20:16:21 +0000 (16:16 -0400)]
zebra: collapse some duplicate LSP nhlfe apis
Collapse some apis where primary and backup nhlfe code
was very similar, generally using a single common api
and using a bool to distinguish between primary and
backup.
Mark Stapp [Tue, 2 Jun 2020 15:04:56 +0000 (11:04 -0400)]
zebra: add init api for dplane lsp context
Add an init api (based on what had been a private/static api)
to allow a caller to init a context and use it to generate LSP
updates. This might be useful for testing, or from a dplane
plugin.
Mark Stapp [Wed, 27 May 2020 16:53:20 +0000 (12:53 -0400)]
zebra: include backup nexthops in nexthop-tracking
Include backup nexthops when examining routes that resolve
NHT requests. Include installed backups when sending nexthops
in zapi messages to client daemons.
When handling a fib notification event that involves a route
with backup nexthops, be clearer about representing the
installed state of the backups: any installed backup will be
on a dedicated route_entry list.
Mark Stapp [Fri, 8 May 2020 20:22:54 +0000 (16:22 -0400)]
staticd,zebra: use ALLOW_RECURSION for static routes
Remove a special-case clause for static routes - it was the same
as the clause for other recursive routes. Have staticd just tell
zebra that recursion is allowed. Update topotest that was aware
of this 'internal' flag.
Rafael Zalamena [Mon, 6 Jul 2020 14:39:27 +0000 (11:39 -0300)]
lib: fix route map description memory leak
Route map entries are not getting a chance to call `description` string
deallocation on shutdown or when the parent entry is destroyed, so lets
add a code to handle this in the `route_map_index_delete` function.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Kuldeep Kashyap [Thu, 18 Jun 2020 11:33:31 +0000 (11:33 +0000)]
tests: Add bgp_recursive_route_ebgp_multi_hop test suite
1. Added 7 test cases to verify bgp recursive nexthop and ebgp multi-hop functionality
2. Added framework support to automate these test cases
3. Total execution time is ~5 mins
the code in isis_spf_add2tent was asserting in case the vertex
we were trying to add was already present in the path or tent
trees. This however CAN happen if the user accidentally configures
the system Id of the area to the same value of an estabished
neighbor. Handle this more gracefully by logging and returning,
to prevent crashes.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Mark Stapp [Tue, 30 Jun 2020 16:47:46 +0000 (12:47 -0400)]
zebra: check LSP flags when deleting an LSP
Check the LSP INSTALLED flag in delete apis, to ensure we
enqueue a delete operation for the lfib. Some apis were only
checking the nexthop/nhlfe INSTALLED flags, and those could be
unset if there's an in-flight dataplane update.
GalaxyGorilla [Tue, 19 May 2020 11:52:04 +0000 (11:52 +0000)]
isisd: Fast RIB recovery from BFD recognized link failures
Unfortunately as the topotests show a fast recovery after failure
detection due to BFD is currently not possible because of the following
issue:
There are multiple scheduling mechanisms within isisd to prevent
overload situations. Regarding our problem these two are important:
* scheduler for regenerating ISIS Link State PDUs scheduler for managing
* consecutive SPF calculations
In fact both schedulers are coupled, the first one triggers the second
one, which again is triggered by isis_adj_state_change (which again is
triggered by a BFD 'down' message). The re-calculation of SPF paths
finally triggers updates in zebra for the RIB.
Both schedulers work as a throttle, e.g. they allow the regeneration of
Link State PDUs or a re-calculation for SPF paths only once within a
certain time interval which is configurable (and by default different!).
This means that a request can go through the first scheduler but might
still be 'stuck' at the second one for a while. Or a request can be
'stuck' at the first scheduler even though the second one is ready. This
also explains the 'random' behaviour one can observe testing since a
'fast' recovery is only possible if both schedulers are ready to process
this request.
Note that the solution in this commit is 'thread safe' in the sense that
both schedulers use the same thread master such that the introduced
flags are only used exactly one time (and one after another) for a
'fast' execution.
Further there are some irritating comments and logs which I partially
removed. They seems to be not valid anymore due to changes in thread
management (or they were never valid in the first place).
GalaxyGorilla [Tue, 12 May 2020 11:49:58 +0000 (11:49 +0000)]
tests: Introduce BFD IS-IS topotests
The tests work with the default settings of BFD meaning that bfdd is
able to recognize a 'down' link after ~900ms so a route recovery should
be visible in the RIB after 1 second.
In the current state only IPv4 is used (when using IPv6
autoconfiguration) within BFD, even though the recovery also affects
IPv6 routes. This is different to the current state of ospfd/ospf6d in
combination with BFD since both IPv4 and IPv6 sessions are used there.
Route recovery is tested on RT1. The focus here lies on the two
different routes to RT5. Link failures are generated by taking
down interfaces via the mininet Python interface on RT2 and RT3.
Hence routes are supposed to be adjusted to use RT3 when a link
failure happens on RT2 or vice versa.
Note that only failure recognition and recovery is "fast". BFD
does not monitor a link becoming available again.