Problem Statement:
==================
valgrind shows memleaks in rp_table, when pimd shuts down gracefully.
2020-05-05 22:09:29,451 ERROR: Memory leaks in router [r4] for daemon [pimd]
2020-05-05 22:09:29,451 ERROR: Memory leaks in router [r4] for daemon [zebra]
2020-05-05 22:09:29,637 ERROR: Found memory leak in module pimd
2020-05-05 22:09:29,638 ERROR: ==6178== 184 (56 direct, 128 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 21
2020-05-05 22:09:29,638 ERROR: ==6178== at 0x4C2FFAC: calloc (vg_replace_malloc.c:762)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4E855EE: qcalloc (memory.c:111)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4EAA43C: route_table_init_with_delegate (table.c:52)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x1281A1: pim_rp_init (pim_rp.c:114)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11D0F8: pim_instance_init (pim_instance.c:117)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11D0F8: pim_vrf_new (pim_instance.c:150)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4EB1BEC: vrf_get (vrf.c:209)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4EB2B2F: vrf_init (vrf.c:493)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11D227: pim_vrf_init (pim_instance.c:217)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11BBAB: main (pim_main.c:121)
Fix:
====
rp_info is allocated in pim_rp_init API. rp_info pointer is present
in rp_list and rp_table. In rp_list cleanup, the memory for rp_info
gets freed. rp_table clean up should be done first and then rp_list.
pimd: pim_ifchannel_local_membership_add should not inherit if (S,G) rpf unresolved
Problem:
S,G entry has iif = oif in FHR is LHR case.
Setup:-
R11-----R2----R4
R11 :- FHR and LHR
R2 :- RP
R4 :- LHR
Issue :-
1) shut mapped interface in R11
2) wait for 5 min
3) do FRR restart
5) No shut of mapped interface
OIL is added for local interface also where OIL is same as IIF
and duplicate traffic observed on R4 receives in Ixia
RCA:
pim_ifchannel_local_membership_add adds inherited oif from starg when iif for
SG is unavailable.
When rpf for that SG resolves to this inherited oif from starg, iif is also in oif.
This results in dup traffic.
Fix:
If iif is not available, do not inherit from starg.
ospf6d: fix argument processing in the "area ... range" command
* When the "cost" argument isn't present, the default cost should be
used instead of preserving the previously configured one (if any);
* When the "not-advertise" argument isn't present, the "not-advertise"
flag should be unset regardless if it was previously configured or
not.
Configuration commands should be deterministic and work in the same
way regardless of the current state.
Igor Ryzhov [Thu, 29 Jul 2021 17:21:00 +0000 (20:21 +0300)]
zebra: remove checks for src address existence when using "set src"
1. This check is absolutely useless. Nothing keeps user from deleting
the address right after this check.
2. This check prevents zebra from correctly reading the user config with
"set src" because of a race with interface startup (see #4249).
3. NO OPERATIONAL DATA USAGE ON VALIDATION STAGE.
batmancn [Mon, 30 Nov 2020 12:04:44 +0000 (20:04 +0800)]
zebra: bugfix of error quit of zebra, due to no nexthop ACTIVE
There exists some rare situations where fpm will attempt
to send a route update with no valid nexthops. In that
case an assert would be hit. This is not good for
trying to keep your routing daemons up and running
when we can safely just recover the situation.
Fixes #7588 Signed-off-by: batmancn <batmanustc@gmail.com>
<fixed commit message, and used zlog_err> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 5306e6cf00c58a4c4558609d623ecbbd79faabf1)
Igor Ryzhov [Fri, 30 Jul 2021 11:11:33 +0000 (14:11 +0300)]
ospfd: fix "no ip ospf passive" command
This command is currently always treated as an "unset" command, assuming
that active is the default type of the interface. In reality, the default
type of the interface can be changed using "passive-interface default"
command. Both "no" and regular commands can be "set" commands, depending
on the default value. They are treated as an "unset" when there's already
a config of the opposite type.
All this logic is in ospf_passive_interface_update.
bgpd: Use strict AS4 capability when processing parsing/generating pkts
PeerA sets `dont-capability-negotiate` for PeerB. It does not send any
capabilities to PeerB. This leads to situation when PeerA received AS4 cap,
while it doesn't send AS4 to PeerB and tries parsing AS_PATH using 32bits.
bgpd: Clear capabilities field when resetting a bgp neighbor
Currently, the following sequence of events between peers could
result in erroneous capability reports on the peer
with enabled dont-capability-negotiate option:
- having some of the capabilities advertised to a bgp neighbor,
- then disabling capability negotiation to that neighbor,
- then resetting connection to it,
- and no capabilities are actually sent to the neighbor,
- but "show bgp neighbors" on the host still displays them
as advertised to the neighbor.
There are two possibilities for establishing a new connection
- the established connection was initiated by us with bgp_start(),
- the connection was initiated on the neighbor side and processed by
us via bgp_accept() in bgp_network.c.
The former case results in "show bgp neighbors" displaying only
"received" in capabilities, as the peer's cap is initiated to zero
in bgp_start().
In the latter case, if bgp_accept() happens before bgp_start()
is called, then new peer capabilities are being transferred
from its previous record before being zeroed in bgp_start().
This results in "show bgp neighbors" still displaying
"advertised and received" in capabilities.
Following the logic of a similar af_cap field clearing,
treated correctly in both cases, we
- reset peer's capability during bgp_stop()
- don't pass it over to a new peer structure in bgp_accept().
This fix prevents transferring of the previous capabilities record
to a new peer instance in arbitrary reconnect scenario.
bgpd: Set extended msg size only if we advertised and received capability
If we don't advertise any capabilities (dont-capability-negotiate), we
shouldn't set msg size to 65k only if received this capability from another
peer.
Before:
```
~/frr# vtysh -c 'show ip bgp update-group' | grep 'Max packet size'
Max packet size: 65535
```
After:
```
~/frr# vtysh -c 'show ip bgp update-group' | grep 'Max packet size'
Max packet size: 4096
```
Igor Ryzhov [Wed, 28 Jul 2021 22:17:50 +0000 (01:17 +0300)]
bgpd: fix incorrect usage of slist in dampening
Current code is a complete misuse of SLIST structure. Instead of just
adding a SLIST_ENTRY to struct bgp_damp_info, it allocates a separate
structure to be a node in the list.
Igor Ryzhov [Thu, 29 Jul 2021 11:42:16 +0000 (14:42 +0300)]
bgpd: fix missing list add in dampening
One more crash in dampening code...
When bgp_damp_withdraw is called, if there's already a BDI structure,
bgp_damp_info_claim is called to re-assign the bdi->config in case it
was changed. The problem is that bgp_damp_info_claim actually removes
the BDI from the reuse list of the old config and never adds it to the
reuse list of the new config. We must do this to prevent the crash
because all the code assumes that BDI is always in some list.
Igor Ryzhov [Tue, 27 Jul 2021 13:10:35 +0000 (16:10 +0300)]
ospfd: don't exit when socket is not created
Let's be less radical. There's no reason to stop the whole daemon when
there's a socket creation error in a single VRF. The user can always
restart this single VRF to retry to create a socket.
Philippe Guibert [Wed, 30 Jun 2021 08:52:29 +0000 (10:52 +0200)]
bgpd: prevent routes loop through itself
Some BGP updates received by BGP invite local router to
install a route through itself. The system will not do it, and
the route should be considered as not valid at the earliest.
This case is detected on the zebra, and this detection prevents
from trying to install this route to the local system. However,
the nexthop tracking mechanism is called, and acts as if the route
was valid, which is not the case.
By detecting in BGP that use case, we avoid installing the invalid
routes.
Igor Ryzhov [Fri, 23 Jul 2021 15:38:20 +0000 (18:38 +0300)]
vtysh: don't install "enable" command in user mode
Recent change in d1b287e only fixed the problem for 3-letter words.
We were still displaying error for longer words starting with "ena":
```
nfware> enac
% Command not allowed: enable
nfware> enad
% Command not allowed: enable
nfware> enaena
% Command not allowed: enable
```
If we don't allow "enable" command in user mode, why add it at all?
Donald Sharp [Wed, 7 Jul 2021 20:52:24 +0000 (16:52 -0400)]
zebra: When passing lookup information back pass the fully resolved
In the reachability code we auto pass back the fully resolved
nexthops. Modify the ZEBRA_IPV4_NEXTHOP_LOOKUP_MRIB code
to do the exact same thing so that the zclient_lookup_nexthop
code does not need to recursively look for the data that
zebra already has.
Mark Stapp [Mon, 1 Mar 2021 15:49:32 +0000 (10:49 -0500)]
zebra: optionally hide backup-nexthop events in nht
Optionally hide route changes that only involve backup nexthop
activation/deactivation. The goal is to avoid route churn during
backup nexthop switchover events, before the resolving routes
re-converge. A UI config enables this 'hiding' behavior.
Mark Stapp [Fri, 4 Jun 2021 17:15:50 +0000 (13:15 -0400)]
zebra: update pw dataplane info
Include the complete set of primary and backup nexthops from
the resolving route for a pseudowire. Add accessors for that
info. Modify the logic that creates the fib set of pw nexthops
so that only installed, labelled nexthops are included.
Mark Stapp [Wed, 10 Mar 2021 15:36:46 +0000 (10:36 -0500)]
zebra: revise pw reachability logic
Modify the pseudowire reachability logic so that it returns
success if there is at least one installed labelled nexthop for
the route resolving the pw destination. We also check for
valid backup nexthops if necessary, in case there's been a
switchover event.
Only OpenBSD requires that _all_ nexthops be labelled, so we
have a more strict version of the logic also.