Igor Ryzhov [Tue, 17 Aug 2021 12:36:55 +0000 (15:36 +0300)]
ospfd: explicitly exit from the router node
There's a new "mpls ldp-sync" command added to the OSPF router node in
FRR 8.0. This change broke the following config:
```
router ospf
!
mpls ldp
discovery hello interval 10
!
```
The config was broken because the "mpls ldp" line is now treated as an
"mpls ldp-sync" line inside the router node. We must explicitly print
"exit" at the end of OSPF router node to fix the issue.
Igor Ryzhov [Wed, 11 Aug 2021 14:46:31 +0000 (17:46 +0300)]
vtysh, pathd: fix pcep node-entering commands
pce-config, pce and pcc node-entering commands in vtysh include no-form,
which is incorrect. Currently, when user passes a no-form command like
`no pcc`, vtysh enters the node while pathd deletes the node and this
leads to a desynchronization.
Regular and no-form commands should be defined separately to fix this.
Don Slice [Wed, 11 Aug 2021 12:45:23 +0000 (08:45 -0400)]
bgpd: Stop prepending peer-as if self-originated and lastas configured
Problem seen where if "set aspath-prepend last-as" configured and
applied outbound, we prepend the peer's asn which causes our self-
originated routes to be denied.
Yash Ranjan [Wed, 4 Aug 2021 11:53:10 +0000 (04:53 -0700)]
ospf6d: Check the cost only when asbr_present for ECMP routes
For ECMP routes, the metric cost and metric type are compared
even when the asbr entry is not present. This stops the routes
from getting removed when max age LSAs are received for the
ECMP routes.
Donald Sharp [Sun, 8 Aug 2021 12:23:24 +0000 (08:23 -0400)]
bgpd: bgp_packet_process_error can access peer after deletion
in bgp_io.c upon packet read of some error we are storing
the peer pointer on a thread to call bgp_packet_process_error.
In this case an event is generated that is not guaranteed to be
run immediately. It could come in *after* the peer data structure
is deleted and as such we now are writing into memory that we
no longer possibly own as a peer data structure.
Modify the code so that the peer can track the thread associated
with the read error and then it can wisely kill that thread
when deleting the peer data structure.
Donald Sharp [Mon, 9 Aug 2021 12:01:06 +0000 (08:01 -0400)]
zebra: Properly note add/update for rib_add_multipath_nhe
When calling rib_add_multipath_nhe ensure that we have
well aligned return codes that mean something so that
interersted parties can properly handle the situation.
Martin Winter [Mon, 9 Aug 2021 23:52:05 +0000 (01:52 +0200)]
redhat: Install frr.conf only if no per daemon config exists
Install frr.conf template as a template file, but only install it
as a config file if no per daemon file exists. This will use the
integrated config with new setups, but keeps the per-daemon config
for existing users
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
Problem Statement:
==================
valgrind shows memleaks in rp_table, when pimd shuts down gracefully.
2020-05-05 22:09:29,451 ERROR: Memory leaks in router [r4] for daemon [pimd]
2020-05-05 22:09:29,451 ERROR: Memory leaks in router [r4] for daemon [zebra]
2020-05-05 22:09:29,637 ERROR: Found memory leak in module pimd
2020-05-05 22:09:29,638 ERROR: ==6178== 184 (56 direct, 128 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 21
2020-05-05 22:09:29,638 ERROR: ==6178== at 0x4C2FFAC: calloc (vg_replace_malloc.c:762)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4E855EE: qcalloc (memory.c:111)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4EAA43C: route_table_init_with_delegate (table.c:52)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x1281A1: pim_rp_init (pim_rp.c:114)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11D0F8: pim_instance_init (pim_instance.c:117)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11D0F8: pim_vrf_new (pim_instance.c:150)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4EB1BEC: vrf_get (vrf.c:209)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x4EB2B2F: vrf_init (vrf.c:493)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11D227: pim_vrf_init (pim_instance.c:217)
2020-05-05 22:09:29,638 ERROR: ==6178== by 0x11BBAB: main (pim_main.c:121)
Fix:
====
rp_info is allocated in pim_rp_init API. rp_info pointer is present
in rp_list and rp_table. In rp_list cleanup, the memory for rp_info
gets freed. rp_table clean up should be done first and then rp_list.
pimd: pim_ifchannel_local_membership_add should not inherit if (S,G) rpf unresolved
Problem:
S,G entry has iif = oif in FHR is LHR case.
Setup:-
R11-----R2----R4
R11 :- FHR and LHR
R2 :- RP
R4 :- LHR
Issue :-
1) shut mapped interface in R11
2) wait for 5 min
3) do FRR restart
5) No shut of mapped interface
OIL is added for local interface also where OIL is same as IIF
and duplicate traffic observed on R4 receives in Ixia
RCA:
pim_ifchannel_local_membership_add adds inherited oif from starg when iif for
SG is unavailable.
When rpf for that SG resolves to this inherited oif from starg, iif is also in oif.
This results in dup traffic.
Fix:
If iif is not available, do not inherit from starg.
ospf6d: fix argument processing in the "area ... range" command
* When the "cost" argument isn't present, the default cost should be
used instead of preserving the previously configured one (if any);
* When the "not-advertise" argument isn't present, the "not-advertise"
flag should be unset regardless if it was previously configured or
not.
Configuration commands should be deterministic and work in the same
way regardless of the current state.
Igor Ryzhov [Thu, 29 Jul 2021 17:21:00 +0000 (20:21 +0300)]
zebra: remove checks for src address existence when using "set src"
1. This check is absolutely useless. Nothing keeps user from deleting
the address right after this check.
2. This check prevents zebra from correctly reading the user config with
"set src" because of a race with interface startup (see #4249).
3. NO OPERATIONAL DATA USAGE ON VALIDATION STAGE.
batmancn [Mon, 30 Nov 2020 12:04:44 +0000 (20:04 +0800)]
zebra: bugfix of error quit of zebra, due to no nexthop ACTIVE
There exists some rare situations where fpm will attempt
to send a route update with no valid nexthops. In that
case an assert would be hit. This is not good for
trying to keep your routing daemons up and running
when we can safely just recover the situation.
Fixes #7588 Signed-off-by: batmancn <batmanustc@gmail.com>
<fixed commit message, and used zlog_err> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 5306e6cf00c58a4c4558609d623ecbbd79faabf1)
Igor Ryzhov [Fri, 30 Jul 2021 11:11:33 +0000 (14:11 +0300)]
ospfd: fix "no ip ospf passive" command
This command is currently always treated as an "unset" command, assuming
that active is the default type of the interface. In reality, the default
type of the interface can be changed using "passive-interface default"
command. Both "no" and regular commands can be "set" commands, depending
on the default value. They are treated as an "unset" when there's already
a config of the opposite type.
All this logic is in ospf_passive_interface_update.
bgpd: Use strict AS4 capability when processing parsing/generating pkts
PeerA sets `dont-capability-negotiate` for PeerB. It does not send any
capabilities to PeerB. This leads to situation when PeerA received AS4 cap,
while it doesn't send AS4 to PeerB and tries parsing AS_PATH using 32bits.
bgpd: Clear capabilities field when resetting a bgp neighbor
Currently, the following sequence of events between peers could
result in erroneous capability reports on the peer
with enabled dont-capability-negotiate option:
- having some of the capabilities advertised to a bgp neighbor,
- then disabling capability negotiation to that neighbor,
- then resetting connection to it,
- and no capabilities are actually sent to the neighbor,
- but "show bgp neighbors" on the host still displays them
as advertised to the neighbor.
There are two possibilities for establishing a new connection
- the established connection was initiated by us with bgp_start(),
- the connection was initiated on the neighbor side and processed by
us via bgp_accept() in bgp_network.c.
The former case results in "show bgp neighbors" displaying only
"received" in capabilities, as the peer's cap is initiated to zero
in bgp_start().
In the latter case, if bgp_accept() happens before bgp_start()
is called, then new peer capabilities are being transferred
from its previous record before being zeroed in bgp_start().
This results in "show bgp neighbors" still displaying
"advertised and received" in capabilities.
Following the logic of a similar af_cap field clearing,
treated correctly in both cases, we
- reset peer's capability during bgp_stop()
- don't pass it over to a new peer structure in bgp_accept().
This fix prevents transferring of the previous capabilities record
to a new peer instance in arbitrary reconnect scenario.
bgpd: Set extended msg size only if we advertised and received capability
If we don't advertise any capabilities (dont-capability-negotiate), we
shouldn't set msg size to 65k only if received this capability from another
peer.
Before:
```
~/frr# vtysh -c 'show ip bgp update-group' | grep 'Max packet size'
Max packet size: 65535
```
After:
```
~/frr# vtysh -c 'show ip bgp update-group' | grep 'Max packet size'
Max packet size: 4096
```
Igor Ryzhov [Wed, 28 Jul 2021 22:17:50 +0000 (01:17 +0300)]
bgpd: fix incorrect usage of slist in dampening
Current code is a complete misuse of SLIST structure. Instead of just
adding a SLIST_ENTRY to struct bgp_damp_info, it allocates a separate
structure to be a node in the list.
Igor Ryzhov [Thu, 29 Jul 2021 11:42:16 +0000 (14:42 +0300)]
bgpd: fix missing list add in dampening
One more crash in dampening code...
When bgp_damp_withdraw is called, if there's already a BDI structure,
bgp_damp_info_claim is called to re-assign the bdi->config in case it
was changed. The problem is that bgp_damp_info_claim actually removes
the BDI from the reuse list of the old config and never adds it to the
reuse list of the new config. We must do this to prevent the crash
because all the code assumes that BDI is always in some list.