| Age | Commit message (Collapse) | Author |
|
This reverts commit 23bdaba147517abb4a70d03166ad74f759888d46.
|
|
Again instead of making the keepalives be peer based
use the connection to make it happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
The peer is going to eventually have a incoming and
outgoing connection. Let's send the data based
upon the connection not the peer.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
The bgp_keepalives_on|off functions should use a peer_connection
as a basis for it's operation.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Add a hash_clean_and_free() function as well as convert
the code to use it. This function also takes a double
pointer to the hash to set it NULL. Also it cleanly
does nothing if the pointer is NULL( as a bunch of
code tested for ).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Done with a combination of regex'ing and banging my head against a wall.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
|
|
bgpd: Fix crash during shutdown due to race condition
|
|
[New LWP 2524]
[New LWP 2539]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/avi/bin/bgpd -f /run/frr/avi_ns3_bgpd.config -i /opt/avi/etc/avi_ns3_bgpd.'.
Program terminated with signal SIGABRT, Aborted.
[Current thread is 1 (Thread 0x7f92ac8f1740 (LWP 2524))]
0 0x00007f92acb3800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f92ac8f1740 (LWP 2524))]
0 0x00007f92acb3800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
1 0x00007f92acb17859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f92acb17729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
3 0x00007f92acb28fd6 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
4 0x00007f92accf2164 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
5 0x000055b46be1ef63 in bgp_keepalives_wake () at bgpd/bgp_keepalives.c:311
6 0x000055b46be1f111 in bgp_keepalives_stop (fpt=0x55b46cfacf20, result=<optimized out>) at bgpd/bgp_keepalives.c:323
7 0x00007f92acea9521 in frr_pthread_stop (fpt=0x55b46cfacf20, result=result@entry=0x0) at lib/frr_pthread.c:176
8 0x00007f92acea9586 in frr_pthread_stop_all () at lib/frr_pthread.c:188
9 0x000055b46bdde54a in bgp_pthreads_finish () at bgpd/bgpd.c:8150
10 0x000055b46bd696ca in bgp_exit (status=0) at bgpd/bgp_main.c:210
11 sigint () at bgpd/bgp_main.c:154
12 0x00007f92acecc1e9 in quagga_sigevent_process () at lib/sigevent.c:105
13 0x00007f92aced689a in thread_fetch (m=m@entry=0x55b46cf23540, fetch=fetch@entry=0x7fff95379238) at lib/thread.c:1487
14 0x00007f92aceb2681 in frr_run (master=0x55b46cf23540) at lib/libfrr.c:1010
15 0x000055b46bd676f4 in main (argc=11, argv=0x7fff953795a8) at bgpd/bgp_main.c:482
Root cause:
This is due to race condition between main thread & keepalive thread during clean-up.
This happens when the keepalive thread is processing a wake signal owning the mutex, when meanwhile the main thread tries to stop the keepalives thread.
In main thread, the keepalive thread’s running bit (fpt->running) is set to false, without taking the mutex & then it blocks on mutex.
Meanwhile, keepalive thread which owns the mutex sees that the running bit is false & executes bgp_keepalives_finish() which also frees up mutex.
Main thread that is waiting on mutex with pthread_mutex_lock() will cause core while trying to access mutex.
Fix:
Take the lock in main thread while setting the fpt->running to false.
Signed-off-by: Samanvitha B Bhargav <bsamanvitha@vmware.com>
|
|
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Log keepalive timer expiry against 'debug bgp keepalive' instead
of 'debug bgp neighbor-events'.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
|
|
Don't hold the rcu lock in the bgp keepalive pthread: it
blocks the rcu pthread and prevents log-file deletion.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
|
|
convert:
frr_with_mutex(..)
to:
frr_with_mutex (..)
To make all our code agree with what clang-format is going to produce
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
Wrong: memset(&a, 0, sizeof(struct ...));
Good: memset(&a, 0, sizeof(a));
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
|
|
Firstly, *keep no change* for `hash_get()` with NULL
`alloc_func`.
Only focus on cases with non-NULL `alloc_func` of
`hash_get()`.
Since `hash_get()` with non-NULL `alloc_func` parameter
shall not fail, just ignore the returned value of it.
The returned value must not be NULL.
So in this case, remove the unnecessary checking NULL
or not for the returned value and add `void` in front
of it.
Importantly, also *keep no change* for the two cases with
non-NULL `alloc_func` -
1) Use `assert(<returned_data> == <searching_data>)` to
ensure it is a created node, not a found node.
Refer to `isis_vertex_queue_insert()` of isisd, there
are many examples of this case in isid.
2) Use `<returned_data> != <searching_data>` to judge it
is a found node, then free <searching_data>.
Refer to `aspath_intern()` of bgpd, there are many
examples of this case in bgpd.
Here, <returned_data> is the returned value from `hash_get()`,
and <searching_data> is the data, which is to be put into
hash table.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
|
|
const const const your boat, merrily down the stream...
Signed-off-by: David Lamparter <equinox@diac24.net>
|
|
RFC4271 specifies behavior when the hold timer is sent to zero - we
should not send keepalives or run a hold timer. But FRR, and other
vendors, allow the keepalive timer to be set to zero with a nonzero hold
timer. In this case we were sending keepalives constantly and maxing out
a pthread to do so. Instead behave similarly to other vendors and do not
send keepalives.
Unsure what the utility of this is, but blasting keepalives is
definitely the wrong thing to do.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited. This adds a bit of safety against
forgetting the unlock in error paths & co. and makes the code a slight
bit more readable.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
|
|
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Presume typo from original author
Signed-off-by: Tim Bray <tim@kooky.org>
|
|
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
|
|
the thread
The current invocation of frr_pthread_set_name was causing it reset the os_name.
There is no need for this, we now always create the pthread appropriately
to have both name and os_name. So convert this function to a simple
call through of the pthread call now.
Before(any of these changes):
sharpd@robot ~/frr1> ps -L -p 16895
PID LWP TTY TIME CMD
16895 16895 ? 00:01:39 bgpd
16895 16896 ? 00:00:54
16895 16897 ? 00:00:07 bgpd_ka
After:
sharpd@donna ~/frr1> ps -L -p 1752
PID LWP TTY TIME CMD
1752 1752 ? 00:00:00 bgpd
1752 1753 ? 00:00:00 bgpd_io
1752 1754 ? 00:00:00 bgpd_ka
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
All I can see is an unneccessary complication. If there's some purpose
here it needs to be documented...
Signed-off-by: David Lamparter <equinox@diac24.net>
|
|
Testing Done:
TOR#cat /proc/2670/task/2672/comm
bgpd_ka
TOR# ps H -C bgpd -o 'pid tid cmd comm'
PID TID CMD COMMAND
2670 2670 /usr/lib/frr/bgpd -M snmp - bgpd
2670 2671 /usr/lib/frr/bgpd -M snmp - bgpd
2670 2672 /usr/lib/frr/bgpd -M snmp - bgpd_ka
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
|
|
In bgp_keepalives.c, it was noticed that we were
ensuring that we called an intialization function first,
but this is a development escape in that once this
was fixed we never see it. So if a developer moves
this assumption around, let's crash the program and
lead them to this spot instead of silently ignoring
the problem.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
|
|
Incorrect check for sentinel value effectively caused peers to sometimes
use the keepalive timer value of other peers, which sometimes led to
hold timer expiry.
Signed-off-by: Quentin Young <qlyoung@qlyoung.net>
|
|
If a peer already has keepalives turned on when asking to turn them on,
return immediately. Same thing for turning them off.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Use the new threading facilities provided in lib/ to streamline the
threads used in bgpd. In particular, all of the lifecycle code has been
removed from the I/O thread and replaced with the default loop. Did not
do the same to the keepalives thread as it is much smaller (doesn't need
the event system).
Also cleaned up some comments to match the style guide.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
This is necessary because otherwise between the time we wipe the output
buffer and the time we push the NOTIFY onto it, the KA generation thread
could have pushed a KEEPALIVE in the middle.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
And update copyright header.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
* Move and modify all network input related code to bgp_io.c
* Add a real input buffer to `struct peer`
* Move connection initialization to its own thread.c task instead of
piggybacking off of bgp_read()
* Tons of little fixups
Primary changes are in bgp_packet.[ch], bgp_io.[ch], bgp_fsm.[ch].
Changes made elsewhere are almost exclusively refactoring peer->ibuf to
peer->curr since peer->ibuf is now the true FIFO packet input buffer
while peer->curr represents the packet currently being processed by the
main pthread.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Large numbers of peers makes insertion and removal time for a linked
list non-negligible.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
Changes all synchronization primitives to be dynamically allocated. This
should help catch any subtle errors in pthread lifecycles.
This change also pre-initializes synchronization primitives before
threads begin to run, eliminating a potential race condition that
probably would have caused a segfault on startup on a very fast box.
Also changes mutex and condition variable allocations to use
MTYPE_PTHREAD and updates tests to do the proper initializations.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
* Remove t_write
* Remove t_keepalive
These have been replaced by pthreads and are no longer needed. Since
some code looks at these values to determine if the threads are
scheduled, also add a new bitfield to store the same information.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
This patch, in tandem with moving packet writes into a dedicated kernel
thread, fixes session flaps caused by long-running internal operations
starving the (old) userspace write thread.
BGP keepalives are now produced by a kernel thread and placed onto the
peer's output queue. These are then consumed by the write thread. Both
of these tasks are concurrent with the rest of bgpd, obviating the
session flaps described above.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|