| author | Donald Sharp <sharpd@cumulusnetworks.com> | 2019-08-28 12:09:41 -0400 | 
|---|---|---|
| committer | Donald Sharp <sharpd@cumulusnetworks.com> | 2019-08-28 12:09:41 -0400 | 
| commit | 11375c52740089b6b49ca7d56b2cea0c7208338c (patch) | |
| tree | 4ce8b0c75c2c57bdc77a18c17059a457eca857b7 /lib/frrcu.c | |
| parent | 8f910d6c3f9e92422764a625ec3a9c23a87df6bf (diff) | |
lib: Stop arm crash on shutdown
Arm platforms are crashing in our topotests with this callstack:
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0xffffabb591d0 (LWP 18947))]
(gdb) bt
    file=file@entry=0xaaaadfed1e48 "lib/memory.c", line=line@entry=80,
    function=function@entry=0xaaaadfed1db8 <__func__.10514> "mt_count_free") at lib/log.c:837
(gdb)
So we are crashing because we are attempting to free an mtype that has no allocations
associated with it.
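To illustrate the failure mode, here is a minimal standalone sketch of per-type allocation counting. It is not the code in lib/memory.c: struct memtype_model, model_malloc, model_free and demo_mismatched_free are hypothetical names, and the model returns -1 where the real mt_count_free asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory type: a name plus a live-allocation counter. */
struct memtype_model {
	const char *name;
	size_t n_alloc;
};

/* Allocate and charge the allocation to mt. */
static void *model_malloc(struct memtype_model *mt, size_t size)
{
	void *p = malloc(size);

	if (p)
		mt->n_alloc++;
	return p;
}

/* Release p against mt.  Returns -1 (and frees nothing) when mt has no
 * live allocations; this models the condition the real code asserts on.
 */
static int model_free(struct memtype_model *mt, void *p)
{
	if (mt->n_alloc == 0)
		return -1;
	mt->n_alloc--;
	free(p);
	return 0;
}

/* Allocate under one type, then try to free under another. */
static int demo_mismatched_free(void)
{
	struct memtype_model a = {"type A", 0};
	struct memtype_model b = {"type B", 0};
	void *p = model_malloc(&a, 16);
	int rc = model_free(&b, p); /* b was never charged, so this fails */

	if (rc != 0 && p) {
		/* Clean up against the type that actually owns the allocation. */
		assert(model_free(&a, p) == 0);
	}
	return rc;
}
```

In this model, any accounting mix-up between two types (the allocation charged to one, the free charged to the other) is enough to hit the failing branch.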
I added this debug code:
@@ -227,7 +230,9 @@ static void rcu_bump(void)
     struct rcu_next *rn;
     rn = XMALLOC(MTYPE_RCU_NEXT, sizeof(*rn));
-
+    zlog_debug("RCU_BUMP");
+    mtype_dump(MTYPE_RCU_THREAD);
+    mtype_dump(MTYPE_RCU_NEXT);
     /* note: each RCUA_NEXT item corresponds to exactly one seqno bump.
      * This means we don't need to communicate which seqno is which
      * RCUA_NEXT, since we really don't care.
and added a mtype_dump function:
+void mtype_dump(struct memtype *mt)
+{
+    zlog_debug("%s: %d", mt->name, (int)mt->n_alloc);
+}
Which resulted in this output:
2019/08/28 15:41:11 BGP: RCU_BUMP
2019/08/28 15:41:11 BGP: RCU thread: 3
2019/08/28 15:41:11 BGP: RCU thread: 3
If we look at the definition of the two static memory types:
DEFINE_MTYPE_STATIC(LIB, RCU_THREAD,    "RCU thread")
DEFINE_MTYPE_STATIC(LIB, RCU_NEXT,      "RCU sequence barrier")
I would have expected the output to be:
RCU_BUMP
RCU thread: 3
RCU sequence barrier: X
instead.
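A standalone sketch of that expectation, assuming the two memory types are distinct objects. The names mt_model, model_rcu_thread, model_rcu_next and model_dump are hypothetical, and the counter values are placeholders rather than measured data:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a memory type; only the fields the dump reads. */
struct mt_model {
	const char *name;
	size_t n_alloc;
};

/* Two distinct objects, mirroring the two DEFINE_MTYPE_STATIC lines.
 * The counts are made-up placeholders.
 */
static struct mt_model model_rcu_thread = {"RCU thread", 3};
static struct mt_model model_rcu_next = {"RCU sequence barrier", 1};

/* Same format string as the mtype_dump debug helper, written to a buffer. */
static int model_dump(const struct mt_model *mt, char *buf, size_t len)
{
	return snprintf(buf, len, "%s: %d", mt->name, (int)mt->n_alloc);
}
```

With two distinct objects, dumping each one yields its own name and its own counter. The log above instead shows the same name and count twice, which suggests both references resolved to the same object.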
As a thought experiment I reduced the number of static memory types
to 1 in the file and the crash stopped happening.
I suspect we have a systematic error on arm in lib/memory.h
due to the asm code.  I am going to leave that alone for the
moment (and leave the crash issue open), but see if we
can get this code change into the system so that our CI
system becomes happy again.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Diffstat (limited to 'lib/frrcu.c')
| -rw-r--r-- | lib/frrcu.c | 5 | 
1 files changed, 2 insertions, 3 deletions
diff --git a/lib/frrcu.c b/lib/frrcu.c
index 7e6475b648..54626f909d 100644
--- a/lib/frrcu.c
+++ b/lib/frrcu.c
@@ -55,7 +55,6 @@
 #include "atomlist.h"
 
 DEFINE_MTYPE_STATIC(LIB, RCU_THREAD,    "RCU thread")
-DEFINE_MTYPE_STATIC(LIB, RCU_NEXT,      "RCU sequence barrier")
 
 DECLARE_ATOMLIST(rcu_heads, struct rcu_head, head)
 
@@ -226,7 +225,7 @@ static void rcu_bump(void)
 {
 	struct rcu_next *rn;
 
-	rn = XMALLOC(MTYPE_RCU_NEXT, sizeof(*rn));
+	rn = XMALLOC(MTYPE_RCU_THREAD, sizeof(*rn));
 
 	/* note: each RCUA_NEXT item corresponds to exactly one seqno bump.
 	 * This means we don't need to communicate which seqno is which
@@ -269,7 +268,7 @@ static void rcu_bump(void)
 	 * "last item is being deleted - start over" case, and then we may end
 	 * up accessing old RCU queue items that are already free'd.
 	 */
-	rcu_free_internal(MTYPE_RCU_NEXT, rn, head_free);
+	rcu_free_internal(MTYPE_RCU_THREAD, rn, head_free);
 
 	/* Only allow the RCU sweeper to run after these 2 items are queued.
 	 *
