From 7a86aa5a0a537dd2fb7103658bd123b996e89dcc Mon Sep 17 00:00:00 2001 From: Quentin Young Date: Mon, 6 Nov 2017 00:33:46 -0500 Subject: [PATCH] bgpd: schedule packet job after connection xfer During initial session establishment, bgpd performs a "connection transfer" to a new peer struct if the connection was initiated passively (i.e. by the remote peer). With the addition of buffered input and a reorganized packet processor, the following race condition manifests: 1. Remote peer initiates a connection. After exchanging OPEN messages, we send them a KEEPALIVE. They send us a KEEPALIVE followed by 10,000 UPDATE messages. The I/O thread pushes these onto our local peer's input buffer and schedules a packet processing job on the main thread. 2. The packet job runs and processes the KEEPALIVE, which completes the handshake on our end. As part of transferring to ESTABLISHED we transfer all peer state to a new struct, as mentioned. Upon returning from the KEEPALIVE processing routing, the peer context we had has now been destroyed. We notice this and stop processing. Meanwhile 10k UPDATE messages are sitting on the input buffer. 3. N seconds later, the remote peer sends us a KEEPALIVE. The I/O thread schedules another process job, which finds 10k UPDATEs waiting for it. Convergence is achieved, but has been delayed by the value of the KEEPALIVE timer. The racey part is that if the remote peer takes a little bit of time to send UPDATEs after KEEPALIVEs -- somewhere on the order of a few hundred milliseconds -- we complete the transfer successfully and the packet processing job is scheduled on the new peer upon arrival of the UPDATE messages. Yuck. The solution is to schedule a packet processing job on the new peer struct after transferring state. Lengthy commit message in case someone has to debug similar problems in the future... Signed-off-by: Quentin Young --- bgpd/bgp_fsm.c | 3 +++ bgpd/bgp_io.c | 4 ++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/bgpd/bgp_fsm.c b/bgpd/bgp_fsm.c index b97bebd7bb..9e58e466e1 100644 --- a/bgpd/bgp_fsm.c +++ b/bgpd/bgp_fsm.c @@ -139,6 +139,7 @@ static struct peer *peer_xfer_conn(struct peer *from_peer) BGP_TIMER_OFF(from_peer->t_connect); BGP_TIMER_OFF(from_peer->t_connect_check_r); BGP_TIMER_OFF(from_peer->t_connect_check_w); + BGP_TIMER_OFF(from_peer->t_process_packet); /* * At this point in time, it is possible that there are packets pending @@ -265,6 +266,8 @@ static struct peer *peer_xfer_conn(struct peer *from_peer) bgp_reads_on(peer); bgp_writes_on(peer); + thread_add_timer_msec(bm->master, bgp_process_packet, peer, 0, + &peer->t_process_packet); if (from_peer) peer_xfer_stats(peer, from_peer); diff --git a/bgpd/bgp_io.c b/bgpd/bgp_io.c index 80e7f13dab..807f577a2d 100644 --- a/bgpd/bgp_io.c +++ b/bgpd/bgp_io.c @@ -344,8 +344,8 @@ static int bgp_process_reads(struct thread *thread) thread_add_read(fpt->master, bgp_process_reads, peer, peer->fd, &peer->t_read); if (added_pkt) - thread_add_event(bm->master, bgp_process_packet, peer, - 0, NULL); + thread_add_timer_msec(bm->master, bgp_process_packet, + peer, 0, &peer->t_process_packet); } return 0; -- 2.39.5