Golang Danmaku Server Optimization, Part 1: Heartbeat Packets

The TCP layer provides a socket option, SO_KEEPALIVE, for probing whether a TCP connection is still alive, so that half-dead connections do not sit around wasting resources. A quick search for SO_KEEPALIVE turns up plenty of articles telling you how to use it, and in some situations it does work well. But...

See the Linux kernel source in net/ipv4/tcp_timer.c:

static void tcp_keepalive_timer (struct timer_list *t)
{
    struct sock *sk = from_timer(sk, t, sk_timer);
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    u32 elapsed;

    /* Only process if socket is not in use. */
    bh_lock_sock(sk);
    if (sock_owned_by_user(sk)) {
        /* Try again later. */
        inet_csk_reset_keepalive_timer (sk, HZ/20);
        goto out;
    }

    if (sk->sk_state == TCP_LISTEN) {
        pr_err("Hmm... keepalive on a LISTEN ???\n");
        goto out;
    }

    tcp_mstamp_refresh(tp);
    if (sk->sk_state == TCP_FIN_WAIT2 && sock_flag(sk, SOCK_DEAD)) {
        if (tp->linger2 >= 0) {
            const int tmo = tcp_fin_time(sk) - TCP_TIMEWAIT_LEN;

            if (tmo > 0) {
                tcp_time_wait(sk, TCP_FIN_WAIT2, tmo);
                goto out;
            }
        }
        tcp_send_active_reset(sk, GFP_ATOMIC);
        goto death;
    }

    if (!sock_flag(sk, SOCK_KEEPOPEN) ||
        ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)))
        goto out;

    elapsed = keepalive_time_when(tp);

    /* It is alive without keepalive 8) */

    // Note: because of tcp_write_queue_empty here, the keepalive probe is
    // never sent while the send buffer is non-empty.
    if (tp->packets_out || !tcp_write_queue_empty(sk))
        goto resched;

    elapsed = keepalive_time_elapsed(tp);

    if (elapsed >= keepalive_time_when(tp)) {
        /* If the TCP_USER_TIMEOUT option is enabled, use that
         * to determine when to timeout instead.
         */
        if ((icsk->icsk_user_timeout != 0 &&
            elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&
            icsk->icsk_probes_out > 0) ||
            (icsk->icsk_user_timeout == 0 &&
            icsk->icsk_probes_out >= keepalive_probes(tp))) {
            tcp_send_active_reset(sk, GFP_ATOMIC);
            tcp_write_err(sk);
            goto out;
        }
        if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
            icsk->icsk_probes_out++;
            elapsed = keepalive_intvl_when(tp);
        } else {
            /* If keepalive was lost due to local congestion,
             * try harder.
             */
            elapsed = TCP_RESOURCE_PROBE_INTERVAL;
        }
    } else {
        /* It is tp->rcv_tstamp + keepalive_time_when(tp) */
        elapsed = keepalive_time_when(tp) - elapsed;
    }

    sk_mem_reclaim(sk);

resched:
    inet_csk_reset_keepalive_timer (sk, elapsed);
    goto out;

death:
    tcp_done(sk);

out:
    bh_unlock_sock(sk);
    sock_put(sk);
}

Now consider a user-space send() call. If congestion keeps data parked in the send buffer, or a broken middlebox means packets are never ACKed and the retransmit queue stays non-empty, the keepalive probe never fires. The application only learns the socket is broken when one of the following happens:

1. Retransmissions reach the threshold set in /proc/sys/net/ipv4/tcp_retries1, at which point the network layer tries to refresh the route, and route probing discovers the path is down (No route to host).
2. Retransmissions reach the failure threshold set in /proc/sys/net/ipv4/tcp_retries2; this can take tens of minutes, depending on the congestion control algorithm and kernel version in use.
3. The application fills the write buffer.

On top of that, some carriers (China Mobile, for instance) send a RST to TCP connections that carry no payload for 300 seconds, to save NAT resources at the egress. Introducing application-level ping/pong packets is therefore a good way to keep connections alive and to detect dead ones quickly. Pinging too frequently hurts performance; given that the BitTorrent protocol suggests a 300-second heartbeat, and given China Mobile's NAT behavior, we settled on 295 seconds for now.
