sys/netinet/tcp_congctl.c

/*      $NetBSD: tcp_congctl.c,v 1.12 2006/11/16 01:33:45 christos Exp $        */

/*-
 * Copyright (c) 1997, 1998, 1999, 2001, 2005, 2006 The NetBSD Foundation, Inc.
 * All rights reserved.
 *
 * This code is derived from software contributed to The NetBSD Foundation
 * by Jason R. Thorpe and Kevin M. Lahey of the Numerical Aerospace Simulation
 * Facility, NASA Ames Research Center.
 * This code is derived from software contributed to The NetBSD Foundation
 * by Charles M. Hannum.
 * This code is derived from software contributed to The NetBSD Foundation
 * by Rui Paulo.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *      This product includes software developed by the NetBSD
 *      Foundation, Inc. and its contributors.
 * 4. Neither the name of The NetBSD Foundation nor the names of its
 *    contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

/*
 * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the project nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

/*
 *      @(#)COPYRIGHT   1.1 (NRL) 17 January 1995
 *
 * NRL grants permission for redistribution and use in source and binary
 * forms, with or without modification, of the software and documentation
 * created at NRL provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgements:
 *      This product includes software developed by the University of
 *      California, Berkeley and its contributors.
 *      This product includes software developed at the Information
 *      Technology Division, US Naval Research Laboratory.
 * 4. Neither the name of the NRL nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THE SOFTWARE PROVIDED BY NRL IS PROVIDED BY NRL AND CONTRIBUTORS ``AS
 * IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
 * PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL NRL OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * The views and conclusions contained in the software and documentation
 * are those of the authors and should not be interpreted as representing
 * official policies, either expressed or implied, of the US Naval
 * Research Laboratory (NRL).
 */

/*
 * Copyright (c) 1982, 1986, 1988, 1990, 1993, 1994, 1995
 *      The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *      @(#)tcp_input.c 8.12 (Berkeley) 5/24/95
 */

#include <sys/cdefs.h>
__KERNEL_RCSID(0, "$NetBSD: tcp_congctl.c,v 1.12 2006/11/16 01:33:45 christos Exp $");

#include "opt_inet.h"
#include "opt_tcp_debug.h"
#include "opt_tcp_congctl.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/errno.h>
#include <sys/syslog.h>
#include <sys/pool.h>
#include <sys/domain.h>
#include <sys/kernel.h>
#include <sys/lock.h>

#include <net/if.h>
#include <net/route.h>

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/in_pcb.h>
#include <netinet/in_var.h>
#include <netinet/ip_var.h>

#ifdef INET6
#ifndef INET
#include <netinet/in.h>
#endif
#include <netinet/ip6.h>
#include <netinet6/ip6_var.h>
#include <netinet6/in6_pcb.h>
#include <netinet6/ip6_var.h>
#include <netinet6/in6_var.h>
#include <netinet/icmp6.h>
#include <netinet6/nd6.h>
#endif

#include <netinet/tcp.h>
#include <netinet/tcp_fsm.h>
#include <netinet/tcp_seq.h>
#include <netinet/tcp_timer.h>
#include <netinet/tcp_var.h>
#include <netinet/tcpip.h>
#include <netinet/tcp_congctl.h>
#ifdef TCP_DEBUG
#include <netinet/tcp_debug.h>
#endif

/*
 * TODO:
 *   consider separating the actual implementations in another file.
 */

static int  tcp_reno_fast_retransmit(struct tcpcb *, const struct tcphdr *);
static void tcp_reno_slow_retransmit(struct tcpcb *);
static void tcp_reno_fast_retransmit_newack(struct tcpcb *,
    const struct tcphdr *);
static void tcp_reno_newack(struct tcpcb *, const struct tcphdr *);
static void tcp_reno_congestion_exp(struct tcpcb *tp);

static int  tcp_newreno_fast_retransmit(struct tcpcb *, const struct tcphdr *);
static void tcp_newreno_fast_retransmit_newack(struct tcpcb *,
        const struct tcphdr *);
static void tcp_newreno_newack(struct tcpcb *, const struct tcphdr *);

static void tcp_congctl_fillnames(void);

extern int tcprexmtthresh;

MALLOC_DEFINE(M_TCPCONGCTL, "tcpcongctl", "TCP congestion control structures");

/*
 * Used to list the available congestion control algorithms.
 */
struct tcp_congctlent {
        TAILQ_ENTRY(tcp_congctlent) congctl_ent;
        char               congctl_name[TCPCC_MAXLEN];
        struct tcp_congctl *congctl_ctl;
};
TAILQ_HEAD(, tcp_congctlent) tcp_congctlhd;

struct simplelock tcp_congctl_slock;

void
tcp_congctl_init(void)
{
        int r;

        TAILQ_INIT(&tcp_congctlhd);
        simple_lock_init(&tcp_congctl_slock);

        /* Base algorithms. */
        r = tcp_congctl_register("reno", &tcp_reno_ctl);
        KASSERT(r == 0);
        r = tcp_congctl_register("newreno", &tcp_newreno_ctl);
        KASSERT(r == 0);

        /* NewReno is the default. */
#ifndef TCP_CONGCTL_DEFAULT
#define TCP_CONGCTL_DEFAULT "newreno"
#endif

        r = tcp_congctl_select(NULL, TCP_CONGCTL_DEFAULT);
        KASSERT(r == 0);
}

/*
 * Register a congestion algorithm and select it if we have none.
 */
int
tcp_congctl_register(const char *name, struct tcp_congctl *tcc)
{
        struct tcp_congctlent *ntcc, *tccp;

        TAILQ_FOREACH(tccp, &tcp_congctlhd, congctl_ent)
                if (!strcmp(name, tccp->congctl_name)) {
                        /* name already registered */
                        return EEXIST;
                }

        ntcc = malloc(sizeof(*ntcc), M_TCPCONGCTL, M_WAITOK);

        /* strlcpy() takes the full buffer size and always NUL-terminates. */
        strlcpy(ntcc->congctl_name, name, sizeof(ntcc->congctl_name));
        ntcc->congctl_ctl = tcc;

        TAILQ_INSERT_TAIL(&tcp_congctlhd, ntcc, congctl_ent);
        tcp_congctl_fillnames();

        /* The first algorithm registered becomes the global default. */
        if (TAILQ_FIRST(&tcp_congctlhd) == ntcc)
                tcp_congctl_select(NULL, name);

        return 0;
}

/*
 * Unregister a congestion algorithm by name.  Fails with EBUSY if it is
 * the only one left, the global default, or still referenced.
 */
int
tcp_congctl_unregister(const char *name)
{
        struct tcp_congctlent *tccp, *rtccp;
        unsigned int size;

        rtccp = NULL;
        size = 0;
        TAILQ_FOREACH(tccp, &tcp_congctlhd, congctl_ent) {
                if (!strcmp(name, tccp->congctl_name))
                        rtccp = tccp;
                size++;
        }

        if (!rtccp)
                return ENOENT;

        if (size <= 1 || tcp_congctl_global == rtccp->congctl_ctl ||
            rtccp->congctl_ctl->refcnt)
                return EBUSY;

        TAILQ_REMOVE(&tcp_congctlhd, rtccp, congctl_ent);
        free(rtccp, M_TCPCONGCTL);
        tcp_congctl_fillnames();

        return 0;
}

/*
 * Select a congestion algorithm by name: for one connection if tp is
 * non-NULL, otherwise as the global default.
 */
int
tcp_congctl_select(struct tcpcb *tp, const char *name)
{
        struct tcp_congctlent *tccp;

        KASSERT(name);

        TAILQ_FOREACH(tccp, &tcp_congctlhd, congctl_ent)
                if (!strcmp(name, tccp->congctl_name)) {
                        if (tp) {
                                simple_lock(&tcp_congctl_slock);
                                tp->t_congctl->refcnt--;
                                tp->t_congctl = tccp->congctl_ctl;
                                tp->t_congctl->refcnt++;
                                simple_unlock(&tcp_congctl_slock);
                        } else {
                                tcp_congctl_global = tccp->congctl_ctl;
                                /* strlcpy() takes the full buffer size. */
                                strlcpy(tcp_congctl_global_name,
                                    tccp->congctl_name,
                                    sizeof(tcp_congctl_global_name));
                        }
                        return 0;
                }

        return EINVAL;
}
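
/*
 * Illustration only, not part of the original file: a minimal userland
 * sketch of the name-keyed registry that tcp_congctl_register() and
 * tcp_congctl_select() build above.  It assumes a hosted C environment
 * that provides <sys/queue.h>.  The names cc_ops, cc_entry, cc_find,
 * cc_lookup, cc_register and CC_MAXLEN are hypothetical; locking,
 * reference counting and the per-connection case are left out.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

#define CC_MAXLEN 16            /* hypothetical stand-in for TCPCC_MAXLEN */

struct cc_ops {                 /* hypothetical stand-in for struct tcp_congctl */
        int refcnt;
};

struct cc_entry {
        TAILQ_ENTRY(cc_entry) link;
        char name[CC_MAXLEN];
        struct cc_ops *ops;
};

static TAILQ_HEAD(, cc_entry) cc_head = TAILQ_HEAD_INITIALIZER(cc_head);

/* Walk the list and return the entry stored under name, or NULL. */
static struct cc_entry *
cc_find(const char *name)
{
        struct cc_entry *e;

        assert(name != NULL);
        TAILQ_FOREACH(e, &cc_head, link)
                if (strcmp(name, e->name) == 0)
                        return e;
        return NULL;
}

/* Return the operations registered under name, or NULL if unknown. */
struct cc_ops *
cc_lookup(const char *name)
{
        struct cc_entry *e = cc_find(name);

        return e != NULL ? e->ops : NULL;
}

/* Append a new entry; a name that is already present yields EEXIST. */
int
cc_register(const char *name, struct cc_ops *ops)
{
        struct cc_entry *e;

        if (cc_find(name) != NULL)
                return EEXIST;

        e = malloc(sizeof(*e));
        if (e == NULL)
                return ENOMEM;

        /* snprintf() truncates to the buffer and always NUL-terminates. */
        snprintf(e->name, sizeof(e->name), "%s", name);
        e->ops = ops;
        TAILQ_INSERT_TAIL(&cc_head, e, link);
        return 0;
}
```

/*
 * In this sketch, registering "reno" twice returns EEXIST the second
 * time, and cc_lookup() of a name that was never registered returns NULL.
 */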

/*
 * Returns the name of a congestion algorithm.
 */
const char *
tcp_congctl_bystruct(const struct tcp_congctl *tcc)
{
        struct tcp_congctlent *tccp;

        KASSERT(tcc);

        TAILQ_FOREACH(tccp, &tcp_congctlhd, congctl_ent)
                if (tccp->congctl_ctl == tcc)
                        return tccp->congctl_name;

        return NULL;
}

/*
 * Rebuild tcp_congctl_avail as a space-separated list of the registered
 * algorithm names.
 */
static void
tcp_congctl_fillnames(void)
{
        struct tcp_congctlent *tccp;
        const char *delim = " ";

        tcp_congctl_avail[0] = '\0';
        TAILQ_FOREACH(tccp, &tcp_congctlhd, congctl_ent) {
                /* strlcat() takes the full size of the destination. */
                strlcat(tcp_congctl_avail, tccp->congctl_name,
                    sizeof(tcp_congctl_avail));
                if (TAILQ_NEXT(tccp, congctl_ent))
                        strlcat(tcp_congctl_avail, delim,
                            sizeof(tcp_congctl_avail));
        }
}
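
/*
 * Illustration only, not part of the original file: a minimal userland
 * sketch of building a space-separated name list in a fixed buffer, as
 * tcp_congctl_fillnames() does with strlcat().  It uses snprintf()
 * instead, on the assumption that strlcat() may not be available outside
 * the BSDs.  The function name cc_join_names is hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Join n names with single spaces into buf (capacity len, always
 * NUL-terminated).  Returns the length the complete list needs, so a
 * result >= len tells the caller that the output was truncated.
 */
size_t
cc_join_names(char *buf, size_t len, const char *const *names, size_t n)
{
        size_t used = 0;

        assert(buf != NULL && len > 0);
        buf[0] = '\0';

        for (size_t i = 0; i < n; i++) {
                /* Once the buffer is full, only keep counting. */
                char *dst = used < len ? buf + used : NULL;
                size_t room = used < len ? len - used : 0;
                int w = snprintf(dst, room, "%s%s", i ? " " : "", names[i]);

                if (w < 0)
                        break;
                used += (size_t)w;
        }
        return used;
}
```

/*
 * With the names "reno" and "newreno" and a large enough buffer, the
 * sketch produces "reno newreno" and returns 12.
 */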

/* ------------------------------------------------------------------------ */

/*
 * TCP/Reno congestion control.
 */
static void
tcp_reno_congestion_exp(struct tcpcb *tp)
{
        u_int win;

        /*
         * Halve the congestion window and reduce the
         * slow start threshold.
         */
        win = min(tp->snd_wnd, tp->snd_cwnd) / 2 / tp->t_segsz;
        if (win < 2)
                win = 2;

        tp->snd_ssthresh = win * tp->t_segsz;
        tp->snd_recover = tp->snd_max;
        tp->snd_cwnd = tp->snd_ssthresh;

        /*
         * When using TCP ECN, notify the peer that
         * we reduced the cwnd.
         */
        if (TCP_ECN_ALLOWED(tp))
                tp->t_flags |= TF_ECN_SND_CWR;
}
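
/*
 * Illustration only, not part of the original file: a minimal sketch of
 * the threshold arithmetic used by tcp_reno_congestion_exp() above (and
 * again by tcp_reno_slow_retransmit()).  Half of the smaller of the send
 * window and the congestion window is truncated to whole segments, with
 * a floor of two segments.  The function name reno_ssthresh and the use
 * of uint32_t are assumptions of the sketch.
 */

```c
#include <assert.h>
#include <stdint.h>

/* New slow start threshold in bytes; never less than two segments. */
uint32_t
reno_ssthresh(uint32_t snd_wnd, uint32_t snd_cwnd, uint32_t segsz)
{
        uint32_t win;

        assert(segsz > 0);

        /* Half the effective window, counted in whole segments. */
        win = (snd_wnd < snd_cwnd ? snd_wnd : snd_cwnd) / 2 / segsz;
        if (win < 2)
                win = 2;

        return win * segsz;
}
```

/*
 * For example, with snd_wnd = 65535, snd_cwnd = 20000 and segsz = 1460,
 * half the window is 10000 bytes, which is 6 whole segments, so the
 * sketch returns 8760.
 */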

static int
tcp_reno_fast_retransmit(struct tcpcb *tp, const struct tcphdr *th)
{
        /*
         * We know we're losing at the current
         * window size so do congestion avoidance
         * (set ssthresh to half the current window
         * and pull our congestion window back to
         * the new ssthresh).
         *
         * Dup acks mean that packets have left the
         * network (they're now cached at the receiver)
         * so bump cwnd by the amount in the receiver
         * to keep a constant cwnd packets in the
         * network.
         *
         * If we are using TCP/SACK, then enter
         * Fast Recovery if the receiver SACKs
         * data that is tcprexmtthresh * MSS
         * bytes past the last ACKed segment,
         * irrespective of the number of DupAcks.
         */

        tcp_seq onxt;

        onxt = tp->snd_nxt;
        tcp_reno_congestion_exp(tp);
        tp->t_partialacks = 0;
        TCP_TIMER_DISARM(tp, TCPT_REXMT);
        tp->t_rtttime = 0;
        if (TCP_SACK_ENABLED(tp)) {
                tp->t_dupacks = tcprexmtthresh;
                tp->sack_newdata = tp->snd_nxt;
                tp->snd_cwnd = tp->t_segsz;
                (void) tcp_output(tp);
                return 0;
        }
        tp->snd_nxt = th->th_ack;
        tp->snd_cwnd = tp->t_segsz;
        (void) tcp_output(tp);
        tp->snd_cwnd = tp->snd_ssthresh + tp->t_segsz * tp->t_dupacks;
        if (SEQ_GT(onxt, tp->snd_nxt))
                tp->snd_nxt = onxt;

        return 0;
}

static void
tcp_reno_slow_retransmit(struct tcpcb *tp)
{
        u_int win;

        /*
         * Close the congestion window down to one segment
         * (we'll open it by one segment for each ack we get).
         * Since we probably have a window's worth of unacked
         * data accumulated, this "slow start" keeps us from
         * dumping all that data as back-to-back packets (which
         * might overwhelm an intermediate gateway).
         *
         * There are two phases to the opening: Initially we
         * open by one mss on each ack.  This makes the window
         * size increase exponentially with time.  If the
         * window is larger than the path can handle, this
         * exponential growth results in dropped packet(s)
         * almost immediately.  To get more time between
         * drops but still "push" the network to take advantage
         * of improving conditions, we switch from exponential
         * to linear window opening at some threshold size.
         * For a threshold, we use half the current window
         * size, truncated to a multiple of the mss.
         *
         * (the minimum cwnd that will give us exponential
         * growth is 2 mss.  We don't allow the threshold
         * to go below this.)
         */

        win = min(tp->snd_wnd, tp->snd_cwnd) / 2 / tp->t_segsz;
        if (win < 2)
                win = 2;
        /* Loss Window MUST be one segment. */
        tp->snd_cwnd = tp->t_segsz;
        tp->snd_ssthresh = win * tp->t_segsz;
        tp->t_partialacks = -1;
        tp->t_dupacks = 0;
        tp->t_bytes_acked = 0;
}

static void
tcp_reno_fast_retransmit_newack(struct tcpcb *tp,
    const struct tcphdr *th)
{
        if (tp->t_partialacks < 0) {
                /*
                 * We were not in fast recovery.  Reset the duplicate ack
                 * counter.
                 */
                tp->t_dupacks = 0;
        } else {
                /*
                 * Clamp the congestion window to the crossover point and
                 * exit fast recovery.
                 */
                if (tp->snd_cwnd > tp->snd_ssthresh)
                        tp->snd_cwnd = tp->snd_ssthresh;
                tp->t_partialacks = -1;
                tp->t_dupacks = 0;
                tp->t_bytes_acked = 0;
        }
}

static void
tcp_reno_newack(struct tcpcb *tp, const struct tcphdr *th)
{
        /*
         * When new data is acked, open the congestion window.
         */

        u_int cw = tp->snd_cwnd;
        u_int incr = tp->t_segsz;

        if (tcp_do_abc) {

                /*
                 * RFC 3465 Appropriate Byte Counting (ABC)
                 */

                int acked = th->th_ack - tp->snd_una;

                if (cw >= tp->snd_ssthresh) {
                        tp->t_bytes_acked += acked;
                        if (tp->t_bytes_acked >= cw) {
                                /* Time to increase the window. */
                                tp->t_bytes_acked -= cw;
                        } else {
                                /* No need to increase yet. */
                                incr = 0;
                        }
                } else {
                        /*
                         * use 2*SMSS or 1*SMSS for the "L" param,
                         * depending on sysctl setting.
                         *
                         * (See RFC 3465 2.3 Choosing the Limit)
                         */
                        u_int abc_lim;

                        abc_lim = (tcp_abc_aggressive == 0 ||
                            tp->snd_nxt != tp->snd_max) ? incr : incr * 2;
                        incr = min(acked, abc_lim);
                }
        } else {

                /*
                 * If the window gives us less than ssthresh packets
                 * in flight, open exponentially (segsz per packet).
                 * Otherwise open linearly: segsz per window
                 * (segsz^2 / cwnd per packet).
                 */

                if (cw >= tp->snd_ssthresh) {
                        incr = incr * incr / cw;
                }
        }

        tp->snd_cwnd = min(cw + incr, TCP_MAXWIN << tp->snd_scale);
}
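
/*
 * Illustration only, not part of the original file: a minimal sketch of
 * the two window-opening rules in tcp_reno_newack() above, written as
 * pure functions that return the increment in bytes.  The struct
 * cc_state, the function names and the flag parameters are hypothetical
 * stand-ins for the tcpcb fields and sysctl variables; the final clamp
 * to TCP_MAXWIN << snd_scale is left to the caller.
 */

```c
#include <assert.h>
#include <stdint.h>

struct cc_state {               /* hypothetical subset of struct tcpcb */
        uint32_t cwnd;          /* congestion window, bytes */
        uint32_t ssthresh;      /* slow start threshold, bytes */
        uint32_t segsz;         /* segment size, bytes */
        uint32_t bytes_acked;   /* ABC byte counter */
};

/* Per-ACK rule: segsz in slow start, segsz^2 / cwnd afterwards. */
uint32_t
reno_incr_classic(const struct cc_state *s)
{
        assert(s->cwnd > 0);
        if (s->cwnd >= s->ssthresh)
                return (uint32_t)((uint64_t)s->segsz * s->segsz / s->cwnd);
        return s->segsz;
}

/*
 * Byte-counting rule in the style of RFC 3465.  acked is the number of
 * bytes newly acknowledged; aggressive selects a limit of two segments
 * in slow start unless a retransmission is in progress.
 */
uint32_t
reno_incr_abc(struct cc_state *s, uint32_t acked, int aggressive,
    int retransmitting)
{
        uint32_t limit;

        if (s->cwnd >= s->ssthresh) {
                /* Congestion avoidance: one segment per cwnd bytes acked. */
                s->bytes_acked += acked;
                if (s->bytes_acked < s->cwnd)
                        return 0;
                s->bytes_acked -= s->cwnd;
                return s->segsz;
        }

        /* Slow start: grow by the bytes acked, up to the limit. */
        limit = (!aggressive || retransmitting) ? s->segsz : 2 * s->segsz;
        return acked < limit ? acked : limit;
}
```

/*
 * For example, with cwnd = 14600, ssthresh = 8760 and segsz = 1460 the
 * per-ACK rule yields 1460 * 1460 / 14600 = 146 bytes, while the
 * byte-counting rule yields nothing for nine full-segment ACKs and one
 * whole segment on the tenth.
 */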

struct tcp_congctl tcp_reno_ctl = {
        .fast_retransmit = tcp_reno_fast_retransmit,
        .slow_retransmit = tcp_reno_slow_retransmit,
        .fast_retransmit_newack = tcp_reno_fast_retransmit_newack,
        .newack = tcp_reno_newack,
        .cong_exp = tcp_reno_congestion_exp,
};

/*
 * TCP/NewReno Congestion control.
 */
static int
tcp_newreno_fast_retransmit(struct tcpcb *tp, const struct tcphdr *th)
{
        if (SEQ_LT(th->th_ack, tp->snd_high)) {
                /*
                 * False fast retransmit after timeout.
                 * Do not enter fast recovery.
                 */
                tp->t_dupacks = 0;
                return 1;
        } else {
                /*
                 * Fast retransmit is same as reno.
                 */
                return tcp_reno_fast_retransmit(tp, th);
        }
}

/*
 * Implement the NewReno response to a new ack, checking for partial acks in
 * fast recovery.
 */
static void
tcp_newreno_fast_retransmit_newack(struct tcpcb *tp, const struct tcphdr *th)
{
        if (tp->t_partialacks < 0) {
                /*
                 * We were not in fast recovery.  Reset the duplicate ack
                 * counter.
                 */
                tp->t_dupacks = 0;
        } else if (SEQ_LT(th->th_ack, tp->snd_recover)) {
                /*
                 * This is a partial ack.  Retransmit the first unacknowledged
                 * segment and deflate the congestion window by the amount of
                 * acknowledged data.  Do not exit fast recovery.
                 */
                tcp_seq onxt = tp->snd_nxt;
                u_long ocwnd = tp->snd_cwnd;

                /*
                 * snd_una has not yet been updated and the socket's send
                 * buffer has not yet drained off the ACK'd data, so we
                 * have to leave snd_una as it was to get the correct data
                 * offset in tcp_output().
                 */
                if (++tp->t_partialacks == 1)
                        TCP_TIMER_DISARM(tp, TCPT_REXMT);
                tp->t_rtttime = 0;
                tp->snd_nxt = th->th_ack;
                /*
                 * Set snd_cwnd to one segment beyond ACK'd offset.  snd_una
                 * is not yet updated when we're called.
                 */
                tp->snd_cwnd = tp->t_segsz + (th->th_ack - tp->snd_una);
                (void) tcp_output(tp);
                tp->snd_cwnd = ocwnd;
                if (SEQ_GT(onxt, tp->snd_nxt))
                        tp->snd_nxt = onxt;
                /*
                 * Partial window deflation.  Relies on the fact that
                 * tp->snd_una is not updated yet.
                 */
                tp->snd_cwnd -= (th->th_ack - tp->snd_una - tp->t_segsz);
        } else {
                /*
                 * Complete ack.  Inflate the congestion window to ssthresh
                 * and exit fast recovery.
                 *
                 * Window inflation should have left us with approx.
                 * snd_ssthresh outstanding data.  But in case we
                 * would be inclined to send a burst, better to do
                 * it via the slow start mechanism.
                 */
                if (SEQ_SUB(tp->snd_max, th->th_ack) < tp->snd_ssthresh)
                        tp->snd_cwnd = SEQ_SUB(tp->snd_max, th->th_ack)
                            + tp->t_segsz;
                else
                        tp->snd_cwnd = tp->snd_ssthresh;
                tp->t_partialacks = -1;
                tp->t_dupacks = 0;
                tp->t_bytes_acked = 0;
        }
}
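
/*
 * Illustration only, not part of the original file: a minimal sketch of
 * the two congestion window updates in
 * tcp_newreno_fast_retransmit_newack() above.  The function names are
 * hypothetical.  The arithmetic is done in signed 64-bit values and the
 * partial-ack result is clamped to the range [segsz, UINT32_MAX]; both
 * are choices of this sketch, not behaviour taken from the kernel code.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Partial ack: deflate cwnd by the newly acked bytes, then add back one
 * segment.  acked corresponds to th_ack - snd_una.
 */
uint32_t
newreno_partial_ack_cwnd(uint32_t cwnd, uint32_t acked, uint32_t segsz)
{
        int64_t w;

        assert(segsz > 0);
        w = (int64_t)cwnd - (int64_t)acked + (int64_t)segsz;

        /* Sketch-only clamps; see the note above the block. */
        if (w < (int64_t)segsz)
                return segsz;
        if (w > (int64_t)UINT32_MAX)
                return UINT32_MAX;
        return (uint32_t)w;
}

/*
 * Complete ack: leave fast recovery with cwnd at ssthresh, or at the
 * outstanding data plus one segment when less than ssthresh is in
 * flight.  in_flight corresponds to snd_max - th_ack.
 */
uint32_t
newreno_full_ack_cwnd(uint32_t in_flight, uint32_t ssthresh, uint32_t segsz)
{
        if (in_flight < ssthresh)
                return in_flight + segsz;
        return ssthresh;
}
```

/*
 * For example, a partial ack covering 4380 bytes with cwnd = 20000 and
 * segsz = 1460 leaves 20000 - 4380 + 1460 = 17080 bytes in the sketch.
 */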

static void
tcp_newreno_newack(struct tcpcb *tp, const struct tcphdr *th)
{
        /*
         * If we are still in fast recovery (meaning we are using
         * NewReno and we have only received partial acks), do not
         * inflate the window yet.
         */
        if (tp->t_partialacks < 0)
                tcp_reno_newack(tp, th);
}

struct tcp_congctl tcp_newreno_ctl = {
        .fast_retransmit = tcp_newreno_fast_retransmit,
        .slow_retransmit = tcp_reno_slow_retransmit,
        .fast_retransmit_newack = tcp_newreno_fast_retransmit_newack,
        .newack = tcp_newreno_newack,
        .cong_exp = tcp_reno_congestion_exp,
};
