FreeBSD/Linux Kernel Cross Reference
sys/kern/uipc_socket.c


    1 /*-
    2  * SPDX-License-Identifier: BSD-3-Clause
    3  *
    4  * Copyright (c) 1982, 1986, 1988, 1990, 1993
    5  *      The Regents of the University of California.
    6  * Copyright (c) 2004 The FreeBSD Foundation
    7  * Copyright (c) 2004-2008 Robert N. M. Watson
    8  * All rights reserved.
    9  *
   10  * Redistribution and use in source and binary forms, with or without
   11  * modification, are permitted provided that the following conditions
   12  * are met:
   13  * 1. Redistributions of source code must retain the above copyright
   14  *    notice, this list of conditions and the following disclaimer.
   15  * 2. Redistributions in binary form must reproduce the above copyright
   16  *    notice, this list of conditions and the following disclaimer in the
   17  *    documentation and/or other materials provided with the distribution.
   18  * 3. Neither the name of the University nor the names of its contributors
   19  *    may be used to endorse or promote products derived from this software
   20  *    without specific prior written permission.
   21  *
   22  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
   23  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   24  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
   25  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
   26  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
   27  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
   28  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
   29  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
   30  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
   31  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   32  * SUCH DAMAGE.
   33  *
   34  *      @(#)uipc_socket.c       8.3 (Berkeley) 4/15/94
   35  */
   36 
   37 /*
   38  * Comments on the socket life cycle:
   39  *
    40  * soalloc() sets up socket layer state for a socket, called only by
   41  * socreate() and sonewconn().  Socket layer private.
   42  *
   43  * sodealloc() tears down socket layer state for a socket, called only by
   44  * sofree() and sonewconn().  Socket layer private.
   45  *
   46  * pru_attach() associates protocol layer state with an allocated socket;
   47  * called only once, may fail, aborting socket allocation.  This is called
   48  * from socreate() and sonewconn().  Socket layer private.
   49  *
   50  * pru_detach() disassociates protocol layer state from an attached socket,
   51  * and will be called exactly once for sockets in which pru_attach() has
   52  * been successfully called.  If pru_attach() returned an error,
   53  * pru_detach() will not be called.  Socket layer private.
   54  *
   55  * pru_abort() and pru_close() notify the protocol layer that the last
   56  * consumer of a socket is starting to tear down the socket, and that the
   57  * protocol should terminate the connection.  Historically, pru_abort() also
   58  * detached protocol state from the socket state, but this is no longer the
   59  * case.
   60  *
   61  * socreate() creates a socket and attaches protocol state.  This is a public
   62  * interface that may be used by socket layer consumers to create new
   63  * sockets.
   64  *
   65  * sonewconn() creates a socket and attaches protocol state.  This is a
    66  * public interface that may be used by protocols to create new sockets when
   67  * a new connection is received and will be available for accept() on a
   68  * listen socket.
   69  *
   70  * soclose() destroys a socket after possibly waiting for it to disconnect.
   71  * This is a public interface that socket consumers should use to close and
   72  * release a socket when done with it.
   73  *
   74  * soabort() destroys a socket without waiting for it to disconnect (used
   75  * only for incoming connections that are already partially or fully
   76  * connected).  This is used internally by the socket layer when clearing
   77  * listen socket queues (due to overflow or close on the listen socket), but
   78  * is also a public interface protocols may use to abort connections in
   79  * their incomplete listen queues should they no longer be required.  Sockets
   80  * placed in completed connection listen queues should not be aborted for
   81  * reasons described in the comment above the soclose() implementation.  This
   82  * is not a general purpose close routine, and except in the specific
   83  * circumstances described here, should not be used.
   84  *
   85  * sofree() will free a socket and its protocol state if all references on
    86  * the socket have been released, and is the interface used to attempt to
    87  * free a socket when a reference is removed.  This is a socket layer private
   88  * interface.
   89  *
   90  * NOTE: In addition to socreate() and soclose(), which provide a single
   91  * socket reference to the consumer to be managed as required, there are two
    92  * calls to explicitly manage socket references, soref() and sorele().
   93  * Currently, these are generally required only when transitioning a socket
   94  * from a listen queue to a file descriptor, in order to prevent garbage
   95  * collection of the socket at an untimely moment.  For a number of reasons,
   96  * these interfaces are not preferred, and should be avoided.
   97  *
   98  * NOTE: With regard to VNETs the general rule is that callers do not set
   99  * curvnet. Exceptions to this rule include soabort(), sodisconnect(),
  100  * sofree() (and with that sorele(), sotryfree()), as well as sonewconn()
  101  * and sorflush(), which are usually called from a pre-set VNET context.
  102  * sopoll() currently does not need a VNET context to be set.
  103  */
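/*
 * Illustrative sketch, not part of the original file: a minimal userland
 * view of the life cycle described above.  socket(2) ends up in socreate()
 * (soalloc() + pru_attach()), listen(2) in solisten(), accept(2) dequeues
 * sockets that sonewconn() queued on the listener, and close(2) ends up in
 * soclose() and eventually sofree().  The function name lifecycle_example()
 * is hypothetical.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

static int
lifecycle_example(void)
{
	struct sockaddr_in sin = { .sin_family = AF_INET };	/* any addr, port 0 */
	int lfd, cfd;

	lfd = socket(PF_INET, SOCK_STREAM, 0);		/* socreate() */
	if (lfd == -1)
		return (-1);
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(lfd, 8) == -1) {			/* solisten() */
		close(lfd);				/* soclose() */
		return (-1);
	}
	cfd = accept(lfd, NULL, NULL);	/* blocks; dequeues a sonewconn() socket */
	if (cfd != -1)
		close(cfd);
	close(lfd);					/* soclose() -> sofree() */
	return (0);
}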
  104 
  105 #include <sys/cdefs.h>
  106 __FBSDID("$FreeBSD$");
  107 
  108 #include "opt_inet.h"
  109 #include "opt_inet6.h"
  110 #include "opt_sctp.h"
  111 
  112 #include <sys/param.h>
  113 #include <sys/systm.h>
  114 #include <sys/fcntl.h>
  115 #include <sys/limits.h>
  116 #include <sys/lock.h>
  117 #include <sys/mac.h>
  118 #include <sys/malloc.h>
  119 #include <sys/mbuf.h>
  120 #include <sys/mutex.h>
  121 #include <sys/domain.h>
  122 #include <sys/file.h>                   /* for struct knote */
  123 #include <sys/hhook.h>
  124 #include <sys/kernel.h>
  125 #include <sys/khelp.h>
  126 #include <sys/event.h>
  127 #include <sys/eventhandler.h>
  128 #include <sys/poll.h>
  129 #include <sys/proc.h>
  130 #include <sys/protosw.h>
  131 #include <sys/socket.h>
  132 #include <sys/socketvar.h>
  133 #include <sys/resourcevar.h>
  134 #include <net/route.h>
  135 #include <sys/signalvar.h>
  136 #include <sys/stat.h>
  137 #include <sys/sx.h>
  138 #include <sys/sysctl.h>
  139 #include <sys/taskqueue.h>
  140 #include <sys/uio.h>
  141 #include <sys/jail.h>
  142 #include <sys/syslog.h>
  143 #include <netinet/in.h>
  144 
  145 #include <net/vnet.h>
  146 
  147 #include <security/mac/mac_framework.h>
  148 
  149 #include <vm/uma.h>
  150 
  151 #ifdef COMPAT_FREEBSD32
  152 #include <sys/mount.h>
  153 #include <sys/sysent.h>
  154 #include <compat/freebsd32/freebsd32.h>
  155 #endif
  156 
  157 static int      soreceive_rcvoob(struct socket *so, struct uio *uio,
  158                     int flags);
  159 static void     so_rdknl_lock(void *);
  160 static void     so_rdknl_unlock(void *);
  161 static void     so_rdknl_assert_locked(void *);
  162 static void     so_rdknl_assert_unlocked(void *);
  163 static void     so_wrknl_lock(void *);
  164 static void     so_wrknl_unlock(void *);
  165 static void     so_wrknl_assert_locked(void *);
  166 static void     so_wrknl_assert_unlocked(void *);
  167 
  168 static void     filt_sordetach(struct knote *kn);
  169 static int      filt_soread(struct knote *kn, long hint);
  170 static void     filt_sowdetach(struct knote *kn);
  171 static int      filt_sowrite(struct knote *kn, long hint);
  172 static int      filt_soempty(struct knote *kn, long hint);
   173 static inline int hhook_run_socket(struct socket *so, void *hctx, int32_t h_id);
  174 fo_kqfilter_t   soo_kqfilter;
  175 
  176 static struct filterops soread_filtops = {
  177         .f_isfd = 1,
  178         .f_detach = filt_sordetach,
  179         .f_event = filt_soread,
  180 };
  181 static struct filterops sowrite_filtops = {
  182         .f_isfd = 1,
  183         .f_detach = filt_sowdetach,
  184         .f_event = filt_sowrite,
  185 };
  186 static struct filterops soempty_filtops = {
  187         .f_isfd = 1,
  188         .f_detach = filt_sowdetach,
  189         .f_event = filt_soempty,
  190 };
  191 
  192 so_gen_t        so_gencnt;      /* generation count for sockets */
  193 
  194 MALLOC_DEFINE(M_SONAME, "soname", "socket name");
  195 MALLOC_DEFINE(M_PCB, "pcb", "protocol control block");
  196 
  197 #define VNET_SO_ASSERT(so)                                              \
  198         VNET_ASSERT(curvnet != NULL,                                    \
  199             ("%s:%d curvnet is NULL, so=%p", __func__, __LINE__, (so)));
  200 
  201 VNET_DEFINE(struct hhook_head *, socket_hhh[HHOOK_SOCKET_LAST + 1]);
  202 #define V_socket_hhh            VNET(socket_hhh)
  203 
  204 /*
  205  * Limit on the number of connections in the listen queue waiting
  206  * for accept(2).
  207  * NB: The original sysctl somaxconn is still available but hidden
  208  * to prevent confusion about the actual purpose of this number.
  209  */
  210 static u_int somaxconn = SOMAXCONN;
  211 
  212 static int
  213 sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
  214 {
  215         int error;
  216         int val;
  217 
  218         val = somaxconn;
  219         error = sysctl_handle_int(oidp, &val, 0, req);
   220         if (error || !req->newptr)
  221                 return (error);
  222 
  223         /*
   224          * The purpose of the UINT_MAX / 3 limit is to keep the formula
   225          *   3 * so_qlimit / 2
   226          * below from overflowing.
  227          */
  228 
  229         if (val < 1 || val > UINT_MAX / 3)
  230                 return (EINVAL);
  231 
  232         somaxconn = val;
  233         return (0);
  234 }
  235 SYSCTL_PROC(_kern_ipc, OID_AUTO, soacceptqueue, CTLTYPE_UINT | CTLFLAG_RW,
  236     0, sizeof(int), sysctl_somaxconn, "I",
  237     "Maximum listen socket pending connection accept queue size");
  238 SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn,
  239     CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_SKIP,
  240     0, sizeof(int), sysctl_somaxconn, "I",
  241     "Maximum listen socket pending connection accept queue size (compat)");
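/*
 * Illustrative sketch, not part of the original file: the handler above can
 * be exercised from userland with sysctlbyname(3); writes outside the range
 * [1, UINT_MAX / 3] fail with EINVAL.  show_soacceptqueue() is a
 * hypothetical helper name.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

static int
show_soacceptqueue(void)
{
	int val;
	size_t len = sizeof(val);

	if (sysctlbyname("kern.ipc.soacceptqueue", &val, &len, NULL, 0) == -1)
		return (-1);
	printf("kern.ipc.soacceptqueue: %d\n", val);
	return (0);
}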
  242 
  243 static int numopensockets;
  244 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
  245     &numopensockets, 0, "Number of open sockets");
  246 
  247 /*
  248  * accept_mtx locks down per-socket fields relating to accept queues.  See
  249  * socketvar.h for an annotation of the protected fields of struct socket.
  250  */
  251 struct mtx accept_mtx;
  252 MTX_SYSINIT(accept_mtx, &accept_mtx, "accept", MTX_DEF);
  253 
  254 /*
  255  * so_global_mtx protects so_gencnt, numopensockets, and the per-socket
  256  * so_gencnt field.
  257  */
  258 static struct mtx so_global_mtx;
  259 MTX_SYSINIT(so_global_mtx, &so_global_mtx, "so_glabel", MTX_DEF);
  260 
  261 /*
  262  * General IPC sysctl name space, used by sockets and a variety of other IPC
  263  * types.
  264  */
  265 SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLAG_RW, 0, "IPC");
  266 
  267 /*
  268  * Initialize the socket subsystem and set up the socket
  269  * memory allocator.
  270  */
  271 static uma_zone_t socket_zone;
  272 int     maxsockets;
  273 
  274 static void
  275 socket_zone_change(void *tag)
  276 {
  277 
  278         maxsockets = uma_zone_set_max(socket_zone, maxsockets);
  279 }
  280 
  281 static void
  282 socket_hhook_register(int subtype)
  283 {
  284         
  285         if (hhook_head_register(HHOOK_TYPE_SOCKET, subtype,
  286             &V_socket_hhh[subtype],
  287             HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0)
  288                 printf("%s: WARNING: unable to register hook\n", __func__);
  289 }
  290 
  291 static void
  292 socket_hhook_deregister(int subtype)
  293 {
  294         
  295         if (hhook_head_deregister(V_socket_hhh[subtype]) != 0)
  296                 printf("%s: WARNING: unable to deregister hook\n", __func__);
  297 }
  298 
  299 static void
  300 socket_init(void *tag)
  301 {
  302 
  303         socket_zone = uma_zcreate("socket", sizeof(struct socket), NULL, NULL,
  304             NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
  305         maxsockets = uma_zone_set_max(socket_zone, maxsockets);
  306         uma_zone_set_warning(socket_zone, "kern.ipc.maxsockets limit reached");
  307         EVENTHANDLER_REGISTER(maxsockets_change, socket_zone_change, NULL,
  308             EVENTHANDLER_PRI_FIRST);
  309 }
  310 SYSINIT(socket, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY, socket_init, NULL);
  311 
  312 static void
  313 socket_vnet_init(const void *unused __unused)
  314 {
  315         int i;
  316 
  317         /* We expect a contiguous range */
  318         for (i = 0; i <= HHOOK_SOCKET_LAST; i++)
  319                 socket_hhook_register(i);
  320 }
  321 VNET_SYSINIT(socket_vnet_init, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY,
  322     socket_vnet_init, NULL);
  323 
  324 static void
  325 socket_vnet_uninit(const void *unused __unused)
  326 {
  327         int i;
  328 
  329         for (i = 0; i <= HHOOK_SOCKET_LAST; i++)
  330                 socket_hhook_deregister(i);
  331 }
  332 VNET_SYSUNINIT(socket_vnet_uninit, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY,
  333     socket_vnet_uninit, NULL);
  334 
  335 /*
  336  * Initialise maxsockets.  This SYSINIT must be run after
  337  * tunable_mbinit().
  338  */
  339 static void
  340 init_maxsockets(void *ignored)
  341 {
  342 
  343         TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets);
  344         maxsockets = imax(maxsockets, maxfiles);
  345 }
  346 SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL);
  347 
  348 /*
  349  * Sysctl to get and set the maximum global sockets limit.  Notify protocols
  350  * of the change so that they can update their dependent limits as required.
  351  */
  352 static int
  353 sysctl_maxsockets(SYSCTL_HANDLER_ARGS)
  354 {
  355         int error, newmaxsockets;
  356 
  357         newmaxsockets = maxsockets;
  358         error = sysctl_handle_int(oidp, &newmaxsockets, 0, req);
  359         if (error == 0 && req->newptr) {
  360                 if (newmaxsockets > maxsockets &&
  361                     newmaxsockets <= maxfiles) {
  362                         maxsockets = newmaxsockets;
  363                         EVENTHANDLER_INVOKE(maxsockets_change);
  364                 } else
  365                         error = EINVAL;
  366         }
  367         return (error);
  368 }
  369 SYSCTL_PROC(_kern_ipc, OID_AUTO, maxsockets, CTLTYPE_INT|CTLFLAG_RW,
  370     &maxsockets, 0, sysctl_maxsockets, "IU",
  371     "Maximum number of sockets available");
  372 
  373 /*
  374  * Socket operation routines.  These routines are called by the routines in
  375  * sys_socket.c or from a system process, and implement the semantics of
  376  * socket operations by switching out to the protocol specific routines.
  377  */
  378 
  379 /*
  380  * Get a socket structure from our zone, and initialize it.  Note that it
  381  * would probably be better to allocate socket and PCB at the same time, but
  382  * I'm not convinced that all the protocols can be easily modified to do
  383  * this.
  384  *
  385  * soalloc() returns a socket with a ref count of 0.
  386  */
  387 static struct socket *
  388 soalloc(struct vnet *vnet)
  389 {
  390         struct socket *so;
  391 
  392         so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);
  393         if (so == NULL)
  394                 return (NULL);
  395 #ifdef MAC
  396         if (mac_socket_init(so, M_NOWAIT) != 0) {
  397                 uma_zfree(socket_zone, so);
  398                 return (NULL);
  399         }
  400 #endif
  401         if (khelp_init_osd(HELPER_CLASS_SOCKET, &so->osd)) {
  402                 uma_zfree(socket_zone, so);
  403                 return (NULL);
  404         }
  405 
  406         /*
   407          * The socket locking protocol allows locking two sockets at a time;
   408          * however, the first one must be a listening socket.  WITNESS lacks
   409          * a feature to change the class of an existing lock, so we use DUPOK.
  410          */
  411         mtx_init(&so->so_lock, "socket", NULL, MTX_DEF | MTX_DUPOK);
  412         SOCKBUF_LOCK_INIT(&so->so_snd, "so_snd");
  413         SOCKBUF_LOCK_INIT(&so->so_rcv, "so_rcv");
  414         so->so_rcv.sb_sel = &so->so_rdsel;
  415         so->so_snd.sb_sel = &so->so_wrsel;
  416         sx_init(&so->so_snd.sb_sx, "so_snd_sx");
  417         sx_init(&so->so_rcv.sb_sx, "so_rcv_sx");
  418         TAILQ_INIT(&so->so_snd.sb_aiojobq);
  419         TAILQ_INIT(&so->so_rcv.sb_aiojobq);
  420         TASK_INIT(&so->so_snd.sb_aiotask, 0, soaio_snd, so);
  421         TASK_INIT(&so->so_rcv.sb_aiotask, 0, soaio_rcv, so);
  422 #ifdef VIMAGE
  423         VNET_ASSERT(vnet != NULL, ("%s:%d vnet is NULL, so=%p",
  424             __func__, __LINE__, so));
  425         so->so_vnet = vnet;
  426 #endif
  427         /* We shouldn't need the so_global_mtx */
  428         if (hhook_run_socket(so, NULL, HHOOK_SOCKET_CREATE)) {
  429                 /* Do we need more comprehensive error returns? */
  430                 uma_zfree(socket_zone, so);
  431                 return (NULL);
  432         }
  433         mtx_lock(&so_global_mtx);
  434         so->so_gencnt = ++so_gencnt;
  435         ++numopensockets;
  436 #ifdef VIMAGE
  437         vnet->vnet_sockcnt++;
  438 #endif
  439         mtx_unlock(&so_global_mtx);
  440 
  441         return (so);
  442 }
  443 
  444 /*
  445  * Free the storage associated with a socket at the socket layer, tear down
  446  * locks, labels, etc.  All protocol state is assumed already to have been
  447  * torn down (and possibly never set up) by the caller.
  448  */
  449 static void
  450 sodealloc(struct socket *so)
  451 {
  452 
  453         KASSERT(so->so_count == 0, ("sodealloc(): so_count %d", so->so_count));
  454         KASSERT(so->so_pcb == NULL, ("sodealloc(): so_pcb != NULL"));
  455 
  456         mtx_lock(&so_global_mtx);
  457         so->so_gencnt = ++so_gencnt;
  458         --numopensockets;       /* Could be below, but faster here. */
  459 #ifdef VIMAGE
  460         VNET_ASSERT(so->so_vnet != NULL, ("%s:%d so_vnet is NULL, so=%p",
  461             __func__, __LINE__, so));
  462         so->so_vnet->vnet_sockcnt--;
  463 #endif
  464         mtx_unlock(&so_global_mtx);
  465 #ifdef MAC
  466         mac_socket_destroy(so);
  467 #endif
  468         hhook_run_socket(so, NULL, HHOOK_SOCKET_CLOSE);
  469 
  470         khelp_destroy_osd(&so->osd);
  471         if (SOLISTENING(so)) {
  472                 if (so->sol_accept_filter != NULL)
  473                         accept_filt_setopt(so, NULL);
  474         } else {
  475                 if (so->so_rcv.sb_hiwat)
  476                         (void)chgsbsize(so->so_cred->cr_uidinfo,
  477                             &so->so_rcv.sb_hiwat, 0, RLIM_INFINITY);
  478                 if (so->so_snd.sb_hiwat)
  479                         (void)chgsbsize(so->so_cred->cr_uidinfo,
  480                             &so->so_snd.sb_hiwat, 0, RLIM_INFINITY);
  481                 sx_destroy(&so->so_snd.sb_sx);
  482                 sx_destroy(&so->so_rcv.sb_sx);
  483                 SOCKBUF_LOCK_DESTROY(&so->so_snd);
  484                 SOCKBUF_LOCK_DESTROY(&so->so_rcv);
  485         }
  486         crfree(so->so_cred);
  487         mtx_destroy(&so->so_lock);
  488         uma_zfree(socket_zone, so);
  489 }
  490 
  491 /*
  492  * socreate returns a socket with a ref count of 1.  The socket should be
  493  * closed with soclose().
  494  */
  495 int
  496 socreate(int dom, struct socket **aso, int type, int proto,
  497     struct ucred *cred, struct thread *td)
  498 {
  499         struct protosw *prp;
  500         struct socket *so;
  501         int error;
  502 
  503         if (proto)
  504                 prp = pffindproto(dom, proto, type);
  505         else
  506                 prp = pffindtype(dom, type);
  507 
  508         if (prp == NULL) {
  509                 /* No support for domain. */
  510                 if (pffinddomain(dom) == NULL)
  511                         return (EAFNOSUPPORT);
  512                 /* No support for socket type. */
  513                 if (proto == 0 && type != 0)
  514                         return (EPROTOTYPE);
  515                 return (EPROTONOSUPPORT);
  516         }
  517         if (prp->pr_usrreqs->pru_attach == NULL ||
  518             prp->pr_usrreqs->pru_attach == pru_attach_notsupp)
  519                 return (EPROTONOSUPPORT);
  520 
  521         if (prison_check_af(cred, prp->pr_domain->dom_family) != 0)
  522                 return (EPROTONOSUPPORT);
  523 
  524         if (prp->pr_type != type)
  525                 return (EPROTOTYPE);
  526         so = soalloc(CRED_TO_VNET(cred));
  527         if (so == NULL)
  528                 return (ENOBUFS);
  529 
  530         so->so_type = type;
  531         so->so_cred = crhold(cred);
  532         if ((prp->pr_domain->dom_family == PF_INET) ||
  533             (prp->pr_domain->dom_family == PF_INET6) ||
  534             (prp->pr_domain->dom_family == PF_ROUTE))
  535                 so->so_fibnum = td->td_proc->p_fibnum;
  536         else
  537                 so->so_fibnum = 0;
  538         so->so_proto = prp;
  539 #ifdef MAC
  540         mac_socket_create(cred, so);
  541 #endif
  542         knlist_init(&so->so_rdsel.si_note, so, so_rdknl_lock, so_rdknl_unlock,
  543             so_rdknl_assert_locked, so_rdknl_assert_unlocked);
  544         knlist_init(&so->so_wrsel.si_note, so, so_wrknl_lock, so_wrknl_unlock,
  545             so_wrknl_assert_locked, so_wrknl_assert_unlocked);
  546         /*
  547          * Auto-sizing of socket buffers is managed by the protocols and
  548          * the appropriate flags must be set in the pru_attach function.
  549          */
  550         CURVNET_SET(so->so_vnet);
  551         error = (*prp->pr_usrreqs->pru_attach)(so, proto, td);
  552         CURVNET_RESTORE();
  553         if (error) {
  554                 sodealloc(so);
  555                 return (error);
  556         }
  557         soref(so);
  558         *aso = so;
  559         return (0);
  560 }
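/*
 * Illustrative sketch, not part of the original file: socreate()'s error
 * returns surface directly through socket(2).  An unknown domain yields
 * EAFNOSUPPORT, and a protocol whose pr_type does not match the requested
 * type yields EPROTOTYPE.  probe_socket() is a hypothetical helper name.
 */
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void
probe_socket(int dom, int type, int proto)
{
	int fd;

	fd = socket(dom, type, proto);
	if (fd == -1)
		printf("socket(%d, %d, %d): %s\n", dom, type, proto,
		    strerror(errno));
	else
		close(fd);
}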
  561 
  562 #ifdef REGRESSION
  563 static int regression_sonewconn_earlytest = 1;
  564 SYSCTL_INT(_regression, OID_AUTO, sonewconn_earlytest, CTLFLAG_RW,
  565     &regression_sonewconn_earlytest, 0, "Perform early sonewconn limit test");
  566 #endif
  567 
  568 /*
  569  * When an attempt at a new connection is noted on a socket which accepts
  570  * connections, sonewconn is called.  If the connection is possible (subject
  571  * to space constraints, etc.) then we allocate a new structure, properly
  572  * linked into the data structure of the original socket, and return this.
   573  * The connstatus argument may be 0, SS_ISCONFIRMING, or SS_ISCONNECTED.
  574  *
  575  * Note: the ref count on the socket is 0 on return.
  576  */
  577 struct socket *
  578 sonewconn(struct socket *head, int connstatus)
  579 {
  580         static struct timeval lastover;
  581         static struct timeval overinterval = { 60, 0 };
  582         static int overcount;
  583 
  584         struct socket *so;
  585         u_int over;
  586 
  587         SOLISTEN_LOCK(head);
  588         over = (head->sol_qlen > 3 * head->sol_qlimit / 2);
  589         SOLISTEN_UNLOCK(head);
  590 #ifdef REGRESSION
  591         if (regression_sonewconn_earlytest && over) {
  592 #else
  593         if (over) {
  594 #endif
  595                 overcount++;
  596 
  597                 if (ratecheck(&lastover, &overinterval)) {
  598                         log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow: "
  599                             "%i already in queue awaiting acceptance "
  600                             "(%d occurrences)\n",
  601                             __func__, head->so_pcb, head->sol_qlen, overcount);
  602 
  603                         overcount = 0;
  604                 }
  605 
  606                 return (NULL);
  607         }
  608         VNET_ASSERT(head->so_vnet != NULL, ("%s: so %p vnet is NULL",
  609             __func__, head));
  610         so = soalloc(head->so_vnet);
  611         if (so == NULL) {
  612                 log(LOG_DEBUG, "%s: pcb %p: New socket allocation failure: "
  613                     "limit reached or out of memory\n",
  614                     __func__, head->so_pcb);
  615                 return (NULL);
  616         }
  617         so->so_listen = head;
  618         so->so_type = head->so_type;
  619         so->so_options = head->so_options & ~SO_ACCEPTCONN;
  620         so->so_linger = head->so_linger;
  621         so->so_state = head->so_state | SS_NOFDREF;
  622         so->so_fibnum = head->so_fibnum;
  623         so->so_proto = head->so_proto;
  624         so->so_cred = crhold(head->so_cred);
  625 #ifdef MAC
  626         mac_socket_newconn(head, so);
  627 #endif
  628         knlist_init(&so->so_rdsel.si_note, so, so_rdknl_lock, so_rdknl_unlock,
  629             so_rdknl_assert_locked, so_rdknl_assert_unlocked);
  630         knlist_init(&so->so_wrsel.si_note, so, so_wrknl_lock, so_wrknl_unlock,
  631             so_wrknl_assert_locked, so_wrknl_assert_unlocked);
  632         VNET_SO_ASSERT(head);
  633         if (soreserve(so, head->sol_sbsnd_hiwat, head->sol_sbrcv_hiwat)) {
  634                 sodealloc(so);
  635                 log(LOG_DEBUG, "%s: pcb %p: soreserve() failed\n",
  636                     __func__, head->so_pcb);
  637                 return (NULL);
  638         }
  639         if ((*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
  640                 sodealloc(so);
  641                 log(LOG_DEBUG, "%s: pcb %p: pru_attach() failed\n",
  642                     __func__, head->so_pcb);
  643                 return (NULL);
  644         }
  645         so->so_rcv.sb_lowat = head->sol_sbrcv_lowat;
  646         so->so_snd.sb_lowat = head->sol_sbsnd_lowat;
  647         so->so_rcv.sb_timeo = head->sol_sbrcv_timeo;
  648         so->so_snd.sb_timeo = head->sol_sbsnd_timeo;
  649         so->so_rcv.sb_flags |= head->sol_sbrcv_flags & SB_AUTOSIZE;
  650         so->so_snd.sb_flags |= head->sol_sbsnd_flags & SB_AUTOSIZE;
  651 
  652         SOLISTEN_LOCK(head);
  653         if (head->sol_accept_filter != NULL)
  654                 connstatus = 0;
  655         so->so_state |= connstatus;
  656         soref(head); /* A socket on (in)complete queue refs head. */
  657         if (connstatus) {
  658                 TAILQ_INSERT_TAIL(&head->sol_comp, so, so_list);
  659                 so->so_qstate = SQ_COMP;
  660                 head->sol_qlen++;
  661                 solisten_wakeup(head);  /* unlocks */
  662         } else {
  663                 /*
  664                  * Keep removing sockets from the head until there's room for
  665                  * us to insert on the tail.  In pre-locking revisions, this
  666                  * was a simple if(), but as we could be racing with other
  667                  * threads and soabort() requires dropping locks, we must
  668                  * loop waiting for the condition to be true.
  669                  */
  670                 while (head->sol_incqlen > head->sol_qlimit) {
  671                         struct socket *sp;
  672 
  673                         sp = TAILQ_FIRST(&head->sol_incomp);
  674                         TAILQ_REMOVE(&head->sol_incomp, sp, so_list);
  675                         head->sol_incqlen--;
  676                         SOCK_LOCK(sp);
  677                         sp->so_qstate = SQ_NONE;
  678                         sp->so_listen = NULL;
  679                         SOCK_UNLOCK(sp);
  680                         sorele(head);   /* does SOLISTEN_UNLOCK, head stays */
  681                         soabort(sp);
  682                         SOLISTEN_LOCK(head);
  683                 }
  684                 TAILQ_INSERT_TAIL(&head->sol_incomp, so, so_list);
  685                 so->so_qstate = SQ_INCOMP;
  686                 head->sol_incqlen++;
  687                 SOLISTEN_UNLOCK(head);
  688         }
  689         return (so);
  690 }
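/*
 * Illustrative sketch, not part of the original file: the "over" test above
 * admits up to 3 * sol_qlimit / 2 completed connections awaiting accept(2),
 * so a backlog of 128 tolerates 192 queued sockets before sonewconn()
 * starts refusing connections and the rate-limited overflow message is
 * logged.  listen_queue_threshold() is a hypothetical helper name.
 */
static unsigned int
listen_queue_threshold(unsigned int qlimit)
{

	/* Mirrors the overflow computation in sonewconn() above. */
	return (3 * qlimit / 2);
}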
  691 
  692 #if defined(SCTP) || defined(SCTP_SUPPORT)
  693 /*
  694  * Socket part of sctp_peeloff().  Detach a new socket from an
  695  * association.  The new socket is returned with a reference.
  696  */
  697 struct socket *
  698 sopeeloff(struct socket *head)
  699 {
  700         struct socket *so;
  701 
  702         VNET_ASSERT(head->so_vnet != NULL, ("%s:%d so_vnet is NULL, head=%p",
  703             __func__, __LINE__, head));
  704         so = soalloc(head->so_vnet);
  705         if (so == NULL) {
  706                 log(LOG_DEBUG, "%s: pcb %p: New socket allocation failure: "
  707                     "limit reached or out of memory\n",
  708                     __func__, head->so_pcb);
  709                 return (NULL);
  710         }
  711         so->so_type = head->so_type;
  712         so->so_options = head->so_options;
  713         so->so_linger = head->so_linger;
  714         so->so_state = (head->so_state & SS_NBIO) | SS_ISCONNECTED;
  715         so->so_fibnum = head->so_fibnum;
  716         so->so_proto = head->so_proto;
  717         so->so_cred = crhold(head->so_cred);
  718 #ifdef MAC
  719         mac_socket_newconn(head, so);
  720 #endif
  721         knlist_init(&so->so_rdsel.si_note, so, so_rdknl_lock, so_rdknl_unlock,
  722             so_rdknl_assert_locked, so_rdknl_assert_unlocked);
  723         knlist_init(&so->so_wrsel.si_note, so, so_wrknl_lock, so_wrknl_unlock,
  724             so_wrknl_assert_locked, so_wrknl_assert_unlocked);
  725         VNET_SO_ASSERT(head);
  726         if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat)) {
  727                 sodealloc(so);
  728                 log(LOG_DEBUG, "%s: pcb %p: soreserve() failed\n",
  729                     __func__, head->so_pcb);
  730                 return (NULL);
  731         }
  732         if ((*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
  733                 sodealloc(so);
  734                 log(LOG_DEBUG, "%s: pcb %p: pru_attach() failed\n",
  735                     __func__, head->so_pcb);
  736                 return (NULL);
  737         }
  738         so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
  739         so->so_snd.sb_lowat = head->so_snd.sb_lowat;
  740         so->so_rcv.sb_timeo = head->so_rcv.sb_timeo;
  741         so->so_snd.sb_timeo = head->so_snd.sb_timeo;
  742         so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;
  743         so->so_snd.sb_flags |= head->so_snd.sb_flags & SB_AUTOSIZE;
  744 
  745         soref(so);
  746 
  747         return (so);
  748 }
  749 #endif  /* SCTP */
  750 
  751 int
  752 sobind(struct socket *so, struct sockaddr *nam, struct thread *td)
  753 {
  754         int error;
  755 
  756         CURVNET_SET(so->so_vnet);
  757         error = (*so->so_proto->pr_usrreqs->pru_bind)(so, nam, td);
  758         CURVNET_RESTORE();
  759         return (error);
  760 }
  761 
  762 int
  763 sobindat(int fd, struct socket *so, struct sockaddr *nam, struct thread *td)
  764 {
  765         int error;
  766 
  767         CURVNET_SET(so->so_vnet);
  768         error = (*so->so_proto->pr_usrreqs->pru_bindat)(fd, so, nam, td);
  769         CURVNET_RESTORE();
  770         return (error);
  771 }
  772 
  773 /*
  774  * solisten() transitions a socket from a non-listening state to a listening
  775  * state, but can also be used to update the listen queue depth on an
  776  * existing listen socket.  The protocol will call back into the sockets
  777  * layer using solisten_proto_check() and solisten_proto() to check and set
   778  * socket-layer listen state.  Callbacks are used so that the protocol can
  779  * acquire both protocol and socket layer locks in whatever order is required
  780  * by the protocol.
  781  *
  782  * Protocol implementors are advised to hold the socket lock across the
  783  * socket-layer test and set to avoid races at the socket layer.
  784  */
  785 int
  786 solisten(struct socket *so, int backlog, struct thread *td)
  787 {
  788         int error;
  789 
  790         CURVNET_SET(so->so_vnet);
  791         error = (*so->so_proto->pr_usrreqs->pru_listen)(so, backlog, td);
  792         CURVNET_RESTORE();
  793         return (error);
  794 }
  795 
  796 int
  797 solisten_proto_check(struct socket *so)
  798 {
  799 
  800         SOCK_LOCK_ASSERT(so);
  801 
  802         if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING |
  803             SS_ISDISCONNECTING))
  804                 return (EINVAL);
  805         return (0);
  806 }
  807 
  808 void
  809 solisten_proto(struct socket *so, int backlog)
  810 {
  811         int sbrcv_lowat, sbsnd_lowat;
  812         u_int sbrcv_hiwat, sbsnd_hiwat;
  813         short sbrcv_flags, sbsnd_flags;
  814         sbintime_t sbrcv_timeo, sbsnd_timeo;
  815 
  816         SOCK_LOCK_ASSERT(so);
  817 
  818         if (SOLISTENING(so))
  819                 goto listening;
  820 
  821         /*
  822          * Change this socket to listening state.
  823          */
  824         sbrcv_lowat = so->so_rcv.sb_lowat;
  825         sbsnd_lowat = so->so_snd.sb_lowat;
  826         sbrcv_hiwat = so->so_rcv.sb_hiwat;
  827         sbsnd_hiwat = so->so_snd.sb_hiwat;
  828         sbrcv_flags = so->so_rcv.sb_flags;
  829         sbsnd_flags = so->so_snd.sb_flags;
  830         sbrcv_timeo = so->so_rcv.sb_timeo;
  831         sbsnd_timeo = so->so_snd.sb_timeo;
  832 
  833         sbdestroy(&so->so_snd, so);
  834         sbdestroy(&so->so_rcv, so);
  835         sx_destroy(&so->so_snd.sb_sx);
  836         sx_destroy(&so->so_rcv.sb_sx);
  837         SOCKBUF_LOCK_DESTROY(&so->so_snd);
  838         SOCKBUF_LOCK_DESTROY(&so->so_rcv);
  839 
  840 #ifdef INVARIANTS
  841         bzero(&so->so_rcv,
  842             sizeof(struct socket) - offsetof(struct socket, so_rcv));
  843 #endif
  844 
  845         so->sol_sbrcv_lowat = sbrcv_lowat;
  846         so->sol_sbsnd_lowat = sbsnd_lowat;
  847         so->sol_sbrcv_hiwat = sbrcv_hiwat;
  848         so->sol_sbsnd_hiwat = sbsnd_hiwat;
  849         so->sol_sbrcv_flags = sbrcv_flags;
  850         so->sol_sbsnd_flags = sbsnd_flags;
  851         so->sol_sbrcv_timeo = sbrcv_timeo;
  852         so->sol_sbsnd_timeo = sbsnd_timeo;
  853 
  854         so->sol_qlen = so->sol_incqlen = 0;
  855         TAILQ_INIT(&so->sol_incomp);
  856         TAILQ_INIT(&so->sol_comp);
  857 
  858         so->sol_accept_filter = NULL;
  859         so->sol_accept_filter_arg = NULL;
  860         so->sol_accept_filter_str = NULL;
  861 
  862         so->sol_upcall = NULL;
  863         so->sol_upcallarg = NULL;
  864 
  865         so->so_options |= SO_ACCEPTCONN;
  866 
  867 listening:
  868         if (backlog < 0 || backlog > somaxconn)
  869                 backlog = somaxconn;
  870         so->sol_qlimit = backlog;
  871 }
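/*
 * Illustrative sketch, not part of the original file: because
 * solisten_proto() replaces a negative or oversized backlog with somaxconn,
 * passing -1 to listen(2) requests the largest queue the
 * kern.ipc.soacceptqueue limit allows.  listen_max_backlog() is a
 * hypothetical helper name.
 */
#include <sys/socket.h>

static int
listen_max_backlog(int fd)
{

	/* backlog < 0 is clamped to somaxconn in solisten_proto(). */
	return (listen(fd, -1));
}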
  872 
  873 /*
   874  * Wake up listeners/subsystems once we have a complete connection.
  875  * Enters with lock, returns unlocked.
  876  */
  877 void
  878 solisten_wakeup(struct socket *sol)
  879 {
  880 
  881         if (sol->sol_upcall != NULL)
   882                 (void)sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
  883         else {
  884                 selwakeuppri(&sol->so_rdsel, PSOCK);
  885                 KNOTE_LOCKED(&sol->so_rdsel.si_note, 0);
  886         }
  887         SOLISTEN_UNLOCK(sol);
  888         wakeup_one(&sol->sol_comp);
  889         if ((sol->so_state & SS_ASYNC) && sol->so_sigio != NULL)
  890                 pgsigio(&sol->so_sigio, SIGIO, 0);
  891 }
  892 
  893 /*
   894  * Return a single connection off a listening socket queue.  The main
   895  * consumer of the function is kern_accept4().  Some modules that do their
   896  * own accept management also use the function.
   897  *
   898  * The listening socket must be locked on entry and is returned unlocked
   899  * on return.
   900  * The flags argument is a set of accept4(2) flags and ACCEPT4_INHERIT.
  901  */
  902 int
  903 solisten_dequeue(struct socket *head, struct socket **ret, int flags)
  904 {
  905         struct socket *so;
  906         int error;
  907 
  908         SOLISTEN_LOCK_ASSERT(head);
  909 
  910         while (!(head->so_state & SS_NBIO) && TAILQ_EMPTY(&head->sol_comp) &&
  911             head->so_error == 0) {
  912                 error = msleep(&head->sol_comp, &head->so_lock, PSOCK | PCATCH,
  913                     "accept", 0);
  914                 if (error != 0) {
  915                         SOLISTEN_UNLOCK(head);
  916                         return (error);
  917                 }
  918         }
  919         if (head->so_error) {
  920                 error = head->so_error;
  921                 head->so_error = 0;
  922         } else if ((head->so_state & SS_NBIO) && TAILQ_EMPTY(&head->sol_comp))
  923                 error = EWOULDBLOCK;
  924         else
  925                 error = 0;
  926         if (error) {
  927                 SOLISTEN_UNLOCK(head);
  928                 return (error);
  929         }
  930         so = TAILQ_FIRST(&head->sol_comp);
  931         SOCK_LOCK(so);
  932         KASSERT(so->so_qstate == SQ_COMP,
  933             ("%s: so %p not SQ_COMP", __func__, so));
  934         soref(so);
  935         head->sol_qlen--;
  936         so->so_qstate = SQ_NONE;
  937         so->so_listen = NULL;
  938         TAILQ_REMOVE(&head->sol_comp, so, so_list);
  939         if (flags & ACCEPT4_INHERIT)
  940                 so->so_state |= (head->so_state & SS_NBIO);
  941         else
  942                 so->so_state |= (flags & SOCK_NONBLOCK) ? SS_NBIO : 0;
  943         SOCK_UNLOCK(so);
  944         sorele(head);
  945 
  946         *ret = so;
  947         return (0);
  948 }
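/*
 * Illustrative sketch, not part of the original file: the SS_NBIO path
 * above is what makes accept(2) on a non-blocking listener fail with
 * EWOULDBLOCK when sol_comp is empty, instead of sleeping in
 * msleep("accept").  try_accept() is a hypothetical helper name.
 */
#include <sys/socket.h>
#include <errno.h>
#include <fcntl.h>

static int
try_accept(int lfd)
{
	int fd, flags;

	/* Set SS_NBIO on the listener so an empty queue returns at once. */
	flags = fcntl(lfd, F_GETFL);
	if (flags == -1 || fcntl(lfd, F_SETFL, flags | O_NONBLOCK) == -1)
		return (-1);
	fd = accept(lfd, NULL, NULL);
	if (fd == -1 && errno == EWOULDBLOCK)
		return (0);		/* sol_comp was empty */
	return (fd);
}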
  949 
  950 /*
  951  * Evaluate the reference count and named references on a socket; if no
  952  * references remain, free it.  This should be called whenever a reference is
  953  * released, such as in sorele(), but also when named reference flags are
  954  * cleared in socket or protocol code.
  955  *
  956  * sofree() will free the socket if:
  957  *
  958  * - There are no outstanding file descriptor references or related consumers
  959  *   (so_count == 0).
  960  *
  961  * - The socket has been closed by user space, if ever open (SS_NOFDREF).
  962  *
  963  * - The protocol does not have an outstanding strong reference on the socket
  964  *   (SS_PROTOREF).
  965  *
  966  * - The socket is not in a completed connection queue, so a process has been
  967  *   notified that it is present.  If it is removed, the user process may
  968  *   block in accept() despite select() saying the socket was ready.
  969  */
  970 void
  971 sofree(struct socket *so)
  972 {
  973         struct protosw *pr = so->so_proto;
  974 
  975         SOCK_LOCK_ASSERT(so);
  976 
  977         if ((so->so_state & SS_NOFDREF) == 0 || so->so_count != 0 ||
  978             (so->so_state & SS_PROTOREF) || (so->so_qstate == SQ_COMP)) {
  979                 SOCK_UNLOCK(so);
  980                 return;
  981         }
  982 
  983         if (!SOLISTENING(so) && so->so_qstate == SQ_INCOMP) {
  984                 struct socket *sol;
  985 
  986                 sol = so->so_listen;
  987                 KASSERT(sol, ("%s: so %p on incomp of NULL", __func__, so));
  988 
  989                 /*
   990                  * To solve the race between the close of a listening socket
   991                  * and a socket on its incomplete queue, we need to lock both.
   992                  * The order is first the listening socket, then the regular one.
   993                  * Since we have neither a file descriptor reference nor
   994                  * SS_PROTOREF, this function and the listening socket are the
   995                  * only pointers to so.  To preserve so and sol, we reference
   996                  * both and then relock.
   997                  * After relocking, the socket may not move to so_comp since
   998                  * it no longer has a PCB, but it may be removed from
   999                  * so_incomp.  If that happens, we share responsibility for
  1000                  * freeing the socket, but soclose() has already removed it
  1001                  * from the queue.
 1002                  */
 1003                 soref(sol);
 1004                 soref(so);
 1005                 SOCK_UNLOCK(so);
 1006                 SOLISTEN_LOCK(sol);
 1007                 SOCK_LOCK(so);
 1008                 if (so->so_qstate == SQ_INCOMP) {
 1009                         KASSERT(so->so_listen == sol,
 1010                             ("%s: so %p migrated out of sol %p",
 1011                             __func__, so, sol));
 1012                         TAILQ_REMOVE(&sol->sol_incomp, so, so_list);
 1013                         sol->sol_incqlen--;
  1014                         /* This is guaranteed not to be the last. */
 1015                         refcount_release(&sol->so_count);
 1016                         so->so_qstate = SQ_NONE;
 1017                         so->so_listen = NULL;
 1018                 } else
 1019                         KASSERT(so->so_listen == NULL,
 1020                             ("%s: so %p not on (in)comp with so_listen",
 1021                             __func__, so));
 1022                 sorele(sol);
 1023                 KASSERT(so->so_count == 1,
 1024                     ("%s: so %p count %u", __func__, so, so->so_count));
 1025                 so->so_count = 0;
 1026         }
 1027         if (SOLISTENING(so))
 1028                 so->so_error = ECONNABORTED;
 1029         SOCK_UNLOCK(so);
 1030 
 1031         if (so->so_dtor != NULL)
 1032                 so->so_dtor(so);
 1033 
 1034         VNET_SO_ASSERT(so);
 1035         if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
 1036                 (*pr->pr_domain->dom_dispose)(so);
 1037         if (pr->pr_usrreqs->pru_detach != NULL)
 1038                 (*pr->pr_usrreqs->pru_detach)(so);
 1039 
 1040         /*
 1041          * From this point on, we assume that no other references to this
 1042          * socket exist anywhere else in the stack.  Therefore, no locks need
 1043          * to be acquired or held.
 1044          *
 1045          * We used to do a lot of socket buffer and socket locking here, as
  1046          * well as invoke sorflush() and perform wakeups.  The direct calls to
  1047          * dom_dispose() and sbrelease_internal() are an inlining of what was
  1048          * necessary from sorflush().
  1049          *
  1050          * Notice that the socket buffer and kqueue state are torn down
  1051          * before calling pru_detach.  This means that protocols should not
  1052          * assume they can perform socket wakeups, etc., in their detach code.
 1053          */
 1054         if (!SOLISTENING(so)) {
 1055                 sbdestroy(&so->so_snd, so);
 1056                 sbdestroy(&so->so_rcv, so);
 1057         }
 1058         seldrain(&so->so_rdsel);
 1059         seldrain(&so->so_wrsel);
 1060         knlist_destroy(&so->so_rdsel.si_note);
 1061         knlist_destroy(&so->so_wrsel.si_note);
 1062         sodealloc(so);
 1063 }
 1064 
 1065 /*
 1066  * Close a socket on last file table reference removal.  Initiate disconnect
 1067  * if connected.  Free socket when disconnect complete.
 1068  *
 1069  * This function will sorele() the socket.  Note that soclose() may be called
 1070  * prior to the ref count reaching zero.  The actual socket structure will
 1071  * not be freed until the ref count reaches zero.
 1072  */
 1073 int
 1074 soclose(struct socket *so)
 1075 {
 1076         struct accept_queue lqueue;
 1077         bool listening;
 1078         int error = 0;
 1079 
 1080         KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));
 1081 
 1082         CURVNET_SET(so->so_vnet);
 1083         funsetown(&so->so_sigio);
 1084         if (so->so_state & SS_ISCONNECTED) {
 1085                 if ((so->so_state & SS_ISDISCONNECTING) == 0) {
 1086                         error = sodisconnect(so);
 1087                         if (error) {
 1088                                 if (error == ENOTCONN)
 1089                                         error = 0;
 1090                                 goto drop;
 1091                         }
 1092                 }
 1093 
 1094                 if ((so->so_options & SO_LINGER) != 0 && so->so_linger != 0) {
 1095                         if ((so->so_state & SS_ISDISCONNECTING) &&
 1096                             (so->so_state & SS_NBIO))
 1097                                 goto drop;
 1098                         while (so->so_state & SS_ISCONNECTED) {
 1099                                 error = tsleep(&so->so_timeo,
 1100                                     PSOCK | PCATCH, "soclos",
 1101                                     so->so_linger * hz);
 1102                                 if (error)
 1103                                         break;
 1104                         }
 1105                 }
 1106         }
 1107 
 1108 drop:
 1109         if (so->so_proto->pr_usrreqs->pru_close != NULL)
 1110                 (*so->so_proto->pr_usrreqs->pru_close)(so);
 1111 
 1112         SOCK_LOCK(so);
 1113         if ((listening = (so->so_options & SO_ACCEPTCONN))) {
 1114                 struct socket *sp;
 1115 
 1116                 TAILQ_INIT(&lqueue);
 1117                 TAILQ_SWAP(&lqueue, &so->sol_incomp, socket, so_list);
 1118                 TAILQ_CONCAT(&lqueue, &so->sol_comp, so_list);
 1119 
 1120                 so->sol_qlen = so->sol_incqlen = 0;
 1121 
 1122                 TAILQ_FOREACH(sp, &lqueue, so_list) {
 1123                         SOCK_LOCK(sp);
 1124                         sp->so_qstate = SQ_NONE;
 1125                         sp->so_listen = NULL;
 1126                         SOCK_UNLOCK(sp);
 1127                         /* Guaranteed not to be the last. */
 1128                         refcount_release(&so->so_count);
 1129                 }
 1130         }
 1131         KASSERT((so->so_state & SS_NOFDREF) == 0, ("soclose: NOFDREF"));
 1132         so->so_state |= SS_NOFDREF;
 1133         sorele(so);
 1134         if (listening) {
 1135                 struct socket *sp, *tsp;
 1136 
 1137                 TAILQ_FOREACH_SAFE(sp, &lqueue, so_list, tsp) {
 1138                         SOCK_LOCK(sp);
 1139                         if (sp->so_count == 0) {
 1140                                 SOCK_UNLOCK(sp);
 1141                                 soabort(sp);
 1142                         } else
 1143                                 /* sp is now in sofree() */
 1144                                 SOCK_UNLOCK(sp);
 1145                 }
 1146         }
 1147         CURVNET_RESTORE();
 1148         return (error);
 1149 }
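/*
 * Illustrative sketch, not part of the original file: the SO_LINGER
 * handling in soclose() means a close(2) on a lingering, connected socket
 * may sleep in "soclos" for up to l_linger seconds while the disconnect
 * completes.  close_with_linger() is a hypothetical helper name.
 */
#include <sys/socket.h>
#include <unistd.h>

static int
close_with_linger(int fd, int seconds)
{
	struct linger l = { .l_onoff = 1, .l_linger = seconds };

	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == -1)
		return (-1);
	return (close(fd));	/* may block in soclose() up to 'seconds' */
}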
 1150 
 1151 /*
 1152  * soabort() is used to abruptly tear down a connection, such as when a
 1153  * resource limit is reached (listen queue depth exceeded), or if a listen
 1154  * socket is closed while there are sockets waiting to be accepted.
 1155  *
 1156  * This interface is tricky, because it is called on an unreferenced socket,
 1157  * and must be called only by a thread that has actually removed the socket
 1158  * from the listen queue it was on, or races with other threads are risked.
 1159  *
 1160  * This interface will call into the protocol code, so must not be called
 1161  * with any socket locks held.  Protocols do call it while holding their own
 1162  * recursible protocol mutexes, but this is something that should be subject
 1163  * to review in the future.
 1164  */
 1165 void
 1166 soabort(struct socket *so)
 1167 {
 1168 
 1169         /*
  1170          * To the extent possible, assert that no references to this
 1171          * socket are held.  This is not quite the same as asserting that the
 1172          * current thread is responsible for arranging for no references, but
 1173          * is as close as we can get for now.
 1174          */
 1175         KASSERT(so->so_count == 0, ("soabort: so_count"));
 1176         KASSERT((so->so_state & SS_PROTOREF) == 0, ("soabort: SS_PROTOREF"));
 1177         KASSERT(so->so_state & SS_NOFDREF, ("soabort: !SS_NOFDREF"));
 1178         VNET_SO_ASSERT(so);
 1179 
 1180         if (so->so_proto->pr_usrreqs->pru_abort != NULL)
 1181                 (*so->so_proto->pr_usrreqs->pru_abort)(so);
 1182         SOCK_LOCK(so);
 1183         sofree(so);
 1184 }
 1185 
 1186 int
 1187 soaccept(struct socket *so, struct sockaddr **nam)
 1188 {
 1189         int error;
 1190 
 1191         SOCK_LOCK(so);
 1192         KASSERT((so->so_state & SS_NOFDREF) != 0, ("soaccept: !NOFDREF"));
 1193         so->so_state &= ~SS_NOFDREF;
 1194         SOCK_UNLOCK(so);
 1195 
 1196         CURVNET_SET(so->so_vnet);
 1197         error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
 1198         CURVNET_RESTORE();
 1199         return (error);
 1200 }
 1201 
 1202 int
 1203 soconnect(struct socket *so, struct sockaddr *nam, struct thread *td)
 1204 {
 1205 
 1206         return (soconnectat(AT_FDCWD, so, nam, td));
 1207 }
 1208 
 1209 int
 1210 soconnectat(int fd, struct socket *so, struct sockaddr *nam, struct thread *td)
 1211 {
 1212         int error;
 1213 
 1214         if (so->so_options & SO_ACCEPTCONN)
 1215                 return (EOPNOTSUPP);
 1216 
 1217         CURVNET_SET(so->so_vnet);
 1218         /*
 1219          * If protocol is connection-based, can only connect once.
 1220          * Otherwise, if connected, try to disconnect first.  This allows
 1221          * user to disconnect by connecting to, e.g., a null address.
 1222          */
 1223         if (so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING) &&
 1224             ((so->so_proto->pr_flags & PR_CONNREQUIRED) ||
 1225             (error = sodisconnect(so)))) {
 1226                 error = EISCONN;
 1227         } else {
 1228                 /*
 1229                  * Prevent accumulated error from previous connection from
 1230                  * biting us.
 1231                  */
 1232                 so->so_error = 0;
 1233                 if (fd == AT_FDCWD) {
 1234                         error = (*so->so_proto->pr_usrreqs->pru_connect)(so,
 1235                             nam, td);
 1236                 } else {
 1237                         error = (*so->so_proto->pr_usrreqs->pru_connectat)(fd,
 1238                             so, nam, td);
 1239                 }
 1240         }
 1241         CURVNET_RESTORE();
 1242 
 1243         return (error);
 1244 }
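/*
 * Illustrative sketch, not part of the original file: as the comment in
 * soconnectat() notes, a connected datagram socket can be dissociated by
 * connecting to a null address.  The connect(2) itself may return an error
 * from pru_connect(), but by then sodisconnect() has already dropped the
 * old association.  dgram_dissolve() is a hypothetical helper name.
 */
#include <sys/socket.h>
#include <string.h>

static int
dgram_dissolve(int fd)
{
	struct sockaddr sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_family = AF_UNSPEC;
	return (connect(fd, &sa, sizeof(sa)));
}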
 1245 
 1246 int
 1247 soconnect2(struct socket *so1, struct socket *so2)
 1248 {
 1249         int error;
 1250 
 1251         CURVNET_SET(so1->so_vnet);
 1252         error = (*so1->so_proto->pr_usrreqs->pru_connect2)(so1, so2);
 1253         CURVNET_RESTORE();
 1254         return (error);
 1255 }
 1256 
 1257 int
 1258 sodisconnect(struct socket *so)
 1259 {
 1260         int error;
 1261 
 1262         if ((so->so_state & SS_ISCONNECTED) == 0)
 1263                 return (ENOTCONN);
 1264         if (so->so_state & SS_ISDISCONNECTING)
 1265                 return (EALREADY);
 1266         VNET_SO_ASSERT(so);
 1267         error = (*so->so_proto->pr_usrreqs->pru_disconnect)(so);
 1268         return (error);
 1269 }
 1270 
 1271 #define SBLOCKWAIT(f)   (((f) & MSG_DONTWAIT) ? 0 : SBL_WAIT)
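/*
 * Illustrative sketch, not part of the original file: SBLOCKWAIT() is what
 * turns a per-call MSG_DONTWAIT into a non-sleeping sblock() acquisition,
 * so a send on a contended or full buffer fails with EWOULDBLOCK instead
 * of sleeping.  send_nowait() is a hypothetical helper name.
 */
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t
send_nowait(int fd, const void *buf, size_t len)
{

	/* MSG_DONTWAIT makes SBLOCKWAIT() pass 0 rather than SBL_WAIT. */
	return (send(fd, buf, len, MSG_DONTWAIT));
}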
 1272 
 1273 int
 1274 sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
 1275     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
 1276 {
 1277         long space;
 1278         ssize_t resid;
 1279         int clen = 0, error, dontroute;
 1280 
 1281         KASSERT(so->so_type == SOCK_DGRAM, ("sosend_dgram: !SOCK_DGRAM"));
 1282         KASSERT(so->so_proto->pr_flags & PR_ATOMIC,
 1283             ("sosend_dgram: !PR_ATOMIC"));
 1284 
 1285         if (uio != NULL)
 1286                 resid = uio->uio_resid;
 1287         else
 1288                 resid = top->m_pkthdr.len;
 1289         /*
 1290          * In theory resid should be unsigned.  However, space must be
 1291          * signed, as it might be less than 0 if we over-committed, and we
 1292          * must use a signed comparison of space and resid.  On the other
 1293          * hand, a negative resid causes us to loop sending 0-length
 1294          * segments to the protocol.
 1295          */
 1296         if (resid < 0) {
 1297                 error = EINVAL;
 1298                 goto out;
 1299         }
 1300 
 1301         dontroute =
 1302             (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0;
 1303         if (td != NULL)
 1304                 td->td_ru.ru_msgsnd++;
 1305         if (control != NULL)
 1306                 clen = control->m_len;
 1307 
 1308         SOCKBUF_LOCK(&so->so_snd);
 1309         if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
 1310                 SOCKBUF_UNLOCK(&so->so_snd);
 1311                 error = EPIPE;
 1312                 goto out;
 1313         }
 1314         if (so->so_error) {
 1315                 error = so->so_error;
 1316                 so->so_error = 0;
 1317                 SOCKBUF_UNLOCK(&so->so_snd);
 1318                 goto out;
 1319         }
 1320         if ((so->so_state & SS_ISCONNECTED) == 0) {
 1321                 /*
  1322                  * `sendto' and `sendmsg' are allowed on a connection-based
 1323                  * socket if it supports implied connect.  Return ENOTCONN if
 1324                  * not connected and no address is supplied.
 1325                  */
 1326                 if ((so->so_proto->pr_flags & PR_CONNREQUIRED) &&
 1327                     (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) {
 1328                         if ((so->so_state & SS_ISCONFIRMING) == 0 &&
 1329                             !(resid == 0 && clen != 0)) {
 1330                                 SOCKBUF_UNLOCK(&so->so_snd);
 1331                                 error = ENOTCONN;
 1332                                 goto out;
 1333                         }
 1334                 } else if (addr == NULL) {
 1335                         if (so->so_proto->pr_flags & PR_CONNREQUIRED)
 1336                                 error = ENOTCONN;
 1337                         else
 1338                                 error = EDESTADDRREQ;
 1339                         SOCKBUF_UNLOCK(&so->so_snd);
 1340                         goto out;
 1341                 }
 1342         }
 1343 
 1344         /*
  1345          * Do we need MSG_OOB support in SOCK_DGRAM?  The signedness handling
  1346          * here may be a problem and need fixing.
 1347          */
 1348         space = sbspace(&so->so_snd);
 1349         if (flags & MSG_OOB)
 1350                 space += 1024;
 1351         space -= clen;
 1352         SOCKBUF_UNLOCK(&so->so_snd);
 1353         if (resid > space) {
 1354                 error = EMSGSIZE;
 1355                 goto out;
 1356         }
 1357         if (uio == NULL) {
 1358                 resid = 0;
 1359                 if (flags & MSG_EOR)
 1360                         top->m_flags |= M_EOR;
 1361         } else {
 1362                 /*
 1363          * Copy the data from userland into an mbuf chain.
 1364                  * If no data is to be copied in, a single empty mbuf
 1365                  * is returned.
 1366                  */
 1367                 top = m_uiotombuf(uio, M_WAITOK, space, max_hdr,
 1368                     (M_PKTHDR | ((flags & MSG_EOR) ? M_EOR : 0)));
 1369                 if (top == NULL) {
 1370                         error = EFAULT; /* only possible error */
 1371                         goto out;
 1372                 }
 1373                 space -= resid - uio->uio_resid;
 1374                 resid = uio->uio_resid;
 1375         }
 1376         KASSERT(resid == 0, ("sosend_dgram: resid != 0"));
 1377         /*
 1378          * XXXRW: Frobbing SO_DONTROUTE here is even worse without sblock
 1379          * than with.
 1380          */
 1381         if (dontroute) {
 1382                 SOCK_LOCK(so);
 1383                 so->so_options |= SO_DONTROUTE;
 1384                 SOCK_UNLOCK(so);
 1385         }
 1386         /*
 1387          * XXX all the SBS_CANTSENDMORE checks previously done could be out
 1388          * of date.  We could have received a reset packet in an interrupt or
 1389          * maybe we slept while doing page faults in uiomove() etc.  We could
 1390          * probably recheck again inside the locking protection here, but
 1391          * there are probably other places that this also happens.  We must
 1392          * rethink this.
 1393          */
 1394         VNET_SO_ASSERT(so);
 1395         error = (*so->so_proto->pr_usrreqs->pru_send)(so,
 1396             (flags & MSG_OOB) ? PRUS_OOB :
 1397         /*
 1398          * If the user set MSG_EOF, the protocol understands this flag, and
 1399          * nothing is left to send, then use PRU_SEND_EOF instead of PRU_SEND.
 1400          */
 1401             ((flags & MSG_EOF) &&
 1402              (so->so_proto->pr_flags & PR_IMPLOPCL) &&
 1403              (resid <= 0)) ?
 1404                 PRUS_EOF :
 1405                 /* If there is more to send, set PRUS_MORETOCOME. */
 1406                 (flags & MSG_MORETOCOME) ||
 1407                 (resid > 0 && space > 0) ? PRUS_MORETOCOME : 0,
 1408                 top, addr, control, td);
 1409         if (dontroute) {
 1410                 SOCK_LOCK(so);
 1411                 so->so_options &= ~SO_DONTROUTE;
 1412                 SOCK_UNLOCK(so);
 1413         }
 1414         clen = 0;
 1415         control = NULL;
 1416         top = NULL;
 1417 out:
 1418         if (top != NULL)
 1419                 m_freem(top);
 1420         if (control != NULL)
 1421                 m_freem(control);
 1422         return (error);
 1423 }
 1424 
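/*
 * Illustrative userspace sketch (not part of this file): how the
 * sosend_dgram() checks above look from a caller.  An unconnected send
 * with no destination fails with EDESTADDRREQ, and a datagram larger
 * than the free send-buffer space fails with EMSGSIZE rather than
 * blocking.  The address and sizes below are hypothetical.
 */
#if 0
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

static char big[128 * 1024];	/* larger than the default UDP sendspace */

static void
dgram_send_demo(void)
{
	struct sockaddr_in dst;
	int s;

	s = socket(AF_INET, SOCK_DGRAM, 0);

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);
	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

	/* No address on an unconnected datagram socket: EDESTADDRREQ. */
	if (send(s, "x", 1, 0) == -1 && errno == EDESTADDRREQ)
		printf("unconnected send requires an address\n");

	/* Oversized datagram: sosend_dgram() returns EMSGSIZE. */
	if (sendto(s, big, sizeof(big), 0,
	    (struct sockaddr *)&dst, sizeof(dst)) == -1 && errno == EMSGSIZE)
		printf("datagram exceeds send buffer space\n");
}
#endif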
 1425 /*
 1426  * Send on a socket.  If the send must go all at once and the message is
 1427  * larger than the send buffering, then return a hard error.  Lock against
 1428  * other senders.  If it must go all at once and there is not enough room
 1429  * now, then inform the user that this would block and do nothing.
 1430  * Otherwise, if nonblocking, send as much as possible.  The data to be sent
 1431  * is described by "uio" if nonzero, otherwise by the mbuf chain "top"
 1432  * (which must be null if uio is not); data in "top" must fit in one send.
 1433  *
 1434  * Returns nonzero on error, timeout or signal; callers must check for short
 1435  * counts if EINTR/ERESTART are returned.  Data and control buffers are freed
 1436  * on return.
 1437  */
 1438 int
 1439 sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
 1440     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
 1441 {
 1442         long space;
 1443         ssize_t resid;
 1444         int clen = 0, error, dontroute;
 1445         int atomic = sosendallatonce(so) || top;
 1446 
 1447         if (uio != NULL)
 1448                 resid = uio->uio_resid;
 1449         else
 1450                 resid = top->m_pkthdr.len;
 1451         /*
 1452          * In theory resid should be unsigned.  However, space must be
 1453          * signed, as it might be less than 0 if we over-committed, and we
 1454          * must use a signed comparison of space and resid.  On the other
 1455          * hand, a negative resid causes us to loop sending 0-length
 1456          * segments to the protocol.
 1457          *
 1458          * Also check to make sure that MSG_EOR isn't used on SOCK_STREAM
 1459          * type sockets since that's an error.
 1460          */
 1461         if (resid < 0 || (so->so_type == SOCK_STREAM && (flags & MSG_EOR))) {
 1462                 error = EINVAL;
 1463                 goto out;
 1464         }
 1465 
 1466         dontroute =
 1467             (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0 &&
 1468             (so->so_proto->pr_flags & PR_ATOMIC);
 1469         if (td != NULL)
 1470                 td->td_ru.ru_msgsnd++;
 1471         if (control != NULL)
 1472                 clen = control->m_len;
 1473 
 1474         error = sblock(&so->so_snd, SBLOCKWAIT(flags));
 1475         if (error)
 1476                 goto out;
 1477 
 1478 restart:
 1479         do {
 1480                 SOCKBUF_LOCK(&so->so_snd);
 1481                 if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
 1482                         SOCKBUF_UNLOCK(&so->so_snd);
 1483                         error = EPIPE;
 1484                         goto release;
 1485                 }
 1486                 if (so->so_error) {
 1487                         error = so->so_error;
 1488                         so->so_error = 0;
 1489                         SOCKBUF_UNLOCK(&so->so_snd);
 1490                         goto release;
 1491                 }
 1492                 if ((so->so_state & SS_ISCONNECTED) == 0) {
 1493                         /*
 1494                          * `sendto' and `sendmsg' are allowed on a connection-
 1495                          * based socket if it supports implied connect.
 1496                          * Return ENOTCONN if not connected and no address is
 1497                          * supplied.
 1498                          */
 1499                         if ((so->so_proto->pr_flags & PR_CONNREQUIRED) &&
 1500                             (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) {
 1501                                 if ((so->so_state & SS_ISCONFIRMING) == 0 &&
 1502                                     !(resid == 0 && clen != 0)) {
 1503                                         SOCKBUF_UNLOCK(&so->so_snd);
 1504                                         error = ENOTCONN;
 1505                                         goto release;
 1506                                 }
 1507                         } else if (addr == NULL) {
 1508                                 SOCKBUF_UNLOCK(&so->so_snd);
 1509                                 if (so->so_proto->pr_flags & PR_CONNREQUIRED)
 1510                                         error = ENOTCONN;
 1511                                 else
 1512                                         error = EDESTADDRREQ;
 1513                                 goto release;
 1514                         }
 1515                 }
 1516                 space = sbspace(&so->so_snd);
 1517                 if (flags & MSG_OOB)
 1518                         space += 1024;
 1519                 if ((atomic && resid > so->so_snd.sb_hiwat) ||
 1520                     clen > so->so_snd.sb_hiwat) {
 1521                         SOCKBUF_UNLOCK(&so->so_snd);
 1522                         error = EMSGSIZE;
 1523                         goto release;
 1524                 }
 1525                 if (space < resid + clen &&
 1526                     (atomic || space < so->so_snd.sb_lowat || space < clen)) {
 1527                         if ((so->so_state & SS_NBIO) ||
 1528                             (flags & (MSG_NBIO | MSG_DONTWAIT)) != 0) {
 1529                                 SOCKBUF_UNLOCK(&so->so_snd);
 1530                                 error = EWOULDBLOCK;
 1531                                 goto release;
 1532                         }
 1533                         error = sbwait(&so->so_snd);
 1534                         SOCKBUF_UNLOCK(&so->so_snd);
 1535                         if (error)
 1536                                 goto release;
 1537                         goto restart;
 1538                 }
 1539                 SOCKBUF_UNLOCK(&so->so_snd);
 1540                 space -= clen;
 1541                 do {
 1542                         if (uio == NULL) {
 1543                                 resid = 0;
 1544                                 if (flags & MSG_EOR)
 1545                                         top->m_flags |= M_EOR;
 1546                         } else {
 1547                                 /*
 1548                                  * Copy the data from userland into an mbuf
 1549                                  * chain.  If resid is 0, which can happen
 1550                                  * only if we have control to send, then
 1551                                  * a single empty mbuf is returned.  This
 1552                                  * is a workaround to keep protocol send
 1553                                  * methods from panicking.
 1554                                  */
 1555                                 top = m_uiotombuf(uio, M_WAITOK, space,
 1556                                     (atomic ? max_hdr : 0),
 1557                                     (atomic ? M_PKTHDR : 0) |
 1558                                     ((flags & MSG_EOR) ? M_EOR : 0));
 1559                                 if (top == NULL) {
 1560                                         error = EFAULT; /* only possible error */
 1561                                         goto release;
 1562                                 }
 1563                                 space -= resid - uio->uio_resid;
 1564                                 resid = uio->uio_resid;
 1565                         }
 1566                         if (dontroute) {
 1567                                 SOCK_LOCK(so);
 1568                                 so->so_options |= SO_DONTROUTE;
 1569                                 SOCK_UNLOCK(so);
 1570                         }
 1571                         /*
 1572                          * XXX all the SBS_CANTSENDMORE checks previously
 1573                          * done could be out of date.  We could have received
 1574                          * a reset packet in an interrupt or maybe we slept
 1575                          * while doing page faults in uiomove() etc.  We
 1576                          * could probably recheck again inside the locking
 1577                          * protection here, but there are probably other
 1578                          * places that this also happens.  We must rethink
 1579                          * this.
 1580                          */
 1581                         VNET_SO_ASSERT(so);
 1582                         error = (*so->so_proto->pr_usrreqs->pru_send)(so,
 1583                             (flags & MSG_OOB) ? PRUS_OOB :
 1584                         /*
 1585                          * If the user set MSG_EOF, the protocol understands
 1586                          * this flag, and nothing is left to send, then use
 1587                          * PRU_SEND_EOF instead of PRU_SEND.
 1588                          */
 1589                             ((flags & MSG_EOF) &&
 1590                              (so->so_proto->pr_flags & PR_IMPLOPCL) &&
 1591                              (resid <= 0)) ?
 1592                                 PRUS_EOF :
 1593                         /* If there is more to send, set PRUS_MORETOCOME. */
 1594                             (flags & MSG_MORETOCOME) ||
 1595                             (resid > 0 && space > 0) ? PRUS_MORETOCOME : 0,
 1596                             top, addr, control, td);
 1597                         if (dontroute) {
 1598                                 SOCK_LOCK(so);
 1599                                 so->so_options &= ~SO_DONTROUTE;
 1600                                 SOCK_UNLOCK(so);
 1601                         }
 1602                         clen = 0;
 1603                         control = NULL;
 1604                         top = NULL;
 1605                         if (error)
 1606                                 goto release;
 1607                 } while (resid && space > 0);
 1608         } while (resid);
 1609 
 1610 release:
 1611         sbunlock(&so->so_snd);
 1612 out:
 1613         if (top != NULL)
 1614                 m_freem(top);
 1615         if (control != NULL)
 1616                 m_freem(control);
 1617         return (error);
 1618 }
 1619 
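/*
 * Illustrative userspace sketch (not part of this file): the
 * caller-side contract described above.  On a socket marked
 * nonblocking, sosend_generic() returns EWOULDBLOCK when no room is
 * available, and EINTR/ERESTART can leave a short count, so a careful
 * caller tracks progress and retries.  The helper below is
 * hypothetical and assumes 's' already has O_NONBLOCK set.
 */
#if 0
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <poll.h>

static ssize_t
send_all_nonblock(int s, const char *buf, size_t len)
{
	size_t off = 0;
	ssize_t n;

	while (off < len) {
		n = send(s, buf + off, len - off, 0);
		if (n > 0) {
			off += (size_t)n;	/* short count: keep going */
			continue;
		}
		if (n == -1 && (errno == EWOULDBLOCK || errno == EINTR)) {
			struct pollfd pfd = { .fd = s, .events = POLLOUT };

			(void)poll(&pfd, 1, -1);	/* wait for buffer space */
			continue;
		}
		return (-1);			/* EPIPE, ENOTCONN, ... */
	}
	return ((ssize_t)off);
}
#endif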
 1620 int
 1621 sosend(struct socket *so, struct sockaddr *addr, struct uio *uio,
 1622     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
 1623 {
 1624         int error;
 1625 
 1626         CURVNET_SET(so->so_vnet);
 1627         if (!SOLISTENING(so))
 1628                 error = so->so_proto->pr_usrreqs->pru_sosend(so, addr, uio,
 1629                     top, control, flags, td);
 1630         else {
 1631                 m_freem(top);
 1632                 m_freem(control);
 1633                 error = ENOTCONN;
 1634         }
 1635         CURVNET_RESTORE();
 1636         return (error);
 1637 }
 1638 
 1639 /*
 1640  * The part of soreceive() that implements reading non-inline out-of-band
 1641  * data from a socket.  For more complete comments, see soreceive(), from
 1642  * which this code originated.
 1643  *
 1644  * Note that soreceive_rcvoob(), unlike the remainder of soreceive(), is
 1645  * unable to return an mbuf chain to the caller.
 1646  */
 1647 static int
 1648 soreceive_rcvoob(struct socket *so, struct uio *uio, int flags)
 1649 {
 1650         struct protosw *pr = so->so_proto;
 1651         struct mbuf *m;
 1652         int error;
 1653 
 1654         KASSERT(flags & MSG_OOB, ("soreceive_rcvoob: (flags & MSG_OOB) == 0"));
 1655         VNET_SO_ASSERT(so);
 1656 
 1657         m = m_get(M_WAITOK, MT_DATA);
 1658         error = (*pr->pr_usrreqs->pru_rcvoob)(so, m, flags & MSG_PEEK);
 1659         if (error)
 1660                 goto bad;
 1661         do {
 1662                 error = uiomove(mtod(m, void *),
 1663                     (int) min(uio->uio_resid, m->m_len), uio);
 1664                 m = m_free(m);
 1665         } while (uio->uio_resid && error == 0 && m);
 1666 bad:
 1667         if (m != NULL)
 1668                 m_freem(m);
 1669         return (error);
 1670 }
 1671 
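/*
 * Illustrative userspace sketch (not part of this file): the request
 * that reaches soreceive_rcvoob() above.  With SO_OOBINLINE disabled,
 * the TCP urgent byte is fetched out of band with MSG_OOB; when no
 * out-of-band data is pending, the protocol's pru_rcvoob method
 * typically fails with EINVAL.
 */
#if 0
#include <sys/socket.h>
#include <errno.h>

static int
fetch_oob_byte(int s, char *cp)
{
	/* MSG_PEEK | MSG_OOB would examine the byte without consuming it. */
	if (recv(s, cp, 1, MSG_OOB) == 1)
		return (0);
	return (errno);
}
#endif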
 1672 /*
 1673  * Following replacement or removal of the first mbuf on the first mbuf chain
 1674  * of a socket buffer, push necessary state changes back into the socket
 1675  * buffer so that other consumers see the values consistently.  'nextrecord'
 1676  * is the caller's locally stored copy of the original value of
 1677  * sb->sb_mb->m_nextpkt which must be restored when the lead mbuf changes.
 1678  * NOTE: 'nextrecord' may be NULL.
 1679  */
 1680 static __inline void
 1681 sockbuf_pushsync(struct sockbuf *sb, struct mbuf *nextrecord)
 1682 {
 1683 
 1684         SOCKBUF_LOCK_ASSERT(sb);
 1685         /*
 1686          * First, update for the new value of nextrecord.  If necessary, make
 1687          * it the first record.
 1688          */
 1689         if (sb->sb_mb != NULL)
 1690                 sb->sb_mb->m_nextpkt = nextrecord;
 1691         else
 1692                 sb->sb_mb = nextrecord;
 1693 
 1694         /*
 1695          * Now update any dependent socket buffer fields to reflect the new
 1696          * state.  This is an expanded inline of SB_EMPTY_FIXUP(), with the
 1697          * addition of a second clause that takes care of the case where
 1698          * sb_mb has been updated, but remains the last record.
 1699          */
 1700         if (sb->sb_mb == NULL) {
 1701                 sb->sb_mbtail = NULL;
 1702                 sb->sb_lastrecord = NULL;
 1703         } else if (sb->sb_mb->m_nextpkt == NULL)
 1704                 sb->sb_lastrecord = sb->sb_mb;
 1705 }
 1706 
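/*
 * Note (not in the original source): the record layout that
 * sockbuf_pushsync() keeps consistent.  Records are chained through
 * m_nextpkt, and the mbufs within one record through m_next:
 *
 *	sb_mb -> [MT_SONAME] -> [MT_CONTROL] -> [MT_DATA] -> ...   (m_next)
 *	             |
 *	             m_nextpkt
 *	             v
 *	          [record 2] -> ... -> sb_lastrecord
 *
 * When the lead mbuf of the first record is replaced or removed, the
 * cached 'nextrecord' must be pushed back into the sockbuf so that
 * sb_mb, sb_mbtail, and sb_lastrecord remain in sync.
 */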
 1707 /*
 1708  * Implement receive operations on a socket.  We depend on the way that
 1709  * records are added to the sockbuf by sbappend.  In particular, each record
 1710  * (mbufs linked through m_next) must begin with an address if the protocol
 1711  * so specifies, followed by an optional mbuf or mbufs containing ancillary
 1712  * data, and then zero or more mbufs of data.  In order to allow parallelism
 1713  * between network receive and copying to user space, as well as avoid
 1714  * sleeping with a mutex held, we release the socket buffer mutex during the
 1715  * user space copy.  Although the sockbuf is locked, new data may still be
 1716  * appended, and thus we must maintain consistency of the sockbuf during that
 1717  * time.
 1718  *
 1719  * The caller may receive the data as a single mbuf chain by supplying an
 1720  * mbuf **mp0 for use in returning the chain.  The uio is then used only for
 1721  * the count in uio_resid.
 1722  */
 1723 int
 1724 soreceive_generic(struct socket *so, struct sockaddr **psa, struct uio *uio,
 1725     struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 1726 {
 1727         struct mbuf *m, **mp;
 1728         int flags, error, offset;
 1729         ssize_t len;
 1730         struct protosw *pr = so->so_proto;
 1731         struct mbuf *nextrecord;
 1732         int moff, type = 0;
 1733         ssize_t orig_resid = uio->uio_resid;
 1734 
 1735         mp = mp0;
 1736         if (psa != NULL)
 1737                 *psa = NULL;
 1738         if (controlp != NULL)
 1739                 *controlp = NULL;
 1740         if (flagsp != NULL)
 1741                 flags = *flagsp &~ MSG_EOR;
 1742         else
 1743                 flags = 0;
 1744         if (flags & MSG_OOB)
 1745                 return (soreceive_rcvoob(so, uio, flags));
 1746         if (mp != NULL)
 1747                 *mp = NULL;
 1748         if ((pr->pr_flags & PR_WANTRCVD) && (so->so_state & SS_ISCONFIRMING)
 1749             && uio->uio_resid) {
 1750                 VNET_SO_ASSERT(so);
 1751                 (*pr->pr_usrreqs->pru_rcvd)(so, 0);
 1752         }
 1753 
 1754         error = sblock(&so->so_rcv, SBLOCKWAIT(flags));
 1755         if (error)
 1756                 return (error);
 1757 
 1758 restart:
 1759         SOCKBUF_LOCK(&so->so_rcv);
 1760         m = so->so_rcv.sb_mb;
 1761         /*
 1762          * If we have less data than requested, block awaiting more (subject
 1763          * to any timeout) if:
 1764          *   1. the current count is less than the low water mark, or
 1765          *   2. MSG_DONTWAIT is not set
 1766          */
 1767         if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
 1768             sbavail(&so->so_rcv) < uio->uio_resid) &&
 1769             sbavail(&so->so_rcv) < so->so_rcv.sb_lowat &&
 1770             m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
 1771                 KASSERT(m != NULL || !sbavail(&so->so_rcv),
 1772                     ("receive: m == %p sbavail == %u",
 1773                     m, sbavail(&so->so_rcv)));
 1774                 if (so->so_error || so->so_rerror) {
 1775                         if (m != NULL)
 1776                                 goto dontblock;
 1777                         if (so->so_error)
 1778                                 error = so->so_error;
 1779                         else
 1780                                 error = so->so_rerror;
 1781                         if ((flags & MSG_PEEK) == 0) {
 1782                                 if (so->so_error)
 1783                                         so->so_error = 0;
 1784                                 else
 1785                                         so->so_rerror = 0;
 1786                         }
 1787                         SOCKBUF_UNLOCK(&so->so_rcv);
 1788                         goto release;
 1789                 }
 1790                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1791                 if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 1792                         if (m == NULL) {
 1793                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1794                                 goto release;
 1795                         } else
 1796                                 goto dontblock;
 1797                 }
 1798                 for (; m != NULL; m = m->m_next)
 1799                         if (m->m_type == MT_OOBDATA || (m->m_flags & M_EOR)) {
 1800                                 m = so->so_rcv.sb_mb;
 1801                                 goto dontblock;
 1802                         }
 1803                 if ((so->so_state & (SS_ISCONNECTING | SS_ISCONNECTED |
 1804                     SS_ISDISCONNECTING | SS_ISDISCONNECTED)) == 0 &&
 1805                     (so->so_proto->pr_flags & PR_CONNREQUIRED) != 0) {
 1806                         SOCKBUF_UNLOCK(&so->so_rcv);
 1807                         error = ENOTCONN;
 1808                         goto release;
 1809                 }
 1810                 if (uio->uio_resid == 0) {
 1811                         SOCKBUF_UNLOCK(&so->so_rcv);
 1812                         goto release;
 1813                 }
 1814                 if ((so->so_state & SS_NBIO) ||
 1815                     (flags & (MSG_DONTWAIT|MSG_NBIO))) {
 1816                         SOCKBUF_UNLOCK(&so->so_rcv);
 1817                         error = EWOULDBLOCK;
 1818                         goto release;
 1819                 }
 1820                 SBLASTRECORDCHK(&so->so_rcv);
 1821                 SBLASTMBUFCHK(&so->so_rcv);
 1822                 error = sbwait(&so->so_rcv);
 1823                 SOCKBUF_UNLOCK(&so->so_rcv);
 1824                 if (error)
 1825                         goto release;
 1826                 goto restart;
 1827         }
 1828 dontblock:
 1829         /*
 1830          * From this point onward, we maintain 'nextrecord' as a cache of the
 1831          * pointer to the next record in the socket buffer.  We must keep the
 1832          * various socket buffer pointers and local stack versions of the
 1833          * pointers in sync, pushing out modifications before dropping the
 1834          * socket buffer mutex, and re-reading them when picking it up.
 1835          *
 1836          * Otherwise, we will race with the network stack appending new data
 1837          * or records onto the socket buffer by using inconsistent/stale
 1838          * versions of the field, possibly resulting in socket buffer
 1839          * corruption.
 1840          *
 1841          * By holding the high-level sblock(), we prevent simultaneous
 1842          * readers from pulling off the front of the socket buffer.
 1843          */
 1844         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1845         if (uio->uio_td)
 1846                 uio->uio_td->td_ru.ru_msgrcv++;
 1847         KASSERT(m == so->so_rcv.sb_mb, ("soreceive: m != so->so_rcv.sb_mb"));
 1848         SBLASTRECORDCHK(&so->so_rcv);
 1849         SBLASTMBUFCHK(&so->so_rcv);
 1850         nextrecord = m->m_nextpkt;
 1851         if (pr->pr_flags & PR_ADDR) {
 1852                 KASSERT(m->m_type == MT_SONAME,
 1853                     ("m->m_type == %d", m->m_type));
 1854                 orig_resid = 0;
 1855                 if (psa != NULL)
 1856                         *psa = sodupsockaddr(mtod(m, struct sockaddr *),
 1857                             M_NOWAIT);
 1858                 if (flags & MSG_PEEK) {
 1859                         m = m->m_next;
 1860                 } else {
 1861                         sbfree(&so->so_rcv, m);
 1862                         so->so_rcv.sb_mb = m_free(m);
 1863                         m = so->so_rcv.sb_mb;
 1864                         sockbuf_pushsync(&so->so_rcv, nextrecord);
 1865                 }
 1866         }
 1867 
 1868         /*
 1869          * Process one or more MT_CONTROL mbufs present before any data mbufs
 1870          * in the first mbuf chain on the socket buffer.  If MSG_PEEK, we
 1871          * just copy the data; if !MSG_PEEK, we call into the protocol to
 1872          * perform externalization (or freeing if controlp == NULL).
 1873          */
 1874         if (m != NULL && m->m_type == MT_CONTROL) {
 1875                 struct mbuf *cm = NULL, *cmn;
 1876                 struct mbuf **cme = &cm;
 1877 
 1878                 do {
 1879                         if (flags & MSG_PEEK) {
 1880                                 if (controlp != NULL) {
 1881                                         *controlp = m_copym(m, 0, m->m_len,
 1882                                             M_NOWAIT);
 1883                                         controlp = &(*controlp)->m_next;
 1884                                 }
 1885                                 m = m->m_next;
 1886                         } else {
 1887                                 sbfree(&so->so_rcv, m);
 1888                                 so->so_rcv.sb_mb = m->m_next;
 1889                                 m->m_next = NULL;
 1890                                 *cme = m;
 1891                                 cme = &(*cme)->m_next;
 1892                                 m = so->so_rcv.sb_mb;
 1893                         }
 1894                 } while (m != NULL && m->m_type == MT_CONTROL);
 1895                 if ((flags & MSG_PEEK) == 0)
 1896                         sockbuf_pushsync(&so->so_rcv, nextrecord);
 1897                 while (cm != NULL) {
 1898                         cmn = cm->m_next;
 1899                         cm->m_next = NULL;
 1900                         if (pr->pr_domain->dom_externalize != NULL) {
 1901                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1902                                 VNET_SO_ASSERT(so);
 1903                                 error = (*pr->pr_domain->dom_externalize)
 1904                                     (cm, controlp, flags);
 1905                                 SOCKBUF_LOCK(&so->so_rcv);
 1906                         } else if (controlp != NULL)
 1907                                 *controlp = cm;
 1908                         else
 1909                                 m_freem(cm);
 1910                         if (controlp != NULL) {
 1911                                 orig_resid = 0;
 1912                                 while (*controlp != NULL)
 1913                                         controlp = &(*controlp)->m_next;
 1914                         }
 1915                         cm = cmn;
 1916                 }
 1917                 if (m != NULL)
 1918                         nextrecord = so->so_rcv.sb_mb->m_nextpkt;
 1919                 else
 1920                         nextrecord = so->so_rcv.sb_mb;
 1921                 orig_resid = 0;
 1922         }
 1923         if (m != NULL) {
 1924                 if ((flags & MSG_PEEK) == 0) {
 1925                         KASSERT(m->m_nextpkt == nextrecord,
 1926                             ("soreceive: post-control, nextrecord !sync"));
 1927                         if (nextrecord == NULL) {
 1928                                 KASSERT(so->so_rcv.sb_mb == m,
 1929                                     ("soreceive: post-control, sb_mb!=m"));
 1930                                 KASSERT(so->so_rcv.sb_lastrecord == m,
 1931                                     ("soreceive: post-control, lastrecord!=m"));
 1932                         }
 1933                 }
 1934                 type = m->m_type;
 1935                 if (type == MT_OOBDATA)
 1936                         flags |= MSG_OOB;
 1937         } else {
 1938                 if ((flags & MSG_PEEK) == 0) {
 1939                         KASSERT(so->so_rcv.sb_mb == nextrecord,
 1940                             ("soreceive: sb_mb != nextrecord"));
 1941                         if (so->so_rcv.sb_mb == NULL) {
 1942                                 KASSERT(so->so_rcv.sb_lastrecord == NULL,
 1943                                     ("soreceive: sb_lastrecord != NULL"));
 1944                         }
 1945                 }
 1946         }
 1947         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1948         SBLASTRECORDCHK(&so->so_rcv);
 1949         SBLASTMBUFCHK(&so->so_rcv);
 1950 
 1951         /*
 1952          * Now continue to read any data mbufs off of the head of the socket
 1953          * buffer until the read request is satisfied.  Note that 'type' is
 1954          * used to store the type of any mbufs read so far, so that
 1955          * soreceive() can stop reading if the type changes; this causes
 1956          * soreceive() to return only one of regular data and inline
 1957          * out-of-band data in a single socket receive operation.
 1958          */
 1959         moff = 0;
 1960         offset = 0;
 1961         while (m != NULL && !(m->m_flags & M_NOTAVAIL) && uio->uio_resid > 0
 1962             && error == 0) {
 1963                 /*
 1964                  * If the type of mbuf has changed since the last mbuf
 1965                  * examined ('type'), end the receive operation.
 1966                  */
 1967                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1968                 if (m->m_type == MT_OOBDATA || m->m_type == MT_CONTROL) {
 1969                         if (type != m->m_type)
 1970                                 break;
 1971                 } else if (type == MT_OOBDATA)
 1972                         break;
 1973                 else
 1974                     KASSERT(m->m_type == MT_DATA,
 1975                         ("m->m_type == %d", m->m_type));
 1976                 so->so_rcv.sb_state &= ~SBS_RCVATMARK;
 1977                 len = uio->uio_resid;
 1978                 if (so->so_oobmark && len > so->so_oobmark - offset)
 1979                         len = so->so_oobmark - offset;
 1980                 if (len > m->m_len - moff)
 1981                         len = m->m_len - moff;
 1982                 /*
 1983                  * If mp is set, just pass back the mbufs.  Otherwise copy
 1984                  * them out via the uio, then free.  The sockbuf must be
 1985                  * consistent here (sb_mb points to the current mbuf, whose
 1986                  * m_nextpkt points to the next record) when we drop the lock;
 1987                  * we must note any additions to the sockbuf when we relock.
 1988                  */
 1989                 if (mp == NULL) {
 1990                         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1991                         SBLASTRECORDCHK(&so->so_rcv);
 1992                         SBLASTMBUFCHK(&so->so_rcv);
 1993                         SOCKBUF_UNLOCK(&so->so_rcv);
 1994                         error = uiomove(mtod(m, char *) + moff, (int)len, uio);
 1995                         SOCKBUF_LOCK(&so->so_rcv);
 1996                         if (error) {
 1997                                 /*
 1998                                  * The MT_SONAME mbuf has already been removed
 1999                                  * from the record, so it is necessary to
 2000                                  * remove the data mbufs, if any, to preserve
 2001                                  * the invariant in the case of PR_ADDR that
 2002                                  * requires MT_SONAME mbufs at the head of
 2003                                  * each record.
 2004                                  */
 2005                                 if (pr->pr_flags & PR_ATOMIC &&
 2006                                     ((flags & MSG_PEEK) == 0))
 2007                                         (void)sbdroprecord_locked(&so->so_rcv);
 2008                                 SOCKBUF_UNLOCK(&so->so_rcv);
 2009                                 goto release;
 2010                         }
 2011                 } else
 2012                         uio->uio_resid -= len;
 2013                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2014                 if (len == m->m_len - moff) {
 2015                         if (m->m_flags & M_EOR)
 2016                                 flags |= MSG_EOR;
 2017                         if (flags & MSG_PEEK) {
 2018                                 m = m->m_next;
 2019                                 moff = 0;
 2020                         } else {
 2021                                 nextrecord = m->m_nextpkt;
 2022                                 sbfree(&so->so_rcv, m);
 2023                                 if (mp != NULL) {
 2024                                         m->m_nextpkt = NULL;
 2025                                         *mp = m;
 2026                                         mp = &m->m_next;
 2027                                         so->so_rcv.sb_mb = m = m->m_next;
 2028                                         *mp = NULL;
 2029                                 } else {
 2030                                         so->so_rcv.sb_mb = m_free(m);
 2031                                         m = so->so_rcv.sb_mb;
 2032                                 }
 2033                                 sockbuf_pushsync(&so->so_rcv, nextrecord);
 2034                                 SBLASTRECORDCHK(&so->so_rcv);
 2035                                 SBLASTMBUFCHK(&so->so_rcv);
 2036                         }
 2037                 } else {
 2038                         if (flags & MSG_PEEK)
 2039                                 moff += len;
 2040                         else {
 2041                                 if (mp != NULL) {
 2042                                         if (flags & MSG_DONTWAIT) {
 2043                                                 *mp = m_copym(m, 0, len,
 2044                                                     M_NOWAIT);
 2045                                                 if (*mp == NULL) {
 2046                                                         /*
 2047                                                          * m_copym() couldn't
 2048                                                          * allocate an mbuf.
 2049                                                          * Adjust uio_resid back
 2050                                                          * (it was adjusted
 2051                                                          * down by len bytes,
 2052                                                          * which we didn't end
 2053                                                          * up "copying" over).
 2054                                                          */
 2055                                                         uio->uio_resid += len;
 2056                                                         break;
 2057                                                 }
 2058                                         } else {
 2059                                                 SOCKBUF_UNLOCK(&so->so_rcv);
 2060                                                 *mp = m_copym(m, 0, len,
 2061                                                     M_WAITOK);
 2062                                                 SOCKBUF_LOCK(&so->so_rcv);
 2063                                         }
 2064                                 }
 2065                                 sbcut_locked(&so->so_rcv, len);
 2066                         }
 2067                 }
 2068                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2069                 if (so->so_oobmark) {
 2070                         if ((flags & MSG_PEEK) == 0) {
 2071                                 so->so_oobmark -= len;
 2072                                 if (so->so_oobmark == 0) {
 2073                                         so->so_rcv.sb_state |= SBS_RCVATMARK;
 2074                                         break;
 2075                                 }
 2076                         } else {
 2077                                 offset += len;
 2078                                 if (offset == so->so_oobmark)
 2079                                         break;
 2080                         }
 2081                 }
 2082                 if (flags & MSG_EOR)
 2083                         break;
 2084                 /*
 2085                  * If the MSG_WAITALL flag is set (for a non-atomic socket),
 2086                  * we must not quit until "uio->uio_resid == 0" or an error
 2087                  * terminates the receive.  If a signal/timeout occurs, return
 2088                  * with a short count but without error.  Keep the sockbuf
 2089                  * locked against other readers.
 2090                  */
 2091                 while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 &&
 2092                     !sosendallatonce(so) && nextrecord == NULL) {
 2093                         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2094                         if (so->so_error || so->so_rerror ||
 2095                             so->so_rcv.sb_state & SBS_CANTRCVMORE)
 2096                                 break;
 2097                         /*
 2098                          * Notify the protocol that some data has been
 2099                          * drained before blocking.
 2100                          */
 2101                         if (pr->pr_flags & PR_WANTRCVD) {
 2102                                 SOCKBUF_UNLOCK(&so->so_rcv);
 2103                                 VNET_SO_ASSERT(so);
 2104                                 (*pr->pr_usrreqs->pru_rcvd)(so, flags);
 2105                                 SOCKBUF_LOCK(&so->so_rcv);
 2106                         }
 2107                         SBLASTRECORDCHK(&so->so_rcv);
 2108                         SBLASTMBUFCHK(&so->so_rcv);
 2109                         /*
 2110                          * We could have received some data while we were
 2111                          * notifying the protocol.  Skip blocking in this case.
 2112                          */
 2113                         if (so->so_rcv.sb_mb == NULL) {
 2114                                 error = sbwait(&so->so_rcv);
 2115                                 if (error) {
 2116                                         SOCKBUF_UNLOCK(&so->so_rcv);
 2117                                         goto release;
 2118                                 }
 2119                         }
 2120                         m = so->so_rcv.sb_mb;
 2121                         if (m != NULL)
 2122                                 nextrecord = m->m_nextpkt;
 2123                 }
 2124         }
 2125 
 2126         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2127         if (m != NULL && pr->pr_flags & PR_ATOMIC) {
 2128                 flags |= MSG_TRUNC;
 2129                 if ((flags & MSG_PEEK) == 0)
 2130                         (void) sbdroprecord_locked(&so->so_rcv);
 2131         }
 2132         if ((flags & MSG_PEEK) == 0) {
 2133                 if (m == NULL) {
 2134                         /*
 2135                          * The first part is an inline SB_EMPTY_FIXUP().  The
 2136                          * second part makes sure sb_lastrecord is up-to-date
 2137                          * if there is still data in the socket buffer.
 2138                          */
 2139                         so->so_rcv.sb_mb = nextrecord;
 2140                         if (so->so_rcv.sb_mb == NULL) {
 2141                                 so->so_rcv.sb_mbtail = NULL;
 2142                                 so->so_rcv.sb_lastrecord = NULL;
 2143                         } else if (nextrecord->m_nextpkt == NULL)
 2144                                 so->so_rcv.sb_lastrecord = nextrecord;
 2145                 }
 2146                 SBLASTRECORDCHK(&so->so_rcv);
 2147                 SBLASTMBUFCHK(&so->so_rcv);
 2148                 /*
 2149                  * If soreceive() is being done from the socket callback,
 2150                  * then we need not generate an ACK to the peer to update the
 2151                  * window, since the ACK will be generated on return to TCP.
 2152                  */
 2153                 if (!(flags & MSG_SOCALLBCK) &&
 2154                     (pr->pr_flags & PR_WANTRCVD)) {
 2155                         SOCKBUF_UNLOCK(&so->so_rcv);
 2156                         VNET_SO_ASSERT(so);
 2157                         (*pr->pr_usrreqs->pru_rcvd)(so, flags);
 2158                         SOCKBUF_LOCK(&so->so_rcv);
 2159                 }
 2160         }
 2161         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2162         if (orig_resid == uio->uio_resid && orig_resid &&
 2163             (flags & MSG_EOR) == 0 && (so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0) {
 2164                 SOCKBUF_UNLOCK(&so->so_rcv);
 2165                 goto restart;
 2166         }
 2167         SOCKBUF_UNLOCK(&so->so_rcv);
 2168 
 2169         if (flagsp != NULL)
 2170                 *flagsp |= flags;
 2171 release:
 2172         sbunlock(&so->so_rcv);
 2173         return (error);
 2174 }
 2175 
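/*
 * Illustrative userspace sketch (not part of this file): consuming one
 * record laid out as described above -- address, then control, then
 * data -- with recvmsg(2).  soreceive_generic() externalizes the
 * MT_CONTROL mbufs into msg_control and copies the MT_SONAME mbuf into
 * msg_name.  The helper and its buffer sizes are hypothetical.
 */
#if 0
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

static ssize_t
recv_with_control(int s, struct sockaddr_storage *ss, void *data,
    size_t datalen, void *ctl, size_t ctllen, int *flagsp)
{
	struct iovec iov = { .iov_base = data, .iov_len = datalen };
	struct msghdr msg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_name = ss;
	msg.msg_namelen = sizeof(*ss);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl;
	msg.msg_controllen = ctllen;

	n = recvmsg(s, &msg, 0);
	if (n >= 0)
		*flagsp = msg.msg_flags;	/* MSG_EOR, MSG_TRUNC, ... */
	return (n);
}
#endif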
 2176 /*
 2177  * Optimized version of soreceive() for stream (TCP) sockets.
 2178  */
 2179 int
 2180 soreceive_stream(struct socket *so, struct sockaddr **psa, struct uio *uio,
 2181     struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 2182 {
 2183         int len = 0, error = 0, flags, oresid;
 2184         struct sockbuf *sb;
 2185         struct mbuf *m, *n = NULL;
 2186 
 2187         /* We only do stream sockets. */
 2188         if (so->so_type != SOCK_STREAM)
 2189                 return (EINVAL);
 2190         if (psa != NULL)
 2191                 *psa = NULL;
 2192         if (flagsp != NULL)
 2193                 flags = *flagsp &~ MSG_EOR;
 2194         else
 2195                 flags = 0;
 2196         if (controlp != NULL)
 2197                 *controlp = NULL;
 2198         if (flags & MSG_OOB)
 2199                 return (soreceive_rcvoob(so, uio, flags));
 2200         if (mp0 != NULL)
 2201                 *mp0 = NULL;
 2202 
 2203         sb = &so->so_rcv;
 2204 
 2205         /* Prevent other readers from entering the socket. */
 2206         error = sblock(sb, SBLOCKWAIT(flags));
 2207         if (error)
 2208                 return (error);
 2209         SOCKBUF_LOCK(sb);
 2210 
 2211         /* Easy one, no space to copyout anything. */
 2212         if (uio->uio_resid == 0) {
 2213                 error = EINVAL;
 2214                 goto out;
 2215         }
 2216         oresid = uio->uio_resid;
 2217 
 2218         /* We will never ever get anything unless we are or were connected. */
 2219         if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
 2220                 error = ENOTCONN;
 2221                 goto out;
 2222         }
 2223 
 2224 restart:
 2225         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2226 
 2227         /* Abort if socket has reported problems. */
 2228         if (so->so_error) {
 2229                 if (sbavail(sb) > 0)
 2230                         goto deliver;
 2231                 if (oresid > uio->uio_resid)
 2232                         goto out;
 2233                 error = so->so_error;
 2234                 if (!(flags & MSG_PEEK))
 2235                         so->so_error = 0;
 2236                 goto out;
 2237         }
 2238 
 2239         /* Door is closed.  Deliver what is left, if any. */
 2240         if (sb->sb_state & SBS_CANTRCVMORE) {
 2241                 if (sbavail(sb) > 0)
 2242                         goto deliver;
 2243                 else
 2244                         goto out;
 2245         }
 2246 
 2247         /* Socket buffer is empty and we shall not block. */
 2248         if (sbavail(sb) == 0 &&
 2249             ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
 2250                 error = EAGAIN;
 2251                 goto out;
 2252         }
 2253 
 2254         /* Socket buffer got some data that we shall deliver now. */
 2255         if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) &&
 2256             ((so->so_state & SS_NBIO) ||
 2257              (flags & (MSG_DONTWAIT|MSG_NBIO)) ||
 2258              sbavail(sb) >= sb->sb_lowat ||
 2259              sbavail(sb) >= uio->uio_resid ||
 2260              sbavail(sb) >= sb->sb_hiwat)) {
 2261                 goto deliver;
 2262         }
 2263 
 2264         /* On MSG_WAITALL we must wait until all data or error arrives. */
 2265         if ((flags & MSG_WAITALL) &&
 2266             (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat))
 2267                 goto deliver;
 2268 
 2269         /*
 2270          * Wait and block until (more) data comes in.
 2271          * NB: Drops the sockbuf lock during wait.
 2272          */
 2273         error = sbwait(sb);
 2274         if (error)
 2275                 goto out;
 2276         goto restart;
 2277 
 2278 deliver:
 2279         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2280         KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__));
 2281         KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__));
 2282 
 2283         /* Statistics. */
 2284         if (uio->uio_td)
 2285                 uio->uio_td->td_ru.ru_msgrcv++;
 2286 
 2287         /* Fill uio until full or current end of socket buffer is reached. */
 2288         len = min(uio->uio_resid, sbavail(sb));
 2289         if (mp0 != NULL) {
 2290                 /* Dequeue as many mbufs as possible. */
 2291                 if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
 2292                         if (*mp0 == NULL)
 2293                                 *mp0 = sb->sb_mb;
 2294                         else
 2295                                 m_cat(*mp0, sb->sb_mb);
 2296                         for (m = sb->sb_mb;
 2297                              m != NULL && m->m_len <= len;
 2298                              m = m->m_next) {
 2299                                 KASSERT(!(m->m_flags & M_NOTAVAIL),
 2300                                     ("%s: m %p not available", __func__, m));
 2301                                 len -= m->m_len;
 2302                                 uio->uio_resid -= m->m_len;
 2303                                 sbfree(sb, m);
 2304                                 n = m;
 2305                         }
 2306                         n->m_next = NULL;
 2307                         sb->sb_mb = m;
 2308                         sb->sb_lastrecord = sb->sb_mb;
 2309                         if (sb->sb_mb == NULL)
 2310                                 SB_EMPTY_FIXUP(sb);
 2311                 }
 2312                 /* Copy the remainder. */
 2313                 if (len > 0) {
 2314                         KASSERT(sb->sb_mb != NULL,
 2315                             ("%s: len > 0 && sb->sb_mb empty", __func__));
 2316 
 2317                         m = m_copym(sb->sb_mb, 0, len, M_NOWAIT);
 2318                         if (m == NULL)
 2319                                 len = 0;        /* Don't flush data from sockbuf. */
 2320                         else
 2321                                 uio->uio_resid -= len;
 2322                         if (*mp0 != NULL)
 2323                                 m_cat(*mp0, m);
 2324                         else
 2325                                 *mp0 = m;
 2326                         if (*mp0 == NULL) {
 2327                                 error = ENOBUFS;
 2328                                 goto out;
 2329                         }
 2330                 }
 2331         } else {
 2332                 /* NB: Must unlock socket buffer as uiomove may sleep. */
 2333                 SOCKBUF_UNLOCK(sb);
 2334                 error = m_mbuftouio(uio, sb->sb_mb, len);
 2335                 SOCKBUF_LOCK(sb);
 2336                 if (error)
 2337                         goto out;
 2338         }
 2339         SBLASTRECORDCHK(sb);
 2340         SBLASTMBUFCHK(sb);
 2341 
 2342         /*
 2343          * Remove the delivered data from the socket buffer unless we
 2344          * were only peeking.
 2345          */
 2346         if (!(flags & MSG_PEEK)) {
 2347                 if (len > 0)
 2348                         sbdrop_locked(sb, len);
 2349 
 2350                 /* Notify protocol that we drained some data. */
 2351                 if ((so->so_proto->pr_flags & PR_WANTRCVD) &&
 2352                     (((flags & MSG_WAITALL) && uio->uio_resid > 0) ||
 2353                      !(flags & MSG_SOCALLBCK))) {
 2354                         SOCKBUF_UNLOCK(sb);
 2355                         VNET_SO_ASSERT(so);
 2356                         (*so->so_proto->pr_usrreqs->pru_rcvd)(so, flags);
 2357                         SOCKBUF_LOCK(sb);
 2358                 }
 2359         }
 2360 
 2361         /*
 2362          * For MSG_WAITALL we may have to loop again and wait for
 2363          * more data to come in.
 2364          */
 2365         if ((flags & MSG_WAITALL) && uio->uio_resid > 0)
 2366                 goto restart;
 2367 out:
 2368         SOCKBUF_LOCK_ASSERT(sb);
 2369         SBLASTRECORDCHK(sb);
 2370         SBLASTMBUFCHK(sb);
 2371         SOCKBUF_UNLOCK(sb);
 2372         sbunlock(sb);
 2373         return (error);
 2374 }
 2375 
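/*
 * Illustrative userspace sketch (not part of this file): the
 * MSG_WAITALL behavior implemented by the 'restart' loop above.  A
 * caller asking for an exact byte count either gets it all, or a short
 * count when the connection closes or an error/signal interrupts the
 * wait.  The descriptor 's' is assumed to be a connected TCP socket.
 */
#if 0
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t
read_exact(int s, void *buf, size_t len)
{
	/*
	 * soreceive_stream() keeps waiting in sbwait() until the full
	 * request is satisfied, so a short return here means EOF, an
	 * error, or an interrupting signal -- not "try again".
	 */
	return (recv(s, buf, len, MSG_WAITALL));
}
#endif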
 2376 /*
 2377  * Optimized version of soreceive() for simple datagram cases from userspace.
 2378  * Unlike in the stream case, we're able to drop a datagram if copyout()
 2379  * fails, and because we handle datagrams atomically, we don't need to use a
 2380  * sleep lock to prevent I/O interlacing.
 2381  */
 2382 int
 2383 soreceive_dgram(struct socket *so, struct sockaddr **psa, struct uio *uio,
 2384     struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 2385 {
 2386         struct mbuf *m, *m2;
 2387         int flags, error;
 2388         ssize_t len;
 2389         struct protosw *pr = so->so_proto;
 2390         struct mbuf *nextrecord;
 2391 
 2392         if (psa != NULL)
 2393                 *psa = NULL;
 2394         if (controlp != NULL)
 2395                 *controlp = NULL;
 2396         if (flagsp != NULL)
 2397                 flags = *flagsp &~ MSG_EOR;
 2398         else
 2399                 flags = 0;
 2400 
 2401         /*
 2402          * For any complicated cases, fall back to the full
 2403          * soreceive_generic().
 2404          */
 2405         if (mp0 != NULL || (flags & MSG_PEEK) || (flags & MSG_OOB))
 2406                 return (soreceive_generic(so, psa, uio, mp0, controlp,
 2407                     flagsp));
 2408 
 2409         /*
 2410          * Enforce restrictions on use.
 2411          */
 2412         KASSERT((pr->pr_flags & PR_WANTRCVD) == 0,
 2413             ("soreceive_dgram: wantrcvd"));
 2414         KASSERT(pr->pr_flags & PR_ATOMIC, ("soreceive_dgram: !atomic"));
 2415         KASSERT((so->so_rcv.sb_state & SBS_RCVATMARK) == 0,
 2416             ("soreceive_dgram: SBS_RCVATMARK"));
 2417         KASSERT((so->so_proto->pr_flags & PR_CONNREQUIRED) == 0,
 2418             ("soreceive_dgram: PR_CONNREQUIRED"));
 2419 
 2420         /*
 2421          * Loop blocking while waiting for a datagram.
 2422          */
 2423         SOCKBUF_LOCK(&so->so_rcv);
 2424         while ((m = so->so_rcv.sb_mb) == NULL) {
 2425                 KASSERT(sbavail(&so->so_rcv) == 0,
 2426                     ("soreceive_dgram: sb_mb NULL but sbavail %u",
 2427                     sbavail(&so->so_rcv)));
 2428                 if (so->so_error) {
 2429                         error = so->so_error;
 2430                         so->so_error = 0;
 2431                         SOCKBUF_UNLOCK(&so->so_rcv);
 2432                         return (error);
 2433                 }
 2434                 if (so->so_rcv.sb_state & SBS_CANTRCVMORE ||
 2435                     uio->uio_resid == 0) {
 2436                         SOCKBUF_UNLOCK(&so->so_rcv);
 2437                         return (0);
 2438                 }
 2439                 if ((so->so_state & SS_NBIO) ||
 2440                     (flags & (MSG_DONTWAIT|MSG_NBIO))) {
 2441                         SOCKBUF_UNLOCK(&so->so_rcv);
 2442                         return (EWOULDBLOCK);
 2443                 }
 2444                 SBLASTRECORDCHK(&so->so_rcv);
 2445                 SBLASTMBUFCHK(&so->so_rcv);
 2446                 error = sbwait(&so->so_rcv);
 2447                 if (error) {
 2448                         SOCKBUF_UNLOCK(&so->so_rcv);
 2449                         return (error);
 2450                 }
 2451         }
 2452         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2453 
 2454         if (uio->uio_td)
 2455                 uio->uio_td->td_ru.ru_msgrcv++;
 2456         SBLASTRECORDCHK(&so->so_rcv);
 2457         SBLASTMBUFCHK(&so->so_rcv);
 2458         nextrecord = m->m_nextpkt;
 2459         if (nextrecord == NULL) {
 2460                 KASSERT(so->so_rcv.sb_lastrecord == m,
 2461                     ("soreceive_dgram: lastrecord != m"));
 2462         }
 2463 
 2464         KASSERT(so->so_rcv.sb_mb->m_nextpkt == nextrecord,
 2465             ("soreceive_dgram: m_nextpkt != nextrecord"));
 2466 
 2467         /*
 2468          * Pull 'm' and its chain off the front of the packet queue.
 2469          */
 2470         so->so_rcv.sb_mb = NULL;
 2471         sockbuf_pushsync(&so->so_rcv, nextrecord);
 2472 
 2473         /*
 2474          * Walk 'm's chain, subtracting its bytes from the sockbuf accounting.
 2475          */
 2476         for (m2 = m; m2 != NULL; m2 = m2->m_next)
 2477                 sbfree(&so->so_rcv, m2);
 2478 
 2479         /*
 2480          * Do a few last checks before we let go of the lock.
 2481          */
 2482         SBLASTRECORDCHK(&so->so_rcv);
 2483         SBLASTMBUFCHK(&so->so_rcv);
 2484         SOCKBUF_UNLOCK(&so->so_rcv);
 2485 
 2486         if (pr->pr_flags & PR_ADDR) {
 2487                 KASSERT(m->m_type == MT_SONAME,
 2488                     ("m->m_type == %d", m->m_type));
 2489                 if (psa != NULL)
 2490                         *psa = sodupsockaddr(mtod(m, struct sockaddr *),
 2491                             M_NOWAIT);
 2492                 m = m_free(m);
 2493         }
 2494         if (m == NULL) {
 2495                 /* XXXRW: Can this happen? */
 2496                 return (0);
 2497         }
 2498 
 2499         /*
 2500          * Packet to copyout() is now in 'm' and it is disconnected from the
 2501          * queue.
 2502          *
 2503          * Process one or more MT_CONTROL mbufs present before any data mbufs
 2504          * in the first mbuf chain on the socket buffer.  We call into the
 2505          * protocol to perform externalization (or freeing if controlp ==
 2506          * NULL). In some cases there can be only MT_CONTROL mbufs without
 2507          * MT_DATA mbufs.
 2508          */
 2509         if (m->m_type == MT_CONTROL) {
 2510                 struct mbuf *cm = NULL, *cmn;
 2511                 struct mbuf **cme = &cm;
 2512 
 2513                 do {
 2514                         m2 = m->m_next;
 2515                         m->m_next = NULL;
 2516                         *cme = m;
 2517                         cme = &(*cme)->m_next;
 2518                         m = m2;
 2519                 } while (m != NULL && m->m_type == MT_CONTROL);
 2520                 while (cm != NULL) {
 2521                         cmn = cm->m_next;
 2522                         cm->m_next = NULL;
 2523                         if (pr->pr_domain->dom_externalize != NULL) {
 2524                                 error = (*pr->pr_domain->dom_externalize)
 2525                                     (cm, controlp, flags);
 2526                         } else if (controlp != NULL)
 2527                                 *controlp = cm;
 2528                         else
 2529                                 m_freem(cm);
 2530                         if (controlp != NULL) {
 2531                                 while (*controlp != NULL)
 2532                                         controlp = &(*controlp)->m_next;
 2533                         }
 2534                         cm = cmn;
 2535                 }
 2536         }
 2537         KASSERT(m == NULL || m->m_type == MT_DATA,
 2538             ("soreceive_dgram: !data"));
 2539         while (m != NULL && uio->uio_resid > 0) {
 2540                 len = uio->uio_resid;
 2541                 if (len > m->m_len)
 2542                         len = m->m_len;
 2543                 error = uiomove(mtod(m, char *), (int)len, uio);
 2544                 if (error) {
 2545                         m_freem(m);
 2546                         return (error);
 2547                 }
 2548                 if (len == m->m_len)
 2549                         m = m_free(m);
 2550                 else {
 2551                         m->m_data += len;
 2552                         m->m_len -= len;
 2553                 }
 2554         }
 2555         if (m != NULL) {
 2556                 flags |= MSG_TRUNC;
 2557                 m_freem(m);
 2558         }
 2559         if (flagsp != NULL)
 2560                 *flagsp |= flags;
 2561         return (0);
 2562 }
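        /*
         * The MSG_TRUNC handling above is visible from userland: receiving
         * a datagram into a too-small buffer discards the excess and sets
         * MSG_TRUNC in msg_flags.  A sketch, assuming a bound datagram
         * socket 'fd':
         *
         *      char buf[16];
         *      struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
         *      struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
         *
         *      if (recvmsg(fd, &msg, 0) >= 0 && (msg.msg_flags & MSG_TRUNC))
         *              warnx("datagram truncated");
         */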
 2563 
 2564 int
 2565 soreceive(struct socket *so, struct sockaddr **psa, struct uio *uio,
 2566     struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 2567 {
 2568         int error;
 2569 
 2570         CURVNET_SET(so->so_vnet);
 2571         if (!SOLISTENING(so))
 2572                 error = (so->so_proto->pr_usrreqs->pru_soreceive(so, psa, uio,
 2573                     mp0, controlp, flagsp));
 2574         else
 2575                 error = ENOTCONN;
 2576         CURVNET_RESTORE();
 2577         return (error);
 2578 }
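        /*
         * A minimal sketch of a kernel-internal caller of soreceive(),
         * reading into a stack buffer through a UIO_SYSSPACE uio; 'so' is
         * assumed to be a connected socket and 'td' the current thread:
         *
         *      char buf[128];
         *      struct iovec aiov = { .iov_base = buf, .iov_len = sizeof(buf) };
         *      struct uio auio;
         *      int error;
         *
         *      auio.uio_iov = &aiov;
         *      auio.uio_iovcnt = 1;
         *      auio.uio_offset = 0;
         *      auio.uio_resid = sizeof(buf);
         *      auio.uio_segflg = UIO_SYSSPACE;
         *      auio.uio_rw = UIO_READ;
         *      auio.uio_td = td;
         *      error = soreceive(so, NULL, &auio, NULL, NULL, NULL);
         */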
 2579 
 2580 int
 2581 soshutdown(struct socket *so, int how)
 2582 {
 2583         struct protosw *pr = so->so_proto;
 2584         int error, soerror_enotconn;
 2585 
 2586         if (!(how == SHUT_RD || how == SHUT_WR || how == SHUT_RDWR))
 2587                 return (EINVAL);
 2588 
 2589         soerror_enotconn = 0;
 2590         if ((so->so_state &
 2591             (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING)) == 0) {
 2592                 /*
  2593          * POSIX mandates that we return ENOTCONN when shutdown(2) is
  2594          * invoked on a datagram socket; however, historically we would
  2595          * actually tear the socket down.  This is known to be leveraged
  2596          * by some applications to unblock a process waiting in recvXXX(2)
  2597          * by another process that shares the socket with it.  Try to meet
  2598          * both backward-compatibility and POSIX requirements by forcing
  2599          * ENOTCONN but still asking the protocol to perform pru_shutdown().
 2600                  */
 2601                 if (so->so_type != SOCK_DGRAM && !SOLISTENING(so))
 2602                         return (ENOTCONN);
 2603                 soerror_enotconn = 1;
 2604         }
 2605 
 2606         if (SOLISTENING(so)) {
 2607                 if (how != SHUT_WR) {
 2608                         SOLISTEN_LOCK(so);
 2609                         so->so_error = ECONNABORTED;
 2610                         solisten_wakeup(so);    /* unlocks so */
 2611                 }
 2612                 goto done;
 2613         }
 2614 
 2615         CURVNET_SET(so->so_vnet);
 2616         if (pr->pr_usrreqs->pru_flush != NULL)
 2617                 (*pr->pr_usrreqs->pru_flush)(so, how);
 2618         if (how != SHUT_WR)
 2619                 sorflush(so);
 2620         if (how != SHUT_RD) {
 2621                 error = (*pr->pr_usrreqs->pru_shutdown)(so);
 2622                 wakeup(&so->so_timeo);
 2623                 CURVNET_RESTORE();
 2624                 return ((error == 0 && soerror_enotconn) ? ENOTCONN : error);
 2625         }
 2626         wakeup(&so->so_timeo);
 2627         CURVNET_RESTORE();
 2628 
 2629 done:
 2630         return (soerror_enotconn ? ENOTCONN : 0);
 2631 }
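        /*
         * The datagram compromise above is observable from userland:
         * shutdown(2) on an unconnected UDP socket fails with ENOTCONN,
         * yet the receive buffer is still flushed, so a thread blocked in
         * recv(2) on the same socket is woken (a sketch, 'fd' assumed;
         * the unblocked recv(2) is expected to return 0):
         *
         *      if (shutdown(fd, SHUT_RD) == -1 && errno == ENOTCONN)
         *              ;       // expected for SOCK_DGRAM; recv unblocks anyway
         */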
 2632 
 2633 void
 2634 sorflush(struct socket *so)
 2635 {
 2636         struct sockbuf *sb = &so->so_rcv;
 2637         struct protosw *pr = so->so_proto;
 2638         struct socket aso;
 2639 
 2640         VNET_SO_ASSERT(so);
 2641 
 2642         /*
 2643          * In order to avoid calling dom_dispose with the socket buffer mutex
 2644          * held, and in order to generally avoid holding the lock for a long
 2645          * time, we make a copy of the socket buffer and clear the original
 2646          * (except locks, state).  The new socket buffer copy won't have
 2647          * initialized locks so we can only call routines that won't use or
 2648          * assert those locks.
 2649          *
 2650          * Dislodge threads currently blocked in receive and wait to acquire
 2651          * a lock against other simultaneous readers before clearing the
 2652          * socket buffer.  Don't let our acquire be interrupted by a signal
  2653          * despite any existing socket disposition on interruptible waiting.
 2654          */
 2655         socantrcvmore(so);
 2656         (void) sblock(sb, SBL_WAIT | SBL_NOINTR);
 2657 
 2658         /*
 2659          * Invalidate/clear most of the sockbuf structure, but leave selinfo
 2660          * and mutex data unchanged.
 2661          */
 2662         SOCKBUF_LOCK(sb);
 2663         bzero(&aso, sizeof(aso));
 2664         aso.so_pcb = so->so_pcb;
 2665         bcopy(&sb->sb_startzero, &aso.so_rcv.sb_startzero,
 2666             sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
 2667         bzero(&sb->sb_startzero,
 2668             sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
 2669         SOCKBUF_UNLOCK(sb);
 2670         sbunlock(sb);
 2671 
 2672         /*
 2673          * Dispose of special rights and flush the copied socket.  Don't call
 2674          * any unsafe routines (that rely on locks being initialized) on aso.
 2675          */
 2676         if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
 2677                 (*pr->pr_domain->dom_dispose)(&aso);
 2678         sbrelease_internal(&aso.so_rcv, so);
 2679 }
 2680 
 2681 /*
 2682  * Wrapper for Socket established helper hook.
 2683  * Parameters: socket, context of the hook point, hook id.
 2684  */
 2685 static int inline
 2686 hhook_run_socket(struct socket *so, void *hctx, int32_t h_id)
 2687 {
 2688         struct socket_hhook_data hhook_data = {
 2689                 .so = so,
 2690                 .hctx = hctx,
 2691                 .m = NULL,
 2692                 .status = 0
 2693         };
 2694 
 2695         CURVNET_SET(so->so_vnet);
 2696         HHOOKS_RUN_IF(V_socket_hhh[h_id], &hhook_data, &so->osd);
 2697         CURVNET_RESTORE();
 2698 
 2699         /* Ugly but needed, since hhooks return void for now */
 2700         return (hhook_data.status);
 2701 }
 2702 
 2703 /*
 2704  * Perhaps this routine, and sooptcopyout(), below, ought to come in an
 2705  * additional variant to handle the case where the option value needs to be
 2706  * some kind of integer, but not a specific size.  In addition to their use
 2707  * here, these functions are also called by the protocol-level pr_ctloutput()
 2708  * routines.
 2709  */
 2710 int
 2711 sooptcopyin(struct sockopt *sopt, void *buf, size_t len, size_t minlen)
 2712 {
 2713         size_t  valsize;
 2714 
 2715         /*
 2716          * If the user gives us more than we wanted, we ignore it, but if we
 2717          * don't get the minimum length the caller wants, we return EINVAL.
 2718          * On success, sopt->sopt_valsize is set to however much we actually
 2719          * retrieved.
 2720          */
 2721         if ((valsize = sopt->sopt_valsize) < minlen)
 2722                 return EINVAL;
 2723         if (valsize > len)
 2724                 sopt->sopt_valsize = valsize = len;
 2725 
 2726         if (sopt->sopt_td != NULL)
 2727                 return (copyin(sopt->sopt_val, buf, valsize));
 2728 
 2729         bcopy(sopt->sopt_val, buf, valsize);
 2730         return (0);
 2731 }
 2732 
 2733 /*
 2734  * Kernel version of setsockopt(2).
 2735  *
 2736  * XXX: optlen is size_t, not socklen_t
 2737  */
 2738 int
 2739 so_setsockopt(struct socket *so, int level, int optname, void *optval,
 2740     size_t optlen)
 2741 {
 2742         struct sockopt sopt;
 2743 
 2744         sopt.sopt_level = level;
 2745         sopt.sopt_name = optname;
 2746         sopt.sopt_dir = SOPT_SET;
 2747         sopt.sopt_val = optval;
 2748         sopt.sopt_valsize = optlen;
 2749         sopt.sopt_td = NULL;
 2750         return (sosetopt(so, &sopt));
 2751 }
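        /*
         * Sketch of a kernel consumer using so_setsockopt() to enable
         * keep-alives on a socket it manages ('so' assumed):
         *
         *      int one = 1;
         *
         *      error = so_setsockopt(so, SOL_SOCKET, SO_KEEPALIVE, &one,
         *          sizeof(one));
         */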
 2752 
 2753 int
 2754 sosetopt(struct socket *so, struct sockopt *sopt)
 2755 {
 2756         int     error, optval;
 2757         struct  linger l;
 2758         struct  timeval tv;
 2759         sbintime_t val;
 2760         uint32_t val32;
 2761 #ifdef MAC
 2762         struct mac extmac;
 2763 #endif
 2764 
 2765         CURVNET_SET(so->so_vnet);
 2766         error = 0;
 2767         if (sopt->sopt_level != SOL_SOCKET) {
 2768                 if (so->so_proto->pr_ctloutput != NULL) {
 2769                         error = (*so->so_proto->pr_ctloutput)(so, sopt);
 2770                         CURVNET_RESTORE();
 2771                         return (error);
 2772                 }
 2773                 error = ENOPROTOOPT;
 2774         } else {
 2775                 switch (sopt->sopt_name) {
 2776                 case SO_ACCEPTFILTER:
 2777                         error = accept_filt_setopt(so, sopt);
 2778                         if (error)
 2779                                 goto bad;
 2780                         break;
 2781 
 2782                 case SO_LINGER:
 2783                         error = sooptcopyin(sopt, &l, sizeof l, sizeof l);
 2784                         if (error)
 2785                                 goto bad;
 2786                         if (l.l_linger < 0 ||
 2787                             l.l_linger > USHRT_MAX ||
 2788                             l.l_linger > (INT_MAX / hz)) {
 2789                                 error = EDOM;
 2790                                 goto bad;
 2791                         }
 2792                         SOCK_LOCK(so);
 2793                         so->so_linger = l.l_linger;
 2794                         if (l.l_onoff)
 2795                                 so->so_options |= SO_LINGER;
 2796                         else
 2797                                 so->so_options &= ~SO_LINGER;
 2798                         SOCK_UNLOCK(so);
 2799                         break;
 2800 
 2801                 case SO_DEBUG:
 2802                 case SO_KEEPALIVE:
 2803                 case SO_DONTROUTE:
 2804                 case SO_USELOOPBACK:
 2805                 case SO_BROADCAST:
 2806                 case SO_REUSEADDR:
 2807                 case SO_REUSEPORT:
 2808                 case SO_REUSEPORT_LB:
 2809                 case SO_OOBINLINE:
 2810                 case SO_TIMESTAMP:
 2811                 case SO_BINTIME:
 2812                 case SO_NOSIGPIPE:
 2813                 case SO_NO_DDP:
 2814                 case SO_NO_OFFLOAD:
 2815                 case SO_RERROR:
 2816                         error = sooptcopyin(sopt, &optval, sizeof optval,
 2817                             sizeof optval);
 2818                         if (error)
 2819                                 goto bad;
 2820                         SOCK_LOCK(so);
 2821                         if (optval)
 2822                                 so->so_options |= sopt->sopt_name;
 2823                         else
 2824                                 so->so_options &= ~sopt->sopt_name;
 2825                         SOCK_UNLOCK(so);
 2826                         break;
 2827 
 2828                 case SO_SETFIB:
 2829                         error = sooptcopyin(sopt, &optval, sizeof optval,
 2830                             sizeof optval);
 2831                         if (error)
 2832                                 goto bad;
 2833 
 2834                         if (optval < 0 || optval >= rt_numfibs) {
 2835                                 error = EINVAL;
 2836                                 goto bad;
 2837                         }
 2838                         if (((so->so_proto->pr_domain->dom_family == PF_INET) ||
 2839                            (so->so_proto->pr_domain->dom_family == PF_INET6) ||
 2840                            (so->so_proto->pr_domain->dom_family == PF_ROUTE)))
 2841                                 so->so_fibnum = optval;
 2842                         else
 2843                                 so->so_fibnum = 0;
 2844                         break;
 2845 
 2846                 case SO_USER_COOKIE:
 2847                         error = sooptcopyin(sopt, &val32, sizeof val32,
 2848                             sizeof val32);
 2849                         if (error)
 2850                                 goto bad;
 2851                         so->so_user_cookie = val32;
 2852                         break;
 2853 
 2854                 case SO_SNDBUF:
 2855                 case SO_RCVBUF:
 2856                 case SO_SNDLOWAT:
 2857                 case SO_RCVLOWAT:
 2858                         error = sooptcopyin(sopt, &optval, sizeof optval,
 2859                             sizeof optval);
 2860                         if (error)
 2861                                 goto bad;
 2862 
 2863                         /*
 2864                          * Values < 1 make no sense for any of these options,
 2865                          * so disallow them.
 2866                          */
 2867                         if (optval < 1) {
 2868                                 error = EINVAL;
 2869                                 goto bad;
 2870                         }
 2871 
 2872                         error = sbsetopt(so, sopt->sopt_name, optval);
 2873                         break;
 2874 
 2875                 case SO_SNDTIMEO:
 2876                 case SO_RCVTIMEO:
 2877 #ifdef COMPAT_FREEBSD32
 2878                         if (SV_CURPROC_FLAG(SV_ILP32)) {
 2879                                 struct timeval32 tv32;
 2880 
 2881                                 error = sooptcopyin(sopt, &tv32, sizeof tv32,
 2882                                     sizeof tv32);
 2883                                 CP(tv32, tv, tv_sec);
 2884                                 CP(tv32, tv, tv_usec);
 2885                         } else
 2886 #endif
 2887                                 error = sooptcopyin(sopt, &tv, sizeof tv,
 2888                                     sizeof tv);
 2889                         if (error)
 2890                                 goto bad;
 2891                         if (tv.tv_sec < 0 || tv.tv_usec < 0 ||
 2892                             tv.tv_usec >= 1000000) {
 2893                                 error = EDOM;
 2894                                 goto bad;
 2895                         }
 2896                         if (tv.tv_sec > INT32_MAX)
 2897                                 val = SBT_MAX;
 2898                         else
 2899                                 val = tvtosbt(tv);
 2900                         switch (sopt->sopt_name) {
 2901                         case SO_SNDTIMEO:
 2902                                 so->so_snd.sb_timeo = val;
 2903                                 break;
 2904                         case SO_RCVTIMEO:
 2905                                 so->so_rcv.sb_timeo = val;
 2906                                 break;
 2907                         }
 2908                         break;
 2909 
 2910                 case SO_LABEL:
 2911 #ifdef MAC
 2912                         error = sooptcopyin(sopt, &extmac, sizeof extmac,
 2913                             sizeof extmac);
 2914                         if (error)
 2915                                 goto bad;
 2916                         error = mac_setsockopt_label(sopt->sopt_td->td_ucred,
 2917                             so, &extmac);
 2918 #else
 2919                         error = EOPNOTSUPP;
 2920 #endif
 2921                         break;
 2922 
 2923                 case SO_TS_CLOCK:
 2924                         error = sooptcopyin(sopt, &optval, sizeof optval,
 2925                             sizeof optval);
 2926                         if (error)
 2927                                 goto bad;
 2928                         if (optval < 0 || optval > SO_TS_CLOCK_MAX) {
 2929                                 error = EINVAL;
 2930                                 goto bad;
 2931                         }
 2932                         so->so_ts_clock = optval;
 2933                         break;
 2934 
 2935                 case SO_MAX_PACING_RATE:
 2936                         error = sooptcopyin(sopt, &val32, sizeof(val32),
 2937                             sizeof(val32));
 2938                         if (error)
 2939                                 goto bad;
 2940                         so->so_max_pacing_rate = val32;
 2941                         break;
 2942 
 2943                 default:
 2944                         if (V_socket_hhh[HHOOK_SOCKET_OPT]->hhh_nhooks > 0)
 2945                                 error = hhook_run_socket(so, sopt,
 2946                                     HHOOK_SOCKET_OPT);
 2947                         else
 2948                                 error = ENOPROTOOPT;
 2949                         break;
 2950                 }
 2951                 if (error == 0 && so->so_proto->pr_ctloutput != NULL)
 2952                         (void)(*so->so_proto->pr_ctloutput)(so, sopt);
 2953         }
 2954 bad:
 2955         CURVNET_RESTORE();
 2956         return (error);
 2957 }
 2958 
 2959 /*
 2960  * Helper routine for getsockopt.
 2961  */
 2962 int
 2963 sooptcopyout(struct sockopt *sopt, const void *buf, size_t len)
 2964 {
 2965         int     error;
 2966         size_t  valsize;
 2967 
 2968         error = 0;
 2969 
 2970         /*
 2971          * Documented get behavior is that we always return a value, possibly
 2972          * truncated to fit in the user's buffer.  Traditional behavior is
 2973          * that we always tell the user precisely how much we copied, rather
 2974          * than something useful like the total amount we had available for
 2975          * her.  Note that this interface is not idempotent; the entire
 2976          * answer must be generated ahead of time.
 2977          */
 2978         valsize = min(len, sopt->sopt_valsize);
 2979         sopt->sopt_valsize = valsize;
 2980         if (sopt->sopt_val != NULL) {
 2981                 if (sopt->sopt_td != NULL)
 2982                         error = copyout(buf, sopt->sopt_val, valsize);
 2983                 else
 2984                         bcopy(buf, sopt->sopt_val, valsize);
 2985         }
 2986         return (error);
 2987 }
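        /*
         * Sketch of how a protocol's pr_ctloutput routine typically pairs
         * sooptcopyin() with sooptcopyout(); the protocol 'foo' and its
         * helpers foo_apply_option()/foo_query_option() are hypothetical:
         *
         *      static int
         *      foo_ctloutput(struct socket *so, struct sockopt *sopt)
         *      {
         *              int error, optval;
         *
         *              if (sopt->sopt_dir == SOPT_SET) {
         *                      error = sooptcopyin(sopt, &optval,
         *                          sizeof(optval), sizeof(optval));
         *                      if (error == 0)
         *                              error = foo_apply_option(so,
         *                                  sopt->sopt_name, optval);
         *              } else {
         *                      optval = foo_query_option(so, sopt->sopt_name);
         *                      error = sooptcopyout(sopt, &optval,
         *                          sizeof(optval));
         *              }
         *              return (error);
         *      }
         */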
 2988 
 2989 int
 2990 sogetopt(struct socket *so, struct sockopt *sopt)
 2991 {
 2992         int     error, optval;
 2993         struct  linger l;
 2994         struct  timeval tv;
 2995 #ifdef MAC
 2996         struct mac extmac;
 2997 #endif
 2998 
 2999         CURVNET_SET(so->so_vnet);
 3000         error = 0;
 3001         if (sopt->sopt_level != SOL_SOCKET) {
 3002                 if (so->so_proto->pr_ctloutput != NULL)
 3003                         error = (*so->so_proto->pr_ctloutput)(so, sopt);
 3004                 else
 3005                         error = ENOPROTOOPT;
 3006                 CURVNET_RESTORE();
 3007                 return (error);
 3008         } else {
 3009                 switch (sopt->sopt_name) {
 3010                 case SO_ACCEPTFILTER:
 3011                         error = accept_filt_getopt(so, sopt);
 3012                         break;
 3013 
 3014                 case SO_LINGER:
 3015                         SOCK_LOCK(so);
 3016                         l.l_onoff = so->so_options & SO_LINGER;
 3017                         l.l_linger = so->so_linger;
 3018                         SOCK_UNLOCK(so);
 3019                         error = sooptcopyout(sopt, &l, sizeof l);
 3020                         break;
 3021 
 3022                 case SO_USELOOPBACK:
 3023                 case SO_DONTROUTE:
 3024                 case SO_DEBUG:
 3025                 case SO_KEEPALIVE:
 3026                 case SO_REUSEADDR:
 3027                 case SO_REUSEPORT:
 3028                 case SO_REUSEPORT_LB:
 3029                 case SO_BROADCAST:
 3030                 case SO_OOBINLINE:
 3031                 case SO_ACCEPTCONN:
 3032                 case SO_TIMESTAMP:
 3033                 case SO_BINTIME:
 3034                 case SO_NOSIGPIPE:
 3035                 case SO_NO_DDP:
 3036                 case SO_NO_OFFLOAD:
 3037                 case SO_RERROR:
 3038                         optval = so->so_options & sopt->sopt_name;
 3039 integer:
 3040                         error = sooptcopyout(sopt, &optval, sizeof optval);
 3041                         break;
 3042 
 3043                 case SO_DOMAIN:
 3044                         optval = so->so_proto->pr_domain->dom_family;
 3045                         goto integer;
 3046 
 3047                 case SO_TYPE:
 3048                         optval = so->so_type;
 3049                         goto integer;
 3050 
 3051                 case SO_PROTOCOL:
 3052                         optval = so->so_proto->pr_protocol;
 3053                         goto integer;
 3054 
 3055                 case SO_ERROR:
 3056                         SOCK_LOCK(so);
 3057                         if (so->so_error) {
 3058                                 optval = so->so_error;
 3059                                 so->so_error = 0;
 3060                         } else {
 3061                                 optval = so->so_rerror;
 3062                                 so->so_rerror = 0;
 3063                         }
 3064                         SOCK_UNLOCK(so);
 3065                         goto integer;
 3066 
 3067                 case SO_SNDBUF:
 3068                         optval = SOLISTENING(so) ? so->sol_sbsnd_hiwat :
 3069                             so->so_snd.sb_hiwat;
 3070                         goto integer;
 3071 
 3072                 case SO_RCVBUF:
 3073                         optval = SOLISTENING(so) ? so->sol_sbrcv_hiwat :
 3074                             so->so_rcv.sb_hiwat;
 3075                         goto integer;
 3076 
 3077                 case SO_SNDLOWAT:
 3078                         optval = SOLISTENING(so) ? so->sol_sbsnd_lowat :
 3079                             so->so_snd.sb_lowat;
 3080                         goto integer;
 3081 
 3082                 case SO_RCVLOWAT:
 3083                         optval = SOLISTENING(so) ? so->sol_sbrcv_lowat :
 3084                             so->so_rcv.sb_lowat;
 3085                         goto integer;
 3086 
 3087                 case SO_SNDTIMEO:
 3088                 case SO_RCVTIMEO:
 3089                         tv = sbttotv(sopt->sopt_name == SO_SNDTIMEO ?
 3090                             so->so_snd.sb_timeo : so->so_rcv.sb_timeo);
 3091 #ifdef COMPAT_FREEBSD32
 3092                         if (SV_CURPROC_FLAG(SV_ILP32)) {
 3093                                 struct timeval32 tv32;
 3094 
 3095                                 CP(tv, tv32, tv_sec);
 3096                                 CP(tv, tv32, tv_usec);
 3097                                 error = sooptcopyout(sopt, &tv32, sizeof tv32);
 3098                         } else
 3099 #endif
 3100                                 error = sooptcopyout(sopt, &tv, sizeof tv);
 3101                         break;
 3102 
 3103                 case SO_LABEL:
 3104 #ifdef MAC
 3105                         error = sooptcopyin(sopt, &extmac, sizeof(extmac),
 3106                             sizeof(extmac));
 3107                         if (error)
 3108                                 goto bad;
 3109                         error = mac_getsockopt_label(sopt->sopt_td->td_ucred,
 3110                             so, &extmac);
 3111                         if (error)
 3112                                 goto bad;
 3113                         error = sooptcopyout(sopt, &extmac, sizeof extmac);
 3114 #else
 3115                         error = EOPNOTSUPP;
 3116 #endif
 3117                         break;
 3118 
 3119                 case SO_PEERLABEL:
 3120 #ifdef MAC
 3121                         error = sooptcopyin(sopt, &extmac, sizeof(extmac),
 3122                             sizeof(extmac));
 3123                         if (error)
 3124                                 goto bad;
 3125                         error = mac_getsockopt_peerlabel(
 3126                             sopt->sopt_td->td_ucred, so, &extmac);
 3127                         if (error)
 3128                                 goto bad;
 3129                         error = sooptcopyout(sopt, &extmac, sizeof extmac);
 3130 #else
 3131                         error = EOPNOTSUPP;
 3132 #endif
 3133                         break;
 3134 
 3135                 case SO_LISTENQLIMIT:
 3136                         optval = SOLISTENING(so) ? so->sol_qlimit : 0;
 3137                         goto integer;
 3138 
 3139                 case SO_LISTENQLEN:
 3140                         optval = SOLISTENING(so) ? so->sol_qlen : 0;
 3141                         goto integer;
 3142 
 3143                 case SO_LISTENINCQLEN:
 3144                         optval = SOLISTENING(so) ? so->sol_incqlen : 0;
 3145                         goto integer;
 3146 
 3147                 case SO_TS_CLOCK:
 3148                         optval = so->so_ts_clock;
 3149                         goto integer;
 3150 
 3151                 case SO_MAX_PACING_RATE:
 3152                         optval = so->so_max_pacing_rate;
 3153                         goto integer;
 3154 
 3155                 default:
 3156                         if (V_socket_hhh[HHOOK_SOCKET_OPT]->hhh_nhooks > 0)
 3157                                 error = hhook_run_socket(so, sopt,
 3158                                     HHOOK_SOCKET_OPT);
 3159                         else
 3160                                 error = ENOPROTOOPT;
 3161                         break;
 3162                 }
 3163         }
 3164 #ifdef MAC
 3165 bad:
 3166 #endif
 3167         CURVNET_RESTORE();
 3168         return (error);
 3169 }
 3170 
 3171 int
 3172 soopt_getm(struct sockopt *sopt, struct mbuf **mp)
 3173 {
 3174         struct mbuf *m, *m_prev;
 3175         int sopt_size = sopt->sopt_valsize;
 3176 
 3177         MGET(m, sopt->sopt_td ? M_WAITOK : M_NOWAIT, MT_DATA);
 3178         if (m == NULL)
 3179                 return ENOBUFS;
 3180         if (sopt_size > MLEN) {
 3181                 MCLGET(m, sopt->sopt_td ? M_WAITOK : M_NOWAIT);
 3182                 if ((m->m_flags & M_EXT) == 0) {
 3183                         m_free(m);
 3184                         return ENOBUFS;
 3185                 }
 3186                 m->m_len = min(MCLBYTES, sopt_size);
 3187         } else {
 3188                 m->m_len = min(MLEN, sopt_size);
 3189         }
 3190         sopt_size -= m->m_len;
 3191         *mp = m;
 3192         m_prev = m;
 3193 
 3194         while (sopt_size) {
 3195                 MGET(m, sopt->sopt_td ? M_WAITOK : M_NOWAIT, MT_DATA);
 3196                 if (m == NULL) {
 3197                         m_freem(*mp);
 3198                         return ENOBUFS;
 3199                 }
 3200                 if (sopt_size > MLEN) {
 3201                         MCLGET(m, sopt->sopt_td != NULL ? M_WAITOK :
 3202                             M_NOWAIT);
 3203                         if ((m->m_flags & M_EXT) == 0) {
 3204                                 m_freem(m);
 3205                                 m_freem(*mp);
 3206                                 return ENOBUFS;
 3207                         }
 3208                         m->m_len = min(MCLBYTES, sopt_size);
 3209                 } else {
 3210                         m->m_len = min(MLEN, sopt_size);
 3211                 }
 3212                 sopt_size -= m->m_len;
 3213                 m_prev->m_next = m;
 3214                 m_prev = m;
 3215         }
 3216         return (0);
 3217 }
 3218 
 3219 int
 3220 soopt_mcopyin(struct sockopt *sopt, struct mbuf *m)
 3221 {
 3222         struct mbuf *m0 = m;
 3223 
 3224         if (sopt->sopt_val == NULL)
 3225                 return (0);
 3226         while (m != NULL && sopt->sopt_valsize >= m->m_len) {
 3227                 if (sopt->sopt_td != NULL) {
 3228                         int error;
 3229 
 3230                         error = copyin(sopt->sopt_val, mtod(m, char *),
 3231                             m->m_len);
 3232                         if (error != 0) {
 3233                                 m_freem(m0);
 3234                                 return(error);
 3235                         }
 3236                 } else
 3237                         bcopy(sopt->sopt_val, mtod(m, char *), m->m_len);
 3238                 sopt->sopt_valsize -= m->m_len;
 3239                 sopt->sopt_val = (char *)sopt->sopt_val + m->m_len;
 3240                 m = m->m_next;
 3241         }
  3242         if (m != NULL) /* enough space should have been allocated at ip6_sooptmcopyin() */
 3243                 panic("ip6_sooptmcopyin");
 3244         return (0);
 3245 }
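        /*
         * soopt_getm() above and soopt_mcopyin() are typically used as a
         * pair to pull an option value into an mbuf chain, as the IPv6
         * option code historically does.  A sketch (error handling
         * abbreviated):
         *
         *      struct mbuf *m = NULL;
         *
         *      error = soopt_getm(sopt, &m);   // chain sized to sopt_valsize
         *      if (error == 0)
         *              error = soopt_mcopyin(sopt, m); // frees chain on error
         */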
 3246 
 3247 int
 3248 soopt_mcopyout(struct sockopt *sopt, struct mbuf *m)
 3249 {
 3250         struct mbuf *m0 = m;
 3251         size_t valsize = 0;
 3252 
 3253         if (sopt->sopt_val == NULL)
 3254                 return (0);
 3255         while (m != NULL && sopt->sopt_valsize >= m->m_len) {
 3256                 if (sopt->sopt_td != NULL) {
 3257                         int error;
 3258 
 3259                         error = copyout(mtod(m, char *), sopt->sopt_val,
 3260                             m->m_len);
 3261                         if (error != 0) {
 3262                                 m_freem(m0);
 3263                                 return(error);
 3264                         }
 3265                 } else
 3266                         bcopy(mtod(m, char *), sopt->sopt_val, m->m_len);
 3267                 sopt->sopt_valsize -= m->m_len;
 3268                 sopt->sopt_val = (char *)sopt->sopt_val + m->m_len;
 3269                 valsize += m->m_len;
 3270                 m = m->m_next;
 3271         }
 3272         if (m != NULL) {
  3273                 /* a large enough sockopt buffer should have been supplied from userland */
 3274                 m_freem(m0);
 3275                 return(EINVAL);
 3276         }
 3277         sopt->sopt_valsize = valsize;
 3278         return (0);
 3279 }
 3280 
 3281 /*
 3282  * sohasoutofband(): protocol notifies socket layer of the arrival of new
 3283  * out-of-band data, which will then notify socket consumers.
 3284  */
 3285 void
 3286 sohasoutofband(struct socket *so)
 3287 {
 3288 
 3289         if (so->so_sigio != NULL)
 3290                 pgsigio(&so->so_sigio, SIGURG, 0);
 3291         selwakeuppri(&so->so_rdsel, PSOCK);
 3292 }
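        /*
         * Userland counterpart of this notification, as a sketch: a process
         * that has registered as the socket owner receives SIGURG when
         * out-of-band data arrives ('fd' assumed, handler hypothetical):
         *
         *      signal(SIGURG, handle_urgent_data);
         *      fcntl(fd, F_SETOWN, getpid());
         */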
 3293 
 3294 int
 3295 sopoll(struct socket *so, int events, struct ucred *active_cred,
 3296     struct thread *td)
 3297 {
 3298 
 3299         /*
 3300          * We do not need to set or assert curvnet as long as everyone uses
 3301          * sopoll_generic().
 3302          */
 3303         return (so->so_proto->pr_usrreqs->pru_sopoll(so, events, active_cred,
 3304             td));
 3305 }
 3306 
 3307 int
 3308 sopoll_generic(struct socket *so, int events, struct ucred *active_cred,
 3309     struct thread *td)
 3310 {
 3311         int revents;
 3312 
 3313         SOCK_LOCK(so);
 3314         if (SOLISTENING(so)) {
 3315                 if (!(events & (POLLIN | POLLRDNORM)))
 3316                         revents = 0;
 3317                 else if (!TAILQ_EMPTY(&so->sol_comp))
 3318                         revents = events & (POLLIN | POLLRDNORM);
 3319                 else if ((events & POLLINIGNEOF) == 0 && so->so_error)
 3320                         revents = (events & (POLLIN | POLLRDNORM)) | POLLHUP;
 3321                 else {
 3322                         selrecord(td, &so->so_rdsel);
 3323                         revents = 0;
 3324                 }
 3325         } else {
 3326                 revents = 0;
 3327                 SOCKBUF_LOCK(&so->so_snd);
 3328                 SOCKBUF_LOCK(&so->so_rcv);
 3329                 if (events & (POLLIN | POLLRDNORM))
 3330                         if (soreadabledata(so))
 3331                                 revents |= events & (POLLIN | POLLRDNORM);
 3332                 if (events & (POLLOUT | POLLWRNORM))
 3333                         if (sowriteable(so))
 3334                                 revents |= events & (POLLOUT | POLLWRNORM);
 3335                 if (events & (POLLPRI | POLLRDBAND))
 3336                         if (so->so_oobmark ||
 3337                             (so->so_rcv.sb_state & SBS_RCVATMARK))
 3338                                 revents |= events & (POLLPRI | POLLRDBAND);
 3339                 if ((events & POLLINIGNEOF) == 0) {
 3340                         if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 3341                                 revents |= events & (POLLIN | POLLRDNORM);
 3342                                 if (so->so_snd.sb_state & SBS_CANTSENDMORE)
 3343                                         revents |= POLLHUP;
 3344                         }
 3345                 }
 3346                 if (revents == 0) {
 3347                         if (events &
 3348                             (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) {
 3349                                 selrecord(td, &so->so_rdsel);
 3350                                 so->so_rcv.sb_flags |= SB_SEL;
 3351                         }
 3352                         if (events & (POLLOUT | POLLWRNORM)) {
 3353                                 selrecord(td, &so->so_wrsel);
 3354                                 so->so_snd.sb_flags |= SB_SEL;
 3355                         }
 3356                 }
 3357                 SOCKBUF_UNLOCK(&so->so_rcv);
 3358                 SOCKBUF_UNLOCK(&so->so_snd);
 3359         }
 3360         SOCK_UNLOCK(so);
 3361         return (revents);
 3362 }
 3363 
 3364 int
 3365 soo_kqfilter(struct file *fp, struct knote *kn)
 3366 {
 3367         struct socket *so = kn->kn_fp->f_data;
 3368         struct sockbuf *sb;
 3369         struct knlist *knl;
 3370 
 3371         switch (kn->kn_filter) {
 3372         case EVFILT_READ:
 3373                 kn->kn_fop = &soread_filtops;
 3374                 knl = &so->so_rdsel.si_note;
 3375                 sb = &so->so_rcv;
 3376                 break;
 3377         case EVFILT_WRITE:
 3378                 kn->kn_fop = &sowrite_filtops;
 3379                 knl = &so->so_wrsel.si_note;
 3380                 sb = &so->so_snd;
 3381                 break;
 3382         case EVFILT_EMPTY:
 3383                 kn->kn_fop = &soempty_filtops;
 3384                 knl = &so->so_wrsel.si_note;
 3385                 sb = &so->so_snd;
 3386                 break;
 3387         default:
 3388                 return (EINVAL);
 3389         }
 3390 
 3391         SOCK_LOCK(so);
 3392         if (SOLISTENING(so)) {
 3393                 knlist_add(knl, kn, 1);
 3394         } else {
 3395                 SOCKBUF_LOCK(sb);
 3396                 knlist_add(knl, kn, 1);
 3397                 sb->sb_flags |= SB_KNOTE;
 3398                 SOCKBUF_UNLOCK(sb);
 3399         }
 3400         SOCK_UNLOCK(so);
 3401         return (0);
 3402 }
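        /*
         * Userland view of the filter selection above, as a sketch:
         * registering for readability attaches soread_filtops through
         * EVFILT_READ ('kq' and 'sockfd' assumed):
         *
         *      struct kevent kev;
         *
         *      EV_SET(&kev, sockfd, EVFILT_READ, EV_ADD, 0, 0, NULL);
         *      if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
         *              err(1, "kevent");
         */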
 3403 
 3404 /*
 3405  * Some routines that return EOPNOTSUPP for entry points that are not
 3406  * supported by a protocol.  Fill in as needed.
 3407  */
 3408 int
 3409 pru_accept_notsupp(struct socket *so, struct sockaddr **nam)
 3410 {
 3411 
 3412         return EOPNOTSUPP;
 3413 }
 3414 
 3415 int
 3416 pru_aio_queue_notsupp(struct socket *so, struct kaiocb *job)
 3417 {
 3418 
 3419         return EOPNOTSUPP;
 3420 }
 3421 
 3422 int
 3423 pru_attach_notsupp(struct socket *so, int proto, struct thread *td)
 3424 {
 3425 
 3426         return EOPNOTSUPP;
 3427 }
 3428 
 3429 int
 3430 pru_bind_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td)
 3431 {
 3432 
 3433         return EOPNOTSUPP;
 3434 }
 3435 
 3436 int
 3437 pru_bindat_notsupp(int fd, struct socket *so, struct sockaddr *nam,
 3438     struct thread *td)
 3439 {
 3440 
 3441         return EOPNOTSUPP;
 3442 }
 3443 
 3444 int
 3445 pru_connect_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td)
 3446 {
 3447 
 3448         return EOPNOTSUPP;
 3449 }
 3450 
 3451 int
 3452 pru_connectat_notsupp(int fd, struct socket *so, struct sockaddr *nam,
 3453     struct thread *td)
 3454 {
 3455 
 3456         return EOPNOTSUPP;
 3457 }
 3458 
 3459 int
 3460 pru_connect2_notsupp(struct socket *so1, struct socket *so2)
 3461 {
 3462 
 3463         return EOPNOTSUPP;
 3464 }
 3465 
 3466 int
 3467 pru_control_notsupp(struct socket *so, u_long cmd, caddr_t data,
 3468     struct ifnet *ifp, struct thread *td)
 3469 {
 3470 
 3471         return EOPNOTSUPP;
 3472 }
 3473 
 3474 int
 3475 pru_disconnect_notsupp(struct socket *so)
 3476 {
 3477 
 3478         return EOPNOTSUPP;
 3479 }
 3480 
 3481 int
 3482 pru_listen_notsupp(struct socket *so, int backlog, struct thread *td)
 3483 {
 3484 
 3485         return EOPNOTSUPP;
 3486 }
 3487 
 3488 int
 3489 pru_peeraddr_notsupp(struct socket *so, struct sockaddr **nam)
 3490 {
 3491 
 3492         return EOPNOTSUPP;
 3493 }
 3494 
 3495 int
 3496 pru_rcvd_notsupp(struct socket *so, int flags)
 3497 {
 3498 
 3499         return EOPNOTSUPP;
 3500 }
 3501 
 3502 int
 3503 pru_rcvoob_notsupp(struct socket *so, struct mbuf *m, int flags)
 3504 {
 3505 
 3506         return EOPNOTSUPP;
 3507 }
 3508 
 3509 int
 3510 pru_send_notsupp(struct socket *so, int flags, struct mbuf *m,
 3511     struct sockaddr *addr, struct mbuf *control, struct thread *td)
 3512 {
 3513 
 3514         return EOPNOTSUPP;
 3515 }
 3516 
 3517 int
 3518 pru_ready_notsupp(struct socket *so, struct mbuf *m, int count)
 3519 {
 3520 
 3521         return (EOPNOTSUPP);
 3522 }
 3523 
 3524 /*
 3525  * This isn't really a ``null'' operation, but it's the default one and
 3526  * doesn't do anything destructive.
 3527  */
 3528 int
 3529 pru_sense_null(struct socket *so, struct stat *sb)
 3530 {
 3531 
 3532         sb->st_blksize = so->so_snd.sb_hiwat;
 3533         return 0;
 3534 }
 3535 
 3536 int
 3537 pru_shutdown_notsupp(struct socket *so)
 3538 {
 3539 
 3540         return EOPNOTSUPP;
 3541 }
 3542 
 3543 int
 3544 pru_sockaddr_notsupp(struct socket *so, struct sockaddr **nam)
 3545 {
 3546 
 3547         return EOPNOTSUPP;
 3548 }
 3549 
 3550 int
 3551 pru_sosend_notsupp(struct socket *so, struct sockaddr *addr, struct uio *uio,
 3552     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
 3553 {
 3554 
 3555         return EOPNOTSUPP;
 3556 }
 3557 
 3558 int
 3559 pru_soreceive_notsupp(struct socket *so, struct sockaddr **paddr,
 3560     struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 3561 {
 3562 
 3563         return EOPNOTSUPP;
 3564 }
 3565 
 3566 int
 3567 pru_sopoll_notsupp(struct socket *so, int events, struct ucred *cred,
 3568     struct thread *td)
 3569 {
 3570 
 3571         return EOPNOTSUPP;
 3572 }
 3573 
 3574 static void
 3575 filt_sordetach(struct knote *kn)
 3576 {
 3577         struct socket *so = kn->kn_fp->f_data;
 3578 
 3579         so_rdknl_lock(so);
 3580         knlist_remove(&so->so_rdsel.si_note, kn, 1);
 3581         if (!SOLISTENING(so) && knlist_empty(&so->so_rdsel.si_note))
 3582                 so->so_rcv.sb_flags &= ~SB_KNOTE;
 3583         so_rdknl_unlock(so);
 3584 }
 3585 
 3586 /*ARGSUSED*/
 3587 static int
 3588 filt_soread(struct knote *kn, long hint)
 3589 {
 3590         struct socket *so;
 3591 
 3592         so = kn->kn_fp->f_data;
 3593 
 3594         if (SOLISTENING(so)) {
 3595                 SOCK_LOCK_ASSERT(so);
 3596                 kn->kn_data = so->sol_qlen;
 3597                 if (so->so_error) {
 3598                         kn->kn_flags |= EV_EOF;
 3599                         kn->kn_fflags = so->so_error;
 3600                         return (1);
 3601                 }
 3602                 return (!TAILQ_EMPTY(&so->sol_comp));
 3603         }
 3604 
 3605         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 3606 
 3607         kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl;
 3608         if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 3609                 kn->kn_flags |= EV_EOF;
 3610                 kn->kn_fflags = so->so_error;
 3611                 return (1);
 3612         } else if (so->so_error || so->so_rerror)
 3613                 return (1);
 3614 
 3615         if (kn->kn_sfflags & NOTE_LOWAT) {
 3616                 if (kn->kn_data >= kn->kn_sdata)
 3617                         return (1);
 3618         } else if (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat)
 3619                 return (1);
 3620 
  3621         /* This hook returning non-zero indicates an event, not an error */
 3622         return (hhook_run_socket(so, NULL, HHOOK_FILT_SOREAD));
 3623 }
 3624 
 3625 static void
 3626 filt_sowdetach(struct knote *kn)
 3627 {
 3628         struct socket *so = kn->kn_fp->f_data;
 3629 
 3630         so_wrknl_lock(so);
 3631         knlist_remove(&so->so_wrsel.si_note, kn, 1);
 3632         if (!SOLISTENING(so) && knlist_empty(&so->so_wrsel.si_note))
 3633                 so->so_snd.sb_flags &= ~SB_KNOTE;
 3634         so_wrknl_unlock(so);
 3635 }
 3636 
 3637 /*ARGSUSED*/
 3638 static int
 3639 filt_sowrite(struct knote *kn, long hint)
 3640 {
 3641         struct socket *so;
 3642 
 3643         so = kn->kn_fp->f_data;
 3644 
 3645         if (SOLISTENING(so))
 3646                 return (0);
 3647 
 3648         SOCKBUF_LOCK_ASSERT(&so->so_snd);
 3649         kn->kn_data = sbspace(&so->so_snd);
 3650 
 3651         hhook_run_socket(so, kn, HHOOK_FILT_SOWRITE);
 3652 
 3653         if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
 3654                 kn->kn_flags |= EV_EOF;
 3655                 kn->kn_fflags = so->so_error;
 3656                 return (1);
 3657         } else if (so->so_error)        /* temporary udp error */
 3658                 return (1);
 3659         else if (((so->so_state & SS_ISCONNECTED) == 0) &&
 3660             (so->so_proto->pr_flags & PR_CONNREQUIRED))
 3661                 return (0);
 3662         else if (kn->kn_sfflags & NOTE_LOWAT)
 3663                 return (kn->kn_data >= kn->kn_sdata);
 3664         else
 3665                 return (kn->kn_data >= so->so_snd.sb_lowat);
 3666 }
 3667 
 3668 static int
 3669 filt_soempty(struct knote *kn, long hint)
 3670 {
 3671         struct socket *so;
 3672 
 3673         so = kn->kn_fp->f_data;
 3674 
 3675         if (SOLISTENING(so))
 3676                 return (1);
 3677 
 3678         SOCKBUF_LOCK_ASSERT(&so->so_snd);
 3679         kn->kn_data = sbused(&so->so_snd);
 3680 
 3681         if (kn->kn_data == 0)
 3682                 return (1);
 3683         else
 3684                 return (0);
 3685 }
 3686 
 3687 int
 3688 socheckuid(struct socket *so, uid_t uid)
 3689 {
 3690 
 3691         if (so == NULL)
 3692                 return (EPERM);
 3693         if (so->so_cred->cr_uid != uid)
 3694                 return (EPERM);
 3695         return (0);
 3696 }
 3697 
 3698 /*
 3699  * These functions are used by protocols to notify the socket layer (and its
 3700  * consumers) of state changes in the sockets driven by protocol-side events.
 3701  */
 3702 
 3703 /*
 3704  * Procedures to manipulate state flags of socket and do appropriate wakeups.
 3705  *
 3706  * Normal sequence from the active (originating) side is that
 3707  * soisconnecting() is called during processing of connect() call, resulting
 3708  * in an eventual call to soisconnected() if/when the connection is
 3709  * established.  When the connection is torn down soisdisconnecting() is
 3710  * called during processing of disconnect() call, and soisdisconnected() is
 3711  * called when the connection to the peer is totally severed.  The semantics
 3712  * of these routines are such that connectionless protocols can call
 3713  * soisconnected() and soisdisconnected() only, bypassing the in-progress
 3714  * calls when setting up a ``connection'' takes no time.
 3715  *
  3716  * From the passive side, a listening socket maintains two queues of
  3717  * sockets: sol_incomp for connections in progress and sol_comp for
  3718  * connections already made and awaiting user acceptance.  As a protocol
  3719  * is preparing incoming connections, it creates a socket structure queued
  3720  * on sol_incomp by calling sonewconn().  When the connection is
  3721  * established, soisconnected() is called, and transfers the socket
  3722  * structure to sol_comp, making it available to accept().
  3723  *
  3724  * If a socket is closed with sockets on either sol_incomp or sol_comp,
  3725  * these sockets are dropped.
 3726  *
 3727  * If higher-level protocols are implemented in the kernel, the wakeups done
 3728  * here will sometimes cause software-interrupt process scheduling.
 3729  */
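        /*
         * A sketch of the passive-open sequence described above, as a
         * hypothetical connection-oriented protocol might drive it ('head'
         * is the listening socket):
         *
         *      struct socket *so;
         *
         *      so = sonewconn(head, 0);        // queued on head's sol_incomp
         *      if (so == NULL)
         *              return;                 // listen queue limit reached
         *      // ... protocol handshake completes ...
         *      soisconnected(so);              // moves to sol_comp; accept() wakes
         */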
 3730 void
 3731 soisconnecting(struct socket *so)
 3732 {
 3733 
 3734         SOCK_LOCK(so);
 3735         so->so_state &= ~(SS_ISCONNECTED|SS_ISDISCONNECTING);
 3736         so->so_state |= SS_ISCONNECTING;
 3737         SOCK_UNLOCK(so);
 3738 }
 3739 
 3740 void
 3741 soisconnected(struct socket *so)
 3742 {
 3743 
 3744         SOCK_LOCK(so);
 3745         so->so_state &= ~(SS_ISCONNECTING|SS_ISDISCONNECTING|SS_ISCONFIRMING);
 3746         so->so_state |= SS_ISCONNECTED;
 3747 
 3748         if (so->so_qstate == SQ_INCOMP) {
 3749                 struct socket *head = so->so_listen;
 3750                 int ret;
 3751 
 3752                 KASSERT(head, ("%s: so %p on incomp of NULL", __func__, so));
 3753                 /*
 3754                  * Promoting a socket from incomplete queue to complete, we
 3755                  * need to go through reverse order of locking.  We first do
 3756                  * trylock, and if that doesn't succeed, we go the hard way
 3757                  * leaving a reference and rechecking consistency after proper
 3758                  * locking.
 3759                  */
 3760                 if (__predict_false(SOLISTEN_TRYLOCK(head) == 0)) {
 3761                         soref(head);
 3762                         SOCK_UNLOCK(so);
 3763                         SOLISTEN_LOCK(head);
 3764                         SOCK_LOCK(so);
 3765                         if (__predict_false(head != so->so_listen)) {
  3766                                 /*
  3767                                  * The socket went off the listen queue;
  3768                                  * most likely a lost race with close(2)
  3769                                  * on the listener.  soabort() follows.
  3770                                  */
 3771                                 SOCK_UNLOCK(so);
 3772                                 sorele(head);
 3773                                 return;
 3774                         }
 3775                         /* Not the last one, as so holds a ref. */
 3776                         refcount_release(&head->so_count);
 3777                 }
 3778 again:
 3779                 if ((so->so_options & SO_ACCEPTFILTER) == 0) {
 3780                         TAILQ_REMOVE(&head->sol_incomp, so, so_list);
 3781                         head->sol_incqlen--;
 3782                         TAILQ_INSERT_TAIL(&head->sol_comp, so, so_list);
 3783                         head->sol_qlen++;
 3784                         so->so_qstate = SQ_COMP;
 3785                         SOCK_UNLOCK(so);
 3786                         solisten_wakeup(head);  /* unlocks */
 3787                 } else {
 3788                         SOCKBUF_LOCK(&so->so_rcv);
 3789                         soupcall_set(so, SO_RCV,
 3790                             head->sol_accept_filter->accf_callback,
 3791                             head->sol_accept_filter_arg);
 3792                         so->so_options &= ~SO_ACCEPTFILTER;
 3793                         ret = head->sol_accept_filter->accf_callback(so,
 3794                             head->sol_accept_filter_arg, M_NOWAIT);
 3795                         if (ret == SU_ISCONNECTED) {
 3796                                 soupcall_clear(so, SO_RCV);
 3797                                 SOCKBUF_UNLOCK(&so->so_rcv);
 3798                                 goto again;
 3799                         }
 3800                         SOCKBUF_UNLOCK(&so->so_rcv);
 3801                         SOCK_UNLOCK(so);
 3802                         SOLISTEN_UNLOCK(head);
 3803                 }
 3804                 return;
 3805         }
 3806         SOCK_UNLOCK(so);
 3807         wakeup(&so->so_timeo);
 3808         sorwakeup(so);
 3809         sowwakeup(so);
 3810 }
 3811 
 3812 void
 3813 soisdisconnecting(struct socket *so)
 3814 {
 3815 
 3816         SOCK_LOCK(so);
 3817         so->so_state &= ~SS_ISCONNECTING;
 3818         so->so_state |= SS_ISDISCONNECTING;
 3819 
 3820         if (!SOLISTENING(so)) {
 3821                 SOCKBUF_LOCK(&so->so_rcv);
 3822                 socantrcvmore_locked(so);
 3823                 SOCKBUF_LOCK(&so->so_snd);
 3824                 socantsendmore_locked(so);
 3825         }
 3826         SOCK_UNLOCK(so);
 3827         wakeup(&so->so_timeo);
 3828 }
 3829 
 3830 void
 3831 soisdisconnected(struct socket *so)
 3832 {
 3833 
 3834         SOCK_LOCK(so);
 3835 
 3836         /*
 3837          * There is at least one reader of so_state that does not
 3838          * acquire socket lock, namely soreceive_generic().  Ensure
 3839          * that it never sees all flags that track connection status
 3840          * cleared, by ordering the update with a barrier semantic of
 3841          * our release thread fence.
 3842          */
 3843         so->so_state |= SS_ISDISCONNECTED;
 3844         atomic_thread_fence_rel();
 3845         so->so_state &= ~(SS_ISCONNECTING|SS_ISCONNECTED|SS_ISDISCONNECTING);
 3846 
 3847         if (!SOLISTENING(so)) {
 3848                 SOCK_UNLOCK(so);
 3849                 SOCKBUF_LOCK(&so->so_rcv);
 3850                 socantrcvmore_locked(so);
 3851                 SOCKBUF_LOCK(&so->so_snd);
 3852                 sbdrop_locked(&so->so_snd, sbused(&so->so_snd));
 3853                 socantsendmore_locked(so);
 3854         } else
 3855                 SOCK_UNLOCK(so);
 3856         wakeup(&so->so_timeo);
 3857 }
 3858 
 3859 /*
 3860  * Make a copy of a sockaddr in a malloced buffer of type M_SONAME.
 3861  */
 3862 struct sockaddr *
 3863 sodupsockaddr(const struct sockaddr *sa, int mflags)
 3864 {
 3865         struct sockaddr *sa2;
 3866 
 3867         sa2 = malloc(sa->sa_len, M_SONAME, mflags);
 3868         if (sa2)
 3869                 bcopy(sa, sa2, sa->sa_len);
 3870         return sa2;
 3871 }
 3872 
 3873 /*
 3874  * Register per-socket destructor.
 3875  */
 3876 void
 3877 sodtor_set(struct socket *so, so_dtor_t *func)
 3878 {
 3879 
 3880         SOCK_LOCK_ASSERT(so);
 3881         so->so_dtor = func;
 3882 }
 3883 
 3884 /*
 3885  * Register per-socket buffer upcalls.
 3886  */
 3887 void
 3888 soupcall_set(struct socket *so, int which, so_upcall_t func, void *arg)
 3889 {
 3890         struct sockbuf *sb;
 3891 
 3892         KASSERT(!SOLISTENING(so), ("%s: so %p listening", __func__, so));
 3893 
 3894         switch (which) {
 3895         case SO_RCV:
 3896                 sb = &so->so_rcv;
 3897                 break;
 3898         case SO_SND:
 3899                 sb = &so->so_snd;
 3900                 break;
 3901         default:
 3902                 panic("soupcall_set: bad which");
 3903         }
 3904         SOCKBUF_LOCK_ASSERT(sb);
 3905         sb->sb_upcall = func;
 3906         sb->sb_upcallarg = arg;
 3907         sb->sb_flags |= SB_UPCALL;
 3908 }
 3909 
 3910 void
 3911 soupcall_clear(struct socket *so, int which)
 3912 {
 3913         struct sockbuf *sb;
 3914 
 3915         KASSERT(!SOLISTENING(so), ("%s: so %p listening", __func__, so));
 3916 
 3917         switch (which) {
 3918         case SO_RCV:
 3919                 sb = &so->so_rcv;
 3920                 break;
 3921         case SO_SND:
 3922                 sb = &so->so_snd;
 3923                 break;
 3924         default:
 3925                 panic("soupcall_clear: bad which");
 3926         }
 3927         SOCKBUF_LOCK_ASSERT(sb);
 3928         KASSERT(sb->sb_upcall != NULL,
 3929             ("%s: so %p no upcall to clear", __func__, so));
 3930         sb->sb_upcall = NULL;
 3931         sb->sb_upcallarg = NULL;
 3932         sb->sb_flags &= ~SB_UPCALL;
 3933 }
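        /*
         * Sketch of registering and clearing a receive upcall; the
         * callback 'foo_rcv_ready' is hypothetical and must match
         * so_upcall_t, returning SU_OK (accept filters may return
         * SU_ISCONNECTED):
         *
         *      SOCKBUF_LOCK(&so->so_rcv);
         *      soupcall_set(so, SO_RCV, foo_rcv_ready, foo_arg);
         *      SOCKBUF_UNLOCK(&so->so_rcv);
         *      ...
         *      SOCKBUF_LOCK(&so->so_rcv);
         *      soupcall_clear(so, SO_RCV);
         *      SOCKBUF_UNLOCK(&so->so_rcv);
         */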
 3934 
 3935 void
 3936 solisten_upcall_set(struct socket *so, so_upcall_t func, void *arg)
 3937 {
 3938 
 3939         SOLISTEN_LOCK_ASSERT(so);
 3940         so->sol_upcall = func;
 3941         so->sol_upcallarg = arg;
 3942 }
 3943 
 3944 static void
 3945 so_rdknl_lock(void *arg)
 3946 {
 3947         struct socket *so = arg;
 3948 
 3949         if (SOLISTENING(so))
 3950                 SOCK_LOCK(so);
 3951         else
 3952                 SOCKBUF_LOCK(&so->so_rcv);
 3953 }
 3954 
 3955 static void
 3956 so_rdknl_unlock(void *arg)
 3957 {
 3958         struct socket *so = arg;
 3959 
 3960         if (SOLISTENING(so))
 3961                 SOCK_UNLOCK(so);
 3962         else
 3963                 SOCKBUF_UNLOCK(&so->so_rcv);
 3964 }
 3965 
 3966 static void
 3967 so_rdknl_assert_locked(void *arg)
 3968 {
 3969         struct socket *so = arg;
 3970 
 3971         if (SOLISTENING(so))
 3972                 SOCK_LOCK_ASSERT(so);
 3973         else
 3974                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 3975 }
 3976 
 3977 static void
 3978 so_rdknl_assert_unlocked(void *arg)
 3979 {
 3980         struct socket *so = arg;
 3981 
 3982         if (SOLISTENING(so))
 3983                 SOCK_UNLOCK_ASSERT(so);
 3984         else
 3985                 SOCKBUF_UNLOCK_ASSERT(&so->so_rcv);
 3986 }
 3987 
 3988 static void
 3989 so_wrknl_lock(void *arg)
 3990 {
 3991         struct socket *so = arg;
 3992 
 3993         if (SOLISTENING(so))
 3994                 SOCK_LOCK(so);
 3995         else
 3996                 SOCKBUF_LOCK(&so->so_snd);
 3997 }
 3998 
 3999 static void
 4000 so_wrknl_unlock(void *arg)
 4001 {
 4002         struct socket *so = arg;
 4003 
 4004         if (SOLISTENING(so))
 4005                 SOCK_UNLOCK(so);
 4006         else
 4007                 SOCKBUF_UNLOCK(&so->so_snd);
 4008 }
 4009 
 4010 static void
 4011 so_wrknl_assert_locked(void *arg)
 4012 {
 4013         struct socket *so = arg;
 4014 
 4015         if (SOLISTENING(so))
 4016                 SOCK_LOCK_ASSERT(so);
 4017         else
 4018                 SOCKBUF_LOCK_ASSERT(&so->so_snd);
 4019 }
 4020 
 4021 static void
 4022 so_wrknl_assert_unlocked(void *arg)
 4023 {
 4024         struct socket *so = arg;
 4025 
 4026         if (SOLISTENING(so))
 4027                 SOCK_UNLOCK_ASSERT(so);
 4028         else
 4029                 SOCKBUF_UNLOCK_ASSERT(&so->so_snd);
 4030 }
 4031 
 4032 /*
 4033  * Create an external-format (``xsocket'') structure using the information in
 4034  * the kernel-format socket structure pointed to by so.  This is done to
 4035  * reduce the spew of irrelevant information over this interface, to isolate
 4036  * user code from changes in the kernel structure, and potentially to provide
 4037  * information-hiding if we decide that some of this information should be
 4038  * hidden from users.
 4039  */
 4040 void
 4041 sotoxsocket(struct socket *so, struct xsocket *xso)
 4042 {
 4043 
 4044         bzero(xso, sizeof(*xso));
 4045         xso->xso_len = sizeof *xso;
 4046         xso->xso_so = (uintptr_t)so;
 4047         xso->so_type = so->so_type;
 4048         xso->so_options = so->so_options;
 4049         xso->so_linger = so->so_linger;
 4050         xso->so_state = so->so_state;
 4051         xso->so_pcb = (uintptr_t)so->so_pcb;
 4052         xso->xso_protocol = so->so_proto->pr_protocol;
 4053         xso->xso_family = so->so_proto->pr_domain->dom_family;
 4054         xso->so_timeo = so->so_timeo;
 4055         xso->so_error = so->so_error;
 4056         xso->so_uid = so->so_cred->cr_uid;
 4057         xso->so_pgid = so->so_sigio ? so->so_sigio->sio_pgid : 0;
 4058         if (SOLISTENING(so)) {
 4059                 xso->so_qlen = so->sol_qlen;
 4060                 xso->so_incqlen = so->sol_incqlen;
 4061                 xso->so_qlimit = so->sol_qlimit;
 4062                 xso->so_oobmark = 0;
 4063         } else {
 4064                 xso->so_state |= so->so_qstate;
 4065                 xso->so_qlen = xso->so_incqlen = xso->so_qlimit = 0;
 4066                 xso->so_oobmark = so->so_oobmark;
 4067                 sbtoxsockbuf(&so->so_snd, &xso->so_snd);
 4068                 sbtoxsockbuf(&so->so_rcv, &xso->so_rcv);
 4069         }
 4070 }
 4071 
 4072 struct sockbuf *
 4073 so_sockbuf_rcv(struct socket *so)
 4074 {
 4075 
 4076         return (&so->so_rcv);
 4077 }
 4078 
 4079 struct sockbuf *
 4080 so_sockbuf_snd(struct socket *so)
 4081 {
 4082 
 4083         return (&so->so_snd);
 4084 }
 4085 
 4086 int
 4087 so_state_get(const struct socket *so)
 4088 {
 4089 
 4090         return (so->so_state);
 4091 }
 4092 
 4093 void
 4094 so_state_set(struct socket *so, int val)
 4095 {
 4096 
 4097         so->so_state = val;
 4098 }
 4099 
 4100 int
 4101 so_options_get(const struct socket *so)
 4102 {
 4103 
 4104         return (so->so_options);
 4105 }
 4106 
 4107 void
 4108 so_options_set(struct socket *so, int val)
 4109 {
 4110 
 4111         so->so_options = val;
 4112 }
 4113 
 4114 int
 4115 so_error_get(const struct socket *so)
 4116 {
 4117 
 4118         return (so->so_error);
 4119 }
 4120 
 4121 void
 4122 so_error_set(struct socket *so, int val)
 4123 {
 4124 
 4125         so->so_error = val;
 4126 }
 4127 
 4128 int
 4129 so_linger_get(const struct socket *so)
 4130 {
 4131 
 4132         return (so->so_linger);
 4133 }
 4134 
 4135 void
 4136 so_linger_set(struct socket *so, int val)
 4137 {
 4138 
 4139         KASSERT(val >= 0 && val <= USHRT_MAX && val <= (INT_MAX / hz),
 4140             ("%s: val %d out of range", __func__, val));
 4141 
 4142         so->so_linger = val;
 4143 }
 4144 
 4145 struct protosw *
 4146 so_protosw_get(const struct socket *so)
 4147 {
 4148 
 4149         return (so->so_proto);
 4150 }
 4151 
 4152 void
 4153 so_protosw_set(struct socket *so, struct protosw *val)
 4154 {
 4155 
 4156         so->so_proto = val;
 4157 }
 4158 
 4159 void
 4160 so_sorwakeup(struct socket *so)
 4161 {
 4162 
 4163         sorwakeup(so);
 4164 }
 4165 
 4166 void
 4167 so_sowwakeup(struct socket *so)
 4168 {
 4169 
 4170         sowwakeup(so);
 4171 }
 4172 
 4173 void
 4174 so_sorwakeup_locked(struct socket *so)
 4175 {
 4176 
 4177         sorwakeup_locked(so);
 4178 }
 4179 
 4180 void
 4181 so_sowwakeup_locked(struct socket *so)
 4182 {
 4183 
 4184         sowwakeup_locked(so);
 4185 }
 4186 
 4187 void
 4188 so_lock(struct socket *so)
 4189 {
 4190 
 4191         SOCK_LOCK(so);
 4192 }
 4193 
 4194 void
 4195 so_unlock(struct socket *so)
 4196 {
 4197 
 4198         SOCK_UNLOCK(so);
 4199 }
