FreeBSD/Linux Kernel Cross Reference
sys/kern/uipc_socket.c


    1 /*-
    2  * Copyright (c) 1982, 1986, 1988, 1990, 1993
    3  *      The Regents of the University of California.
    4  * Copyright (c) 2004 The FreeBSD Foundation
    5  * Copyright (c) 2004-2007 Robert N. M. Watson
    6  * All rights reserved.
    7  *
    8  * Redistribution and use in source and binary forms, with or without
    9  * modification, are permitted provided that the following conditions
   10  * are met:
   11  * 1. Redistributions of source code must retain the above copyright
   12  *    notice, this list of conditions and the following disclaimer.
   13  * 2. Redistributions in binary form must reproduce the above copyright
   14  *    notice, this list of conditions and the following disclaimer in the
   15  *    documentation and/or other materials provided with the distribution.
   16  * 4. Neither the name of the University nor the names of its contributors
   17  *    may be used to endorse or promote products derived from this software
   18  *    without specific prior written permission.
   19  *
   20  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
   21  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   22  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
   23  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
   24  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
   25  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
   26  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
   27  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
   28  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
   29  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   30  * SUCH DAMAGE.
   31  *
   32  *      @(#)uipc_socket.c       8.3 (Berkeley) 4/15/94
   33  */
   34 
   35 /*
   36  * Comments on the socket life cycle:
   37  *
    38  * soalloc() sets up socket layer state for a socket, called only by
   39  * socreate() and sonewconn().  Socket layer private.
   40  *
   41  * sodealloc() tears down socket layer state for a socket, called only by
   42  * sofree() and sonewconn().  Socket layer private.
   43  *
   44  * pru_attach() associates protocol layer state with an allocated socket;
   45  * called only once, may fail, aborting socket allocation.  This is called
   46  * from socreate() and sonewconn().  Socket layer private.
   47  *
   48  * pru_detach() disassociates protocol layer state from an attached socket,
   49  * and will be called exactly once for sockets in which pru_attach() has
   50  * been successfully called.  If pru_attach() returned an error,
   51  * pru_detach() will not be called.  Socket layer private.
   52  *
   53  * pru_abort() and pru_close() notify the protocol layer that the last
   54  * consumer of a socket is starting to tear down the socket, and that the
   55  * protocol should terminate the connection.  Historically, pru_abort() also
   56  * detached protocol state from the socket state, but this is no longer the
   57  * case.
   58  *
   59  * socreate() creates a socket and attaches protocol state.  This is a public
   60  * interface that may be used by socket layer consumers to create new
   61  * sockets.
   62  *
   63  * sonewconn() creates a socket and attaches protocol state.  This is a
    64  * public interface that may be used by protocols to create new sockets when
   65  * a new connection is received and will be available for accept() on a
   66  * listen socket.
   67  *
   68  * soclose() destroys a socket after possibly waiting for it to disconnect.
   69  * This is a public interface that socket consumers should use to close and
   70  * release a socket when done with it.
   71  *
   72  * soabort() destroys a socket without waiting for it to disconnect (used
   73  * only for incoming connections that are already partially or fully
   74  * connected).  This is used internally by the socket layer when clearing
   75  * listen socket queues (due to overflow or close on the listen socket), but
   76  * is also a public interface protocols may use to abort connections in
   77  * their incomplete listen queues should they no longer be required.  Sockets
   78  * placed in completed connection listen queues should not be aborted for
   79  * reasons described in the comment above the soclose() implementation.  This
   80  * is not a general purpose close routine, and except in the specific
   81  * circumstances described here, should not be used.
   82  *
    83  * sofree() will free a socket and its protocol state if all references on
    84  * the socket have been released, and is the interface used to attempt to
    85  * free a socket when a reference is removed.  This is a socket layer
    86  * private interface.
   87  *
   88  * NOTE: In addition to socreate() and soclose(), which provide a single
   89  * socket reference to the consumer to be managed as required, there are two
    90  * calls to explicitly manage socket references: soref() and sorele().
   91  * Currently, these are generally required only when transitioning a socket
   92  * from a listen queue to a file descriptor, in order to prevent garbage
   93  * collection of the socket at an untimely moment.  For a number of reasons,
   94  * these interfaces are not preferred, and should be avoided.
   95  */
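
/*
 * Illustrative sketch (editorial addition, not part of the original file):
 * a minimal in-kernel consumer of the life cycle described above.  Assumes
 * a kernel thread context with valid credentials; error handling and the
 * actual use of the socket are abbreviated.
 */
#ifdef SOCKET_LIFECYCLE_EXAMPLE
static int
example_socket_lifecycle(struct thread *td)
{
	struct socket *so;
	int error;

	/* socreate() attaches protocol state; returns with so_count == 1. */
	error = socreate(AF_INET, &so, SOCK_STREAM, IPPROTO_TCP,
	    td->td_ucred, td);
	if (error != 0)
		return (error);
	/* ... sobind()/soconnect()/sosend()/soreceive() as required ... */
	/* soclose() drops the reference; sofree() runs when it hits zero. */
	return (soclose(so));
}
#endif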
   96 
   97 #include <sys/cdefs.h>
   98 __FBSDID("$FreeBSD$");
   99 
  100 #include "opt_inet.h"
  101 #include "opt_mac.h"
  102 #include "opt_zero.h"
  103 #include "opt_compat.h"
  104 
  105 #include <sys/param.h>
  106 #include <sys/systm.h>
  107 #include <sys/fcntl.h>
  108 #include <sys/limits.h>
  109 #include <sys/lock.h>
  110 #include <sys/mac.h>
  111 #include <sys/malloc.h>
  112 #include <sys/mbuf.h>
  113 #include <sys/mutex.h>
  114 #include <sys/domain.h>
  115 #include <sys/file.h>                   /* for struct knote */
  116 #include <sys/kernel.h>
  117 #include <sys/event.h>
  118 #include <sys/eventhandler.h>
  119 #include <sys/poll.h>
  120 #include <sys/proc.h>
  121 #include <sys/protosw.h>
  122 #include <sys/socket.h>
  123 #include <sys/socketvar.h>
  124 #include <sys/resourcevar.h>
  125 #include <sys/signalvar.h>
  126 #include <sys/stat.h>
  127 #include <sys/sx.h>
  128 #include <sys/sysctl.h>
  129 #include <sys/uio.h>
  130 #include <sys/jail.h>
  131 
  132 #include <security/mac/mac_framework.h>
  133 
  134 #include <vm/uma.h>
  135 
  136 #ifdef COMPAT_IA32
  137 #include <sys/mount.h>
  138 #include <compat/freebsd32/freebsd32.h>
  139 
  140 extern struct sysentvec ia32_freebsd_sysvec;
  141 #endif
  142 
  143 static int      soreceive_rcvoob(struct socket *so, struct uio *uio,
  144                     int flags);
  145 
  146 static void     filt_sordetach(struct knote *kn);
  147 static int      filt_soread(struct knote *kn, long hint);
  148 static void     filt_sowdetach(struct knote *kn);
  149 static int      filt_sowrite(struct knote *kn, long hint);
  150 static int      filt_solisten(struct knote *kn, long hint);
  151 
  152 static struct filterops solisten_filtops =
  153         { 1, NULL, filt_sordetach, filt_solisten };
  154 static struct filterops soread_filtops =
  155         { 1, NULL, filt_sordetach, filt_soread };
  156 static struct filterops sowrite_filtops =
  157         { 1, NULL, filt_sowdetach, filt_sowrite };
  158 
  159 uma_zone_t socket_zone;
  160 so_gen_t        so_gencnt;      /* generation count for sockets */
  161 
  162 int     maxsockets;
  163 
  164 MALLOC_DEFINE(M_SONAME, "soname", "socket name");
  165 MALLOC_DEFINE(M_PCB, "pcb", "protocol control block");
  166 
  167 static int somaxconn = SOMAXCONN;
  168 static int sysctl_somaxconn(SYSCTL_HANDLER_ARGS);
   169 /* XXX: we don't have SYSCTL_USHORT */
  170 SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
  171     0, sizeof(int), sysctl_somaxconn, "I", "Maximum pending socket connection "
  172     "queue size");
  173 static int numopensockets;
  174 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
  175     &numopensockets, 0, "Number of open sockets");
  176 #ifdef ZERO_COPY_SOCKETS
  177 /* These aren't static because they're used in other files. */
  178 int so_zero_copy_send = 1;
  179 int so_zero_copy_receive = 1;
  180 SYSCTL_NODE(_kern_ipc, OID_AUTO, zero_copy, CTLFLAG_RD, 0,
  181     "Zero copy controls");
  182 SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, receive, CTLFLAG_RW,
  183     &so_zero_copy_receive, 0, "Enable zero copy receive");
  184 SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, send, CTLFLAG_RW,
  185     &so_zero_copy_send, 0, "Enable zero copy send");
  186 #endif /* ZERO_COPY_SOCKETS */
  187 
  188 /*
  189  * accept_mtx locks down per-socket fields relating to accept queues.  See
  190  * socketvar.h for an annotation of the protected fields of struct socket.
  191  */
  192 struct mtx accept_mtx;
  193 MTX_SYSINIT(accept_mtx, &accept_mtx, "accept", MTX_DEF);
  194 
  195 /*
  196  * so_global_mtx protects so_gencnt, numopensockets, and the per-socket
  197  * so_gencnt field.
  198  */
  199 static struct mtx so_global_mtx;
  200 MTX_SYSINIT(so_global_mtx, &so_global_mtx, "so_glabel", MTX_DEF);
  201 
  202 /*
  203  * General IPC sysctl name space, used by sockets and a variety of other IPC
  204  * types.
  205  */
  206 SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLAG_RW, 0, "IPC");
  207 
  208 /*
  209  * Sysctl to get and set the maximum global sockets limit.  Notify protocols
  210  * of the change so that they can update their dependent limits as required.
  211  */
  212 static int
  213 sysctl_maxsockets(SYSCTL_HANDLER_ARGS)
  214 {
  215         int error, newmaxsockets;
  216 
  217         newmaxsockets = maxsockets;
  218         error = sysctl_handle_int(oidp, &newmaxsockets, 0, req);
  219         if (error == 0 && req->newptr) {
  220                 if (newmaxsockets > maxsockets) {
  221                         maxsockets = newmaxsockets;
  222                         if (maxsockets > ((maxfiles / 4) * 3)) {
  223                                 maxfiles = (maxsockets * 5) / 4;
  224                                 maxfilesperproc = (maxfiles * 9) / 10;
  225                         }
  226                         EVENTHANDLER_INVOKE(maxsockets_change);
  227                 } else
  228                         error = EINVAL;
  229         }
  230         return (error);
  231 }
  232 
  233 SYSCTL_PROC(_kern_ipc, OID_AUTO, maxsockets, CTLTYPE_INT|CTLFLAG_RW,
  234     &maxsockets, 0, sysctl_maxsockets, "IU",
  235     "Maximum number of sockets avaliable");
  236 
  237 /*
  238  * Initialise maxsockets.
  239  */
  240 static void init_maxsockets(void *ignored)
  241 {
  242         TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets);
  243         maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters));
  244 }
  245 SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL);
  246 
  247 /*
  248  * Socket operation routines.  These routines are called by the routines in
  249  * sys_socket.c or from a system process, and implement the semantics of
  250  * socket operations by switching out to the protocol specific routines.
  251  */
  252 
  253 /*
  254  * Get a socket structure from our zone, and initialize it.  Note that it
  255  * would probably be better to allocate socket and PCB at the same time, but
  256  * I'm not convinced that all the protocols can be easily modified to do
  257  * this.
  258  *
  259  * soalloc() returns a socket with a ref count of 0.
  260  */
  261 static struct socket *
  262 soalloc(void)
  263 {
  264         struct socket *so;
  265 
  266         so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);
  267         if (so == NULL)
  268                 return (NULL);
  269 #ifdef MAC
  270         if (mac_init_socket(so, M_NOWAIT) != 0) {
  271                 uma_zfree(socket_zone, so);
  272                 return (NULL);
  273         }
  274 #endif
  275         SOCKBUF_LOCK_INIT(&so->so_snd, "so_snd");
  276         SOCKBUF_LOCK_INIT(&so->so_rcv, "so_rcv");
  277         sx_init(&so->so_snd.sb_sx, "so_snd_sx");
  278         sx_init(&so->so_rcv.sb_sx, "so_rcv_sx");
  279         TAILQ_INIT(&so->so_aiojobq);
  280         mtx_lock(&so_global_mtx);
  281         so->so_gencnt = ++so_gencnt;
  282         ++numopensockets;
  283         mtx_unlock(&so_global_mtx);
  284         return (so);
  285 }
  286 
  287 /*
  288  * Free the storage associated with a socket at the socket layer, tear down
  289  * locks, labels, etc.  All protocol state is assumed already to have been
  290  * torn down (and possibly never set up) by the caller.
  291  */
  292 static void
  293 sodealloc(struct socket *so)
  294 {
  295 
  296         KASSERT(so->so_count == 0, ("sodealloc(): so_count %d", so->so_count));
  297         KASSERT(so->so_pcb == NULL, ("sodealloc(): so_pcb != NULL"));
  298 
  299         mtx_lock(&so_global_mtx);
  300         so->so_gencnt = ++so_gencnt;
  301         --numopensockets;       /* Could be below, but faster here. */
  302         mtx_unlock(&so_global_mtx);
  303         if (so->so_rcv.sb_hiwat)
  304                 (void)chgsbsize(so->so_cred->cr_uidinfo,
  305                     &so->so_rcv.sb_hiwat, 0, RLIM_INFINITY);
  306         if (so->so_snd.sb_hiwat)
  307                 (void)chgsbsize(so->so_cred->cr_uidinfo,
  308                     &so->so_snd.sb_hiwat, 0, RLIM_INFINITY);
  309 #ifdef INET
   310         /* remove accept filter if one is present. */
  311         if (so->so_accf != NULL)
  312                 do_setopt_accept_filter(so, NULL);
  313 #endif
  314 #ifdef MAC
  315         mac_destroy_socket(so);
  316 #endif
  317         crfree(so->so_cred);
  318         sx_destroy(&so->so_snd.sb_sx);
  319         sx_destroy(&so->so_rcv.sb_sx);
  320         SOCKBUF_LOCK_DESTROY(&so->so_snd);
  321         SOCKBUF_LOCK_DESTROY(&so->so_rcv);
  322         uma_zfree(socket_zone, so);
  323 }
  324 
  325 /*
  326  * socreate returns a socket with a ref count of 1.  The socket should be
  327  * closed with soclose().
  328  */
  329 int
  330 socreate(int dom, struct socket **aso, int type, int proto,
  331     struct ucred *cred, struct thread *td)
  332 {
  333         struct protosw *prp;
  334         struct socket *so;
  335         int error;
  336 
  337         if (proto)
  338                 prp = pffindproto(dom, proto, type);
  339         else
  340                 prp = pffindtype(dom, type);
  341 
  342         if (prp == NULL || prp->pr_usrreqs->pru_attach == NULL ||
  343             prp->pr_usrreqs->pru_attach == pru_attach_notsupp)
  344                 return (EPROTONOSUPPORT);
  345 
  346         if (jailed(cred) && jail_socket_unixiproute_only &&
  347             prp->pr_domain->dom_family != PF_LOCAL &&
  348             prp->pr_domain->dom_family != PF_INET &&
  349             prp->pr_domain->dom_family != PF_ROUTE) {
  350                 return (EPROTONOSUPPORT);
  351         }
  352 
  353         if (prp->pr_type != type)
  354                 return (EPROTOTYPE);
  355         so = soalloc();
  356         if (so == NULL)
  357                 return (ENOBUFS);
  358 
  359         TAILQ_INIT(&so->so_incomp);
  360         TAILQ_INIT(&so->so_comp);
  361         so->so_type = type;
  362         so->so_cred = crhold(cred);
  363         so->so_proto = prp;
  364 #ifdef MAC
  365         mac_create_socket(cred, so);
  366 #endif
  367         knlist_init(&so->so_rcv.sb_sel.si_note, SOCKBUF_MTX(&so->so_rcv),
  368             NULL, NULL, NULL);
  369         knlist_init(&so->so_snd.sb_sel.si_note, SOCKBUF_MTX(&so->so_snd),
  370             NULL, NULL, NULL);
  371         so->so_count = 1;
  372         /*
  373          * Auto-sizing of socket buffers is managed by the protocols and
  374          * the appropriate flags must be set in the pru_attach function.
  375          */
  376         error = (*prp->pr_usrreqs->pru_attach)(so, proto, td);
  377         if (error) {
  378                 KASSERT(so->so_count == 1, ("socreate: so_count %d",
  379                     so->so_count));
  380                 so->so_count = 0;
  381                 sodealloc(so);
  382                 return (error);
  383         }
  384         *aso = so;
  385         return (0);
  386 }
  387 
  388 #ifdef REGRESSION
  389 static int regression_sonewconn_earlytest = 1;
  390 SYSCTL_INT(_regression, OID_AUTO, sonewconn_earlytest, CTLFLAG_RW,
  391     &regression_sonewconn_earlytest, 0, "Perform early sonewconn limit test");
  392 #endif
  393 
  394 /*
  395  * When an attempt at a new connection is noted on a socket which accepts
  396  * connections, sonewconn is called.  If the connection is possible (subject
   397  * to space constraints, etc.) then we allocate a new structure, properly
  398  * linked into the data structure of the original socket, and return this.
   399  * The connstatus argument may be 0, SS_ISCONFIRMING, or SS_ISCONNECTED.
  400  *
  401  * Note: the ref count on the socket is 0 on return.
  402  */
  403 struct socket *
  404 sonewconn(struct socket *head, int connstatus)
  405 {
  406         struct socket *so;
  407         int over;
  408 
  409         ACCEPT_LOCK();
  410         over = (head->so_qlen > 3 * head->so_qlimit / 2);
  411         ACCEPT_UNLOCK();
  412 #ifdef REGRESSION
  413         if (regression_sonewconn_earlytest && over)
  414 #else
  415         if (over)
  416 #endif
  417                 return (NULL);
  418         so = soalloc();
  419         if (so == NULL)
  420                 return (NULL);
  421         if ((head->so_options & SO_ACCEPTFILTER) != 0)
  422                 connstatus = 0;
  423         so->so_head = head;
  424         so->so_type = head->so_type;
  425         so->so_options = head->so_options &~ SO_ACCEPTCONN;
  426         so->so_linger = head->so_linger;
  427         so->so_state = head->so_state | SS_NOFDREF;
  428         so->so_proto = head->so_proto;
  429         so->so_cred = crhold(head->so_cred);
  430 #ifdef MAC
  431         SOCK_LOCK(head);
  432         mac_create_socket_from_socket(head, so);
  433         SOCK_UNLOCK(head);
  434 #endif
  435         knlist_init(&so->so_rcv.sb_sel.si_note, SOCKBUF_MTX(&so->so_rcv),
  436             NULL, NULL, NULL);
  437         knlist_init(&so->so_snd.sb_sel.si_note, SOCKBUF_MTX(&so->so_snd),
  438             NULL, NULL, NULL);
  439         if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat) ||
  440             (*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
  441                 sodealloc(so);
  442                 return (NULL);
  443         }
  444         so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
  445         so->so_snd.sb_lowat = head->so_snd.sb_lowat;
  446         so->so_rcv.sb_timeo = head->so_rcv.sb_timeo;
  447         so->so_snd.sb_timeo = head->so_snd.sb_timeo;
  448         so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;
  449         so->so_snd.sb_flags |= head->so_snd.sb_flags & SB_AUTOSIZE;
  450         so->so_state |= connstatus;
  451         ACCEPT_LOCK();
  452         if (connstatus) {
  453                 TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
  454                 so->so_qstate |= SQ_COMP;
  455                 head->so_qlen++;
  456         } else {
  457                 /*
  458                  * Keep removing sockets from the head until there's room for
  459                  * us to insert on the tail.  In pre-locking revisions, this
  460                  * was a simple if(), but as we could be racing with other
  461                  * threads and soabort() requires dropping locks, we must
  462                  * loop waiting for the condition to be true.
  463                  */
  464                 while (head->so_incqlen > head->so_qlimit) {
  465                         struct socket *sp;
  466                         sp = TAILQ_FIRST(&head->so_incomp);
  467                         TAILQ_REMOVE(&head->so_incomp, sp, so_list);
  468                         head->so_incqlen--;
  469                         sp->so_qstate &= ~SQ_INCOMP;
  470                         sp->so_head = NULL;
  471                         ACCEPT_UNLOCK();
  472                         soabort(sp);
  473                         ACCEPT_LOCK();
  474                 }
  475                 TAILQ_INSERT_TAIL(&head->so_incomp, so, so_list);
  476                 so->so_qstate |= SQ_INCOMP;
  477                 head->so_incqlen++;
  478         }
  479         ACCEPT_UNLOCK();
  480         if (connstatus) {
  481                 sorwakeup(head);
  482                 wakeup_one(&head->so_timeo);
  483         }
  484         return (so);
  485 }
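
/*
 * Illustrative sketch (editorial addition): how a protocol might drive
 * sonewconn() when a connection request arrives on a listen socket,
 * loosely modeled on the TCP passive-open path; protocol details are
 * elided and the function name is hypothetical.
 */
#ifdef SONEWCONN_EXAMPLE
static struct socket *
example_passive_open(struct socket *head)
{
	struct socket *so;

	/* NULL means the listen queue is full; note so_count is 0 here. */
	so = sonewconn(head, 0);
	if (so == NULL)
		return (NULL);
	/*
	 * Once the handshake completes, soisconnected() marks the socket
	 * connected and moves it from the incomplete to the complete queue.
	 */
	soisconnected(so);
	return (so);
}
#endif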
  486 
  487 int
  488 sobind(struct socket *so, struct sockaddr *nam, struct thread *td)
  489 {
  490 
  491         return ((*so->so_proto->pr_usrreqs->pru_bind)(so, nam, td));
  492 }
  493 
  494 /*
  495  * solisten() transitions a socket from a non-listening state to a listening
  496  * state, but can also be used to update the listen queue depth on an
  497  * existing listen socket.  The protocol will call back into the sockets
  498  * layer using solisten_proto_check() and solisten_proto() to check and set
  499  * socket-layer listen state.  Call backs are used so that the protocol can
  500  * acquire both protocol and socket layer locks in whatever order is required
  501  * by the protocol.
  502  *
  503  * Protocol implementors are advised to hold the socket lock across the
  504  * socket-layer test and set to avoid races at the socket layer.
  505  */
  506 int
  507 solisten(struct socket *so, int backlog, struct thread *td)
  508 {
  509 
  510         return ((*so->so_proto->pr_usrreqs->pru_listen)(so, backlog, td));
  511 }
  512 
  513 int
  514 solisten_proto_check(struct socket *so)
  515 {
  516 
  517         SOCK_LOCK_ASSERT(so);
  518 
  519         if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING |
  520             SS_ISDISCONNECTING))
  521                 return (EINVAL);
  522         return (0);
  523 }
  524 
  525 void
  526 solisten_proto(struct socket *so, int backlog)
  527 {
  528 
  529         SOCK_LOCK_ASSERT(so);
  530 
  531         if (backlog < 0 || backlog > somaxconn)
  532                 backlog = somaxconn;
  533         so->so_qlimit = backlog;
  534         so->so_options |= SO_ACCEPTCONN;
  535 }
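
/*
 * Illustrative sketch (editorial addition): the check/set callback pattern
 * described above, as a protocol's pru_listen implementation might use it.
 * A real protocol would also hold its own lock (e.g. the inpcb lock in
 * TCP) around this sequence; the function name is hypothetical.
 */
#ifdef SOLISTEN_EXAMPLE
static int
example_pru_listen(struct socket *so, int backlog, struct thread *td)
{
	int error;

	SOCK_LOCK(so);
	error = solisten_proto_check(so);
	if (error == 0)
		solisten_proto(so, backlog);
	SOCK_UNLOCK(so);
	return (error);
}
#endif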
  536 
  537 /*
  538  * Attempt to free a socket.  This should really be sotryfree().
  539  *
  540  * sofree() will succeed if:
  541  *
  542  * - There are no outstanding file descriptor references or related consumers
  543  *   (so_count == 0).
  544  *
  545  * - The socket has been closed by user space, if ever open (SS_NOFDREF).
  546  *
  547  * - The protocol does not have an outstanding strong reference on the socket
  548  *   (SS_PROTOREF).
  549  *
   550  * - The socket is not in a completed connection queue, where a process may
   551  *   already have been notified of its presence; removing it there could
   552  *   leave the process blocked in accept() despite select() reporting ready.
  553  *
  554  * Otherwise, it will quietly abort so that a future call to sofree(), when
  555  * conditions are right, can succeed.
  556  */
  557 void
  558 sofree(struct socket *so)
  559 {
  560         struct protosw *pr = so->so_proto;
  561         struct socket *head;
  562 
  563         ACCEPT_LOCK_ASSERT();
  564         SOCK_LOCK_ASSERT(so);
  565 
  566         if ((so->so_state & SS_NOFDREF) == 0 || so->so_count != 0 ||
  567             (so->so_state & SS_PROTOREF) || (so->so_qstate & SQ_COMP)) {
  568                 SOCK_UNLOCK(so);
  569                 ACCEPT_UNLOCK();
  570                 return;
  571         }
  572 
  573         head = so->so_head;
  574         if (head != NULL) {
  575                 KASSERT((so->so_qstate & SQ_COMP) != 0 ||
  576                     (so->so_qstate & SQ_INCOMP) != 0,
  577                     ("sofree: so_head != NULL, but neither SQ_COMP nor "
  578                     "SQ_INCOMP"));
  579                 KASSERT((so->so_qstate & SQ_COMP) == 0 ||
  580                     (so->so_qstate & SQ_INCOMP) == 0,
  581                     ("sofree: so->so_qstate is SQ_COMP and also SQ_INCOMP"));
  582                 TAILQ_REMOVE(&head->so_incomp, so, so_list);
  583                 head->so_incqlen--;
  584                 so->so_qstate &= ~SQ_INCOMP;
  585                 so->so_head = NULL;
  586         }
  587         KASSERT((so->so_qstate & SQ_COMP) == 0 &&
  588             (so->so_qstate & SQ_INCOMP) == 0,
  589             ("sofree: so_head == NULL, but still SQ_COMP(%d) or SQ_INCOMP(%d)",
  590             so->so_qstate & SQ_COMP, so->so_qstate & SQ_INCOMP));
  591         if (so->so_options & SO_ACCEPTCONN) {
  592                 KASSERT((TAILQ_EMPTY(&so->so_comp)), ("sofree: so_comp populated"));
   593                 KASSERT((TAILQ_EMPTY(&so->so_incomp)), ("sofree: so_incomp populated"));
  594         }
  595         SOCK_UNLOCK(so);
  596         ACCEPT_UNLOCK();
  597 
  598         if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
  599                 (*pr->pr_domain->dom_dispose)(so->so_rcv.sb_mb);
  600         if (pr->pr_usrreqs->pru_detach != NULL)
  601                 (*pr->pr_usrreqs->pru_detach)(so);
  602 
  603         /*
  604          * From this point on, we assume that no other references to this
  605          * socket exist anywhere else in the stack.  Therefore, no locks need
  606          * to be acquired or held.
  607          *
  608          * We used to do a lot of socket buffer and socket locking here, as
  609          * well as invoke sorflush() and perform wakeups.  The direct call to
  610          * dom_dispose() and sbrelease_internal() are an inlining of what was
  611          * necessary from sorflush().
  612          *
  613          * Notice that the socket buffer and kqueue state are torn down
   614          * before calling pru_detach().  This means that protocols should not
   615          * assume they can perform socket wakeups, etc., in their detach code.
  616          */
  617         sbdestroy(&so->so_snd, so);
  618         sbdestroy(&so->so_rcv, so);
  619         knlist_destroy(&so->so_rcv.sb_sel.si_note);
  620         knlist_destroy(&so->so_snd.sb_sel.si_note);
  621         sodealloc(so);
  622 }
  623 
  624 /*
  625  * Close a socket on last file table reference removal.  Initiate disconnect
  626  * if connected.  Free socket when disconnect complete.
  627  *
  628  * This function will sorele() the socket.  Note that soclose() may be called
  629  * prior to the ref count reaching zero.  The actual socket structure will
  630  * not be freed until the ref count reaches zero.
  631  */
  632 int
  633 soclose(struct socket *so)
  634 {
  635         int error = 0;
  636 
  637         KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));
  638 
  639         funsetown(&so->so_sigio);
  640         if (so->so_state & SS_ISCONNECTED) {
  641                 if ((so->so_state & SS_ISDISCONNECTING) == 0) {
  642                         error = sodisconnect(so);
  643                         if (error)
  644                                 goto drop;
  645                 }
  646                 if (so->so_options & SO_LINGER) {
  647                         if ((so->so_state & SS_ISDISCONNECTING) &&
  648                             (so->so_state & SS_NBIO))
  649                                 goto drop;
  650                         while (so->so_state & SS_ISCONNECTED) {
  651                                 error = tsleep(&so->so_timeo,
  652                                     PSOCK | PCATCH, "soclos", so->so_linger * hz);
  653                                 if (error)
  654                                         break;
  655                         }
  656                 }
  657         }
  658 
  659 drop:
  660         if (so->so_proto->pr_usrreqs->pru_close != NULL)
  661                 (*so->so_proto->pr_usrreqs->pru_close)(so);
  662         if (so->so_options & SO_ACCEPTCONN) {
  663                 struct socket *sp;
  664                 ACCEPT_LOCK();
  665                 while ((sp = TAILQ_FIRST(&so->so_incomp)) != NULL) {
  666                         TAILQ_REMOVE(&so->so_incomp, sp, so_list);
  667                         so->so_incqlen--;
  668                         sp->so_qstate &= ~SQ_INCOMP;
  669                         sp->so_head = NULL;
  670                         ACCEPT_UNLOCK();
  671                         soabort(sp);
  672                         ACCEPT_LOCK();
  673                 }
  674                 while ((sp = TAILQ_FIRST(&so->so_comp)) != NULL) {
  675                         TAILQ_REMOVE(&so->so_comp, sp, so_list);
  676                         so->so_qlen--;
  677                         sp->so_qstate &= ~SQ_COMP;
  678                         sp->so_head = NULL;
  679                         ACCEPT_UNLOCK();
  680                         soabort(sp);
  681                         ACCEPT_LOCK();
  682                 }
  683                 ACCEPT_UNLOCK();
  684         }
  685         ACCEPT_LOCK();
  686         SOCK_LOCK(so);
  687         KASSERT((so->so_state & SS_NOFDREF) == 0, ("soclose: NOFDREF"));
  688         so->so_state |= SS_NOFDREF;
  689         sorele(so);
  690         return (error);
  691 }
  692 
  693 /*
  694  * soabort() is used to abruptly tear down a connection, such as when a
  695  * resource limit is reached (listen queue depth exceeded), or if a listen
  696  * socket is closed while there are sockets waiting to be accepted.
  697  *
  698  * This interface is tricky, because it is called on an unreferenced socket,
  699  * and must be called only by a thread that has actually removed the socket
  700  * from the listen queue it was on, or races with other threads are risked.
  701  *
  702  * This interface will call into the protocol code, so must not be called
  703  * with any socket locks held.  Protocols do call it while holding their own
  704  * recursible protocol mutexes, but this is something that should be subject
  705  * to review in the future.
  706  */
  707 void
  708 soabort(struct socket *so)
  709 {
  710 
  711         /*
   712          * As far as is possible, assert that no references to this
  713          * socket are held.  This is not quite the same as asserting that the
  714          * current thread is responsible for arranging for no references, but
  715          * is as close as we can get for now.
  716          */
  717         KASSERT(so->so_count == 0, ("soabort: so_count"));
  718         KASSERT((so->so_state & SS_PROTOREF) == 0, ("soabort: SS_PROTOREF"));
  719         KASSERT(so->so_state & SS_NOFDREF, ("soabort: !SS_NOFDREF"));
   720         KASSERT((so->so_qstate & SQ_COMP) == 0, ("soabort: SQ_COMP"));
   721         KASSERT((so->so_qstate & SQ_INCOMP) == 0, ("soabort: SQ_INCOMP"));
  722 
  723         if (so->so_proto->pr_usrreqs->pru_abort != NULL)
  724                 (*so->so_proto->pr_usrreqs->pru_abort)(so);
  725         ACCEPT_LOCK();
  726         SOCK_LOCK(so);
  727         sofree(so);
  728 }
  729 
  730 int
  731 soaccept(struct socket *so, struct sockaddr **nam)
  732 {
  733         int error;
  734 
  735         SOCK_LOCK(so);
  736         KASSERT((so->so_state & SS_NOFDREF) != 0, ("soaccept: !NOFDREF"));
  737         so->so_state &= ~SS_NOFDREF;
  738         SOCK_UNLOCK(so);
  739         error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
  740         return (error);
  741 }
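
/*
 * Illustrative sketch (editorial addition): the soref()/sorele() pattern
 * mentioned in the life-cycle notes, as used when handing a socket from a
 * listen queue off to a file descriptor (compare the accept path in
 * uipc_syscalls.c); the function name is hypothetical.
 */
#ifdef SOREF_EXAMPLE
static void
example_ref_handoff(struct socket *so)
{

	SOCK_LOCK(so);
	soref(so);		/* Hold the socket across the handoff. */
	SOCK_UNLOCK(so);
	/* ... install the socket in a file descriptor ... */
	ACCEPT_LOCK();
	SOCK_LOCK(so);
	sorele(so);		/* Drops both locks; may free the socket. */
}
#endif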
  742 
  743 int
  744 soconnect(struct socket *so, struct sockaddr *nam, struct thread *td)
  745 {
  746         int error;
  747 
  748         if (so->so_options & SO_ACCEPTCONN)
  749                 return (EOPNOTSUPP);
  750         /*
  751          * If protocol is connection-based, can only connect once.
  752          * Otherwise, if connected, try to disconnect first.  This allows
  753          * user to disconnect by connecting to, e.g., a null address.
  754          */
  755         if (so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING) &&
  756             ((so->so_proto->pr_flags & PR_CONNREQUIRED) ||
  757             (error = sodisconnect(so)))) {
  758                 error = EISCONN;
  759         } else {
  760                 /*
  761                  * Prevent accumulated error from previous connection from
  762                  * biting us.
  763                  */
  764                 so->so_error = 0;
  765                 error = (*so->so_proto->pr_usrreqs->pru_connect)(so, nam, td);
  766         }
  767 
  768         return (error);
  769 }
  770 
  771 int
  772 soconnect2(struct socket *so1, struct socket *so2)
  773 {
  774 
  775         return ((*so1->so_proto->pr_usrreqs->pru_connect2)(so1, so2));
  776 }
  777 
  778 int
  779 sodisconnect(struct socket *so)
  780 {
  781         int error;
  782 
  783         if ((so->so_state & SS_ISCONNECTED) == 0)
  784                 return (ENOTCONN);
  785         if (so->so_state & SS_ISDISCONNECTING)
  786                 return (EALREADY);
  787         error = (*so->so_proto->pr_usrreqs->pru_disconnect)(so);
  788         return (error);
  789 }
  790 
  791 #ifdef ZERO_COPY_SOCKETS
  792 struct so_zerocopy_stats{
  793         int size_ok;
  794         int align_ok;
  795         int found_ifp;
  796 };
  797 struct so_zerocopy_stats so_zerocp_stats = {0,0,0};
  798 #include <netinet/in.h>
  799 #include <net/route.h>
  800 #include <netinet/in_pcb.h>
  801 #include <vm/vm.h>
  802 #include <vm/vm_page.h>
  803 #include <vm/vm_object.h>
  804 
  805 /*
  806  * sosend_copyin() is only used if zero copy sockets are enabled.  Otherwise
  807  * sosend_dgram() and sosend_generic() use m_uiotombuf().
  808  * 
  809  * sosend_copyin() accepts a uio and prepares an mbuf chain holding part or
  810  * all of the data referenced by the uio.  If desired, it uses zero-copy.
  811  * *space will be updated to reflect data copied in.
  812  *
  813  * NB: If atomic I/O is requested, the caller must already have checked that
  814  * space can hold resid bytes.
  815  *
  816  * NB: In the event of an error, the caller may need to free the partial
  817  * chain pointed to by *mpp.  The contents of both *uio and *space may be
  818  * modified even in the case of an error.
  819  */
  820 static int
  821 sosend_copyin(struct uio *uio, struct mbuf **retmp, int atomic, long *space,
  822     int flags)
  823 {
  824         struct mbuf *m, **mp, *top;
  825         long len, resid;
  826         int error;
  827 #ifdef ZERO_COPY_SOCKETS
  828         int cow_send;
  829 #endif
  830 
  831         *retmp = top = NULL;
  832         mp = &top;
  833         len = 0;
  834         resid = uio->uio_resid;
  835         error = 0;
  836         do {
  837 #ifdef ZERO_COPY_SOCKETS
  838                 cow_send = 0;
  839 #endif /* ZERO_COPY_SOCKETS */
  840                 if (resid >= MINCLSIZE) {
  841 #ifdef ZERO_COPY_SOCKETS
  842                         if (top == NULL) {
  843                                 m = m_gethdr(M_WAITOK, MT_DATA);
  844                                 m->m_pkthdr.len = 0;
  845                                 m->m_pkthdr.rcvif = NULL;
  846                         } else
  847                                 m = m_get(M_WAITOK, MT_DATA);
  848                         if (so_zero_copy_send &&
  849                             resid>=PAGE_SIZE &&
  850                             *space>=PAGE_SIZE &&
  851                             uio->uio_iov->iov_len>=PAGE_SIZE) {
  852                                 so_zerocp_stats.size_ok++;
  853                                 so_zerocp_stats.align_ok++;
  854                                 cow_send = socow_setup(m, uio);
  855                                 len = cow_send;
  856                         }
  857                         if (!cow_send) {
  858                                 m_clget(m, M_WAITOK);
  859                                 len = min(min(MCLBYTES, resid), *space);
  860                         }
  861 #else /* ZERO_COPY_SOCKETS */
  862                         if (top == NULL) {
  863                                 m = m_getcl(M_TRYWAIT, MT_DATA, M_PKTHDR);
  864                                 m->m_pkthdr.len = 0;
  865                                 m->m_pkthdr.rcvif = NULL;
  866                         } else
  867                                 m = m_getcl(M_TRYWAIT, MT_DATA, 0);
  868                         len = min(min(MCLBYTES, resid), *space);
  869 #endif /* ZERO_COPY_SOCKETS */
  870                 } else {
  871                         if (top == NULL) {
  872                                 m = m_gethdr(M_TRYWAIT, MT_DATA);
  873                                 m->m_pkthdr.len = 0;
  874                                 m->m_pkthdr.rcvif = NULL;
  875 
  876                                 len = min(min(MHLEN, resid), *space);
  877                                 /*
  878                                  * For datagram protocols, leave room
  879                                  * for protocol headers in first mbuf.
  880                                  */
  881                                 if (atomic && m && len < MHLEN)
  882                                         MH_ALIGN(m, len);
  883                         } else {
  884                                 m = m_get(M_TRYWAIT, MT_DATA);
  885                                 len = min(min(MLEN, resid), *space);
  886                         }
  887                 }
  888                 if (m == NULL) {
  889                         error = ENOBUFS;
  890                         goto out;
  891                 }
  892 
  893                 *space -= len;
  894 #ifdef ZERO_COPY_SOCKETS
  895                 if (cow_send)
  896                         error = 0;
  897                 else
  898 #endif /* ZERO_COPY_SOCKETS */
  899                 error = uiomove(mtod(m, void *), (int)len, uio);
  900                 resid = uio->uio_resid;
  901                 m->m_len = len;
  902                 *mp = m;
  903                 top->m_pkthdr.len += len;
  904                 if (error)
  905                         goto out;
  906                 mp = &m->m_next;
  907                 if (resid <= 0) {
  908                         if (flags & MSG_EOR)
  909                                 top->m_flags |= M_EOR;
  910                         break;
  911                 }
  912         } while (*space > 0 && atomic);
  913 out:
  914         *retmp = top;
  915         return (error);
  916 }
  917 #endif /*ZERO_COPY_SOCKETS*/
  918 
  919 #define SBLOCKWAIT(f)   (((f) & MSG_DONTWAIT) ? 0 : SBL_WAIT)
  920 
  921 int
  922 sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
  923     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
  924 {
  925         long space, resid;
  926         int clen = 0, error, dontroute;
  927 #ifdef ZERO_COPY_SOCKETS
  928         int atomic = sosendallatonce(so) || top;
  929 #endif
  930 
   931         KASSERT(so->so_type == SOCK_DGRAM, ("sosend_dgram: !SOCK_DGRAM"));
   932         KASSERT(so->so_proto->pr_flags & PR_ATOMIC,
   933             ("sosend_dgram: !PR_ATOMIC"));
  934 
  935         if (uio != NULL)
  936                 resid = uio->uio_resid;
  937         else
  938                 resid = top->m_pkthdr.len;
  939         /*
  940          * In theory resid should be unsigned.  However, space must be
  941          * signed, as it might be less than 0 if we over-committed, and we
  942          * must use a signed comparison of space and resid.  On the other
  943          * hand, a negative resid causes us to loop sending 0-length
  944          * segments to the protocol.
  945          *
   946          * The MSG_EOR check for SOCK_STREAM sockets done in sosend_generic()
   947          * does not apply here, since this path handles only SOCK_DGRAM.
  948          */
  949         if (resid < 0) {
  950                 error = EINVAL;
  951                 goto out;
  952         }
  953 
  954         dontroute =
  955             (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0;
  956         if (td != NULL)
  957                 td->td_ru.ru_msgsnd++;
  958         if (control != NULL)
  959                 clen = control->m_len;
  960 
  961         SOCKBUF_LOCK(&so->so_snd);
  962         if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
  963                 SOCKBUF_UNLOCK(&so->so_snd);
  964                 error = EPIPE;
  965                 goto out;
  966         }
  967         if (so->so_error) {
  968                 error = so->so_error;
  969                 so->so_error = 0;
  970                 SOCKBUF_UNLOCK(&so->so_snd);
  971                 goto out;
  972         }
  973         if ((so->so_state & SS_ISCONNECTED) == 0) {
  974                 /*
   975                  * `sendto' and `sendmsg' are allowed on a connection-based
  976                  * socket if it supports implied connect.  Return ENOTCONN if
  977                  * not connected and no address is supplied.
  978                  */
  979                 if ((so->so_proto->pr_flags & PR_CONNREQUIRED) &&
  980                     (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) {
  981                         if ((so->so_state & SS_ISCONFIRMING) == 0 &&
  982                             !(resid == 0 && clen != 0)) {
  983                                 SOCKBUF_UNLOCK(&so->so_snd);
  984                                 error = ENOTCONN;
  985                                 goto out;
  986                         }
  987                 } else if (addr == NULL) {
  988                         if (so->so_proto->pr_flags & PR_CONNREQUIRED)
  989                                 error = ENOTCONN;
  990                         else
  991                                 error = EDESTADDRREQ;
  992                         SOCKBUF_UNLOCK(&so->so_snd);
  993                         goto out;
  994                 }
  995         }
  996 
  997         /*
   998          * Do we need MSG_OOB support in SOCK_DGRAM?  The handling here
   999          * may be a problem and need fixing.
 1000          */
 1001         space = sbspace(&so->so_snd);
 1002         if (flags & MSG_OOB)
 1003                 space += 1024;
 1004         space -= clen;
 1005         SOCKBUF_UNLOCK(&so->so_snd);
 1006         if (resid > space) {
 1007                 error = EMSGSIZE;
 1008                 goto out;
 1009         }
 1010         if (uio == NULL) {
 1011                 resid = 0;
 1012                 if (flags & MSG_EOR)
 1013                         top->m_flags |= M_EOR;
 1014         } else {
 1015 #ifdef ZERO_COPY_SOCKETS
 1016                 error = sosend_copyin(uio, &top, atomic, &space, flags);
 1017                 if (error)
 1018                         goto out;
 1019 #else
 1020                 /*
 1021                  * Copy the data from userland into a mbuf chain.
 1022                  * If no data is to be copied in, a single empty mbuf
 1023                  * is returned.
 1024                  */
 1025                 top = m_uiotombuf(uio, M_WAITOK, space, max_hdr,
 1026                     (M_PKTHDR | ((flags & MSG_EOR) ? M_EOR : 0)));
 1027                 if (top == NULL) {
 1028                         error = EFAULT; /* only possible error */
 1029                         goto out;
 1030                 }
 1031                 space -= resid - uio->uio_resid;
 1032 #endif
 1033                 resid = uio->uio_resid;
 1034         }
 1035         KASSERT(resid == 0, ("sosend_dgram: resid != 0"));
 1036         /*
 1037          * XXXRW: Frobbing SO_DONTROUTE here is even worse without sblock
 1038          * than with.
 1039          */
 1040         if (dontroute) {
 1041                 SOCK_LOCK(so);
 1042                 so->so_options |= SO_DONTROUTE;
 1043                 SOCK_UNLOCK(so);
 1044         }
 1045         /*
 1046          * XXX all the SBS_CANTSENDMORE checks previously done could be out
  1047          * of date.  We could have received a reset packet in an interrupt or
 1048          * maybe we slept while doing page faults in uiomove() etc.  We could
 1049          * probably recheck again inside the locking protection here, but
 1050          * there are probably other places that this also happens.  We must
 1051          * rethink this.
 1052          */
 1053         error = (*so->so_proto->pr_usrreqs->pru_send)(so,
 1054             (flags & MSG_OOB) ? PRUS_OOB :
 1055         /*
  1056          * If the user set MSG_EOF, the protocol understands this flag, and
  1057          * there is nothing left to send, use PRUS_EOF instead of PRUS_SEND.
 1058          */
 1059             ((flags & MSG_EOF) &&
 1060              (so->so_proto->pr_flags & PR_IMPLOPCL) &&
 1061              (resid <= 0)) ?
 1062                 PRUS_EOF :
 1063                 /* If there is more to send set PRUS_MORETOCOME */
 1064                 (resid > 0 && space > 0) ? PRUS_MORETOCOME : 0,
 1065                 top, addr, control, td);
 1066         if (dontroute) {
 1067                 SOCK_LOCK(so);
 1068                 so->so_options &= ~SO_DONTROUTE;
 1069                 SOCK_UNLOCK(so);
 1070         }
 1071         clen = 0;
 1072         control = NULL;
 1073         top = NULL;
 1074 out:
 1075         if (top != NULL)
 1076                 m_freem(top);
 1077         if (control != NULL)
 1078                 m_freem(control);
 1079         return (error);
 1080 }
 1081 
 1082 /*
 1083  * Send on a socket.  If send must go all at once and message is larger than
 1084  * send buffering, then hard error.  Lock against other senders.  If must go
 1085  * all at once and not enough room now, then inform user that this would
 1086  * block and do nothing.  Otherwise, if nonblocking, send as much as
 1087  * possible.  The data to be sent is described by "uio" if nonzero, otherwise
 1088  * by the mbuf chain "top" (which must be null if uio is not).  Data provided
 1089  * in mbuf chain must be small enough to send all at once.
 1090  *
 1091  * Returns nonzero on error, timeout or signal; callers must check for short
 1092  * counts if EINTR/ERESTART are returned.  Data and control buffers are freed
 1093  * on return.
 1094  */
 1095 int
 1096 sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
 1097     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
 1098 {
 1099         long space, resid;
 1100         int clen = 0, error, dontroute;
 1101         int atomic = sosendallatonce(so) || top;
 1102 
 1103         if (uio != NULL)
 1104                 resid = uio->uio_resid;
 1105         else
 1106                 resid = top->m_pkthdr.len;
 1107         /*
 1108          * In theory resid should be unsigned.  However, space must be
 1109          * signed, as it might be less than 0 if we over-committed, and we
 1110          * must use a signed comparison of space and resid.  On the other
 1111          * hand, a negative resid causes us to loop sending 0-length
 1112          * segments to the protocol.
 1113          *
 1114          * Also check to make sure that MSG_EOR isn't used on SOCK_STREAM
 1115          * type sockets since that's an error.
 1116          */
 1117         if (resid < 0 || (so->so_type == SOCK_STREAM && (flags & MSG_EOR))) {
 1118                 error = EINVAL;
 1119                 goto out;
 1120         }
 1121 
 1122         dontroute =
 1123             (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0 &&
 1124             (so->so_proto->pr_flags & PR_ATOMIC);
 1125         if (td != NULL)
 1126                 td->td_ru.ru_msgsnd++;
 1127         if (control != NULL)
 1128                 clen = control->m_len;
 1129 
 1130         error = sblock(&so->so_snd, SBLOCKWAIT(flags));
 1131         if (error)
 1132                 goto out;
 1133 
 1134 restart:
 1135         do {
 1136                 SOCKBUF_LOCK(&so->so_snd);
 1137                 if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
 1138                         SOCKBUF_UNLOCK(&so->so_snd);
 1139                         error = EPIPE;
 1140                         goto release;
 1141                 }
 1142                 if (so->so_error) {
 1143                         error = so->so_error;
 1144                         so->so_error = 0;
 1145                         SOCKBUF_UNLOCK(&so->so_snd);
 1146                         goto release;
 1147                 }
 1148                 if ((so->so_state & SS_ISCONNECTED) == 0) {
 1149                         /*
  1150                          * `sendto' and `sendmsg' are allowed on a connection-
 1151                          * based socket if it supports implied connect.
 1152                          * Return ENOTCONN if not connected and no address is
 1153                          * supplied.
 1154                          */
 1155                         if ((so->so_proto->pr_flags & PR_CONNREQUIRED) &&
 1156                             (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) {
 1157                                 if ((so->so_state & SS_ISCONFIRMING) == 0 &&
 1158                                     !(resid == 0 && clen != 0)) {
 1159                                         SOCKBUF_UNLOCK(&so->so_snd);
 1160                                         error = ENOTCONN;
 1161                                         goto release;
 1162                                 }
 1163                         } else if (addr == NULL) {
 1164                                 SOCKBUF_UNLOCK(&so->so_snd);
 1165                                 if (so->so_proto->pr_flags & PR_CONNREQUIRED)
 1166                                         error = ENOTCONN;
 1167                                 else
 1168                                         error = EDESTADDRREQ;
 1169                                 goto release;
 1170                         }
 1171                 }
 1172                 space = sbspace(&so->so_snd);
 1173                 if (flags & MSG_OOB)
 1174                         space += 1024;
 1175                 if ((atomic && resid > so->so_snd.sb_hiwat) ||
 1176                     clen > so->so_snd.sb_hiwat) {
 1177                         SOCKBUF_UNLOCK(&so->so_snd);
 1178                         error = EMSGSIZE;
 1179                         goto release;
 1180                 }
 1181                 if (space < resid + clen &&
 1182                     (atomic || space < so->so_snd.sb_lowat || space < clen)) {
 1183                         if ((so->so_state & SS_NBIO) || (flags & MSG_NBIO)) {
 1184                                 SOCKBUF_UNLOCK(&so->so_snd);
 1185                                 error = EWOULDBLOCK;
 1186                                 goto release;
 1187                         }
 1188                         error = sbwait(&so->so_snd);
 1189                         SOCKBUF_UNLOCK(&so->so_snd);
 1190                         if (error)
 1191                                 goto release;
 1192                         goto restart;
 1193                 }
 1194                 SOCKBUF_UNLOCK(&so->so_snd);
 1195                 space -= clen;
 1196                 do {
 1197                         if (uio == NULL) {
 1198                                 resid = 0;
 1199                                 if (flags & MSG_EOR)
 1200                                         top->m_flags |= M_EOR;
 1201                         } else {
 1202 #ifdef ZERO_COPY_SOCKETS
 1203                                 error = sosend_copyin(uio, &top, atomic,
 1204                                     &space, flags);
 1205                                 if (error != 0)
 1206                                         goto release;
 1207 #else
 1208                                 /*
 1209                                  * Copy the data from userland into a mbuf
 1210                                  * chain.  If no data is to be copied in,
 1211                                  * a single empty mbuf is returned.
 1212                                  */
 1213                                 top = m_uiotombuf(uio, M_WAITOK, space,
 1214                                     (atomic ? max_hdr : 0),
 1215                                     (atomic ? M_PKTHDR : 0) |
 1216                                     ((flags & MSG_EOR) ? M_EOR : 0));
 1217                                 if (top == NULL) {
 1218                                         error = EFAULT; /* only possible error */
 1219                                         goto release;
 1220                                 }
 1221                                 space -= resid - uio->uio_resid;
 1222 #endif
 1223                                 resid = uio->uio_resid;
 1224                         }
 1225                         if (dontroute) {
 1226                                 SOCK_LOCK(so);
 1227                                 so->so_options |= SO_DONTROUTE;
 1228                                 SOCK_UNLOCK(so);
 1229                         }
 1230                         /*
 1231                          * XXX all the SBS_CANTSENDMORE checks previously
  1232                          * done could be out of date.  We could have received
 1233                          * a reset packet in an interrupt or maybe we slept
 1234                          * while doing page faults in uiomove() etc.  We
 1235                          * could probably recheck again inside the locking
 1236                          * protection here, but there are probably other
 1237                          * places that this also happens.  We must rethink
 1238                          * this.
 1239                          */
 1240                         error = (*so->so_proto->pr_usrreqs->pru_send)(so,
 1241                             (flags & MSG_OOB) ? PRUS_OOB :
 1242                         /*
 1243                          * If the user set MSG_EOF, the protocol understands
 1244                          * this flag and nothing left to send then use
 1245                          * PRU_SEND_EOF instead of PRU_SEND.
 1246                          */
 1247                             ((flags & MSG_EOF) &&
 1248                              (so->so_proto->pr_flags & PR_IMPLOPCL) &&
 1249                              (resid <= 0)) ?
 1250                                 PRUS_EOF :
 1251                         /* If there is more to send set PRUS_MORETOCOME. */
 1252                             (resid > 0 && space > 0) ? PRUS_MORETOCOME : 0,
 1253                             top, addr, control, td);
 1254                         if (dontroute) {
 1255                                 SOCK_LOCK(so);
 1256                                 so->so_options &= ~SO_DONTROUTE;
 1257                                 SOCK_UNLOCK(so);
 1258                         }
 1259                         clen = 0;
 1260                         control = NULL;
 1261                         top = NULL;
 1262                         if (error)
 1263                                 goto release;
 1264                 } while (resid && space > 0);
 1265         } while (resid);
 1266 
 1267 release:
 1268         sbunlock(&so->so_snd);
 1269 out:
 1270         if (top != NULL)
 1271                 m_freem(top);
 1272         if (control != NULL)
 1273                 m_freem(control);
 1274         return (error);
 1275 }
 1276 
 1277 int
 1278 sosend(struct socket *so, struct sockaddr *addr, struct uio *uio,
 1279     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
 1280 {
 1281 
 1282         /* XXXRW: Temporary debugging. */
 1283         KASSERT(so->so_proto->pr_usrreqs->pru_sosend != sosend,
 1284             ("sosend: protocol calls sosend"));
 1285 
 1286         return (so->so_proto->pr_usrreqs->pru_sosend(so, addr, uio, top,
 1287             control, flags, td));
 1288 }
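
      /*
       * Example (illustrative sketch): because sosend() only dispatches
       * through the protocol switch, a protocol wanting the generic send
       * path points its pr_usrreqs entry at the generic routine rather
       * than back at sosend() itself, which is exactly what the KASSERT
       * above guards against.  The protocol below is hypothetical, and
       * entries other than the generic handlers are omitted:
       *
       *      static struct pr_usrreqs foo_usrreqs = {
       *              .pru_sosend =           sosend_generic,
       *              .pru_soreceive =        soreceive_generic,
       *              .pru_sopoll =           sopoll_generic,
       *      };
       */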
 1289 
 1290 /*
 1291  * The part of soreceive() that implements reading non-inline out-of-band
 1292  * data from a socket.  For more complete comments, see soreceive(), from
 1293  * which this code originated.
 1294  *
 1295  * Note that soreceive_rcvoob(), unlike the remainder of soreceive(), is
 1296  * unable to return an mbuf chain to the caller.
 1297  */
 1298 static int
 1299 soreceive_rcvoob(struct socket *so, struct uio *uio, int flags)
 1300 {
 1301         struct protosw *pr = so->so_proto;
 1302         struct mbuf *m;
 1303         int error;
 1304 
 1305         KASSERT(flags & MSG_OOB, ("soreceive_rcvoob: (flags & MSG_OOB) == 0"));
 1306 
 1307         m = m_get(M_TRYWAIT, MT_DATA);
 1308         if (m == NULL)
 1309                 return (ENOBUFS);
 1310         error = (*pr->pr_usrreqs->pru_rcvoob)(so, m, flags & MSG_PEEK);
 1311         if (error)
 1312                 goto bad;
 1313         do {
 1314 #ifdef ZERO_COPY_SOCKETS
 1315                 if (so_zero_copy_receive) {
 1316                         int disposable;
 1317 
 1318                         if ((m->m_flags & M_EXT)
 1319                          && (m->m_ext.ext_type == EXT_DISPOSABLE))
 1320                                 disposable = 1;
 1321                         else
 1322                                 disposable = 0;
 1323 
 1324                         error = uiomoveco(mtod(m, void *),
 1325                                           min(uio->uio_resid, m->m_len),
 1326                                           uio, disposable);
 1327                 } else
 1328 #endif /* ZERO_COPY_SOCKETS */
 1329                 error = uiomove(mtod(m, void *),
 1330                     (int) min(uio->uio_resid, m->m_len), uio);
 1331                 m = m_free(m);
 1332         } while (uio->uio_resid && error == 0 && m);
 1333 bad:
 1334         if (m != NULL)
 1335                 m_freem(m);
 1336         return (error);
 1337 }
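
      /*
       * Example: the path above is what a user process reaches with the
       * MSG_OOB flag, assuming SO_OOBINLINE is not set on the socket:
       *
       *      char c;
       *
       *      if (recv(s, &c, 1, MSG_OOB) == -1)
       *              warn("recv");   (EWOULDBLOCK if the byte is not here yet)
       */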
 1338 
 1339 /*
 1340  * Following replacement or removal of the first mbuf on the first mbuf chain
 1341  * of a socket buffer, push necessary state changes back into the socket
 1342  * buffer so that other consumers see the values consistently.  'nextrecord'
 1343  * is the caller's locally cached copy of the original value of
 1344  * sb->sb_mb->m_nextpkt, which must be restored when the lead mbuf changes.
 1345  * NOTE: 'nextrecord' may be NULL.
 1346  */
 1347 static __inline void
 1348 sockbuf_pushsync(struct sockbuf *sb, struct mbuf *nextrecord)
 1349 {
 1350 
 1351         SOCKBUF_LOCK_ASSERT(sb);
 1352         /*
 1353          * First, update for the new value of nextrecord.  If necessary, make
 1354          * it the first record.
 1355          */
 1356         if (sb->sb_mb != NULL)
 1357                 sb->sb_mb->m_nextpkt = nextrecord;
 1358         else
 1359                 sb->sb_mb = nextrecord;
 1360 
 1361         /*
 1362          * Now update any dependent socket buffer fields to reflect the new
 1363          * state.  This is an expanded inline of SB_EMPTY_FIXUP(), with the
 1364          * addition of a second clause that takes care of the case where
 1365          * sb_mb has been updated, but remains the last record.
 1366          */
 1367         if (sb->sb_mb == NULL) {
 1368                 sb->sb_mbtail = NULL;
 1369                 sb->sb_lastrecord = NULL;
 1370         } else if (sb->sb_mb->m_nextpkt == NULL)
 1371                 sb->sb_lastrecord = sb->sb_mb;
 1372 }
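
      /*
       * Worked example: suppose the first record was a single mbuf 'm'
       * that the caller just freed, with a second record 'r2' following,
       * so the caller cached nextrecord == r2 beforehand:
       *
       *      before:  sb_mb -> m, m->m_nextpkt -> r2
       *      free:    sb_mb == NULL (m_free() returned no successor)
       *      push:    sb_mb = r2; and if r2->m_nextpkt == NULL, then
       *               sb_lastrecord = r2 (the SB_EMPTY_FIXUP-style clause).
       */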
 1373 
 1374 
 1375 /*
 1376  * Implement receive operations on a socket.  We depend on the way that
 1377  * records are added to the sockbuf by sbappend.  In particular, each record
 1378  * (mbufs linked through m_next) must begin with an address if the protocol
 1379  * so specifies, followed by an optional mbuf or mbufs containing ancillary
 1380  * data, and then zero or more mbufs of data.  In order to allow parallelism
 1381  * between network receive and copying to user space, as well as avoid
 1382  * sleeping with a mutex held, we release the socket buffer mutex during the
 1383  * user space copy.  Although the high-level sblock() is still held, new
 1384  * data may still be appended, and thus we must maintain consistency of the
 1385  * sockbuf during that time.
 1386  *
 1387  * The caller may receive the data as a single mbuf chain by supplying an
 1388  * mbuf **mp0 for use in returning the chain.  The uio is then used only for
 1389  * the count in uio_resid.
 1390  */
 1391 int
 1392 soreceive_generic(struct socket *so, struct sockaddr **psa, struct uio *uio,
 1393     struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 1394 {
 1395         struct mbuf *m, **mp;
 1396         int flags, len, error, offset;
 1397         struct protosw *pr = so->so_proto;
 1398         struct mbuf *nextrecord;
 1399         int moff, type = 0;
 1400         int orig_resid = uio->uio_resid;
 1401 
 1402         mp = mp0;
 1403         if (psa != NULL)
 1404                 *psa = NULL;
 1405         if (controlp != NULL)
 1406                 *controlp = NULL;
 1407         if (flagsp != NULL)
 1408                 flags = *flagsp &~ MSG_EOR;
 1409         else
 1410                 flags = 0;
 1411         if (flags & MSG_OOB)
 1412                 return (soreceive_rcvoob(so, uio, flags));
 1413         if (mp != NULL)
 1414                 *mp = NULL;
 1415         if ((pr->pr_flags & PR_WANTRCVD) && (so->so_state & SS_ISCONFIRMING)
 1416             && uio->uio_resid)
 1417                 (*pr->pr_usrreqs->pru_rcvd)(so, 0);
 1418 
 1419         error = sblock(&so->so_rcv, SBLOCKWAIT(flags));
 1420         if (error)
 1421                 return (error);
 1422 
 1423 restart:
 1424         SOCKBUF_LOCK(&so->so_rcv);
 1425         m = so->so_rcv.sb_mb;
 1426         /*
 1427          * If we have less data than requested, block awaiting more (subject
 1428          * to any timeout) if:
 1429          *   1. the current count is less than the low water mark, or
 1430          *   2. MSG_WAITALL is set, and it is possible to do the entire
 1431          *      receive operation at once if we block (resid <= hiwat); and
 1432          *   3. MSG_DONTWAIT is not set.
 1433          * If MSG_WAITALL is set but resid is larger than the receive buffer,
 1434          * we have to do the receive in sections, and thus risk returning a
 1435          * short count if a timeout or signal occurs after we start.
 1436          */
 1437         if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
 1438             so->so_rcv.sb_cc < uio->uio_resid) &&
 1439             (so->so_rcv.sb_cc < so->so_rcv.sb_lowat ||
 1440             ((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) &&
 1441             m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
 1442                 KASSERT(m != NULL || !so->so_rcv.sb_cc,
 1443                     ("receive: m == %p so->so_rcv.sb_cc == %u",
 1444                     m, so->so_rcv.sb_cc));
 1445                 if (so->so_error) {
 1446                         if (m != NULL)
 1447                                 goto dontblock;
 1448                         error = so->so_error;
 1449                         if ((flags & MSG_PEEK) == 0)
 1450                                 so->so_error = 0;
 1451                         SOCKBUF_UNLOCK(&so->so_rcv);
 1452                         goto release;
 1453                 }
 1454                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1455                 if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 1456                         if (m == NULL) {
 1457                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1458                                 goto release;
 1459                         } else
 1460                                 goto dontblock;
 1461                 }
 1462                 for (; m != NULL; m = m->m_next)
 1463                 if (m->m_type == MT_OOBDATA || (m->m_flags & M_EOR)) {
 1464                                 m = so->so_rcv.sb_mb;
 1465                                 goto dontblock;
 1466                         }
 1467                 if ((so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING)) == 0 &&
 1468                     (so->so_proto->pr_flags & PR_CONNREQUIRED)) {
 1469                         SOCKBUF_UNLOCK(&so->so_rcv);
 1470                         error = ENOTCONN;
 1471                         goto release;
 1472                 }
 1473                 if (uio->uio_resid == 0) {
 1474                         SOCKBUF_UNLOCK(&so->so_rcv);
 1475                         goto release;
 1476                 }
 1477                 if ((so->so_state & SS_NBIO) ||
 1478                     (flags & (MSG_DONTWAIT|MSG_NBIO))) {
 1479                         SOCKBUF_UNLOCK(&so->so_rcv);
 1480                         error = EWOULDBLOCK;
 1481                         goto release;
 1482                 }
 1483                 SBLASTRECORDCHK(&so->so_rcv);
 1484                 SBLASTMBUFCHK(&so->so_rcv);
 1485                 error = sbwait(&so->so_rcv);
 1486                 SOCKBUF_UNLOCK(&so->so_rcv);
 1487                 if (error)
 1488                         goto release;
 1489                 goto restart;
 1490         }
 1491 dontblock:
 1492         /*
 1493          * From this point onward, we maintain 'nextrecord' as a cache of the
 1494          * pointer to the next record in the socket buffer.  We must keep the
 1495          * various socket buffer pointers and local stack versions of the
 1496          * pointers in sync, pushing out modifications before dropping the
 1497          * socket buffer mutex, and re-reading them when picking it up.
 1498          *
 1499          * Otherwise, we will race with the network stack appending new data
 1500          * or records onto the socket buffer by using inconsistent/stale
 1501          * versions of the field, possibly resulting in socket buffer
 1502          * corruption.
 1503          *
 1504          * By holding the high-level sblock(), we prevent simultaneous
 1505          * readers from pulling off the front of the socket buffer.
 1506          */
 1507         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1508         if (uio->uio_td)
 1509                 uio->uio_td->td_ru.ru_msgrcv++;
 1510         KASSERT(m == so->so_rcv.sb_mb, ("soreceive: m != so->so_rcv.sb_mb"));
 1511         SBLASTRECORDCHK(&so->so_rcv);
 1512         SBLASTMBUFCHK(&so->so_rcv);
 1513         nextrecord = m->m_nextpkt;
 1514         if (pr->pr_flags & PR_ADDR) {
 1515                 KASSERT(m->m_type == MT_SONAME,
 1516                     ("m->m_type == %d", m->m_type));
 1517                 orig_resid = 0;
 1518                 if (psa != NULL)
 1519                         *psa = sodupsockaddr(mtod(m, struct sockaddr *),
 1520                             M_NOWAIT);
 1521                 if (flags & MSG_PEEK) {
 1522                         m = m->m_next;
 1523                 } else {
 1524                         sbfree(&so->so_rcv, m);
 1525                         so->so_rcv.sb_mb = m_free(m);
 1526                         m = so->so_rcv.sb_mb;
 1527                         sockbuf_pushsync(&so->so_rcv, nextrecord);
 1528                 }
 1529         }
 1530 
 1531         /*
 1532          * Process one or more MT_CONTROL mbufs present before any data mbufs
 1533          * in the first mbuf chain on the socket buffer.  If MSG_PEEK, we
 1534          * just copy the data; if !MSG_PEEK, we call into the protocol to
 1535          * perform externalization (or freeing if controlp == NULL).
 1536          */
 1537         if (m != NULL && m->m_type == MT_CONTROL) {
 1538                 struct mbuf *cm = NULL, *cmn;
 1539                 struct mbuf **cme = &cm;
 1540 
 1541                 do {
 1542                         if (flags & MSG_PEEK) {
 1543                                 if (controlp != NULL) {
 1544                                         *controlp = m_copy(m, 0, m->m_len);
 1545                                         controlp = &(*controlp)->m_next;
 1546                                 }
 1547                                 m = m->m_next;
 1548                         } else {
 1549                                 sbfree(&so->so_rcv, m);
 1550                                 so->so_rcv.sb_mb = m->m_next;
 1551                                 m->m_next = NULL;
 1552                                 *cme = m;
 1553                                 cme = &(*cme)->m_next;
 1554                                 m = so->so_rcv.sb_mb;
 1555                         }
 1556                 } while (m != NULL && m->m_type == MT_CONTROL);
 1557                 if ((flags & MSG_PEEK) == 0)
 1558                         sockbuf_pushsync(&so->so_rcv, nextrecord);
 1559                 while (cm != NULL) {
 1560                         cmn = cm->m_next;
 1561                         cm->m_next = NULL;
 1562                         if (pr->pr_domain->dom_externalize != NULL) {
 1563                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1564                                 error = (*pr->pr_domain->dom_externalize)
 1565                                     (cm, controlp);
 1566                                 SOCKBUF_LOCK(&so->so_rcv);
 1567                         } else if (controlp != NULL)
 1568                                 *controlp = cm;
 1569                         else
 1570                                 m_freem(cm);
 1571                         if (controlp != NULL) {
 1572                                 orig_resid = 0;
 1573                                 while (*controlp != NULL)
 1574                                         controlp = &(*controlp)->m_next;
 1575                         }
 1576                         cm = cmn;
 1577                 }
 1578                 if (m != NULL)
 1579                         nextrecord = so->so_rcv.sb_mb->m_nextpkt;
 1580                 else
 1581                         nextrecord = so->so_rcv.sb_mb;
 1582                 orig_resid = 0;
 1583         }
 1584         if (m != NULL) {
 1585                 if ((flags & MSG_PEEK) == 0) {
 1586                         KASSERT(m->m_nextpkt == nextrecord,
 1587                             ("soreceive: post-control, nextrecord !sync"));
 1588                         if (nextrecord == NULL) {
 1589                                 KASSERT(so->so_rcv.sb_mb == m,
 1590                                     ("soreceive: post-control, sb_mb!=m"));
 1591                                 KASSERT(so->so_rcv.sb_lastrecord == m,
 1592                                     ("soreceive: post-control, lastrecord!=m"));
 1593                         }
 1594                 }
 1595                 type = m->m_type;
 1596                 if (type == MT_OOBDATA)
 1597                         flags |= MSG_OOB;
 1598         } else {
 1599                 if ((flags & MSG_PEEK) == 0) {
 1600                         KASSERT(so->so_rcv.sb_mb == nextrecord,
 1601                             ("soreceive: sb_mb != nextrecord"));
 1602                         if (so->so_rcv.sb_mb == NULL) {
 1603                                 KASSERT(so->so_rcv.sb_lastrecord == NULL,
 1604                                     ("soreceive: sb_lastrecord != NULL"));
 1605                         }
 1606                 }
 1607         }
 1608         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1609         SBLASTRECORDCHK(&so->so_rcv);
 1610         SBLASTMBUFCHK(&so->so_rcv);
 1611 
 1612         /*
 1613          * Now continue to read any data mbufs off of the head of the socket
 1614          * buffer until the read request is satisfied.  Note that 'type' is
 1615          * used to store the type of any mbuf reads that have happened so far
 1616          * such that soreceive() can stop reading if the type changes, which
 1617          * causes soreceive() to return only one of regular data and inline
 1618          * out-of-band data in a single socket receive operation.
 1619          */
 1620         moff = 0;
 1621         offset = 0;
 1622         while (m != NULL && uio->uio_resid > 0 && error == 0) {
 1623                 /*
 1624                  * If the type of mbuf has changed since the last mbuf
 1625                  * examined ('type'), end the receive operation.
 1626                  */
 1627                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1628                 if (m->m_type == MT_OOBDATA) {
 1629                         if (type != MT_OOBDATA)
 1630                                 break;
 1631                 } else if (type == MT_OOBDATA)
 1632                         break;
 1633                 else
 1634                         KASSERT(m->m_type == MT_DATA,
 1635                             ("m->m_type == %d", m->m_type));
 1636                 so->so_rcv.sb_state &= ~SBS_RCVATMARK;
 1637                 len = uio->uio_resid;
 1638                 if (so->so_oobmark && len > so->so_oobmark - offset)
 1639                         len = so->so_oobmark - offset;
 1640                 if (len > m->m_len - moff)
 1641                         len = m->m_len - moff;
 1642                 /*
 1643                  * If mp is set, just pass back the mbufs.  Otherwise copy
 1644                  * them out via the uio, then free.  Sockbuf must be
 1645                  * consistent here (sb_mb points to the current mbuf and its
 1646                  * m_nextpkt to the next record) when we drop the sockbuf lock;
 1647                  * we must note any additions to the sockbuf when we reacquire it.
 1648                  */
 1649                 if (mp == NULL) {
 1650                         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1651                         SBLASTRECORDCHK(&so->so_rcv);
 1652                         SBLASTMBUFCHK(&so->so_rcv);
 1653                         SOCKBUF_UNLOCK(&so->so_rcv);
 1654 #ifdef ZERO_COPY_SOCKETS
 1655                         if (so_zero_copy_receive) {
 1656                                 int disposable;
 1657 
 1658                                 if ((m->m_flags & M_EXT)
 1659                                  && (m->m_ext.ext_type == EXT_DISPOSABLE))
 1660                                         disposable = 1;
 1661                                 else
 1662                                         disposable = 0;
 1663 
 1664                                 error = uiomoveco(mtod(m, char *) + moff,
 1665                                                   (int)len, uio,
 1666                                                   disposable);
 1667                         } else
 1668 #endif /* ZERO_COPY_SOCKETS */
 1669                         error = uiomove(mtod(m, char *) + moff, (int)len, uio);
 1670                         SOCKBUF_LOCK(&so->so_rcv);
 1671                         if (error) {
 1672                                 /*
 1673                                  * The MT_SONAME mbuf has already been removed
 1674                                  * from the record, so it is necessary to
 1675                                  * remove the data mbufs, if any, to preserve
 1676                                  * the invariant in the case of PR_ADDR that
 1677                                  * requires MT_SONAME mbufs at the head of
 1678                                  * each record.
 1679                                  */
 1680                                 if (m && pr->pr_flags & PR_ATOMIC &&
 1681                                     ((flags & MSG_PEEK) == 0))
 1682                                         (void)sbdroprecord_locked(&so->so_rcv);
 1683                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1684                                 goto release;
 1685                         }
 1686                 } else
 1687                         uio->uio_resid -= len;
 1688                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1689                 if (len == m->m_len - moff) {
 1690                         if (m->m_flags & M_EOR)
 1691                                 flags |= MSG_EOR;
 1692                         if (flags & MSG_PEEK) {
 1693                                 m = m->m_next;
 1694                                 moff = 0;
 1695                         } else {
 1696                                 nextrecord = m->m_nextpkt;
 1697                                 sbfree(&so->so_rcv, m);
 1698                                 if (mp != NULL) {
 1699                                         *mp = m;
 1700                                         mp = &m->m_next;
 1701                                         so->so_rcv.sb_mb = m = m->m_next;
 1702                                         *mp = NULL;
 1703                                 } else {
 1704                                         so->so_rcv.sb_mb = m_free(m);
 1705                                         m = so->so_rcv.sb_mb;
 1706                                 }
 1707                                 sockbuf_pushsync(&so->so_rcv, nextrecord);
 1708                                 SBLASTRECORDCHK(&so->so_rcv);
 1709                                 SBLASTMBUFCHK(&so->so_rcv);
 1710                         }
 1711                 } else {
 1712                         if (flags & MSG_PEEK)
 1713                                 moff += len;
 1714                         else {
 1715                                 if (mp != NULL) {
 1716                                         int copy_flag;
 1717 
 1718                                         if (flags & MSG_DONTWAIT)
 1719                                                 copy_flag = M_DONTWAIT;
 1720                                         else
 1721                                                 copy_flag = M_TRYWAIT;
 1722                                         if (copy_flag == M_TRYWAIT)
 1723                                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1724                                         *mp = m_copym(m, 0, len, copy_flag);
 1725                                         if (copy_flag == M_TRYWAIT)
 1726                                                 SOCKBUF_LOCK(&so->so_rcv);
 1727                                         if (*mp == NULL) {
 1728                                                 /*
 1729                                                  * m_copym() couldn't
 1730                                                  * allocate an mbuf.  Adjust
 1731                                                  * uio_resid back (it was
 1732                                                  * adjusted down by len
 1733                                                  * bytes, which we didn't end
 1734                                                  * up "copying" over).
 1735                                                  */
 1736                                                 uio->uio_resid += len;
 1737                                                 break;
 1738                                         }
 1739                                 }
 1740                                 m->m_data += len;
 1741                                 m->m_len -= len;
 1742                                 so->so_rcv.sb_cc -= len;
 1743                         }
 1744                 }
 1745                 SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1746                 if (so->so_oobmark) {
 1747                         if ((flags & MSG_PEEK) == 0) {
 1748                                 so->so_oobmark -= len;
 1749                                 if (so->so_oobmark == 0) {
 1750                                         so->so_rcv.sb_state |= SBS_RCVATMARK;
 1751                                         break;
 1752                                 }
 1753                         } else {
 1754                                 offset += len;
 1755                                 if (offset == so->so_oobmark)
 1756                                         break;
 1757                         }
 1758                 }
 1759                 if (flags & MSG_EOR)
 1760                         break;
 1761                 /*
 1762                  * If the MSG_WAITALL flag is set (on a non-atomic socket), we
 1763                  * must not quit until "uio->uio_resid == 0" or an error
 1764                  * termination.  If a signal/timeout occurs, return with a
 1765                  * short count but without error.  Keep sockbuf locked
 1766                  * against other readers.
 1767                  */
 1768                 while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 &&
 1769                     !sosendallatonce(so) && nextrecord == NULL) {
 1770                         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1771                         if (so->so_error || so->so_rcv.sb_state & SBS_CANTRCVMORE)
 1772                                 break;
 1773                         /*
 1774                          * Notify the protocol that some data has been
 1775                          * drained before blocking.
 1776                          */
 1777                         if (pr->pr_flags & PR_WANTRCVD) {
 1778                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1779                                 (*pr->pr_usrreqs->pru_rcvd)(so, flags);
 1780                                 SOCKBUF_LOCK(&so->so_rcv);
 1781                         }
 1782                         SBLASTRECORDCHK(&so->so_rcv);
 1783                         SBLASTMBUFCHK(&so->so_rcv);
 1784                         error = sbwait(&so->so_rcv);
 1785                         if (error) {
 1786                                 SOCKBUF_UNLOCK(&so->so_rcv);
 1787                                 goto release;
 1788                         }
 1789                         m = so->so_rcv.sb_mb;
 1790                         if (m != NULL)
 1791                                 nextrecord = m->m_nextpkt;
 1792                 }
 1793         }
 1794 
 1795         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1796         if (m != NULL && pr->pr_flags & PR_ATOMIC) {
 1797                 flags |= MSG_TRUNC;
 1798                 if ((flags & MSG_PEEK) == 0)
 1799                         (void) sbdroprecord_locked(&so->so_rcv);
 1800         }
 1801         if ((flags & MSG_PEEK) == 0) {
 1802                 if (m == NULL) {
 1803                         /*
 1804                          * First part is an inline SB_EMPTY_FIXUP().  Second
 1805                          * part makes sure sb_lastrecord is up-to-date if
 1806                          * there is still data in the socket buffer.
 1807                          */
 1808                         so->so_rcv.sb_mb = nextrecord;
 1809                         if (so->so_rcv.sb_mb == NULL) {
 1810                                 so->so_rcv.sb_mbtail = NULL;
 1811                                 so->so_rcv.sb_lastrecord = NULL;
 1812                         } else if (nextrecord->m_nextpkt == NULL)
 1813                                 so->so_rcv.sb_lastrecord = nextrecord;
 1814                 }
 1815                 SBLASTRECORDCHK(&so->so_rcv);
 1816                 SBLASTMBUFCHK(&so->so_rcv);
 1817                 /*
 1818                  * If soreceive() is being done from the socket callback,
 1819                  * then we need not generate an ACK to the peer to update the
 1820                  * window, since one will be generated on return to TCP.
 1821                  */
 1822                 if (!(flags & MSG_SOCALLBCK) &&
 1823                     (pr->pr_flags & PR_WANTRCVD)) {
 1824                         SOCKBUF_UNLOCK(&so->so_rcv);
 1825                         (*pr->pr_usrreqs->pru_rcvd)(so, flags);
 1826                         SOCKBUF_LOCK(&so->so_rcv);
 1827                 }
 1828         }
 1829         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 1830         if (orig_resid == uio->uio_resid && orig_resid &&
 1831             (flags & MSG_EOR) == 0 && (so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0) {
 1832                 SOCKBUF_UNLOCK(&so->so_rcv);
 1833                 goto restart;
 1834         }
 1835         SOCKBUF_UNLOCK(&so->so_rcv);
 1836 
 1837         if (flagsp != NULL)
 1838                 *flagsp |= flags;
 1839 release:
 1840         sbunlock(&so->so_rcv);
 1841         return (error);
 1842 }
 1843 
 1844 int
 1845 soreceive(struct socket *so, struct sockaddr **psa, struct uio *uio,
 1846     struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 1847 {
 1848 
 1849         /* XXXRW: Temporary debugging. */
 1850         KASSERT(so->so_proto->pr_usrreqs->pru_soreceive != soreceive,
 1851             ("soreceive: protocol calls soreceive"));
 1852 
 1853         return (so->so_proto->pr_usrreqs->pru_soreceive(so, psa, uio, mp0,
 1854             controlp, flagsp));
 1855 }
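
      /*
       * Example (illustrative sketch): a kernel consumer reading up to
       * 'len' bytes from a socket through the dispatch above.  The uio
       * setup mirrors the file-descriptor read path; 'buf', 'len' and
       * 'td' are hypothetical locals:
       *
       *      struct uio auio;
       *      struct iovec aiov;
       *      int rflags = 0;
       *
       *      aiov.iov_base = buf;
       *      aiov.iov_len = len;
       *      auio.uio_iov = &aiov;
       *      auio.uio_iovcnt = 1;
       *      auio.uio_offset = 0;
       *      auio.uio_resid = len;
       *      auio.uio_segflg = UIO_SYSSPACE;
       *      auio.uio_rw = UIO_READ;
       *      auio.uio_td = td;
       *      error = soreceive(so, NULL, &auio, NULL, NULL, &rflags);
       */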
 1856 
 1857 int
 1858 soshutdown(struct socket *so, int how)
 1859 {
 1860         struct protosw *pr = so->so_proto;
 1861 
 1862         if (!(how == SHUT_RD || how == SHUT_WR || how == SHUT_RDWR))
 1863                 return (EINVAL);
 1864 
 1865         if (how != SHUT_WR)
 1866                 sorflush(so);
 1867         if (how != SHUT_RD)
 1868                 return ((*pr->pr_usrreqs->pru_shutdown)(so));
 1869         return (0);
 1870 }
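
      /*
       * Example: from user space, shutdown(s, SHUT_RD) therefore only
       * flushes the receive side via sorflush() and never calls into the
       * protocol, while SHUT_WR (and SHUT_RDWR) also reaches
       * pru_shutdown():
       *
       *      if (shutdown(s, SHUT_WR) == -1)
       *              err(1, "shutdown");
       */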
 1871 
 1872 void
 1873 sorflush(struct socket *so)
 1874 {
 1875         struct sockbuf *sb = &so->so_rcv;
 1876         struct protosw *pr = so->so_proto;
 1877         struct sockbuf asb;
 1878 
 1879         /*
 1880          * XXXRW: This is quite ugly.  Previously, this code made a copy of
 1881          * the socket buffer, then zero'd the original to clear the buffer
 1882          * fields.  However, with mutexes in the socket buffer, this causes
 1883          * problems.  We only clear the zeroable bits of the original;
 1884          * however, we have to initialize and destroy the mutex in the copy
 1885  * so that dom_dispose() and sbrelease() can lock it as needed.
 1886          */
 1887 
 1888         /*
 1889          * Dislodge threads currently blocked in receive and wait to acquire
 1890          * a lock against other simultaneous readers before clearing the
 1891          * socket buffer.  Don't let our acquire be interrupted by a signal
 1892  * despite any existing socket disposition on interruptible waiting.
 1893          */
 1894         socantrcvmore(so);
 1895         (void) sblock(sb, SBL_WAIT | SBL_NOINTR);
 1896 
 1897         /*
 1898          * Invalidate/clear most of the sockbuf structure, but leave selinfo
 1899          * and mutex data unchanged.
 1900          */
 1901         SOCKBUF_LOCK(sb);
 1902         bzero(&asb, offsetof(struct sockbuf, sb_startzero));
 1903         bcopy(&sb->sb_startzero, &asb.sb_startzero,
 1904             sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
 1905         bzero(&sb->sb_startzero,
 1906             sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
 1907         SOCKBUF_UNLOCK(sb);
 1908         sbunlock(sb);
 1909 
 1910         SOCKBUF_LOCK_INIT(&asb, "so_rcv");
 1911         if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
 1912                 (*pr->pr_domain->dom_dispose)(asb.sb_mb);
 1913         sbrelease(&asb, so);
 1914         SOCKBUF_LOCK_DESTROY(&asb);
 1915 }
 1916 
 1917 /*
 1918  * Perhaps this routine, and sooptcopyout(), below, ought to come in an
 1919  * additional variant to handle the case where the option value needs to be
 1920  * some kind of integer, but not a specific size.  In addition to their use
 1921  * here, these functions are also called by the protocol-level pr_ctloutput()
 1922  * routines.
 1923  */
 1924 int
 1925 sooptcopyin(struct sockopt *sopt, void *buf, size_t len, size_t minlen)
 1926 {
 1927         size_t  valsize;
 1928 
 1929         /*
 1930          * If the user gives us more than we wanted, we ignore it, but if we
 1931          * don't get the minimum length the caller wants, we return EINVAL.
 1932          * On success, sopt->sopt_valsize is set to however much we actually
 1933          * retrieved.
 1934          */
 1935         if ((valsize = sopt->sopt_valsize) < minlen)
 1936                 return EINVAL;
 1937         if (valsize > len)
 1938                 sopt->sopt_valsize = valsize = len;
 1939 
 1940         if (sopt->sopt_td != NULL)
 1941                 return (copyin(sopt->sopt_val, buf, valsize));
 1942 
 1943         bcopy(sopt->sopt_val, buf, valsize);
 1944         return (0);
 1945 }
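
      /*
       * Example (illustrative sketch): a protocol-level pr_ctloutput
       * handler typically uses sooptcopyin() to fetch a fixed-size
       * integer argument; FOO_OPTION and foo_set() are hypothetical:
       *
       *      case FOO_OPTION: {
       *              int optval;
       *
       *              error = sooptcopyin(sopt, &optval, sizeof(optval),
       *                  sizeof(optval));
       *              if (error)
       *                      break;
       *              error = foo_set(so, optval);
       *              break;
       *      }
       */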
 1946 
 1947 /*
 1948  * Kernel version of setsockopt(2).
 1949  *
 1950  * XXX: optlen is size_t, not socklen_t
 1951  */
 1952 int
 1953 so_setsockopt(struct socket *so, int level, int optname, void *optval,
 1954     size_t optlen)
 1955 {
 1956         struct sockopt sopt;
 1957 
 1958         sopt.sopt_level = level;
 1959         sopt.sopt_name = optname;
 1960         sopt.sopt_dir = SOPT_SET;
 1961         sopt.sopt_val = optval;
 1962         sopt.sopt_valsize = optlen;
 1963         sopt.sopt_td = NULL;
 1964         return (sosetopt(so, &sopt));
 1965 }
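
      /*
       * Example: an in-kernel caller growing a socket's receive buffer;
       * because sopt_td is set to NULL above, sooptcopyin() will use
       * bcopy() rather than copyin():
       *
       *      int space = 65536;
       *
       *      error = so_setsockopt(so, SOL_SOCKET, SO_RCVBUF, &space,
       *          sizeof(space));
       */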
 1966 
 1967 int
 1968 sosetopt(struct socket *so, struct sockopt *sopt)
 1969 {
 1970         int     error, optval;
 1971         struct  linger l;
 1972         struct  timeval tv;
 1973         u_long  val;
 1974 #ifdef MAC
 1975         struct mac extmac;
 1976 #endif
 1977 
 1978         error = 0;
 1979         if (sopt->sopt_level != SOL_SOCKET) {
 1980                 if (so->so_proto && so->so_proto->pr_ctloutput)
 1981                         return ((*so->so_proto->pr_ctloutput)
 1982                                   (so, sopt));
 1983                 error = ENOPROTOOPT;
 1984         } else {
 1985                 switch (sopt->sopt_name) {
 1986 #ifdef INET
 1987                 case SO_ACCEPTFILTER:
 1988                         error = do_setopt_accept_filter(so, sopt);
 1989                         if (error)
 1990                                 goto bad;
 1991                         break;
 1992 #endif
 1993                 case SO_LINGER:
 1994                         error = sooptcopyin(sopt, &l, sizeof l, sizeof l);
 1995                         if (error)
 1996                                 goto bad;
 1997 
 1998                         SOCK_LOCK(so);
 1999                         so->so_linger = l.l_linger;
 2000                         if (l.l_onoff)
 2001                                 so->so_options |= SO_LINGER;
 2002                         else
 2003                                 so->so_options &= ~SO_LINGER;
 2004                         SOCK_UNLOCK(so);
 2005                         break;
 2006 
 2007                 case SO_DEBUG:
 2008                 case SO_KEEPALIVE:
 2009                 case SO_DONTROUTE:
 2010                 case SO_USELOOPBACK:
 2011                 case SO_BROADCAST:
 2012                 case SO_REUSEADDR:
 2013                 case SO_REUSEPORT:
 2014                 case SO_OOBINLINE:
 2015                 case SO_TIMESTAMP:
 2016                 case SO_BINTIME:
 2017                 case SO_NOSIGPIPE:
 2018                         error = sooptcopyin(sopt, &optval, sizeof optval,
 2019                                             sizeof optval);
 2020                         if (error)
 2021                                 goto bad;
 2022                         SOCK_LOCK(so);
 2023                         if (optval)
 2024                                 so->so_options |= sopt->sopt_name;
 2025                         else
 2026                                 so->so_options &= ~sopt->sopt_name;
 2027                         SOCK_UNLOCK(so);
 2028                         break;
 2029 
 2030                 case SO_SNDBUF:
 2031                 case SO_RCVBUF:
 2032                 case SO_SNDLOWAT:
 2033                 case SO_RCVLOWAT:
 2034                         error = sooptcopyin(sopt, &optval, sizeof optval,
 2035                                             sizeof optval);
 2036                         if (error)
 2037                                 goto bad;
 2038 
 2039                         /*
 2040                          * Values < 1 make no sense for any of these options,
 2041                          * so disallow them.
 2042                          */
 2043                         if (optval < 1) {
 2044                                 error = EINVAL;
 2045                                 goto bad;
 2046                         }
 2047 
 2048                         switch (sopt->sopt_name) {
 2049                         case SO_SNDBUF:
 2050                         case SO_RCVBUF:
 2051                                 if (sbreserve(sopt->sopt_name == SO_SNDBUF ?
 2052                                     &so->so_snd : &so->so_rcv, (u_long)optval,
 2053                                     so, curthread) == 0) {
 2054                                         error = ENOBUFS;
 2055                                         goto bad;
 2056                                 }
 2057                                 (sopt->sopt_name == SO_SNDBUF ? &so->so_snd :
 2058                                     &so->so_rcv)->sb_flags &= ~SB_AUTOSIZE;
 2059                                 break;
 2060 
 2061                         /*
 2062                          * Make sure the low-water is never greater than the
 2063                          * high-water.
 2064                          */
 2065                         case SO_SNDLOWAT:
 2066                                 SOCKBUF_LOCK(&so->so_snd);
 2067                                 so->so_snd.sb_lowat =
 2068                                     (optval > so->so_snd.sb_hiwat) ?
 2069                                     so->so_snd.sb_hiwat : optval;
 2070                                 SOCKBUF_UNLOCK(&so->so_snd);
 2071                                 break;
 2072                         case SO_RCVLOWAT:
 2073                                 SOCKBUF_LOCK(&so->so_rcv);
 2074                                 so->so_rcv.sb_lowat =
 2075                                     (optval > so->so_rcv.sb_hiwat) ?
 2076                                     so->so_rcv.sb_hiwat : optval;
 2077                                 SOCKBUF_UNLOCK(&so->so_rcv);
 2078                                 break;
 2079                         }
 2080                         break;
 2081 
 2082                 case SO_SNDTIMEO:
 2083                 case SO_RCVTIMEO:
 2084 #ifdef COMPAT_IA32
 2085                         if (curthread->td_proc->p_sysent == &ia32_freebsd_sysvec) {
 2086                                 struct timeval32 tv32;
 2087 
 2088                                 error = sooptcopyin(sopt, &tv32, sizeof tv32,
 2089                                     sizeof tv32);
 2090                                 CP(tv32, tv, tv_sec);
 2091                                 CP(tv32, tv, tv_usec);
 2092                         } else
 2093 #endif
 2094                                 error = sooptcopyin(sopt, &tv, sizeof tv,
 2095                                     sizeof tv);
 2096                         if (error)
 2097                                 goto bad;
 2098 
 2099                         /* assert(hz > 0); */
 2100                         if (tv.tv_sec < 0 || tv.tv_sec > INT_MAX / hz ||
 2101                             tv.tv_usec < 0 || tv.tv_usec >= 1000000) {
 2102                                 error = EDOM;
 2103                                 goto bad;
 2104                         }
 2105                         /* assert(tick > 0); */
 2106                         /* assert(ULONG_MAX - INT_MAX >= 1000000); */
 2107                         val = (u_long)(tv.tv_sec * hz) + tv.tv_usec / tick;
 2108                         if (val > INT_MAX) {
 2109                                 error = EDOM;
 2110                                 goto bad;
 2111                         }
 2112                         if (val == 0 && tv.tv_usec != 0)
 2113                                 val = 1;
 2114 
 2115                         switch (sopt->sopt_name) {
 2116                         case SO_SNDTIMEO:
 2117                                 so->so_snd.sb_timeo = val;
 2118                                 break;
 2119                         case SO_RCVTIMEO:
 2120                                 so->so_rcv.sb_timeo = val;
 2121                                 break;
 2122                         }
 2123                         break;
 2124 
 2125                 case SO_LABEL:
 2126 #ifdef MAC
 2127                         error = sooptcopyin(sopt, &extmac, sizeof extmac,
 2128                             sizeof extmac);
 2129                         if (error)
 2130                                 goto bad;
 2131                         error = mac_setsockopt_label(sopt->sopt_td->td_ucred,
 2132                             so, &extmac);
 2133 #else
 2134                         error = EOPNOTSUPP;
 2135 #endif
 2136                         break;
 2137 
 2138                 default:
 2139                         error = ENOPROTOOPT;
 2140                         break;
 2141                 }
 2142                 if (error == 0 && so->so_proto != NULL &&
 2143                     so->so_proto->pr_ctloutput != NULL) {
 2144                         (void) ((*so->so_proto->pr_ctloutput)
 2145                                   (so, sopt));
 2146                 }
 2147         }
 2148 bad:
 2149         return (error);
 2150 }
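
      /*
       * Worked example of the SO_SNDTIMEO/SO_RCVTIMEO conversion above:
       * with hz = 1000 (so tick = 1000000 / hz = 1000 microseconds), a
       * timeout of { tv_sec = 1, tv_usec = 500000 } becomes
       *
       *      val = 1 * 1000 + 500000 / 1000 = 1500 ticks (1.5 seconds),
       *
       * and a non-zero timeout that would round down to 0 ticks is bumped
       * up to 1 tick, since sb_timeo == 0 means "wait forever".
       */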
 2151 
 2152 /*
 2153  * Helper routine for getsockopt.
 2154  */
 2155 int
 2156 sooptcopyout(struct sockopt *sopt, const void *buf, size_t len)
 2157 {
 2158         int     error;
 2159         size_t  valsize;
 2160 
 2161         error = 0;
 2162 
 2163         /*
 2164          * Documented get behavior is that we always return a value, possibly
 2165          * truncated to fit in the user's buffer.  Traditional behavior is
 2166          * that we always tell the user precisely how much we copied, rather
 2167          * than something useful like the total amount we had available for
 2168          * her.  Note that this interface is not idempotent; the entire
 2169  * answer must be generated ahead of time.
 2170          */
 2171         valsize = min(len, sopt->sopt_valsize);
 2172         sopt->sopt_valsize = valsize;
 2173         if (sopt->sopt_val != NULL) {
 2174                 if (sopt->sopt_td != NULL)
 2175                         error = copyout(buf, sopt->sopt_val, valsize);
 2176                 else
 2177                         bcopy(buf, sopt->sopt_val, valsize);
 2178         }
 2179         return (error);
 2180 }
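
      /*
       * Example: a user buffer smaller than the full answer yields a
       * silently truncated result rather than an error, so callers must
       * inspect the returned length:
       *
       *      int v;
       *      socklen_t vlen = 2;             (deliberately too small)
       *
       *      getsockopt(s, SOL_SOCKET, SO_RCVBUF, &v, &vlen);
       *
       * The call succeeds; vlen comes back as 2 and only the first two
       * bytes of v have been written.
       */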
 2181 
 2182 int
 2183 sogetopt(struct socket *so, struct sockopt *sopt)
 2184 {
 2185         int     error, optval;
 2186         struct  linger l;
 2187         struct  timeval tv;
 2188 #ifdef MAC
 2189         struct mac extmac;
 2190 #endif
 2191 
 2192         error = 0;
 2193         if (sopt->sopt_level != SOL_SOCKET) {
 2194                 if (so->so_proto && so->so_proto->pr_ctloutput) {
 2195                         return ((*so->so_proto->pr_ctloutput)
 2196                                   (so, sopt));
 2197                 } else
 2198                         return (ENOPROTOOPT);
 2199         } else {
 2200                 switch (sopt->sopt_name) {
 2201 #ifdef INET
 2202                 case SO_ACCEPTFILTER:
 2203                         error = do_getopt_accept_filter(so, sopt);
 2204                         break;
 2205 #endif
 2206                 case SO_LINGER:
 2207                         SOCK_LOCK(so);
 2208                         l.l_onoff = so->so_options & SO_LINGER;
 2209                         l.l_linger = so->so_linger;
 2210                         SOCK_UNLOCK(so);
 2211                         error = sooptcopyout(sopt, &l, sizeof l);
 2212                         break;
 2213 
 2214                 case SO_USELOOPBACK:
 2215                 case SO_DONTROUTE:
 2216                 case SO_DEBUG:
 2217                 case SO_KEEPALIVE:
 2218                 case SO_REUSEADDR:
 2219                 case SO_REUSEPORT:
 2220                 case SO_BROADCAST:
 2221                 case SO_OOBINLINE:
 2222                 case SO_ACCEPTCONN:
 2223                 case SO_TIMESTAMP:
 2224                 case SO_BINTIME:
 2225                 case SO_NOSIGPIPE:
 2226                         optval = so->so_options & sopt->sopt_name;
 2227 integer:
 2228                         error = sooptcopyout(sopt, &optval, sizeof optval);
 2229                         break;
 2230 
 2231                 case SO_TYPE:
 2232                         optval = so->so_type;
 2233                         goto integer;
 2234 
 2235                 case SO_ERROR:
 2236                         SOCK_LOCK(so);
 2237                         optval = so->so_error;
 2238                         so->so_error = 0;
 2239                         SOCK_UNLOCK(so);
 2240                         goto integer;
 2241 
 2242                 case SO_SNDBUF:
 2243                         optval = so->so_snd.sb_hiwat;
 2244                         goto integer;
 2245 
 2246                 case SO_RCVBUF:
 2247                         optval = so->so_rcv.sb_hiwat;
 2248                         goto integer;
 2249 
 2250                 case SO_SNDLOWAT:
 2251                         optval = so->so_snd.sb_lowat;
 2252                         goto integer;
 2253 
 2254                 case SO_RCVLOWAT:
 2255                         optval = so->so_rcv.sb_lowat;
 2256                         goto integer;
 2257 
 2258                 case SO_SNDTIMEO:
 2259                 case SO_RCVTIMEO:
 2260                         optval = (sopt->sopt_name == SO_SNDTIMEO ?
 2261                                   so->so_snd.sb_timeo : so->so_rcv.sb_timeo);
 2262 
 2263                         tv.tv_sec = optval / hz;
 2264                         tv.tv_usec = (optval % hz) * tick;
 2265 #ifdef COMPAT_IA32
 2266                         if (curthread->td_proc->p_sysent == &ia32_freebsd_sysvec) {
 2267                                 struct timeval32 tv32;
 2268 
 2269                                 CP(tv, tv32, tv_sec);
 2270                                 CP(tv, tv32, tv_usec);
 2271                                 error = sooptcopyout(sopt, &tv32, sizeof tv32);
 2272                         } else
 2273 #endif
 2274                                 error = sooptcopyout(sopt, &tv, sizeof tv);
 2275                         break;
 2276 
 2277                 case SO_LABEL:
 2278 #ifdef MAC
 2279                         error = sooptcopyin(sopt, &extmac, sizeof(extmac),
 2280                             sizeof(extmac));
 2281                         if (error)
 2282                                 return (error);
 2283                         error = mac_getsockopt_label(sopt->sopt_td->td_ucred,
 2284                             so, &extmac);
 2285                         if (error)
 2286                                 return (error);
 2287                         error = sooptcopyout(sopt, &extmac, sizeof extmac);
 2288 #else
 2289                         error = EOPNOTSUPP;
 2290 #endif
 2291                         break;
 2292 
 2293                 case SO_PEERLABEL:
 2294 #ifdef MAC
 2295                         error = sooptcopyin(sopt, &extmac, sizeof(extmac),
 2296                             sizeof(extmac));
 2297                         if (error)
 2298                                 return (error);
 2299                         error = mac_getsockopt_peerlabel(
 2300                             sopt->sopt_td->td_ucred, so, &extmac);
 2301                         if (error)
 2302                                 return (error);
 2303                         error = sooptcopyout(sopt, &extmac, sizeof extmac);
 2304 #else
 2305                         error = EOPNOTSUPP;
 2306 #endif
 2307                         break;
 2308 
 2309                 case SO_LISTENQLIMIT:
 2310                         optval = so->so_qlimit;
 2311                         goto integer;
 2312 
 2313                 case SO_LISTENQLEN:
 2314                         optval = so->so_qlen;
 2315                         goto integer;
 2316 
 2317                 case SO_LISTENINCQLEN:
 2318                         optval = so->so_incqlen;
 2319                         goto integer;
 2320 
 2321                 default:
 2322                         error = ENOPROTOOPT;
 2323                         break;
 2324                 }
 2325                 return (error);
 2326         }
 2327 }
 2328 
 2329 /* XXX: prepare mbuf for (__FreeBSD__ < 3) routines. */
 2330 int
 2331 soopt_getm(struct sockopt *sopt, struct mbuf **mp)
 2332 {
 2333         struct mbuf *m, *m_prev;
 2334         int sopt_size = sopt->sopt_valsize;
 2335 
 2336         MGET(m, sopt->sopt_td ? M_TRYWAIT : M_DONTWAIT, MT_DATA);
 2337         if (m == NULL)
 2338                 return ENOBUFS;
 2339         if (sopt_size > MLEN) {
 2340                 MCLGET(m, sopt->sopt_td ? M_TRYWAIT : M_DONTWAIT);
 2341                 if ((m->m_flags & M_EXT) == 0) {
 2342                         m_free(m);
 2343                         return ENOBUFS;
 2344                 }
 2345                 m->m_len = min(MCLBYTES, sopt_size);
 2346         } else {
 2347                 m->m_len = min(MLEN, sopt_size);
 2348         }
 2349         sopt_size -= m->m_len;
 2350         *mp = m;
 2351         m_prev = m;
 2352 
 2353         while (sopt_size) {
 2354                 MGET(m, sopt->sopt_td ? M_TRYWAIT : M_DONTWAIT, MT_DATA);
 2355                 if (m == NULL) {
 2356                         m_freem(*mp);
 2357                         return ENOBUFS;
 2358                 }
 2359                 if (sopt_size > MLEN) {
 2360                         MCLGET(m, sopt->sopt_td != NULL ? M_TRYWAIT :
 2361                             M_DONTWAIT);
 2362                         if ((m->m_flags & M_EXT) == 0) {
 2363                                 m_freem(m);
 2364                                 m_freem(*mp);
 2365                                 return ENOBUFS;
 2366                         }
 2367                         m->m_len = min(MCLBYTES, sopt_size);
 2368                 } else {
 2369                         m->m_len = min(MLEN, sopt_size);
 2370                 }
 2371                 sopt_size -= m->m_len;
 2372                 m_prev->m_next = m;
 2373                 m_prev = m;
 2374         }
 2375         return (0);
 2376 }
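
      /*
       * Worked example of the sizing above, assuming MCLBYTES = 2048: a
       * 3000-byte option value gives the first mbuf a cluster holding
       * min(2048, 3000) = 2048 bytes; the remaining 952 bytes still
       * exceed MLEN, so the second mbuf also takes a cluster and holds
       * the rest.
       */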
 2377 
 2378 /* XXX: copyin sopt data into mbuf chain for (__FreeBSD__ < 3) routines. */
 2379 int
 2380 soopt_mcopyin(struct sockopt *sopt, struct mbuf *m)
 2381 {
 2382         struct mbuf *m0 = m;
 2383 
 2384         if (sopt->sopt_val == NULL)
 2385                 return (0);
 2386         while (m != NULL && sopt->sopt_valsize >= m->m_len) {
 2387                 if (sopt->sopt_td != NULL) {
 2388                         int error;
 2389 
 2390                         error = copyin(sopt->sopt_val, mtod(m, char *),
 2391                                        m->m_len);
 2392                         if (error != 0) {
 2393                                 m_freem(m0);
 2394                                 return(error);
 2395                         }
 2396                 } else
 2397                         bcopy(sopt->sopt_val, mtod(m, char *), m->m_len);
 2398                 sopt->sopt_valsize -= m->m_len;
 2399                 sopt->sopt_val = (char *)sopt->sopt_val + m->m_len;
 2400                 m = m->m_next;
 2401         }
 2402         if (m != NULL) /* should be allocated large enough at ip6_sooptmcopyin() */
 2403                 panic("ip6_sooptmcopyin");
 2404         return (0);
 2405 }
 2406 
 2407 /* XXX: copyout mbuf chain data into soopt for (__FreeBSD__ < 3) routines. */
 2408 int
 2409 soopt_mcopyout(struct sockopt *sopt, struct mbuf *m)
 2410 {
 2411         struct mbuf *m0 = m;
 2412         size_t valsize = 0;
 2413 
 2414         if (sopt->sopt_val == NULL)
 2415                 return (0);
 2416         while (m != NULL && sopt->sopt_valsize >= m->m_len) {
 2417                 if (sopt->sopt_td != NULL) {
 2418                         int error;
 2419 
 2420                         error = copyout(mtod(m, char *), sopt->sopt_val,
 2421                                        m->m_len);
 2422                         if (error != 0) {
 2423                                 m_freem(m0);
 2424                                 return(error);
 2425                         }
 2426                 } else
 2427                         bcopy(mtod(m, char *), sopt->sopt_val, m->m_len);
 2428                 sopt->sopt_valsize -= m->m_len;
 2429                 sopt->sopt_val = (char *)sopt->sopt_val + m->m_len;
 2430                 valsize += m->m_len;
 2431                 m = m->m_next;
 2432         }
 2433         if (m != NULL) {
 2434                 /* enough soopt buffer should have been given from user-land */
 2435                 m_freem(m0);
 2436                 return(EINVAL);
 2437         }
 2438         sopt->sopt_valsize = valsize;
 2439         return (0);
 2440 }
 2441 
 2442 /*
 2443  * sohasoutofband(): protocol notifies socket layer of the arrival of new
 2444  * out-of-band data, which will then notify socket consumers.
 2445  */
 2446 void
 2447 sohasoutofband(struct socket *so)
 2448 {
 2449 
 2450         if (so->so_sigio != NULL)
 2451                 pgsigio(&so->so_sigio, SIGURG, 0);
 2452         selwakeuppri(&so->so_rcv.sb_sel, PSOCK);
 2453 }
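
      /*
       * Example: TCP, for one, calls sohasoutofband() when a segment
       * advances the urgent pointer.  A process that wants the resulting
       * SIGURG must first claim ownership of the socket:
       *
       *      if (fcntl(s, F_SETOWN, getpid()) == -1)
       *              err(1, "fcntl");
       */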
 2454 
 2455 int
 2456 sopoll(struct socket *so, int events, struct ucred *active_cred,
 2457     struct thread *td)
 2458 {
 2459 
 2460         /* XXXRW: Temporary debugging. */
 2461         KASSERT(so->so_proto->pr_usrreqs->pru_sopoll != sopoll,
 2462             ("sopoll: protocol calls sopoll"));
 2463 
 2464         return (so->so_proto->pr_usrreqs->pru_sopoll(so, events, active_cred,
 2465             td));
 2466 }
 2467 
 2468 int
 2469 sopoll_generic(struct socket *so, int events, struct ucred *active_cred,
 2470     struct thread *td)
 2471 {
 2472         int revents = 0;
 2473 
 2474         SOCKBUF_LOCK(&so->so_snd);
 2475         SOCKBUF_LOCK(&so->so_rcv);
 2476         if (events & (POLLIN | POLLRDNORM))
 2477                 if (soreadable(so))
 2478                         revents |= events & (POLLIN | POLLRDNORM);
 2479 
 2480         if (events & POLLINIGNEOF)
 2481                 if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 2482                     !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 2483                         revents |= POLLINIGNEOF;
 2484 
 2485         if (events & (POLLOUT | POLLWRNORM))
 2486                 if (sowriteable(so))
 2487                         revents |= events & (POLLOUT | POLLWRNORM);
 2488 
 2489         if (events & (POLLPRI | POLLRDBAND))
 2490                 if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK))
 2491                         revents |= events & (POLLPRI | POLLRDBAND);
 2492 
 2493         if (revents == 0) {
 2494                 if (events &
 2495                     (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM |
 2496                      POLLRDBAND)) {
 2497                         selrecord(td, &so->so_rcv.sb_sel);
 2498                         so->so_rcv.sb_flags |= SB_SEL;
 2499                 }
 2500 
 2501                 if (events & (POLLOUT | POLLWRNORM)) {
 2502                         selrecord(td, &so->so_snd.sb_sel);
 2503                         so->so_snd.sb_flags |= SB_SEL;
 2504                 }
 2505         }
 2506 
 2507         SOCKBUF_UNLOCK(&so->so_rcv);
 2508         SOCKBUF_UNLOCK(&so->so_snd);
 2509         return (revents);
 2510 }
 2511 
 2512 int
 2513 soo_kqfilter(struct file *fp, struct knote *kn)
 2514 {
 2515         struct socket *so = kn->kn_fp->f_data;
 2516         struct sockbuf *sb;
 2517 
 2518         switch (kn->kn_filter) {
 2519         case EVFILT_READ:
 2520                 if (so->so_options & SO_ACCEPTCONN)
 2521                         kn->kn_fop = &solisten_filtops;
 2522                 else
 2523                         kn->kn_fop = &soread_filtops;
 2524                 sb = &so->so_rcv;
 2525                 break;
 2526         case EVFILT_WRITE:
 2527                 kn->kn_fop = &sowrite_filtops;
 2528                 sb = &so->so_snd;
 2529                 break;
 2530         default:
 2531                 return (EINVAL);
 2532         }
 2533 
 2534         SOCKBUF_LOCK(sb);
 2535         knlist_add(&sb->sb_sel.si_note, kn, 1);
 2536         sb->sb_flags |= SB_KNOTE;
 2537         SOCKBUF_UNLOCK(sb);
 2538         return (0);
 2539 }
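
/*
 * Illustrative sketch (not part of the original file): registering the
 * filters installed by soo_kqfilter() from userland via kqueue(2).
 * NOTE_LOWAT makes filt_soread()/filt_sowrite() compare against a
 * caller-supplied low-water mark (in kev.data) instead of sb_lowat.
 * Error handling is omitted.
 *
 *      #include <sys/event.h>
 *
 *      int kq = kqueue();
 *      struct kevent kev, ev;
 *      EV_SET(&kev, s, EVFILT_READ, EV_ADD, NOTE_LOWAT, 128, NULL);
 *      kevent(kq, &kev, 1, NULL, 0, NULL);     // register the knote
 *      kevent(kq, NULL, 0, &ev, 1, NULL);      // wait; ev.data = bytes
 */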
 2540 
 2541 /*
 2542  * Routines that return EOPNOTSUPP for entry points not supported by a
 2543  * protocol.  Fill in as needed.
 2544  */
 2545 int
 2546 pru_accept_notsupp(struct socket *so, struct sockaddr **nam)
 2547 {
 2548 
 2549         return EOPNOTSUPP;
 2550 }
 2551 
 2552 int
 2553 pru_attach_notsupp(struct socket *so, int proto, struct thread *td)
 2554 {
 2555 
 2556         return EOPNOTSUPP;
 2557 }
 2558 
 2559 int
 2560 pru_bind_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td)
 2561 {
 2562 
 2563         return EOPNOTSUPP;
 2564 }
 2565 
 2566 int
 2567 pru_connect_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td)
 2568 {
 2569 
 2570         return EOPNOTSUPP;
 2571 }
 2572 
 2573 int
 2574 pru_connect2_notsupp(struct socket *so1, struct socket *so2)
 2575 {
 2576 
 2577         return EOPNOTSUPP;
 2578 }
 2579 
 2580 int
 2581 pru_control_notsupp(struct socket *so, u_long cmd, caddr_t data,
 2582     struct ifnet *ifp, struct thread *td)
 2583 {
 2584 
 2585         return EOPNOTSUPP;
 2586 }
 2587 
 2588 int
 2589 pru_disconnect_notsupp(struct socket *so)
 2590 {
 2591 
 2592         return EOPNOTSUPP;
 2593 }
 2594 
 2595 int
 2596 pru_listen_notsupp(struct socket *so, int backlog, struct thread *td)
 2597 {
 2598 
 2599         return EOPNOTSUPP;
 2600 }
 2601 
 2602 int
 2603 pru_peeraddr_notsupp(struct socket *so, struct sockaddr **nam)
 2604 {
 2605 
 2606         return EOPNOTSUPP;
 2607 }
 2608 
 2609 int
 2610 pru_rcvd_notsupp(struct socket *so, int flags)
 2611 {
 2612 
 2613         return EOPNOTSUPP;
 2614 }
 2615 
 2616 int
 2617 pru_rcvoob_notsupp(struct socket *so, struct mbuf *m, int flags)
 2618 {
 2619 
 2620         return EOPNOTSUPP;
 2621 }
 2622 
 2623 int
 2624 pru_send_notsupp(struct socket *so, int flags, struct mbuf *m,
 2625     struct sockaddr *addr, struct mbuf *control, struct thread *td)
 2626 {
 2627 
 2628         return EOPNOTSUPP;
 2629 }
 2630 
 2631 /*
 2632  * This isn't really a ``null'' operation, but it's the default one and
 2633  * doesn't do anything destructive.
 2634  */
 2635 int
 2636 pru_sense_null(struct socket *so, struct stat *sb)
 2637 {
 2638 
 2639         sb->st_blksize = so->so_snd.sb_hiwat;
 2640         return 0;
 2641 }
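
/*
 * Illustrative sketch (not part of the original file): pru_sense_null()
 * backs fstat(2) on a socket, so userland sees the send buffer
 * high-water mark reported as the "block size":
 *
 *      #include <stdio.h>
 *      #include <sys/stat.h>
 *
 *      struct stat st;
 *      if (fstat(s, &st) == 0)         // s is a socket descriptor
 *              printf("%ld\n", (long)st.st_blksize);   // so_snd.sb_hiwat
 */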
 2642 
 2643 int
 2644 pru_shutdown_notsupp(struct socket *so)
 2645 {
 2646 
 2647         return EOPNOTSUPP;
 2648 }
 2649 
 2650 int
 2651 pru_sockaddr_notsupp(struct socket *so, struct sockaddr **nam)
 2652 {
 2653 
 2654         return EOPNOTSUPP;
 2655 }
 2656 
 2657 int
 2658 pru_sosend_notsupp(struct socket *so, struct sockaddr *addr, struct uio *uio,
 2659     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
 2660 {
 2661 
 2662         return EOPNOTSUPP;
 2663 }
 2664 
 2665 int
 2666 pru_soreceive_notsupp(struct socket *so, struct sockaddr **paddr,
 2667     struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
 2668 {
 2669 
 2670         return EOPNOTSUPP;
 2671 }
 2672 
 2673 int
 2674 pru_sopoll_notsupp(struct socket *so, int events, struct ucred *cred,
 2675     struct thread *td)
 2676 {
 2677 
 2678         return EOPNOTSUPP;
 2679 }
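
/*
 * Illustrative sketch (not part of the original file): a protocol with
 * no use for a given entry point plugs the matching stub into its
 * pr_usrreqs switch.  The protocol name "foo" is hypothetical; the stub
 * names are the ones defined above.
 *
 *      static struct pr_usrreqs foo_usrreqs = {
 *              .pru_accept =   pru_accept_notsupp,     // no accept()
 *              .pru_connect2 = pru_connect2_notsupp,   // no socketpair()
 *              .pru_rcvoob =   pru_rcvoob_notsupp,     // no OOB data
 *              .pru_sense =    pru_sense_null,         // harmless default
 *              // ... real handlers for the supported operations ...
 *      };
 */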
 2680 
 2681 static void
 2682 filt_sordetach(struct knote *kn)
 2683 {
 2684         struct socket *so = kn->kn_fp->f_data;
 2685 
 2686         SOCKBUF_LOCK(&so->so_rcv);
 2687         knlist_remove(&so->so_rcv.sb_sel.si_note, kn, 1);
 2688         if (knlist_empty(&so->so_rcv.sb_sel.si_note))
 2689                 so->so_rcv.sb_flags &= ~SB_KNOTE;
 2690         SOCKBUF_UNLOCK(&so->so_rcv);
 2691 }
 2692 
 2693 /*ARGSUSED*/
 2694 static int
 2695 filt_soread(struct knote *kn, long hint)
 2696 {
 2697         struct socket *so;
 2698 
 2699         so = kn->kn_fp->f_data;
 2700         SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 2701 
 2702         kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
 2703         if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 2704                 kn->kn_flags |= EV_EOF;
 2705                 kn->kn_fflags = so->so_error;
 2706                 return (1);
 2707         } else if (so->so_error)        /* transient error, e.g., a UDP ICMP error */
 2708                 return (1);
 2709         else if (kn->kn_sfflags & NOTE_LOWAT)
 2710                 return (kn->kn_data >= kn->kn_sdata);
 2711         else
 2712                 return (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat);
 2713 }
 2714 
 2715 static void
 2716 filt_sowdetach(struct knote *kn)
 2717 {
 2718         struct socket *so = kn->kn_fp->f_data;
 2719 
 2720         SOCKBUF_LOCK(&so->so_snd);
 2721         knlist_remove(&so->so_snd.sb_sel.si_note, kn, 1);
 2722         if (knlist_empty(&so->so_snd.sb_sel.si_note))
 2723                 so->so_snd.sb_flags &= ~SB_KNOTE;
 2724         SOCKBUF_UNLOCK(&so->so_snd);
 2725 }
 2726 
 2727 /*ARGSUSED*/
 2728 static int
 2729 filt_sowrite(struct knote *kn, long hint)
 2730 {
 2731         struct socket *so;
 2732 
 2733         so = kn->kn_fp->f_data;
 2734         SOCKBUF_LOCK_ASSERT(&so->so_snd);
 2735         kn->kn_data = sbspace(&so->so_snd);
 2736         if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
 2737                 kn->kn_flags |= EV_EOF;
 2738                 kn->kn_fflags = so->so_error;
 2739                 return (1);
 2740         } else if (so->so_error)        /* transient error, e.g., a UDP ICMP error */
 2741                 return (1);
 2742         else if (((so->so_state & SS_ISCONNECTED) == 0) &&
 2743             (so->so_proto->pr_flags & PR_CONNREQUIRED))
 2744                 return (0);
 2745         else if (kn->kn_sfflags & NOTE_LOWAT)
 2746                 return (kn->kn_data >= kn->kn_sdata);
 2747         else
 2748                 return (kn->kn_data >= so->so_snd.sb_lowat);
 2749 }
 2750 
 2751 /*ARGSUSED*/
 2752 static int
 2753 filt_solisten(struct knote *kn, long hint)
 2754 {
 2755         struct socket *so = kn->kn_fp->f_data;
 2756 
 2757         kn->kn_data = so->so_qlen;
 2758         return (!TAILQ_EMPTY(&so->so_comp));
 2759 }
 2760 
 2761 int
 2762 socheckuid(struct socket *so, uid_t uid)
 2763 {
 2764 
 2765         if (so == NULL)
 2766                 return (EPERM);
 2767         if (so->so_cred->cr_uid != uid)
 2768                 return (EPERM);
 2769         return (0);
 2770 }
 2771 
 2772 static int
 2773 sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
 2774 {
 2775         int error;
 2776         int val;
 2777 
 2778         val = somaxconn;
 2779         error = sysctl_handle_int(oidp, &val, 0, req);
 2780         if (error || !req->newptr)
 2781                 return (error);
 2782 
 2783         if (val < 1 || val > USHRT_MAX)
 2784                 return (EINVAL);
 2785 
 2786         somaxconn = val;
 2787         return (0);
 2788 }
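
/*
 * Illustrative sketch (not part of the original file): the handler above
 * backs the kern.ipc.somaxconn sysctl, which caps the backlog argument
 * of listen(2).  Reading and setting it from userland; error handling
 * is omitted.
 *
 *      #include <sys/types.h>
 *      #include <sys/sysctl.h>
 *
 *      int val;
 *      size_t len = sizeof(val);
 *      sysctlbyname("kern.ipc.somaxconn", &val, &len, NULL, 0);
 *      val = 1024;                     // must lie in 1..USHRT_MAX
 *      sysctlbyname("kern.ipc.somaxconn", NULL, NULL, &val, sizeof(val));
 */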
 2789 
 2790 /*
 2791  * These functions are used by protocols to notify the socket layer (and its
 2792  * consumers) of state changes in the sockets driven by protocol-side events.
 2793  */
 2794 
 2795 /*
 2796  * Procedures to manipulate state flags of socket and do appropriate wakeups.
 2797  *
 2798  * The normal sequence from the active (originating) side is that
 2799  * soisconnecting() is called while processing a connect() call, resulting
 2800  * in an eventual call to soisconnected() if/when the connection is
 2801  * established.  When the connection is torn down, soisdisconnecting() is
 2802  * called while processing a disconnect() call, and soisdisconnected() is
 2803  * called when the connection to the peer is totally severed.  The semantics
 2804  * of these routines are such that connectionless protocols can call
 2805  * soisconnected() and soisdisconnected() only, bypassing the in-progress
 2806  * calls when setting up a ``connection'' takes no time.
 2807  *
 2808  * From the passive side, a socket is created with two queues of sockets:
 2809  * so_incomp for connections in progress and so_comp for connections already
 2810  * made and awaiting user acceptance.  As a protocol is preparing incoming
 2811  * connections, it creates a socket structure queued on so_incomp by calling
 2812  * sonewconn().  When the connection is established, soisconnected() is
 2813  * called, and transfers the socket structure to so_comp, making it available
 2814  * to accept().
 2815  *
 2816  * If a socket is closed with sockets on either so_incomp or so_comp, these
 2817  * sockets are dropped.
 2818  *
 2819  * If higher-level protocols are implemented in the kernel, the wakeups done
 2820  * here will sometimes cause software-interrupt process scheduling.
 2821  */
 2822 void
 2823 soisconnecting(struct socket *so)
 2824 {
 2825 
 2826         SOCK_LOCK(so);
 2827         so->so_state &= ~(SS_ISCONNECTED|SS_ISDISCONNECTING);
 2828         so->so_state |= SS_ISCONNECTING;
 2829         SOCK_UNLOCK(so);
 2830 }
 2831 
 2832 void
 2833 soisconnected(struct socket *so)
 2834 {
 2835         struct socket *head;
 2836 
 2837         ACCEPT_LOCK();
 2838         SOCK_LOCK(so);
 2839         so->so_state &= ~(SS_ISCONNECTING|SS_ISDISCONNECTING|SS_ISCONFIRMING);
 2840         so->so_state |= SS_ISCONNECTED;
 2841         head = so->so_head;
 2842         if (head != NULL && (so->so_qstate & SQ_INCOMP)) {
 2843                 if ((so->so_options & SO_ACCEPTFILTER) == 0) {
 2844                         SOCK_UNLOCK(so);
 2845                         TAILQ_REMOVE(&head->so_incomp, so, so_list);
 2846                         head->so_incqlen--;
 2847                         so->so_qstate &= ~SQ_INCOMP;
 2848                         TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
 2849                         head->so_qlen++;
 2850                         so->so_qstate |= SQ_COMP;
 2851                         ACCEPT_UNLOCK();
 2852                         sorwakeup(head);
 2853                         wakeup_one(&head->so_timeo);
 2854                 } else {
 2855                         ACCEPT_UNLOCK();
 2856                         so->so_upcall =
 2857                             head->so_accf->so_accept_filter->accf_callback;
 2858                         so->so_upcallarg = head->so_accf->so_accept_filter_arg;
 2859                         so->so_rcv.sb_flags |= SB_UPCALL;
 2860                         so->so_options &= ~SO_ACCEPTFILTER;
 2861                         SOCK_UNLOCK(so);
 2862                         so->so_upcall(so, so->so_upcallarg, M_DONTWAIT);
 2863                 }
 2864                 return;
 2865         }
 2866         SOCK_UNLOCK(so);
 2867         ACCEPT_UNLOCK();
 2868         wakeup(&so->so_timeo);
 2869         sorwakeup(so);
 2870         sowwakeup(so);
 2871 }
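
/*
 * Illustrative sketch (not part of the original file): the passive-side
 * sequence described in the comment above, as a connection-oriented
 * protocol might drive it.  The event labels are hypothetical; the
 * socket-layer calls are the real entry points.
 *
 *      on connection request at a listening socket:
 *              so = sonewconn(head, 0);        // queued on so_incomp
 *      on handshake completion:
 *              soisconnected(so);              // moves so to so_comp and
 *                                              // wakes accept() sleepers
 *      on peer-initiated close:
 *              socantrcvmore(so);              // or soisdisconnecting()
 */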
 2872 
 2873 void
 2874 soisdisconnecting(struct socket *so)
 2875 {
 2876 
 2877         /*
 2878          * Note: This code assumes that SOCK_LOCK(so) and
 2879          * SOCKBUF_LOCK(&so->so_rcv) are the same.
 2880          */
 2881         SOCKBUF_LOCK(&so->so_rcv);
 2882         so->so_state &= ~SS_ISCONNECTING;
 2883         so->so_state |= SS_ISDISCONNECTING;
 2884         so->so_rcv.sb_state |= SBS_CANTRCVMORE;
 2885         sorwakeup_locked(so);
 2886         SOCKBUF_LOCK(&so->so_snd);
 2887         so->so_snd.sb_state |= SBS_CANTSENDMORE;
 2888         sowwakeup_locked(so);
 2889         wakeup(&so->so_timeo);
 2890 }
 2891 
 2892 void
 2893 soisdisconnected(struct socket *so)
 2894 {
 2895 
 2896         /*
 2897          * Note: This code assumes that SOCK_LOCK(so) and
 2898          * SOCKBUF_LOCK(&so->so_rcv) are the same.
 2899          */
 2900         SOCKBUF_LOCK(&so->so_rcv);
 2901         so->so_state &= ~(SS_ISCONNECTING|SS_ISCONNECTED|SS_ISDISCONNECTING);
 2902         so->so_state |= SS_ISDISCONNECTED;
 2903         so->so_rcv.sb_state |= SBS_CANTRCVMORE;
 2904         sorwakeup_locked(so);
 2905         SOCKBUF_LOCK(&so->so_snd);
 2906         so->so_snd.sb_state |= SBS_CANTSENDMORE;
 2907         sbdrop_locked(&so->so_snd, so->so_snd.sb_cc);
 2908         sowwakeup_locked(so);
 2909         wakeup(&so->so_timeo);
 2910 }
 2911 
 2912 /*
 2913  * Make a copy of a sockaddr in a malloced buffer of type M_SONAME.
 2914  */
 2915 struct sockaddr *
 2916 sodupsockaddr(const struct sockaddr *sa, int mflags)
 2917 {
 2918         struct sockaddr *sa2;
 2919 
 2920         sa2 = malloc(sa->sa_len, M_SONAME, mflags);
 2921         if (sa2)
 2922                 bcopy(sa, sa2, sa->sa_len);
 2923         return sa2;
 2924 }
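
/*
 * Illustrative sketch (not part of the original file): a typical caller
 * is a protocol's pru_sockaddr/pru_peeraddr handler, which hands a
 * private copy of its address back to the socket layer.  "sin" is a
 * hypothetical protocol-held sockaddr:
 *
 *      *nam = sodupsockaddr((struct sockaddr *)&sin, M_NOWAIT);
 *      if (*nam == NULL)
 *              return (ENOBUFS);
 */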
 2925 
 2926 /*
 2927  * Create an external-format (``xsocket'') structure using the information in
 2928  * the kernel-format socket structure pointed to by so.  This is done to
 2929  * reduce the spew of irrelevant information over this interface, to isolate
 2930  * user code from changes in the kernel structure, and potentially to provide
 2931  * information-hiding if we decide that some of this information should be
 2932  * hidden from users.
 2933  */
 2934 void
 2935 sotoxsocket(struct socket *so, struct xsocket *xso)
 2936 {
 2937 
 2938         xso->xso_len = sizeof *xso;
 2939         xso->xso_so = so;
 2940         xso->so_type = so->so_type;
 2941         xso->so_options = so->so_options;
 2942         xso->so_linger = so->so_linger;
 2943         xso->so_state = so->so_state;
 2944         xso->so_pcb = so->so_pcb;
 2945         xso->xso_protocol = so->so_proto->pr_protocol;
 2946         xso->xso_family = so->so_proto->pr_domain->dom_family;
 2947         xso->so_qlen = so->so_qlen;
 2948         xso->so_incqlen = so->so_incqlen;
 2949         xso->so_qlimit = so->so_qlimit;
 2950         xso->so_timeo = so->so_timeo;
 2951         xso->so_error = so->so_error;
 2952         xso->so_pgid = so->so_sigio ? so->so_sigio->sio_pgid : 0;
 2953         xso->so_oobmark = so->so_oobmark;
 2954         sbtoxsockbuf(&so->so_snd, &xso->so_snd);
 2955         sbtoxsockbuf(&so->so_rcv, &xso->so_rcv);
 2956         xso->so_uid = so->so_cred->cr_uid;
 2957 }
