FreeBSD/Linux Kernel Cross Reference
sys/netinet/ip_fastfwd.c


/*-
 * Copyright (c) 2003 Andre Oppermann, Internet Business Solutions AG
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote
 *    products derived from this software without specific prior written
 *    permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

/*
 * ip_fastforward gets its speed from processing the forwarded packet to
 * completion (if_output on the other side) without any queues or netisr's.
 * The receiving interface DMAs the packet into memory, the upper half of
 * the driver calls ip_fastforward, we do our routing table lookup and
 * directly send it off to the outgoing interface, which DMAs the packet to
 * the network card. The only part of the packet we touch with the CPU is
 * the IP header (unless there are complex firewall rules touching other
 * parts of the packet, but that is up to you). We are essentially limited
 * by bus bandwidth and how fast the network card/driver can set up receives
 * and transmits.
 *
 * We handle basic errors, IP header errors, checksum errors,
 * destination unreachable, fragmentation and fragmentation needed and
 * report them via ICMP to the sender.
 *
 * If something is not pure IPv4 unicast forwarding, we fall back to
 * the normal ip_input processing path. We should only be called from
 * interfaces connected to the outside world.
 *
 * Firewalling is fully supported including divert, ipfw fwd and ipfilter
 * ipnat and address rewrite.
 *
 * IPSEC is not supported if this host is a tunnel broker. IPSEC is
 * supported for connections to/from local host.
 *
 * We try to do the least expensive (in CPU ops) checks and operations
 * first to catch junk with as little overhead as possible.
 *
 * We take full advantage of hardware support for IP checksum and
 * fragmentation offloading.
 *
 * We don't do ICMP redirect in the fast forwarding path. I have had my own
 * cases where two core routers with Zebra routing suite would send millions
 * of ICMP redirects to connected hosts if the destination router was not
 * the default gateway. In one case it was filling the routing table of a
 * host with approximately 300,000 cloned redirect entries until it ran out
 * of kernel memory. However, the networking code proved very robust and it
 * didn't crash or fail in other ways.
 */

/*
 * Many thanks to Matt Thomas of NetBSD for the basic structure of ip_flow.c
 * which is being followed here.
 */

#include <sys/cdefs.h>
__FBSDID("$FreeBSD: releng/11.0/sys/netinet/ip_fastfwd.c 301717 2016-06-09 05:48:34Z ae $");

#include "opt_ipstealth.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/protosw.h>
#include <sys/sdt.h>
#include <sys/socket.h>
#include <sys/sysctl.h>

#include <net/pfil.h>
#include <net/if.h>
#include <net/if_types.h>
#include <net/if_var.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <net/vnet.h>

#include <netinet/in.h>
#include <netinet/in_kdtrace.h>
#include <netinet/in_systm.h>
#include <netinet/in_var.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <netinet/ip_icmp.h>
#include <netinet/ip_options.h>

#include <machine/in_cksum.h>

static struct sockaddr_in *
ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m)
{
        struct sockaddr_in *dst;
        struct rtentry *rt;

        /*
         * Find route to destination.
         */
        bzero(ro, sizeof(*ro));
        dst = (struct sockaddr_in *)&ro->ro_dst;
        dst->sin_family = AF_INET;
        dst->sin_len = sizeof(*dst);
        dst->sin_addr.s_addr = dest.s_addr;
        in_rtalloc_ign(ro, 0, M_GETFIB(m));

        /*
         * Route there and interface still up?
         */
        rt = ro->ro_rt;
        if (rt && (rt->rt_flags & RTF_UP) &&
            (rt->rt_ifp->if_flags & IFF_UP) &&
            (rt->rt_ifp->if_drv_flags & IFF_DRV_RUNNING)) {
                if (rt->rt_flags & RTF_GATEWAY)
                        dst = (struct sockaddr_in *)rt->rt_gateway;
        } else {
                IPSTAT_INC(ips_noroute);
                IPSTAT_INC(ips_cantforward);
                if (rt)
                        RTFREE(rt);
                icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0);
                return NULL;
        }
        return dst;
}

/*
 * Try to forward a packet based on the destination address.
 * This is a fast path optimized for the plain forwarding case.
 * If the packet is handled (and consumed) here then we return NULL;
 * otherwise mbuf is returned and the packet should be delivered
 * to ip_input for full processing.
 */
struct mbuf *
ip_tryforward(struct mbuf *m)
{
        struct ip *ip;
        struct mbuf *m0 = NULL;
        struct route ro;
        struct sockaddr_in *dst = NULL;
        struct ifnet *ifp;
        struct in_addr odest, dest;
        uint16_t ip_len, ip_off;
        int error = 0;
        int mtu;
        struct m_tag *fwd_tag = NULL;

        /*
         * Sanity-check the mbuf: it must be valid and carry a packet header.
         */
        M_ASSERTVALID(m);
        M_ASSERTPKTHDR(m);

        bzero(&ro, sizeof(ro));

#ifdef ALTQ
        /*
         * Is packet dropped by traffic conditioner?
         */
        if (altq_input != NULL && (*altq_input)(m, AF_INET) == 0)
                goto drop;
#endif

        /*
         * Only IP packets without options
         */
        ip = mtod(m, struct ip *);

        if (ip->ip_hl != (sizeof(struct ip) >> 2)) {
                if (V_ip_doopts == 1)
                        return m;
                else if (V_ip_doopts == 2) {
                        icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB,
                                0, 0);
                        return NULL;    /* mbuf already freed */
                }
                /* else ignore IP options and continue */
        }

        /*
         * Only unicast IP, not from loopback, no L2 or IP broadcast,
         * no multicast, no INADDR_ANY
         *
         * XXX: Probably some of these checks could be direct drop
         * conditions.  However it is not clear whether there are some
         * hacks or obscure behaviours which make it necessary to
         * let ip_input handle it.  We play safe here and let ip_input
         * deal with it until it is proven that we can directly drop it.
         */
        if ((m->m_flags & (M_BCAST|M_MCAST)) ||
            (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) ||
            ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
            ntohl(ip->ip_dst.s_addr) == (u_long)INADDR_BROADCAST ||
            IN_MULTICAST(ntohl(ip->ip_src.s_addr)) ||
            IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) ||
            IN_LINKLOCAL(ntohl(ip->ip_src.s_addr)) ||
            IN_LINKLOCAL(ntohl(ip->ip_dst.s_addr)) ||
            ip->ip_src.s_addr == INADDR_ANY ||
            ip->ip_dst.s_addr == INADDR_ANY)
                return m;

        /*
         * Is it for a local address on this host?
         */
        if (in_localip(ip->ip_dst))
                return m;

        IPSTAT_INC(ips_total);

        /*
         * Step 3: incoming packet firewall processing
         */

        odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;

        /*
         * Run through list of ipfilter hooks for input packets
         */
        if (!PFIL_HOOKED(&V_inet_pfil_hook))
                goto passin;

        if (pfil_run_hooks(
            &V_inet_pfil_hook, &m, m->m_pkthdr.rcvif, PFIL_IN, NULL) ||
            m == NULL)
                goto drop;

        M_ASSERTVALID(m);
        M_ASSERTPKTHDR(m);

        ip = mtod(m, struct ip *);      /* m may have been changed by a pfil hook */
        dest.s_addr = ip->ip_dst.s_addr;

        /*
         * Destination address changed?
         */
        if (odest.s_addr != dest.s_addr) {
                /*
                 * Is it now for a local address on this host?
                 */
                if (in_localip(dest))
                        goto forwardlocal;
                /*
                 * Go on with new destination address
                 */
        }

        if (m->m_flags & M_FASTFWD_OURS) {
                /*
                 * ipfw changed it for a local address on this host.
                 */
                goto forwardlocal;
        }

passin:
        /*
         * Step 4: decrement TTL and look up route
         */

        /*
         * Check TTL
         */
#ifdef IPSTEALTH
        if (!V_ipstealth) {
#endif
        if (ip->ip_ttl <= IPTTLDEC) {
                icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, 0);
                return NULL;    /* mbuf already freed */
        }

        /*
         * Decrement the TTL and incrementally change the IP header checksum.
         * Don't bother doing this with hw checksum offloading, it's faster
         * doing it right here.
         */
        ip->ip_ttl -= IPTTLDEC;
        if (ip->ip_sum >= (u_int16_t) ~htons(IPTTLDEC << 8))
                ip->ip_sum -= ~htons(IPTTLDEC << 8);
        else
                ip->ip_sum += htons(IPTTLDEC << 8);
#ifdef IPSTEALTH
        }
#endif

        /*
         * Find route to destination.
         */
        if ((dst = ip_findroute(&ro, dest, m)) == NULL)
                return NULL;    /* icmp unreach already sent */
        ifp = ro.ro_rt->rt_ifp;

        /*
         * Immediately drop blackholed traffic, and directed broadcasts
         * for either the all-ones or all-zero subnet addresses on
         * locally attached networks.
         */
        if ((ro.ro_rt->rt_flags & (RTF_BLACKHOLE|RTF_BROADCAST)) != 0)
                goto drop;

        /*
         * Step 5: outgoing firewall packet processing
         */

        /*
         * Run through list of hooks for output packets.
         */
        if (!PFIL_HOOKED(&V_inet_pfil_hook))
                goto passout;

        if (pfil_run_hooks(&V_inet_pfil_hook, &m, ifp, PFIL_OUT, NULL) || m == NULL) {
                goto drop;
        }

        M_ASSERTVALID(m);
        M_ASSERTPKTHDR(m);

        ip = mtod(m, struct ip *);
        dest.s_addr = ip->ip_dst.s_addr;

        /*
         * Destination address changed?
         */
        if (m->m_flags & M_IP_NEXTHOP)
                fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL);
        if (odest.s_addr != dest.s_addr || fwd_tag != NULL) {
                /*
                 * Is it now for a local address on this host?
                 */
                if (m->m_flags & M_FASTFWD_OURS || in_localip(dest)) {
forwardlocal:
                        /*
                         * Return packet for processing by ip_input().
                         */
                        m->m_flags |= M_FASTFWD_OURS;
                        if (ro.ro_rt)
                                RTFREE(ro.ro_rt);
                        return m;
                }
                /*
                 * Redo route lookup with new destination address
                 */
                if (fwd_tag) {
                        dest.s_addr = ((struct sockaddr_in *)
                                    (fwd_tag + 1))->sin_addr.s_addr;
                        m_tag_delete(m, fwd_tag);
                        m->m_flags &= ~M_IP_NEXTHOP;
                }
                RTFREE(ro.ro_rt);
                if ((dst = ip_findroute(&ro, dest, m)) == NULL)
                        return NULL;    /* icmp unreach already sent */
                ifp = ro.ro_rt->rt_ifp;
        }

passout:
        /*
         * Step 6: send off the packet
         */
        ip_len = ntohs(ip->ip_len);
        ip_off = ntohs(ip->ip_off);

        /*
         * Check if route is dampened (when ARP is unable to resolve)
         */
        if ((ro.ro_rt->rt_flags & RTF_REJECT) &&
            (ro.ro_rt->rt_expire == 0 || time_uptime < ro.ro_rt->rt_expire)) {
                icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0);
                goto consumed;
        }

        /*
         * Check if media link state of interface is not down
         */
        if (ifp->if_link_state == LINK_STATE_DOWN) {
                icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0);
                goto consumed;
        }

        /*
         * Check if packet fits MTU or if hardware will fragment for us
         */
        if (ro.ro_rt->rt_mtu)
                mtu = min(ro.ro_rt->rt_mtu, ifp->if_mtu);
        else
                mtu = ifp->if_mtu;

        if (ip_len <= mtu) {
                /*
                 * Avoid confusing lower layers.
                 */
                m_clrprotoflags(m);
                /*
                 * Send off the packet via outgoing interface
                 */
                IP_PROBE(send, NULL, NULL, ip, ifp, ip, NULL);
                error = (*ifp->if_output)(ifp, m,
                                (struct sockaddr *)dst, &ro);
        } else {
                /*
                 * Handle EMSGSIZE with icmp reply needfrag for TCP MTU discovery
                 */
                if (ip_off & IP_DF) {
                        IPSTAT_INC(ips_cantfrag);
                        icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
                                0, mtu);
                        goto consumed;
                } else {
                        /*
                         * We have to fragment the packet
                         */
                        m->m_pkthdr.csum_flags |= CSUM_IP;
                        if (ip_fragment(ip, &m, mtu, ifp->if_hwassist))
                                goto drop;
                        KASSERT(m != NULL, ("null mbuf and no error"));
                        /*
                         * Send off the fragments via outgoing interface
                         */
                        error = 0;
                        do {
                                m0 = m->m_nextpkt;
                                m->m_nextpkt = NULL;
                                /*
                                 * Avoid confusing lower layers.
                                 */
                                m_clrprotoflags(m);

                                IP_PROBE(send, NULL, NULL, ip, ifp, ip, NULL);
                                error = (*ifp->if_output)(ifp, m,
                                        (struct sockaddr *)dst, &ro);
                                if (error)
                                        break;
                        } while ((m = m0) != NULL);
                        if (error) {
                                /* Reclaim remaining fragments */
                                for (m = m0; m; m = m0) {
                                        m0 = m->m_nextpkt;
                                        m_freem(m);
                                }
                        } else
                                IPSTAT_INC(ips_fragmented);
                }
        }

        if (error != 0)
                IPSTAT_INC(ips_odropped);
        else {
                counter_u64_add(ro.ro_rt->rt_pksent, 1);
                IPSTAT_INC(ips_forward);
                IPSTAT_INC(ips_fastforward);
        }
consumed:
        RTFREE(ro.ro_rt);
        return NULL;
drop:
        if (m)
                m_freem(m);
        if (ro.ro_rt)
                RTFREE(ro.ro_rt);
        return NULL;
}
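
The TTL step in ip_tryforward() patches the header checksum incrementally instead of recomputing it over the whole header. The userland sketch below illustrates the same idea under stated assumptions; it is not the kernel code. It works on a raw 20-byte header buffer rather than struct ip, and it uses the general update form from RFC 1624 (eqn. 3) rather than the kernel's branch on ip_sum. The names fold16, ip_header_checksum, checksum_update, ttl_decrement and ttl_decrement_selfcheck are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold carries back into the low 16 bits (one's-complement addition). */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Full Internet checksum over len bytes (len assumed even), reading
 * 16-bit words in network byte order. Over a header whose checksum
 * field is already filled in correctly, the result is 0.
 */
static uint16_t ip_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    return (uint16_t)~fold16(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), for one changed 16-bit word. */
static uint16_t checksum_update(uint16_t hc, uint16_t old_word,
    uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}

/*
 * Decrement the TTL (byte 8) of an IPv4 header and patch the checksum
 * (bytes 10-11). Returns 0 and leaves the header untouched if the TTL
 * would expire, 1 otherwise.
 */
static int ttl_decrement(uint8_t *hdr)
{
    uint16_t old_word, new_word, hc;

    if (hdr[8] <= 1)
        return 0;
    old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);  /* TTL | protocol */
    hdr[8]--;
    new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
    hc = checksum_update(hc, old_word, new_word);
    hdr[10] = (uint8_t)(hc >> 8);
    hdr[11] = (uint8_t)(hc & 0xff);
    return 1;
}

/* Self-check: a patched header must still verify against a full pass. */
static void ttl_decrement_selfcheck(void)
{
    uint8_t hdr[20] = {
        0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
        0x40, 0x11, 0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01,
        0xc0, 0xa8, 0x00, 0xc7
    };

    assert(ip_header_checksum(hdr, sizeof(hdr)) == 0);
    assert(ttl_decrement(hdr) == 1);
    assert(hdr[8] == 0x3f);
    assert(ip_header_checksum(hdr, sizeof(hdr)) == 0);
}
```

Calling ttl_decrement_selfcheck() exercises the update on a sample UDP header with TTL 64. The point of the incremental form is that only the one changed 16-bit word enters the arithmetic, so the cost does not depend on header length.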



This page is part of the FreeBSD/Linux Kernel Cross-Reference, and was automatically generated using a modified version of the LXR engine.