FreeBSD/Linux Kernel Cross Reference
sys/geom/notes


    1 $FreeBSD: releng/11.2/sys/geom/notes 134824 2004-09-05 21:15:58Z phk $
    2 
    3 For lack of a better place to put them, this file contains
    4 notes on some of the more intricate details of geom.
    5 
    6 -----------------------------------------------------------------------
    7 Locking of bio_children and bio_inbed
    8 
    9 bio_children and bio_inbed are used by g_clone_bio() and g_std_done()
   10 to keep track of children cloned off a request.  g_clone_bio()
   11 increments bio_children each time it is called.  g_std_done()
   12 increments bio_inbed on every call and, if the two counters are then
   13 equal, calls g_io_deliver() on the parent bio.
   14 
   15 The general assumption is that g_clone_bio() is called only in
   16 the g_down thread, and g_std_done() only in the g_up thread and
   17 therefore the two fields do not generally need locking.  These
   18 restrictions are not enforced by the code, but only with great
   19 care should they be violated.
   20 
   21 It is the responsibility of the class implementation to avoid the
   22 following race condition:  A class intends to split a bio into two
   23 children.  It clones the bio and requests I/O on the first child.
   24 This I/O operation completes before the second child is cloned,
   25 so g_std_done() sees both counters equal to 1 and finishes off
   26 the parent bio prematurely.
   27 
   28 There is no race present in the common case where the bio is split
   29 in multiple parts in the class start method and the I/O is requested
   30 on another GEOM class below:  There is only one g_down thread and
   31 the class below will not get its start method run until we return
   32 from our start method, and consequently the I/O cannot complete
   33 prematurely.
   34 
   35 In all other cases, this race needs to be mitigated, for instance
   36 by cloning all children before I/O is requested on any of them.
   37 
   38 Note that cloning an "extra" child and calling g_std_done() on
   39 it directly opens another race, since the assumption is that
   40 g_std_done() is called only in the g_up thread.
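
The race described above can be illustrated with a small userland model.
This is only a sketch: struct toy_bio and the toy_*() functions are
hypothetical stand-ins for the two counters and for g_clone_bio() and
g_std_done(); they are not the kernel types or functions, and the
"threads" are modelled simply by the order of the calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two counters in struct bio; not the kernel type. */
struct toy_bio {
	unsigned children;	/* bumped by each clone (cf. bio_children) */
	unsigned inbed;		/* bumped by each completion (cf. bio_inbed) */
	bool delivered;		/* set when the parent would be delivered */
};

/* Models g_clone_bio(): account for one more child of the parent. */
static void
toy_clone(struct toy_bio *parent)
{
	parent->children++;
}

/* Models g_std_done(): one child finished; deliver when counters match. */
static void
toy_done(struct toy_bio *parent)
{
	parent->inbed++;
	if (parent->inbed == parent->children)
		parent->delivered = true;
}

/*
 * Racy ordering: the first child completes before the second is cloned.
 * Returns true if the parent was delivered after only one child.
 */
static bool
toy_split_racy(void)
{
	struct toy_bio p = { 0, 0, false };

	toy_clone(&p);
	toy_done(&p);		/* counters are 1 == 1: delivered too early */
	bool early = p.delivered;
	toy_clone(&p);
	toy_done(&p);
	return (early);
}

/*
 * Mitigated ordering: clone all children before any I/O can complete.
 * Returns true if the parent was delivered only after both children.
 */
static bool
toy_split_safe(void)
{
	struct toy_bio p = { 0, 0, false };

	toy_clone(&p);
	toy_clone(&p);
	toy_done(&p);		/* counters are 1 != 2: not delivered yet */
	bool early = p.delivered;
	toy_done(&p);		/* counters are 2 == 2: delivered now */
	return (!early && p.delivered);
}
```

In this model toy_split_racy() reports a premature delivery, while
toy_split_safe() delivers the parent only once both children are done.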
   41 
   42 -----------------------------------------------------------------------
   43 Statistics collection
   44 
   45 Statistics collection can run at three levels controlled by the
   46 "kern.geom.collectstats" sysctl.
   47 
   48 At level zero, only the numbers of transactions started and completed
   49 are counted, and only because GEOM internally uses the difference
   50 between the two as a sanity check.
   51 
   52 At level one we collect the full statistics.  Higher levels are
   53 reserved for future use.  Statistics are collected independently
   54 on both the provider and the consumer, because multiple consumers
   55 can be active against the same provider at the same time.
   56 
   57 The statistics collection falls in two parts:
   58 
   59 The first and simpler part consists of g_io_request() timestamping
   60 the struct bio when the request is first started and g_io_deliver()
   61 updating the consumer's and provider's statistics based on fields in
   62 the bio when it is completed.  There are no concurrency or locking
   63 concerns in this part.  The statistics collected consist of the number
   64 of requests, number of bytes, number of ENOMEM errors, number of
   65 other errors and duration of the request for each of the three
   66 major request types: BIO_READ, BIO_WRITE and BIO_DELETE.
   67 
   68 The second part is trying to keep track of the "busy%".
   69 
   70 If in g_io_request() we find that there are no outstanding requests,
   71 (based on the counters for scheduled and completed requests being
   72 equal), we set a timestamp in the "wentbusy" field.  Since there
   73 are no outstanding requests, and as long as there is only one thread
   74 pushing the g_down queue, we cannot possibly conflict with
   75 g_io_deliver() until we ship the current request down.
   76 
   77 In g_io_deliver() we calculate the delta-T from wentbusy and add this
   78 to the "bt" field, and set wentbusy to the current timestamp.  We
   79 take care to do this before we increment the "requests completed"
   80 counter, since that prevents g_io_request() from touching the
   81 "wentbusy" timestamp concurrently.
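
The busy-time bookkeeping described above can be sketched as follows.
This is a simplified userland model, not the code in the kernel: struct
toy_stat, toy_request() and toy_deliver() are hypothetical names,
timestamps are plain integers, and only the "wentbusy" and "bt" field
names are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Toy statistics record; field names other than wentbusy/bt are made up. */
struct toy_stat {
	uint64_t started;	/* requests scheduled */
	uint64_t completed;	/* requests completed */
	uint64_t wentbusy;	/* timestamp of the last busy-time update */
	uint64_t bt;		/* accumulated busy time */
};

/* Models the g_io_request() side: stamp wentbusy only when idle. */
static void
toy_request(struct toy_stat *s, uint64_t now)
{
	if (s->started == s->completed)
		s->wentbusy = now;
	s->started++;
}

/*
 * Models the g_io_deliver() side: account the time since wentbusy,
 * restart the interval, and only then count the completion.
 */
static void
toy_deliver(struct toy_stat *s, uint64_t now)
{
	s->bt += now - s->wentbusy;
	s->wentbusy = now;
	s->completed++;
}
```

With requests at t=10 and t=12, completions at t=15 and t=20, and a
further request at t=30 that completes at t=35, this model accumulates
a busy time of 15 and leaves the idle gap from 20 to 30 uncounted.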
   82 
   83 The statistics data is made available to userland through the use
   84 of a special allocator (in geom_stats.c) which through a device
   85 allows userland to mmap(2) the pages containing the statistics data.
   86 In order to indicate to userland when the data in a statistics
   87 structure might be inconsistent, g_io_deliver() atomically sets a
   88 flag "updating" and resets it when the structure is again consistent.
   89 -----------------------------------------------------------------------
   90 maxsize, stripesize and stripeoffset
   91 
   92 maxsize is the biggest request we are willing to handle.  If not
   93 set there is no upper bound on the size of a request and the code
   94 is responsible for chopping it up.  Only hardware methods should
   95 set an upper bound in this field.  Geom_disk will inherit the upper
   96 bound set by the device driver.
   97 
   98 stripesize is the width of any natural request boundaries for the
   99 device.  This would be the width of a stripe on a raid-5 unit or
  100 one zone in GBDE.  The idea with this field is to hint to clustering
  101 type code to not trivially overrun these boundaries.
  102 
  103 stripeoffset is the amount of the first stripe which lies before the
  104 device's beginning.
  105 
  106 If we have a device with 64k stripes:
  107         [0...64k[
  108         [64k...128k[
  109         [128k...192k[
  110 Then it will have stripesize = 64k and stripeoffset = 0.
  111 
  112 If we put a MBR on this device, where slice#1 starts on sector#63,
  113 then this slice will have: stripesize = 64k, stripeoffset = 63 * sectorsize.
  114 
  115 If the clustering code wants to widen a request which writes to
  116 sector#53 of the slice, it can calculate how many bytes remain until
  117 the end of the stripe as:
  118         stripesize - (53 * sectorsize + stripeoffset) % stripesize
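
As a worked example of this calculation, here is a minimal sketch in C.
The function name bytes_to_stripe_end() and its parameters are
illustrative only and do not exist in GEOM; the 512-byte sector size
used in the note after the code is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes remaining from the start of the given sector of a provider up to
 * the end of the stripe that contains it.  The parameters mirror the
 * stripesize and stripeoffset fields described above; the function itself
 * is hypothetical.
 */
static uint64_t
bytes_to_stripe_end(uint64_t sector, uint64_t sectorsize,
    uint64_t stripesize, uint64_t stripeoffset)
{
	assert(stripesize > 0);

	/* Byte position of the sector relative to the start of stripe 0. */
	uint64_t pos = sector * sectorsize + stripeoffset;

	return (stripesize - pos % stripesize);
}
```

Assuming 512-byte sectors, the slice above has stripeoffset = 63 * 512,
so sector#53 lies at byte (53 + 63) * 512 = 59392 of the first 64k
stripe, and 65536 - 59392 = 6144 bytes (12 sectors) remain before the
stripe boundary.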
  119 -----------------------------------------------------------------------
  120 
  121 #include file usage:
  122 
  123                  geom.h|geom_int.h|geom_ext.h|geom_ctl.h|libgeom.h
  124 ----------------+------+----------+----------+----------+--------+
  125 geom class      |      |          |          |          |        |
  126 implementation  |   X  |          |          |          |        |
  127 ----------------+------+----------+----------+----------+--------+
  128 geom kernel     |      |          |          |          |        |
  129 infrastructure  |   X  |      X   |  X       |    X     |        |
  130 ----------------+------+----------+----------+----------+--------+
  131 libgeom         |      |          |          |          |        |
  132 implementation  |      |          |  X       |    X     |  X     |
  133 ----------------+------+----------+----------+----------+--------+
  134 geom aware      |      |          |          |          |        |
  135 application     |      |          |          |    X     |  X     |
  136 ----------------+------+----------+----------+----------+--------+
  137 
  138 geom_slice.h is special in that it documents a "library" for implementing
  139 a specific kind of class, and consequently does not appear in the above
  140 matrix.
  141 -----------------------------------------------------------------------
  142 Removable media
  143 
  144 In general, the theory is that a drive creates the provider when it has
  145 media inserted and destroys it when the media disappears.
  146 
  147 In a more realistic world, we will allow a provider to be opened without
  148 media (with any sectorsize and mediasize == 0) in order to allow
  149 operations such as opening and closing the tray.
  150 
