FreeBSD/Linux Kernel Cross Reference
sys/Documentation/robust-futex-ABI.txt


    1 Started by Paul Jackson <pj@sgi.com>
    2 
    3 The robust futex ABI
    4 --------------------
    5 
    6 Robust_futexes provide a mechanism, used in addition to normal futexes,
    7 by which the kernel helps clean up locks still held on task exit.
    8 
    9 The record of which futexes a thread is holding is kept on a
   10 linked list in user space, where it can be updated efficiently as locks
   11 are taken and dropped, without kernel intervention.  The only additional
   12 kernel intervention required for robust_futexes above and beyond what is
   13 required for futexes is:
   14 
   15  1) a one time call, per thread, to tell the kernel where its list of
   16     held robust_futexes begins, and
   17  2) internal kernel code at exit, to handle any listed locks held
   18     by the exiting thread.
   19 
   20 The existing normal futexes already provide a "Fast Userspace Locking"
   21 mechanism, which handles uncontended locking without needing a system
   22 call, and handles contended locking by maintaining a list of waiting
   23 threads in the kernel.  Options on the sys_futex(2) system call support
   24 waiting on a particular futex, and waking up the next waiter on a
   25 particular futex.
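
The two futex operations mentioned above can be reached from user space
through the generic syscall(2) wrapper.  The following is a minimal
sketch, assuming a Linux system and the FUTEX_WAIT and FUTEX_WAKE
operation codes from <linux/futex.h>.  The helper names futex_wait and
futex_wake are illustrative and are not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep for as long as *uaddr still holds 'expected'.  Returns 0 when
 * woken, or -1 with errno set (EAGAIN if the value already differs). */
long futex_wait(uint32_t *uaddr, uint32_t expected)
{
    assert(uaddr != NULL);
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake at most 'count' threads waiting on uaddr.  Returns the number of
 * threads woken, or -1 with errno set. */
long futex_wake(uint32_t *uaddr, int count)
{
    assert(uaddr != NULL);
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

A locking library would call futex_wait() only after an attempt to take
the lock in user space has failed, and futex_wake() only when the lock
word shows that somebody is waiting.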
   26 
   27 For robust_futexes to work, the user code (typically in a library such
   28 as glibc linked with the application) has to manage and place the
   29 necessary list elements exactly as the kernel expects them.  If it fails
   30 to do so, then improperly listed locks will not be cleaned up on exit,
   31 probably causing deadlock or other such failure of the other threads
   32 waiting on the same locks.
   33 
   34 A thread that anticipates possibly using robust_futexes should first
   35 issue the system call:
   36 
   37     asmlinkage long
   38     sys_set_robust_list(struct robust_list_head __user *head, size_t len);
   39 
   40 The pointer 'head' points to a structure in the thread's address space
   41 consisting of three words.  Each word is 32 bits on 32 bit architectures,
   42 or 64 bits on 64 bit architectures, in local byte order.  Each thread
   43 should have its own thread private 'head'.
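
From user space the call might look as follows.  This is a sketch only,
assuming Linux, the struct robust_list_head definition and field names
from <linux/futex.h>, and the generic syscall(2) wrapper.  The helper
names robust_head_init and robust_head_register are hypothetical.
Threading libraries such as glibc typically register a list of their own
for each thread, and a direct call like this one replaces that
registration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>   /* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Prepare an empty list.  'offset' is the distance from each lock entry
 * to its lock word, chosen by the caller. */
void robust_head_init(struct robust_list_head *head, long offset)
{
    assert(head != NULL);
    head->list.next = &head->list;   /* empty list points at itself */
    head->futex_offset = offset;
    head->list_op_pending = NULL;
}

/* Tell the kernel where the calling thread's list begins.  The memory
 * behind 'head' must stay valid for the rest of the thread's life.
 * Returns 0 on success, or -1 with errno set. */
long robust_head_register(struct robust_list_head *head)
{
    assert(head != NULL);
    return syscall(SYS_set_robust_list, head, sizeof(*head));
}
```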
   44 
   45 If a thread is running in 32 bit compatibility mode on a 64 bit native
   46 arch kernel, then it can actually have two such structures - one using
   47 32 bit words for 32 bit compatibility mode, and one using 64 bit words
   48 for 64 bit native mode.  The kernel, if it is a 64 bit kernel supporting
   49 32 bit compatibility mode, will attempt to process both lists on each
   50 task exit, if the corresponding sys_set_robust_list() call has been made
   51 to set up that list.
   52 
   53   The first word in the memory structure at 'head' contains a
   54   pointer to a singly linked list of 'lock entries', one per lock,
   55   as described below.  If the list is empty, the pointer will point
   56   to itself, 'head'.  The last 'lock entry' points back to the 'head'.
   57 
   58   The second word, called 'offset', specifies the offset, plus or
   59   minus, from the address of each 'lock entry' to what will be called
   60   the 'lock word' associated with that entry.  The 'lock word'
   61   is always a 32 bit word, unlike the other words above.  The 'lock
   62   word' holds 3 flag bits in the upper 3 bits, and the thread id (TID)
   63   of the thread holding the lock in the bottom 29 bits.  See further
   64   below for a description of the flag bits.
   65 
   66   The third word, called 'list_op_pending', contains a transient copy of
   67   the address of the 'lock entry', during list insertion and removal,
   68   and is needed to correctly resolve races should a thread exit while
   69   in the middle of a locking or unlocking operation.
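
The three words just described can be pictured as a C structure.  The
sketch below is an illustration of the layout, not the kernel's own
declaration: the type and function names (lock_entry, lock_list_head,
head_init, head_is_empty) are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One 'lock entry': a single word pointing at the next entry, or back
 * at the head when it is the last one. */
struct lock_entry {
    struct lock_entry *next;
};

/* The three-word structure that 'head' points to. */
struct lock_list_head {
    struct lock_entry list;              /* word 1: first entry, or itself */
    long offset;                         /* word 2: entry to lock word     */
    struct lock_entry *list_op_pending;  /* word 3: entry being changed    */
};

/* Set up an empty list with the given entry-to-lock-word offset. */
void head_init(struct lock_list_head *head, long offset)
{
    /* Each word is pointer sized: 32 or 64 bits depending on the ABI. */
    assert(sizeof(*head) == 3 * sizeof(void *));
    head->list.next = &head->list;
    head->offset = offset;
    head->list_op_pending = NULL;
}

/* The list is empty when the first word points back at the head. */
bool head_is_empty(const struct lock_list_head *head)
{
    return head->list.next == &head->list;
}
```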
   70 
   71 Each 'lock entry' on the singly linked list starting at 'head' consists
   72 of just a single word, pointing to the next 'lock entry', or back to
   73 'head' if there are no more entries.  In addition, near each 'lock
   74 entry', at an offset from the 'lock entry' specified by the 'offset'
   75 word, is one 'lock word'.
   76 
   77 The 'lock word' is always 32 bits, and is intended to be the same 32 bit
   78 lock variable used by the futex mechanism, in conjunction with
   79 robust_futexes.  The kernel will only be able to wake up the next thread
   80 waiting for a lock on a thread's exit if that next thread used the futex
   81 mechanism to register the address of that 'lock word' with the kernel.
   82 
   83 For each futex lock currently held by a thread, if it wants this
   84 robust_futex support for exit cleanup of that lock, it should have one
   85 'lock entry' on this list, with its associated 'lock word' at the
   86 specified 'offset'.  Should a thread die while holding any such locks,
   87 the kernel will walk this list, mark any such locks with a bit
   88 indicating their holder died, and wake up the next thread waiting for
   89 that lock using the futex mechanism.
   90 
   91 When a thread has invoked the above system call to indicate it
   92 anticipates using robust_futexes, the kernel stores the passed in 'head'
   93 pointer for that task.  The task may retrieve that value later on by
   94 using the system call:
   95 
   96     asmlinkage long
   97     sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
   98                         size_t __user *len_ptr);
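
A user space wrapper for this call could be sketched as below, again
assuming Linux, <linux/futex.h> and the generic syscall(2) wrapper.  The
name robust_head_query is hypothetical.  According to the
get_robust_list(2) manual page, a pid of 0 selects the calling thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>   /* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fetch the list head registered for thread 'pid' (0 = calling thread).
 * On success returns 0 and fills in *head_out and *len_out; on failure
 * returns -1 with errno set. */
long robust_head_query(int pid, struct robust_list_head **head_out,
                       size_t *len_out)
{
    assert(head_out != NULL && len_out != NULL);
    return syscall(SYS_get_robust_list, pid, head_out, len_out);
}
```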
   99 
  100 It is anticipated that threads will use robust_futexes embedded in
  101 larger, user level locking structures, one per lock.  The kernel
  102 robust_futex mechanism doesn't care what else is in that structure, so
  103 long as the 'offset' to the 'lock word' is the same for all
  104 robust_futexes used by that thread.  The thread should link those locks
  105 it currently holds using the 'lock entry' pointers.  It may also have
  106 other links between the locks, such as the reverse side of a doubly
  107 linked list, but that doesn't matter to the kernel.
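
As an illustration of such an embedding, the sketch below places a
'lock entry' and a 'lock word' inside a larger structure and derives
the single 'offset' value from the structure layout.  The app_mutex
type, its extra fields and the helper lock_word_of are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lock_entry {
    struct lock_entry *next;
};

/* Hypothetical user level mutex with the kernel-visible parts embedded. */
struct app_mutex {
    struct lock_entry *prev;   /* extra back link, ignored by the kernel */
    struct lock_entry entry;   /* the 'lock entry' the kernel follows    */
    unsigned int recursion;    /* unrelated bookkeeping                  */
    uint32_t lock_word;        /* the 32 bit futex 'lock word'           */
};

/* The one 'offset' value stored in the head for every app_mutex. */
#define APP_MUTEX_OFFSET \
    ((long)offsetof(struct app_mutex, lock_word) - \
     (long)offsetof(struct app_mutex, entry))

/* Locate a lock word the way the text describes: the address of the
 * lock entry plus 'offset'. */
uint32_t *lock_word_of(struct lock_entry *entry, long offset)
{
    assert(entry != NULL);
    return (uint32_t *)((char *)entry + offset);
}
```

Had the 'lock word' been declared before the 'lock entry' in the
structure, the same macro would simply yield a negative offset.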
  108 
  109 By keeping its locks linked this way, on a list starting with a 'head'
  110 pointer known to the kernel, the kernel can provide to a thread the
  111 essential service available for robust_futexes, which is to help clean
  112 up locks held at the time of a (perhaps unexpected) exit.
  113 
  114 Actual locking and unlocking, during normal operations, is handled
  115 entirely by user level code in the contending threads, and by the
  116 existing futex mechanism to wait on locks and wake up waiters.  The
  117 kernel's only essential involvement in robust_futexes is to remember
  118 where the list 'head' is, and to walk the list on thread exit, handling
  119 locks still held by the departing thread, as described below.
  120 
  121 There may exist thousands of futex lock structures in a thread's shared
  122 memory, on various data structures, at a given point in time. Only those
  123 lock structures for locks currently held by that thread should be on
  124 that thread's robust_futex linked lock list at a given time.
  125 
  126 A given futex lock structure in a user shared memory region may be held
  127 at different times by any of the threads with access to that region. The
  128 thread currently holding such a lock, if any, is identified by that
  129 thread's TID in the lower 29 bits of the 'lock word'.
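
The layout of the 'lock word' can be expressed with a few masks.  The
sketch below follows the bit assignments given in this document (TID in
the bottom 29 bits, flags in the top 3).  The macro and function names
are illustrative, and production code should take its masks from the
kernel's <linux/futex.h> rather than from this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments as described in this document (names are made up). */
#define LOCK_WAITERS     0x80000000u  /* bit 31: somebody is waiting    */
#define LOCK_OWNER_DIED  0x40000000u  /* bit 30: owner died holding it  */
#define LOCK_RESERVED    0x20000000u  /* bit 29: reserved               */
#define LOCK_TID_MASK    0x1fffffffu  /* bits 0..28: TID of the holder  */

/* Build the value an uncontended owner stores into the lock word. */
uint32_t lock_word_for(uint32_t tid)
{
    assert(tid != 0 && (tid & ~LOCK_TID_MASK) == 0);
    return tid;
}

/* TID of the current holder, or 0 if the TID field is empty. */
uint32_t lock_owner_tid(uint32_t word)
{
    return word & LOCK_TID_MASK;
}

bool lock_is_held_by(uint32_t word, uint32_t tid)
{
    return tid != 0 && lock_owner_tid(word) == (tid & LOCK_TID_MASK);
}

bool lock_has_waiters(uint32_t word)
{
    return (word & LOCK_WAITERS) != 0;
}

bool lock_owner_died(uint32_t word)
{
    return (word & LOCK_OWNER_DIED) != 0;
}
```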
  130 
  131 When adding or removing a lock from its list of held locks, in order for
  132 the kernel to correctly handle lock cleanup regardless of when the task
  133 exits (perhaps it gets an unexpected signal 9 in the middle of
  134 manipulating this list), the user code must observe the following
  135 protocol on 'lock entry' insertion and removal:
  136 
  137 On insertion:
  138  1) set the 'list_op_pending' word to the address of the 'lock entry'
  139     to be inserted,
  140  2) acquire the futex lock,
  141  3) add the lock entry, with its thread id (TID) in the bottom 29 bits
  142     of the 'lock word', to the linked list starting at 'head', and
  143  4) clear the 'list_op_pending' word.
  144 
  145 On removal:
  146  1) set the 'list_op_pending' word to the address of the 'lock entry'
  147     to be removed,
  148  2) remove the lock entry for this lock from the 'head' list,
  149  3) release the futex lock, and
  150  4) clear the 'list_op_pending' word.
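
The two protocols above can be sketched in C11 as follows.  This is an
illustration under stated assumptions, not the code of any real library:
the types and function names are hypothetical, only the uncontended path
is shown (a failed attempt returns instead of waiting on the futex), and
waking waiters on unlock is left out.  The compiler barrier is there
because the kernel may examine this memory at any instant, so the stores
have to be issued in the order the steps are listed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_TID_MASK 0x1fffffffu   /* bottom 29 bits, per this document */

struct lock_entry { struct lock_entry *next; };

struct lock_list_head {
    struct lock_entry list;
    long offset;
    struct lock_entry *list_op_pending;
};

/* Hypothetical lock: the entry first, the lock word right after it. */
struct robust_lock {
    struct lock_entry entry;
    _Atomic uint32_t word;
};

/* Compiler barrier keeping the stores in program order. */
#define ORDER() atomic_signal_fence(memory_order_seq_cst)

/* Insertion protocol; returns false if the lock is already held. */
bool robust_trylock(struct lock_list_head *head, struct robust_lock *lk,
                    uint32_t tid)
{
    uint32_t expected = 0;
    bool got_it;

    assert(tid != 0 && (tid & ~LOCK_TID_MASK) == 0);
    head->list_op_pending = &lk->entry;                  /* step 1 */
    ORDER();
    got_it = atomic_compare_exchange_strong(&lk->word,   /* step 2 */
                                            &expected, tid);
    if (got_it) {
        lk->entry.next = head->list.next;                /* step 3 */
        ORDER();
        head->list.next = &lk->entry;
    }
    ORDER();
    head->list_op_pending = NULL;                        /* step 4 */
    return got_it;
}

/* Removal protocol for a lock held by the calling thread. */
void robust_unlock(struct lock_list_head *head, struct robust_lock *lk)
{
    struct lock_entry **link = &head->list.next;

    head->list_op_pending = &lk->entry;                  /* step 1 */
    ORDER();
    while (*link != &head->list && *link != &lk->entry)  /* step 2 */
        link = &(*link)->next;
    if (*link == &lk->entry)
        *link = lk->entry.next;
    ORDER();
    atomic_store(&lk->word, 0);                          /* step 3 */
    ORDER();
    head->list_op_pending = NULL;                        /* step 4 */
}
```

If the thread dies between steps 1 and 4 of either function, the address
left in 'list_op_pending' lets the kernel find the lock even though it
is not, or is no longer, on the list.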
  151 
  152 On exit, the kernel will consider the address stored in
  153 'list_op_pending' and the address of each 'lock entry' found by walking
  154 the list starting at 'head'.  For each such address, if the bottom 29
  155 bits of the 'lock word' at offset 'offset' from that address equal the
  156 exiting thread's TID, then the kernel will do two things:
  157 
  158  1) if bit 31 (0x80000000) is set in that word, then attempt a futex
  159     wakeup on that address, which will wake the next thread that has
  160     used the futex mechanism to wait on that address, and
  161  2) atomically set bit 30 (0x40000000) in the 'lock word'.
  162 
  163 In the above, bit 31 was set by futex waiters on that lock to indicate
  164 they were waiting, and bit 30 is set by the kernel to indicate that the
  165 lock owner died holding the lock.
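
The handling of a single 'lock word' can be modelled in user space as
below.  This is a model of the description in this document and not
kernel code: the function name model_owner_exit and the macro names are
made up, and the function only reports whether a wakeup would be
attempted instead of performing one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_WAITERS    0x80000000u  /* bit 31, set by waiters    */
#define LOCK_OWNER_DIED 0x40000000u  /* bit 30, set by the kernel */
#define LOCK_TID_MASK   0x1fffffffu  /* bottom 29 bits            */

/* Apply the exit handling to one lock word on behalf of 'dead_tid'.
 * Returns true if a futex wakeup would be attempted on this word. */
bool model_owner_exit(_Atomic uint32_t *word, uint32_t dead_tid)
{
    uint32_t seen = atomic_load(word);

    assert(dead_tid != 0);
    if ((seen & LOCK_TID_MASK) != (dead_tid & LOCK_TID_MASK))
        return false;               /* held by someone else: untouched */

    /* Atomically set bit 30 and pick up the previous value. */
    seen = atomic_fetch_or(word, LOCK_OWNER_DIED);

    /* A wakeup is attempted only if a waiter had set bit 31. */
    return (seen & LOCK_WAITERS) != 0;
}
```

A thread that later acquires the lock can test bit 30 to learn that the
data protected by the lock may have been left in an inconsistent state.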
  166 
  167 The kernel exit code will silently stop scanning the list further if at
  168 any point:
  169 
  170  1) the 'head' pointer or a subsequent linked list pointer
  171     is not a valid address of a user space word,
  172  2) the calculated location of the 'lock word' (address plus
  173     'offset') is not the valid address of a 32 bit user space
  174     word, or
  175  3) the list contains more than 1 million (subject to
  176     future kernel configuration changes) elements.
  177 
  178 When the kernel sees a list entry whose 'lock word' doesn't have the
  179 current thread's TID in the lower 29 bits, it does nothing with that
  180 entry, and goes on to the next entry.
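
The walk and its stopping conditions can be modelled in user space as
follows.  This is a simplified sketch, not the kernel's implementation:
all names are hypothetical, a NULL pointer stands in for "not a valid
user space address" because ordinary code cannot probe address validity,
the marking is not atomic, and the choice to handle the pending entry
once, after the walk, is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_OWNER_DIED 0x40000000u
#define LOCK_TID_MASK   0x1fffffffu
#define WALK_LIMIT      1000000L     /* "1 million", per this document */

struct lock_entry { struct lock_entry *next; };

struct lock_list_head {
    struct lock_entry list;
    long offset;
    struct lock_entry *list_op_pending;
};

/* Mark one lock as "owner died" if 'tid' holds it.  Returns 1 if marked. */
static int mark_if_owned(struct lock_entry *entry, long offset, uint32_t tid)
{
    uint32_t *word = (uint32_t *)((char *)entry + offset);

    if ((*word & LOCK_TID_MASK) != tid)
        return 0;                    /* not ours: skip to the next entry */
    *word |= LOCK_OWNER_DIED;
    return 1;
}

/* Walk the list for exiting thread 'tid', visiting at most 'limit'
 * entries.  Returns the number of times a lock word was marked. */
int model_exit_walk(struct lock_list_head *head, uint32_t tid, long limit)
{
    struct lock_entry *pending = head->list_op_pending;
    struct lock_entry *entry = head->list.next;
    long visited = 0;
    int marked = 0;

    assert(tid != 0 && limit > 0);
    while (entry != NULL && entry != &head->list) {
        if (++visited > limit)
            break;                   /* overlong or circular list */
        if (entry != pending)
            marked += mark_if_owned(entry, head->offset, tid);
        entry = entry->next;
    }
    if (pending != NULL)
        marked += mark_if_owned(pending, head->offset, tid);
    return marked;
}
```

The limit is what keeps the walk finite when a corrupted list loops back
on itself without ever returning to 'head'.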
  181 
  182 Bit 29 (0x20000000) of the 'lock word' is reserved for future use.
