unshare system call:
--------------------
This document describes the new system call, unshare. The document
provides an overview of the feature, why it is needed, how it can
be used, its interface specification, design, implementation and
how it can be tested.

Change Log:
-----------
version 0.1  Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006

Contents:
---------
        1) Overview
        2) Benefits
        3) Cost
        4) Requirements
        5) Functional Specification
        6) High Level Design
        7) Low Level Design
        8) Test Specification
        9) Future Work

1) Overview
-----------
Most legacy operating system kernels support an abstraction of threads
as multiple execution contexts within a process. These kernels provide
special resources and mechanisms to maintain these "threads". The Linux
kernel, in a clever and simple manner, does not make a distinction
between processes and "threads". The kernel allows processes to share
resources, and thus they can achieve legacy "threads" behavior without
requiring additional data structures and mechanisms in the kernel. The
power of implementing threads in this manner comes not only from
its simplicity but also from allowing application programmers to work
outside the confinement of the all-or-nothing shared resources of legacy
threads. On Linux, at the time of thread creation using the clone system
call, applications can selectively choose which resources to share
between threads.
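
As an illustration of choosing shared resources at creation time, the sketch below uses the glibc clone(2) wrapper to start a child that either shares the parent's virtual memory (CLONE_VM) or works on its own copy. It assumes Linux, glibc and a downward-growing stack; run_child, child_fn and shared_value are hypothetical names used only for this example.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_* flags */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid() */

#define CHILD_STACK_SIZE (64 * 1024)

/* Hypothetical variable written by the child. */
static volatile int shared_value;

static int child_fn(void *arg)
{
    (void)arg;
    /* The parent sees this store only if the child shares its VM. */
    shared_value = 42;
    return 0;
}

/*
 * Creates a child with clone(2), sharing the VM only when share_vm is
 * non-zero. Returns the value of shared_value seen by the parent after
 * the child exits, or -1 on error.
 */
int run_child(int share_vm)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    int flags = SIGCHLD;   /* report the child's exit so waitpid() works */
    pid_t pid;

    if (stack == NULL)
        return -1;
    shared_value = 0;
    if (share_vm)
        flags |= CLONE_VM;

    /* The stack grows downward on common architectures: pass its top. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared_value;
}
```

Other resources, such as CLONE_FS and CLONE_FILES, are selected the same way by OR-ing them into flags (CLONE_SIGHAND additionally requires CLONE_VM).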

The unshare system call adds a primitive to the Linux thread model that
allows threads to selectively 'unshare' any resources that were being
shared at the time of their creation. unshare was conceptualized by
Al Viro in August 2000, on the Linux-Kernel mailing list, as part
of the discussion on POSIX threads on Linux. unshare augments the
usefulness of Linux threads for applications that would like to control
shared resources without creating a new process. unshare is a natural
addition to the set of available primitives on Linux that implement
the concept of process/thread as a virtual machine.

2) Benefits
-----------
unshare would be useful to large application frameworks such as PAM,
where creating a new process to control sharing/unsharing of process
resources is not possible. Since namespaces are shared by default
when creating a new process using fork or clone, unshare can benefit
even non-threaded applications if they need to disassociate
from the default shared namespace. The following are two use cases
where unshare can be used.

2.1 Per-security context namespaces
-----------------------------------
unshare can be used to implement polyinstantiated directories using
the kernel's per-process namespace mechanism. Polyinstantiated directories,
such as per-user and/or per-security context instances of /tmp, /var/tmp or
per-security context instances of a user's home directory, isolate user
processes when working with these directories. Using unshare, a PAM
module can easily set up a private namespace for a user at login.
Polyinstantiated directories are required for Common Criteria certification
with the Labeled Security Protection Profile. However, with the availability
of the shared-tree feature in the Linux kernel, even regular Linux systems
can benefit from setting up private namespaces at login and
polyinstantiating /tmp, /var/tmp and other directories deemed
appropriate by system administrators.
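
A login helper could set up a polyinstantiated /tmp along these lines. This is a minimal sketch, assuming Linux, a caller with CAP_SYS_ADMIN and a per-user instance directory that already exists; setup_private_tmp is a hypothetical name, and a real module such as pam_namespace does considerably more (creating instance directories, handling security contexts, reading its configuration). The recursive MS_PRIVATE remount relies on the shared-tree feature mentioned above to keep the bind mount from propagating back into other namespaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>      /* unshare(), CLONE_NEWNS */
#include <stddef.h>     /* NULL */
#include <sys/mount.h>  /* mount(), MS_* flags */

/*
 * Gives the calling process a private mount namespace in which /tmp is
 * replaced by instance_dir. Returns 0 on success or -errno on failure.
 */
int setup_private_tmp(const char *instance_dir)
{
    /* CLONE_NEWNS requires CAP_SYS_ADMIN; without it this fails with EPERM. */
    if (unshare(CLONE_NEWNS) == -1)
        return -errno;

    /* Make every mount private so later changes stay in this namespace. */
    if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -errno;

    /* Bind the per-user instance directory over /tmp in this namespace only. */
    if (mount(instance_dir, "/tmp", NULL, MS_BIND, NULL) == -1)
        return -errno;

    return 0;
}
```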

2.2 unsharing of virtual memory and/or open files
-------------------------------------------------
Consider a client/server application where the server is processing
client requests by creating processes that share resources such as
virtual memory and open files. Without unshare, the server has to
decide what needs to be shared at the time of creating the process
which services the request. unshare gives the server the ability to
disassociate parts of the context while servicing the
request. For large and complex middleware application frameworks, this
ability to unshare after the process has been created can be very
useful.
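
The server scenario can be sketched in user space. A hypothetical worker is created with clone(2) and CLONE_FILES, so it shares the server's file descriptor table, and it then calls unshare(CLONE_FILES) before closing a descriptor it no longer needs. Without the unshare call, the close would also remove the descriptor from the server. The sketch assumes Linux and glibc; worker_closes_fd, worker_fn and struct worker_args are illustrative names, and some sandboxed environments refuse unshare() with EPERM.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* open(), fcntl() */
#include <sched.h>      /* clone(), unshare(), CLONE_* */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED() */
#include <unistd.h>     /* close() */

#define WORKER_STACK_SIZE (64 * 1024)

struct worker_args {
    int fd;          /* descriptor the worker no longer needs */
    int do_unshare;  /* detach from the shared fd table first? */
};

static int worker_fn(void *p)
{
    struct worker_args *args = p;

    if (args->do_unshare && unshare(CLONE_FILES) == -1)
        return 1;        /* exit status 1: unshare failed */
    close(args->fd);     /* affects the server only if the table is still shared */
    return 0;
}

/*
 * Returns 1 if the server's descriptor survives the worker's close(),
 * 0 if it was closed through the shared table, or -1 on error.
 */
int worker_closes_fd(int do_unshare)
{
    struct worker_args args;
    char *stack;
    pid_t pid;
    int status, still_open;

    args.fd = open("/dev/null", O_RDONLY);
    args.do_unshare = do_unshare;
    if (args.fd == -1)
        return -1;
    stack = malloc(WORKER_STACK_SIZE);
    if (stack == NULL) {
        close(args.fd);
        return -1;
    }

    /* Share the fd table, but not the VM, with the worker. */
    pid = clone(worker_fn, stack + WORKER_STACK_SIZE, CLONE_FILES | SIGCHLD, &args);
    if (pid == -1 || waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(stack);
        close(args.fd);  /* harmless EBADF if the worker already closed it */
        return -1;
    }
    free(stack);

    still_open = fcntl(args.fd, F_GETFD) != -1;
    if (still_open)
        close(args.fd);
    return still_open;
}
```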

3) Cost
-------
In order not to duplicate code, and to handle the fact that unshare
works on an active task (as opposed to clone/fork working on a newly
allocated inactive task), unshare had to make minor reorganizational
changes to the copy_* functions used by the clone/fork system calls.
There is a cost associated with altering existing, well-tested and
stable code to implement a new feature that may not get exercised
extensively in the beginning. However, with proper design and code
review of the changes, and creation of an unshare test for the LTP
(Linux Test Project), the benefits of this new feature can exceed its cost.

4) Requirements
---------------
unshare reverses sharing that was done using the clone(2) system call,
so unshare should have an interface similar to clone(2). That is,
since the flags in clone(int flags, void *stack) specify what should
be shared, similar flags in unshare(int flags) should specify
what should be unshared. Unfortunately, this may appear to invert
the meaning of the flags from the way they are used in clone(2).
However, there was no easy solution that was less confusing and that
allowed incremental context unsharing in the future without an ABI change.

The unshare interface should accommodate the possible future addition of
new context flags without requiring a rebuild of old applications.
If and when new context flags are added, the unshare design should allow
incremental unsharing of those resources on an as-needed basis.
  116 5) Functional Specification
  117 ---------------------------
  118 NAME
  119         unshare - disassociate parts of the process execution context
  120 
  121 SYNOPSIS
  122         #include <sched.h>
  123 
  124         int unshare(int flags);
  125 
  126 DESCRIPTION
  127         unshare allows a process to disassociate parts of its execution
  128         context that are currently being shared with other processes. Part
  129         of execution context, such as the namespace, is shared by default
  130         when a new process is created using fork(2), while other parts,
  131         such as the virtual memory, open file descriptors, etc, may be
  132         shared by explicit request to share them when creating a process
  133         using clone(2).
  134 
  135         The main use of unshare is to allow a process to control its
  136         shared execution context without creating a new process.
  137 
  138         The flags argument specifies one or bitwise-or'ed of several of
  139         the following constants.
  140 
  141         CLONE_FS
  142                 If CLONE_FS is set, file system information of the caller
  143                 is disassociated from the shared file system information.
  144 
  145         CLONE_FILES
  146                 If CLONE_FILES is set, the file descriptor table of the
  147                 caller is disassociated from the shared file descriptor
  148                 table.
  149 
  150         CLONE_NEWNS
  151                 If CLONE_NEWNS is set, the namespace of the caller is
  152                 disassociated from the shared namespace.
  153 
  154         CLONE_VM
  155                 If CLONE_VM is set, the virtual memory of the caller is
  156                 disassociated from the shared virtual memory.
  157 
  158 RETURN VALUE
  159         On success, zero returned. On failure, -1 is returned and errno is
  160 
  161 ERRORS
  162         EPERM   CLONE_NEWNS was specified by a non-root process (process
  163                 without CAP_SYS_ADMIN).
  164 
  165         ENOMEM  Cannot allocate sufficient memory to copy parts of caller's
  166                 context that need to be unshared.
  167 
  168         EINVAL  Invalid flag was specified as an argument.
  169 
  170 CONFORMING TO
  171         The unshare() call is Linux-specific and  should  not be used
  172         in programs intended to be portable.
  173 
  174 SEE ALSO
  175         clone(2), fork(2)
  176 
  177 6) High Level Design
  178 --------------------
  179 Depending on the flags argument, the unshare system call allocates
  180 appropriate process context structures, populates it with values from
  181 the current shared version, associates newly duplicated structures
  182 with the current task structure and releases corresponding shared
  183 versions. Helper functions of clone (copy_*) could not be used
  184 directly by unshare because of the following two reasons.
  185   1) clone operates on a newly allocated not-yet-active task
  186      structure, where as unshare operates on the current active
  187      task. Therefore unshare has to take appropriate task_lock()
  188      before associating newly duplicated context structures
  189   2) unshare has to allocate and duplicate all context structures
  190      that are being unshared, before associating them with the
  191      current task and releasing older shared structures. Failure
  192      do so will create race conditions and/or oops when trying
  193      to backout due to an error. Consider the case of unsharing
  194      both virtual memory and namespace. After successfully unsharing
  195      vm, if the system call encounters an error while allocating
  196      new namespace structure, the error return code will have to
  197      reverse the unsharing of vm. As part of the reversal the
  198      system call will have to go back to older, shared, vm
  199      structure, which may not exist anymore.
  200 
  201 Therefore code from copy_* functions that allocated and duplicated
  202 current context structure was moved into new dup_* functions. Now,
  203 copy_* functions call dup_* functions to allocate and duplicate
  204 appropriate context structures and then associate them with the
  205 task structure that is being constructed. unshare system call on
  206 the other hand performs the following:
  207   1) Check flags to force missing, but implied, flags
  208   2) For each context structure, call the corresponding unshare
  209      helper function to allocate and duplicate a new context
  210      structure, if the appropriate bit is set in the flags argument.
  211   3) If there is no error in allocation and duplication and there
  212      are new context structures then lock the current task structure,
  213      associate new context structures with the current task structure,
  214      and release the lock on the current task structure.
  215   4) Appropriately release older, shared, context structures.
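
Steps 2 to 4 can be modeled in user space to show why all duplicates are allocated before anything is published. In this sketch, which is not kernel code, a pthread mutex stands in for task_lock(), flag checking (step 1) is omitted, and struct task, struct ctx, model_unshare and the UNSHARE_* bits are hypothetical. A failure part way through only has to free private copies; it never has to restore structures that other tasks may already have released.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted context (think fs info or fd table). */
struct ctx {
    atomic_int refcount;
    int data;
};

/* Hypothetical task with two unshareable contexts. */
struct task {
    pthread_mutex_t lock;   /* stands in for task_lock() */
    struct ctx *fs;
    struct ctx *files;
};

#define UNSHARE_FS    0x1
#define UNSHARE_FILES 0x2

/* dup_*-style helper: allocate a private copy holding one reference. */
static struct ctx *dup_ctx(const struct ctx *old)
{
    struct ctx *copy = malloc(sizeof(*copy));
    if (copy != NULL) {
        atomic_init(&copy->refcount, 1);
        copy->data = old->data;
    }
    return copy;
}

/* Drop one reference; free the structure when the last user goes away. */
static void put_ctx(struct ctx *c)
{
    if (c != NULL && atomic_fetch_sub(&c->refcount, 1) == 1)
        free(c);
}

/* Returns 0 on success or -1 if an allocation fails; on failure the task is unchanged. */
int model_unshare(struct task *t, int flags)
{
    struct ctx *new_fs = NULL, *new_files = NULL;
    struct ctx *old_fs = NULL, *old_files = NULL;

    /* Step 2: allocate and duplicate everything before touching the task. */
    if (flags & UNSHARE_FS) {
        new_fs = dup_ctx(t->fs);
        if (new_fs == NULL)
            return -1;
    }
    if (flags & UNSHARE_FILES) {
        new_files = dup_ctx(t->files);
        if (new_files == NULL) {
            put_ctx(new_fs);    /* only private copies need undoing */
            return -1;
        }
    }
    if (new_fs == NULL && new_files == NULL)
        return 0;               /* nothing was requested */

    /* Step 3: install the copies under the task lock. */
    pthread_mutex_lock(&t->lock);
    if (new_fs != NULL) {
        old_fs = t->fs;
        t->fs = new_fs;
    }
    if (new_files != NULL) {
        old_files = t->files;
        t->files = new_files;
    }
    pthread_mutex_unlock(&t->lock);

    /* Step 4: drop this task's references to the old, shared structures. */
    put_ctx(old_fs);
    put_ctx(old_files);
    return 0;
}
```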

7) Low Level Design
-------------------
Implementation of unshare can be grouped into the following 4 different
items:
  a) Reorganization of existing copy_* functions
  b) unshare system call service function
  c) unshare helper functions for each different process context
  d) Registration of system call number for different architectures

  7.1) Reorganization of copy_* functions
       Each copy function, such as copy_mm, copy_namespace, copy_files,
       etc., had roughly two components. The first component allocated
       and duplicated the appropriate structure and the second component
       linked it to the task structure passed in as an argument to the copy
       function. The first component was split into its own function.
       These dup_* functions allocated and duplicated the appropriate
       context structure. The reorganized copy_* functions invoked
       their corresponding dup_* functions and then linked the newly
       duplicated structures to the task structure with which the
       copy function was called.
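
The split can be illustrated with a hypothetical file table. In this sketch, dup_files only allocates and copies, while copy_files either takes a reference to the parent's table or calls dup_files and links the result into the task being built. The structures and the SHARE_FILES bit are simplified stand-ins, not the kernel's definitions.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted file table. */
struct files {
    atomic_int count;
    int max_fds;
};

/* Hypothetical task structure being built by fork/clone. */
struct task {
    struct files *files;
};

#define SHARE_FILES 0x1   /* stand-in for CLONE_FILES */

/* dup_*: allocate and duplicate only; no task is touched. */
struct files *dup_files(const struct files *old)
{
    struct files *copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return NULL;
    atomic_init(&copy->count, 1);
    copy->max_fds = old->max_fds;
    return copy;
}

/* copy_*: share or duplicate, then link the result to the new task. */
int copy_files(int clone_flags, const struct task *parent, struct task *child)
{
    if (clone_flags & SHARE_FILES) {
        atomic_fetch_add(&parent->files->count, 1);  /* share: take a reference */
        child->files = parent->files;
        return 0;
    }
    child->files = dup_files(parent->files);
    return child->files == NULL ? -1 : 0;
}
```

Because dup_files never touches a task, the unshare path can call it on the running task's structures and link the result later under task_lock().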

  7.2) unshare system call service function
       * Check flags
         Force implied flags. If CLONE_THREAD is set, force CLONE_VM.
         If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is
         set and signals are also being shared, force CLONE_THREAD. If
         CLONE_NEWNS is set, force CLONE_FS.
       * For each context flag, invoke the corresponding unshare_*
         helper routine with the flags passed into the system call and a
         reference to a pointer pointing to the new unshared structure.
       * If any new structures are created by the unshare_* helper
         functions, take the task_lock() on the current task,
         modify the appropriate context pointers, and release the
         task lock.
       * For all newly unshared structures, release the corresponding
         older, shared, structures.
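
The implied-flag rules can be written as a small pure function. This sketch applies them once, in the order listed above, using the CLONE_* constants from <sched.h>; force_implied_flags is a hypothetical name, and the signals_shared argument stands in for the kernel's check of whether the caller's signal state is shared with other threads.

```c
#define _GNU_SOURCE
#include <sched.h>   /* CLONE_* constants */

/* Applies the implied-flag rules of 7.2 in the order they are listed. */
unsigned long force_implied_flags(unsigned long flags, int signals_shared)
{
    if (flags & CLONE_THREAD)
        flags |= CLONE_VM;        /* leaving the thread group means leaving its VM */
    if (flags & CLONE_VM)
        flags |= CLONE_SIGHAND;   /* handler addresses live in the VM */
    if ((flags & CLONE_SIGHAND) && signals_shared)
        flags |= CLONE_THREAD;    /* threads of one group share handlers */
    if (flags & CLONE_NEWNS)
        flags |= CLONE_FS;        /* root and cwd refer to mounts of the namespace */
    return flags;
}
```

For instance, force_implied_flags(CLONE_NEWNS, 0) yields CLONE_NEWNS | CLONE_FS, which is the case exercised by test 2 in section 8.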

  7.3) unshare_* helper functions
       For the unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND,
       and CLONE_THREAD, return -EINVAL since they are not implemented yet.
       For the others, check the flag value to see if unsharing is
       required for that structure. If it is, invoke the corresponding
       dup_* function to allocate and duplicate the structure and return
       a pointer to it.
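
The helper contract can be sketched for one implemented and one unimplemented context. This is a simplified model, not kernel code: struct fs_info, dup_fs_info, unshare_fs and unshare_sighand are hypothetical, and the helpers report errors as negative errno values in the kernel's style. A zero return with a NULL result means no unsharing was requested for that structure.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>    /* CLONE_* constants */
#include <stdlib.h>

/* Hypothetical, heavily simplified file system information. */
struct fs_info {
    int umask;
};

/* Hypothetical dup_* function: allocate and copy, nothing else. */
static struct fs_info *dup_fs_info(const struct fs_info *old)
{
    struct fs_info *copy = malloc(sizeof(*copy));
    if (copy != NULL)
        *copy = *old;
    return copy;
}

/*
 * Implemented helper: on success *new_fsp is either NULL (nothing to do)
 * or a private copy for the caller to install under the task lock.
 */
int unshare_fs(unsigned long flags, const struct fs_info *cur,
               struct fs_info **new_fsp)
{
    *new_fsp = NULL;
    if (!(flags & CLONE_FS))
        return 0;
    *new_fsp = dup_fs_info(cur);
    return *new_fsp == NULL ? -ENOMEM : 0;
}

/* Unimplemented helper: refuse the flag when it is requested. */
int unshare_sighand(unsigned long flags)
{
    return (flags & CLONE_SIGHAND) ? -EINVAL : 0;
}
```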

  7.4) Appropriately modify architecture-specific code to register the
       new system call.
  265 8) Test Specification
  266 ---------------------
  267 The test for unshare should test the following:
  268   1) Valid flags: Test to check that clone flags for signal and
  269         signal handlers, for which unsharing is not implemented
  270         yet, return -EINVAL.
  271   2) Missing/implied flags: Test to make sure that if unsharing
  272         namespace without specifying unsharing of filesystem, correctly
  273         unshares both namespace and filesystem information.
  274   3) For each of the four (namespace, filesystem, files and vm)
  275         supported unsharing, verify that the system call correctly
  276         unshares the appropriate structure. Verify that unsharing
  277         them individually as well as in combination with each
  278         other works as expected.
  279   4) Concurrent execution: Use shared memory segments and futex on
  280         an address in the shm segment to synchronize execution of
  281         about 10 threads. Have a couple of threads execute execve,
  282         a couple _exit and the rest unshare with different combination
  283         of flags. Verify that unsharing is performed as expected and
  284         that there are no oops or hangs.
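
As a sketch of test 3 for the filesystem case, the hypothetical check below creates a child that shares file system information with its parent (CLONE_FS), optionally has the child call unshare(CLONE_FS), and then lets the child change its working directory. The parent's working directory should follow the child only when the sharing was left in place. It assumes Linux, glibc and an existing /tmp directory; it is an illustration, not an LTP test case.

```c
#define _GNU_SOURCE
#include <limits.h>     /* PATH_MAX */
#include <sched.h>      /* clone(), unshare(), CLONE_FS */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>
#include <string.h>     /* strcmp() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>     /* chdir(), getcwd() */

#define CHILD_STACK_SIZE (64 * 1024)

static int chdir_child(void *arg)
{
    int do_unshare = *(int *)arg;

    if (do_unshare && unshare(CLONE_FS) == -1)
        return 1;
    return chdir("/") == -1 ? 2 : 0;
}

/*
 * Returns 1 if the child's chdir("/") changed the parent's cwd, 0 if it
 * did not, or -1 on error (including unshare() being refused).
 */
int cwd_leaks_from_child(int do_unshare)
{
    char cwd[PATH_MAX];
    char *stack;
    pid_t pid;
    int status;

    if (chdir("/tmp") == -1)      /* start somewhere other than "/" */
        return -1;
    stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    pid = clone(chdir_child, stack + CHILD_STACK_SIZE, CLONE_FS | SIGCHLD,
                &do_unshare);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    if (getcwd(cwd, sizeof(cwd)) == NULL)
        return -1;
    return strcmp(cwd, "/") == 0;
}
```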

9) Future Work
--------------
The current implementation of unshare does not allow unsharing of
signals and signal handlers. Signals are complex to begin with, and
unsharing the signals and/or signal handlers of a currently running
process is even more complex. If in the future there is a specific
need to allow unsharing of signals and/or signal handlers, it can
be added incrementally to unshare without affecting legacy
applications using unshare.
