# $FreeBSD: src/sys/ufs/ffs/README,v 1.4 1999/12/03 00:34:26 billf Exp $
# $DragonFly: src/sys/vfs/ufs/README,v 1.4 2004/07/18 19:43:48 drhodus Exp $

Introduction

This package constitutes the alpha distribution of the soft update
code for the fast filesystem.

For more information on what soft updates are, see:
http://www.ece.cmu.edu/~ganger/papers/CSE-TR-254-95/

Status

My `filesystem torture tests' (described below) run for days without
a hitch (no panics, hangs, filesystem corruption, or memory leaks).
However, several panics have been reported to me by folks who are
field-testing the code, and I have not yet been able to reproduce
or fix them. Although these panics are rare and do not cause
filesystem corruption, the code should only be put into production
on systems where the system administrator is aware that it is being
run and knows how to turn it off if problems arise. Thus, you may
hand out this code to others, but please ensure that this status
message is included with any distributions. Please also include
the file ffs_softdep.stub.c in any distributions so that folks who
cannot abide by the need to redistribute source will not be left
with a kernel that will not link. The stub resolves all the calls
into the soft update code and simply ignores requests to enable
soft updates. Thus you will be able to ensure that your other hooks
have not broken anything and that your kernel is softdep-ready for
those who wish to use soft updates. Please report problems back to
me, with kernel backtraces of panics if possible. This is massively
complex code, and people need only have their filesystems hosed once
or twice before they avoid future changes like the plague. I want to
find and fix as many bugs as possible, as soon as possible, so as to
get the code rock solid before it is widely released. Please report
any bugs that you uncover to mckusick@mckusick.com.
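
The stub arrangement can be pictured with a minimal C sketch, assuming a compile-time switch selects either the full code or the stub. Every name below (SKETCH_SOFTUPDATES, SKETCH_SOFTDEP_FLAG, struct sketch_fs, sketch_softdep_mount, sketch_softdep_setup_hook) is hypothetical and not the stub file's actual interface. The point is only that both versions export the same symbols, so the hooked callers link either way, and that one way for the stub to ignore an enable request is to clear the flag.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical names throughout; this is not the actual stub file's API.
 * SKETCH_SOFTDEP_FLAG stands in for the superblock bit that tunefs sets.
 */
#define SKETCH_SOFTDEP_FLAG 0x02

struct sketch_fs {
    int flags;           /* superblock flag word */
    int deps_recorded;   /* dependencies tracked (full version only) */
};

#ifdef SKETCH_SOFTUPDATES
/* Full version: honor the enable request and track dependencies. */
int sketch_softdep_mount(struct sketch_fs *fs)
{
    assert(fs != NULL);
    return 0;                            /* leave the flag set */
}

void sketch_softdep_setup_hook(struct sketch_fs *fs)
{
    assert(fs != NULL);
    fs->deps_recorded++;                 /* remember a new dependency */
}
#else
/* Stub version: same symbols, so a kernel full of hooks still links. */
int sketch_softdep_mount(struct sketch_fs *fs)
{
    assert(fs != NULL);
    fs->flags &= ~SKETCH_SOFTDEP_FLAG;   /* ignore the request to enable */
    return 0;
}

void sketch_softdep_setup_hook(struct sketch_fs *fs)
{
    (void)fs;                            /* nothing to track */
}
#endif
```

Built without SKETCH_SOFTUPDATES, mounting a filesystem whose flag is set leaves it running with the normal write ordering, and every hook becomes a no-op.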

Performance

Running the Andrew Benchmarks yields the following raw data:

        Phase   Normal  Softdep     What it does
          1       3s      <1s       Creating directories
          2       8s       4s       Copying files
          3       6s       6s       Recursive directory stats
          4       8s       9s       Scanning each file
          5      25s      25s       Compilation

        Normal:  19.9u 29.2s 0:52.8 135+630io
        Softdep: 20.3u 28.5s 0:47.8 103+363io

Another interesting data point is provided by my `filesystem torture
tests'. They consist of 1000 runs of the Andrew benchmarks, 1000
copies and removals of /etc with randomly selected pauses of 0-60
seconds between each copy and remove, and 500 runs of find from /
with randomly selected pauses of 100 seconds between each run. The
torture test runs compare as follows:

With soft updates: writes: 6 sync, 1,113,686 async; run time 19hr, 50min
Normal filesystem: writes: 1,459,147 sync, 487,031 async; run time 27hr, 15min

The upshot is roughly 43% less I/O (1,113,692 versus 1,946,178 total
writes) and a 27% shorter running time (1190 versus 1635 minutes).
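
The percentages can be rechecked with a few lines of C that total the sync and async write counts and convert the run times to minutes; nothing here comes from the soft update code itself. The computation gives about 42.8% fewer writes and about 27.2% less run time.

```c
#include <assert.h>

/* Figures copied from the torture-test results above. */
static const long softdep_writes  = 6L + 1113686L;       /* sync + async */
static const long normal_writes   = 1459147L + 487031L;  /* sync + async */
static const long softdep_minutes = 19L * 60 + 50;       /* 19hr, 50min */
static const long normal_minutes  = 27L * 60 + 15;       /* 27hr, 15min */

/* Percentage by which new_value is smaller than old_value. */
static double pct_reduction(long new_value, long old_value)
{
    assert(old_value > 0);
    return 100.0 * (double)(old_value - new_value) / (double)old_value;
}
```

Calling pct_reduction(softdep_writes, normal_writes) and pct_reduction(softdep_minutes, normal_minutes) yields the two figures.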

Another interesting test point is a full MAKEDEV. Because it runs
as a shell script, it is mostly limited by the execution speed
of the machine on which it runs. Here are the numbers:

With soft updates:

        labrat# time ./MAKEDEV std
        2.2u 32.6s 0:34.82 100.0% 0+0k 11+36io 0pf+0w

        labrat# ls | wc
             522     522    3317

Without soft updates:

        labrat# time ./MAKEDEV std
        2.0u 40.5s 0:42.53 100.0% 0+0k 11+1221io 0pf+0w

        labrat# ls | wc
             522     522    3317

Of course, some of the system time is being pushed
to the syncer process, but that is a different story.

As a benchmark designed to highlight the soft update code,
consider a tar of zero-sized files and an rm -rf of a directory tree
that has at least 50 files or so at each level. Running a test with
a directory tree containing 28 directories holding 202 empty files
produces the following numbers:

With soft updates:
tar: 0.0u 0.5s 0:00.65 76.9% 0+0k 0+44io 0pf+0w (0 sync, 33 async writes)
rm: 0.0u 0.2s 0:00.20 100.0% 0+0k 0+37io 0pf+0w (0 sync, 72 async writes)

Normal filesystem:
tar: 0.0u 1.1s 0:07.27 16.5% 0+0k 60+586io 0pf+0w (523 sync, 0 async writes)
rm:  0.0u 0.5s 0:01.84 29.3% 0+0k 0+318io 0pf+0w (258 sync, 65 async writes)

The large reduction in writes is because inodes are clustered, so
most of a block gets allocated, and then the whole block is written
out once rather than having the same block written once for each
inode allocated from it. Similarly, each directory block is written
once rather than once for each new directory entry. Effectively,
what the soft update code is doing is allocating a bunch of inodes
and directory entries without writing anything, then ensuring that
the block containing the inodes is written first, followed by the
directory block that references them. If there were data in the
files, it would further ensure that the data blocks were written
before their inodes claimed them.
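
The ordering described above can be modeled with a toy C sketch. It assumes a drastically simplified rule: every dirty data block is flushed before every dirty inode block, which in turn is flushed before every dirty directory block. The real code tracks dependencies item by item and can roll back unsatisfied updates instead. The names (toy_disk, toy_update, toy_flush, BLK_*) are hypothetical. The sketch shows the two effects the paragraph describes: repeated updates to one block cost a single write, and the writes reach the disk in dependency order.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names are hypothetical, not ffs_softdep.c structures. */
enum blk_kind { BLK_DATA, BLK_INODE, BLK_DIR };  /* required write order */

#define MAX_BLOCKS 32

struct toy_block {
    enum blk_kind kind;
    bool dirty;
};

struct toy_disk {
    struct toy_block blk[MAX_BLOCKS];
    int nblk;
    int writes;                     /* physical block writes issued */
    enum blk_kind log[MAX_BLOCKS];  /* kinds in the order they hit disk */
};

/* Deferred mode: an allocation only marks the in-core block dirty. */
static void toy_update(struct toy_disk *d, int b)
{
    assert(b >= 0 && b < d->nblk);
    d->blk[b].dirty = true;
}

/*
 * Flush every dirty block exactly once: data blocks before the inode
 * blocks that claim them, inode blocks before the directory blocks
 * that reference them.
 */
static void toy_flush(struct toy_disk *d)
{
    for (int kind = BLK_DATA; kind <= BLK_DIR; kind++) {
        for (int b = 0; b < d->nblk; b++) {
            if (d->blk[b].dirty && d->blk[b].kind == (enum blk_kind)kind) {
                assert(d->writes < MAX_BLOCKS);
                d->log[d->writes++] = d->blk[b].kind;
                d->blk[b].dirty = false;
            }
        }
    }
}
```

Sixteen allocations spread over one inode block and one directory block produce two writes, inode block first, instead of one write per allocation.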

Copyright Restrictions

Please familiarize yourself with the copyright restrictions
contained at the top of either the sys/ufs/ffs/softdep.h or the
sys/ufs/ffs/ffs_softdep.c file. The key provision is similar
to the one used by the DB 2.0 package and goes as follows:

    Redistributions in any form must be accompanied by information
    on how to obtain complete source code for any accompanying
    software that uses the this software. This source code must
    either be included in the distribution or be available for
    no more than the cost of distribution plus a nominal fee,
    and must be freely redistributable under reasonable
    conditions. For an executable file, complete source code
    means the source code for all modules it contains. It does
    not mean source code for modules or files that typically
    accompany the operating system on which the executable file
    runs, e.g., standard library modules or system header files.

The idea is to allow those of you freely redistributing your source
to use it while retaining for myself the right to peddle it for
money to the commercial UNIX vendors. Note that I have included a
stub file ffs_softdep.c.stub that is freely redistributable so that
you can put in all the necessary hooks to run the full soft updates
code, but still allow vendors that want to maintain proprietary
source to have a working system. I do plan to release the code with
a `Berkeley style' copyright once I have peddled it around to the
commercial vendors. If you have concerns about this copyright,
feel free to contact me with them and we can try to resolve any
difficulties.

Soft Dependency Operation

The soft update implementation does NOT require ANY changes
to the on-disk format of your filesystems. Furthermore, it is
not used by default for any filesystem. It must be enabled on
a filesystem-by-filesystem basis by running tunefs to set a
bit in the superblock indicating that the filesystem should be
managed using soft updates. If you wish to stop using
soft updates for performance or reliability reasons,
you can simply run tunefs on it again to turn off the bit and
revert to normal operation. The additional dynamic memory load
placed on the kernel malloc arena is approximately equal to
the amount of memory used by vnodes plus inodes (for a system
with 1000 vnodes, the additional peak memory load is about 300K).
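
Because the switch is a single superblock bit, enabling and disabling soft updates amounts to setting and clearing one flag. A minimal C sketch, assuming a superblock reduced to its flag word; the struct, macro, and function names are hypothetical (FreeBSD's fs.h calls the real bit FS_DOSOFTDEP). Nothing else in the on-disk layout changes, which is why clearing the bit reverts cleanly to normal operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock's soft update bit. */
#define SKETCH_DOSOFTDEP 0x02

struct sketch_superblock {
    int32_t flags;  /* other on-disk fields omitted; layout unchanged */
};

/* Roughly what `tunefs -n enable|disable' amounts to: flip one bit. */
static void sketch_set_softdep(struct sketch_superblock *sb, bool enable)
{
    assert(sb != NULL);
    if (enable)
        sb->flags |= SKETCH_DOSOFTDEP;
    else
        sb->flags &= ~SKETCH_DOSOFTDEP;
}

/* Consulted at mount time (and by the updated fsck) to pick the algorithm. */
static bool sketch_uses_softdep(const struct sketch_superblock *sb)
{
    return (sb->flags & SKETCH_DOSOFTDEP) != 0;
}
```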

Kernel Changes

There are two changes to the kernel functionality that are not
contained in the soft update files. The first is a `trickle
sync' facility running in the kernel as process 3. This trickle
sync process replaces the traditional `update' program (which should
be commented out of the /etc/rc startup script). When a vnode is
first written, it is placed 30 seconds down on the trickle sync
queue. If it still exists and has dirty data when it reaches the
top of the queue, it is synced. This approach evens out the load
on the underlying I/O system and avoids writing short-lived files.
The papers on trickle sync tend to favor aging based on buffers
rather than files. However, I sync on file age rather than buffer
age because the data structures are much smaller, as there are
typically far fewer files than buffers. Although this can make the
I/O spiky when a big file times out, it is still much better than
the wholesale syncs that were happening before. It also adapts
much better to the soft update code, where I want to control
aging to improve performance (inodes age in 10 seconds, directories
in 15 seconds, files in 30 seconds). This ensures that most
dependencies are gone (e.g., inodes have already been written by the
time the directory entries that reference them want to go to disk),
reducing the amount of rollback that is needed.
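
One way to build such an age-ordered queue is a timing wheel with one bucket per second: a vnode first written at time t is hung on bucket t + delay, and the syncer empties one bucket per tick. A minimal C sketch under that assumption, using the 10/15/30-second delays above; all names (toy_syncer, syncer_add, syncer_tick, WHEEL_SLOTS) are hypothetical, and locking and the real vnode fields are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy trickle-sync queue as a timing wheel: one bucket per second. */
#define WHEEL_SLOTS 32          /* must exceed the longest delay (30 s) */
#define DELAY_INODE 10
#define DELAY_DIR   15
#define DELAY_FILE  30

struct toy_vnode {
    bool dirty;
    bool queued;
    int  syncs;                 /* times the syncer flushed it */
    struct toy_vnode *next;     /* link within a wheel bucket */
};

struct toy_syncer {
    struct toy_vnode *slot[WHEEL_SLOTS];
    int cur;                    /* bucket for "now" */
};

/* First write of a vnode: queue it `delay` seconds down the wheel. */
static void syncer_add(struct toy_syncer *s, struct toy_vnode *vp, int delay)
{
    assert(delay > 0 && delay < WHEEL_SLOTS);
    vp->dirty = true;
    if (vp->queued)
        return;                 /* already waiting: keep its original slot */
    int b = (s->cur + delay) % WHEEL_SLOTS;
    vp->next = s->slot[b];
    s->slot[b] = vp;
    vp->queued = true;
}

/* One second passes: sync whatever in the current bucket is still dirty. */
static void syncer_tick(struct toy_syncer *s)
{
    s->cur = (s->cur + 1) % WHEEL_SLOTS;
    struct toy_vnode *vp = s->slot[s->cur];
    s->slot[s->cur] = NULL;
    while (vp != NULL) {
        struct toy_vnode *next = vp->next;
        vp->queued = false;
        if (vp->dirty) {        /* clean (e.g. already removed) vnodes cost no I/O */
            vp->dirty = false;
            vp->syncs++;
        }
        vp = next;
    }
}
```

A vnode that is already queued keeps its original slot, so a file rewritten repeatedly is still synced once per pass, and a vnode that is clean when its bucket comes up is skipped.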

The other main kernel change is to split the vnode freelist into
two separate lists: one for vnodes that are still being used to
identify buffers, and the other for vnodes that no longer identify
any buffers. The getnewvnode routine takes vnodes from the latter
list in preference to the former.
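
The preference rule can be sketched in C with two singly linked lists. This assumes a vnode reduced to a count of the buffers it still identifies; the names (fv_vnode, fv_freelists, fv_release, fv_getnew) are hypothetical, and the lists are LIFO for brevity where a real free list would recycle the least recently used vnode first. Taking from the empty list first means that recycling a vnode rarely discards cached buffer contents.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy split free list. "buffered" vnodes still identify cached buffers;
 * "empty" ones do not. All names are hypothetical.
 */
struct fv_vnode {
    int nbufs;                  /* buffers still identified by this vnode */
    struct fv_vnode *next;
};

struct fv_freelists {
    struct fv_vnode *empty;     /* no buffers: cheapest to recycle */
    struct fv_vnode *buffered;  /* recycling would discard cached data */
};

/* Put a released vnode on the list that matches its buffer state. */
static void fv_release(struct fv_freelists *fl, struct fv_vnode *vp)
{
    assert(vp != NULL);
    struct fv_vnode **head = vp->nbufs == 0 ? &fl->empty : &fl->buffered;
    vp->next = *head;
    *head = vp;
}

/* getnewvnode-style choice: prefer the empty list, else fall back. */
static struct fv_vnode *fv_getnew(struct fv_freelists *fl)
{
    struct fv_vnode **head = fl->empty != NULL ? &fl->empty : &fl->buffered;
    struct fv_vnode *vp = *head;
    if (vp != NULL) {
        *head = vp->next;
        vp->next = NULL;
    }
    return vp;                  /* NULL: caller must allocate a new vnode */
}
```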

Packaging of Kernel Changes

The sys subdirectory contains the changes and additions to the
kernel. My goal in writing this code was to minimize the changes
that need to be made to the kernel. Thus, most of the new code
is contained in the two new files softdep.h and ffs_softdep.c.
The rest of the kernel changes simply insert hooks to
call into these two new files. Although there has been some
structural reorganization of the filesystem code to accommodate
gathering the information required by the soft update code,
the actual ordering of filesystem operations when soft updates
are disabled is unchanged.

The kernel changes are packaged as a set of diffs. As I am
doing my development in BSD/OS, the diffs are relative to the
BSD/OS versions of the files. Because BSD/OS recently had
4.4BSD-Lite2 merged into it, the Lite2 files are a good starting
point for figuring out the changes. There are 40 files that
require changes, plus the two new files. Most of these files have
only a few lines of changes in them. However, four files have
fairly extensive changes: kern/vfs_subr.c, vfs/ufs/ufs_lookup.c,
vfs/ufs/ufs_vnops.c, and vfs/ffs/ffs_alloc.c. For these four
files, I have provided the original Lite2 version, the Lite2
version with the diffs merged in, and the diffs between the
BSD/OS and merged versions. Even so, I expect that there will
be some difficulty in doing the merge; I am certainly willing
to help get the code merged into your system.

Packaging of Utility Changes

The utilities subdirectory contains the changes and additions
to the utilities. Diffs to three utilities are enclosed:

    tunefs - add a flag to enable and disable soft updates

    mount - print out whether soft updates are enabled and
            also statistics on the number of sync and async writes

    fsck - tighter checks on acceptable errors and a slightly
           different policy for what to put in lost+found on
           filesystems using soft updates

In addition, you should recompile vmstat so as to get reports
on the 13 new memory types used by the soft update code.
It is not necessary to use the new version of fsck; however, it
would aid in my debugging if you do. Also, because of the time
lag between deleting a directory entry and freeing the inode it
references, you will find a lot more files showing up in your
lost+found if you do not use the new version. Note that the
new version checks for the soft update flag in the superblock
and only uses the new algorithms if it is set. So, it will run
unchanged on filesystems that are not using soft updates.

Operation

Once you have booted a kernel that incorporates the soft update
code and installed the updated utilities, do the following:

1) Comment out the update program in /etc/rc.

2) Run `tunefs -n enable' on one or more test filesystems.

3) Mount these filesystems and then type `mount' to ensure that
   they have been enabled for soft updates.

4) Copy the test directory to a softdep filesystem, chdir into
   it and run `./doit'. You may want to check out each of the
   three subtests individually first: doit1 - Andrew benchmarks,
   doit2 - copy and removal of /etc, doit3 - find from /.

====
Additional notes from Feb 13

When removing huge directories of files, it is possible to get
the in-core state arbitrarily far ahead of the disk. Maintaining
all the associated dependency information can exhaust the kernel
malloc arena. To avoid this scenario, I have put some limits on
the soft update code so that it will not be allowed to rampage
through all of the kernel memory. I enclose below the relevant
patches to vnode.h and vfs_subr.c (which allow the soft update
code to speed up the filesystem syncer process). I have also
included the diffs for ffs_softdep.c. I hope to make a pass over
ffs_softdep.c to isolate the differences with my standard version
so that these diffs are less painful to incorporate.

Since I know you like to play with tuning, I have put the relevant
knobs on sysctl debug variables. The tuning knobs can be viewed
with `sysctl debug' and set with `sysctl -w debug.<name>=value'.
The knobs are as follows:

        debug.max_softdeps - limit on any given resource
        debug.tickdelay - ticks to delay before allocating
        debug.max_limit_hit - number of times tickdelay imposed
        debug.rush_requests - number of rush requests to filesystem syncer

The max_softdeps limit is derived from vnodesdesired, which in
turn is sized based on the amount of memory on the machine.
When the limit is hit, a process requesting a resource first
tries to speed up the filesystem syncer process. Such a
request is recorded as a rush_request. After syncdelay / 2
unserviced rush requests (typically 15) are in the filesystem
syncer's queue (i.e., it is more than 15 seconds behind in its
work), the process requesting the memory is put to sleep for
tickdelay ticks. Such a delay is recorded in max_limit_hit.
Following this delay, it is granted its memory without further
delay. I have tried the following experiments in which I
delete an MH directory containing 16,703 files:

Run #                   1               2               3

max_softdeps         4496            4496            4496
tickdelay        100 == 1 sec   20 == 0.2 sec   2 == 0.02 sec
max_limit_hit    16 == 16 sec   27 == 5.4 sec   203 == 4.1 sec
rush_requests         147             102              93
run time             57 sec          46 sec          45 sec
I/Os                  781             859             936
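
The throttling policy above can be summarized in a small C sketch. It reflects one reading of the text: a request over the limit sends a rush request to the syncer while fewer than syncdelay / 2 are outstanding, and once that many are pending the requester is told to sleep for tickdelay ticks (never fewer than 2) and is then granted its memory. The struct and function names are hypothetical; only the counter names echo the debug.* knobs, and the actual sleeping and syncer wakeup are left to the caller.

```c
#include <assert.h>

/* Toy model of the allocation throttle; not the ffs_softdep.c code. */
struct throttle {
    int max_softdeps;   /* limit on any given resource */
    int tickdelay;      /* ticks to sleep once the syncer is too far behind */
    int syncdelay;      /* syncer period in seconds (typically 30) */
    int max_limit_hit;  /* times a delay was imposed */
    int rush_requests;  /* rush requests sent to the syncer */
    int pending_rush;   /* rush requests the syncer has not yet serviced */
};

/*
 * Called before allocating one more dependency structure when `inuse`
 * are already allocated. Returns the number of ticks the caller should
 * sleep before being granted the memory (0 means proceed immediately).
 */
static int throttle_request(struct throttle *t, int inuse)
{
    assert(t->syncdelay > 0);
    if (inuse < t->max_softdeps)
        return 0;                         /* under the limit */
    if (t->pending_rush < t->syncdelay / 2) {
        t->pending_rush++;                /* ask the syncer to hurry */
        t->rush_requests++;
        return 0;
    }
    t->max_limit_hit++;                   /* syncer too far behind: back off */
    return t->tickdelay < 2 ? 2 : t->tickdelay;  /* minimum legal value is 2 */
}

/* The syncer services one outstanding rush request. */
static void throttle_serviced(struct throttle *t)
{
    if (t->pending_rush > 0)
        t->pending_rush--;
}
```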

When run with no limits, it completes in 40 seconds. So, the
time spent in delay adds almost directly to the total running
time (for run 1, 40 + 16 = 56 seconds versus 57 measured).
Shortening the tick delay does cut down the total running time,
but at the expense of generating more total I/O operations
due to the rush orders being sent to the filesystem syncer.
Although the number of rush orders decreases with a shorter
tick delay, there are more requests in each order, hence the
increase in I/O count. Also, although the I/O count does rise
with a shorter delay, it is still at least an order of magnitude
less than without soft updates. Anyway, you may want to play
around with these values to see what works best and to see if
you can get an insight into how best to tune them. If you get
out-of-memory panics, then you have max_softdeps set too high.
The max_limit_hit and rush_requests counters should be reset to
zero before each run. The minimum legal value for tickdelay is 2
(if you set it below that, the code will use 2).
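
A worked check of the run-time claim: the delay imposed in each run is max_limit_hit times tickdelay divided by hz, and the table's "100 == 1 sec" implies hz = 100 on the test machine (an inference, not stated outright). Adding that delay to the 40-second unthrottled run predicts 56, 45.4, and about 44.1 seconds against the measured 57, 46, and 45, each within about a second.

```c
#include <assert.h>

/*
 * Assumption: hz = 100 on the test machine, inferred from the table's
 * "100 == 1 sec" entry for tickdelay.
 */
#define TEST_HZ 100
#define UNTHROTTLED_SECS 40.0   /* run time with no limits */

/* Seconds spent asleep: each limit hit sleeps for tickdelay ticks. */
static double delay_seconds(int limit_hits, int tickdelay)
{
    assert(tickdelay >= 2);     /* the code never uses less than 2 */
    return (double)limit_hits * tickdelay / TEST_HZ;
}

/* Predicted run time if the delay simply adds to the unthrottled time. */
static double predicted_runtime(int limit_hits, int tickdelay)
{
    return UNTHROTTLED_SECS + delay_seconds(limit_hits, tickdelay);
}
```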
