
FreeBSD/Linux Kernel Cross Reference
sys/ufs/ffs/


Name Size Last modified (GMT) Description
Back Parent directory 2023-01-29 20:53:11
File README 15073 bytes 2023-01-29 20:53:11
File README.softupdates 717 bytes 2023-01-29 20:53:11
C file ffs_alloc.c 50479 bytes 2023-01-29 20:53:11
C file ffs_balloc.c 11256 bytes 2023-01-29 20:53:11
C file ffs_extern.h 5560 bytes 2023-01-29 20:53:11
C file ffs_inode.c 15963 bytes 2023-01-29 20:53:11
C file ffs_rawread.c 11426 bytes 2023-01-29 20:53:11
C file ffs_softdep.c 150000 bytes 2023-01-29 20:53:11
C file ffs_softdep_stub.c 5673 bytes 2023-01-29 20:53:11
C file ffs_subr.c 6382 bytes 2023-01-29 20:53:11
C file ffs_tables.c 5873 bytes 2023-01-29 20:53:11
C file ffs_vfsops.c 35054 bytes 2023-01-29 20:53:11
C file ffs_vnops.c 8213 bytes 2023-01-29 20:53:11
C file fs.h 23572 bytes 2023-01-29 20:53:11
C file softdep.h 27596 bytes 2023-01-29 20:53:11

    1 # $FreeBSD$
    2 
    3 Introduction
    4 
    5 This package constitutes the alpha distribution of the soft update
    6 code updates for the fast filesystem.
    7 
    8 For more information on what Soft Updates is, see:
    9 http://www.ece.cmu.edu/~ganger/papers/CSE-TR-254-95/
   10 
   11 Status
   12 
   13 My `filesystem torture tests' (described below) run for days without
   14 a hitch (no panics, hangs, filesystem corruption, or memory leaks).
   15 However, I have had several panics reported to me by folks that
   16 are field testing the code which I have not yet been able to
   17 reproduce or fix. Although these panics are rare and do not cause
   18 filesystem corruption, the code should only be put into production
   19 on systems where the system administrator is aware that it is being
   20 run, and knows how to turn it off if problems arise. Thus, you may
   21 hand out this code to others, but please ensure that this status
   22 message is included with any distributions. Please also include
   23 the file ffs_softdep_stub.c in any distributions so that folks that
   24 cannot abide by the need to redistribute source will not be left
   25 with a kernel that will not link. It will resolve all the calls
   26 into the soft update code and simply ignore the request to enable
   27 them. Thus you will be able to ensure that your other hooks have
   28 not broken anything and that your kernel is softdep-ready for those
   29 that wish to use them. Please report problems back to me with
   30 kernel backtraces of panics if possible. This is massively complex
   31 code, and people only have to have their filesystems hosed once or
   32 twice to avoid future changes like the plague. I want to find and
   33 fix as many bugs as soon as possible so as to get the code rock
   34 solid before it gets widely released. Please report any bugs that
   35 you uncover to mckusick@mckusick.com.
   36 
   37 Performance
   38 
   39 Running the Andrew Benchmarks yields the following raw data:
   40 
   41         Phase   Normal  Softdep     What it does
   42           1       3s      <1s       Creating directories
   43           2       8s       4s       Copying files
   44           3       6s       6s       Recursive directory stats
   45           4       8s       9s       Scanning each file
   46           5      25s      25s       Compilation
   47 
   48         Normal:  19.9u 29.2s 0:52.8 135+630io
   49         Softdep: 20.3u 28.5s 0:47.8 103+363io
   50 
   51 Another interesting data point is my `filesystem torture tests'.
   52 They consist of 1000 runs of the Andrew benchmarks, 1000 copies and
   53 removes of /etc with randomly selected pauses of 0-60 seconds
   54 between each copy and remove, and 500 finds from / with randomly
   55 selected pauses of 100 seconds between each run. The run of the
   56 torture test compares as follows:
   57 
   58 With soft updates: writes: 6 sync, 1,113,686 async; run time 19hr, 50min
   59 Normal filesystem: writes: 1,459,147 sync, 487,031 async; run time 27hr, 15min
   60 
   61 The upshot is about 43% less I/O and about 27% shorter running time.
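
The summary percentages can be checked directly from the two result
lines above. The following is only an arithmetic sketch; the variable
and function names are illustrative and do not come from the soft
update code.

```python
# Figures copied from the torture-test results above.
soft_writes = 6 + 1_113_686          # sync + async writes with soft updates
normal_writes = 1_459_147 + 487_031  # sync + async writes on a normal filesystem

soft_minutes = 19 * 60 + 50          # 19hr, 50min
normal_minutes = 27 * 60 + 15        # 27hr, 15min


def percent_saved(new, old):
    """Return how much smaller `new` is than `old`, as a percentage."""
    return 100.0 * (old - new) / old


io_saved = percent_saved(soft_writes, normal_writes)
time_saved = percent_saved(soft_minutes, normal_minutes)

print(f"writes:   {normal_writes:,} -> {soft_writes:,} ({io_saved:.1f}% fewer)")
print(f"run time: {normal_minutes} min -> {soft_minutes} min ({time_saved:.1f}% shorter)")
```

With these inputs the reduction in writes comes out a little under 43%
and the reduction in running time a little over 27%.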
   62 
   63 Another interesting test point is a full MAKEDEV. Because it runs
   64 as a shell script, it becomes mostly limited by the execution speed
   65 of the machine on which it runs. Here are the numbers:
   66 
   67 With soft updates:
   68 
   69         labrat# time ./MAKEDEV std
   70         2.2u 32.6s 0:34.82 100.0% 0+0k 11+36io 0pf+0w
   71 
   72         labrat# ls | wc
   73              522     522    3317
   74 
   75 Without soft updates:
   76 
   77         labrat# time ./MAKEDEV std
   78         2.0u 40.5s 0:42.53 100.0% 0+0k 11+1221io 0pf+0w
   79 
   80         labrat# ls | wc
   81              522     522    3317
   82 
   83 Of course, some of the system time is being pushed
   84 to the syncer process, but that is a different story.
   85 
   86 To show a benchmark designed to highlight the soft update code,
   87 consider a tar of zero-sized files and an rm -rf of a directory tree
   88 that has at least 50 files or so at each level. Running a test with
   89 a directory tree containing 28 directories holding 202 empty files
   90 produces the following numbers:
   91 
   92 With soft updates:
   93 tar: 0.0u 0.5s 0:00.65 76.9% 0+0k 0+44io 0pf+0w (0 sync, 33 async writes)
   94 rm: 0.0u 0.2s 0:00.20 100.0% 0+0k 0+37io 0pf+0w (0 sync, 72 async writes)
   95 
   96 Normal filesystem:
   97 tar: 0.0u 1.1s 0:07.27 16.5% 0+0k 60+586io 0pf+0w (523 sync, 0 async writes)
   98 rm:  0.0u 0.5s 0:01.84 29.3% 0+0k 0+318io 0pf+0w (258 sync, 65 async writes)
   99 
  100 The large reduction in writes is because inodes are clustered, so
  101 most of a block gets allocated, then the whole block is written
  102 out once rather than having the same block written once for each
  103 inode allocated from it.  Similarly each directory block is written
  104 once rather than once for each new directory entry. Effectively
  105 what the update code is doing is allocating a bunch of inodes
  106 and directory entries without writing anything, then ensuring that
  107 the block containing the inodes is written first followed by the
  108 directory block that references them.  If there were data in the
  109 files it would further ensure that the data blocks were written
  110 before their inodes claimed them.
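
The effect described above can be illustrated with a toy model that
only counts block writes. It is a sketch under loud assumptions: the
block capacities (64 inodes per inode block, 32 entries per directory
block) are made-up numbers, all files go into a single directory, and
the real code tracks individual dependencies rather than sorting
blocks globally. It is not meant to reproduce the measurements above.

```python
def create_files(nfiles, inodes_per_block=64, entries_per_dir_block=32):
    """Simulate creating `nfiles` empty files in one directory.

    Returns (sync_writes, deferred_writes): the list of block writes for a
    policy that writes each updated block immediately, and the list for a
    policy that only marks blocks dirty and later flushes each one once,
    inode blocks first. Capacities are hypothetical.
    """
    sync_writes = []
    dirty_inode_blocks = []
    dirty_dir_blocks = []
    for n in range(nfiles):
        inode_block = ("inode", n // inodes_per_block)
        dir_block = ("dir", n // entries_per_dir_block)
        # Immediate policy: the inode reaches disk before the entry naming it,
        # so every create costs two block writes.
        sync_writes += [inode_block, dir_block]
        # Deferred policy: remember each dirty block only once.
        if inode_block not in dirty_inode_blocks:
            dirty_inode_blocks.append(inode_block)
        if dir_block not in dirty_dir_blocks:
            dirty_dir_blocks.append(dir_block)
    # Flush order: inode blocks, then the directory blocks that reference them.
    deferred_writes = dirty_inode_blocks + dirty_dir_blocks
    return sync_writes, deferred_writes


sync_writes, deferred_writes = create_files(202)
print(len(sync_writes), "immediate writes vs", len(deferred_writes), "deferred writes")
```

The ordering constraint is the same in both policies (inode before
directory entry); the saving comes purely from writing each block once
after many updates have accumulated in it.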
  111 
  112 Copyright Restrictions
  113 
  114 Please familiarize yourself with the copyright restrictions
  115 contained at the top of either the sys/ufs/ffs/softdep.h or
  116 sys/ufs/ffs/ffs_softdep.c file. The key provision is similar
  117 to the one used by the DB 2.0 package and goes as follows:
  118 
  119     Redistributions in any form must be accompanied by information
  120     on how to obtain complete source code for any accompanying
  121     software that uses this software. This source code must
  122     either be included in the distribution or be available for
  123     no more than the cost of distribution plus a nominal fee,
  124     and must be freely redistributable under reasonable
  125     conditions. For an executable file, complete source code
  126     means the source code for all modules it contains. It does
  127     not mean source code for modules or files that typically
  128     accompany the operating system on which the executable file
  129     runs, e.g., standard library modules or system header files.
  130 
  131 The idea is to allow those of you freely redistributing your source
  132 to use it while retaining for myself the right to peddle it for
  133 money to the commercial UNIX vendors. Note that I have included a
  134 stub file ffs_softdep_stub.c that is freely redistributable so that
  135 you can put in all the necessary hooks to run the full soft updates
  136 code, but still allow vendors that want to maintain proprietary
  137 source to have a working system. I do plan to release the code with
  138 a `Berkeley style' copyright once I have peddled it around to the
  139 commercial vendors.  If you have concerns about this copyright,
  140 feel free to contact me with them and we can try to resolve any
  141 difficulties.
  142 
  143 Soft Dependency Operation
  144 
  145 The soft update implementation does NOT require ANY changes
  146 to the on-disk format of your filesystems. Furthermore it is
  147 not used by default for any filesystems. It must be enabled on
  148 a filesystem by filesystem basis by running tunefs to set a
  149 bit in the superblock indicating that the filesystem should be
  150 managed using soft updates. If you wish to stop using
  151 soft updates due to performance or reliability reasons,
  152 you can simply run tunefs on it again to turn off the bit and
  153 revert to normal operation. The additional dynamic memory load
  154 placed on the kernel malloc arena is approximately equal to
  155 the amount of memory used by vnodes plus inodes (for a system
  156 with 1000 vnodes, the additional peak memory load is about 300K).
  157 
  158 Kernel Changes
  159 
  160 There are two new changes to the kernel functionality that are not
  161 contained in in the soft update files. The first is a `trickle
  162 sync' facility running in the kernel as process 3.  This trickle
  163 sync process replaces the traditional `update' program (which should
  164 be commented out of the /etc/rc startup script). When a vnode is
  165 first written it is placed 30 seconds down on the trickle sync
  166 queue. If it still exists and has dirty data when it reaches the
  167 top of the queue, it is sync'ed.  This approach evens out the load
  168 on the underlying I/O system and avoids writing short-lived files.
  169 The papers on trickle-sync tend to favor aging based on buffers
  170 rather than files. However, I sync on file age rather than buffer
  171 age because the data structures are much smaller as there are
  172 typically far fewer files than buffers. Although this can make the
  173 I/O spikey when a big file times out, it is still much better than
  174 the wholesale sync's that were happening before. It also adapts
  175 much better to the soft update code where I want to control
  176 aging to improve performance (inodes age in 10 seconds, directories
  177 in 15 seconds, files in 30 seconds). This ensures that most
  178 dependencies are gone (e.g., inodes are written when directory
  179 entries want to go to disk) reducing the amount of rollback that
  180 is needed.
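
The queueing behaviour described above can be sketched as a wheel of
one-second slots. This is a simplified model, not the kernel code: the
class name, the method names, and the wheel size of 32 slots are
assumptions made for illustration; only the three aging delays come
from the text.

```python
# Aging delays in seconds, taken from the paragraph above.
AGE = {"inode": 10, "directory": 15, "file": 30}


class TrickleSyncer:
    """Toy model of a trickle-sync queue (hypothetical names throughout)."""

    def __init__(self, slots=32):          # wheel size is an assumption
        self.wheel = [[] for _ in range(slots)]
        self.now = 0                        # current time in seconds
        self.dirty = set()                  # objects that still have dirty data
        self.synced = []                    # (time, name) for each sync performed

    def first_write(self, name, kind="file"):
        """Queue an object AGE[kind] seconds down the wheel on its first write."""
        self.dirty.add(name)
        slot = (self.now + AGE[kind]) % len(self.wheel)
        self.wheel[slot].append(name)

    def remove(self, name):
        """The object was deleted (or cleaned) before its turn came up."""
        self.dirty.discard(name)

    def tick(self):
        """Advance one second and sync whatever is due and still dirty."""
        self.now += 1
        slot = self.now % len(self.wheel)
        due, self.wheel[slot] = self.wheel[slot], []
        for name in due:
            if name in self.dirty:
                self.dirty.discard(name)
                self.synced.append((self.now, name))


syncer = TrickleSyncer()
syncer.first_write("tmpfile")                # short-lived file
syncer.first_write("dirblock", "directory")
syncer.first_write("logfile")
for _ in range(5):
    syncer.tick()
syncer.remove("tmpfile")                     # deleted after 5 seconds
for _ in range(30):
    syncer.tick()
print(syncer.synced)
```

In this run the short-lived file is never written, the directory is
synced at second 15, and the surviving file at second 30, so the work
is spread over time instead of arriving in one wholesale sync.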
  181 
  182 The other main kernel change is to split the vnode freelist into
  183 two separate lists: one for vnodes that are still being used to
  184 identify buffers and the other for those vnodes no longer identifying
  185 any buffers.  The latter list is used by getnewvnode in preference
  186 to the former.
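
A minimal sketch of the preference rule, assuming nothing about the
real data structures: the class and its fields are hypothetical, and
only the name getnewvnode and the "prefer vnodes with no buffers"
policy come from the text.

```python
from collections import deque


class VnodeFreeLists:
    """Toy model of the split free list (hypothetical structure)."""

    def __init__(self):
        self.with_buffers = deque()     # free vnodes still identifying buffers
        self.without_buffers = deque()  # free vnodes identifying no buffers

    def release(self, vnode, has_buffers):
        """Put a vnode on the list matching whether it still has buffers."""
        target = self.with_buffers if has_buffers else self.without_buffers
        target.append(vnode)

    def getnewvnode(self):
        """Reuse a buffer-free vnode if one exists, else fall back."""
        if self.without_buffers:
            return self.without_buffers.popleft()
        if self.with_buffers:
            return self.with_buffers.popleft()
        return None                     # caller would have to allocate


lists = VnodeFreeLists()
lists.release("vn_a", has_buffers=True)
lists.release("vn_b", has_buffers=False)
lists.release("vn_c", has_buffers=True)
order = [lists.getnewvnode() for _ in range(4)]
print(order)
```

Recycling buffer-free vnodes first means that buffers still attached
to a free vnode are more likely to survive until they are needed
again.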
  187 
  188 Packaging of Kernel Changes
  189 
  190 The sys subdirectory contains the changes and additions to the
  191 kernel. My goal in writing this code was to minimize the changes
  192 that need to be made to the kernel. Thus, most of the new code
  193 is contained in the two new files softdep.h and ffs_softdep.c.
  194 The rest of the kernel changes are simply inserting hooks to
  195 call into these two new files. Although there has been some
  196 structural reorganization of the filesystem code to accommodate
  197 gathering the information required by the soft update code,
  198 the actual ordering of filesystem operations when soft updates
  199 are disabled is unchanged.
  200 
  201 The kernel changes are packaged as a set of diffs. As I am
  202 doing my development in BSD/OS, the diffs are relative to the
  203 BSD/OS versions of the files. Because BSD/OS recently had
  204 4.4BSD-Lite2 merged into it, the Lite2 files are a good starting
  205 point for figuring out the changes. There are 40 files that
  206 require change plus the two new files. Most of these files have
  207 only a few lines of changes in them. However, four files have
  208 fairly extensive changes: kern/vfs_subr.c, ufs/ufs/ufs_lookup.c,
  209 ufs/ufs/ufs_vnops.c, and ufs/ffs/ffs_alloc.c. For these four
  210 files, I have provided the original Lite2 version, the Lite2
  211 version with the diffs merged in, and the diffs between the
  212 BSD/OS and merged version. Even so, I expect that there will
  213 be some difficulty in doing the merge; I am certainly willing
  214 to assist in helping get the code merged into your system.
  215 
  216 Packaging of Utility Changes
  217 
  218 The utilities subdirectory contains the changes and additions
  219 to the utilities. There are diffs to three utilities enclosed:
  220 
  221     tunefs - add a flag to enable and disable soft updates
  222 
  223     mount - print out whether soft updates are enabled and
  224             also statistics on number of sync and async writes
  225 
  226     fsck - tighter checks on acceptable errors and a slightly
  227            different policy for what to put in lost+found on
  228            filesystems using soft updates
  229 
  230 In addition you should recompile vmstat so as to get reports
  231 on the 13 new memory types used by the soft update code.
  232 It is not necessary to use the new version of fsck; however, it
  233 would aid in my debugging if you do. Also, because of the time
  234 lag between deleting a directory entry and the inode it
  235 references, you will find a lot more files showing up in your
  236 lost+found if you do not use the new version. Note that the
  237 new version checks for the soft update flag in the superblock
  238 and only uses the new algorithms if it is set. So, it will run
  239 unchanged on the filesystems that are not using soft updates.
  240 
  241 Operation
  242 
  243 Once you have booted a kernel that incorporates the soft update
  244 code and installed the updated utilities, do the following:
  245 
  246 1) Comment out the update program in /etc/rc.
  247 
  248 2) Run `tunefs -n enable' on one or more test filesystems.
  249 
  250 3) Mount these filesystems and then type `mount' to ensure that
  251    they have been enabled for soft updates.
  252 
  253 4) Copy the test directory to a softdep filesystem, chdir into
  254    it and run `./doit'. You may want to check out each of the
  255    three subtests individually first: doit1 - andrew benchmarks,
  256    doit2 - copy and removal of /etc, doit3 - find from /.
  257 
  258 ====
  259 Additional notes from Feb 13
  260 
  261 When removing huge directories of files, it is possible to get
  262 the incore state arbitrarily far ahead of the disk. Maintaining
  263 all the associated dependency information can exhaust the kernel
  264 malloc arena. To avoid this scenario, I have put some limits on
  265 the soft update code so that it will not be allowed to rampage
  266 through all of the kernel memory. I enclose below the relevant
  267 patches to vnode.h and vfs_subr.c (which allow the soft update
  268 code to speed up the filesystem syncer process). I have also
  269 included the diffs for ffs_softdep.c. I hope to make a pass over
  270 ffs_softdep.c to isolate the differences with my standard version
  271 so that these diffs are less painful to incorporate.
  272 
  273 Since I know you like to play with tuning, I have put the relevant
  274 knobs on sysctl debug variables. The tuning knobs can be viewed
  275 with `sysctl debug' and set with `sysctl -w debug.<name>=value'.
  276 The knobs are as follows:
  277 
  278         debug.max_softdeps - limit on any given resource
  279         debug.tickdelay - ticks to delay before allocating
  280         debug.max_limit_hit - number of times tickdelay imposed
  281         debug.rush_requests - number of rush requests to filesystem syncer
  282 
  283 The max_softdeps limit is derived from desiredvnodes which in
  284 turn is sized based on the amount of memory on the machine.
  285 When the limit is hit, a process requesting a resource first
  286 tries to speed up the filesystem syncer process. Such a
  287 request is recorded as a rush_request. After syncdelay / 2
  288 unserviced rush requests (typically 15) are in the filesystem
  289 syncer's queue (i.e., it is more than 15 seconds behind in its
  290 work), the process requesting the memory is put to sleep for
  291 tickdelay ticks. Such a delay is recorded in max_limit_hit.
  292 Following this delay it is granted its memory without further
  293 delay. I have tried the following experiments in which I
  294 delete an MH directory containing 16,703 files:
  295 
  296 Run #                   1               2               3
  297 
  298 max_softdeps         4496            4496            4496
  299 tickdelay        100 == 1 sec   20 == 0.2 sec   2 == 0.02 sec
  300 max_limit_hit    16 == 16 sec   27 == 5.4 sec   203 == 4.1 sec
  301 rush_requests         147             102              93
  302 run time             57 sec          46 sec          45 sec
  303 I/O's                 781             859             936
  304 
  305 When run with no limits, it completes in 40 seconds. So, the
  306 time spent in delay is directly added to the bottom line.
  307 Shortening the tick delay does cut down the total running time,
  308 but at the expense of generating more total I/O operations
  309 due to the rush orders being sent to the filesystem syncer.
  310 Although the number of rush orders decreases with a shorter
  311 tick delay, there are more requests in each order, hence the
  312 increase in I/O count. Also, although the I/O count does rise
  313 with a shorter delay, it is still at least an order of magnitude
  314 less than without soft updates. Anyway, you may want to play
  315 around with these values to see what works best and to see if
  316 you can get an insight into how best to tune them. If you get
  317 out-of-memory panics, then you have max_softdeps set too high.
  318 The max_limit_hit and rush_requests counters should be reset to zero
  319 before each run. The minimum legal value for tickdelay is 2
  320 (if you set it below that, the code will use 2).
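
The relationship between the knobs and the measured run times in the
table can be checked with a few lines of arithmetic. This is a sketch,
not kernel code: the constant HZ = 100 is inferred from the table entry
"100 == 1 sec", and the function names are illustrative.

```python
HZ = 100                    # ticks per second, implied by "100 == 1 sec"
MIN_TICKDELAY = 2           # minimum legal tickdelay, per the text
UNLIMITED_RUN_SECONDS = 40  # run time with no limits, per the text


def effective_tickdelay(ticks):
    """Values below the minimum are treated as the minimum."""
    return max(ticks, MIN_TICKDELAY)


def delay_seconds(tickdelay, max_limit_hit):
    """Total time spent sleeping: one tickdelay per recorded limit hit."""
    return max_limit_hit * effective_tickdelay(tickdelay) / HZ


# (tickdelay, max_limit_hit, measured run time in seconds) from the table.
runs = [(100, 16, 57), (20, 27, 46), (2, 203, 45)]

for tickdelay, hits, measured in runs:
    delay = delay_seconds(tickdelay, hits)
    predicted = UNLIMITED_RUN_SECONDS + delay
    print(f"tickdelay={tickdelay:3d} hits={hits:3d} "
          f"delay={delay:5.2f}s predicted={predicted:5.2f}s measured={measured}s")
```

For all three runs the unlimited run time plus the computed delay lands
within about a second of the measured run time, which is what "directly
added to the bottom line" means here.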
  321 
  322 



This page is part of the FreeBSD/Linux Linux Kernel Cross-Reference, and was automatically generated using a modified version of the LXR engine.