FreeBSD/Linux Kernel Cross Reference
| Name | Size | Last modified (GMT) |
|---|---|---|
| Parent directory | | 2023-01-29 20:53:11 |
| README | 15073 bytes | 2023-01-29 20:53:11 |
| README.softupdates | 717 bytes | 2023-01-29 20:53:11 |
| ffs_alloc.c | 50479 bytes | 2023-01-29 20:53:11 |
| ffs_balloc.c | 11256 bytes | 2023-01-29 20:53:11 |
| ffs_extern.h | 5560 bytes | 2023-01-29 20:53:11 |
| ffs_inode.c | 15963 bytes | 2023-01-29 20:53:11 |
| ffs_rawread.c | 11426 bytes | 2023-01-29 20:53:11 |
| ffs_softdep.c | 150000 bytes | 2023-01-29 20:53:11 |
| ffs_softdep_stub.c | 5673 bytes | 2023-01-29 20:53:11 |
| ffs_subr.c | 6382 bytes | 2023-01-29 20:53:11 |
| ffs_tables.c | 5873 bytes | 2023-01-29 20:53:11 |
| ffs_vfsops.c | 35054 bytes | 2023-01-29 20:53:11 |
| ffs_vnops.c | 8213 bytes | 2023-01-29 20:53:11 |
| fs.h | 23572 bytes | 2023-01-29 20:53:11 |
| softdep.h | 27596 bytes | 2023-01-29 20:53:11 |
# $FreeBSD$

Introduction

This package constitutes the alpha distribution of the soft update
code updates for the fast filesystem.

For more information on what Soft Updates is, see:
http://www.ece.cmu.edu/~ganger/papers/CSE-TR-254-95/

Status

My `filesystem torture tests' (described below) run for days without
a hitch (no panics, hangs, filesystem corruption, or memory leaks).
However, I have had several panics reported to me by folks that
are field testing the code which I have not yet been able to
reproduce or fix. Although these panics are rare and do not cause
filesystem corruption, the code should only be put into production
on systems where the system administrator is aware that it is being
run, and knows how to turn it off if problems arise. Thus, you may
hand out this code to others, but please ensure that this status
message is included with any distributions. Please also include
the file ffs_softdep_stub.c in any distributions so that folks that
cannot abide by the need to redistribute source will not be left
with a kernel that will not link. It will resolve all the calls
into the soft update code and simply ignore the request to enable
them. Thus you will be able to ensure that your other hooks have
not broken anything and that your kernel is softdep-ready for those
that wish to use them. Please report problems back to me with
kernel backtraces of panics if possible. This is massively complex
code, and people only have to have their filesystems hosed once or
twice to avoid future changes like the plague. I want to find and
fix as many bugs as soon as possible so as to get the code rock
solid before it gets widely released. Please report any bugs that
you uncover to mckusick@mckusick.com.

Performance

Running the Andrew Benchmarks yields the following raw data:

    Phase  Normal  Softdep  What it does
      1      3s     <1s     Creating directories
      2      8s      4s     Copying files
      3      6s      6s     Recursive directory stats
      4      8s      9s     Scanning each file
      5     25s     25s     Compilation

    Normal:  19.9u 29.2s 0:52.8 135+630io
    Softdep: 20.3u 28.5s 0:47.8 103+363io

Another interesting data point is my `filesystem torture tests'.
They consist of 1000 runs of the Andrew benchmarks, 1000 copies and
removals of /etc with randomly selected pauses of 0-60 seconds
between each copy and remove, and 500 runs of find from / with randomly
selected pauses of 100 seconds between each run. The run of the
torture test compares as follows:

    With soft updates: writes: 6 sync, 1,113,686 async; run time 19hr, 50min
    Normal filesystem: writes: 1,459,147 sync, 487,031 async; run time 27hr, 15min

The upshot is 42% less I/O and 28% shorter running time.

Another interesting test point is a full MAKEDEV. Because it runs
as a shell script, it becomes mostly limited by the execution speed
of the machine on which it runs. Here are the numbers:

With soft updates:

    labrat# time ./MAKEDEV std
    2.2u 32.6s 0:34.82 100.0% 0+0k 11+36io 0pf+0w

    labrat# ls | wc
        522     522    3317

Without soft updates:

    labrat# time ./MAKEDEV std
    2.0u 40.5s 0:42.53 100.0% 0+0k 11+1221io 0pf+0w

    labrat# ls | wc
        522     522    3317

Of course, some of the system time is being pushed
to the syncer process, but that is a different story.

To show a benchmark designed to highlight the soft update code
consider a tar of zero-sized files and an rm -rf of a directory tree
that has at least 50 files or so at each level.
Running a test with
a directory tree containing 28 directories holding 202 empty files
produces the following numbers:

With soft updates:
    tar: 0.0u 0.5s 0:00.65 76.9% 0+0k 0+44io 0pf+0w (0 sync, 33 async writes)
    rm:  0.0u 0.2s 0:00.20 100.0% 0+0k 0+37io 0pf+0w (0 sync, 72 async writes)

Normal filesystem:
    tar: 0.0u 1.1s 0:07.27 16.5% 0+0k 60+586io 0pf+0w (523 sync, 0 async writes)
    rm:  0.0u 0.5s 0:01.84 29.3% 0+0k 0+318io 0pf+0w (258 sync, 65 async writes)

The large reduction in writes is because inodes are clustered, so
most of a block gets allocated, then the whole block is written
out once rather than having the same block written once for each
inode allocated from it. Similarly each directory block is written
once rather than once for each new directory entry. Effectively
what the update code is doing is allocating a bunch of inodes
and directory entries without writing anything, then ensuring that
the block containing the inodes is written first followed by the
directory block that references them. If there were data in the
files it would further ensure that the data blocks were written
before their inodes claimed them.

Copyright Restrictions

Please familiarize yourself with the copyright restrictions
contained at the top of either the sys/ufs/ffs/softdep.h or
sys/ufs/ffs/ffs_softdep.c file. The key provision is similar
to the one used by the DB 2.0 package and goes as follows:

    Redistributions in any form must be accompanied by information
    on how to obtain complete source code for any accompanying
    software that uses the this software. This source code must
    either be included in the distribution or be available for
    no more than the cost of distribution plus a nominal fee,
    and must be freely redistributable under reasonable
    conditions.
    For an executable file, complete source code
    means the source code for all modules it contains. It does
    not mean source code for modules or files that typically
    accompany the operating system on which the executable file
    runs, e.g., standard library modules or system header files.

The idea is to allow those of you freely redistributing your source
to use it while retaining for myself the right to peddle it for
money to the commercial UNIX vendors. Note that I have included a
stub file ffs_softdep_stub.c that is freely redistributable so that
you can put in all the necessary hooks to run the full soft updates
code, but still allow vendors that want to maintain proprietary
source to have a working system. I do plan to release the code with
a `Berkeley style' copyright once I have peddled it around to the
commercial vendors. If you have concerns about this copyright,
feel free to contact me with them and we can try to resolve any
difficulties.

Soft Dependency Operation

The soft update implementation does NOT require ANY changes
to the on-disk format of your filesystems. Furthermore it is
not used by default for any filesystems. It must be enabled on
a filesystem-by-filesystem basis by running tunefs to set a
bit in the superblock indicating that the filesystem should be
managed using soft updates. If you wish to stop using
soft updates for performance or reliability reasons,
you can simply run tunefs on the filesystem again to turn off the
bit and revert to normal operation. The additional dynamic memory load
placed on the kernel malloc arena is approximately equal to
the amount of memory used by vnodes plus inodes (for a system
with 1000 vnodes, the additional peak memory load is about 300K).

Kernel Changes

There are two new changes to the kernel functionality that are not
contained in the soft update files.
The first is a `trickle
sync' facility running in the kernel as process 3. This trickle
sync process replaces the traditional `update' program (which should
be commented out of the /etc/rc startup script). When a vnode is
first written it is placed 30 seconds down on the trickle sync
queue. If it still exists and has dirty data when it reaches the
top of the queue, it is sync'ed. This approach evens out the load
on the underlying I/O system and avoids writing short-lived files.
The papers on trickle-sync tend to favor aging based on buffers
rather than files. However, I sync on file age rather than buffer
age because the data structures are much smaller as there are
typically far fewer files than buffers. Although this can make the
I/O spiky when a big file times out, it is still much better than
the wholesale syncs that were happening before. It also adapts
much better to the soft update code where I want to control
aging to improve performance (inodes age in 10 seconds, directories
in 15 seconds, files in 30 seconds). This ensures that most
dependencies are gone (e.g., inodes are written when directory
entries want to go to disk), reducing the amount of rollback that
is needed.

The other main kernel change is to split the vnode freelist into
two separate lists: one for vnodes that are still being used to
identify buffers and the other for those vnodes no longer identifying
any buffers. The latter list is used by getnewvnode in preference
to the former.

Packaging of Kernel Changes

The sys subdirectory contains the changes and additions to the
kernel. My goal in writing this code was to minimize the changes
that need to be made to the kernel. Thus, most of the new code
is contained in the two new files softdep.h and ffs_softdep.c.
The rest of the kernel changes are simply inserting hooks to
call into these two new files.
Although there has been some
structural reorganization of the filesystem code to accommodate
gathering the information required by the soft update code,
the actual ordering of filesystem operations when soft updates
are disabled is unchanged.

The kernel changes are packaged as a set of diffs. As I am
doing my development in BSD/OS, the diffs are relative to the
BSD/OS versions of the files. Because BSD/OS recently had
4.4BSD-Lite2 merged into it, the Lite2 files are a good starting
point for figuring out the changes. There are 40 files that
require change plus the two new files. Most of these files have
only a few lines of changes in them. However, four files have
fairly extensive changes: kern/vfs_subr.c, ufs/ufs/ufs_lookup.c,
ufs/ufs/ufs_vnops.c, and ufs/ffs/ffs_alloc.c. For these four
files, I have provided the original Lite2 version, the Lite2
version with the diffs merged in, and the diffs between the
BSD/OS and merged version. Even so, I expect that there will
be some difficulty in doing the merge; I am certainly willing
to assist in helping get the code merged into your system.

Packaging of Utility Changes

The utilities subdirectory contains the changes and additions
to the utilities. There are diffs to three utilities enclosed:

    tunefs - add a flag to enable and disable soft updates

    mount  - print out whether soft updates are enabled and
             also statistics on number of sync and async writes

    fsck   - tighter checks on acceptable errors and a slightly
             different policy for what to put in lost+found on
             filesystems using soft updates

In addition you should recompile vmstat so as to get reports
on the 13 new memory types used by the soft update code.
It is not necessary to use the new version of fsck; however, it
would aid in my debugging if you do.
Also, because of the time
lag between deleting a directory entry and the inode it
references, you will find a lot more files showing up in your
lost+found if you do not use the new version. Note that the
new version checks for the soft update flag in the superblock
and only uses the new algorithms if it is set. So, it will run
unchanged on the filesystems that are not using soft updates.

Operation

Once you have booted a kernel that incorporates the soft update
code and installed the updated utilities, do the following:

1) Comment out the update program in /etc/rc.

2) Run `tunefs -n enable' on one or more test filesystems.

3) Mount these filesystems and then type `mount' to ensure that
   they have been enabled for soft updates.

4) Copy the test directory to a softdep filesystem, chdir into
   it and run `./doit'. You may want to check out each of the
   three subtests individually first: doit1 - andrew benchmarks,
   doit2 - copy and removal of /etc, doit3 - find from /.

====
Additional notes from Feb 13

When removing huge directories of files, it is possible to get
the incore state arbitrarily far ahead of the disk. Maintaining
all the associated dependency information can exhaust the kernel
malloc arena. To avoid this scenario, I have put some limits on
the soft update code so that it will not be allowed to rampage
through all of the kernel memory. I enclose below the relevant
patches to vnode.h and vfs_subr.c (which allow the soft update
code to speed up the filesystem syncer process). I have also
included the diffs for ffs_softdep.c. I hope to make a pass over
ffs_softdep.c to isolate the differences with my standard version
so that these diffs are less painful to incorporate.

Since I know you like to play with tuning, I have put the relevant
knobs on sysctl debug variables.
The tuning knobs can be viewed
with `sysctl debug' and set with `sysctl -w debug.<name>=value'.
The knobs are as follows:

    debug.max_softdeps  - limit on any given resource
    debug.tickdelay     - ticks to delay before allocating
    debug.max_limit_hit - number of times tickdelay imposed
    debug.rush_requests - number of rush requests to filesystem syncer

The max_softdeps limit is derived from vnodesdesired which in
turn is sized based on the amount of memory on the machine.
When the limit is hit, a process requesting a resource first
tries to speed up the filesystem syncer process. Such a
request is recorded as a rush_request. After syncdelay / 2
unserviced rush requests (typically 15) are in the filesystem
syncer's queue (i.e., it is more than 15 seconds behind in its
work), the process requesting the memory is put to sleep for
tickdelay ticks. Such a delay is recorded in max_limit_hit.
Following this delay it is granted its memory without further
delay. I have tried the following experiments in which I
delete an MH directory containing 16,703 files:

    Run #            1               2               3

    max_softdeps     4496            4496            4496
    tickdelay        100 == 1 sec    20 == 0.2 sec   2 == 0.02 sec
    max_limit_hit    16 == 16 sec    27 == 5.4 sec   203 == 4.1 sec
    rush_requests    147             102             93
    run time         57 sec          46 sec          45 sec
    I/O's            781             859             936

When run with no limits, it completes in 40 seconds. So, the
time spent in delay is directly added to the bottom line.
Shortening the tick delay does cut down the total running time,
but at the expense of generating more total I/O operations
due to the rush orders being sent to the filesystem syncer.
Although the number of rush orders decreases with a shorter
tick delay, there are more requests in each order, hence the
increase in I/O count.
Also, although the I/O count does rise
with a shorter delay, it is still at least an order of magnitude
less than without soft updates. Anyway, you may want to play
around with these values to see what works best and to see if
you can get an insight into how best to tune them. If you get
out-of-memory panics, then you have max_softdeps set too high.
The max_limit_hit and rush_requests counters should be reset to zero
before each run. The minimum legal value for tickdelay is 2
(if you set it below that, the code will use 2).
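
The throttling rule and the delay arithmetic in the Feb 13 notes can be
sketched as a small model. This is only an illustration of the prose above,
not the kernel code: the function names, the three action labels, and the
constants HZ = 100 (taken from "tickdelay 100 == 1 sec") and SYNCDELAY = 30
(taken from "syncdelay / 2 ... typically 15") are assumptions made for the
example.

```python
# Illustrative model only: the names below are hypothetical, not kernel symbols.
HZ = 100          # assumption: 100 clock ticks per second (tickdelay 100 == 1 sec)
SYNCDELAY = 30    # assumption: "syncdelay / 2 ... (typically 15)" implies 30


def delay_seconds(max_limit_hit, tickdelay, hz=HZ):
    """Time spent asleep: one tickdelay-long sleep per recorded limit hit."""
    return max_limit_hit * tickdelay / hz


def on_request(in_use, max_softdeps, unserviced_rush, syncdelay=SYNCDELAY):
    """Decide what happens to one resource request.

    Returns "grant" (below the limit), "rush" (ask the syncer to speed up,
    counted in rush_requests), or "sleep" (syncer too far behind: sleep for
    tickdelay ticks, counted in max_limit_hit, then grant).
    """
    if in_use < max_softdeps:
        return "grant"
    if unserviced_rush < syncdelay // 2:
        return "rush"
    return "sleep"


# The three runs from the table: (tickdelay, max_limit_hit, observed run time).
RUNS = [(100, 16, 57), (20, 27, 46), (2, 203, 45)]
UNLIMITED_RUN_TIME = 40  # seconds, from the text

for tickdelay, hits, observed in RUNS:
    slept = delay_seconds(hits, tickdelay)
    print(f"tickdelay={tickdelay:3d}: slept {slept:5.2f}s, "
          f"predicted {UNLIMITED_RUN_TIME + slept:5.2f}s, observed {observed}s")
```

Adding the computed sleep time to the 40-second unlimited run lands within
about a second of each measured run time, which is the sense in which the
delay is "directly added to the bottom line".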
This page is part of the FreeBSD/Linux Kernel Cross-Reference, and was automatically generated using a modified version of the LXR engine.