fxr.watson.org: linux-2.4.22 sys/Documentation/smp.tex

FreeBSD/Linux Kernel Cross Reference
sys/Documentation/smp.tex

Version: - FREEBSD - FREEBSD-13-STABLE - FREEBSD-13-0 - FREEBSD-12-STABLE - FREEBSD-12-0 - FREEBSD-11-STABLE - FREEBSD-11-0 - FREEBSD-10-STABLE - FREEBSD-10-0 - FREEBSD-9-STABLE - FREEBSD-9-0 - FREEBSD-8-STABLE - FREEBSD-8-0 - FREEBSD-7-STABLE - FREEBSD-7-0 - FREEBSD-6-STABLE - FREEBSD-6-0 - FREEBSD-5-STABLE - FREEBSD-5-0 - FREEBSD-4-STABLE - FREEBSD-3-STABLE - FREEBSD22 - l41 - OPENBSD - linux-2.6 - MK84 - PLAN9 - xnu-8792
SearchContext: - none - 3 - 10

1 From: michael@Physik.Uni-Dortmund.DE (Michael Dirkmann) 2 3 thanks for your information. Attached is the tex-code of your 4 SMP-documentation : 5 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 6 \documentclass[]{article} 7 \parindent0.0cm 8 \parskip0.2cm 9 10 \begin{document} 11 12 \begin{center} 13 \LARGE \bf 14 An Implementation Of Multiprocessor Linux 15 \normalsize 16 \end{center} 17 18 { \it 19 This document describes the implementation of a simple SMP 20 Linux kernel extension and how to use this to develop SMP Linux kernels for 21 architectures other than the Intel MP v1.1 architecture for Pentium and 486 22 processors.} 23 24 \hfill Alan Cox, 1995 25 26 27 The author wishes to thank Caldera Inc. ( http://www.caldera.com ) 28 whose donation of an ASUS dual Pentium board made this project possible, 29 and Thomas Radke, whose initial work on multiprocessor Linux formed 30 the backbone of this project. 31 32 \section{Background: The Intel MP specification.} 33 Most IBM PC style multiprocessor motherboards combine Intel 486 or Pentium 34 processors and glue chipsets with a hardware/software specification. The 35 specification places much of the onus for hard work on the chipset and 36 hardware rather than the operating system. 37 38 The Intel Pentium processors have a wide variety of inbuilt facilities for 39 supporting multiprocessing, including hardware cache coherency, built in 40 interprocessor interrupt handling and a set of atomic test and set, 41 exchange and similar operations. The cache coherency in particular makes the 42 operating system's job far easier. 43 44 The specification defines a detailed configuration structure in ROM that 45 the boot up processor can read to find the full configuration of the 46 processors and buses. It also defines a procedure for starting up the 47 other processors. 48 49 50 \section{Mutual Exclusion Within A Single Processor Linux Kernel} 51 For any kernel to function in a sane manner it has to provide internal 52 locking and protection of its own tables to prevent two processes updating 53 them at once and for example allocating the same memory block. There are 54 two strategies for this within current Unix and Unixlike kernels. 55 Traditional Unix systems from the earliest of days use a scheme of 'Coarse 56 Grained Locking' where the entire kernel is protected by a small number of 57 locks only. Some modern systems use fine grained locking. Because fine 58 grained locking has more overhead it is normally used only on 59 multiprocessor kernels and real time kernels. In a real time kernel the 60 fine grained locking reduces the amount of time locks are held and reduces 61 the critical (to real time programming at least) latency times. 62 63 Within the Linux kernel certain guarantees are made. No process running in 64 kernel mode will be pre-empted by another kernel mode process unless it 65 voluntarily sleeps. This ensures that blocks of kernel code are 66 effectively atomic with respect to other processes and greatly simplifies 67 many operations. Secondly interrupts may pre-empt a kernel running process, 68 but will always return to that process. A process in kernel mode may 69 disable interrupts on the processor and guarantee such an interruption will 70 not occur. The final guarantee is that an interrupt will not be pre-empted 71 by a kernel task. That is interrupts will run to completion or be 72 pre-empted by other interrupts only. 73 74 The SMP kernel chooses to continue these basic guarantees in order to make 75 initial implementation and deployment easier. A single lock is maintained 76 across all processors. This lock is required to access the kernel space. 77 Any processor may hold it and once it is held may also re-enter the kernel 78 for interrupts and other services whenever it likes until the lock is 79 relinquished. This lock ensures that a kernel mode process will not be 80 pre-empted and ensures that blocking interrupts in kernel mode behaves 81 correctly. This is guaranteed because only the processor holding the lock 82 can be in kernel mode, only kernel mode processes can disable interrupts 83 and only the processor holding the lock may handle an interrupt. 84 85 Such a choice is however poor for performance. In the longer term it is 86 necessary to move to finer grained parallelism in order to get the best 87 system performance. This can be done hierarchically by gradually refining 88 the locks to cover smaller areas. With the current kernel highly CPU bound 89 process sets perform well but I/O bound task sets can easily degenerate to 90 near single processor performance levels. This refinement will be needed to 91 get the best from Linux/SMP. 92 93 \subsection{Changes To The Portable Kernel Components} 94 The kernel changes are split into generic SMP support changes and 95 architecture specific changes necessary to accommodate each different 96 processor type Linux is ported to. 97 98 99 \subsubsection{Initialisation} 100 The first problem with a multiprocessor kernel is starting the other 101 processors up. Linux/SMP defines that a single processor enters the normal 102 kernel entry point start\_kernel(). Other processors are assumed not to be 103 started or to have been captured elsewhere. The first processor begins the 104 normal Linux initialisation sequences and sets up paging, interrupts and 105 trap handlers. After it has obtained the processor information about the 106 boot CPU, the architecture specific function 107 108 109 {\tt \bf{void smp\_store\_cpu\_info(int processor\_id) }} 110 111 is called to store any information about the processor into a per processor 112 array. This includes things like the bogomips speed ratings. 113 114 Having completed the kernel initialisation the architecture specific 115 function 116 117 {\tt \bf void smp\_boot\_cpus(void) } 118 119 is called and is expected to start up each other processor and cause it to 120 enter start\_kernel() with its paging registers and other control 121 information correctly loaded. Each other processor skips the setup except 122 for calling the trap and irq initialisation functions that are needed on 123 some processors to set each CPU up correctly. These functions will 124 probably need to be modified in existing kernels to cope with this. 125 126 127 Each additional CPU then calls the architecture specific function 128 129 {\tt \bf void smp\_callin(void)} 130 131 which does any final setup and then spins the processor while the boot 132 up processor forks off enough idle threads for each processor. This is 133 necessary because the scheduler assumes there is always something to run. 134 Having generated these threads and forked init the architecture specific 135 136 {\tt \bf void smp\_commence(void)} 137 138 function is invoked. This does any final setup and indicates to the system 139 that multiprocessor mode is now active. All the processors spinning in the 140 smp\_callin() function are now released to run the idle processes, which 141 they will run when they have no real work to process. 142 143 144 \subsubsection{Scheduling} 145 The kernel scheduler implements a simple but very effective task 146 scheduler. The basic structure of this scheduler is unchanged in the 147 multiprocessor kernel. A processor field is added to each task, and this 148 maintains the number of the processor executing a given task, or a magic 149 constant (NO\_PROC\_ID) indicating the job is not allocated to a processor. 150 151 Each processor executes the scheduler itself and will select the next task 152 to run from all runnable processes not allocated to a different processor. 153 The algorithm used by the selection is otherwise unchanged. This is 154 actually inadequate for the final system because there are advantages to 155 keeping a process on the same CPU, especially on processor boards with per 156 processor second level caches. 157 158 Throughout the kernel the variable 'current' is used as a global for the 159 current process. In Linux/SMP this becomes a macro which expands to 160 current\_set[smp\_processor\_id()]. This enables almost the entire kernel to 161 be unaware of the array of running processors, but still allows the SMP 162 aware kernel modules to see all of the running processes. 163 164 The fork system call is modified to generate multiple processes with a 165 process id of zero until the SMP kernel starts up properly. This is 166 necessary because process number 1 must be init, and it is desirable that 167 all the system threads are process 0. 168 169 The final area within the scheduling of processes that does cause problems 170 is the fact the uniprocessor kernel hard codes tests for the idle threads 171 as task[0] and the init process as task[1]. Because there are multiple idle 172 threads it is necessary to replace these with tests that the process id is 173 0 and a search for process ID 1, respectively. 174 175 \subsubsection{Memory Management} 176 The memory management core of the existing Linux system functions 177 adequately within the multiprocessor framework providing the locking is 178 used. Certain processor specific areas do need changing, in particular 179 invalidate() must invalidate the TLBs of all processors before it returns. 180 181 182 \subsubsection{Miscellaneous Functions} 183 The portable SMP code rests on a small set of functions and variables 184 that are provided by the processor specification functionality. These are 185 186 {\tt \bf int smp\_processor\_id(void) } 187 188 which returns the identity of the processor the call is executed upon. This 189 call is assumed to be valid at all times. This may mean additional tests 190 are needed during initialisation. 191 192 193 {\tt \bf int smp\_num\_cpus;} 194 195 This is the number of processors in the system. \ 196 197 {\tt \bf void smp\_message\_pass(int target, int msg, unsigned long data, 198 int wait)} 199 200 This function passes messages between processors. At the moment it is not 201 sufficiently defined to sensibly document and needs cleaning up and further 202 work. Refer to the processor specific code documentation for more details. 203 204 205 \subsection{Architecture Specific Code For the Intel MP Port} 206 The architecture specific code for the Intel port splits fairly cleanly 207 into four sections. Firstly the initialisation code used to boot the 208 system, secondly the message handling and support code, thirdly the 209 interrupt and kernel syscall entry function handling and finally the 210 extensions to standard kernel facilities to cope with multiple processors. 211 212 \subsubsection{Initialisation} 213 The Intel MP architecture captures all the processors except for a single 214 processor known as the 'boot processor' in the BIOS at boot time. Thus a 215 single processor enters the kernel bootup code. The first processor 216 executes the bootstrap code, loads and uncompresses the kernel. Having 217 unpacked the kernel it sets up the paging and control registers then enters 218 the C kernel startup. 219 220 The assembler startup code for the kernel is modified so that it can be 221 used by the other processors to do the processor identification and various 222 other low level configurations but does not execute those parts of the 223 startup code that would damage the running system (such as clearing the BSS 224 segment). 225 226 In the initialisation done by the first processor the arch/i386/mm/init 227 code is modified to scan the low page, top page and BIOS for intel MP 228 signature blocks. This is necessary because the MP signature blocks must 229 be read and processed before the kernel is allowed to allocate and destroy 230 the page at the top of low memory. Having established the number of 231 processors it reserves a set of pages to provide a stack come boot up area 232 for each processor in the system. These must be allocated at startup to 233 ensure they fall below the 1Mb boundary. 234 235 Further processors are started up in smp\_boot\_cpus() by programming the 236 APIC controller registers and sending an inter-processor interrupt (IPI) to 237 the processor. This message causes the target processor to begin executing 238 code at the start of any page of memory within the lowest 1Mb, in 16bit 239 real mode. The kernel uses the single page it allocated for each processor 240 to use as stack. Before booting a given CPU the relocatable code from 241 trampoline.S and trampoline32.S is copied to the bottom of its stack page 242 and used as the target for the startup. 243 244 The trampoline code calculates the desired stack base from the code 245 segment (since the code segment on startup is the bottom of the stack), 246 enters 32bit mode and jumps to the kernel entry assembler. This as 247 described above is modified to only execute the parts necessary for each 248 processor, and then to enter start\_kernel(). On entering the kernel the 249 processor initialises its trap and interrupt handlers before entering 250 smp\_callin(), where it reports its status and sets a flag that causes the 251 boot processor to continue and look for further processors. The processor 252 then spins until smp\_commence() is invoked. 253 254 Having started each processor up the smp\_commence( ) function flips a 255 flag. Each processor spinning in smp\_callin() then loads the task register 256 with the task state segment (TSS) of its idle thread as is needed for task 257 switching. 258 259 \subsubsection{Message Handling and Support Code} 260 The architecture specific code implements the smp\_processor\_id() function 261 by querying the APIC logical identity register. Because the APIC isn't 262 mapped into the kernel address space at boot, the initial value returned is 263 rigged by setting the APIC base pointer to point at a suitable constant. 264 Once the system starts doing the SMP setup (in smp\_boot\_cpus()), the APIC 265 is mapped with a vremap() call and the apic pointer is adjusted 266 appropriately. From then on the real APIC logical identity register is 267 read. 268 269 Message passing is accomplished using a pair of IPIs on interrupt 13 270 (unused by the 80486 FPUs in SMP mode) and interrupt 16. Two are used in 271 order to separate messages that cannot be processed until the receiver 272 obtains the kernel spinlock from messages that can be processed 273 immediately. In effect IRQ 13 is a fast IRQ handler that does not obtain 274 the locks, and cannot cause a reschedule, while IRQ 16 is a slow IRQ that 275 must acquire the kernel spinlocks and can cause a reschedule. This 276 interrupt is used for passing on slave timer messages from the processor 277 that receives the timer interrupt to the rest of the processors, so that 278 they can reschedule running tasks. 279 280 281 \subsubsection{Entry And Exit Code} 282 A single spinlock protects the entire kernel. The interrupt handlers, the 283 syscall entry code and the exception handlers all acquire the lock before 284 entering the kernel proper. When the processor is trying to acquire the 285 spinlock it spins continually on the lock with interrupts disabled. This 286 causes a specific deadlock problem. The lock owner may need to send an 287 invalidate request to the rest of the processors and wait for these to 288 complete before continuing. A processor spinning on the lock would not be 289 able to do this. Thus the loop of the spinlock tests and handles invalidate 290 requests. If the invalidate bit for the spinning CPU is set the processor 291 invalidates its TLB and atomically clears the bit. When the spinlock is 292 obtained that processor will take an IPI and in the IPI test the bit and 293 skip the invalidate as the bit is clear. 294 295 One complexity of the spinlock is that a process running in kernel mode 296 can sleep voluntarily and be pre-empted. A switch from such a process to a 297 process executing in user space may reduce the lock count. To track this 298 the kernel uses a syscall\_count and a per process lock\_depth parameter to 299 track the kernel lock state. The switch\_to() function is modified in SMP 300 mode to adjust the lock appropriately. 301 302 The final problem is the idle thread. In the single processor kernel the 303 idle thread executes 'hlt' instructions. This saves power and reduces the 304 running temperature of the processors when they are idle. However it means 305 the process spends all its time in kernel mode and would thus hold the 306 kernel spinlock. The SMP idle thread continually reschedules a new task and 307 returns to user mode. This is far from ideal and will be modified to use 308 'hlt' instructions and release the spinlock soon. Using 'hlt' is even more 309 beneficial on a multiprocessor system as it almost completely takes an idle 310 processor off the bus. 311 312 Interrupts are distributed by an i82489 APIC. This chip is set up to work 313 as an emulation of the traditional PC interrupt controllers when the 314 machine boots (so that an Intel MP machine boots one CPU and PC 315 compatible). The kernel has all the relevant locks but does not yet 316 reprogram the 82489 to deliver interrupts to arbitrary processors as it 317 should. This requires further modification of the standard Linux interrupt 318 handling code, and is particularly messy as the interrupt handler behaviour 319 has to change as soon as the 82489 is switched into SMP mode. 320 321 322 \subsubsection{Extensions To Standard Facilities} 323 The kernel maintains a set of per processor control information such as 324 the speed of the processor for delay loops. These functions on the SMP 325 kernel look the values up in a per processor array that is set up from the 326 data generated at boot up by the smp\_store\_cpu\_info() function. This 327 includes other facts such as whether there is an FPU on the processor. The 328 current kernel does not handle floating point correctly, this requires some 329 changes to the techniques the single CPU kernel uses to minimise floating 330 point processor reloads. 331 332 The highly useful atomic bit operations are prefixed with the 'lock' 333 prefix in the SMP kernel to maintain their atomic properties when used 334 outside of (and by) the spinlock and message code. Amongst other things 335 this is needed for the invalidate handler, as all CPU's will invalidate at 336 the same time without any locks. 337 338 Interrupt 13 floating point error reporting is removed. This facility is 339 not usable on a multiprocessor board, nor relevant to the Intel MP 340 architecture which does not cover the 80386/80387 processor pair. \ 341 342 The /proc filesystem support is changed so that the /proc/cpuinfo file 343 contains a column for each processor present. This information is extracted 344 from the data saved by smp\_store\_cpu\_info(). 345 346 \end{document}

Cache object: 6d66e89c1017232b0ab7889153dc2a7e

FreeBSD/Linux Kernel Cross Reference sys/Documentation/smp.tex

FreeBSD/Linux Kernel Cross Reference
sys/Documentation/smp.tex