The kernel then sets up various structures for controlling the hardware and key functions. It starts with the Interrupt Descriptor Table (IDT) so that it can handle exceptions and some other devices. It then sets up a new Global Descriptor Table (GDT) so that it can also use a Task State Segment (TSS) and user-space segments. The GDT the bootloader provides is not suitable for kernel use. The kernel allocates the space for these structures using the page-frame allocator. This has the added benefit of ensuring they're page-aligned.
When the tables are allocated, the kernel sets up the Programmable Interrupt Controller (PIC) for specifying how the CPU should deal with exceptions and interrupts. During this process, the Programmable Interrupt Timer (PIT) is also initialised, allowing the kernel to keep track of time. These last two are potentially just a fallback however. The kernel then looks for other processors on the system and initialises them. If they don't exist, the PIC and PIT just continue to be used. If they do exist, then the kernel starts them, and has them run through a stripped down version of the same process it went through: enable protected mode, then load the IDT/GDT. This puts all the processors present in the system into the same state, ready to run the OS.
The Bootstrap Processor (BSP) which has been the focus for the majority of this document will continue to do some more setup, and the Application Processors (APs) will just wait until it is complete. There are two types of setup: one-time setup and per-processor setup. One-time setup is normally the case when there is a global resource, such as memory. Examples of one-time setup include setting up the page-frame allocator, and creating the new GDT. Per-processor setup is used when the resource is specific to the local processor. Examples include setting up SSE, or loading the GDT.
If there are other processors present in the system, then each one can use its own local Advanced Programmable Interrupt Controller (APIC) timer. This timer needs to be setup for every processor, but can be used instead of the PIT, even for the BSP. It does, however, rely on the PIT for its own setup. Since there is only one PIT, but multiple processors, access to the PIT must be serialised. To accomplish this, the kernel makes use of a spinlock to ensure that only one processor/process can access it at any one time. Several resources in the kernel make use of such locks to help protect against race conditions and undesirable results.
After the APs are ready, the BSP continues with its setup. The next step is to setup virtual memory. This allows the kernel to see a contiguous version of memory, with each page of virtual memory being backed by any page-frame of physical memory. Regardless of how much physical memory the computer has installed, the kernel can now access any address in the 32-bit address space. As mentioned earlier, the kernel will use the page-frame allocator to provide physical memory for the regions of virtual memory that are actually accessed. This means that the kernel can now place anything it wants at any point in memory and it'll automatically get backed by a free frame of physical memory. To achieve this, the kernel catches a page-fault (triggered when access to a non-present page occurs) and allocates a page-frame for it.
When the kernel has completed its setup of the environment it can prepare for more features, built on top of this low level, which presents a usable OS to the user and programs. The main functions of the kernel are:
- Process management
- Address space management
- The file system
Process Management
The kernel, in order to enable user programs to run, splits the available processing time between any number of threads. It keeps track of these threads using aTask
structure.
Each of these contains a globally unique thread identifier (tid), and also a thread group identifier (tgid).
For a normal process these two values are the same.
Threads of a process will have the process' tgid, but their own tid.
Also contained in the structure are a set of identifiers for which user owns this thread.
This value can be used to determine which other resources the thread may access.
A Task
also contains pointers to a few namespaces for the resources presented by the OS.
Address Space Management
The kernel allows some control over a process' virtual address space. This is a fairly high level interface to the resource. It gives the user and programs the ability to map large regions of memory. These regions maybe backed by either a file or memory (anonymous file). This helps to unify all kernel resources behind the file system.Each page of virtual memory must be backed by a physical page-frame. If a program requires more pages than there are remaining page-frames an issue occurs. The solution to this issue is to pick a page-frame, copy its contents to disk, then repurpose it for the new page. This works until the old page-frame is needed again, at which point we swap the new contents out for the old contents on disk, and carry on using it. Performance can be improved by picking page-frames that aren't likely to be needed any time soon. This may be a page-frame from the currently running process that isn't used often. Ideally it would be a page-frame containing the memory of another process, since that is guaranteed to not be used while the current process is running. This also means that the process that got its page swapped out must be notified that the page is no longer present.
In order to share memory, the regions must be mapped to the same page-frames of physical memory in both address spaces. This ensures that any changes made are visible to all sharing processes. Since regions of memory are backed by either a named or anonymous file, the page frames can be found using the file system page cache.
File System
The Virtual File System (VFS) presents a unified interface to all resources the kernel has to offer. Since all resources can be operated on by basicread
and write
functions, it makes logical sense for them to all share the same interface.
The system also allows a process to move around the available namespace of files and directories.
There is a hierarchical system that lets the user structure the files into groupings on multiple levels.
Each file can represent a resource such as a peripheral or actual disk data file.
This abstraction can actually build on itself to a higher level of abstraction.
If the main hard-drive is represented as a file, the file system driver can just call the read
and write
methods of its file,
rather than needing to utilise the specific driver to access sectors of the disk.
Files can be assigned an owner, which lets the kernel deny certain processes access to it based on the Task
's owner.
This approach allows everything to be implemented on top of the VFS abstraction. At the bottom layer of this, the device drivers sit. Devices all exist as files in the Device File System (devfs). This can then be mounted onto a directory (typically /dev) in the real file system when that's ready. Above these are the actual file systems for storing data and programs. This concept can also be used to allow processes to share memory, by backing that memory with a file and having both processes map it into their address space. This file doesn't need to be on disk, it can exist in a Temporary File System (tmpfs) purely in memory.
The VFS keeps an internal representation of the entire file system, by storing a tree (really an indirected node graph, because of symbolic/hard links) of all files and directories. These files can also have a cache of the data they contain, which can be preferentially read/written to save unnecessary disk access. This cache can also be used to map files into memory and ensure the contents stay the same.
The VFS also keeps an internal log of the path used to access a file/directory. Since most file systems only keep forward linkage, correctly implementing the
..
directory, for instance, is difficult.
This internal log can be used to easily navigate backwards in the tree (when a File
has multiple parents) or to find the name of the current working directory.