[Original] Linux Startup Process

mac2024-05-22  36

KEYWORDS: theory, terminology, csdn, startup, grub, bios, mbr

HISTORY:

Last updated on 2019-10-31 for release.

Updated on 2019-03-16, 2019-03-19, 2019-10-21, 2019-10-22 - 2019-10-25, 2019-10-27, 2019-10-29 - 2019-10-31.

Created on 2019-03-16.

Table of Contents (2019-10-22)

Linux Startup Process

- BIOS, Firmware Initialization Phase
- Boot Loader Phase
- Kernel Phase
  - Kernel Loading Stage
  - Kernel Startup Stage
- Init Process

Terminology

- High Memory
- Initial RAM Disk
  - initramfs, ramfs and rootfs
    - dracut
    - Dracut Emergency Shell
  - initrd
- MBR, Master Boot Record
- VBR, Volume Boot Record
- Splash Screen

Linux Startup Process (2019-02-25, Updated on 2019-10-22)

Wiki: Linux Startup Process

Booting a Linux installation involves multiple stages and software components, including (2019-10-20):

- firmware initialization;
- execution of a boot loader;
- loading and startup of a Linux kernel image;
- execution of various startup scripts and daemons.

Summary (2019-10-31):

BIOS, Firmware Initialization Phase

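1. After a cold boot (power-on or reset button), the power-on self-test (POST) runs first. A warm boot (ctrl-alt-del) skips this step by default.
2. POST identifies and initializes the hardware devices, then issues INT 19h (a BIOS call) to start the boot process.
3. Following the boot priority configured in the BIOS, the BIOS loads the first sector (the MBR, 512 bytes) of each device in order to decide whether the device is bootable. The criterion: the sector is readable and its last two bytes hold the signature 55 AA. If the sector cannot be read, the BIOS skips it and tries the next device (Updated on 2019-10-21).
4. Once the first bootable device is found, the BIOS hands control to boot.img in the MBR with a jump instruction (Wiki: Master Boot Record, Updated on 2019-10-21).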

Boot Loader Phase

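1. boot.img (440 bytes), stored in the first sector of the device, is loaded. boot.img records only the number of the first sector of core.img; its job is to read that sector (diskboot.img) and jump to it (Updated on 2019-10-21).
2. diskboot.img loads the rest of core.img (Updated on 2019-10-21).
3. core.img decompresses itself and then loads /boot/grub2/<platform>/normal.mod. It can read the file system at this point because core.img contains the drivers needed to access the /boot/ file system (Updated on 2019-10-21).
4. normal.mod parses /boot/grub/grub.cfg, optionally loads additional modules (for example, for the graphical user interface), and displays the menu.
5. After a menu entry is selected, GRUB loads the selected kernel into memory and hands control to it.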

Kernel Phase

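Kernel loading stage: the boot loader loads the kernel into memory, where it is decompressed. If an initial root file system image (initrd or initramfs) exists, it is loaded into memory for later use. The kernel is then started.

The kernel's startup function initializes the page tables and enables memory paging, detects the CPU type and any additional floating-point unit, and stores the results for later use (Updated on 2019-10-24).

It then calls start_kernel, which switches to the architecture-independent part of the Linux kernel. start_kernel is in effect the main function of the Linux kernel (Updated on 2019-10-24).

Through start_kernel, a series of initialization functions is called to set up interrupts and perform further memory configuration. The initial RAM disk is then loaded (Updated on 2019-10-24).

For an initramfs, the kernel unpacks it into a RAM-based tmpfs, mounts it, and uses it as the initial root file system.

Once the initramfs has been unpacked successfully, all the drivers and programs needed to mount the real file system are available.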

Chapter 4.2.3 of 深度探索_Linux_操作系统_系统构建和原理解析_扫描版.pdf (2019-10-24, Updated on 2019-10-25)

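At the end of initialization, the kernel runs init from the initramfs. This init probes hardware devices, loads drivers, and mounts the real file system (Updated on 2019-10-31).

Because the initial root file system cannot be “rotated” away, it is simply emptied (freeing its memory) and the final root file system is mounted on top of it (2019-10-24, Updated on 2019-10-25).

Once the real file system is mounted, the initramfs has served its purpose and the memory it occupied is released (Updated on 2019-10-31).

/sbin/init on the real file system is then executed as the first process of the operating system, with PID 1, and the system switches to the real user space (Last updated on 2019-10-31).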

Init Process

systemd starts the rest of the system (highly parallelized).

BIOS, Firmware Initialization Phase (2019-10-20)

Wiki: BIOS

If the system has just been powered up or the reset button was pressed (“cold boot”), the full power-on self-test (POST) is run. If ctrl-alt-del was pressed (“warm boot”), a special flag value stored in nonvolatile BIOS memory (“CMOS”) tested by the BIOS allows bypass of the lengthy POST and memory detection.

The POST identifies, and initializes system devices such as the CPU, RAM, interrupt and DMA controllers and other parts of the chipset, video display card, keyboard, hard disk drive, optical disc drive and other basic hardware.

After that, the BIOS calls INT 19h (About INT 19h, see below) to start boot processing.

According to the boot priority configured in the BIOS (zzp), the BIOS checks each device in order to see if it is bootable by attempting to load the first sector (boot sector). If the sector cannot be read, the BIOS proceeds to the next device. If the sector is read successfully, some BIOSes will also check for the boot sector signature 0x55 0xAA in the last two bytes of the sector (which is 512 bytes long), before accepting a boot sector and considering the device bootable.

When a bootable device is found, the BIOS transfers control to the loaded sector. The BIOS does not interpret the contents of the boot sector other than to possibly check for the boot sector signature in the last two bytes. Interpretation of data structures like partition tables and BIOS Parameter Blocks is done by the boot program in the boot sector itself or by other programs loaded through the boot process.
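The device selection described above can be sketched in a few lines. This is only an illustration of the rule, not firmware code: the function names and the `(name, sector)` input shape are assumptions made for the example, and, as the quoted text notes, only some BIOSes perform the signature check.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a bootable sector


def is_bootable_sector(sector: bytes) -> bool:
    """Return True if the sector is 512 bytes long and ends with 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


def first_bootable(devices):
    """Return the name of the first bootable device, or None.

    `devices` is an ordered iterable of (name, first_sector) pairs that
    models the configured boot priority. A first_sector of None models a
    sector that could not be read, in which case the device is skipped.
    """
    for name, sector in devices:
        if sector is None:
            continue  # unreadable: move on to the next device
        if is_bootable_sector(sector):
            return name
    return None
```

Passing an unreadable CD-ROM, a blank USB stick, and a disk with a valid signature, in that order, returns the disk.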


Wiki: BIOS Interrupt Call (2019-10-20)

About INT 19h: After POST this interrupt is used by the BIOS to load the operating system. A program can call this interrupt to reboot the computer.


Boot Loader Phase

Reference:

- Official: GNU GRUB (2019-10-20)
- Wiki: Linux Startup Process (2019-10-20)
- Wiki: GNU GRUB (2019-10-20)

Wiki: Linux Startup Process (2019-10-20)

The boot loader phase varies by computer architecture. Since the earlier phases are not specific to the operating system, the BIOS-based boot process for x86 and x86-64 architectures is considered to start when the master boot record (MBR) code is executed in real mode and the first-stage boot loader is loaded.

In UEFI systems, a payload, such as the Linux kernel, can be executed directly. Thus no boot loader is necessary.


Wiki: GNU GRUB (2019-10-20)

1st stage: boot.img is written to the first 440 bytes of the MBR, or optionally in a partition’s boot sector (PBR/VBR). It addresses diskboot.img by a 64-bit LBA (Logical Block Addressing) address, thus it can load from above the 2 TiB limit of the MBR (32-bit sector numbers with 512-byte sectors). The actual sector number is written by grub-install.


GNU GRUB Manual 2.04: GRUB Image Files (2019-10-20)

The sole function of boot.img is to read the first sector of the core image from a local disk and jump to it. Because of the size restriction, boot.img cannot understand any file system structure, so grub-install hardcodes the location of the first sector of the core image into boot.img when installing GRUB.


“System Bootstrapping” section of Wiki: Master Boot Record (2019-10-21)

BIOS loads and executes the master boot record. In order to remain compatible, all x86 architecture systems start with the microprocessor in an operating mode referred to as real mode. The BIOS reads the MBR from the storage device into physical memory, and then it directs the microprocessor to the start of the boot code.


Wiki: Linux Startup Process

The “1st stage” (see below “Comments”) is loaded and executed either by the BIOS from the MBR or by another boot loader from the partition boot sector.

2nd stage: diskboot.img is the first sector of core.img (called stage 1.5 in Grub Legacy) with the sole purpose to load the rest of core.img identified by LBA sector numbers also written by grub-install.

On MBR partitioned disks: core.img is stored in the empty sectors (if available) between the MBR and the first partition. Recent operating systems suggest a 1 MiB gap here for alignment (2047*512 byte or 255*4 KiB sectors). This gap used to be 62 sectors (31 KiB) as a reminder of the sector number limit of C/H/S addressing used by BIOS before 1998, therefore core.img is designed to be smaller than 32 KiB.

On GPT partitioned disks: partitions are not limited to 4, thus core.img is written to its own tiny (1 MiB), filesystem-less BIOS boot partition.


GNU GRUB Manual 2.04: GRUB Image Files (2019-10-20)

diskboot.img is used as the first sector of the core image when booting from a hard disk. It reads the rest of the core image into memory and starts the kernel (that is, GRUB's own kernel). Since file system handling is not yet available, it encodes the location of the core image using a block list format.


3rd stage: core.img enters 32-bit protected mode, uncompresses itself (the kernel of grub and filesystem modules to reach /boot/grub), then loads /boot/grub/<platform>/normal.mod from the partition configured by grub-install. If the partition index has changed, Grub will be unable to find the normal.mod, and presents the user with the Grub Rescue prompt, where the user “can” find and load normal.mod, or the linux kernel.

The /boot/grub directory can be located on any partition (Grub can read many filesystems, including NTFS). Depending on how it was installed it’s either in the root partition of the distribution, or a separate /boot partition.


GNU GRUB Manual 2.04: GRUB Image Files (2019-10-20)

core.img, this is the core image of GRUB. It is built dynamically from the kernel image and an arbitrary list of modules by the grub-mkimage program. Usually, it contains enough modules to access /boot/grub, and loads everything else (including menu handling, the ability to load target operating systems, and so on) from the file system at run-time. The modular design allows the core image to be kept small, since the areas of disk where it must be installed are often as small as 32KB.


4th stage: normal.mod (equivalent of stage 2 in Grub Legacy) parses /boot/grub/grub.cfg, optionally loads modules (e.g. for graphical UI) and shows the menu.

GNU GRUB Manual 2.04: GRUB only Offers a Rescue Shell

GRUB’s normal start-up procedure involves setting the prefix environment variable to a value set in the core image by grub-install, setting the root variable to match, loading the normal module from the prefix, and running the normal command (see normal).

This command is responsible for reading /boot/grub/grub.cfg, running the menu, and doing all the useful things GRUB is supposed to do.
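To make the "reads grub.cfg and runs the menu" step concrete, the sketch below pulls the menu entry titles out of the text of a grub.cfg. It is a rough illustration only: grub.cfg is a script in GRUB's own language, and this regular expression merely matches `menuentry` lines with a quoted title. The function name and the sample configuration in the usage note are assumptions, not part of GRUB.

```python
import re

# Matches lines such as:  menuentry 'Some title' --class foo {
MENUENTRY_RE = re.compile(r"""^\s*menuentry\s+(['"])(.*?)\1""", re.MULTILINE)


def list_menu_entries(grub_cfg_text):
    """Return the titles of all menuentry lines found in grub.cfg text."""
    return [match.group(2) for match in MENUENTRY_RE.finditer(grub_cfg_text)]
```

Reading the file with `open("/boot/grub2/grub.cfg").read()` (the path depends on the distribution and usually needs root) and passing the text to `list_menu_entries` gives roughly the list of titles that GRUB shows in its menu.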


GNU GRUB Manual 2.04: GRUB Image Files (2019-10-21)

*.mod. Everything else in GRUB resides in dynamically loadable modules. These are often loaded automatically, or built into the core image (core.img) if they are essential, but may also be loaded manually using the insmod command.

Once boot options have been selected, GRUB loads the selected kernel into memory and passes control to the kernel. Alternatively, GRUB can pass control of the boot process to another boot loader.


Comments (2019-10-20)

On the system examined here, the path is /boot/grub2/ rather than /boot/grub/ (RHEL, CentOS, and Fedora install GRUB 2 under /boot/grub2/).


GNU GRUB Manual 2.04: GRUB Image Files (2019-10-20)

For GRUB Legacy users

stage1. Stage 1 from GRUB Legacy was very similar to boot.img in GRUB 2, and they serve the same function.

*_stage1_5. Stage 1.5’s function was to include enough filesystem code to allow the much larger Stage 2 to be read from an ordinary filesystem. In this respect, its function was similar to core.img in GRUB 2.

However, core.img is much more capable than Stage 1.5 was; since it offers a rescue shell, it is sometimes possible to recover manually in the event that it is unable to load any other modules.

stage2. GRUB 2 has no single Stage 2 image. Instead, it loads modules from /boot/grub at run-time.


According to <<深度探索 Linux 操作系统 - 系统构建和原理解析>> (2019-03-19)

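GRUB splits its image into three parts: boot.img in the MBR, core.img embedded in the free sectors, and the modules stored in the file system. The three parts correspond to the three stages of GRUB execution (Updated on 2019-03-26):

- boot.img, 446 bytes (the partition table in the MBR occupies 64 bytes and the boot signature takes the last 2 bytes: 512 - 64 - 2). It records only the number of the first sector of core.img and loads only that sector into memory; loading the rest of core.img is left to the code in that first sector (Updated on 2019-03-26).
- core.img, at most 31 KB (a track has 63 sectors and the MBR occupies the first, leaving 62 * 512 bytes = 31 KB). It contains hardware and file system drivers, so once it is loaded the file system becomes accessible (Updated on 2019-03-26).
- The modules stored in the file system (according to Wiki: GNU GRUB, these should be kept in /boot/grub2/<platform>/, Updated on 2019-10-25).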
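The size arithmetic above can be checked with a short sketch that also splits a 512-byte MBR into its regions. The 16-byte partition entry layout used here (status byte, partition type, start LBA, sector count) is the classic MBR format rather than something stated in the book, and `split_mbr` is a hypothetical helper name.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_SIZE = 4 * 16   # four 16-byte partition entries
SIGNATURE_SIZE = 2              # 0x55 0xAA
BOOT_CODE_SIZE = SECTOR_SIZE - PARTITION_TABLE_SIZE - SIGNATURE_SIZE  # 446

# Legacy gap after the MBR: 63 sectors per track minus the MBR itself.
LEGACY_GAP_BYTES = (63 - 1) * SECTOR_SIZE  # 31744 bytes = 31 KiB


def split_mbr(sector: bytes):
    """Split an MBR into (boot_code, partition_entries, signature).

    Each partition entry is returned as a tuple
    (status, partition_type, start_lba, sector_count).
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    boot_code = sector[:BOOT_CODE_SIZE]
    table = sector[BOOT_CODE_SIZE:BOOT_CODE_SIZE + PARTITION_TABLE_SIZE]
    signature = sector[-SIGNATURE_SIZE:]
    entries = []
    for i in range(4):
        raw = table[i * 16:(i + 1) * 16]
        # status(1) CHS start(3, skipped) type(1) CHS end(3, skipped)
        # start LBA(4, little endian) sector count(4, little endian)
        entries.append(struct.unpack("<B3xB3xII", raw))
    return boot_code, entries, signature
```

The 446-byte figure is the whole boot code area. The 440 bytes quoted from Wiki: GNU GRUB is smaller because the last 6 bytes of that area hold a disk signature and two reserved bytes.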

Comments (2019-03-26)

The book's statement that “a cylinder contains at most 63 sectors” is wrong. It should read “a track contains at most 63 sectors”.


Kernel Phase

Wiki: Linux Startup Process: Kernel Phase (Updated on 2019-10-21)

Kernel Loading Stage (2019-10-21)

In the first stage, the kernel (as a compressed image file, /boot/vmlinuz-$(uname -r), Updated on 2019-10-22) is loaded into memory and decompressed, and a few fundamental functions such as basic memory management are set up. Control is then switched one final time to the main kernel start process.


IBM: Inside the Linux Boot Process (2019-10-27)

The kernel image isn’t so much an executable kernel, but a compressed kernel image. Typically this is a zImage (compressed image, less than 512KB) or a bzImage (big compressed image, greater than 512KB), that has been previously compressed with zlib.

At the head of this kernel image is a routine that does some minimal amount of hardware setup and then decompresses the kernel contained within the kernel image and places it into high memory. If an initial RAM disk image is present, this routine moves it into memory and notes it for later use. The routine then calls the kernel and the kernel boot begins.

Kernel Startup Stage

About why to use “initial ram disk” (such as initramfs or initrd), see below “Initial RAM Disk” section (2019-10-27).


Wiki: Linux Startup Process: Kernel Phase (2019-10-23)

The startup function for the kernel (also called the swapper or process 0) establishes memory management (paging tables and memory paging), detects the type of CPU and any additional functionality such as floating point capabilities, and then switches to non-architecture specific Linux kernel functionality via a call to start_kernel().


IBM: Inside the Linux Boot Process (2019-10-23)

With the call to start_kernel, a long list of initialization functions are called to set up interrupts, perform further memory configuration, and load the initial RAM disk. In the end, a call is made to kernel_thread (in arch/i386/kernel/process.c) to start the init function, which is the first user-space process.


Wiki: Linux Startup Process: Kernel Phase (2019-10-23)

start_kernel executes a wide range of initialization functions. It sets up interrupt handling (IRQs), further configures memory, starts the init process (the first user-space process), and then starts the idle task via cpu_idle().


Wiki: Initial RAM Disk (2019-10-23)

At the end of its boot sequence, the kernel tries to determine the format of the image from its first few blocks of data, which can lead either to the initrd or initramfs scheme.

In the initramfs scheme (available since the Linux kernel 2.6.13), the image may be a cpio archive (optionally compressed). The archive is unpacked by the kernel into a special instance of a tmpfs that becomes the initial root file system. This scheme has the advantage of not requiring an intermediate file system or block drivers to be compiled into the kernel.
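The idea of recognising the image format from its first bytes can be sketched as follows. This is a user-space illustration, not the kernel's detection code; the magic numbers are the standard ones for each format, and the function name and labels are made up for the example.

```python
# (magic prefix, label) pairs; order matters only if prefixes overlap.
MAGICS = [
    (b"070701", "cpio-newc"),      # uncompressed cpio, "new ASCII" format
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
]


def detect_image_format(head: bytes) -> str:
    """Guess the format of an initial RAM disk image from its first bytes."""
    for magic, label in MAGICS:
        if head.startswith(magic):
            return label
    # Anything else might be a raw file system image (the initrd scheme)
    # or a compression format not listed here.
    return "unknown"
```

For example, `detect_image_format(open(path, "rb").read(16))` on an initramfs file under /boot will typically report a compressed format, or `cpio-newc` when an uncompressed archive comes first.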

In the initramfs scheme, the kernel executes /init as its first process that is not expected to exit.

For more information about initramfs, see below.

Mount Preparations (2019-10-23)

Some distributions (for others, read the original) use an event-driven hotplug agent such as udev, which invokes helper programs as hardware devices, disk partitions and storage volumes matching certain rules come online. This allows discovery to run in parallel, and to progressively cascade into arbitrary nestings of LVM, RAID or encryption to get at the root file system.

When the root file system finally becomes visible, any maintenance tasks that cannot run on a mounted root file system are done, the root file system is mounted read-only, and any processes that must continue running (such as the splash screen helper and its command FIFO) are hoisted into the newly mounted root file system.

The final root file system cannot simply be mounted over /, since that would make the scripts and tools on the initial root file system inaccessible for any final cleanup tasks:

On an initramfs (for initrd, read the original), the initial root file system cannot be rotated away. Instead, it is simply emptied and the final root file system mounted over the top.

Init Process (2019-10-23)

Wiki: Linux Startup Process: Kernel Phase (2019-10-23)

Once the kernel has started, it starts the init process. Historically this was the “SysV init”, which was just called “init”. More recent Linux distributions are likely to use one of the more modern alternatives such as “systemd”.


Wiki: Init (2019-10-31)

In Unix-based computer operating systems, init (short for initialization) is the first process started during booting of the computer system. init is a daemon process that continues running until the system is shut down. It is the direct or indirect ancestor of all other processes and automatically adopts all orphaned processes. init is started by the kernel during the booting process; a kernel panic will occur if the kernel is unable to start it. init is typically assigned process identifier 1.
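On a running Linux system, the name of the process with identifier 1 can be read from procfs. The sketch below assumes procfs is mounted at /proc and exposes a `comm` file per process, which is standard on current kernels. The `proc_root` parameter is only there so the function can be pointed at another directory.

```python
from pathlib import Path


def init_name(proc_root="/proc"):
    """Return the command name of PID 1 as reported by <proc_root>/1/comm."""
    return (Path(proc_root) / "1" / "comm").read_text().strip()
```

On a systemd-based distribution, `init_name()` is expected to return `systemd`; on a SysV-style system it would typically be `init`.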


systemd (2019-10-24)

man 1 systemd (2019-10-25)

systemd is a suite of basic building blocks for a Linux system. It provides a system and service manager that runs as PID 1 and starts the rest of the system.


[systemd System and Service Manager](https://freedesktop.org/wiki/Software/systemd/) (2019-10-25)

systemd provides aggressive parallelization capabilities, uses socket and D-Bus activation for starting services, offers on-demand starting of daemons, keeps track of processes using Linux control groups, maintains mount and automount points, and implements an elaborate transactional dependency-based service control logic.


man 7 bootup (2019-10-25)

On systemd(1) systems, this process is split up in various discrete steps which are exposed as target units. The boot-up process is highly parallelized so that the order in which specific target units are reached is not deterministic, but still adheres to a limited amount of ordering structure.

When systemd starts up the system, it will activate all units that are dependencies of default.target (as well as recursively all dependencies of these dependencies). Usually, default.target is simply an alias of graphical.target or multi-user.target.

timers.target is pulled-in by basic.target asynchronously. This allows timers units to depend on services which become only available later in boot.

    local-fs-pre.target
             |
             v
    (various mounts and   (various swap   (various cryptsetup
     fsck services...)     devices...)        devices...)       (various low-level   (various low-level
             |                  |                  |             services: udevd,     API VFS mounts:
             v                  v                  v             tmpfiles, random     mqueue, configfs,
      local-fs.target      swap.target     cryptsetup.target    seed, sysctl, ...)      debugfs, ...)
             |                  |                  |                    |                    |
             \__________________|_________________ | ___________________|____________________/
                                                  \|/
                                                   v
                                            sysinit.target
                                                   |
              ____________________________________/|\________________________________________
             /                  |                  |                    |                    \
             |                  |                  |                    |                    |
             v                  v                  |                    v                    v
         (various           (various               |                (various          rescue.service
        timers...)          paths...)              |               sockets...)               |
             |                  |                  |                    |                    v
             v                  v                  |                    v              rescue.target
       timers.target      paths.target             |             sockets.target
             |                  |                  |                    |
             v                  \_________________ | ___________________/
                                                  \|/
                                                   v
                                             basic.target
                                                   |
              ____________________________________/|                                 emergency.service
             /                  |                  |                                         |
             |                  |                  |                                         v
             v                  v                  v                                 emergency.target
         display-        (various system    (various system
     manager.service         services           services)
             |             required for            |
             |            graphical UIs)           v
             |                  |           multi-user.target
             |                  |                  |
             \_________________ | _________________/
                               \|/
                                v
                      graphical.target
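The recursive activation of default.target's dependencies can be illustrated with a small graph walk. The dependency table below is a hand-written simplification read off the bootup(7) chart and is purely hypothetical. Real systemd derives these relations from unit files and also honours ordering constraints, which this sketch ignores.

```python
# Simplified "pulls in" relations between units (assumed for illustration).
PULLS_IN = {
    "graphical.target": ["multi-user.target", "display-manager.service"],
    "multi-user.target": ["basic.target"],
    "basic.target": ["sysinit.target", "timers.target",
                     "paths.target", "sockets.target"],
    "sysinit.target": ["local-fs.target", "swap.target", "cryptsetup.target"],
    "local-fs.target": ["local-fs-pre.target"],
}


def units_to_activate(target, deps=PULLS_IN):
    """Return the set containing `target` and all of its recursive dependencies."""
    seen = set()
    stack = [target]
    while stack:
        unit = stack.pop()
        if unit in seen:
            continue
        seen.add(unit)
        stack.extend(deps.get(unit, []))
    return seen
```

With this table, `units_to_activate("multi-user.target")` contains `basic.target` and `sysinit.target` but not `display-manager.service`, which is only pulled in by `graphical.target`.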

udev (2019-10-25)

Chapter 4.6.2 of 深度探索_Linux_操作系统_系统构建和原理解析_扫描版.pdf.

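udev is the user-space mechanism for managing devices dynamically, including loading drivers and managing device nodes. Its core is the udevd daemon. udevd is started once the boot process reaches the user-space stage. After starting, udevd first reads and parses all rule files and caches them in memory.

Whenever a rule file is added, removed, or changed, udevd updates the rules cached in memory. udevd then listens for and handles uevents from the kernel over the NETLINK protocol. For every uevent it receives from the kernel, udevd creates a separate child process to handle it.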


Handling Cold-Plugged Devices (2019-10-25)

Chapter 4.6.3 of 深度探索_Linux_操作系统_系统构建和原理解析_扫描版.pdf.

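Consider a device that is not hot-plugged, such as a disk, whose driver is not compiled into the kernel. When the kernel enumerates devices during boot, the system is still running in kernel space. User space has not been entered yet, let alone the user-space udev service started, so the uevents that the kernel sends to user space are simply lost.

Developers designed a clever mechanism based on the sys file system. After Linux enters user space and udevd has started, udevd asks the kernel, through the sys file system, to emit the uevents again. Because udev is now running, it receives them and, by combining these events with its rules, loads the drivers, creates the device nodes, and so on.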
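The replay request works by writing an action name such as `add` to a device's `uevent` file in sysfs, which makes the kernel emit that event again. The sketch below walks a sysfs tree and does this for every device it finds. It is a simplified stand-in for what udev's own trigger logic does, the function name is hypothetical, and running it against the real /sys requires root.

```python
import os


def trigger_coldplug(sys_root="/sys", action="add"):
    """Write `action` to every 'uevent' file below <sys_root>/devices.

    Returns the number of uevent files written. Files that cannot be
    written (for example due to missing permissions) are skipped.
    """
    count = 0
    devices_dir = os.path.join(sys_root, "devices")
    for dirpath, _dirnames, filenames in os.walk(devices_dir):
        if "uevent" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "uevent"), "w") as f:
                f.write(action)
            count += 1
        except OSError:
            pass  # not writable; ignore and continue
    return count
```

The `sys_root` parameter exists so the walk can be tried safely on a scratch directory before touching a live system.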

SysV-style (2019-10-31)

Wiki: Linux Startup Process: Kernel Phase

Init’s job is “to get everything running the way it should be” once the kernel is fully running. Essentially it establishes and operates the entire user space. This includes checking and mounting file systems, starting up necessary user services, and ultimately switching to a user-environment when system startup is completed.

In a standard Linux system, init is executed with a parameter, known as a runlevel, that takes a value from 0 to 6, and that determines which subsystems are to be made operational. Each runlevel has its own scripts which codify the various processes involved in setting up or leaving the given runlevel, and it is these scripts which are referenced as necessary in the boot process. init scripts are typically held in directories with names such as “/etc/rc...”. The top level configuration file for init is at /etc/inittab.

During system boot, it checks whether a default runlevel is specified in /etc/inittab, and requests the runlevel to enter via the system console if not. It then proceeds to run all the relevant boot scripts for the given runlevel, including loading modules, checking the integrity of the root file system (which was mounted read-only) and then remounting it for full read-write access, and sets up the network.

After it has spawned all of the processes specified, init goes dormant, and waits for one of three events to happen: processes that started to end or die, a power failure signal, or a request via /sbin/telinit to further change the runlevel.
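The default runlevel lookup can be sketched as a small parser for the classic inittab line format `id:runlevels:action:process`, in which the default is declared by an entry whose action is `initdefault`. The function name is an assumption, and the sketch ignores every other inittab action.

```python
def default_runlevel(inittab_text):
    """Return the default runlevel declared in inittab text, or None.

    Looks for a line such as 'id:3:initdefault:'. Blank lines and
    comment lines starting with '#' are ignored.
    """
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 3)
        if len(fields) == 4 and fields[2] == "initdefault":
            return fields[1] or None
    return None
```

A result of `None` corresponds to the case described above in which init has to ask for the runlevel on the system console.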


Wiki: Init (2019-10-31)

At any moment, a running System V is in one of the predetermined number of states, called runlevels. At least one runlevel is the normal operating state of the system; typically, other runlevels represent single-user mode (used for repairing a faulty system), system shutdown, and various other states. Switching from one runlevel to another causes a per-runlevel set of scripts to be run, which typically mount filesystems, start or stop daemons, start or stop the X Window System, shutdown the machine, etc.

Terminology (2019-10-22)

High Memory (2019-10-27)

Wiki: High Memory

High memory is the part of physical memory in a computer which is not directly mapped by the page tables of its operating system kernel.

Some operating system kernels, such as Linux, divide their virtual address space into two regions, devoting the larger to user space and the smaller to the kernel. In current 32-bit x86 computers, this commonly (although does not have to, as this is a configurable option) takes the form of a 3GB/1GB split of the 4GB address space, so kernel virtual addresses start at 0xC0000000 and go to 0xFFFFFFFF. The lower 896 MB, from 0xC0000000 to 0xF7FFFFFF, is directly mapped to the kernel physical address space, and the remaining 128 MB, from 0xF8000000 to 0xFFFFFFFF, is used on demand by the kernel to be mapped to high memory.
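The address arithmetic in this paragraph can be verified with a few constants. The sketch assumes the 3GB/1GB split quoted above; the constant and function names are chosen for the example only.

```python
MiB = 1024 * 1024

PAGE_OFFSET = 0xC0000000           # start of kernel virtual addresses
HIGHMEM_WINDOW_START = 0xF8000000  # start of the on-demand mapping window
ADDRESS_SPACE_END = 0x100000000    # one past 0xFFFFFFFF

LOWMEM_SIZE = HIGHMEM_WINDOW_START - PAGE_OFFSET        # directly mapped part
WINDOW_SIZE = ADDRESS_SPACE_END - HIGHMEM_WINDOW_START  # used for high memory


def lowmem_virt(phys):
    """Return the kernel virtual address of a directly mapped physical
    address, or None if the address lies in high memory."""
    if 0 <= phys < LOWMEM_SIZE:
        return PAGE_OFFSET + phys
    return None
```

`LOWMEM_SIZE` evaluates to 896 MiB and `WINDOW_SIZE` to 128 MiB, matching the figures in the text. Any physical address at or above 896 MiB has no fixed kernel mapping under this split.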

When in user mode, translations are only effective for the first region, thus protecting the kernel from user programs, but when in kernel mode, translations are effective for both regions, thus giving the kernel an easy way to refer to the buffers of processes—it just uses the process’ own mappings.


Comments (2019-10-27)

To fully understand this section, read What are high memory and low memory on Linux?

Every kernel process can also access the user space range if it wishes to. And to achieve this, the kernel maps an address from the user space (the high memory) to its kernel space (the low memory), the 128 MB mentioned above are especially reserved for this.


However, if the kernel needs to refer to physical memory for which a userspace translation has not already been provided, it has only 1GB (for example) of virtual memory to use. On computers with a lot of physical memory (32-bit CentOS/RHEL 5.x with 8GB Memory), this can mean that there exists memory that the kernel cannot refer to directly—this is called high memory. When the kernel wishes to address high memory, it creates a mapping on the fly and destroys the mapping when done, which incurs a performance penalty.


Comments (2019-10-27)

That is why 32-bit Linux can use more than 4 GB of memory (given PAE support in the CPU and kernel), but some caveats should be taken into account.

See LinuxMM: HighMemory (2019-10-27).


Initial RAM Disk (2019-10-27)

Wiki: Initial RAM Disk

In computing (specifically in regards to Linux computing), initrd (initial ramdisk) is a scheme for loading a temporary root file system into memory, which may be used as part of the Linux startup process.

initrd and initramfs refer to two different methods of achieving this. Both are commonly used to make preparations before the real root file system can be mounted.


Comments (2019-10-27)

I think “Initial Root Filesystem” is the same as “Initial RAM Disk”.


Rationale (2019-10-27)

Many Linux distributions ship a single, generic Linux kernel image – one that the distribution’s developers create specifically to boot on a wide variety of hardware. The device drivers for this generic kernel image are included as loadable kernel modules because statically compiling many drivers into one kernel causes the kernel image to be much larger, perhaps too large to boot on computers with limited memory. This then raises the problem of detecting and loading the modules necessary to mount the root file system at boot time, or for that matter, deducing where or what the root file system is.

To further complicate matters, the root file system may be on a software RAID volume, LVM, NFS (on diskless workstations), or on an encrypted partition. All of these require special preparations to mount.

Another complication is kernel support for hibernation, which suspends the computer to disk by dumping an image of the entire contents of memory to a swap partition or a regular file, then powering off. On next boot, this image has to be made accessible before it can be loaded back into memory.

To avoid having to hardcode handling for so many special cases into the kernel, an initial boot stage with a temporary root file-system — now dubbed early user space — is used. This root file-system can contain user-space helpers which do the hardware detection, module loading and device discovery necessary to get the real root file-system mounted.


Implementation (2019-10-29)

An image of this initial root file system (along with the kernel image) must be stored somewhere accessible by the Linux bootloader or the boot firmware of the computer. This can be the root file system itself, a boot image on an optical disc, a small partition on a local disk (a boot partition, usually using ext2 or FAT file systems), or a TFTP server (on systems that can boot from Ethernet).

The bootloader will load the kernel and initial root file system image into memory and then start the kernel, passing in the memory address of the image. At the end of its boot sequence, the kernel tries to determine the format of the image from its first few blocks of data, which can lead either to the initrd or initramfs scheme (see below).

initramfs, ramfs and rootfs


Comments (2019-10-23)

Summarized according to the references below.

What is initramfs?

An initramfs is a gzipped cpio-format archive.

What does initramfs include? How to decompress it?

The initramfs is a complete set of directories that you would find on a normal root filesystem.

By default, the initramfs archive only includes the drivers that are needed for your specific computer (this allows the archive to be smaller and decreases the time that it takes for your computer to boot).

There are also several basic Unix commands included in the archive such as cat and cp.

What is the purpose of using an initramfs?

The only purpose of an initramfs is to mount the root filesystem.

For most distributions, kernel modules are the biggest reason to have an initramfs. In a general distribution, there are many unknowns such as file system types and disk layouts.

How does it work?

At boot time, the boot loader loads the kernel and the initramfs image into memory and starts the kernel. The kernel checks for the presence of the initramfs and, if found, mounts it as / and runs /init. The init program is typically a shell script.


At boot time, the kernel performs the following steps:

1. If an uncompressed cpio archive exists at the start of the initramfs, extract and load the microcode from it to CPU.
2. If an uncompressed cpio archive exists at the start of the initramfs, skip that and set the rest of file as the basic initramfs. Otherwise, treat the whole initramfs as the basic initramfs.
3. Unpack the basic initramfs by treating it as compressed (currently gzipped) cpio archive into a RAM-based disk.
4. Mount and use the RAM-based disk as the initial root filesystem.

深度探索_Linux_操作系统_系统构建和原理解析_扫描版.pdf (2019-10-25)

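An initramfs is a temporary file system. It contains the drivers for essential devices such as hard disks, network cards, and file systems, together with the tools that load those drivers and their runtime environment, such as the basic C library and the dynamic linker/loader. Programs that handle a root file system located on RAID or on a network device are also stored in the initramfs. A third-party program (such as the boot loader) is responsible for loading the initramfs from disk into memory.

Taking the hard disk as an example: the kernel no longer has to obtain the disk controller drivers from the disk itself. It obtains them from the initramfs already loaded into memory, and can then drive the disk and access the root file system on it.

At the end of initialization, the kernel runs the init program in the initramfs. This program probes hardware devices, loads drivers, mounts the real file system, executes /sbin/init on that file system, and thereby switches to the real user space.

Once the real file system is mounted, the initramfs has served its purpose and the memory it occupied is released.

One important role of the initramfs is that the driver for the storage device holding the root file system no longer has to be compiled into the kernel.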


Comments (2019-10-25)

Note that the init inside the initramfs is different from /sbin/init on the real file system.


/usr/share/doc/kernel-doc-3.10.0/Documentation/filesystems/ramfs-rootfs-initramfs.txt (2019-10-22).

All 2.6 Linux kernels contain a gzipped cpio format archive, which is extracted into rootfs when the kernel boots up. After extracting, the kernel checks to see if rootfs contains a file init, and if so it executes it as PID 1. If found, this init process is responsible for bringing the system the rest of the way up, including locating and mounting the real root device (if any).

If rootfs does not contain an init program after the embedded cpio archive is extracted into it, the kernel will fall through to the older code to locate and mount a root partition, then exec some variant of /sbin/init out of that.

About ramfs and rootfs, read /usr/share/doc/kernel-doc-3.10.0/Documentation/filesystems/ramfs-rootfs-initramfs.txt (2019-10-22).


Beyond Linux® From Scratch (systemd Edition): About initramfs (2019-10-22)

The only purpose of an initramfs is to mount the root filesystem. The initramfs is a complete set of directories that you would find on a normal root filesystem. It is bundled into a single cpio archive and compressed with one of several compression algorithms.

At boot time, the boot loader loads the kernel and the initramfs image into memory and starts the kernel. The kernel checks for the presence of the initramfs and, if found, mounts it as / and runs /init. The init program is typically a shell script.

For most distributions, kernel modules are the biggest reason to have an initramfs. In a general distribution, there are many unknowns such as file system types and disk layouts.


InitRAMFS, Dracut, and the Dracut Emergency Shell (2019-10-22)

Initramfs stands for Initial Random-Access Memory File System. On modern Linux systems, it is typically stored in a file under the /boot directory. The kernel version for which it was built will be included in the file name. A new initramfs is generated every time a new kernel is installed.

There are also several basic Unix commands included in the archive such as cat and cp.


Comments

To get the full list of these commands, use lsinitrd | grep -o "usr/bin/.*".


By default, the initramfs archive only includes the drivers that are needed for your specific computer. This allows the archive to be smaller and decreases the time that it takes for your computer to boot.


Debian: Wiki: How initramfs Works (2019-10-22)

LWN: Initramfs Arrives (2019-10-23)

The basic initramfs is the root filesystem image used for booting the kernel provided as a compressed cpio archive.

Recently, this basic initramfs image may be prepended with an uncompressed cpio archive holding the microcode data loaded very early in the boot process.

At boot time, the kernel performs the following steps:

1. If an uncompressed cpio archive exists at the start of the initramfs, extract and load the microcode from it to CPU.
2. If an uncompressed cpio archive exists at the start of the initramfs, skip that and set the rest of file as the basic initramfs. Otherwise, treat the whole initramfs as the basic initramfs.
3. Unpack the basic initramfs by treating it as compressed (currently gzipped) cpio archive into a RAM-based disk.
4. Mount and use the RAM-based disk as the initial root filesystem.
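The "skip the leading uncompressed cpio archive" step can be sketched with a minimal reader for the cpio "newc" format (a 110-byte ASCII header, then the name and the data, each padded to a 4-byte boundary, ending with an entry named `TRAILER!!!`). This is a user-space illustration under those assumptions, not the kernel's unpacking code. The helper names are hypothetical, and `make_newc_entry` exists only to build test input.

```python
def _align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def read_newc(data, offset=0):
    """Read one newc cpio archive starting at `offset` (assumed 4-byte aligned).

    Returns (names, end_offset), where end_offset points just past the
    padded TRAILER!!! entry.
    """
    names = []
    while True:
        header = data[offset:offset + 110]
        if len(header) < 110 or header[:6] != b"070701":
            raise ValueError("no newc cpio header at offset %d" % offset)
        # 13 fields of 8 hex digits follow the 6-byte magic.
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = offset + 110
        name = data[name_start:name_start + namesize - 1].decode()
        data_start = _align4(name_start + namesize)
        offset = _align4(data_start + filesize)
        if name == "TRAILER!!!":
            return names, offset
        names.append(name)


def split_initramfs(image):
    """Return (early_file_names, basic_initramfs_bytes)."""
    if not image.startswith(b"070701"):
        return [], image  # no leading uncompressed archive
    names, end = read_newc(image)
    return names, image[end:].lstrip(b"\0")  # drop zero padding between parts


def make_newc_entry(name, content=b""):
    """Build one newc entry (test helper; most header fields are zero)."""
    name_bytes = name.encode() + b"\0"
    fields = [0, 0o100644, 0, 0, 1, 0, len(content), 0, 0, 0, 0,
              len(name_bytes), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * (_align4(len(out)) - len(out))
    out += content
    out += b"\0" * (_align4(len(out)) - len(out))
    return out
```

The second value returned by `split_initramfs` is what the steps above call the basic initramfs, which would then be decompressed and unpacked.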

Much of the kernel initialization and bootstrap code can then be moved into this disk and run in user mode. Tasks like finding the real root disk, boot-time networking setup, handling of initrd-style ramdisks, ACPI setup, etc. will be shifted out of the kernel in this way (2019-10-23).

An obvious advantage of this scheme is that the size of the kernel code itself can shrink. That does not free memory for a running system, since the Linux kernel already dumps initialization code when it is no longer needed. But a smaller code base for the kernel itself makes the whole thing a little easier to maintain, and that is always a good thing (2019-10-23).


Comments (2019-10-22)

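The first two points do not conflict; they happen in sequence. The microcode is loaded from the uncompressed archive first, and then that archive is skipped so that the rest of the file is used as the basic initramfs.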


dracut (2019-10-22)

InitRAMFS, Dracut, and the Dracut Emergency Shell

Dracut is a tool that is used to manage the initramfs.

Refer to Usage\command\kernel\dracut.md.

Dracut Emergency Shell (2019-10-22)

InitRAMFS, Dracut, and the Dracut Emergency Shell

The dracut emergency shell is an interactive mode that can be initiated while the initramfs is loaded.

initrd

initrd

initrd is a temporary root file system that is mounted during system boot to support a two-stage boot process. The initrd file contains various executables and drivers that can be used to mount the actual root file system. Once the real root file system is mounted, the initrd RAM disk is unmounted and its memory is freed.


Wiki: Initial RAM Disk (2019-10-29)

In the initrd scheme, the image may be a file system image (optionally compressed), which is made available in a special block device (/dev/ram) that is then mounted as the initial root file system. The driver for that file system must be compiled statically into the kernel.

Many distributions originally used compressed ext2 file system images, while the others (including Debian 3.1) used cramfs in order to boot on memory-limited systems, since the cramfs image can be mounted in-place without requiring extra space for decompression.

Once the initial root file system is up, the kernel executes /linuxrc as its first process; when it exits, the kernel assumes that the real root file system has been mounted and executes /sbin/init to begin the normal user-space boot process.


Mount preparations (2019-10-30)

Other Linux distributions (such as Fedora and Ubuntu) generate a more generic initrd image. These start only with the device name of the root file system (or its UUID) and must discover everything else at boot time. In this case, the software must perform a complex cascade of tasks to get the root file system mounted:

- Any hardware drivers that the boot process depends on must be loaded. A common arrangement is to pack kernel modules for common storage devices onto the initrd and then invoke a hotplug agent to pull in modules matching the computer’s detected hardware.
- On systems which display a boot splash screen, the video hardware must be initialized and a user-space helper started to paint animations onto the display in lockstep with the boot process.
- If the root file system is on NFS, it must then bring up the primary network interface, invoke a DHCP client, with which it can obtain a DHCP lease, extract the name of the NFS share and the address of the NFS server from the lease, and mount the NFS share.
- If the root file system appears to be on a software RAID device, there is no way of knowing which devices the RAID volume spans; the standard MD utilities must be invoked to scan all available block devices and bring the required ones online.
- If the root file system appears to be on a logical volume, the LVM utilities must be invoked to scan for and activate the volume group containing it.
- If the root file system is on an encrypted block device, the software needs to invoke a helper script to prompt the user to type in a passphrase and/or insert a hardware token (such as a smart card or a USB security dongle), and then create a decryption target with the device mapper.
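The cascade above can be modeled as a simple decision sequence. The Python sketch below is purely illustrative: RootSpec and plan_root_mount are hypothetical names, the step descriptions are paraphrases of the list, and the ordering simply follows the list rather than what any particular distribution's initrd scripts do.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RootSpec:
    """Hypothetical description of where the root file system lives."""
    splash_screen: bool = False
    on_nfs: bool = False
    on_software_raid: bool = False
    on_logical_volume: bool = False
    encrypted: bool = False


def plan_root_mount(spec: RootSpec) -> List[str]:
    """Return the ordered preparation steps implied by the given setup."""
    steps = ["load drivers matching the detected hardware"]
    if spec.splash_screen:
        steps.append("initialize video hardware and start the splash helper")
    if spec.on_nfs:
        steps.append("bring up the primary network interface")
        steps.append("obtain a DHCP lease and read the NFS server and share from it")
        steps.append("mount the NFS share")
    if spec.on_software_raid:
        steps.append("scan block devices with the MD utilities and assemble the array")
    if spec.on_logical_volume:
        steps.append("scan for and activate the volume group with the LVM utilities")
    if spec.encrypted:
        steps.append("prompt for a passphrase and create a device-mapper decryption target")
    steps.append("mount the real root file system read-only")
    return steps


if __name__ == "__main__":
    for step in plan_root_mount(RootSpec(on_logical_volume=True, encrypted=True)):
        print("-", step)
```

A plain local disk needs only the first and last steps; each storage layer adds its own step in between.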

When the root file system finally becomes visible, any maintenance tasks that cannot run on a mounted root file system are done, the root file system is mounted read-only, and any processes that must continue running (such as the splash screen helper and its command FIFO) are hoisted into the newly mounted root file system.

The final root file system cannot simply be mounted over /, since that would make the scripts and tools on the initial root file system inaccessible for any final cleanup tasks:

On an initrd, the new root is mounted at a temporary mount point and rotated into place with pivot_root(8) (which was introduced specifically for this purpose). This leaves the initial root file system at a mount point (such as /initrd) where normal boot scripts can later unmount it to free up memory held by the initrd.

IBM: Inside the Linux Boot Process (2019-10-30)

During the boot of the kernel, the initial-RAM disk (initrd) that was loaded into memory by the stage 2 boot loader is copied into RAM and mounted. This initrd serves as a temporary root file system in RAM and allows the kernel to fully boot without having to mount any physical disks. Since the necessary modules needed to interface with peripherals can be part of the initrd, the kernel can be very small, but still support a large number of possible hardware configurations. After the kernel is booted, the root file system is pivoted (via pivot_root) where the initrd root file system is unmounted and the real root file system is mounted.

The initrd function allows you to create a small Linux kernel with drivers compiled as loadable modules. These loadable modules give the kernel the means to access disks and the file systems on those disks, as well as drivers for other hardware assets. Because the root file system is a file system on a disk, the initrd function provides a means of bootstrapping to gain access to the disk and mount the real root file system. In an embedded target without a hard disk, the initrd can be the final root file system, or the final root file system can be mounted via the NFS.


For more detail, such as which functions are called, see the section “Booting with an initial RAM disk” of IBM: Linux initial RAM disk (initrd) Overview.

MBR, Master Boot Record

Wiki: Master Boot Record (2019-10-21)

An MBR is a special type of boot sector at the very beginning of partitioned computer mass storage devices like fixed disks or removable drives intended for use with IBM PC-compatible systems and beyond.

The MBR holds the information on how the logical partitions, containing file systems, are organized on that medium. The MBR also contains executable code to function as a loader for the installed operating system—usually by passing control over to the loader’s second stage, or in conjunction with each partition’s volume boot record (VBR). This MBR code is usually referred to as a boot loader.
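To make the layout concrete, here is a minimal Python sketch that checks the 55AA boot signature and decodes the four primary partition entries of a classic MBR. It assumes the conventional layout (512-byte sector, partition table at byte offset 446, 16-byte entries, signature in the last two bytes); the function names and the sample sector are made up for this example, and the code works on a byte string rather than reading a real disk.

```python
import struct
from typing import Dict, List

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # boot code occupies the bytes before this
PARTITION_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a bootable sector


def is_bootable_sector(sector: bytes) -> bool:
    """Check the size and the 55AA signature that the BIOS looks for."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


def parse_partition_table(sector: bytes) -> List[Dict[str, int]]:
    """Decode the non-empty primary partition entries of an MBR sector."""
    if not is_bootable_sector(sector):
        raise ValueError("not a valid MBR: wrong size or missing 55AA signature")
    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * PARTITION_ENTRY_SIZE
        # status, CHS start (3 bytes), type, CHS end (3 bytes),
        # first LBA and sector count (both little-endian 32-bit)
        status, _chs_first, ptype, _chs_last, lba_start, count = struct.unpack(
            "<B3sB3sII", sector[start:start + PARTITION_ENTRY_SIZE]
        )
        if ptype == 0:  # type 0 marks an unused entry
            continue
        partitions.append({
            "index": index + 1,
            "active": int(status == 0x80),
            "type": ptype,
            "lba_start": lba_start,
            "sector_count": count,
        })
    return partitions


def build_sample_mbr() -> bytes:
    """Create an in-memory sample sector with one active Linux partition."""
    sector = bytearray(SECTOR_SIZE)
    sector[446:462] = struct.pack(
        "<B3sB3sII", 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 204800
    )
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    for partition in parse_partition_table(build_sample_mbr()):
        print(partition)
```

With 512-byte sectors, the first sector of each partition lies at byte offset lba_start * 512 on the device.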

VBR, Volume Boot Record (2019-10-21)

Wiki: Volume Boot Record (2019-10-21)

A VBR (also known as a volume boot sector, a partition boot record or a partition boot sector) is a type of boot sector introduced by the IBM Personal Computer. It may be found on a partitioned data storage device, such as a hard disk, or an unpartitioned device, such as a floppy disk, and contains machine code for bootstrapping programs (usually, but not necessarily, operating systems) stored in other parts of the device.

On non-partitioned storage devices, it is the first sector of the device. On partitioned devices, it is the first sector of an individual partition on the device, with the first sector of the entire device being an MBR containing the partition table.

The code in volume boot records is invoked either directly by the machine’s firmware or indirectly by code in the master boot record or a boot manager. Code in the MBR and VBR is in essence loaded the same way.

Splash Screen

How to change the Linux Boot Splash screen (2019-01-09):

What is a Splash Screen?

A splash screen is simply the picture that gets displayed in the background while the Linux operating system boots. GRUB (GRand Unified Bootloader) is the most commonly used bootloader among major Linux distributions. If you take RedHat as an example, it displays a blank or black background during the booting of the system.

The splash screen is configured in the grub.conf file, and the splash screen image file resides in the /boot partition. If you are bored of the default blank screen and want to change it to whatever you like, then just perform the steps below to change it.

Splash screen path: /boot/grub/splash.xpm.gz.

About how to change it, see Usage/How-to/How_to_change_the_Linux_Boot_Splash_screen.md.

References (2019-10-23)

深度探索_Linux_操作系统_系统构建和原理解析_扫描版.pdf
IBM: Inside the Linux Boot Process
IBM: Linux initial RAM disk (initrd) Overview
How to change the Linux Boot Splash screen
initrd
InitRAMFS, Dracut, and the Dracut Emergency Shell
Debian: Wiki: How initramfs Works
LWN: Initramfs Arrives
Beyond Linux® From Scratch (systemd Edition): About initramfs
Official: GNU GRUB
GNU GRUB Manual 2.04: GRUB Image Files
Wiki: BIOS
Wiki: BIOS Interrupt Call
Wiki: GNU GRUB
Wiki: Master Boot Record
Wiki: Linux Startup Process
Wiki: Volume Boot Record
systemd System and Service Manager
LWN: Linux Device Drivers, Third Edition (LDD3 is current as of the 2.6.10 kernel)
Almesberger, Werner; Paper: Booting Linux: The History and the Future (Can be converted to PDF)