Revision: $Revision: 451 $ ($Date: 2011-03-23 13:29:59 +0100 (Wed, 23 Mar 2011) $)
This topic has a total weight of 7 points and contains the following objectives:
Candidates should be able to determine the cause of errors in loading and usage of bootloaders. GRUB and LILO are the bootloaders of interest.
A candidate should be able to recognize and identify boot loader and kernel specific stages and utilize kernel boot messages to diagnose kernel errors. This objective includes being able to identify and correct common hardware issues, and be able to determine if the problem is hardware or software.
A candidate should be able to identify, diagnose and repair local system environment.
A candidate should be able to identify common local system and user environment configuration issues and common repair techniques.
Revision: $Revision: 1.8 $
Candidate should be able to: determine, from bootup text, the 4 stages of boot sequence and distinguish between each.
Key files, terms and utilities include:
| boot loader start and hand off to kernel |
| kernel loading |
| hardware initialization and setup |
| daemon initialization and setup |
Resources: the man pages for the various commands.
The boot process has been described at length in Chapter 2, System Startup (202). This section underlines and enhances that description. We will limit our discussion to PC hardware, though most other hardware uses similar schemes.
The PC boot process is started on powerup. The processor will start execution of code contained in the Basic Input/Output System (BIOS). The BIOS is a program stored in Read Only Memory (ROM) and is a part of the PC hardware. Apart from the bootstrap code it contains routines to set up your hardware and to communicate with it. Most of the code in the BIOS is never used by Linux, but the bootstrap code is.
The bootstrap code will load a block of data from the first sector (cylinder 0, head 0, sector 1) of what has been configured to be your boot drive. Traditionally this is the first floppy drive. If reading the floppy disk fails or no floppy disk was inserted, the program in the BIOS will try to load the first sector from the first (specified) hard disk. Most BIOSes allow you to set up an alternate order, e.g. to try the hard disk first or to try booting from CD first.
Wherever the data was found (and if it was found, of course) the BIOS will load it into memory and try to execute it as if it were a program. In most cases the data either consists of code from a boot loader such as LILO, or the start of an operating system kernel like Linux. If the code on the boot sector is unreadable or invalid, the BIOS will try the next boot device.
As can be determined from the text above, there are two ways for the kernel to be loaded:
by using the kernel code itself. The first sector of the boot disk will contain the first sector of the Linux kernel itself. That code loads the rest of the kernel from the boot device.
by using a bootstrap loader. There are 2 well-known bootstrap loaders for Linux: GRUB (GRand Unified Bootloader) and LILO. LILO is still widely used, but most modern distributions employ GRUB. GRUB has a number of advantages over LILO, such as built in knowledge of filesystems. Hence GRUB is capable of loading configuration files and the kernel directly by their filename. LILO uses a different method: the physical location (track/sector/offset) for the kernel is stored at installation time. The bootloader part of LILO doesn't need knowledge of the filesystem it is booting from.
If a bootstrap loader has been used, it will locate the kernel, load it and execute it. If the kernel has been raw-copied to a diskette, its first sector also contains code that loads the rest of the kernel code from the boot device and subsequently executes it.
The kernel will initialize its internal data structures and device drivers.
Once it is completely initialized, it consults the contents of the
ramdisk word, a fixed address in its binary that
specifies where the kernel can find the filesystem that will be mounted as
root (`/', the root filesystem). The ramdisk word
can also specify that the filesystem is a RAMdisk. A RAMdisk is a memory
region that is loaded with an (optionally compressed) image of a filesystem,
and that is used as if it were a hard disk. If the kernel cannot find the
root filesystem it halts.
Assuming all went well, the kernel is now up and running and has mounted its
root filesystem. Next, the kernel will start up the init
program, located in either /bin or /sbin.
init uses the configuration file
/etc/inittab to determine which program(s) to start
next.
The way init is used to start up the initial processes
varies from distribution to distribution. init can be
configured in many ways. But in all cases a number of commands will be issued
to set up the basic system, such as running fsck on hard
disks, initializing swapping and mounting the disks that are configured in
/etc/fstab. Next, a group of commands (often scripts)
is executed. They define a so-called runlevel.
Such runlevels define a set of processes that need to be run to get the
system in a certain state, for example multi-user mode or single-user mode.
The initdefault entry in the /etc/inittab
file defines the initial runlevel of the system. If there is no such entry
or the configuration file was not found, init will prompt
for a runlevel at the system console. Subsequently, all processes specified
for that runlevel in the inittab file will be started.
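The lookup of the initial runlevel can be illustrated with a small Python sketch. It assumes the colon-separated inittab line format (id:runlevels:action:process) documented in the inittab manual page; the function name and the sample fragment, including the script path in it, are hypothetical and only serve as an illustration.

```python
def find_initdefault(inittab_text):
    """Return the runlevel named by the initdefault entry, or None if absent."""
    for raw_line in inittab_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumed field layout: id:runlevels:action:process
        fields = line.split(":", 3)
        if len(fields) >= 3 and fields[2] == "initdefault":
            return fields[1]
    return None


# Hypothetical inittab fragment, for illustration only.
SAMPLE_INITTAB = """\
# default runlevel
id:3:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
"""

print(find_initdefault(SAMPLE_INITTAB))
```

When the function returns None, the situation corresponds to the case described above in which init has to ask for a runlevel at the console.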
In some cases the initial scripts are specified by the sysinit
label in the inittab file; in other cases they are considered
part of a runlevel.
When a runlevel defines multi-user use, typically a number of daemons are started
next. This is done by using start-up scripts, which can be in various locations,
depending on the distribution you use. Typically such start-up scripts are
located in the /etc/rc.d/ directory and named aptly after
the software they run, e.g. sendmail, inetd
or sshd. Typically, these scripts are linked to another level
of directories, one per runlevel. In all runlevels one or more getty
programs will be spawned, to enable user logins.
From what was written before you should now be able to identify the four stages of the boot sequence:
boot loader start and hand off to kernel - typically you can recognize this stage because LILO displays the four letters “L”, “I”, “L”, and “O”. Each of these letters identifies a certain stage in the initial boot process. Their meaning is described in more detail in the section called “LILO errors”;
kernel loading - this stage can be recognized since the kernel will
display various messages, starting with the message
“Loading ” followed by the name
of your kernel, e.g. Linux-2.2.20.
hardware initialization and setup - can be identified by various messages that inform you about the various hardware components that were found and initialized.
daemon initialization and setup - this is fairly distribution specific, but this stage can be recognized by messages that typically contain lines like “Starting the ... daemon”.
The kernel stores its messages in a ring buffer. On most Linux systems the contents of that ring buffer are written to a file during the last phase of the boot process for later inspection. The command dmesg displays the current contents of the buffer (and is in fact often the tool used to write the buffer to a file during the last phase of the boot sequence). Check the manual page for more information.
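To make the four stages more tangible, the following Python sketch labels a sequence of boot messages with the stage they most likely belong to. It is a rough heuristic, not a real log parser: it only uses the markers named above (“Loading ” and “Starting the ... daemon”) and assumes that every other line after kernel loading is a hardware message. The function name and the sample lines are hypothetical.

```python
import re


def label_boot_stages(lines):
    """Attach a boot-stage label to each message, in the order the messages appear."""
    stage = "boot loader"  # everything before the kernel is loaded
    labelled = []
    for line in lines:
        text = line.strip()
        if text.startswith("Loading "):
            stage = "kernel loading"
        elif re.match(r"Starting .*daemon", text):
            stage = "daemon initialization"
        elif stage == "kernel loading":
            # Assumption: lines following the "Loading" message report hardware.
            stage = "hardware initialization"
        labelled.append((stage, text))
    return labelled


# Hypothetical boot messages, for illustration only.
SAMPLE_LINES = [
    "LILO",
    "Loading Linux-2.2.20",
    "hda: example disk detected",
    "Starting the sshd daemon",
]

for stage, text in label_boot_stages(SAMPLE_LINES):
    print(stage, "|", text)
```

Real boot output is far less regular, so the markers would need tuning for any particular distribution.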
Revision: $Revision: 1.7 $
Candidate should be able to: determine specific stage failures and corrective techniques.
Key files, terms and utilities include:
| Know meaning of L, LI, LIL, LILO, and scrolling 010101 errors |
| Know the different LILO install locations, MBR, /dev/fd0, or primary/extended partition |
| /boot/boot.b |
| Know significance of /boot/boot.### files |
Resources: the man pages for the various commands, Wirzenius98, Yap98.
As of this writing most BIOSes let you choose booting from hard disk, floppy, network or CDROM. To give an overview, these alternatives are outlined below. Since most systems boot from hard disk, that process is described in more detail.
Booting from CDROM requires that your hardware support the “El Torito” standard. El Torito is a specification that says how a CDROM should be formatted such that you can directly boot from it. A bootable CDROM contains a floppy-disk image in its initial sectors. This image is treated like a floppy by the BIOS and booted from.
Booting from the network is done using the Bootstrap Protocol (BOOTP) or the Dynamic Host Configuration Protocol (DHCP). DHCP actually is an evolution of BOOTP. In most cases the client has no means to address the bootserver directly, so the client broadcasts a UDP packet over the network. Any bootserver that has information about the client stored will answer. If more than one server responds, the client will select one of them. Since the requesting client does not yet have a valid IP address, the unique hardware (MAC) address of its network card is used to identify it to the BOOTP server(s) in your network. The BOOTP server(s) will issue the IP address, a hostname, the address of the server where the image of the kernel to boot can be found and the name of that image. The client configures its network accordingly and downloads the specified image from the specified server using the Trivial File Transfer Protocol (TFTP). TFTP, which runs over UDP, is often considered to be an unsafe protocol, since there is no authentication. However, its triviality also allows compact implementations that can be stored in a boot-ROM, for example a PC BIOS. After the kernel image has been retrieved, it will be started the usual way. Often, the root filesystem is located on another server too, and NFS is used to mount it. This requires a Linux kernel that allows the root filesystem to be NFS.
Booting from floppy or disk is the common case. In previous chapters we already described the boot process used to boot from floppy. However, there is a slight difference between floppy and hard disk boots. Both contain a bootsector, located at cylinder 0, head 0, sector 1. On a floppy the boot sector often contains just the boot code to be loaded in memory and executed.
Booting from hard disk requires some additional functionality: a hard disk can contain one or more partitions, in which case the boot program needs to find out from which partition to boot. A partition in turn will contain its own bootcode sector. The sector located at cylinder 0, head 0, sector 1 is called the master boot record (MBR).
Information about hard disk partitions is typically stored in partition tables, which are data-structures stored on a special partition sector. There are various types of partition tables, for example IRIX/SGI, Sun or DOS. It depends on the hardware in use which type of partition table is used. In this book we focus on the classical PC (DOS) partition table, which is typical for PC hardware. By default Linux accepts and uses DOS partition tables. Support for other partition types can be enabled in the kernel. On PC hardware the partition table is part of the MBR. You can use the fdisk command to print out your current partition table or to create a new one.
On a PC the BIOS loads cylinder 0, head 0, sector 1 into memory. The first 446 bytes of that sector comprise the boot program, which is executed next. It is up to you which program to use to boot your system. By writing your own boot program you could continue the boot process any way you want. But there are many fine boot programs available, for example the Linux Loader (LILO) or GRUB. Sometimes a Windows boot loader is used (e.g. Bootmagic), or even the old-fashioned DOS boot loader.
DOS, for example, uses a loader program that scans the partition table for a bootable partition. When an entry marked “active” is found, the first sector of that partition is loaded into memory and executed. That code in turn continues the loading of the operating system.
Linux can install a loader program too. Often this will be LILO, the Linux
Loader. LILO uses a two-stage approach: the boot
sector has a boot program, that loads a boot file, the second stage boot
program. That program presents you with a simple menu-like interface, which
either prompts you for the operating system to load or optionally times out and loads
the default system.
Note, that the code in the MBR is limited: it does not have any knowledge about concepts like
“filesystems” let alone “filenames”. It can access the hard disk, but needs the BIOS
to do so. And the BIOS is not capable of understanding anything but CHS
(Cylinder/Heads/Sectors). Hence, to find its boot program, the code in the MBR needs
exact specification of the CHS to use to find it. These specifications are figured
out by /sbin/lilo, when it installs the boot sector.
The second stage LILO boot program needs information about the following items:
where /boot/boot.b can be found; it contains
the second stage boot program. The second stage program will be loaded
by the initial boot program in the MBR;
the /boot/map file, which contains
information about the location of kernels, boot sectors etc.; this information
is used mostly by the second stage boot program; see below for a more
detailed description of the map file;
the location of the startup message, if one has been defined
Remember, to be able to access these
files, the BIOS needs the CHS (Cylinder/Head/Sector) information to load
the proper block. This also holds true for the code in the second stage
loader. LILO therefore needs a so called map
file, that maps filenames into CHS values. This file contains
information for all files that LILO needs to know of during boot, for example
locations of the kernel(s), the command line to execute on boot, and more.
The default name for the map file is /boot/map.
/sbin/lilo uses the file /etc/lilo.conf
to determine what files to map and what bootprogram to use and creates a
mapfile accordingly.
The DOS partition table is embedded in the MBR at cylinder 0, head 0, sector 1, starting at byte offset 446 (0x1BE), directly after the 446 bytes of boot code. There are four entries in a DOS partition table. Only one of them can be marked as active: the boot program normally will load the first sector of the active partition in memory and deliver control to it.
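The search for the active entry can be sketched in a few lines of Python. The sketch assumes the conventional MBR layout: the table starts at zero-based byte offset 446 (0x1BE), each entry is 16 bytes long, the first byte of an entry is 0x80 when the partition is active, and the sector ends with the signature bytes 0x55 0xAA. The function name and the synthetic sector are hypothetical.

```python
PARTITION_TABLE_OFFSET = 446   # zero-based offset of the first entry (0x1BE)
ENTRY_SIZE = 16                # bytes per partition table entry
ACTIVE_FLAG = 0x80             # value of the boot flag for the active partition
BOOT_SIGNATURE = b"\x55\xaa"   # assumed signature in the last two bytes


def find_active_partition(mbr):
    """Return the number (1-4) of the active primary partition, or None."""
    if len(mbr) < 512:
        raise ValueError("an MBR sector is 512 bytes long")
    if mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("boot signature missing, not a valid MBR")
    for index in range(4):
        flag = mbr[PARTITION_TABLE_OFFSET + index * ENTRY_SIZE]
        if flag == ACTIVE_FLAG:
            return index + 1
    return None


# Synthetic sector for illustration: only the second entry is marked active.
sector = bytearray(512)
sector[PARTITION_TABLE_OFFSET + ENTRY_SIZE] = ACTIVE_FLAG
sector[510:512] = BOOT_SIGNATURE

print(find_active_partition(bytes(sector)))
```

On a real system the 512 bytes would be read from the disk device itself, which requires root privileges; the synthetic sector keeps the example harmless.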
An entry in the partition table contains 16 bytes, as shown in the following figure:
Figure 13.1. A (DOS) partition table entry
|boot? ||start             ||type  ||partition         |
|      ||cyl  |head  |sect ||      ||cyl  |head  |sect |
|------||------------------||------||------------------|
|start in LBA              ||size in sectors           |
|--------------------------||--------------------------|
As you can see, each partition entry contains the start and end location of the partition specified as the Cylinder/Head/Sector of the hard disk. Note that the “Cylinder” field has 10 bits, therefore the maximum number of cylinders that can be specified is 2^10 = 1024. BIOSes traditionally use CHS specifications, hence older BIOSes are not capable of accessing data stored beyond the first 1024 cylinders of the disk.
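The size of the area that can be reached with CHS addressing follows from simple arithmetic, worked out in the Python sketch below. The 10-bit cylinder field is taken from the text; the 8-bit head field, the 6-bit sector field and the 512-byte sector size are assumptions based on the conventional packing of a CHS address into three bytes.

```python
# Field widths of one packed CHS address (assumed: 10 + 8 + 6 = 24 bits).
CYLINDER_BITS = 10
HEAD_BITS = 8
SECTOR_BITS = 6
BYTES_PER_SECTOR = 512


def chs_limit_bytes():
    """Return the largest capacity addressable with the CHS fields."""
    cylinders = 2 ** CYLINDER_BITS     # 1024 cylinders
    heads = 2 ** HEAD_BITS             # 256 heads
    sectors = 2 ** SECTOR_BITS - 1     # 63 sectors, numbering starts at 1
    return cylinders * heads * sectors * BYTES_PER_SECTOR


limit = chs_limit_bytes()
print(limit, "bytes, about", round(limit / 2 ** 30, 3), "GiB")
```

Under these assumptions the limit is a little under 8 GiB, which is why disks larger than that needed a different addressing scheme.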
As disks grew in size the partition/disk sizes could not be properly expressed using the limited capacity of the CHS fields anymore. An alternate method of addressing blocks on a hard disk was introduced: Logical Block Addressing (LBA). LBA addressing specifies locations on the disk by their block number, counting from 0. A block can be seen as a 512 byte sector. The last 64 bits in a partition table entry describe the extent of that partition as two 32-bit values: the LBA address of the start of the partition and the number of sectors it contains.
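A 16-byte partition table entry can be decoded as sketched below in Python. The sketch assumes the conventional on-disk byte order: boot flag, three bytes of packed start CHS (head, then sector with the two high cylinder bits, then the low cylinder byte), the type byte, three bytes of packed end CHS, and finally two little-endian 32-bit values for the LBA start and the sector count. Function names and the sample entry are hypothetical.

```python
import struct


def unpack_chs(head, sector_and_cyl_high, cyl_low):
    """Unpack a 3-byte CHS address into (cylinder, head, sector)."""
    sector = sector_and_cyl_high & 0x3F                        # low 6 bits
    cylinder = ((sector_and_cyl_high & 0xC0) << 2) | cyl_low   # 10 bits in total
    return cylinder, head, sector


def decode_partition_entry(entry):
    """Decode one 16-byte DOS partition table entry into a dictionary."""
    if len(entry) != 16:
        raise ValueError("a partition table entry is exactly 16 bytes")
    (flag, h0, sc0, c0, ptype, h1, sc1, c1,
     lba_start, sectors) = struct.unpack("<8BII", entry)
    return {
        "active": flag == 0x80,
        "type": ptype,
        "chs_start": unpack_chs(h0, sc0, c0),
        "chs_end": unpack_chs(h1, sc1, c1),
        "lba_start": lba_start,
        "sectors": sectors,
    }


# Hypothetical entry: active Linux partition (type 0x83) starting at LBA 63.
SAMPLE_ENTRY = struct.pack("<8BII", 0x80, 1, 1, 0, 0x83, 254, 255, 255,
                           63, 1000000)

print(decode_partition_entry(SAMPLE_ENTRY))
```

Note that the end CHS of the sample sits at the 1024 cylinder limit while the LBA fields can still describe the partition's true extent, which is exactly the situation LBA was introduced for.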
Remember that your computer boots using the BIOS disk access routines. Hence, if
your BIOS does not cope with LBA addressing you may not be able to boot from partitions
beyond the 1024 cylinder boundary. For this reason people with large disks often
create a small partition somewhere within the 1024 cylinder boundary, usually mounted
on /boot and put the boot program and kernel in there, so BIOS
can boot Linux from hard disk. Once loaded, Linux ignores the BIOS - it has its own
disk access procedures which are capable of handling huge disks.
The “type” field contains the type of the partition, which usually relates to the purpose the partition was intended for. To give an impression of the various types of partitions available, a screen dump of the “L”ist command within fdisk follows:
 0  Empty           17  Hidden HPFS/NTF 5c  Priam Edisk     a6  OpenBSD
 1  FAT12           18  AST Windows swa 61  SpeedStor       a7  NeXTSTEP
 2  XENIX root      1b  Hidden Win95 FA 63  GNU HURD or Sys b7  BSDI fs
 3  XENIX usr       1c  Hidden Win95 FA 64  Novell Netware  b8  BSDI swap
 4  FAT16 <32M      1e  Hidden Win95 FA 65  Novell Netware  c1  DRDOS/sec (FAT-
 5  Extended        24  NEC DOS         70  DiskSecure Mult c4  DRDOS/sec (FAT-
 6  FAT16           3c  PartitionMagic  75  PC/IX           c6  DRDOS/sec (FAT-
 7  HPFS/NTFS       40  Venix 80286     80  Old Minix       c7  Syrinx
 8  AIX             41  PPC PReP Boot   81  Minix / old Lin db  CP/M / CTOS / .
 9  AIX bootable    42  SFS             82  Linux swap      e1  DOS access
 a  OS/2 Boot Manag 4d  QNX4.x          83  Linux           e3  DOS R/O
 b  Win95 FAT32     4e  QNX4.x 2nd part 84  OS/2 hidden C:  e4  SpeedStor
 c  Win95 FAT32 (LB 4f  QNX4.x 3rd part 85  Linux extended  eb  BeOS fs
 e  Win95 FAT16 (LB 50  OnTrack DM      86  NTFS volume set f1  SpeedStor
 f  Win95 Ext'd (LB 51  OnTrack DM6 Aux 87  NTFS volume set f4  SpeedStor
10  OPUS            52  CP/M            93  Amoeba          f2  DOS secondary
11  Hidden FAT12    53  OnTrack DM6 Aux 94  Amoeba BBT      fd  Linux raid auto
12  Compaq diagnost 54  OnTrackDM6      a0  IBM Thinkpad hi fe  LANstep
14  Hidden FAT16 <3 55  EZ-Drive        a5  BSD/386         ff  BBT
16  Hidden FAT16    56  Golden Bow
The design limitation that imposes a maximum of four partitions proved to be troublesome as disks grew larger and larger. Therefore, a work-around was invented: by specifying one of the partitions as a “DOS Extended partition” it in effect becomes a container that holds one or more partitions, aptly named logical partitions. The total size of all logical partitions within the extended partition can never exceed the size of that extended partition.
In principle Linux lets you create as many logical partitions as you want, restricted of course by the physical boundaries of the extended partition and by hardware limitations. The logical partitions are described in a linked list of sectors. The four primary partitions, present or not, get numbers 1-4. Logical partitions start numbering from 5. The partition table on the main disk describes the primary partitions, including the extended partition. Inside the extended partition, each logical partition is preceded by a partition table of its own, which describes that logical partition and holds a pointer to the partition table of the next logical partition.
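Walking this linked list can be sketched in Python as follows. The sketch assumes the conventional layout of the chained tables: the first entry of each table describes a logical partition with a start relative to the sector holding that table, and the second entry points to the next table with a start relative to the beginning of the extended partition. The function names, the read_sector callback and the in-memory "disk" are hypothetical.

```python
import struct

TABLE_OFFSET = 446                  # zero-based offset of the table in a sector
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # extended partition types from the fdisk list


def read_entries(sector):
    """Return (type, lba_start, sector_count) for the four entries of a table."""
    entries = []
    for i in range(4):
        raw = sector[TABLE_OFFSET + 16 * i:TABLE_OFFSET + 16 * (i + 1)]
        lba_start, count = struct.unpack("<II", raw[8:16])
        entries.append((raw[4], lba_start, count))
    return entries


def list_logical_partitions(read_sector, extended_start):
    """Follow the chain of tables; return (number, type, absolute LBA, size)."""
    logical, number = [], 5          # logical partitions are numbered from 5
    table_lba, seen = extended_start, set()
    while table_lba not in seen:     # guard against a looping chain
        seen.add(table_lba)
        first, second = read_entries(read_sector(table_lba))[:2]
        if first[0] != 0:
            logical.append((number, first[0], table_lba + first[1], first[2]))
            number += 1
        if second[0] not in EXTENDED_TYPES:
            break                    # no pointer to a further table
        table_lba = extended_start + second[1]
    return logical


def make_sector(entries):
    """Build a synthetic 512-byte sector holding the given table entries."""
    sector = bytearray(512)
    for i, (ptype, lba_start, count) in enumerate(entries):
        sector[TABLE_OFFSET + 16 * i + 4] = ptype
        struct.pack_into("<II", sector, TABLE_OFFSET + 16 * i + 8, lba_start, count)
    return bytes(sector)


# Synthetic disk: extended partition at LBA 1000 holding two logical partitions.
DISK = {
    1000: make_sector([(0x83, 63, 5000), (0x05, 6000, 3000)]),
    7000: make_sector([(0x82, 63, 2000)]),
}

print(list_logical_partitions(DISK.__getitem__, 1000))
```

The set of visited sectors is only a safety measure: a damaged chain that points back to an earlier table would otherwise make the loop run forever.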
LILO's first stage loader program can either be put in the MBR, or it can be put in any partition's boot sector. Of course, you could put it in both locations if you wanted to, for example in the MBR to decide whether to boot Windows, DOS or Linux. If Linux is booted, its boot sector could contain LILO's primary loader too, which would for example enable you to choose between different versions/configurations of the kernel.
The tandem “Linux and Windows” is frequently used to ease the migration of services to the Linux platform or to enable both Linux and Windows to run on the same computer. To dual boot Linux and Windows 95/98, you can install LILO on the master boot record. Windows NT and Windows 2000 require their own loader in the MBR. In these cases, you can install LILO in the Linux partition as a secondary boot loader. The initial boot will be done by the Windows loader in the MBR, which then can transfer control to LILO.
/sbin/lilo can create the boot program in the MBR or in the
first sectors of a partition. The boot program, sometimes referred to as
the first stage loader, will try to load the
second stage boot loader. The second stage boot loader is
contained in a file on the boot partition of your Linux system; by default it
is in the file /boot/boot.b.
If you use /sbin/lilo to write the bootprogram it will
try to make a backup copy of the old contents of the bootsector and will
write the old contents in a file named /boot/boot.####.
The hash symbols are actually replaced by the major and minor numbers of the
device where the original bootsector used to be, for example, the backup
copy of the MBR on the first IDE disk would be stored as
/boot/boot.0300: 3 is the major number for the device
file /dev/hda, and 0 is the minor number for it.
/sbin/lilo will not overwrite an already existing
backup file.
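The naming of the backup file can be illustrated with a short Python sketch. It assumes that the major and minor numbers are each written as two hexadecimal digits, which matches the example /boot/boot.0300 but is not spelled out in the text. The function names are hypothetical; os.stat, os.major and os.minor are standard library calls available on Unix-like systems.

```python
import os


def lilo_backup_name(major, minor):
    """Build the backup file name from a device's major and minor numbers."""
    # Assumption: two hexadecimal digits each, as in /boot/boot.0300.
    return "/boot/boot.%02X%02X" % (major, minor)


def lilo_backup_name_for(device_path):
    """Look up the numbers of an existing device file and build the name."""
    rdev = os.stat(device_path).st_rdev
    return lilo_backup_name(os.major(rdev), os.minor(rdev))


print(lilo_backup_name(3, 0))
```

Checking whether a file with that name already exists tells you if /sbin/lilo saved the original boot sector of that device at some earlier point.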
When LILO loads itself, it displays the word
LILO
Each letter is printed before or after performing some specific action. If LILO fails at some point, the letters printed so far can be used to identify the problem.
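The entries that follow can be condensed into a small lookup, sketched here in Python. The diagnosis texts paraphrase this section; the function name, the assumption that the screen text is passed in as a string with error codes separated by spaces, and the wording used when all four letters appear are hypothetical.

```python
LILO_DIAGNOSES = {
    "": "No part of LILO has been loaded: LILO is not installed or the "
        "partition holding its boot sector is not active.",
    "L": "The first stage boot loader started but cannot load the second "
         "stage: media failure or geometry mismatch.",
    "LI": "The second stage boot loader was loaded but could not be executed: "
          "geometry mismatch or /boot/boot.b was moved.",
    "LIL": "The second stage started but cannot load the descriptor table "
           "from the map file: media failure or geometry mismatch.",
    "LIL?": "The second stage was loaded at an incorrect address: subtle "
            "geometry mismatch or /boot/boot.b was moved.",
    "LIL-": "The descriptor table is corrupt: geometry mismatch or "
            "/boot/map was moved.",
}


def diagnose_lilo(screen_text):
    """Map the letters LILO printed before stopping to a likely cause."""
    tokens = screen_text.split()
    prefix = tokens[0] if tokens else ""
    codes = tokens[1:]
    if prefix == "LILO":
        # Assumption: all four letters mean no loader stage failed.
        return "All four letters were printed; the prompt shows no failure."
    message = LILO_DIAGNOSES.get(prefix)
    if message is None:
        return "Unrecognized prompt: %r" % prefix
    if prefix == "L" and codes:
        message += " Error codes seen: " + ", ".join(sorted(set(codes)))
    return message


print(diagnose_lilo("L 01 01 01"))
```

The two-digit error codes themselves are not interpreted here; their meaning is listed in LILO's user documentation.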
(nothing)
No part of LILO has been loaded. Either LILO isn't installed or the partition on which its boot sector is located isn't active.
L error
The first stage boot loader has been loaded and started, but it can't load the second stage boot loader. The two-digit error codes indicate the type of problem. This condition usually indicates a media failure or a geometry mismatch. The most frequent causes for a geometry mismatch are not physical defects or invalid partition tables but errors during the installation of LILO. Often these are caused by ignoring the 1024 cylinder boundary.
If the error code signals a transient problem, LILO will try to resume or halt the system. However, sometimes the error is not transient and LILO will repeat the code over and over again. This means that you end up with a scrolling screen that contains just the error codes. For example, the error code “01” signifies an “illegal command”: the disk type is not supported by your BIOS or the geometry cannot be determined correctly. Other error codes are described in full in LILO's user documentation.
LI
The first stage boot loader was able to load the second stage boot loader, but
has failed to execute it. This can either be caused by a geometry mismatch or
by moving /boot/boot.b without running the map installer.
LIL
The second stage boot loader has been started, but it can't load the descriptor table from the map file. This is typically caused by a media failure or by a geometry mismatch.
LIL?
The second stage boot loader has been loaded at an incorrect address. This is
typically caused by a subtle geometry mismatch or by moving
/boot/boot.b without running the map installer.
LIL-
The descriptor table is corrupt. This can either be caused by a geometry
mismatch or by moving /boot/map without running the map
installer.
LILO