George Claghorn

Switching from the legacy 8259 PIC to the modern APIC

The Writing an OS in Rust series covers enabling periodic timer interrupts with the 8259 PIC. As it explains, the 8259 is superseded by the APIC in modern x86-64 processors:

The Intel 8259 is a programmable interrupt controller (PIC) introduced in 1976. It has long been replaced by the newer APIC, but its interface is still supported on current systems for backwards compatibility reasons. The 8259 PIC is significantly easier to set up than the APIC, so we will use it to introduce ourselves to interrupts before we switch to the APIC in a later post.

Once I got timer interrupts working with the 8259 per the tutorial, I read up on using the APIC—particularly Xv6’s LAPIC initialization code—and decided to make the switch right away.

Mapping the LAPIC register file into virtual memory

The local APIC is configured by writing to memory-mapped registers. By default, the LAPIC registers occupy a 4 KB region beginning at physical address 0xFEE00000, near the top of the 32-bit address space. Because paging is enabled (it’s required in long mode), we need to map physical addresses 0xFEE00000 through 0xFEE00FFF into virtual memory.

Georgix’s bootstrap code identity-maps the first gigabyte of physical memory using a three-level page table and 2 MB pages:

    # Add one entry to the top-level page table (the Page Map Level 4 Table)
    # pointing to the middle level (the Page Directory Pointer Table).
    movl $boot.page_directory_pointer_table, %eax
    orl $0b11, %eax
    movl %eax, boot.page_map_level_4_table

    # Add one entry to the PDPT pointing to the bottom level (the Page Directory Table).
    movl $boot.page_directory_table, %eax
    orl $0b11, %eax
    movl %eax, boot.page_directory_pointer_table

    # Populate the PDT with 512 entries, each pointing to a 2 MB physical page frame.
    movl $0, %ecx
1:  movl $0x200000, %eax
    mull %ecx
    orl $0b10000011, %eax
    movl %eax, boot.page_directory_table(,%ecx,8)
    inc %ecx
    cmpl $512, %ecx
    jne 1b

(Yes, the top level in a three-level page table is called the Page Map Level 4 Table. x86-64 supports four-level page tables with 4 KB pages. The literature refers to the top-level page table as the Page Map Level 4 Table even when three-level page tables are used. 🤷🏻‍♂️)
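For readers who find the assembly dense, here is a rough Rust sketch of the values the PDT loop above computes. The constant and function names are made up for illustration and are not part of Georgix; the flag meanings (bit 0 present, bit 1 writable, bit 7 page size) are the standard x86-64 page table entry bits.

```rust
/// Size of one large page frame: 2 MB.
const PAGE_SIZE_2MB: u64 = 0x20_0000;

// Flag bits in a page directory entry (illustrative names).
const PRESENT: u64 = 1 << 0;
const WRITABLE: u64 = 1 << 1;
// Page Size bit: the entry maps a 2 MB frame directly instead of
// pointing at a lower-level page table.
const HUGE_PAGE: u64 = 1 << 7;

/// Entry `index` of an identity-mapping Page Directory Table.
/// Equivalent to `index * 0x200000 | 0b10000011` in the assembly.
fn identity_pde(index: u64) -> u64 {
    (index * PAGE_SIZE_2MB) | PRESENT | WRITABLE | HUGE_PAGE
}

/// Builds all 512 entries, mirroring the assembly loop.
fn identity_page_directory() -> [u64; 512] {
    let mut table = [0u64; 512];
    for (index, entry) in table.iter_mut().enumerate() {
        *entry = identity_pde(index as u64);
    }
    table
}
```

The 0b11 OR-ed into the PML4 and PDPT entries is the same present and writable pair, without the page size bit.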

We’ll identity-map the LAPIC register file like the first gigabyte. Virtual address 0xFEE00000 will map to physical address 0xFEE00000, virtual address 0xFEE00001 to physical address 0xFEE00001, and so on. Conveniently, 0xFEE00000 is aligned to the beginning of a 2 MB page—it’s evenly divisible by 2 MB, in other words—so we only need to map a single page.

Per the AMD64 system programming manual, section 5.3.4, with 2 MB pages:

  • Bits 39–47 of a virtual address index into the Page Map Level 4 Table.
  • Bits 30–38 index into the Page Directory Pointer Table.
  • Bits 21–29 index into the Page Directory Table.
  • Bits 0–20 are the byte offset into the physical page.

Virtual address 0xFEE00000 in binary is 0b11111110 11100000 00000000 00000000. Bits are numbered right to left, or least significant to most significant. The rightmost bit is bit 0 and the leftmost bit is bit 31. Virtual addresses in 64-bit mode are 64 bits long, so bits 32–63 are all zeroes.
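Picking the bit fields out by eye is error-prone, so it can help to let code do it. The following is a small sketch, assuming 2 MB pages; the Indices struct and decompose function are hypothetical helpers, not something from Georgix.

```rust
/// Table indices and page offset selected by a virtual address
/// when 2 MB pages are in use.
#[derive(Debug, PartialEq)]
struct Indices {
    pml4: usize,   // bits 39-47
    pdpt: usize,   // bits 30-38
    pdt: usize,    // bits 21-29
    offset: usize, // bits 0-20
}

fn decompose(virtual_address: u64) -> Indices {
    Indices {
        // Each index is 9 bits wide, hence the 0x1FF mask.
        pml4: ((virtual_address >> 39) & 0x1FF) as usize,
        pdpt: ((virtual_address >> 30) & 0x1FF) as usize,
        pdt: ((virtual_address >> 21) & 0x1FF) as usize,
        // The low 21 bits address a byte within the 2 MB frame.
        offset: (virtual_address & 0x1F_FFFF) as usize,
    }
}
```

Calling decompose(0xFEE00000) gives the same indices we work out by hand in the next few paragraphs.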

Bits 39–47, the PML4T index, are 0, indicating the first PML4 entry (PML4E). The first PML4E and the corresponding PDPT are already set up.

Bits 30–38, the PDPT index, are 0b000000011, or 3 in decimal, indicating the fourth PDP entry (PDPE). This PDPE doesn’t exist, so we must add it and point it at a new Page Directory Table. We reserve space for a second 4 KB PDT:

.bss
.align 4096

# ...

boot.page_directory_table:
    # Was `.skip 4096`:
    .skip 2 * 4096

Each page table entry is a quadword, or 8 bytes, so we populate the entry at byte offset 24 into the PDPT (index 3 × 8 bytes) with the address of the new PDT:

movl $(boot.page_directory_table + 0x1000), %eax
orl $0b11, %eax
movl %eax, boot.page_directory_pointer_table + 24

Bits 21–29 of the virtual address are 0b111110111, or 503 in decimal, indicating the 504th entry of the new PDT. Because we’re identity-mapping, we point this entry at the 2 MB page frame beginning at physical address 0xFEE00000. Remember that the new page table begins at offset 4,096 from boot.page_directory_table, and page table entries are 8 bytes, so we populate the PDE at offset 4,096 + (index 503 × 8 bytes) = 8,120 = 0x1FB8.

movl $0xFEE00000, %eax
orl $0b10000011, %eax
movl %eax, boot.page_directory_table + 0x1FB8
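The byte offsets used above (24 and 0x1FB8) are easy to get wrong, so here is a quick sanity check of the arithmetic. The helper names are illustrative only, and the second function assumes the PDTs are laid out back to back starting at boot.page_directory_table, as in our .bss section.

```rust
const ENTRY_SIZE: usize = 8;    // each page table entry is a quadword
const TABLE_SIZE: usize = 4096; // 512 entries of 8 bytes each

/// Byte offset of entry `index` from the start of a single table.
fn entry_offset(index: usize) -> usize {
    index * ENTRY_SIZE
}

/// Byte offset of entry `index` in the `table`-th PDT (0-based),
/// measured from the start of the first PDT.
fn pdt_entry_offset(table: usize, index: usize) -> usize {
    table * TABLE_SIZE + entry_offset(index)
}
```

Here entry_offset(3) is the PDPT offset and pdt_entry_offset(1, 503) is the offset of the new PDE in the second PDT.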

With that, we’ve identity-mapped the 2 MB beginning at physical address 0xFEE00000 into virtual memory. The QEMU monitor confirms this:

$ cargo run
(qemu) info mem
0000000000000000-0000000040000000 0000000040000000 -rw
00000000fee00000-00000000ff000000 0000000000200000 -rw

We can now access the LAPIC register file in 64-bit code.

Representing the LAPIC in Rust

The LAPIC registers and their offsets are listed on the OSDev wiki. They’re each 4 bytes wide and aligned on a 16-byte boundary. For the purpose of setting up periodic timer interrupts, there are four registers we care about:

  • The End of Interrupt Register (0x0B0, write-only)
  • The Timer Vector Register (0x320)
  • The Timer Initial Count Register (0x380)
  • The Timer Divide Configuration Register (0x3E0)

We define a structure to represent the LAPIC register file in a new module:

// src/arch/x86_64/interrupts/apic.rs

#[repr(C)]
pub struct APIC {
    _1: [u32; 44],
    end_of_interrupt_register: volatile::WriteOnly<u32>,
    _2: [u32; 155],
    timer_vector_register: volatile::ReadWrite<u32>,
    _3: [u32; 23],
    timer_initial_count_register: volatile::ReadWrite<u32>,
    _4: [u32; 23],
    timer_divide_configuration_register: volatile::ReadWrite<u32>
}

The repr(C) attribute ensures the Rust compiler preserves the order of the struct fields. The unused fields (_1, _2, etc.) pad the struct so that the fields we do use begin at the correct offsets. For example, the End of Interrupt Register begins at offset 0x0B0, or 44 doublewords into the register file, so _1 is 44 doublewords wide.
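Counting padding doublewords by hand invites off-by-one mistakes. One way to guard against them is a compile-time check of the field offsets. The sketch below uses a stand-in struct (ApicLayout, a hypothetical name) with plain u32 fields in place of the volatile wrappers, on the assumption that the wrappers have the same size and alignment as the u32 they wrap. It relies on std::mem::offset_of!, which needs Rust 1.77 or later.

```rust
use std::mem::offset_of;

/// Stand-in for the APIC struct with the same field order and padding.
#[allow(dead_code)]
#[repr(C)]
struct ApicLayout {
    _1: [u32; 44],
    end_of_interrupt_register: u32,
    _2: [u32; 155],
    timer_vector_register: u32,
    _3: [u32; 23],
    timer_initial_count_register: u32,
    _4: [u32; 23],
    timer_divide_configuration_register: u32,
}

// These assertions are evaluated at compile time; a wrong padding
// length turns into a build error rather than a misbehaving kernel.
const _: () = assert!(offset_of!(ApicLayout, end_of_interrupt_register) == 0x0B0);
const _: () = assert!(offset_of!(ApicLayout, timer_vector_register) == 0x320);
const _: () = assert!(offset_of!(ApicLayout, timer_initial_count_register) == 0x380);
const _: () = assert!(offset_of!(ApicLayout, timer_divide_configuration_register) == 0x3E0);
```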

The wrapper types volatile::ReadWrite<u32> and volatile::WriteOnly<u32> come from the volatile crate. If we used plain old u32s, the Rust compiler might notice that we never read the values we write to the registers and optimize the writes away. The writes must happen even if we never read them back; they have side effects we rely on. We declare this third-party crate dependency in Cargo.toml:

[dependencies]
volatile = "0.2.6"
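To make the idea behind those wrapper types concrete, here is a minimal sketch of a volatile cell built on the standard library's read_volatile and write_volatile. VolatileCell is a made-up name, and this is not the volatile crate's actual implementation; it only shows the technique.

```rust
use std::ptr;

/// A value that is always accessed with volatile loads and stores,
/// so the compiler may not elide or merge the accesses.
#[repr(transparent)]
struct VolatileCell<T: Copy>(T);

impl<T: Copy> VolatileCell<T> {
    fn new(value: T) -> Self {
        VolatileCell(value)
    }

    fn read(&self) -> T {
        // SAFETY: `&self.0` is a valid, aligned pointer to a `T`.
        unsafe { ptr::read_volatile(&self.0) }
    }

    fn write(&mut self, value: T) {
        // SAFETY: `&mut self.0` is a valid, aligned, exclusive pointer.
        unsafe { ptr::write_volatile(&mut self.0, value) }
    }
}
```

Note that write takes &mut self in this sketch, so any method that writes a register through such a cell needs a mutable borrow of the struct that contains it.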

Now we define a constructor function that returns a static, mutable borrow of the APIC at the default address:

// src/arch/x86_64/interrupts/apic.rs

impl APIC {
  pub unsafe fn get() -> &'static mut APIC {
      &mut *(0xFEE00000 as *mut APIC)
  }
}

This function is unsafe because two separate callers will each get mutable references to the same data. It is the caller’s responsibility to ensure no other code is configuring the APIC.

Lastly, we’ll define a static variable in the arch::x86_64::interrupts module to hold a safely-locked APIC reference:

// src/arch/x86_64/interrupts/mod.rs

mod apic;
use apic::APIC;

use lazy_static::lazy_static;
use spin::Mutex;

lazy_static! {
    static ref LAPIC: Mutex<&'static mut APIC> = Mutex::new(unsafe { APIC::get() });
}

Whenever we need to access the LAPIC, we’ll take an exclusive lock on the LAPIC static variable instead of calling APIC::get(). This will ensure our code is safe from data races.

Initializing the LAPIC

We add an initialize method to APIC:

// src/arch/x86_64/interrupts/apic.rs

impl APIC {
  // pub unsafe fn get() -> APIC { … }

  pub fn initialize(&mut self) {
  }
}

The APIC timer is placed in periodic mode by setting bit 17 of the Timer Vector Register. The vector to be used for timer interrupts goes in bits 0–7 of the same register:

use super::Vector;

impl APIC {
  // pub unsafe fn get() -> APIC { … }

  pub fn initialize(&mut self) {
      self.timer_vector_register.write(0x20000 | Vector::Timer);
  }
}
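As a sketch of how that register value is put together, the helper below builds it from a vector number and a mode flag. The function and constant names are hypothetical, and the vector is taken as a plain u8 rather than Georgix's Vector type, whose definition isn't shown here.

```rust
/// Bit 17 of the Timer Vector Register selects periodic mode.
const TIMER_PERIODIC: u32 = 1 << 17; // 0x20000

/// Value to write to the Timer Vector Register.
/// Bits 0-7 carry the interrupt vector; bit 17 is the mode.
fn timer_vector_value(vector: u8, periodic: bool) -> u32 {
    let mode = if periodic { TIMER_PERIODIC } else { 0 };
    mode | u32::from(vector)
}
```

With vector 32 in periodic mode, this produces 0x00020020, which matches the LVTT value QEMU reports at the end of this post.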

The APIC timer starts from the value of the Timer Initial Count Register and decrements at a rate determined by the value of the Timer Divide Configuration Register. When the current count reaches zero, the APIC issues a timer interrupt. A real operating system would tune the values of the Timer Initial Count Register and the Timer Divide Configuration Register based on the CPU’s speed. We can hard-code them for now.

A value of 0b1011 in the Timer Divide Configuration Register indicates the timer’s current count should decrement on every cycle of the bus clock. We’ll use this divide configuration, but there are others—for example, a value of 0 indicates the timer’s count should decrement every two cycles, and 0b1010 indicates every 128 cycles.

self.timer_divide_configuration_register.write(0b1011);
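The encoding isn't a simple power-of-two exponent, because bit 2 of the register is reserved and divide-by-1 comes last. A lookup like the following sketch captures the full set as I understand it from the Intel manual; the function name is made up for illustration.

```rust
/// Maps a divisor to its Timer Divide Configuration Register value.
/// Returns None for divisors the APIC timer doesn't support.
fn divide_configuration(divisor: u32) -> Option<u32> {
    match divisor {
        2 => Some(0b0000),
        4 => Some(0b0001),
        8 => Some(0b0010),
        16 => Some(0b0011),
        32 => Some(0b1000),
        64 => Some(0b1001),
        128 => Some(0b1010),
        1 => Some(0b1011),
        _ => None,
    }
}
```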

Next, we’ll set the Timer Initial Count Register to 10,000,000. With our divide configuration, this means a timer interrupt will be issued every 10,000,000 bus clock cycles.

self.timer_initial_count_register.write(10000000);

Interrupts from the APIC are acknowledged by writing 0 to the End of Interrupt Register. We acknowledge any outstanding interrupts:

self.end_of_interrupt_register.write(0);

All put together, the initialize method looks like this:

pub fn initialize(&mut self) {
    self.timer_vector_register.write(0x20000 | Vector::Timer);
    self.timer_divide_configuration_register.write(0b1011);
    self.timer_initial_count_register.write(10000000);

    self.end_of_interrupt_register.write(0);
}

Finally, we call LAPIC.lock().initialize() from arch::x86_64::interrupts::initialize(). This function is called from arch::x86_64::initialize(), which is in turn called from main(), the Rust kernel entrypoint.

// src/arch/x86_64/interrupts/mod.rs

// ...

pub(super) fn initialize() {
    INTERRUPT_DESCRIPTOR_TABLE.load();
    LAPIC.lock().initialize();
}

Handling APIC timer interrupts

We configured the APIC timer to use the same interrupt vector that the 8259 timer used (Vector::Timer). We already handle interrupts with that vector. See INTERRUPT_DESCRIPTOR_TABLE in arch::x86_64::interrupts:

// src/arch/x86_64/interrupts/mod.rs

lazy_static! {
    static ref INTERRUPT_DESCRIPTOR_TABLE: InterruptDescriptorTable = {
        let mut table = InterruptDescriptorTable::new();

        // ...

        table[Vector::Timer].handle_with(self::handlers::timer);

        table
    };
}

self::handlers::timer prints a dot and acknowledges the interrupt so the APIC can deliver the next one:

// src/arch/x86_64/interrupts/handlers.rs

use crate::{println, print};
use super::idt::{InterruptStackFrame, PageFaultErrorCode};
use super::complete;

pub extern "x86-interrupt" fn timer(_stack_frame: &InterruptStackFrame) {
    print!(".");
    complete();
}

There’s one thing we need to change here. crate::arch::x86_64::interrupts::complete needs to acknowledge the current interrupt via the APIC instead of the 8259. We add a complete method to APIC that writes 0 to the End of Interrupt Register:

// src/arch/x86_64/interrupts/apic.rs

impl APIC {
    // ...

    pub fn complete(&mut self) {
        self.end_of_interrupt_register.write(0);
    }
}

Then we update crate::arch::x86_64::interrupts::complete to call it:

// src/arch/x86_64/interrupts/mod.rs

fn complete() {
    LAPIC.lock().complete();
}

At this point, the APIC is delivering periodic timer interrupts and we’re correctly handling them, but if we run Georgix, it will crash immediately. This is because the 8259 is also delivering periodic timer interrupts.

Disabling the 8259 PICs

The chained 8259 PICs are enabled at boot. They’re disabled by writing 0xFF to their respective data ports.

The tutorial had us use the third-party pic8259_simple crate to configure the 8259s. This crate doesn’t support disabling the 8259s, and even if it did, it doesn’t make much sense to rely on a third-party driver just for that.

Instead, we define a minimal pair of structures in a new module, one to represent a single 8259 PIC and another to represent the chained pair, each with an appropriate disable method. We use the in-tree Port API to interact with the PICs’ I/O ports:

// src/arch/x86_64/interrupts/pic.rs

use crate::arch::x86_64::io::Port;

pub struct ChainedPIC {
    parent: PIC,
    child: PIC
}

impl ChainedPIC {
    pub fn new(parent: PIC, child: PIC) -> ChainedPIC {
        ChainedPIC { parent, child }
    }

    pub fn disable(&self) {
        self.parent.disable();
        self.child.disable();
    }
}

#[allow(dead_code)]
pub struct PIC {
    command_port: Port,
    data_port: Port
}

impl PIC {
    pub fn new(command_port: u16, data_port: u16) -> PIC {
        PIC {
            command_port: Port::new(command_port),
            data_port: Port::new(data_port)
        }
    }

    pub fn disable(&self) {
        unsafe { self.data_port.write(0xFFu8); }
    }
}

In arch::x86_64::interrupts, we use our new API to disable the 8259s. Remember that the parent PIC uses command port 0x20 and data port 0x21, and the child PIC uses 0xA0 and 0xA1:

// src/arch/x86_64/interrupts/mod.rs

mod pic;
use pic::{ChainedPIC, PIC};

lazy_static! {
    // ...

    static ref PICS: Mutex<ChainedPIC> = Mutex::new(
        ChainedPIC::new(
            PIC::new(0x20, 0x21),
            PIC::new(0xA0, 0xA1)
        )
    );

    // ...
}

pub(super) fn initialize() {
    // ...
    PICS.lock().disable();
    LAPIC.lock().initialize();
}

Finally, as the pic8259_simple dependency is no longer used, we can remove it from Cargo.toml.

Conclusion

With that, we’ve successfully replaced the 8259 PIC with the APIC. We can compile and run Georgix with cargo run and see timer interrupts firing via the APIC:

Screenshot of Georgix running in QEMU on macOS. A series of dots indicates the timer interrupt has fired.

The QEMU monitor confirms that we’ve configured the APIC timer correctly:

$ cargo run
(qemu) info lapic
dumping local APIC state for CPU 0 

...
LVTT	 0x00020020 active-hi edge                 periodic     Fixed  (vec 32)
Timer	 DCR=0xb (divide by 1) initial_count = 10000000
...

This was a fun detour.

For more information on the x86-64 APIC, see the Intel 64 software developer’s manual, volume 3A, chapter 10, and the AMD64 architecture programmer’s manual, volume 2, chapter 16.