Booting into Rust and deadlocking right away - a gnarly bug in my hobby kernel, one of many to come

Let me provide some background context about this hobby kernel adventure first:

Ever since working with the xv6 kernel for MIT’s 6.828 “Operating System Engineering” labs (which are publicly available, see my previous blog post), reading its source and extending some of its functionality, I have wanted to write my own hobby kernel from scratch. I never felt quite ready, so I kept looking at other hobby OS projects and reading many blog posts. Recently, I finally decided to just start working on my own; what could go wrong? Well, many things can go wrong in OS dev, more on that soon, but I encourage anyone interested in kernel development to just go and start writing their own without waiting “to be ready” for too long.

Since I am most familiar with xv6’s design, and since the RISC-V architecture is more approachable than x86, I opted to target qemu’s riscv64 virt machine, just like xv6. There are even a few xv6 ports to Rust already, which is my implementation language of choice, so I have enough source code to look at when I get stuck. However, I am trying to do that as little as possible, for it is far too easy to start copying things verbatim. To avoid my implementation being just an exact clone, I am also diverging from xv6 in some ways. For example, I make use of OpenSBI, a sort of hardware abstraction layer that provides a [S]upervisor [B]inary [I]nterface to the kernel, analogous to what an ABI, and in particular system calls, are to user space. This makes interacting with hardware easier: instead of bit-fiddling memory-mapped peripherals, one can just do function calls (e.g. debug printing via UART). See this great article by Uros Popovic for a proper introduction to the SBI standard.
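To illustrate what such an SBI “function call” looks like: the kernel puts an extension ID in a7, a function ID in a6, and arguments in a0/a1, then executes ecall to trap into OpenSBI. Here is a minimal sketch of that convention, not the exact wrapper my kernel uses; the console putchar extension (EID 0x01) shown below is the legacy way to print a character:

use core::arch::asm;

// Minimal SBI call wrapper following the SBI calling convention:
// extension ID in a7, function ID in a6, arguments in a0/a1.
// The firmware returns an error code in a0 and a value in a1.
unsafe fn sbi_call(eid: usize, fid: usize, mut a0: usize, mut a1: usize) -> (isize, usize) {
    asm!(
        "ecall",
        in("a7") eid,
        in("a6") fid,
        inlateout("a0") a0,
        inlateout("a1") a1,
    );
    (a0 as isize, a1)
}

// Debug-print a single character via the legacy console putchar
// extension (EID 0x01), long before a proper UART driver exists.
fn sbi_putchar(c: u8) {
    unsafe { sbi_call(0x01, 0, c as usize, 0) };
}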

Besides the SBI functionality itself, OpenSBI passes the kernel a pointer to a structure describing the machine’s hardware, handing it over in a register before calling into the kernel code. This structure is called a devicetree, and I use it to dynamically obtain the machine’s number of cores and amount of memory. In contrast, xv6 and many other small hobby/educational kernels hardcode these values at compile time, which simplifies the initialization code and the general design. It seemed like a cool first thing to improve upon compared to the kernels I had looked at. Well, this divergence from xv6’s design is also the root cause of the first gnarly bug, which left me bewildered for quite a while until I realized what was happening…

In order to parse the fdt (flattened devicetree) structure passed by OpenSBI, I pulled in my first dependency to do the job. Here is a simplified snippet of the relevant code in kmain, the first Rust code that gets executed after a little setup in assembly:

 1  pub static KMEM: SpinLock<PhysMem> = SpinLock::new(PhysMem::new());
 2
 3  #[no_mangle]
 4  pub extern "C" fn kmain(fdt_addr: usize) -> ! {
 5      // addresses of the linker-defined symbols are the actual values we need
 6      let heap_start = unsafe { ((&_HEAP_START) as *const usize) as usize };
 7
 8      // determine the number of cores and the end of memory through the devicetree
 9      let (num_cpu, mem_end) = {
10          let fdt = match unsafe { Fdt::from_ptr(fdt_addr as *const u8) } {
11              Ok(fdt) => fdt,
12              Err(err) => panic!("failed to parse fdt, err: {}", err),
13          };
14
15          let mem_region = fdt.memory().regions().next().unwrap();
16          let mem_end = mem_region.starting_address as usize + mem_region.size.unwrap();
17
18          (fdt.cpus().count(), mem_end)
19      };
20
21
22      let mut kmem = KMEM.lock();
23      kmem.init(heap_start, mem_end);
24
25      panic!("kmain done");
26  }
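For context, the SpinLock guarding KMEM is my own, hand-rolled implementation. A minimal sketch of its general shape, assuming the usual AtomicBool-based design (the real code differs in the details):

use core::cell::UnsafeCell;
use core::ops::{Deref, DerefMut};
use core::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: the lock guarantees exclusive access to the inner data.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    // The lock starts out unlocked; `new` is const so it can
    // initialize a global like KMEM.
    pub const fn new(data: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            data: UnsafeCell::new(data),
        }
    }

    // Spin until we atomically flip `locked` from false to true.
    // Note that if the flag's memory is ever corrupted to a nonzero
    // value, this loops forever, which is exactly the hang below.
    pub fn lock(&self) -> SpinLockGuard<'_, T> {
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            core::hint::spin_loop();
        }
        SpinLockGuard { lock: self }
    }
}

pub struct SpinLockGuard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> Deref for SpinLockGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &*self.lock.data.get() }
    }
}

impl<T> DerefMut for SpinLockGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.lock.data.get() }
    }
}

impl<T> Drop for SpinLockGuard<'_, T> {
    fn drop(&mut self) {
        // release the lock when the guard goes out of scope
        self.lock.locked.store(false, Ordering::Release);
    }
}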

After adding those few lines to parse the fdt and obtain the number of cores and the amount of memory, I reran the kernel and observed my kmain function getting stuck right at the point of locking the global KMEM (line 22), the physical page allocator (which needs to know the amount of memory in order to initialize the data structure that records all the free pages). A little more context: at this point in the initialization process, the kernel is only executing on the boot core (OpenSBI only “starts” one core, the others have to be activated by the kernel), and while the SpinLock implementation was incomplete, it did correctly initialize the lock to the unlocked state. So what could cause KMEM to be in a locked state before lock() was ever called?

Stepping through the code with gdb, I observed the value of KMEM changing to seemingly garbage right after executing Fdt::from_ptr (line 10). Apparently something caused memory corruption around the area where KMEM is stored. Initially my bet was on some incorrect use of unsafe code causing undefined behavior, but it turned out that wasn’t it. Maybe some readers can already guess the problem even without having seen the memory layout; for me, however, it took quite a while to finally get an idea of the underlying issue. Well, here is the memory layout (defined by this linker script):

...
--- 0x80200000
.text
--- 0x8020a398
.data
--- 0x8020a560 
.rodata
--- 0x8020dcd8 
.bss
(KMEM at 0x8020dce0)
--- 0x8020dd00
boot core stack, 4096 bytes (0x1000) in size
the stack grows towards lower addresses, i.e. upwards in this diagram
(before calling into Rust, sp=0x8020ed00)
--- 0x8020ed00

The problem: kmain’s stack overflows into the .bss section (where global variables, and thus KMEM, reside). Parsing the fdt consumed enough stack to push sp below 0x8020dd00, right onto KMEM at 0x8020dce0; with the lock flag clobbered to a nonzero value, lock() spins forever. My solution (for now): just increase the stack size. One technique I know of that guards against kernel stack overflow is the use of guard pages. A quote from the excellent xv6 book:

Each process has its own kernel stack, which is mapped high so that below it xv6 can leave an unmapped guard page. The guard page’s PTE is invalid (i.e., PTE_V is not set), so that if the kernel overflows a kernel stack, it will likely cause an exception and the kernel will panic. Without a guard page an overflowing stack would overwrite other kernel memory, resulting in incorrect operation. A panic crash is preferable.

But this requires virtual memory and paging to be enabled, which only comes later in the initialization process. Let me know if you know a better solution than just using a bigger stack space in this scenario!
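One cheap pre-paging mitigation I have been considering (a sketch of a well-known technique, not something my kernel does yet): write a magic value at the lowest address of the boot stack during early setup and check it at strategic points. An overflow is only detected after the fact, but at least the kernel panics with a clear message instead of silently corrupting .bss. A minimal sketch; _BOOT_STACK_BOTTOM is a hypothetical linker-defined symbol marking the lowest address of the boot core stack:

// Hypothetical linker-defined symbol marking the lowest address of the
// boot core stack (the end that grows towards .bss).
extern "C" {
    static _BOOT_STACK_BOTTOM: usize;
}

const STACK_CANARY: usize = 0xdead_beef_cafe_f00d;

// Write the magic value once during early boot, before the stack
// has a chance to grow anywhere near its bottom.
unsafe fn install_stack_canary() {
    let bottom = (&_BOOT_STACK_BOTTOM as *const usize) as *mut usize;
    bottom.write_volatile(STACK_CANARY);
}

// Call this at strategic points; a clobbered canary means the stack
// already overflowed, so panic loudly instead of letting the
// corruption spread further through .bss.
unsafe fn check_stack_canary() {
    let bottom = &_BOOT_STACK_BOTTOM as *const usize;
    if bottom.read_volatile() != STACK_CANARY {
        panic!("boot core stack overflow detected");
    }
}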