Project 5: Systems Programming
Reading time: 18 minutes
Difficulty Level: ★★★★★
Checkpoint Due May 17 23:59, PA Due May 24 23:59
The checkpoint may look simple, but it takes a lot of work to get it working. So please start this assignment early, and utilize tutor hours!
Slip days: You can use 3 slip days for both the checkpoint and the final submission.
Updates and Revisions
- [May 21 22:51] Updated final submission instructions.
- [May 15 15:24] Updated checkpoint submission instruction.
Table of contents
Learning Goals
Welcome to systems programming! This week we begin working on our own memory allocation system in C.
Specfically, we will practice the following concepts in C:
- bitwise operations,
- pointer arithmetic,
- memory management, and of course,
- using the terminal and vim.
Introduction
𝐀𝐥𝐥 𝐭𝐡𝐞 𝐦𝐞𝐦𝐨𝐫𝐲’𝐬 𝐚 𝐬𝐭𝐚𝐠𝐞,
𝐀𝐧𝐝 𝐚𝐥𝐥 𝐭𝐡𝐞 𝐚𝐫𝐫𝐚𝐲𝐬 𝐚𝐧𝐝 𝐬𝐭𝐫𝐮𝐜𝐭𝐬 𝐦𝐞𝐫𝐞𝐥𝐲 𝐩𝐨𝐢𝐧𝐭𝐞𝐫𝐬;
𝐓𝐡𝐞𝐲 𝐡𝐚𝐯𝐞 𝐭𝐡𝐞𝐢𝐫 𝐞𝐱𝐢𝐭𝐬 𝐚𝐧𝐝 𝐭𝐡𝐞𝐢𝐫 𝐞𝐧𝐭𝐫𝐚𝐧𝐜𝐞𝐬;
𝐀𝐧𝐝 𝐨𝐧𝐞 𝐛𝐥𝐨𝐜𝐤 𝐢𝐧 𝐢𝐭𝐬 𝐭𝐢𝐦𝐞 𝐩𝐥𝐚𝐲𝐬 𝐦𝐚𝐧𝐲 𝐩𝐚𝐫𝐭𝐬.
This is a monumental PA, and for some, a career-defining one. Generations of students of Computer Science have worked on this PA, from our own Prof. Aaron Schulman to your humble TA, and now to you. Decades may pass, but memory of this PA will remain with many of you, as it did with us.
This version of the allocator assignment was adapted from the version that I worked on when I learned C at UW-Madison years ago.
I tried to find a picture of myself in the old Madison CS building around that time. I could not find one, but the old memories made me deeply emotional for a minute.
How many unexpected turns life has taken! Seeing the way things were makes me think of the way many things would have, could have, or should have been. But onto the PA now:
The Final Product
The final product of our code for this PA will no longer be an executable, but a library. Your code will be compiled into a shared object file (.so
). We can then write other programs using our own library functions.
More specifically, you will be writing your own versions of the malloc
and free
functions, which, after PA 4, you should be very familiar with.
Getting Started
The starter code for this assignment is hosted on GitHub Classroom. Use the following link to accept the GitHub Classroom assignment:
Click here to accept this GitHub Classroom assignment. (Right click to open in new tab.)
Just like last time, clone the repository to your ieng6
server. (Do NOT clone the repo to your local machine.)
The Code Base
This is perhaps the largest and most complex code base we will ever deal with in CSE 29 this quarter. Let’s take a look at what we have–
The Library
vmlib.h
: This header file defines the public interface of our library. Other programs wishing to use our library will include this header file.vm.h
: This header file defines the internal data structures, constants, and helper functions for our memory management system. It is not meant to be exposed to users of the library.vminit.c
: This file implements functions that set up our “heap”.vmalloc.c
: This file implements our own allocation function calledvmalloc
.vmfree.c
: This file implements our own free function calledvmfree
.utils.c
: This file implements helper functions and debug functions.
Testing
vmtest.c
: This file is not a part of the library. It defines a main function and uses the library functions we created. We can test our library by compiling this file into its own program to run tests. You can write code in themain()
function here for testing purposes.tests/
: This directory contains small programs and other files which you should use for testing your code. We will explain this in more detail in a later section.
Compiling the Starter Code
To compile, run make
in the terminal. You should see the following items show up in your directory:
libvm.so
: This is a dynamically linked library, i.e., our own memory allocation system that can be linked to other programs (e.g.,vmtest
). The interface for this library is defined invmlib.h
.vmtest
: This executable is compiled fromvmtest.c
withlibvm.so
linked together. It uses our own memory management library to allocate/free memory.
The starter code in vmtest.c
is very simple: it calls vminit()
to initialize our simulated “heap”, and calls the vminfo()
function to print out the contents of the heap in a neatly readable format. Run the vmtest
executable to find out what the heap looks like rgith after it’s been initialize! (Hint: it’s just one giant free block.)
Reading the Starter Code
In previous PAs, you did not need to pay much attention to what we give you in the starter code. This time, however, you should begin by reading (and understanding!) some of the code that we have provided.
In lecture, you were taught using an example allocator with 4-byte headers and an 8-byte alignment requirement.
In this programming assignment, our header/footeres are 8 bytes in size, and all blocks are 16-byte aligned. This means the smallest unit of allocation is 16 instead of 8.
The
size_t
data type, which you will see frequently throughout this PA, is a 64-bit (8-byte) unsigned integer type.
Begin by reading through the vm.h
file, where we define the internal data structures for the heap. This is where you will find the all-important block headers and block footers. Focus on understanding how struct block_header
is used. You will see it in action later.
Next, open vminit.c
. This is a very big file containing functions that create and set up the simulated heap for this assignment. Find the init_heap()
function, and read through the entire thing to understand how it is setting up the heap. This is not an easy read! There is a lot of pointer arithmetic involved, you will need to do similar things for implementing vmalloc
and vmfree
, so make sure you have a solid understanding of that. (Why is it necessary to cast pointers to different types?)
Our allocator does not manipulate the actual heap in the address space. Instead, we create a large chunk of memory as our “simulated heap”, and allocations/frees are performed in that memory region.
Once you understand what the init_heap()
function is doing, open up utils.c
. Here we have implemented the function vminfo()
, which will be your ally throughout this PA. This function traverses through the heap blocks and prints out the metadata in each block header. You should find inspiration for how to write your own vmalloc
function here. Look at how it manipulates the pointer to jump between blocks!
Implementing vmalloc
void *vmalloc(size_t size);
The vmalloc()
function returns a void *
pointer. Unlike other pointers we have dealt with in the past, the void *
pointer is not associatd with any concrete data type. It is used as a generic pointer, and can be implicitly cast to any other type of pointer.
The
malloc()
function instdlib.h
also returns avoid *
pointer. Notice how we can assign the pointermalloc()
returns to any pointer type.int *p = malloc(sizeof(int) * 10); char *c = malloc(sizeof(char) * 64);
The size_t size
parameter specifies the number of bytes the caller wants. Again, this is the exact same as the usual malloc
we have been using.
If size
is not greater than 0, or if the allocator could not find a heap block that is large enough for the allocation, then vmalloc
should return NULL
to indicate no allocation was made.
Allocation Size Calculation
We need to do some calculations based on the requested size to get the actual block size that we need to look for in our heap.
- Add 8 bytes for the block header to the requested size;
- Round up the new size to the nearest multiple of 16.
The reason we do the rounding up is so that all allocated memory blocks are 16-byte aligned. There are complex architectural reasons behind this choice that we will not go into in this PA.
For example, if the user calls vmalloc(10)
, we first add 8 bytes for the header to get 18, and then round up to the nearest multiple of 16, which gets us 32. So to allocate 10 bytes, we need to look for a heap block that is at least 32 bytes in size.
Allocation Policy
We discussed many different allocation policies during class, and the one you will need to implement for this PA is the best-fit policy.
Once you have determined the size requirement of the desired heap block, you need to traverse the entire heap to find the best-fitting block. If there are multiple candidates of the same size, then you should use the first one you find.
Splitting Large Blocks
If the best-fitting block you find is larger than what we need, then we need to split that block into two. For example, if we are looking for a 16-byte block for vmalloc(4)
, and the best fitting candidate is a 64-byte block, then we will split it into a 16-byte block and a 48-byte block. The former is allocated and returned to the caller, the latter remains free.
Updating Block Header(s)
If a new block was created as a result of a split, you need to create a new header for it, and
- set the correct size.
- set the status bit to 0.
- set the previous bit to 1.
And for the block that was allocated, you need to
- update the size (if splitting happened).
- set the status bit to 1.
- go to the next block header (if it’s not the end mark) and set its previous bit to 1.
Returning the Address
After updating all the relevant metadata, vmalloc
should return a pointer to the start of the payload. Do not return the address of the block header!
Implementing vmfree
void vmfree(void *ptr)
The vmfree
function expects a 16-byte aligned pointer to an allocated payload (i.e., an address obtained by a previous call to vmalloc
). If the address ptr
is NULL
, then vmfree
does not perform any action and returns directly.
Once the address is verified, vmfree
goes to the block header to begin freeing the block. If the block header indicates that the block is already free, then vmfree
returns without performing any additional action. (This means that in our system, a double-free error is theoretically impossible.)
To free a block, the following actions must be taken (not necessarily in this order):
- unset the status bit to 0,
- create a block footer,
- unset the next block’s previous bit,
- coalesce with the next block if it is also free, and
- coalesce with the previous block if it is also free.
If you are coalescing two blocks, remember to update both the header and the footer!
You will also need to revisit your vmalloc
implementation to add code for creating footers.
Incremental Development Tips
- Run the starter code and understand the output. Always figure out how to run your program first! You can’t do any testing if you don’t know how to run your program.
- Understand the
init_heap()
function. - Understand the
vminfo()
function. - Begin writing the
vmalloc()
function: - write and test the size calculation code.
- write and test the best-fit policy. (A helper function would be great here.) See later section for testing images.
- write and test splitting free blocks.
- Test everything there is to test for
vmalloc()
. - Begin writing the
vmfree()
function: - Update
vmalloc()
implementation to include block footers. - write and test a basic
vmfree()
implementation that only frees one block. - write and test coalescing with the previous/next block.
- Test everything under the sun.
And lastly, as a general tip: Create lots of helper functions! – one for getting the block size, one for getting a pointer to the next block, one for setting the allocation bit, one for setting the previous bit…
Write code that is not only functional, but elegant.
Test Programs
There is no reference implementation for this assignment. Instead, we provide you with plenty of testing code to help you make sure your program is working.
In the repository, you will find the tests/
directory, where we have privided some small testing programs that test your library function.
By cd
ing in to the tests/
directory and running make
, you can compile all these test programs and run them yourself. These programs are what we will be using in the autograder as well.
For example, here’s the code for the very first test: alloc_basic.c
:
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include "vmlib.h"
#include "vm.h"
/*
* Simplest vmalloc test.
* Makes one allocation and confirms that it returns a valid pointer,
* and the allocation takes place at the beginning of the heap.
*/
int main()
{
vminit(1024);
void *ptr = vmalloc(8);
struct block_header *hdr = (struct block_header *)
((char *)ptr - sizeof(struct block_header));
// check that vmalloc succeeded.
assert(ptr != NULL);
// check pointer is aligned.
assert((uint64_t)ptr % 16 == 0);
// check position of malloc'd block.
assert((char *)ptr - (char *)heapstart == sizeof(struct block_header));
// check block header has the correct contents.
assert(hdr->size_status == 19);
vmdestroy();
return 0;
}
In this test, we simply make one vmalloc()
call, and we verify that the allocation returned a valid pointer at the correct location in the heap.
These tests use the assert()
statement to check certain test conditions. If your code passes the tests, then nothing will be printed. If one of the assert
statements fail, then you will see an error.
Test Images
When you are writing vmalloc
, you may wonder how you can test the allocation policy fully when you don’t have the ability to free blocks to create a more realistic allocation scenarios (i.e., a heap with many different allocated/free blocks to search through).
To help you with that, we have created some images in the starter code in the tests/img
directory. In your test programs, instead of calling vminit()
, you can call the vmload()
function instead to load one of these images.
Our alloc_basic2
program uses one such image. If you open tests/alloc_basic2.c
, you will see that it creates the simulated heap using the following function call:
vmload("img/many_free.img");
If you load this image and call vminfo()
, you can see exactly how this image is laid out:
vmload: heap created at 0x7c2abd1da000 (4096 bytes).
vmload: heap initialization done.
---------------------------------------
# stat offset size prev
---------------------------------------
0 BUSY 8 48 BUSY
1 BUSY 56 48 BUSY
2 FREE 104 48 BUSY
3 BUSY 152 32 FREE
4 FREE 184 32 BUSY
5 BUSY 216 48 FREE
6 FREE 264 128 BUSY
7 BUSY 392 112 FREE
8 BUSY 504 32 BUSY
9 FREE 536 112 BUSY
10 BUSY 648 352 FREE
11 BUSY 1000 304 BUSY
12 BUSY 1304 336 BUSY
13 FREE 1640 320 BUSY
14 BUSY 1960 288 FREE
15 BUSY 2248 448 BUSY
16 BUSY 2696 256 BUSY
17 BUSY 2952 96 BUSY
18 BUSY 3048 368 BUSY
19 FREE 3416 672 BUSY
END N/A 4088 N/A N/A
---------------------------------------
Total: 4080 bytes, Free: 6, Busy: 14, Total: 20
You can use this image to test allocating in a more realistic heap.
Three images exist in total:
last_free.img
: the last block is free,many_free.img
: many blocks are free,no_free.img
: no block is free (use this to test allocation failure).
Formatting Your Code
Once again, we expect your code to be formatted for good readability:
Running clang-format
In the directory with your C code, run the following command:
$ clang-format -i *.c *.h
This command runs the clang-format
program. The -i
flag tells it to modify the source files in place, which means changes will be made directly to the files. *.c *.h
tells the program to format all files with the extension .c
and .h
.
The hidden .clang-format
file defines the code style we use in this course.
The Checkpoint (Due May 17 23:59)
For the checkpoint, we expect you to finish the implementation for the basic features of vmalloc
. This means your alloctor should be able to do the following:
- Traverse the heap to find the best-fitting block, and
- Allocate that block by updating the block header and returning the correct address.
You do not need to have implemented block splitting or block footers, or anything related to vmfree
for this checkpoint. But we encourage you to start working on it as soon as possible.
Again, it is not easy to reach this checkpoint. You will need to have a very good understanding of the allocator and of the code base to be able to implement this.
Checkpoint Tests
There are no hidden tests for this checkpoint. The only two tests we will run are alloc_basic
and alloc_basic2
, both of which you can run yourself from the tests/
directory.
Final Submission
Submit your code to Gradescope. Make sure your code compiles and works with the aforementioned two test programs.