Project 1: The C Build Process

Difficulty Level: ★☆☆☆☆

Due date: April 9 23:59

Learning Goals
Getting Help
Introduction
Using the Terminal
Getting Started
1. Creating a pa1 directory
2. Hello, World!
Building the Program
Submission Checklist
Submission Instructions
Afterword

Learning Goals

In this assignment, we will:

Get more practice using the terminal on ieng6,
Write a simple hello world program in C,
Understand the basic build procedure of a C program, and
Learn to use the scp command to copy files from ssh servers.

Getting Help

You are always welcome to come to either instructor or TA office hours, both of which are listed on the course website. In office hours, however, conceptual questions will be prioritized.

If you need help while working on this PA, make sure to attend tutor hours. The tutor hours policy can be found in the syllabus.

You can also post questions on Edstem.

Introduction

If you have only ever programmed in an IDE (such as Eclipse, IntelliJ, or VSCode), the build process may have been as simple as the click of a button, concealing the rather more complicated process underneath.

In this assignment, we will explore the different steps involved in building a famous C program, from writing the source code all the way to generating the final executable.

Written by Brian W. Kernighan and Dennis M. Ritchie in the bygone era of the 1970s, the name of our chosen program is no doubt familiar to us all: Hello World. You may know these two authors from the highly acclaimed book The C Programming Language, known more popularly as the “K&R” book, or simply K&R. A good book to own.

Using the Terminal

It is our hope that, after taking this course, all our students will become reasonably comfortable with using a computer from the terminal. This means editing files, navigating directories (folders), and running programs, all with the keyboard. Achieving this goal requires lots of practice with various terminal commands and tools, some of which you should already have been introduced to in lab, and which you will put into practice in this assignment.

Specifically, you should try your best to become familiar with the Linux environment.

Outdated as it may seem at first glance, the text-based terminal is still the most efficient way to interact with a computer once you become good at it.

Mastery of the terminal is crucial for your success in this course, as well as, and perhaps even more importantly, for your future career.

Before you start this PA, it is important that you have completed Lab 1, and have become reasonably comfortable with the material from Lab 1.

Getting Started

To get started, connect to the ieng6 server using ssh. If you don’t remember how, check out the instructions from Lab 1.

Having logged into ieng6, navigate to your CSE 29 course directory using the cd command. You should have created this in Lab 1.

To verify that you are in the right directory, you can use the pwd command to print out your current working directory.

A directory is really just what we computer science people call a folder. If ever you find yourself wandering through the history of operating systems, you will learn how this name came to be. Or, just ask ChatGPT.

To see what is in the current directory, use the ls command to list the contents. If this is the first time you are doing this, then your course directory may be empty, and ls will not print anything.

Creating a pa1 directory

We would like to keep all our files organized throughout the quarter, so let’s create a directory to hold our files for PA 1, and go into it.

$ mkdir pa1
$ cd pa1

The dollar sign ($) represents the prompt. You should type everything after the prompt into your terminal. The first command, mkdir pa1, creates the directory called pa1; the second, cd pa1, goes into the directory we created.

Hello, World!

We will use the vim editor to create and write our first program. You should have been introduced to vim in the first lab, but if you want to remind yourself how to use it, run the vimtutor command to go through the tutorial again.

The source code for our first program will be in the hello.c file. To create this file, we can simply open it using vim:

$ vim hello.c

Enter insert mode and write down your first C program of CSE 29:

#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
    return 0;
}

Building the Program

Having written the code, we now go through the steps to obtain the final executable program.

Preprocessing

The first of many steps in the build process of a C program is preprocessing. In this step, we use gcc to preprocess our source file and store the resulting intermediate file as hello.i. The command we use to build the intermediate file is:

$ gcc -E -o hello.i hello.c -Wall

This long command might seem a bit impenetrable, or even a bit like black magic, so let’s try to demystify it piece by piece. But first, you should refer to the built-in manual page for gcc: simply type man gcc in the terminal. Read the DESCRIPTION section! The first paragraph is highly relevant to this assignment: “When you invoke GCC, …”

The man reader can be difficult to navigate since it’s not really mouse-based, but your vim training should come in handy here – the navigation keys are the same: j to go down, k to go up. To quit the manual, simply press q.

Yes, there will be a lot of documentation reading in this course, and more likely than not in your future career as well. Best to sharpen up your reading skills now!

The first part of the command, gcc, is just the name of the program we want to execute, i.e., the C compiler. All the other parts that come after are command-line arguments/options, which specify some settings we want to apply to gcc, or information that we want to pass to gcc.
Try to understand the meaning of the first command line option -E. Use the manual page man gcc in the terminal. (To quit the manual reader, simply press q.)
The -Wall option tells gcc to enable a broad set of commonly useful warning messages, so that you are aware of potential issues in your program. (The name suggests “warn all”, although a few extra warnings still have to be enabled separately.)
The -o hello.i option (yes, these two keywords are considered one option) specifies the output file of this step: hello.i.
hello.c is the input file, i.e. our source code.

Having executed the command, you should now have the intermediate file hello.i in your pa1/ directory. You can use the ls command to double check.

Open the file hello.i in the vim editor and see how much extra code has been added to our source code. All this extra code comes from #include <stdio.h> in our code. The preprocessor has now copied the contents of the stdio.h header file, which declares a wide range of I/O-related functions, into our little hello world source code.

Try and find the declaration of the function printf that we use to print out the message. The declaration should look something like:

extern int printf (__const char *__restrict __format, ...);

Don’t worry about the precise meaning of this line, but understand that by including stdio.h in our source code, we give the compiler the declaration of the printf function that hello.c calls. The declaration tells the compiler how printf may be called; the definition, i.e. the actual code of printf, lives in the GNU C Library (glibc) and is brought in later, during the linking phase.

You will better understand the difference between a function declaration and a function definition as we progress through the quarter, but for now, this post explains it pretty well.

Compilation Phase

In this next step, we will turn the intermediate file from the previous step into assembly code. This step is known as compilation proper in the gcc manual. While often we call the entire build process “compiling”, this step is the actual compilation in the narrower sense.

The output of the compilation phase is a file named hello.s.

Now, it’s up to you to figure out what the command should be for this step! It should look something like this:

$ gcc <option> <input file> -Wall

In the previous step, <option> was set to -E, and the input file was hello.c. What should they be here?

Hint: Use the man page! (man gcc)

Once you have figured out the correct command and executed it, open the generated hello.s file in vim. Your file should contain:

The string "Hello, World!", and
The label main:, which should look something like the following:

main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$.LC0, %edi
	call	puts
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

Once again, don’t worry about understanding the contents of this file now. This is assembly language. When you move on to CSE 30, you will know what this is all about!

Assembling Phase

Now that we have the assembly code, we can go ahead with the assembling phase. We will once again ask gcc to stop short of the subsequent linking phase.

The output file for this step will be the object file hello.o. The command can be either of the following:

# option 1
$ gcc -c hello.c -Wall
# or, option 2
$ gcc -c hello.s -Wall

Note that the input file to gcc can either be the C source file that we started with (hello.c), or the assembly file that we just obtained (hello.s).

Run either one of these commands and check to make sure you have the new hello.o file.

Now, try to open the new object file hello.o in vim. If everything went right, you’ll see a bunch of garbage. (I bet you didn’t think you’d see this sentence.)

This is because, unlike the other files we have generated so far, hello.o is no longer a text file. Instead, it contains raw binary data, a long sequence of 1s and 0s – machine code! Welcome to the machine.

To view the contents of the object file, we need to use a disassembler to convert the machine code back to human-readable assembly code. The disassembler program is called objdump, and it can be run like so:

$ objdump -d hello.o

And the output should look something like this:

hello.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   ...

(The full output is not shown here.)

Save the disassembled output to a file named object_file.dmp. To do this, we can use the redirection operator so that, instead of being shown on your screen, the output of a program is redirected to a file:

$ objdump -d hello.o > object_file.dmp

The > operator redirects the output; it is followed by the name of the file in which we wish to store the output.
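Redirection works with any command that prints to the screen, not just objdump. The short sketch below uses echo and a made-up file name, greeting.txt, to show the effect; the prompt ($) is omitted so the lines can be pasted directly.

```shell
# Send the output of echo into a file instead of the screen.
echo "Hello, World!" > greeting.txt

# Nothing was printed above; the text went into the file. Display it:
cat greeting.txt

# Using > on the same file again replaces its previous contents.
echo "Goodbye!" > greeting.txt
cat greeting.txt
```

Note the second half: > overwrites an existing file without asking, so double-check the file name before redirecting into it.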

Linking Phase

At last, we come to the final phase of the build process, where the object file is linked with the files in the standard C library to form our final executable file.

The command:

# option 1: compile from source file
$ gcc -o hello hello.c -Wall
# or, option 2: compile from object file
$ gcc -o hello hello.o -Wall

Note:

Just like before, we can either use the original source file or the object file from the previous step. GCC is intelligent enough to distinguish between the two to figure out what to do.
The -o <output> option is used to specify the name of the output file hello. This will be the name of our program.

If successful, you should now have the executable file hello in your directory. This is our program. You can run it and observe the output:

$ ./hello
Hello, World!

Hello! The world of C programming welcomes you back.

Thus began the journey of millions of C programmers in the past half-century, of which now you are also a part. (Cue epic symphonic music.)

Note the ./ part of the command: we cannot simply type hello to run the program, because the shell would then search for a program named hello in a fixed list of system directories, and the current directory is normally not on that list. So we have to tell the shell that we want to execute the hello program in the current directory, which is what ./ stands for.
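The list of places the shell searches is stored in an environment variable called PATH. The sketch below simply inspects it; the exact directories you see depend on the machine, and the prompt ($) is omitted so the lines can be pasted directly.

```shell
# PATH holds the directories the shell searches, separated by colons.
echo "$PATH"

# ls lives in one of those directories, so its bare name is enough.
# command -v reports where the shell found it.
command -v ls
```

Because your pa1 directory is normally not in that list, typing hello on its own typically fails with a “command not found” message, while ./hello names the file directly.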

Again, if you try to open the hello program in vim, all you get is garbage because, again, the file contains machine code. To inspect the actual contents of our program, we use objdump again.

Run the objdump command on the hello program, and save the disassembled output to a file named executable.dmp. This file should be much larger than the disassembled object file.

Submission Checklist

You should have the following files in your pa1/ directory after completing this part of the assignment:

hello.c: the source code for our Hello, World! program,
hello.i: the intermediate file from preprocessing,
hello.s: the assembly file from the compilation proper,
hello.o: the object file from assembling,
hello: the executable file,
object_file.dmp: the disassembled hello.o object file, and
executable.dmp: the disassembled hello executable file.

Submission Instructions

For this PA, we ask you to manually download your files from the ieng6 server for Gradescope submission.

Create a zip file

To do this, we first need to package all our files into a zip file.

The zip command creates zip files (duh). Your task is to create a zip file containing all the files listed in the checklist above. Feel free to use whatever resources you have available to you to discover how to use the zip command. (Google, Stack Overflow, ChatGPT, etc.)

It does not matter what you name this zip file, as long as it ends in .zip.

Downloading the zip file

To download the zip file we just created, we’ll need to know its path, which means “where it is” in the file system.

For instance, you should have been working in the pa1 directory, which is a part of the cse29 directory, which is in your home directory (~/). The path to your pa1 directory is therefore ~/cse29/pa1/. Combining that with the name of your zip file, e.g. pa1.zip, gives the path of the zip file: ~/cse29/pa1/pa1.zip.

This is dependent on how you have done the previous steps. If you created your directories differently, then you will have a different path.
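If you are unsure what your own path is, the shell can tell you. The two commands below are a small sketch for checking it while logged into ieng6 and sitting in your pa1 directory; the prompt ($) is omitted so the lines can be pasted directly.

```shell
# Print the full path of the directory you are currently in.
pwd

# ~ is shorthand for your home directory; echo shows what it expands to.
echo ~
```

If the output of pwd is the output of echo ~ followed by /cse29/pa1, then the example path above applies to you unchanged. Otherwise, use whatever pwd reports as the directory part of your zip file’s path.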

The SCP command

scp is an important part of the ssh toolchain. It is used for copying files. It is very similar to cp in syntax and functionality except you can specify a file on a remote machine as a copy source and/or destination.

Remember that our ssh connection consists of both ssh clients and ssh servers. In our case, your personal device – or more specifically, the ssh program on your personal device – acts as the ssh client. The ieng6 server acts as the ssh server.

The scp command is a client-side command, which means it should always be run from the client side. So, if you are currently logged into ieng6, you can either:

log out by using the exit command, or
open a second terminal on your device without connecting to ieng6. (If you are on Windows, this means opening up another WSL terminal.)

Having done this, we run the scp command from the client side:

$ scp source-file destination-file

In this case, source-file is the zip file on the ieng6 server, and destination-file is where we want it to be on the client side, i.e., your device.

To specify a remote file, we use a combination of the remote server name and the file path on that remote server, joined together by a colon (:).

For example, say my remote server is jerry@ieng6.ucsd.edu, and the zip file I created is at ~/cse29/pa1/pa1.zip. I wish to copy the zip file from this remote location into my local home directory: ~/. Then, from my local machine (not on ieng6!), I would run the following command:

$ scp jerry@ieng6.ucsd.edu:~/cse29/pa1/pa1.zip ~/

macOS & Linux

Now, if you are on macOS or Linux, you should be able to see the pa1.zip file in your home directory. (How to find the home directory? Google it!) Use this zip file to make a Gradescope submission.

Windows (WSL)

If you are on Windows and using WSL, then it is slightly trickier to access the file, since WSL is actually running a separate file system on top of your Windows file system. But the good news is you can still open the WSL file system in Windows.

Assuming you scped the zip file into your home directory (~/) just like our example above, then you can open the home directory using the Windows file explorer:

$ explorer.exe ~/

You can run regular Windows programs from the terminal too! How exciting.

Now that you can access your zip file, go ahead and make the Gradescope submission.

Gradescope

Go to the Gradescope submission page and submit your zip file.

Afterword

Even though we have now seen each and every in-between step for compiling a C program, the two files that you will most often see in this class are:

The C source file, and
The executable file.

As we progress through the quarter, you will also begin seeing .o files when a program is compiled from multiple source files.

And once you move on to CSE 30, you will begin seeing, and indeed writing, .s assembly files.

In reality, we usually compile small C programs like ours with a single command:

$ gcc -o hello -Wall hello.c

This command takes us directly from the source file to the final executable. All the steps in between still happen under the hood, but the intermediate files are not kept in your directory.