Project 1: The C Build Process
Difficulty Level: ★☆☆☆☆
Due date: April 9 23:59
Table of contents
- Learning Goals
- Getting Help
- Introduction
- Using the Terminal
- Getting Started
- Building the Program
- Submission Checklist
- Submission Instructions
- Afterword
Learning Goals
In this assignment, we will:
- Get more practice using the terminal on ieng6,
- Write a simple hello world program in C,
- Understand the basic build procedure of a C program, and
- Learn to use the scp command to copy files from ssh servers.
Getting Help
You are always welcome to come to either instructor or TA office hours, both of which are listed on the course website. In office hours, however, conceptual questions will be prioritized.
If you need help while working on this PA, make sure to attend tutor hours. The tutor hours policy can be found in the syllabus.
You can also post questions on Edstem.
Introduction
If you have only used an IDE before for programming (such as Eclipse, IntelliJ, VSCode, etc.), the build process may have been as simple as a click of a button, concealing the rather more complicated process underneath.
In this assignment, we will explore the different steps involved in building a famous C program, from writing the source code all the way to generating the final executable.
Written by Brian W. Kernighan and Dennis M. Ritchie in the bygone era of the 1970s, the name of our chosen program is no doubt familiar to us all: Hello World. You may know these two authors from the highly acclaimed book The C Programming Language, known more popularly as the “K&R” book, or simply K&R. A good book to own.
Using the Terminal
It is our hope that, after taking this course, all our students will become reasonably comfortable with using a computer from the terminal. This means editing files, navigating directories (folders), and running programs all with the keyboard. Achieving this goal would require lots of practice with various terminal commands and tools, some of which you should have been introduced to in lab already, and which you will put into practice for this assignment.
Specifically, you should try your best to become familiar with the Linux environment.
Outdated as it may seem at first glance, the text-based terminal is still the most efficient way to interact with a computer once you become good at it.
Mastery of the terminal is crucial for your success in this course, as well as, and perhaps even more importantly, for your future career.
Before you start this PA, it is important that you have completed Lab 1, and have become reasonably comfortable with the material from Lab 1.
Getting Started
To get started, connect to the ieng6 server using ssh. If you don’t remember how, check out the instructions from Lab 1.
Having logged into ieng6, navigate to your CSE 29 course directory using the cd command. You should have created this directory in Lab 1.
To verify that you are in the right directory, you can use the pwd command to print out your current working directory.
A directory is really just what we computer science people call a folder. If ever you find yourself wandering through the history of operating systems, you will learn how this name came to be. Or, just ask ChatGPT.
To see what is in the current directory, use the ls command to list the contents. If this is the first time you are doing this, then your course directory may be empty, and ls will not print anything.
Creating a pa1 directory
We would like to keep all our files organized throughout the quarter, so let’s create a directory to hold our files for PA 1, and go into it.
$ mkdir pa1
$ cd pa1
The dollar sign ($) represents the prompt. You should type everything after the prompt into your terminal. The first command, mkdir pa1, creates a directory called pa1; the second, cd pa1, goes into the directory we just created.
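To see these commands work together without touching your real course directory, here is a minimal sketch that runs them inside a throwaway directory. The use of mktemp and the name pa1_demo are assumptions made for illustration and are not part of the assignment; the $ prompt is left out so the lines can be pasted as-is.

```shell
# Assumption: a scratch directory from mktemp, so nothing real is modified.
cd "$(mktemp -d)"

mkdir pa1_demo   # create a new, empty directory (hypothetical name)
cd pa1_demo      # go into it
pwd              # prints a path ending in /pa1_demo
ls               # prints nothing, because the directory is still empty
```

Running pwd and ls right after cd is a quick habit for confirming where you are before creating any files.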
Hello, World!
We will use the vim editor to create and write our first program. You should have been introduced to vim in the first lab, but if you want to remind yourself how to use it, run the vimtutor command to go through the tutorial again.
The source code for our first program will be in the hello.c file. To create this file, we can simply open it using vim:
$ vim hello.c
Enter insert mode and write down your first C program of CSE 29:
#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
    return 0;
}
Building the Program
Having written the code, we now go through the steps to obtain the final executable program.
Preprocessing
The first of many steps in the build process of a C program is preprocessing. In this step, we use gcc to preprocess our source file and store the resulting intermediate file as hello.i. The command we use to build the intermediate file is:
$ gcc -E -o hello.i hello.c -Wall
This long command might seem a bit impenetrable, or even a bit like black magic, so let’s try to demystify it piece by piece. But first, you should refer to the built-in manual page for gcc: simply type man gcc in the terminal. Read the DESCRIPTION section! The first paragraph is highly relevant to this assignment: “When you invoke GCC, …”
The man reader can be difficult to navigate since it’s not really mouse-based, but your vim training should come in handy here, since the navigation keys are the same: j to go down, k to go up. To quit the manual, simply press q.
Yes, there will be a lot of documentation reading in this course, and more likely than not in your future career as well. Best to sharpen up your reading skills now!
- The first part of the command, gcc, is just the name of the program we want to execute, i.e., the C compiler. All the other parts that come after are command-line arguments/options, which specify some settings we want to apply to gcc, or information that we want to pass to gcc.
- Try to understand the meaning of the first command-line option, -E. Use the manual page: run man gcc in the terminal. (To quit the manual reader, simply press q.)
- The -Wall option tells gcc to enable a broad set of warning messages, so that you are aware of most potential issues in your program. (It is short for “Warn-All”, although a few less common warnings still need their own options.)
- The -o hello.i option (yes, these two keywords are considered one option) specifies the output file of this step: hello.i.
- hello.c is the input file, i.e. our source code.
Having executed the command, you should now have the intermediate file hello.i in your pa1/ directory. You can use the ls command to double check.
Open the file hello.i in the vim editor and see how much extra code has been added to our source code. All this extra code comes from #include <stdio.h> in our code. The preprocessor has now included the stdio.h header file, which contains a wide range of I/O-related functions, into our little hello world source code.
Try and find the declaration of the function printf that we use to print out the message. The declaration should look something like:
extern int printf (__const char *__restrict __format, ...);
Don’t worry about the precise meaning of this line, but understand that by including stdio.h in our source code, we tell the compiler that the printf function used in our program hello.c is declared in stdio.h. The declaration tells the compiler how printf may be called; the definition of printf, i.e. its actual code, lives in the GNU C Library (glibc) and is brought in later, during the linking phase.
You will better understand the differences between a function declaration vs. definition as we progress through the quarter, but for now, this post explains it pretty well.
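The difference can also be observed directly from the terminal. In this hedged sketch, the file decl_only.c and the function greet are hypothetical names invented for illustration: the file declares greet but never defines it. Compiling with -c (which stops before linking) succeeds because a declaration is all the compiler needs, while building an executable fails because there is no definition to link against.

```shell
# Assumption: work in a scratch directory so no real files are affected.
cd "$(mktemp -d)"

cat > decl_only.c <<'EOF'
void greet(void);   /* declaration: name and types only, no body */

int main(void)
{
    greet();
    return 0;
}
EOF

# Stopping before linking works: the declaration is enough to compile the call.
gcc -Wall -c decl_only.c && echo "compiled: decl_only.o created"

# Building an executable fails: no definition of greet exists anywhere.
if ! gcc -Wall -o decl_demo decl_only.c 2>/dev/null; then
    echo "link failed: greet is declared but never defined"
fi
```

Our hello.c does not run into this error for printf, because the C library supplies the definition when the program is linked.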
Compilation Phase
In this next step, we will turn the intermediate file from the previous step into assembly code. This step is known as compilation proper in the gcc manual. While often we call the entire build process “compiling”, this step is the actual compilation in the narrower sense.
The output of the compilation phase is a file named hello.s.
Now, it’s up to you to figure out what the command should be for this step! It should look something like this:
$ gcc <option> <input file> -Wall
In the previous step, <option> was set to -E, and the input file was hello.c. What should they be here?
Hint: Use the man page! (man gcc)
Once you have figured out the correct command and executed it, open the generated hello.s file in vim. Your file should contain:
- The string "Hello, World!", and
- The label main:, which should look something like the following:
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $.LC0, %edi
    call puts
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
Once again, don’t worry about understanding the contents of this file now. This is assembly language. When you move on to CSE 30, you will know what this is all about!
Assembling Phase
Now that we have the assembly code, we can go ahead with the assembling phase. We will once again ask gcc to stop short of the subsequent linking phase.
The output file for this step will be the object file hello.o. The command can be either of the following:
# option 1
$ gcc -c hello.c -Wall
# or, option 2
$ gcc -c hello.s -Wall
Note that the input file to gcc can either be the C source file that we started with (hello.c), or the assembly file that we just obtained (hello.s).
Run either one of these commands and check to make sure you have the new hello.o file.
Now, try to open the new object file hello.o in vim. If everything went right, you’ll see a bunch of garbage. (I bet you didn’t think you’d see this sentence.)
This is because, unlike the other files we have generated so far, hello.o is no longer a text file. Instead, it contains raw binary data, a long sequence of 1s and 0s: machine code! Welcome to the machine.
To view the contents of the object file, we need to use a disassembler to convert the machine code back to human-readable assembly code. The disassembler program is called objdump, and it can be run like so:
$ objdump -d hello.o
And the output should look something like this:
hello.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
...
(The full output is not shown here.)
Save the disassembled output to a file named object_file.dmp. To do this, we can use redirection so that, instead of being shown on your screen, the output of a program is written to a file:
$ objdump -d hello.o > object_file.dmp
The > indicates redirection and is followed by the name of the file in which we wish to store the output.
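Redirection works with any command that prints to the terminal, not just objdump. The following sketch uses echo and a hypothetical file name, greeting.txt, in a scratch directory to show the behavior. Note that > replaces the previous contents of the file each time it is used.

```shell
# Assumption: work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

echo "Hello, World!" > greeting.txt   # nothing is shown on screen
cat greeting.txt                      # prints: Hello, World!

echo "second line" > greeting.txt     # > overwrites the existing file
cat greeting.txt                      # prints: second line
```

This is why re-running the objdump command with the same file name is safe: the old dump is simply replaced by the new one.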
Linking Phase
At last, we come to the final phase of the build process, where the object file is linked with the files in the standard C library to form our final executable file.
The command:
# option 1: compile from source file
$ gcc -o hello hello.c -Wall
# or, option 2: compile from object file
$ gcc -o hello hello.o -Wall
Note:
- Just like before, we can either use the original source file or the object file from the previous step. GCC is intelligent enough to distinguish between the two and figure out what to do.
- The -o <output> option is used to specify the name of the output file, hello. This will be the name of our program.
If successful, you should now have the executable file hello in your directory. This is our program. You can run it and observe the output:
$ ./hello
Hello, World!
Hello! The world of C programming welcomes you back.
Thus began the journey of millions of C programmers in the past half-century, of which now you are also a part. (Cue epic symphonic music.)
Note the ./ part of the command: We cannot simply type hello to run the program, because the shell would then search for a program named hello only in the directories listed in its PATH variable (such as /usr/bin), and our pa1 directory is normally not among them. So we have to tell the terminal that we want to execute the hello program in the current directory, which is what ./ stands for.
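The same rule applies to any executable file, which the sketch below illustrates with a tiny shell script instead of a compiled program. The script name hello_pa1_demo is hypothetical, and the example assumes the current directory is not listed in PATH (the environment variable naming the directories the shell searches for commands), which is the usual default.

```shell
# Assumption: work in a scratch directory that is not part of PATH.
cd "$(mktemp -d)"

# Create a small executable file (a shell script standing in for a program).
printf '#!/bin/sh\necho "Hello from the current directory"\n' > hello_pa1_demo
chmod +x hello_pa1_demo

# With ./ the shell runs the file found in the current directory.
./hello_pa1_demo

# Without ./ the shell only searches the directories listed in PATH.
if ! command -v hello_pa1_demo > /dev/null 2>&1; then
    echo "hello_pa1_demo was not found in PATH"
fi
```

You can inspect the search list on your own machine with echo "$PATH"; the directories are separated by colons.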
Again, if you try to open the hello program in vim, all you get is garbage because, again, the file contains machine code. To inspect the actual contents of our program, we use objdump again.
Run the objdump command on the hello program, and save the disassembled output to a file named executable.dmp. This file should be much larger than the disassembled object file.
Submission Checklist
You should have the following files in your pa1/ directory after completing this part of the assignment:
- hello.c: the source code for our Hello, World! program,
- hello.i: the intermediate file from preprocessing,
- hello.s: the assembly file from the compilation proper,
- hello.o: the object file from assembling,
- hello: the executable file,
- object_file.dmp: the disassembled hello.o object file, and
- executable.dmp: the disassembled hello executable file.
Submission Instructions
For this PA, we ask you to manually download your files from the ieng6 server for Gradescope submission.
Create a zip file
To do this, we first need to package all our files into a zip file:
The zip command creates zip files (duh). Your task is to create a zip file containing all the files listed in the checklist above. Feel free to use whatever resources you have available to you to discover how to use the zip command. (Google, Stack Overflow, ChatGPT, etc.)
It does not matter what you name this zip file, as long as it ends in .zip.
Downloading the zip file
To download the zip file we just created, we’ll need to know its path, which means “where it is” in the file system.
For instance, you should have been working in the pa1 directory, which is a part of the cse29 directory, which is in your home directory (~/). The path to your pa1 directory is therefore ~/cse29/pa1/. Combine that with the name of your zip file, e.g. pa1.zip, and the path of the zip file is ~/cse29/pa1/pa1.zip.
This is dependent on how you have done the previous steps. If you created your directories differently, then you will have a different path.
The SCP command
scp is an important part of the ssh toolchain. It is used for copying files. It is very similar to cp in syntax and functionality, except that you can specify a file on a remote machine as the copy source and/or destination.
Remember that our ssh connection consists of both ssh clients and ssh servers. In our case, your personal device – or more specifically, the ssh program on your personal device – acts as the ssh client. The ieng6 server acts as the ssh server.
The scp command is a client-side command, which means it should always be run from the client side. So, if you are currently logged into ieng6, you can either:
- log out by using the exit command, or
- open a second terminal on your device without connecting to ieng6. (If you are on Windows, this means opening up another WSL terminal.)
Having done this, we run the scp command from the client side:
$ scp source-file destination-file
In this case, source-file is the zip file on the ieng6 server, and destination-file is where we want it to be on the client side, i.e., your device.
To specify a remote file, we use a combination of the remote server name and the file path on that remote server, joined together by a colon (:).
For example, say my remote server is jerry@ieng6.ucsd.edu, and the zip file I created is at ~/cse29/pa1/pa1.zip. I wish to copy the zip file from this remote location into my local home directory: ~/. Then, from my local machine (not on ieng6!), I would run the following command:
$ scp jerry@ieng6.ucsd.edu:~/cse29/pa1/pa1.zip ~/
macOS & Linux
Now, if you are on macOS or Linux, you should be able to see the pa1.zip file in your home directory. (How to find the home directory? Google it!) Use this zip file to make a Gradescope submission.
Windows (WSL)
If you are on Windows, and on WSL, then it is slightly more tricky to access the file, since WSL is actually running a separate file system on top of your Windows file system. But the good news is you can still open the WSL file system in Windows.
Assuming you copied the zip file with scp into your home directory (~/) just like our example above, you can open the home directory using the Windows file explorer:
$ explorer.exe ~/
You can run regular Windows programs from the terminal too! How exciting.
Now that you can access your zip file, go ahead and make the Gradescope submission.
Gradescope
Go to the Gradescope submission page and submit your zip file.
Afterword
Even though we have now seen each and every in-between step for compiling a C program, the two files that you will most often see in this class are:
- The C source file, and
- The executable file.
As we progress through the quarter, you will also begin seeing .o files when a program is compiled from multiple source files.
And once you move on to CSE 30, you will begin seeing, and indeed writing, .s assembly files.
In reality, we usually compile small C programs like ours with a single command:
$ gcc -o hello -Wall hello.c
This command takes us directly from the source file to the final executable. All the steps in between still happen under the hood, but no intermediate files are left behind in your directory.