Lab 8: File I/O

Table of contents

  1. Getting Started
  2. File I/O Basics
    1. Opening a File
    2. Writing a File
    3. Reading a File
  3. File I/O Activity
  4. UNIX I/O
  5. Memory vs. Disk
  6. Optional Feedback Form

Getting Started

With your partner, discuss:

  • What is your favorite place that you’ve been to, and why is it the CSE basement?
  • If it’s a travel destination, who did you go there with, and when?

File I/O Basics

Accept the assignment on GitHub Classroom (https://classroom.github.com/a/XJJXG35A) to create a repo with the starter code for this lab. Clone this repo into your cse29 folder on ieng6.

In this lab, we’ll demonstrate how to write C programs to read and write data to files.

Opening a File

First, we must open the file to obtain a file pointer to the file. In write_file.c, we open the file with fopen():

FILE* fp = fopen("file.txt", "w");

This function takes in two arguments:

  1. The filename of the file to open.
  2. The file opening mode, which defines what operations are allowed on the opened file. Here, “w” allows only write operations. It also creates the file if it does not exist, and empties the file if it already exists.

This file pointer fp is then used in any following file operations. After we’re done operating on the file, we must close it with fclose():

fclose(fp);
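The two calls above can be combined into a small helper that also checks whether the file was opened successfully. This is only a sketch and is not taken from write_file.c; the function name touch_file is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: opens `path` in write mode, then closes it again.
 * Returns 0 on success and 1 if the file could not be opened or closed. */
int touch_file(const char *path) {
    FILE *fp = fopen(path, "w");  /* creates the file, or empties an existing one */
    if (fp == NULL) {
        perror("fopen");          /* report why the open failed */
        return 1;
    }

    /* ... file operations that use fp would go here ... */

    if (fclose(fp) != 0) {
        perror("fclose");
        return 1;
    }
    return 0;
}
```

Calling touch_file("file.txt") from main should leave an empty file.txt in the current directory.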

Writing a File

Also in write_file.c, we write to the file with fwrite():

fwrite(str, sizeof(char), strlen(str)+1, fp);

This function takes in four arguments:

  1. A pointer to the data to write.
  2. The size of each element to be written.
  3. The number of elements to be written. The size of each element is defined by the previous argument. Here we say that we want to write one more byte than the length of the string in order to include the null terminator.
  4. The file pointer to be written to.
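As a rough illustration of how these four arguments fit together, here is a sketch that writes a string and its null terminator to a file. The helper name write_string is hypothetical, and the return-value check is an addition that may not appear in write_file.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `str`, including its null terminator, to `path`.
 * Returns the number of bytes written, or 0 if the file could not be opened. */
size_t write_string(const char *path, const char *str) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        return 0;
    }

    size_t count = strlen(str) + 1;  /* +1 to include the '\0' */
    /* fwrite() returns how many elements were actually written. */
    size_t written = fwrite(str, sizeof(char), count, fp);
    if (written != count) {
        fprintf(stderr, "short write: %zu of %zu bytes\n", written, count);
    }

    fclose(fp);
    return written;
}
```

For example, write_string("file.txt", "Hi.") should return 4: three characters plus the null terminator.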

Try compiling and running write_file to see that it writes the string in the program code to file.txt.

$ make write_file
$ ./write_file

If you open file.txt in Vim, you might notice that there’s a weird thing at the end that wasn’t in the string originally. We can display the raw data of the file in hexadecimal with the xxd command:

$ xxd file.txt

This command output shows

  • in the middle eight columns, the contents of the file in hexadecimal. Each four-digit chunk represents two bytes of data (i.e. two characters).
  • on the left, the positions of each line’s data within the file, represented in hexadecimal.
  • on the right, the ASCII representation of the data in the middle columns.

At the end of the file, the ASCII representation shows two dots (.). They correspond to the bytes 0x2e (which is actually just the period character) and 0x00 (the null terminator). Any byte that is not a printable ASCII character is represented with a dot in the right column.
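To see the same bytes from inside a C program, we could print each byte of a file as two hexadecimal digits. This is a much simpler view than what xxd produces, and the helper name dump_hex is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: prints every byte of `path` as two hex digits.
 * Returns the number of bytes printed, or -1 if the file could not be opened. */
long dump_hex(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }

    long count = 0;
    int c;  /* int rather than char, so that EOF can be told apart from data */
    while ((c = fgetc(fp)) != EOF) {
        printf("%02x ", (unsigned)c);
        count++;
    }
    printf("\n");

    fclose(fp);
    return count;
}
```

For a file that contains the string "Hi." followed by a null terminator, this prints 48 69 2e 00 and returns 4.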

Reading a File

In read_file.c, we first create a large buffer. We use the buffer to store the data that will be read in from the file. Since we don’t necessarily know how large the data is, we try to make the buffer sufficiently large.

Again, we open the file, but this time we specify that the mode is “r”, meaning that we are only allowed to perform read operations on this file.

Then we use fread() to read from the file:

fread(buffer, sizeof(char), BUFFER_SIZE-1, fp);

This function takes in four arguments, which you might notice are very similar to those of fwrite():

  1. A pointer to some memory in which to store the data that is read.
  2. The size of each element to be read.
  3. The number of elements to be read. This and the previous argument specify the same thing as the corresponding arguments in fwrite().
  4. The file pointer to read from.

Here we read in one fewer byte than the size of the buffer. This leaves room for a null terminator, which we set right after the read so that the buffer can safely be printed as a string later.
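A single read followed by null termination might look roughly like the sketch below. It assumes a BUFFER_SIZE of 64 and a hypothetical helper named read_once; the actual code in read_file.c may be organized differently.

```c
#include <stdio.h>

#define BUFFER_SIZE 64  /* assumed value, for illustration only */

/* Hypothetical helper: reads at most BUFFER_SIZE-1 bytes from `path` into
 * `buffer`, which must have room for BUFFER_SIZE bytes, and null-terminates it.
 * Returns the number of bytes read, or -1 if the file could not be opened. */
long read_once(const char *path, char *buffer) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }

    /* fread() returns how many elements (here, bytes) were actually read. */
    size_t n = fread(buffer, sizeof(char), BUFFER_SIZE - 1, fp);
    buffer[n] = '\0';  /* n is at most BUFFER_SIZE-1, so this stays in bounds */

    fclose(fp);
    return (long)n;
}
```

Using the return value of fread() as the index of the terminator means the string ends exactly where the data ends, even when the file is shorter than the buffer.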

Try compiling and running read_file to see that it reads the file that was created by write_file, and prints this to the terminal.

$ make read_file
$ ./read_file

What if the data was a lot larger than our buffer? Try reducing the BUFFER_SIZE constant in read_file.c to some smaller number, like 16 bytes. If the buffer is smaller than the data, then only the data that fits into the buffer will be read. We can’t make the buffer infinitely large to accommodate any file size. Instead, we will make multiple read calls, filling the buffer with small portions of data from the file at a time.

Let’s try implementing this in read_file.c. This is best accomplished in a while loop:

  • The condition of the loop depends on the output of fread(). fread() returns the number of elements that have been read from the file. In this case, we’ve defined that each element is 1 byte, so fread() returns the number of characters read. We want to continue in this loop while the number of characters read is greater than zero.
  • The contents of the loop are the two lines that follow the read, without modifications.

Once this loop is implemented, read_file should be able to print out the entire file, even if the buffer is smaller than the total data size.
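One possible shape for this loop is sketched below. It assumes that the two lines following the read are the null termination and the print, which may not match read_file.c exactly. The helper name print_file and the byte total it returns are additions for illustration.

```c
#include <stdio.h>

#define BUFFER_SIZE 16  /* deliberately small, to force several reads */

/* Hypothetical helper: prints the contents of `path` in chunks of at most
 * BUFFER_SIZE-1 bytes. Returns the total number of bytes read, or -1 if the
 * file could not be opened. */
long print_file(const char *path) {
    char buffer[BUFFER_SIZE];
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }

    long total = 0;
    size_t n;
    /* Keep reading while fread() reports that at least one byte was read. */
    while ((n = fread(buffer, sizeof(char), BUFFER_SIZE - 1, fp)) > 0) {
        buffer[n] = '\0';      /* terminate only the part that was just read */
        printf("%s", buffer);
        total += (long)n;
    }

    fclose(fp);
    return total;
}
```

Each call to fread() continues from where the previous one stopped, so the loop walks through the file without any extra bookkeeping.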

File I/O Activity

In this activity, we’ll use file I/O to extend the functionality of PA4. The webster program already supports loading from a dictionary file into the program, which is achieved with file I/O. We will implement the opposite: save the current dictionary in the program to a dictionary file.

Your task is to implement save_dictionary(), which is defined at the top of sh.c. This function takes in the filename of the file to save to, and the dictionary to save. Remember that struct dictionary contains a linked list, list, which holds the contents of the dictionary. Some code to check that the filename is valid is already given. Step-by-step, the remainder of this function will:

  1. Open the file corresponding to filename using fopen(). We want to open this file for writing operations.
  2. Check if the file opening operation failed. fopen has failed if the file pointer it returns is NULL. If so, return 1.
  3. Iterate through the dictionary and write each word and its definition to a line in the file.
  4. Close the file.

Instead of using fwrite(), in which we have to specify how many bytes to write, we will instead use fputs(), which puts a string into a file, up to a null terminator. This function takes in two arguments: a null-terminated string to write, and a file pointer to write to. For each entry in the dictionary, we want to write, in order:

  1. The word.
  2. The separator between the word and the definition: “: ”.
  3. The definition.
  4. The newline character, to end the line.

A call to fputs() would look like this:

fputs(str, fp);

where str is the null-terminated string to write, and fp is the file pointer.
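The four writes for a single entry could be grouped as in the sketch below. The helper write_entry and its parameters are hypothetical and are not part of the starter code. How the word and definition are taken from each node of the linked list depends on the PA4 structs, so that part is left out.

```c
#include <stdio.h>

/* Hypothetical helper: writes one "word: definition" line to an open file.
 * Returns 0 on success, or 1 if any fputs() call fails. */
int write_entry(FILE *fp, const char *word, const char *definition) {
    /* fputs() returns EOF on failure and a non-negative value on success. */
    if (fputs(word, fp) == EOF) return 1;        /* 1. the word */
    if (fputs(": ", fp) == EOF) return 1;        /* 2. the separator */
    if (fputs(definition, fp) == EOF) return 1;  /* 3. the definition */
    if (fputs("\n", fp) == EOF) return 1;        /* 4. the newline */
    return 0;
}
```

Unlike fwrite() with strlen(str)+1, fputs() does not write the null terminator, so each entry ends cleanly at the newline.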

Once this is implemented, you should be able to use the new save command in webster. Use make to compile webster, and try loading a dictionary, adding new words, and saving to a new dictionary file. If the file is formatted correctly, you should also be able to load this new dictionary into webster.

UNIX I/O

In the previous examples, we’ve demonstrated and practiced how to read and write to files using “standard I/O” (or stdio). There is also another set of functions which perform file I/O, called “UNIX I/O”. UNIX I/O functions are actually system calls, meaning that they are an interface provided by the operating system kernel. In fact, the standard I/O functions are implemented using these UNIX I/O functions.

The suspicious.bin file contains some data, but it doesn’t make sense when we try to look at it. Fortunately, we (I) happen to know that the file has been encrypted by adding 1 to each byte of data. In decode_bin.c, we have given a program which uses UNIX I/O functions to decode this file by reading each byte one-by-one from the file, subtracting 1, and writing the decoded byte to a new file.

The UNIX I/O functions work in much the same way: open, read, and write correspond to fopen, fread, and fwrite. The corresponding standard I/O function calls are written as comments above each UNIX I/O function call in decode_bin.c. Some notable differences include:

  • The opening mode is a number rather than a string. Some convenient constants are given to easily specify allowed operations (e.g. O_RDONLY for read-only, O_WRONLY for write-only, etc.), and they can be combined with the bitwise OR to construct a mode with multiple allowed operations. For example, we open “decoded.txt” with both file writing (O_WRONLY) and file creation (O_CREAT) allowed.
  • If the opened file is newly created, then the call must specify which permissions the new file will have. S_IRUSR and S_IWUSR specify that the user is allowed to read and write to the file.
  • open returns a file descriptor as an integer, rather than a file pointer. This file descriptor is passed to read and write in the same way that file pointers are used in standard I/O.
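The differences above can be seen together in a sketch of the decoding loop. This is not the code from decode_bin.c: the helper name decode_file is hypothetical, and the O_TRUNC flag is an addition so that an existing output file is emptied first.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copies `in_path` to `out_path`, subtracting 1 from
 * each byte. Returns the number of bytes decoded, or -1 on error. */
long decode_file(const char *in_path, const char *out_path) {
    int in_fd = open(in_path, O_RDONLY);
    if (in_fd < 0) {
        return -1;
    }
    /* Flags are combined with bitwise OR. The third argument gives the
     * permissions used if the file is newly created. */
    int out_fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }

    long count = 0;
    unsigned char byte;
    /* read() returns the number of bytes read; 0 means end of file. */
    while (read(in_fd, &byte, 1) == 1) {
        byte = (unsigned char)(byte - 1);  /* undo the "add 1" encoding */
        if (write(out_fd, &byte, 1) != 1) {
            count = -1;
            break;
        }
        count++;
    }

    close(in_fd);
    close(out_fd);
    return count;
}
```

Reading and writing one byte per system call keeps the example short, but it is slow for large files. Reading into a larger buffer would need far fewer calls.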

Try compiling and running decode_bin to decode the file and display the decoded data. Is the decoded data still suspicious?

$ make decode_bin
$ ./decode_bin
$ cat decoded.txt

Memory vs. Disk

In lecture, we’ve demonstrated how writing to memory is significantly faster than writing to the disk. In the starter code, we’ve provided you with the program we use to show this. Compile this program and run it with the commands:

$ make mem_vs_disk_ns
$ ./mem_vs_disk_ns

  • Try adjusting the DATA_SIZE constant to modify how much data is written to memory and to the disk. How much does the time taken for each change when you increase the size by a lot?
  • We use fsync to force each number to be written to the file on the disk right away. Writing to the disk so frequently is awfully slow (intentionally so, since that is what we wanted to demonstrate). What changes if you move the call to fsync outside of the for loop it’s in?
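To make the fsync question concrete, here is a sketch that writes integers to a file and calls fsync either after every write or once at the end. It is not the mem_vs_disk_ns program. The helper name write_numbers and its sync_each parameter are hypothetical, and the timing code is omitted.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes the integers 0..count-1 to `path`.
 * If sync_each is nonzero, fsync() is called after every write;
 * otherwise it is called once after the loop.
 * Returns 0 on success and 1 on any error. */
int write_numbers(const char *path, int count, int sync_each) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        return 1;
    }

    int status = 0;
    for (int i = 0; i < count; i++) {
        if (write(fd, &i, sizeof(i)) != (ssize_t)sizeof(i)) {
            status = 1;
            break;
        }
        /* fsync() blocks until the data has been handed to the disk. */
        if (sync_each && fsync(fd) != 0) {
            status = 1;
            break;
        }
    }

    /* With sync_each off, flush everything to the disk once at the end. */
    if (!sync_each && status == 0 && fsync(fd) != 0) {
        status = 1;
    }

    close(fd);
    return status;
}
```

Both modes leave the same bytes in the file. The difference is how many times the program waits for the disk.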

Optional Feedback Form

If you’d like to give feedback on how labs are conducted and how they can be improved, please feel free to submit any comments in this anonymous form. This is a space for you to drop any comments you have at the end of every lab!