Project 4: Webster Dictionary Defines PA 4 As Reasonably Difficult

Reading time: 10 minutes

Difficulty Level: ★★★★☆

Due date: May 09 23:59*

Because of the upcoming midterm, you have an additional week to complete this PA. However, doing this PA will give you a much better understanding of concepts that will be heavily featured on the midterm. So do not wait until after the exam to start!

Updates and Revisions

[May 6 17:11] Extended deadline due to campus disruptions.
[May 4 15:01] Added Gradescope submission instructions.
[May 1 15:23] Added section on string.h library functions.
[May 1 12:18] Added Mid-Quarter Survey section.
[April 30 18:09] Added section on testing and valgrind.
[April 25 14:13] Updated GitHub Classroom link.
[April 25 10:21] Updated GitHub Classroom link.

Table of contents

Mid-Quarter Survey (3%)
Learning Goals
Introduction
1. Linked Lists
Getting Started
Using string.h Library Functions
Hints (for Making Your Life Easier)
Running A Script
Finding Memory Errors
Formatting Your Code
Submission
1. Submitting from GitHub
2. Gradescope Autograder

Mid-Quarter Survey (3%)

This survey is worth 3% of your PA 4 grade. Please complete it as soon as possible.

Complete the mid-quarter survey here.

Learning Goals

In this assignment, we will

practice dynamic memory allocation,
use pointers with structs,
debugging pointer code using gdb,
finding memory errors using valgrind,
implement a linked list,
learn how to use a code formatter, and of course,
practice using the terminal and vim some more.

At this point in the quarter, you should really know how to use gdb. Hopefully through lab and previous PAs you have gained a decent amount of experience with it.

You need to write a lot of code for this PA, and it is inevitable that you will run into a decent amount of bugs (and that’s perfectly fine). But not knowing how to use gdb well will make this PA 5 times harder than it needs to be!

Introduction

Our objective this week is to create an interactive dictionary in the command line. You do not have to implement the interactive part, as we have included in the starter code a fully-featured interactive shell. What you do have to implement is the underlying database containing a list of words loaded from a formatted dictionary file.

Linked Lists

Linked lists are one of the most fundamental data structures in computer science. Unlike an array, where the elements are stored continuously in memory, elements in a linked list can be anywhere, and the ordering is maintained by a pointer in each element that points to the next element (hence the name linked list).

Here’s an illustration of what a linked list looks like:

Linked List

(Source: GeeksforGeeks: Understanding the basics of Linked List)

In this programming assignment, you will be creating a linked list interface.

Getting Started

The starter code for this assignment is hosted on GitHub Classroom. Use the following link to accept the GitHub Classroom assignment:

Click here to accept this GitHub Classroom assignment. (Right click to open in new tab.)

Just like last time, clone the repository to your ieng6 server. (Do NOT clone the repo to your local machine.)

The Code Base

Just like in PA 3, the code base for our new assignment consists of multiple C files. The dictionary.h header file is our “interface” file, to use a more familiar Java analogy. The header file defines critical data types for our program and outlines the function interfaces that you need to implement.

dictionary.c is where you will write your implementation for the functions declared in the header file. It is also the only file that you should be editing for this PA. The functions implemented here are then used by the driver in sh.c to create the interactive command-line dictionary program.

If one C file wants to use a function written in another C file, the function must be declared in a header file, which should then be included (#include) in both C files, acting as a bridge connecting the two at compile time. We never include .c files directly!

Once again, we have provided you with an executable program compiled from our reference implementation: ref-webster. Your program should match the behavior of this reference implementation in every way.

To begin, read through the dictionary.h header file. We will go through the contents of this header file below.

Data Type: `struct dictionary`

struct dictionary, as the name suggests, defines a dictionary. A struct dictionary instance contains two fields: size keeps track of the total number of words, and list points to the head of a linked list of all the words in the dictionary.

Data Type: `struct dict_entry`

struct dict_entry defines an entry in the dictionary. Each entry also constitutes one node in the aforementioned linked list in struct dictionary. Notice that there are two strings in this struct:

word is a string that holds the word itself; it is a character array built into the struct itself,
definition is a pointer to a string containing the definition of word.

This means when we create a new dictionary entry, we can directly copy the word into the struct, but we need to separately allocate a string buffer dynamically to hold the definition. It is also important, consequently, to free that dynamically allocated definition string when we free the dictionary entry itself.

The Dictionary Interafce

You only need to implement one thing for this assignment: the entire linked list interface for the dictionary as described in dictionary.h:

struct dict_entry *new_entry(const char *word, const char *definition);
void free_entry(struct dict_entry *entry);
void dict_insert_sorted(struct dictionary *dict, struct dict_entry *entry);
void dict_remove_word(struct dictionary *dict, const char *word);
void print_word(struct dict_entry *entry);
void print_dictionary(struct dictionary *dict);
void print_word_list(struct dictionary *dict);
struct dict_entry *dict_lookup(struct dictionary *dict, const char *word);
struct dictionary *load_dictionary(const char *filename);
void free_dictionary(struct dictionary *dict);

You can find the specifications for all these functions as comments in the dictionary.h header file. You should implement all these functions in dictionary.c.

The Interactive Shell

The main function for our dictionary program is written in sh.c. You do not need to modify this file, but we will introduce what it does.

sh.c implements the interactive shell with which we interact with the underyling linked list data structure. THe list of commands are recorded in the print_help() function in sh.c, which can be invoked by typing in the help command, or simply h:

Available shell commands:
(q)uit             Quit the dictionary.
(h)elp             Show this command guide.
(l)oad             Initialize an empty dictionary.
(l)oad [file]      Load a formatted dictionary file.
(s)how             Show all words in the dictionary.
(s)how [word]      Show the definition of a word.
showall            Show all words and their definitions.
(i)nsert word def  Insert a new word definition into dictionary.
                   Assume there will be no duplicate entries.
(r)emove word      Remove a word from the dictionary.
size               Show the size of the dictionary.

When we first compile and run the webster executable, we are greeted with the title and a brief guide to all the built-in commands (reproduced above). At the prompt (webster) , we can enter one of the commands above to work with the dictionary.

Letters enclosed in parentheses are shorthands for their respective commands. E.g., instead of typing the whole word load, you can just type l. This is very similar to how we interact with gdb.

The `load` command

To begin, we need to load a dictionary file. You can use the load command like so:

(webster) l dict/short
Dictionary loaded with 5 word(s).

Alternatively, you can execute load without a dictionary file as argument, which will initialize an empty dictionary:

(webster) l
Dictionary loaded with 0 word(s).

Under the hood, the load command calls the load_dictionary function. If another dictionary has already been loaded, it would first call free_dictionary on that dictionary and then load in the new one.

The struct dictionary returned from the load_dictionary function is stored in a global variable in sh.c, so there can only be one loaded at the same time.

This command will not work right away, as it depends on the new_entry() and dict_insert_sorted() function, both of which you must implement yourself.

Incremental development is more vital than ever for this programming assignment. You should really think about how you can apply the principles here. You do not want to wait till you have finished both functions before testing it! We have already talked about a wide range of strategies to inspect and validate your code. Try to apply those skills now.

We have provided in the dict/ directory three dictionary files for you to test with. You can begin with dict/short for initial development, as its smaller size makes it more manageable and easier to debug with.

The `size` command

The size command directoy accesses the size field in struct dictionary. So make sure when you insert/remove words from your dictionary, you are also incrementing and decrementing this size field appropriately.

(webster) l dict/gre
Dictionary loaded with 642 word(s).
(webster) size
642

The `show` command

The show command can also be executed without arguments.

When executed by itself, it calls the print_word_list() function to print out all the words currently loaded in the dictionary.

(webster) s 
fun
gone
heard
protect
these

When called with a word as argument, it calls the dict_lookup() function to look up the definition of the given word. The lookup is case-sensitive.

(webster) s protect
protect
 To cover or shield from danger or injury; to defend; to guard; to preserve in safety; as, a father protects his children. The gods of Greece protect you! Shak. Syn. -- To guard; shield; preserve. See Defend.
(webster) s those
Cannot find "those" in dictionary.

The `showall` command

The showall command calls the print_dictionary() command under the hood, which prints out all the words along with their definitions in the dictionary.

The `insert` command

The insert command must be called with a word and its definition as arguments. The first word after the command is the word to be added, and everything that comes after is treated as the definition:

(webster) l
Dictionary loaded with 0 word(s).
(webster) i protect To cover or shield from danger or injury.
(webster) s
protect
(webster) s protect
protect
 To cover or shield from danger or injury.

The shell program we provided will parse the input for you, and call new_entry() and then dict_insert_sorted() to add the word to the dictionary.

For the implementation of dict_insert_sorted(), you may assume that there will not be duplicate words. All words will also be lower-case only.

The `remove` command

The remove command must be called with a word. It calls the dict_remove_word() function to try to remove the given word from the dictionary linked list. If the word does not exist in the dictionary, then nothing happens.

(webster) l dict/short
Dictionary loaded with 5 word(s).
(webster) s
fun
gone
heard
protect
these
(webster) r gone
(webster) s
fun
heard
protect
these
(webster) s gone
Cannot find "gone" in dictionary.
(webster) r nonexistent
(webster) s
fun
heard
protect
these

Using `string.h` Library Functions

To implement this assignment, you will find some functions in the string.h library particularly helpful. Specifically, we would like you to pay attention to strlen() and strncpy().

(The descriptions are taken directly from the Linux manual pages. To access these documentation pages yourself, type in the terminal man strlen and man strcpy. You can do this with other function as well, such as man fopen.)

First, we have strlen() for finding the length of a string.

STRLEN(3)                  Linux Programmer's Manual                 STRLEN(3)

NAME
      strlen - calculate the length of a string

SYNOPSIS
      #include <string.h>

      size_t strlen(const char *s);

DESCRIPTION
      The strlen() function calculates the length of the string pointed to by
      s, excluding the terminating null byte ('\0').

RETURN VALUE
      The strlen() function returns the number of bytes in the string pointed
      to by s.

To copy strings from a source buffer to a destination buffer, use the `strncpy()`
function. **Not the `strcpy()` function!

Here’s the documentation for strcpy and strncpy. Use the latter!

STRCPY(3)                  Linux Programmer's Manual                 STRCPY(3)

NAME
      strcpy, strncpy - copy a string

SYNOPSIS
      #include <string.h>

      char *strcpy(char *dest, const char *src);

      char *strncpy(char *dest, const char *src, size_t n);

DESCRIPTION
      The  strcpy()  function  copies the string pointed to by src, including
      the terminating null byte ('\0'), to the buffer  pointed  to  by  dest.
      The  strings  may  not overlap, and the destination string dest must be
      large enough to receive the copy.  Beware  of  buffer  overruns!   (See
      BUGS.)

      The  strncpy()  function is similar, except that at most n bytes of src
      are copied.  Warning: If there is no null byte among the first n  bytes
      of src, the string placed in dest will not be null-terminated.

      If  the  length of src is less than n, strncpy() writes additional null
      bytes to dest to ensure that a total of n bytes are written.

RETURN VALUE
      The strcpy() and strncpy() functions return a pointer to  the  destina‐
      tion string dest.

BUGS
      If the destination string of a strcpy() is not large enough, then  any‐
      thing  might  happen.  Overflowing fixed-length string buffers is a fa‐
      vorite cracker technique for taking complete control  of  the  machine.
      Any  time  a  program  reads  or copies data into a buffer, the program
      first needs to check that there's enough space.  This may  be  unneces‐
      sary  if you can show that overflow is impossible, but be careful: pro‐
      grams can get changed over time, in ways that may make  the  impossible
      possible.

Again, do not use strcpy()! Use strncpy() with an explicit length to avoid dangerous bugs like buffer overflow! See the BUGS section in the documentation above to understand why.

Again x2, DO NOT USE strcpy()!

Hints (for Making Your Life Easier)

To make this assignment easier to work with, you need to make sure you have a solid grasp on the following:

using GDB, and
memory layout of a linked list.

Dealing with pointers is hard, but it gets easier over time for sure. There is one important technique to accelerate this process: liberal use of memory diagrams.

In the coming weeks, you should aim to become very good at drawing memory diagrams. For now, we will show you an example tailored to this PA:

PA 4 memory layout

The diagram above shows the memory layout of a dictionary containing three words: "first", "second", and "third". Each arrow represents a pointer.

The dictionary itself is a struct dictionary. The size field has a value of 3 because there are 3 words in the dictionary. The list field points to the first entry, a struct dict_entry.

In each struct dict_entry, you can see the word is stored within the struct, but the definition is allocated separately using malloc. The next pointer points to the next entry in the dictionary, or NULL if the entry is at the end of the list.

Running A Script

When you are testing your program, it can be a lot of work to manually run commands every time you need to test your code. To overcome this inconvenience, we can create a file containing a sequence of commands, and use our old friend indirection to feed the file into our webster program to automatically run all these commands.

Here’s an example of a simple script. Let’s name this commands.txt:

load dict/short
show
insert word new definition
s word
quit

To feed this into the program:

$ ./webster < commands.txt

You can create different scripts based on your testing needs.

Finding Memory Errors

In this PA, you are very likely to run into various types of memory errors, and as we introduced in Discussion 4, valgrind is an excellent tool for finding memory errors.

In our autograder, we expect your code to run without any memory errors. (Most importantly, it should not have any memory leaks!) To make sure of that, you need to run valgrind yourself on your program:

$ valgrind --leak-check=full --track-origins=yes ./webster

To make things even easier, we can run it with a script (see previous section) so that we don’t have to type in commands when valgrind is running. Suppose you have a script named test_insert.txt:

$ valgrind --leak-check=full --track-origins=yes ./webster < test_insert.txt

Formatting Your Code

Let’s talk about coding style.

You probably have heard of the term before. Coding style dictates how source code is laid out in a file. As you read more and more code throughout your career, you will realize the importance of having a good and clean style.

It’s what makes this so hard to understand:

#include<stdio.h>
int main(){
int l;


  printf("Enter a limit: ");


          //scanf("%i", limit);
          scanf("%d", &limit);
        printf("Even numbers between 1 and %d:\n", limit);
    for (int i=1;i<=l; i++) {
    if (i%2 ==0) {
      //printf(i);
        printf(  "%d ",i);
    }
}
printf ("\n");
    return 0;

}

…whereas this is nice, clean, and straightforward to read–

#include <stdio.h>

int main() 
{
    int limit;

    printf("Enter a limit: ");
    scanf("%d", &limit);

    printf("Even numbers between 1 and %d:\n", limit);
    for (int i = 1; i <= limit; i++) {
        if (i % 2 == 0) {
            printf("%d ", i);
        }
    }
    printf("\n");

    return 0;
}

The latter example is neat, well-indented, highly readable with well-named variables. The former is a bit of an unholy mess, which sadly we see too often in lab hours and office hours.

There is empirical evidence that poorly formatted code leads to more bugs!

Starting this PA, we are introducing a new requirement – code formatting.

But who has time to format their code all the time? I hear you ask. And indeed, it would be much easier to automate this process, which we will introduce next.

The `clang-format` tool

clang-format is a tool that automatically formats source files for languages like C, C++, JavaScript, etc. While it can often be integrated into an IDE or an editor like Visual Studio Code, we will simply run it from the command line.

The `.clang-format` configuration file

Just like the .vimrc file that you set up during lab, clang-format settings are also stored in a file: .clang-format is a hidden file in your repository that we have created. This file defines the coding style that your code will be formatted to.

The starter code you receive is formatted using this setting.

Running `clang-format`

In the directory with your C code, run the following command:

$ clang-format -i *.c *.h

This command runs the clang-format program. The -i flag tells it to modify the source files in place, which means changes will be made directly to the files. *.c *.h tells the program to format all files with the extension .c and .h.

If you have not seen this in lab already, the * character is called a wildcard character. Basically, it means “all”.

For instance, if we want to list all files in the directory starting with test-:
$ ls test-*
Or, if we wish to remove all the files within the output/ directory:
$ rm ./output/*
This is a very useful tool for efficient terminal usage. Keep it in mind!

Submission

Again, please complete the mid-quarter survey (see first section) for 3% of your grade for this programming assignment.

Submitting from GitHub

First, make sure that the most up-to-date version of your code has been pushed to GitHub. If you want to double check, go to the GitHub web page and find your repository to make sure the code changes you made on ieng6 are actually all there.

Having done that, go to Gradescope and find the submission page for this PA. When you submit the assignment, you will see the option to submit from a specific GitHub repository. You will need to link your GitHub account first, after which you should be able to find your pa2 repository. Select it, and choose the main branch.

We have not discussed branching in git yet, so unless you know what you are doing, you should only see main for branch options.

Gradescope Autograder

We have changed our autograder so that you are able to see your final score when you submit, but you will not be able to see the hidden tests until after the grades have been released.

In this PA, we are also running some valgrind tests. So make sure you run valgrind on your code before submitting to gradescope.