csapp Decoding Lab

solution

(Note: This key may yield different results on different systems, or even across different IDEs within the same system, so the answer is for reference only.)

problem analysis

First, let's paste the source code:

#include <stdio.h>
#include <stdlib.h>

int prologue [] = {
   0x5920453A, 0x54756F0A, 0x6F6F470A, 0x21643A6F,
   0x6E617920, 0x680A6474, 0x6F697661, 0x20646E69,
   0x63636363, 0x63636363, 0x72464663, 0x6F6D6F72,
   0x63636363, 0x63636363, 0x72464663, 0x6F6D6F72,
   0x2C336573, 0x7420346E, 0x20216F74, 0x726F5966,
   0x7565636F, 0x20206120, 0x6C616763, 0x74206C6F,
   0x20206F74, 0x74786565, 0x65617276, 0x32727463,
   0x594E2020, 0x206F776F, 0x79727574, 0x4563200A
};

int data [] = {
   0x63636363, 0x63636363, 0x72464663, 0x6F6D6F72,
   0x466D203A, 0x65693A72, 0x43646E20, 0x6F54540A,
   0x5920453A, 0x54756F0A, 0x6F6F470A, 0x21643A6F,
   0x594E2020, 0x206F776F, 0x79727574, 0x4563200A,
   0x6F786F68, 0x6E696373, 0x6C206765, 0x796C656B,
   0x2C336573, 0x7420346E, 0x20216F74, 0x726F5966,
   0x7565636F, 0x20206120, 0x6C616763, 0x74206C6F,
   0x20206F74, 0x74786565, 0x65617276, 0x32727463,
   0x6E617920, 0x680A6474, 0x6F697661, 0x20646E69,
   0x21687467, 0x63002065, 0x6C6C7861, 0x78742078,
   0x6578206F, 0x72747878, 0x78636178, 0x00783174
};

int epilogue [] = {
   0x594E2020, 0x206F776F, 0x79727574, 0x4563200A,
   0x6E617920, 0x680A6474, 0x6F697661, 0x20646E69,
   0x7565636F, 0x20206120, 0x6C616763, 0x74206C6F,
   0x2C336573, 0x7420346E, 0x20216F74, 0x726F5966,
   0x20206F74, 0x74786565, 0x65617276, 0x32727463
};

char message[100];

void usage_and_exit(char * program_name) {
   fprintf(stderr, "USAGE: %s key1 key2 key3 key4\n", program_name);
   exit(1);
}

void process_keys12 (int * key1, int * key2) {
   
   *((int *) (key1 + *key1)) = *key2;
}

void process_keys34 (int * key3, int * key4) {

   *(((int *)&key3) + *key3) += *key4;
}

char * extract_message1(int start, int stride) {
   int i, j, k;
   int done = 0;

   for (i = 0, j = start + 1; ! done; j++) {
      for (k = 1; k < stride; k++, j++, i++) {

         if (*(((char *) data) + j) == '\0') {
            done = 1;
            break;
         }
                      
         message[i] = *(((char *) data) + j);
      }
   }
   message[i] = '\0';
   return message;
}

char * extract_message2(int start, int stride) {
   int i, j;

   for (i = 0, j = start; 
       *(((char *) data) + j) != '\0';
       i++, j += stride) 
       {
          message[i] = *(((char *) data) + j);
       }
   message[i] = '\0';
   return message;
}

int main (int argc, char *argv[])
{
   int dummy = 1;
   int start, stride;
   int key1, key2, key3, key4;
   char * msg1, * msg2;

   key3 = key4 = 0;
   if (argc < 3) {
      usage_and_exit(argv[0]);
   }
   key1 = strtol(argv[1], NULL, 0);
   key2 = strtol(argv[2], NULL, 0);
   if (argc > 3) key3 = strtol(argv[3], NULL, 0);
   if (argc > 4) key4 = strtol(argv[4], NULL, 0);

   process_keys12(&key1, &key2);

   start = (int)(*(((char *) &dummy)));
   stride = (int)(*(((char *) &dummy) + 1));

   if (key3 != 0 && key4 != 0) {
      process_keys34(&key3, &key4);
   }

   msg1 = extract_message1(start, stride);

   if (*msg1 == '\0') {
      process_keys34(&key3, &key4);
      msg2 = extract_message2(start, stride);
      printf("%s\n", msg2);
   }
   else {
      printf("%s\n", msg1);
   }

   return 0;
}

The program does not use the two arrays prologue and epilogue, so we can ignore them.

Converting the values in data into strings, we get:

cccccccccFFrromo
: mFr:ie ndC.TTo
:E Y.ouT.Gooo:d!
  NYowo tury. cE
hoxoscineg lkely
se3,n4 tto! fYor
oceu a  cgalol t
to  eextvraectr2
 yantd.havioind 
gth!e .caxllx tx
o xexxtrxacxt1x.

We can vaguely see the shadow of the plaintext (for example, words like from, friend, good), but we still need to examine the decryption function to confirm it.

Let's first look at extract_message1:

char * extract_message1(int start, int stride) {
   int i, j, k;
   int done = 0;

   for (i = 0, j = start + 1; ! done; j++) {
      for (k = 1; k < stride; k++, j++, i++) {

         if (*(((char *) data) + j) == '\0') {
            done = 1;
            break;
         }
                      
         message[i] = *(((char *) data) + j);
      }
   }
   message[i] = '\0';
   return message;
}

The program works as follows: starting from start+1 (inclusive), it converts every stride-1 data values into a string and appends them to message, then skips 1 data value, looping until it encounters an ASCII code of 0.

By visual inspection, we notice that starting from the 10th character (remember, counting begins at 0), reading 2 characters and skipping 1 works perfectly, forming an English sentence like (From: Friend To: You…). Thus, we can determine that start = 9 and stride = 3.

After slightly modifying the code and running it, we obtain our first message.

However, this is just the first step — we still need to figure out key1 and key2.

Then you'll be surprised to discover:

The key1 and key2 we input actually have nothing to do with the variables start and stride, right?

Both start and stride are tied to the variable dummy. We must find a way to manipulate dummy through our input, thereby changing start and stride.

This is where the magic of C pointers comes in.

Now, let's look at the process_keys12 function:

void process_keys12 (int * key1, int * key2) {

  

  *((int *) (key1 + *key1)) = *key2;

}

The code essentially means: take the value stored at (address key1 + value of key1) as a new address, and then assign the value of key2 to the content at this new address. (Without some foundation in pointers, this would be really hard to figure out.)

Through debugging, we find that the address of dummy is 0x61fe04, while the address of key1 is 0x61fe00. As everyone who has taken an intro CS course knows, an int occupies 4 bytes of memory, so the correct value for key1 should be 1.

Our next problem is: what value should dummy take? As readers of CS:APP will know, data is stored in little-endian format — lower addresses hold the low-order bytes, and higher addresses hold the high-order bytes. Clearly, start belongs to the lower address, while stride is in the higher address, with a difference of exactly sizeof(char), i.e., 1 byte. One byte corresponds to two hexadecimal digits. Therefore, key2 is 0x309, which is 777 in decimal.

Now, key3 and key4 are more troublesome. Looking at the hint from message1:

“Choose a pair of key3, key4 to bypass extract1 and force the program to call extract2.”

Let's analyze the extract2 function:

char * extract_message2(int start, int stride) {
   int i, j;

   for (i = 0, j = start; 
       *(((char *) data) + j) != '\0';
       i++, j += stride) 
       {
          message[i] = *(((char *) data) + j);
       }
   message[i] = '\0';
   return message;
}

It means: starting from start, read characters with a step size of stride. The process ends once an ASCII code of 0 is encountered.

By visual inspection again, we can see that when start = 9 and stride = 3, we can decrypt and obtain message2 as shown in the runtime screenshot.

Now the problem goes back to parameter passing. At first, I didn't understand why we had to bypass extract1. After carefully analyzing the program structure, I realized:

First, you cannot change the values of start and stride — otherwise, how would you decrypt correctly?
But if you don't change them, then you will inevitably decrypt message1 (since its beginning is not 0). This seems contradictory, so we need a different approach.

One of the few available write-ups online suggests modifying the return address to jump into that annoying conditional branch. However, for some unknown reason, even when I followed that method and successfully jumped, bypassing the if check, the return value of main was still not 0 (perhaps because stack overflow triggered the compiler's security mechanism — sometimes security really is a bad thing), which meant message2 could not be printed.

My idea, instead, is to try modifying the values in data within process_key34. Let's take a look at this function:

void process_keys34 (int * key3, int * key4) {

   *(((int *)&key3) + *key3) += *key4;
}

It's actually quite similar to process_keys12, except that the middle "+" becomes "+=".

Since message1 reads its first character from (*char)data[10] (the bolded part in data[2] = 0x72464663 — never forget little-endian order), we only need to change data[2] into 0x72004663 to make message1 an empty string.

The next step is to offset the address of (int*)&key3 by the value of key3 so that it points into data[2].

(Note: these addresses change each time the program runs, so the screenshot is for reference only.)

Through debugging, we can see that (int*)&key3 is at 0x61fdd0, while the address of data[2] is 0x4030a8. Since each element takes up 4 bytes, the offset is:

(0x4030a8 - 0x61fdd0) / 4 = -553802

This gives us the value of key3.

As for key4, the goal is to turn 0x72464663 into 0x72004663. That just means adding -0x460000, which is -4587520 in decimal — the value of key4.

Problem solved. ✅

remark

As a pwn enthusiast, it still took me three nights to solve this problem and finish writing up the thought process. In my view, the knowledge points involved here go a bit beyond the scope of standard textbooks (after all, the essence of C cannot be captured in just a few textbook paragraphs). If you ask a student who has just finished an introductory course and can barely distinguish between & and * to tackle this problem, that would be far too much.

But I have to admit, if you manage to solve this challenge independently (I also borrowed the approach for deriving key1 and key2), your C programming skills will have reached a whole new level — even stronger than that of an average OIer — and you'd be ready to start moving toward the binary exploitation track in CTF.