(sticky, archived) nju-pa experience

2023/4/20 ~ 2024/5/4

6705 commits (2 commits per compilation and execution)

A year of persistence has finally come to an end.

background

Given that my university's specialized courses haven't taught me much, I've begun to wonder whether there's still a gap in technical proficiency between me and my non-major classmates (or perhaps they've already surpassed me by taking classes, like Java), and what my strengths are.

I recall Qinyuan telling me a year ago that, from the employer's perspective, our students' programming skills aren't as good as those at the neighboring Electrical Engineering School. I was skeptical at the time, but looking at the current curriculum design, it's an undeniable fact (or perhaps an inevitable outcome). The university has indeed made some improvements, such as the online judge in the C programming course (which has been criticized for its aggressive requirements and poor formatting), but it's far from enough.

From the perspective of personal development, I'm running out of time. I must drop any distractions that conflict with this goal (in order: comprehensive assessments, innovation and entrepreneurship exams, various competitions, including low-quality CTFs, and finally my GPA; my strengths can sometimes also be my weaknesses) to give myself as much time as possible to study the fundamentals that truly matter to my field of study.

Regarding the OS course, the college's theoretical courses are not bad at all, and are mainly exam-oriented. In contrast, the experimental courses are in a very awkward position, for the following reasons:

No preparatory course: There's no coverage of Linux basics or tutorials on how to use Git, both of which greatly improved my efficiency while completing the nachOS lab. I say this because some students are undoubtedly still writing code for the current lab by commenting out code from previous labs (or re-copying the original source code). Those who understand will understand.
Excessively reduced difficulty: The current OS lab format combines PowerPoint presentations with video demonstrations. The video demonstrations inevitably include some source code, and students inevitably photograph or record this content with their phones. Consequently, the lab is reduced to piecing together code fragments from various sources, and we lose the ability to RTFSC. The consequences: at best, we don't understand the overall architecture of nachOS; at worst, we can't answer the few questions Professor Liang Gang posed in lab 8, which are easily answered by RTFSC.
Plagiarism is a serious problem: People are lazy. nachOS is an old project. The answers to many labs can be found with a simple search. I actually plagiarized some of them.

why do I want to be a masochist (by doing PA)

~~simple, because I enjoy this~~

Tiger1218 introduced me to nju_pa, and it is absolutely a great course. Compared with nand2tetris, which I finished previously, it is more hard-core but has a smoother learning curve.
I'm short on time, so I need to absorb a lot of information in a rather short period. High information density means high difficulty, so staying in touch with something challenging is unavoidable.
In academia, a deeper understanding of ISAs and OSes benefits further research. In engineering, practicing coding skills makes me more competitive among both major and non-major students in the CS field.

pa0

I had already been using Linux and had built my workflow some time ago, so I just installed neovim and cloned the source.

Learned some useful git commands like git branch, git checkout.

The Missing Semester of Your CS Education is a good course, bookmarked.

pa1

1.1

At first I was dumbfounded. Copilot gave some code suggestions, which helped me quickly understand what I needed to do. Actually, it is quite easy.

1.2

I learned regex several months ago and had forgotten it, so it took me 30 minutes to learn it again. Actually, the tokenizing step is much easier than the compiler section of nand2tetris.

Copilot helped me quickly finish the structure of the eval function, but it made a mistake when finding the dominant operator, and I spent several hours debugging this.
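For reference, here is a minimal sketch of how the dominant (main) operator can be located. It is not my actual NEMU code: the token kinds, `precedence` and `find_main_op` are names made up for illustration, and only the four basic binary operators are handled. The idea is to skip everything inside parentheses and pick the rightmost operator with the lowest precedence.

```c
/* Hypothetical token kinds for illustration; NEMU's real token table differs. */
enum { TK_NUM, TK_PLUS, TK_MINUS, TK_MUL, TK_DIV, TK_LPAREN, TK_RPAREN };

/* Lower value binds more loosely; -1 means "not a binary operator". */
static int precedence(int type) {
  switch (type) {
    case TK_PLUS: case TK_MINUS: return 1;
    case TK_MUL:  case TK_DIV:   return 2;
    default: return -1;
  }
}

/* Return the index of the main operator in tokens[p..q], or -1 if none. */
int find_main_op(const int *tokens, int p, int q) {
  int depth = 0, best = -1, best_prec = 0;
  for (int i = p; i <= q; i++) {
    int t = tokens[i];
    if (t == TK_LPAREN) { depth++; continue; }
    if (t == TK_RPAREN) { depth--; continue; }
    if (depth != 0) continue;            /* operators inside parentheses never win */
    int prec = precedence(t);
    if (prec < 0) continue;              /* numbers and other non-operators */
    /* "<=" keeps the rightmost operator among equals (left associativity). */
    if (best < 0 || prec <= best_prec) { best = i; best_prec = prec; }
  }
  return best;
}
```

For `1 - 2 - 3` this picks the second `-`, so the expression is split as `(1 - 2) - 3`. Using `<` instead of `<=` would silently evaluate it as `1 - (2 - 3)`, which is the kind of mistake that is easy to miss.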

When it came to modifying sdb.c to test a batch of expressions, I mistyped the path to my input file (btw, copilot suggested the path from yzh's project, which is a privacy issue). At first I didn't know I could enable debug info in menuconfig, and static functions made it harder to analyze the assembly instructions in gdb. As a result, it took me nearly an hour to debug this.

I also had a hard time handling the floating point exception (division by zero) in the expression generator. My idea was to compile and run each generated expression while redirecting exceptions to stderr. If grep exception stderr_file doesn't return 0, the expression is considered valid. However, some exception messages were still printed in my stdout_file, so in the end I had to use another command to filter the output.

perl -pe 's/Floating\ point\ exception\n//g' stdout_file > final_input

1.3

Extending the eval function is not very hard. One important point is to change a condition to handle unary operators (like * and -).

Implementing the watchpoint pool is just some basic linked-list operations, and copilot did a good job.

However, copilot made a big mistake when implementing the watchpoint itself: it messed up the return value of check_wp. I again spent several hours debugging this.

pa2

2.1

Understanding the design of RISC-V was tough at first, and copilot even decreased my proficiency by 20%. Things got better when I found a book named RISC-V-Reader-Chinese-v2p1.pdf. It is actually just repetitive work.

However, some things still require patience, and you need to be especially careful with opcodes that entail type conversion. It took me about an hour of debugging again.
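To make the "type conversion" pitfall concrete, here is a small sketch of the sign-extension cases that are easy to get wrong. The helper names (`sext`, `imm_i`, `load_result_lb`, `load_result_lbu`) are my own for this example and are not the macros NEMU provides.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to a 32-bit signed value (1 <= bits <= 32). */
int32_t sext(uint32_t x, int bits) {
  uint32_t sign = 1u << (bits - 1);
  x &= (sign << 1) - 1u;            /* keep only the low `bits` bits */
  return (int32_t)((x ^ sign) - sign);
}

/* I-type immediate: inst[31:20], sign-extended (used by addi, lw, jalr, ...). */
int32_t imm_i(uint32_t inst) { return sext(inst >> 20, 12); }

/* lb sign-extends the loaded byte; lbu zero-extends it. */
uint32_t load_result_lb(uint8_t byte)  { return (uint32_t)sext(byte, 8); }
uint32_t load_result_lbu(uint8_t byte) { return (uint32_t)byte; }
```

For example, `addi x1, x0, -1` is encoded as `0xfff00093`, and `imm_i` has to return -1 for it rather than 4095. Mixing up the signed and unsigned variants usually only shows up in tests that happen to use negative values.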

2.2

It's all about fundamental utilities again.

iringbuf and mtrace are quite easy, but ftrace took a very long time, including these steps:

spent a little time parsing args, but failed to find a way to add this new feature to the Makefile
spent some time on RTFM (man 5 elf)
spent a lot of time distinguishing call and ret steps among jal and jalr instructions, and especially ascertaining whether I had done it correctly, because difftesting this is not an easy task. (In the end I think it is not very important, maybe a waste of time?)
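The call/ret distinction in the last step can be sketched roughly as follows. This is a simplified heuristic under my own assumptions, not what the handout prescribes: only x1 (ra) is treated as the link register, and `classify_jump` and the enum are hypothetical names.

```c
#include <stdint.h>

enum jump_kind { JUMP_OTHER, JUMP_CALL, JUMP_RET };

/* Classify a 32-bit RISC-V instruction as call, ret, or neither. */
enum jump_kind classify_jump(uint32_t inst) {
  uint32_t opcode = inst & 0x7f;
  uint32_t rd     = (inst >> 7) & 0x1f;
  uint32_t rs1    = (inst >> 15) & 0x1f;

  if (opcode == 0x6f) {                        /* jal */
    return rd == 1 ? JUMP_CALL : JUMP_OTHER;   /* jal ra, target is a call */
  }
  if (opcode == 0x67) {                        /* jalr */
    if (rd == 1) return JUMP_CALL;             /* link into ra: indirect call */
    if (rd == 0 && rs1 == 1) return JUMP_RET;  /* jalr x0, 0(ra) is ret */
  }
  return JUMP_OTHER;
}
```

Tail calls (a plain `j` or `jr` into another function) fall into `JUMP_OTHER` here, so the indentation of an ftrace log built on this sketch can drift after one of them.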

I successfully found some bugs in strcpy and sprintf using testcases from Copilot X.

Writing differential testing is easy and pays off a lot; I don't quite understand why it isn't compulsory.

This echoes a passage about batch mode from an earlier chapter:

Running NEMU in Batch Mode

We understand that most students might think: "I don't read the Makefile anyway, and neither do the instructors or TAs, so it doesn't matter."

So, let's add a mandatory step: When launching NEMU, you'll need to manually type "c" each time to run your client program. However, if you're not using NEMU's sdb, you can save yourself the typing of "c." NEMU implements a batch mode that allows you to run client programs directly after launching NEMU. Please read the NEMU code and modify the Makefile appropriately so that batch mode is enabled by default when launching the AM Makefile.

You can still skip this mandatory step for now, but it will soon become less convenient.

Actually, I did find it inconvenient here, so I went back to that chapter and added it :(

2.3

The most painful step was debugging the system clock. After finishing AM_TIMER_UPTIME, I first used my laptop (i7-6700HQ @ 2.60GHz) to run the performance tests. However, they ran extremely slowly (microbench took over an hour to finish and only scored 12 points). So I first spent 2 or 3 days trying to find out why it was so inefficient, without success.

By chance, I copied my code to a desktop (i7-6700 @ 3.40GHz) and ran the performance test again. This time there was a floating point exception. I checked the formula for the performance score and found that the AM_TIMER_UPTIME register wasn't being updated on each iteration. Knowing this, I quickly fixed the bug that had puzzled me for several days.

The next problem was the AM_GPU_FBDRAW module. After I finished it, the video test seemed fine. However, in fceux-am the graphics could not be displayed properly, just like this:

To solve this problem, I enabled differential testing (difftest) and ftrace. The debug information showed that the differences started at the memcpy in my AM_GPU_FBDRAW function. However, the diff position varied on each run, which bothered me a lot. By chance, I changed my original memcpy call from

memcpy(&fb[(y + i) * W + x], ctl->pixels + i * w, w * 4);

to this:

memcpy(&fb[(y + i) * W + x], ctl->pixels, w * 4);

The graphics turned entirely blue, which assured me that the bug depended on the second argument of this function. With the help of tiger1218 (I feel so sorry about that; I could have solved this problem all by myself), I realized that ctl->pixels is a void pointer, so arithmetic on it is done in bytes, not in DWORDs. Scaling the offset accordingly solved it.
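A minimal sketch of the corrected copy, written as a standalone function rather than the real AM_GPU_FBDRAW handler (`blit_rect` and its parameters are hypothetical). Casting the void pointer to `uint32_t *` first makes `src + i * w` advance by pixels instead of bytes; the equivalent fix with byte arithmetic would be an offset of `i * w * 4`.

```c
#include <stdint.h>
#include <string.h>

/* Copy a w x h rectangle of 32-bit pixels into a framebuffer that is
 * fb_w pixels wide, placing its top-left corner at (x, y). */
void blit_rect(uint32_t *fb, int fb_w, int x, int y,
               const void *pixels, int w, int h) {
  const uint32_t *src = (const uint32_t *)pixels;  /* index in pixels, not bytes */
  for (int i = 0; i < h; i++) {
    /* one row per iteration: w pixels = w * 4 bytes */
    memcpy(&fb[(y + i) * fb_w + x], src + i * w, (size_t)w * sizeof(uint32_t));
  }
}
```

With the buggy byte offset, row i started only i * w bytes into the source instead of i * w * 4, so every row after the first was copied from the wrong place.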

The story doesn't end there. After fixing this bug, the difftest problem still existed. I tried running other tests to find information helpful for debugging. During this period, I also fixed some other minor bugs, such as the black screen in slider (because the boundary wasn't set properly in AM_GPU_FBDRAW) and a crash when showing the help message in am-tests (because %c was not implemented in my own library).

The real solution came when I gave up and moved on to pa3. I ran nanos-lite and the program crashed again. This time I manually placed panic breakpoints in different parts of main.c and found that it crashed when printing the logo. I quickly realized that the buffer was not big enough (1024 failed; 16384 is maybe enough), which also solved the difftest problem.

I also wanted to work on the sound driver. However, this requires cross-compiling the SDL2 library. I spent half an afternoon on this and failed. Tiger1218 tried to help me but quickly lost interest; he thinks this part is not essential to the PA as a whole. Maybe I'll finish the sound driver when I have more time.

However, several days after I adjusted the print buffer, difftest stopped working again, and I still haven't fixed it.

pa3

3.1

After the final exam, I continued working on pa3.1.

I was stuck at yield() for some days, because I had to read the RISC-V manual, figure out the exception trace, and work out where to implement isa_raise_intr(). Once that is done, though, the remaining tasks are much easier.

Although I initialized mstatus to 0x1800, difftest still didn't work. I felt like giving up on it.

3.2

Because I had been RTFSC-ing for several days in 3.1, finishing 3.2 was a piece of cake. I was only stuck on printf output for several hours (it printed just H on each line). Finally I found that I had forgotten to make the whole navy-apps directory.

3.3

PA3.3 contains a lot of work: the workload is about 30% of all the code you need to write from PA1 to PA3. Moreover, as the system gets more and more complex, debugging time also increases. It actually took me 17 days to finish this chapter.

The work can be divided into 3 parts: the VFS, the NDL library, and the SDL library with its corresponding applications. Here are some bugs that I struggled with for a long time.

segmentation fault after fclose in file-test: at first I thought something was wrong in _free_r, but I'm not familiar with the code in the system library, and debugging it was daunting. So I tried modifying file-test.c and observed that the segmentation fault had something to do with fscanf. I suspected a buffer overflow but had no proof. In the end I found that the problem was in _sbrk, which I had written myself.
menu does not display correctly, like mario before: this time I didn't make the memcpy mistake. However, I hadn't figured out the relationship between the width and height of the canvas and those of the screen. I also handled the corner cases of the SDL APIs incorrectly, which resulted in jumbled output.
segmentation fault when entering a battle in PAL: I wanted to save time, because using ftrace to find the backtrace is slow, so I used the traditional "print" method. Actually, the call stack was a little deeper than I expected (about 5 or 6 layers), and it took me even more time. Finally the call trace pointed to SDL_FillRect, which again I had written myself. I found that I hadn't handled the 8-bit color case (at first I had added the fallback, but at some point I thought it was unnecessary and deleted it) and that the bounds of the pixel-copy procedure were incorrect, which cost me a whole day of debugging.

Here is a screenshot of PAL in battle mode (I didn't use riscv32-nemu to take the screenshot because it is extremely slow):

pa4

4.1

According to ysyx, I needed to finish rt-thread first. It took 9 days (from 11/28/2023 to 12/6/2023). After that I was preparing the experiments for a paper, and I resumed work on 4/2/2024. I finally finished pa4.1 on 4/6/2024.

Here I just list some bugs I encountered:

rt-thread does not work: The problem was the migration of abstract-machine. I restored its compile environment and it worked.
rt-thread on NPC:
- I had to fflush(stdout) to make the output visible.
- I forgot to modify riscv.h in abstract-machine to make context switching work.
execve with args: The return type of the declaration in syscall.c was mistyped as void; the correct one is Context *.
execve with args not working on pal: I forgot to copy the argv and envp strings to the user stack (yes, copying only the pointers is not enough), which caused the contents of argv and envp to be overwritten by pal's data.
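The last bug is easier to see in code. Below is a rough sketch of laying out argc, argv[] and the argument strings at the top of a stack. It is not the nanos-lite loader: `push_args` is a hypothetical helper, envp is left out for brevity, and the 16-argument limit exists only to keep the example short.

```c
#include <stdint.h>
#include <string.h>

/* Build [argc][argv[0..argc-1]][NULL] below stack_top, with the strings
 * themselves copied just under stack_top. Returns a pointer to argc,
 * or NULL if there are too many arguments for this sketch. */
uintptr_t *push_args(char *stack_top, int argc, char *const argv[]) {
  char *copied[16];                      /* hypothetical limit for the sketch */
  char *sp = stack_top;
  if (argc > 16) return NULL;

  for (int i = 0; i < argc; i++) {
    size_t len = strlen(argv[i]) + 1;    /* include the terminating NUL */
    sp -= len;
    memcpy(sp, argv[i], len);            /* copy the bytes, not just the pointer */
    copied[i] = sp;
  }

  /* Align down so the pointer slots are naturally aligned. */
  uintptr_t *p = (uintptr_t *)((uintptr_t)sp & ~(uintptr_t)(sizeof(uintptr_t) - 1));
  *--p = 0;                              /* argv[argc] = NULL */
  for (int i = argc - 1; i >= 0; i--) *--p = (uintptr_t)copied[i];
  *--p = (uintptr_t)argc;
  return p;
}
```

If only the original pointers were stored, they would still point into the old program's memory, which the loader is about to overwrite with the new program's image.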

4.2

Between 4/9/2024 and 4/16/2024, I mainly spent time finishing my paper. After that, I continued working on the rest of PA4. PA4.2 is mainly about the paging mechanism; here are some points worth mentioning:

You will need the RISC-V manual (privileged version) to understand the paging mechanism in Sv32 and the usage of the cpu.satp register. The content from ChatGPT is not always reliable.
I forgot to dereference the as.area.end pointer, which caused the contents of some pages to overlap and resulted in hard-to-resolve bugs.
Not testing enough here made it more difficult to resolve bugs in PA4.3.
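As a reminder of what the Sv32 walk looks like, here is a simplified sketch. The assumptions are mine: physical memory is modeled as a word array starting at physical address 0, physical addresses are truncated to 32 bits, and superpages and permission bits are ignored. `sv32_translate` is a hypothetical name, not a NEMU function.

```c
#include <stdint.h>

#define PTE_V      0x1u   /* valid bit of a page table entry */
#define PAGE_SHIFT 12

/* Walk a two-level Sv32 page table. Returns 0 and stores the physical
 * address in *paddr on success, or -1 if an entry is not valid. */
int sv32_translate(const uint32_t *pmem, uint32_t satp,
                   uint32_t vaddr, uint32_t *paddr) {
  uint32_t vpn1 = (vaddr >> 22) & 0x3ff;             /* index into the root table */
  uint32_t vpn0 = (vaddr >> 12) & 0x3ff;             /* index into the leaf table */
  uint32_t off  = vaddr & 0xfff;

  uint32_t root = (satp & 0x3fffff) << PAGE_SHIFT;   /* satp.PPN is 22 bits wide */
  uint32_t pte1 = pmem[(root + vpn1 * 4) / 4];
  if (!(pte1 & PTE_V)) return -1;

  uint32_t table = (pte1 >> 10) << PAGE_SHIFT;       /* PTE.PPN starts at bit 10 */
  uint32_t pte0  = pmem[(table + vpn0 * 4) / 4];
  if (!(pte0 & PTE_V)) return -1;

  *paddr = ((pte0 >> 10) << PAGE_SHIFT) | off;
  return 0;
}
```

Note that the PPN field of a PTE is shifted by 10 bits while page addresses are shifted by 12; confusing the two gives translations that look plausible but are wrong.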

I finished PA4.2 on 4/26/2024.

4.3

This holiday I made a promise to finish the whole PA4 before returning to school, and I kept it. I first tried to finish preemptive process scheduling. However, as mentioned above, there were still some unresolved bugs from PA4.2. Preemptive scheduling makes the system, a finite state machine in essence, unpredictable and much harder to debug. Therefore, I temporarily gave up on this part and started working on stack switching instead. Of course, I ran into the same bugs there as well. To rule out possible factors, I created a new git branch to run controlled experiments on them. I finally found these bugs:

mm_brk does not verify whether the memory is virtual memory.
mm_brk is not fully aligned to the page.
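The alignment part can be sketched as follows. This is not the real mm_brk: `pages_to_map` is a hypothetical helper that only counts pages, and it assumes 4 KiB pages and that every page below the rounded-up old break is already mapped.

```c
#include <stdint.h>

#define PGSIZE 4096u

static inline uintptr_t page_round_down(uintptr_t a) {
  return a & ~(uintptr_t)(PGSIZE - 1);
}

static inline uintptr_t page_round_up(uintptr_t a) {
  return page_round_down(a + PGSIZE - 1);
}

/* Number of new pages that must be mapped when the program break
 * grows from old_brk to new_brk (both are exclusive end addresses). */
unsigned pages_to_map(uintptr_t old_brk, uintptr_t new_brk) {
  uintptr_t first = page_round_up(old_brk);   /* first page not yet mapped */
  uintptr_t last  = page_round_up(new_brk);   /* end of the last page needed */
  return last > first ? (unsigned)((last - first) / PGSIZE) : 0;
}
```

Comparing the raw addresses instead of the rounded ones either maps a page twice or skips a page when the break crosses a page boundary.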

A very useful debugging tip: when you memset a range of raw memory, you'd better choose a distinctive value. If the program crashes there later on, it is much easier to locate the bug than when you are chasing a random address.
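A tiny sketch of this tip. The pattern 0xA5 is an arbitrary choice of mine, and `poison_range` and `looks_poisoned` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

#define POISON_BYTE 0xA5   /* arbitrary but easy to recognize in a register dump */

/* Fill freshly allocated raw memory with the poison pattern. */
void poison_range(void *start, size_t len) {
  memset(start, POISON_BYTE, len);
}

/* True if a 32-bit value looks like it was read from poisoned memory. */
int looks_poisoned(uint32_t v) {
  return v == 0xA5A5A5A5u;
}
```

If the emulator later reports a fault at an address like 0xa5a5a5a5, that points directly at a use of memory that was allocated but never initialized.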

And I finally understand why yzh says PA4.2 is the most difficult part: there really are a bunch of details you need to care about.

Another bug was due to my carelessness when translating the pseudo C code into RISC-V assembly:

forgot to zero mscratch in am_asm_trap.

After working out stack switching, switching the foreground program was easy to implement. I then went back to finish the preemptive part. At first, the program never reached the IRQ_TIMER part. The reason was simply that

I hadn't set cpu.pc to cpu.mtvec.

However, the program still crashed after running for some time. I then reviewed the exception handling procedure from PA3 and finally figured out that

I hadn't set cpu.mepc to cpu.pc, either.
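Put together, the two missing assignments amount to the following. This is a simplified sketch with a hypothetical `cpu_state` struct: it leaves out the mstatus MIE/MPIE handling, and in NEMU isa_raise_intr() returns the handler address instead of writing the pc directly.

```c
#include <stdint.h>

/* Hypothetical, reduced CPU state for illustration. */
typedef struct {
  uint32_t pc, mepc, mcause, mtvec;
} cpu_state;

/* Trap entry: remember where we were, record the cause, jump to the handler. */
void raise_intr(cpu_state *cpu, uint32_t cause, uint32_t epc) {
  cpu->mepc   = epc;         /* without this, mret returns to a stale address */
  cpu->mcause = cause;
  cpu->pc     = cpu->mtvec;  /* without this, the handler is never reached */
}

/* mret: resume at the saved address. */
void do_mret(cpu_state *cpu) {
  cpu->pc = cpu->mepc;
}
```

For a timer interrupt the saved address is the instruction that was about to execute, so the interrupted program continues exactly where it left off after mret.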

After fixing this, the problem was finally solved, and the story of the whole PA came to an end on 5/4/2024.

ysyx C stage testimonial (Updated on 2025/1/13)

As I mentioned in my blog, I wrote my first git log for PA on April 20, 2023, during the second semester of my sophomore year. At that time, I realized that my university's professional courses weren't teaching me much, and I began to reflect on the differences between me and my non-major classmates. Meanwhile, my classmates gradually entered the lab and began their research training. I firmly believe that undergraduate studies are a time to explore your interests, improve your abilities, and broaden your horizons. There's no real need to immediately engage in trial and error at the forefront of a discipline.

Of course, I understand the motivations of those who enter the lab early: perhaps some are genuinely interested in research, but most are motivated by the desire to secure admission to graduate school or to study abroad. This is an inevitable result of the deteriorating academic environment. Although my grades are in the top 5% of my university, I'm sometimes influenced by comments from my supervisors like, "Publish a C paper and you'll be ahead of me!" This prevents me from fully dedicating myself to writing every line of code. Even so, despite the pressure of further studies, I keep writing git logs.

The worst happened in the second semester of my junior year. One of my classmates, L, dragged me and another classmate, W, into the OS Kernel Challenge. We ultimately won third place nationally, which should have been a happy occasion. However, I was struck by W's exceptional architectural skills (he built an OS that met the competition requirements from scratch, without any reference material). Although I had diligently cultivated my coding skills since high school, after four or five years of training I still couldn't come close to him. This was a significant setback.

But it wasn't entirely without improvement. As an OSI player who retired in my sophomore year, I had barely written any Python code at the beginning of college. However, when forced to participate in an information security competition in the second semester of my junior year, I was able to write thousands of lines of Python code in a month and even wrote a corresponding paper in just seven days to "get the job done."

Although this paper was ultimately rejected due to poor writing, it at least proved that I possessed the skills of an "average" student, didn't it? And there were opportunities to resubmit later. Then, our cryptography professor went over the GMW protocol's execution process in class. Intrigued, I spent two or three days implementing it in code. The professor was impressed by my code, and I also felt her personality was excellent—so she became my graduate advisor.

So, I slowly figured out the answer to the question at the beginning of this article, "What's the difference between you and non-professional students?"—you are a person, not a tool. It's time for me to gradually transition from honing my skills to using my existing abilities to pursue my passions. I also realized I was better suited to research than engineering.

A year later, I picked up my code again—how could it be so terrible? But I'm a single-threaded person, and in pursuit of my passion, I might have to let go of some obsessions. Working towards a specific goal only makes me more anxious, and that only adds to the pile of shit.

So, maybe we'll meet again someday.