Encryption/Decryption Learning Notes

Compared to CSAPP, "Encryption and Decryption" leans more towards practical application, with rich but somewhat scattered content. The assembly code syntax differs from CSAPP, using Intel syntax.

Basic Knowledge

Introductions to topics such as APIs, Unicode, and Little-endian are omitted here.

(However, placing Win32 API and WOW64 in this section feels quite discouraging. Oh well, I'll just refer to the documentation when needed in the future.)

Dynamic Analysis Techniques

OllyDbg
(Learned about hardware breakpoints, message breakpoints, conditional breakpoints, memory breakpoints, and the trace function)
x64dbg
~~MDebug~~
~~WinDbg~~

Static Analysis Techniques

PEiD/ExeinfoPE
~~ODDisasm, BeaEngine, Udis86, Capstone, AsmJit, Keystone~~
IDA is the GOAT!
(IDA offers a feature I wasn't aware of before: pressing F1 on a system API brings up its usage. However, this only supports .hlp files, so it's practically useless.)
WinHex/010editor

Reverse Engineering Techniques

32-bit Software Reverse Engineering Techniques

Startup Function

~~WinAPI centralized area. Skip if you don't understand.~~

Functions

Another chance to review stack calls for the Nth time.

Common calling conventions (VARARG indicates that the number of parameters is uncertain):

①: Only applicable when the stack balancer is the caller.

Data Structures

Local variables: Stored on the stack.

Global variables: Located in the .data section / cs:xxxx.

Arrays: Accessed via base-plus-index addressing. ~~To understand assembly code for arrays, I frantically reviewed the relevant chapters of CSAPP~~.

Virtual Functions - 1/15/2022

Calling a virtual function: the object (usually allocated by new or malloc) stores a virtual table pointer (VPTR) at its start, which points to the virtual function table (VTBL) holding the addresses of all virtual functions. A call first loads the VPTR from the object, then loads the function address from the VTBL, and finally calls that address indirectly.

Based on the virtual table, the number of virtual functions in the class and the code of the virtual functions can be restored.

Question: In the assembly code at address 0040101B, why is eax = *VTBL = **Add(), instead of eax = *VTBL = &Add()?

Control Statements — 1/17/2022

(Most of this has been covered in CS:APP; no particularly important points here.)

a & (-b) where b is a power of 2 is equivalent to $⌊ \frac{a}{b} ⌋ * b$
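
As a quick sanity check of the identity above, here is a minimal Python sketch. The helper name `align_down` is hypothetical and only used for illustration.

```python
def align_down(a, b):
    """Round a down to a multiple of b, where b must be a power of two."""
    assert b > 0 and b & (b - 1) == 0, "b must be a power of two"
    return a & -b

# Compare the bit trick with floor(a / b) * b for several powers of two.
for b in (1, 2, 8, 4096):
    for a in range(-100, 100):
        assert align_down(a, b) == (a // b) * b

print(hex(align_down(0x12345, 0x1000)))  # 0x12000
```

This is the same pattern compilers use to align stack pointers and buffer sizes.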

sbb A, B instruction: A = A - B - CF

Loop Statements — 1/23/2022

As explained in CS:APP, the essence is a jump from a higher address to a lower address.

Corrections:

In both schemes, i < 5 should be changed to i <= 5.

The comment at address 0x40102E in the unoptimized code should read "from higher to lower address region."

Mathematical Operators — 1/23/2022

Addition and subtraction are accelerated using the lea instruction.

Multiplication is accelerated using shift instructions.

For division by a known constant, the dividend is multiplied by a precomputed constant (similar to a modular inverse) and the high bits of the product are taken. In particular, if the divisor is a power of two, a right shift is used directly.

If the dividend is negative, the result is incremented by 1, because an arithmetic right shift rounds toward negative infinity while C division rounds toward zero.

Text String——1/23/2022

The commonly used C string ends with '\0'. Other types include DOS strings (ending with $), PASCAL strings (with a single-byte ANSI character at the beginning indicating the length), and Delphi strings (with two-byte or four-byte length prefixes).

If an instruction like mov ecx, FFFFFFFF appears, it often indicates that the program is obtaining the length of a string. The corresponding assembly code is as follows.
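
The original listing is not reproduced in these notes. As a stand-in, the sketch below emulates the commonly seen `repne scasb` length idiom in Python. The instruction sequence in the comments and the function name `strlen_via_scasb` are assumptions for illustration, not the book's exact code.

```python
MASK32 = 0xFFFFFFFF

def strlen_via_scasb(memory, edi=0):
    """Emulate: mov ecx,-1 / xor eax,eax / repne scasb / not ecx / dec ecx."""
    ecx = MASK32                  # mov ecx, 0FFFFFFFFh
    al = 0                        # xor eax, eax (search for the NUL byte)
    while ecx != 0:               # repne scasb
        byte = memory[edi]
        edi += 1
        ecx = (ecx - 1) & MASK32  # ecx is decremented for every scanned byte
        if byte == al:            # stop once the terminator has been scanned
            break
    ecx = ~ecx & MASK32           # not ecx -> length + 1
    return (ecx - 1) & MASK32     # dec ecx -> length

print(strlen_via_scasb(b"hello\x00junk"))  # 5
```

The counter starts at -1 so that, after scanning the string plus its terminator, inverting and decrementing it yields the length.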

Instruction Modification Techniques — 1/24/2022

Not much to say, so I'll just post a summary image directly.

64-bit Software Reverse Engineering — 1/24/2022

Registers and Loop Statements

There is only one calling convention, with the following rules (which differ from CSAPP again):
- The first four parameters are passed via registers, and the rest are pushed onto the stack.
- The order of register usage is: rcx, rdx, r8, r9 (xmm0~xmm3 for floating-point values).
- The order of stack parameters: from right to left.
- The stack reserves 32 bytes of space for these four parameters, even if only two are passed.
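The rep instruction: Repeats the following string instruction, decrementing ecx after each iteration, until ecx reaches zero. The count is checked before each iteration, so if ecx is 0 on entry the instruction is not executed at all (it is the loop instruction that decrements first and therefore runs $2^{32}$ times when ecx starts at 0).
The stos instruction (here stosd): Stores the data in eax into the memory address pointed to by edi, then advances edi by 4 bytes (incrementing when the direction flag is clear). It is often used with the rep instruction to fill stack space with 0xcccccccc ("烫烫烫").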
The movsb instruction: Copies data from the memory location pointed to by esi to the memory location pointed to by edi, then automatically increments both registers by 1 byte (due to the 'b' suffix). It is often used with the rep instruction to copy arrays, structs, etc.
For struct parameter passing: If the struct size is ≤ 8 bytes, it is passed directly via registers. If larger, the address of the struct is passed, and its contents are accessed via offsets.
Member function calls in classes have one additional parameter compared to regular functions: the this pointer.

What is __security_check_cookie?

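When /GS is enabled, the function prologue XORs the global __security_cookie with rsp and stores the result on the stack, between the local buffers and the return address. Before returning, the epilogue XORs the stored value with rsp again and passes it to __security_check_cookie, which compares it with __security_cookie and terminates the program on a mismatch. Because the cookie is randomized at startup, its value cannot be predicted, which makes this an effective way to detect stack buffer overflows. (Filling unused stack space with 0xcc is a separate debug-build feature.)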

What does this statement mean? (P150)

mov eax, ds:(jpt_140001060 - 140000000h)[rcx+rax*4]

An operation like (a)[b+c*d] is similar to a(b,c,d) in AT&T syntax, meaning b + c*d + a. Here, it implements a jump table entry operation, preparing for branch jumps in a switch statement.

Besides jump tables, switch statements can also be optimized using decision trees.
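The machine code for an indirect call through a memory pointer in 64-bit code is FF 15 xx xx xx xx, where the last four bytes are a RIP-relative offset (relative to the next instruction) to the pointer, not an absolute memory address as in 32-bit code.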

Mathematical Operators

Various optimization techniques for division
- For divisors of the form $2^{n}$ , the formula is x >> n (when the dividend is positive) and (x + 2**n - 1) >> n (when the dividend is negative).
- For divisors of the form $- 2^{n}$ , it is the same as above, but the result must be negated.
- For divisors not of the form $2^{n}$ , there are two optimization methods. Here, we take 64-bit as an example.
  - The first is x * c >> 64 >> n (when the dividend is positive) and (x * c >> 64 >> n) + 1 (when the dividend is negative). Here, c is a positive number, and n may be 0.
  - The second is (x * c >> 64) + x >> n (when the dividend is positive) and ((x * c >> 64) + x >> n) + 1 (when the dividend is negative). Here, c is a negative number.
  - The divisor can be calculated using $o = \frac{2^{n + 64}}{c}$ , where $c$ is the magic_num.
A brief proof for the second formula (let $o$ be the divisor):
$\frac{x * c + x * 2^{64}}{2^{64 + n}} = \frac{x}{o}$

$o = \frac{x * 2^{64 + n}}{x * c + x * 2^{64}}$

$o = \frac{2^{64 + n}}{c + 2^{64}}$

Since $c$ is negative here, $c + 2^{64}$ is simply the same bit pattern read as an unsigned number, so $o = \frac{2^{64 + n}}{c}$ with $c$ taken as unsigned, which reduces to the case of the first formula. The rest of the process is omitted.

Q: Why add x?

Because adding x is equivalent to adding $2^{32}$ to $c$ . Previously, $c$ was 0x92492493. After converting to signed, it becomes 0x92492493-0x100000000, and adding $2^{32}$ restores it to its original value. Without adding x, the result would be negative.
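
To make the role of the extra `+ x` concrete, here is a minimal 32-bit sketch for signed division by 7 with the magic number 0x92492493 mentioned above. The shift amount n = 2 and all helper names are assumptions for illustration, not taken from the book.

```python
MAGIC = 0x92492493   # magic number from the note above
SHIFT = 2            # n, assumed here for the divisor 7

def to_signed32(v):
    """Reinterpret a 32-bit pattern as a signed integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def div7(x):
    c = to_signed32(MAGIC)           # imul sees c as a negative number
    hi = (x * c) >> 32               # high 32 bits of the signed product
    q = (hi + x) >> SHIFT            # "+ x" adds 2**32 back to c
    return q + (1 if x < 0 else 0)   # +1 when the dividend is negative

def c_div(x, d):
    """C-style division, truncating toward zero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q

for x in list(range(-1000, 1000)) + [-2**31, 2**31 - 1]:
    assert div7(x) == c_div(x, 7)

print(round(2 ** (32 + SHIFT) / MAGIC))  # 7, the recovered divisor
```

Dropping the `+ x` term leaves only the product with the negative `c`, which has the wrong sign.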
- For negative divisors not of the form $- 2^{n}$ , the formulas are essentially the same as above, except that in the first formula, c is negative; in the second formula, c is positive, and the "+" in the middle should be changed to "-". Here, $| o | = \frac{2^{n + 64}}{2^{64} - c}$ , where $c$ is the magic_num.
- For unsigned division, when the divisor is of the form $2^{n}$ , the formula is x >> n.
- For unsigned division, when the divisor is not of the form $2^{n}$ , there are two formulas:
  - The first is x * c >> 64 >> n, where $o = \frac{2^{64 + n}}{c}$ .
  - The second is (((x - (x * c >> 64)) >> n1) + (x * c >> 64)) >> n2, where $o = \frac{2^{64 + n_1 + n_2}}{2^{64} + c}$ , and $c$ is the magic_num.
A brief proof for the second formula (let $o$ be the divisor):

~~I will never prove this in my life.~~
$\frac{\frac{x - \frac{x * c}{2^{64}}}{2^{n_1}} + \frac{x * c}{2^{64}}}{2^{n_2}} = \frac{x}{o}$

$\frac{x - \frac{x * c}{2^{64}} + 2^{n_1} * \frac{x * c}{2^{64}}}{2^{n_1 + n_2}} = \frac{x}{o}$

$\frac{1 - \frac{c}{2^{64}} + 2^{n_1} * \frac{c}{2^{64}}}{2^{n_1 + n_2}} = \frac{1}{o}$

$o = \frac{2^{64 + n_1 + n_2}}{2^{64} - c + c * 2^{n_1}} = \frac{2^{n_1 + n_2 + 64}}{c * (2^{n_1} - 1) + 2^{64}}$

Therefore, the formula given in the book, $o = \frac{2^{64 + n_1 + n_2}}{2^{64} + c}$ , is incorrect and only holds for $n_1 = 1$ . The link below also confirms the correctness of my formula.

https://zneak.github.io/fcd/2017/02/19/divisions.html
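
The second unsigned formula can be tried out with a small 32-bit sketch for division by 7. The magic number 0x24924925 and the shifts n1 = 1, n2 = 2 are assumed values for illustration, and the function names are hypothetical. Because n1 = 1 here, this example is consistent with both the book's formula and the one derived above.

```python
MAGIC = 0x24924925   # assumed 32-bit magic number for unsigned /7
N1, N2 = 1, 2        # assumed shift amounts

def udiv7(x):
    assert 0 <= x < 1 << 32
    hi = (x * MAGIC) >> 32                 # high 32 bits of the product
    return (((x - hi) >> N1) + hi) >> N2   # the second formula

def recovered_divisor(c, n1, n2, bits=32):
    """Divisor estimate: 2^(bits+n1+n2) / (c * (2^n1 - 1) + 2^bits)."""
    return 2 ** (bits + n1 + n2) / (c * (2 ** n1 - 1) + 2 ** bits)

for x in list(range(0, 1000)) + [2**31, 2**32 - 1]:
    assert udiv7(x) == x // 7

print(round(recovered_divisor(MAGIC, N1, N2)))  # 7
```

The intermediate `x - hi` never goes negative, which is why the sequence works without needing a 33-bit register.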
Some optimization techniques for modulo

Note: In C, the sign of the modulo result is the same as the sign of the dividend, and its value is the same as when both operands are positive. This differs from Python.

For example, calculating 5 % 3, (-5) % 3, 5 % (-3), and (-5) % (-3) in C yields 2 -2 2 -2, while in Python it yields 2 1 -1 -2.
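
The difference can be reproduced from Python with a small helper that mimics C's behavior. The name `c_mod` is hypothetical.

```python
def c_mod(a, b):
    """Remainder as computed by C: the sign follows the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

pairs = [(5, 3), (-5, 3), (5, -3), (-5, -3)]
print([c_mod(a, b) for a, b in pairs])  # [2, -2, 2, -2]
print([a % b for a, b in pairs])        # [2, 1, -1, -2]
```

This helper is handy when re-implementing decompiled arithmetic in Python scripts.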
- For divisors of the form $2^{n}$ , there are two optimization methods:
  - The first is x & ((1 << n) - 1) (when the dividend is positive) and ((x & ((1 << n) - 1)) - 1 | (~ ((1 << n) - 1))) + 1 (when the dividend is negative). Here, subtracting 1 and then adding 1 is to handle the special case where the remainder is 0.
  - When the dividend is negative, another optimization formula is ((x + ((1 << n) - 1)) & ((1 << n) - 1)) - ((1 << n) - 1). (In my own tests, compilers mostly use this formula for optimization.)
Explanation of the rationality of the two formulas:

Formula 1:

For a negative dividend, C returns a negative remainder, so compared with the positive case the upper $64 - n$ bits simply have to be set to 1.

Therefore, | (~ ((1 << n) - 1)) does exactly that.

Actually, the preceding & ((1 << n) - 1) can be omitted, because the OR overwrites the upper bits anyway, but to reduce condition checks, it is unified with the positive case.

When the last $n$ bits of $x$ are $0$ , if treated as negative and the first $64 - n$ bits are all set to $1$ , the result would be $- 2^{n}$ .

By subtracting $1$ , the last $n$ bits of $x$ are set to $1$ , and after setting the first $64 - n$ bits to $1$ and then adding $1$ , the result becomes $0$ .

Generally, if you subtract $a$ at the beginning and add it back at the end, the final modulo range becomes $(- 2^{n} + a)$ ~ $(- 1 + a)$ , that is, $(- 4 + a)$ ~ $(- 1 + a)$ for $n = 2$ .

Formula 2:

Take $n = 2$ as an example:

Observation shows that when x < 0, the element in column $c$ of the second row is the value of the element in column $c + 3$ of the third row minus $3$ . Thus, by generalization, the above formula is derived.
- When the divisor is not of the form $2^{n}$ , the optimization method is x - x / c * c, where $c$ is the divisor.
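
Both power-of-two formulas can be checked against C-style remainders with a short sketch. Python integers behave like infinitely wide two's complement values for `&`, `|` and `~`, so the formulas carry over unchanged. All function names are hypothetical.

```python
def c_mod(x, m):
    """Remainder as computed by C for a positive divisor m."""
    r = abs(x) % m
    return -r if x < 0 else r

def mod_pow2_formula1(x, n):
    mask = (1 << n) - 1
    if x >= 0:
        return x & mask
    return (((x & mask) - 1) | ~mask) + 1

def mod_pow2_formula2(x, n):
    mask = (1 << n) - 1
    if x >= 0:
        return x & mask
    return ((x + mask) & mask) - mask

for n in range(1, 7):
    for x in range(-200, 200):
        expected = c_mod(x, 1 << n)
        assert mod_pow2_formula1(x, n) == expected
        assert mod_pow2_formula2(x, n) == expected

print(mod_pow2_formula1(-5, 2), mod_pow2_formula2(-5, 2))  # -1 -1
```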

Virtual Functions

Virtualizing the destructor results in the compiler generating a regular destructor and a destructor with delete in the virtual table.
To prevent virtual destructors from freeing memory multiple times, VC++ adds a parameter to the destructor (if the parameter is 1, it frees the memory), while GCC places two destructor addresses in the virtual table.
Memory layout of a single object:
Constructor call order (can serve as a basis for reconstructing class inheritance hierarchies):
1. Call the virtual base class constructor (multiple calls in inheritance order).
2. Call the ordinary base class constructor (multiple calls in inheritance order).
3. Call the constructors of object members (multiple calls in definition order).
4. Call the derived class constructor.
The destructor call order is the reverse.
Virtual table population process for derived classes (can serve as a basis for reconstructing class inheritance hierarchies):
1. Copy the base class's virtual table.
2. If a derived class virtual function overrides a base class virtual function, replace the corresponding entry with the derived class's virtual function address.
3. If the derived class has new virtual functions, append them to the end of the virtual table.
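
The three population steps can be modeled with plain lists. This is only a conceptual sketch: the `(function name, implementing class)` tuples and the function `build_vtable` are hypothetical stand-ins for real function addresses.

```python
def build_vtable(base_vtable, derived_name, overrides, new_virtuals):
    vtable = list(base_vtable)                   # step 1: copy the base table
    for i, (name, _owner) in enumerate(vtable):  # step 2: replace overridden slots
        if name in overrides:
            vtable[i] = (name, derived_name)
    for name in new_virtuals:                    # step 3: append new virtuals
        vtable.append((name, derived_name))
    return vtable

base = [("show", "Base"), ("area", "Base")]
derived = build_vtable(base, "Derived", overrides={"area"}, new_virtuals=["scale"])
print(derived)
# [('show', 'Base'), ('area', 'Derived'), ('scale', 'Derived')]
```

When reversing, slots that still hold a base-class address and tables that share a common prefix are the clues that link a derived class to its base.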
Memory layout of single inheritance objects:
A characteristic of multiple inheritance is that the constructor performs two operations to initialize the virtual table.
Memory layout of multiple inheritance objects:
To prevent memory redundancy of the base class in diamond inheritance, virtual inheritance (class <derived> : virtual public <base>) is used. The specific implementation involves passing an additional parameter in the constructor to indicate whether to call the virtual base class constructor.
To facilitate locating the position of the virtual base class in the object's memory, an 8-byte virtual base class offset table (located in the global data area) is created, with the last 4 bytes indicating the offset of the virtual base class relative to the virtual base table pointer stored in the object.
Another way to determine virtual inheritance is whether the constructor initializes the virtual base class offset table.
Memory layout of diamond inheritance objects ~~(gradually losing sanity)~~:
In IDA, vftable represents the virtual table, while vbtable represents the virtual base class offset table. IDA can even automatically indicate which class a vftable/vbtable address points to, which is quite intelligent.
The only difference between the virtual table of an abstract base class and that of an ordinary single-inheritance class is that the entries for pure virtual functions point to _purecall, which displays an error message and exits the program. If a class's virtual table contains a _purecall entry, the class can be suspected to be abstract.

Demo Version Protection Techniques

Serial Number Protection Methods - 2/4/2022

APIs for reading registration codes: GetWindowText, GetDlgItemText, GetDlgItemInt, etc.
APIs for displaying registration code correctness: MessageBox, MessageBoxEx, ShowWindow, CreateDialogParam, DialogBoxParam, etc.
For protection methods that use plaintext comparison of registration codes, you can open the memory window in OllyDbg and press Ctrl+B to search for the entered serial number (to locate the memory address of the input). In most cases, the actual serial number is located within ±90h bytes around this address.
The Asm2Clipboard plugin in OllyDbg can be used to extract disassembly and embed it into C code. When converting, pay attention to stack balance, data formats, assembly syntax, and string references.

Warning Window - 2/7/2022

Window ID Extraction: Use exescope or Resource Hacker.
The prototype of DialogBoxParam is as follows:

int DialogBoxParam(
    HINSTANCE hInstance,
    LPCTSTR lpTemplateName,
    HWND hWndParent,
    DLGPROC lpDialogFunc,
    LPARAM dwInitParam
);

Two methods to remove the warning window:
- Use assembly to skip the warning window.
- Replace the parameters of the warning window with those of a normal window.

Time Limit – 2/8/2022

Common timer functions: SetTimer(), KillTimer(), timeSetEvent(), GetTickCount(), timeGetTime().
Common functions for retrieving time: GetSystemTime(), GetLocalTime(), GetFileTime().
Two methods to bypass time restrictions:
- Skip time-related functions.
- Skip functions that check for timeout and trigger exit.
Speed Gear can be used to assist in debugging (not yet successful).

Related functions:
- The EnableMenuItem() function is defined as BOOL EnableMenuItem(HMENU hMenu, UINT uIDEnableItem, UINT uEnable). The uEnable parameter includes options such as MF_ENABLED (0h), MF_GRAYED (1h), MF_DISABLED (2h), MF_COMMAND, and MF_BYPOSITION.
- The EnableWindow() function is defined as BOOL EnableWindow(HWND hWnd, BOOL bEnable). The return value reports the previous state rather than success: it is non-zero if the window was previously disabled, and 0 if it was not.
Method to remove restrictions (only applicable when the full version and trial version files are identical): Modify the Enable parameter passed during the push calls of these two functions.

KeyFile Protection——2/10/2022

Related Functions:
LODS instruction: lods byte ptr [esi] (lodsb) loads the byte pointed to by esi into al, then advances esi by 1 (incrementing when the direction flag is clear).
Analysis Approach:
1. Use Process Monitor to monitor the program's file operations to identify the KeyFile's filename.
2. Edit the KeyFile using a hex editor.
3. Set a breakpoint on CreateFile in the debugger to check the pointer to the opened filename and note the returned handle.
4. Set a breakpoint on the ReadFile function to analyze the file handle passed to ReadFile and the buffer address. The file handle is usually the same as the one returned in step 3. Set a memory breakpoint on the bytes stored in the buffer to monitor the content read from the KeyFile.

Network Verification - 2/12/2022

Related Functions:

send() function, extended by Microsoft as WSASend()

int send(
    SOCKET s,
    const char FAR *buf,
    int len,
    int flags
);

recv() function, extended by Microsoft as WSARecv()

int recv(
	SOCKET s,
	char FAR *buf,
	int len,
	int flags
);

Analysis Approach:
1. Analyze the sent and received data packets. (Key step)
2. Two methods:
  - Write a server to receive and send data. If the client logs in through a domain name, modify the hosts file; if it connects directly via IP, set a breakpoint on inet_addr or connect and change the IP to the local machine. Alternatively, proxy software can achieve this.
  - Directly modify the client program. First, paste the correct received data packet to a blank address, then skip the send() and recv() functions and replace them with functions that handle the correct data packets. Finally, bypass dialogs such as "Connection Failed."

Question: The getasm.py script does not seem to be compatible with IDA 7.6. How can the following code be reimplemented?

#coding=utf-8
##"Encryption and Decryption" Fourth Edition
##code by DarkNess0ut

import os
import sys

def Getasm(ea_from, ea_to, range1, range2):
    fp = open("code.txt","w")
    ea = ea_from
    while ea < ea_to:
        cmd = GetMnem(ea)
        if cmd == "mov" or cmd == "lea":
            opcode = Dword(NextNotTail(ea)-4)
            if opcode < 0: #opcode < 0, handles instructions like mov edx, [ebp-350]; otherwise, handles mov edx, [ebp+350]
                opcode = (~opcode + 1)
            Message("-> %08X %08X\n" % (ea, opcode))

            if range1 <= opcode <= range2:
                delta = opcode - range1
                MakeComm(ea, "// +0x%04X" % delta) # Add comment to IDA
                fp.write("%08X %s\n" % (ea, GetDisasm(ea)))
        ea = NextNotTail(ea)
    fp.close()
    Message("OK!")
Getasm(0x401000,0x40F951,0x41AE68,0x0041AEC1);

CD Detection - 2/11/2022

Related Functions:

GetDriveType(), retrieves the type of a disk drive

UINT GetDriveType(
    LPCTSTR lpRootPathName
);
/*
Return values:
0: Drive type cannot be determined.
1: Root path does not exist.
2: Removable storage.
3: Fixed drive (hard disk).
4: Remote drive (network).
5: CD-ROM drive.
6: RAM disk.
*/

GetLogicalDrives(), retrieves logical drive letters, no parameters

/*
Return values:
Returns 0 on failure; otherwise, returns a bitmask representing currently available drives, e.g.,
bit 0		drive A
bit 1		drive B
bit 2		drive C
......
*/
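
Decoding the returned bitmask is a one-liner. The sketch below works on a plain integer, so it runs anywhere; the function name `drives_from_mask` is hypothetical.

```python
import string

def drives_from_mask(mask):
    """Decode a GetLogicalDrives()-style bitmask into drive root paths."""
    return [letter + ":\\"
            for bit, letter in enumerate(string.ascii_uppercase)
            if mask & (1 << bit)]

print(drives_from_mask(0b1100))  # ['C:\\', 'D:\\']
```

A CD check typically feeds each of these root paths to GetDriveType() and looks for the return value 5.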

GetLogicalDriveStrings(), retrieves root drive paths of logical drives

DWORD GetLogicalDriveStrings(
    DWORD nBufferLength,
    LPTSTR lpBuffer
);
/*
Return values:
Returns 0 on failure
Returns the actual number of characters on success
*/

GetFileAttributes(), determines the attributes of a specified file

DWORD GetFileAttributes(
    LPCTSTR lpFileName
);

Analysis Methods:
- For simpler CD detection (first obtain all drive lists, then check the type of each drive; if it is a CD-ROM drive, use CreateFile() or FindFirstFile() to check for file existence, attributes, size, content, etc.), simply set breakpoints using the above functions, locate where the CD drive is checked, and modify the conditional instructions.
- For enhanced types (where critical data for the program is stored on the CD), multiple copies can be made using burning tools, or virtual drive programs can be used to simulate the original CD (among which Daemon Tools~~ is not only free~~ but can also simulate some encrypted CDs).

Running Only One Instance - 2/11/2022

Implementation Methods:
1. Window Lookup Method: If a window with the same class name and title is found, exit the program. Implemented using FindWindowA() and GetWindowText().
```
HWND FindWindowA(
    LPCTSTR lpClassName,
    LPCTSTR lpWindowName
);
// Returns 0 if no matching window is found.
```
2. Using Mutex Objects: Generally implemented with CreateMutexA(), which creates a named or unnamed mutex object ~~(what is this?)~~.
```
HANDLE CreateMutexA(
    LPSECURITY_ATTRIBUTES lpMutexAttributes, // Security attributes
    BOOL bInitialOwner, // Initial ownership of the mutex
    LPCTSTR lpName // Pointer to the mutex name
);
// If the function succeeds, it returns a handle to the mutex object.
```
3. Using Shared Sections (Section). This section has read, write, and shared protection attributes, allowing multiple instances to share the same memory block. Place a variable as a counter in this section, and all instances of the application can share this variable, thereby determining whether another instance is already running.
Bypass Methods:
- Modify the application's window title.
- Alter the return values of functions like FindWindow() (or modify the conditional instructions).

Common Breakpoint Setting Techniques——2/11/2022

Mastering Win32 programming techniques is still very important!

Encryption Algorithms

The notes in this section primarily focus on algorithm identification, without delving into the specific processes of the algorithms (except for public key algorithms, as they lack distinctive constants), assembly analysis, and crack operations.

Tools such as IDA's FindCrypt or PEiD's Krypto ANALyzer can be used to assist in algorithm analysis.
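
The basic idea of constant-based identification can be sketched in a few lines. This is not how FindCrypt or Krypto ANALyzer are actually implemented; the `SIGNATURES` table (filled with constants listed later in these notes) and the function `scan_constants` are assumptions for illustration.

```python
import struct

SIGNATURES = {
    "MD5/SHA-1 initial state": [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476],
    "SHA-1 round constants": [0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6],
    "TEA family delta": [0x9E3779B9],
    "Blowfish P-array": [0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344],
}

def scan_constants(blob):
    """Report every signature whose constants all occur (little-endian) in blob."""
    hits = []
    for name, constants in SIGNATURES.items():
        if all(struct.pack("<I", c) in blob for c in constants):
            hits.append(name)
    return hits

sample = (b"\x90" * 8
          + struct.pack("<4I", 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
          + b"\xc3")
print(scan_constants(sample))  # ['MD5/SHA-1 initial state']
```

A hit only says that the constants are present; modified variants (for example, MD5 with changed initial values) will slip through such a scan.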

One-Way Hash Algorithm — 2/27/2022

MD5

Important Constants:

Initial message digest: 67452301h, efcdab89h, 98badcfeh, 10325476h.
32-bit values corresponding to floor(2**32 * abs(sin(i))), such as d76aa478h.
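
The sine-derived table can be regenerated directly from the formula above, which helps when checking whether a binary uses the standard constants. The function name `md5_sine_constant` is hypothetical.

```python
import math

def md5_sine_constant(i):
    """Return floor(2**32 * abs(sin(i))) for i = 1..64 (i in radians)."""
    return int((1 << 32) * abs(math.sin(i))) & 0xFFFFFFFF

print(hex(md5_sine_constant(1)))  # 0xd76aa478
table = [md5_sine_constant(i) for i in range(1, 65)]
print(len(table))                 # 64
```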

Possible Variants:

Modify the initial four constants.
Change the padding method of the original string.
Alter the processing steps of the hash transformation.

SHA

SHA-1 constants: 5a827999h, 6ed9eba1h, 8f1bbcdch, ca62c1d6h.

SHA-1 160-bit initial message digest: 67452301h, efcdab89h, 98badcfeh, 10325476h, c3d2e1f0h.

Initial message digests for SHA-256, SHA-384, and SHA-512:

SM3

Publicly available national cryptographic algorithm, process overview: https://zhuanlan.zhihu.com/p/129692191

Possible code implementation: https://blog.csdn.net/a344288106/article/details/80094878

Constants? 79CC4519h, 7A879D8Ah
Initialize message digest?
7380166Fh, 4914B2B9h, 172442D7h, DA8A0600h, A96F30BCh, 163138AAh, E38DEE4Dh, B0FB0E4Eh

Symmetric Encryption Algorithm – 2/28/2022

RC4

Decryption script sourced online:

import base64
def rc4_main(key = "init_key", message = "init_message"):
    print("RC4 decryption main function called successfully")
    print('\n')
    s_box = rc4_init_sbox(key)
    crypt = rc4_excrypt(message, s_box)
    return crypt
def rc4_init_sbox(key):
    s_box = list(range(256))
    print("Original s-box: %s" % s_box)
    print('\n')
    j = 0
    for i in range(256):
        j = (j + s_box[i] + ord(key[i % len(key)])) % 256
        s_box[i], s_box[j] = s_box[j], s_box[i]
    print("Scrambled s-box: %s"% s_box)
    print('\n')
    return s_box
def rc4_excrypt(plain, box):
    print("Decryption program called successfully.")
    print('\n')
    plain = base64.b64decode(plain.encode('utf-8'))
    plain = bytes.decode(plain)
    res = []
    i = j = 0
    for s in plain:
        i = (i + 1) % 256
        j = (j + box[i]) % 256
        box[i], box[j] = box[j], box[i]
        t = (box[i] + box[j]) % 256
        k = box[t]
        res.append(chr(ord(s) ^ k))
    print("res holds the decrypted characters: %r" % (res,))
    print('\n')
    cipher = "".join(res)
    print("Decrypted string: %s" %cipher)
    print('\n')
    print("Decrypted output (without any encoding):")
    print('\n')
    return cipher
a=[0xc6,0x21,0xca,0xbf,0x51,0x43,0x37,0x31,0x75,0xe4,0x8e,0xc0,0x54,0x6f,0x8f,0xee,0xf8,0x5a,0xa2,0xc1,0xeb,0xa5,0x34,0x6d,0x71,0x55,0x8,0x7,0xb2,0xa8,0x2f,0xf4,0x51,0x8e,0xc,0xcc,0x33,0x53,0x31,0x0,0x40,0xd6,0xca,0xec,0xd4]
s=""
for i in a:
    s+=chr(i)
s=str(base64.b64encode(s.encode('utf-8')), 'utf-8')
rc4_main("Nu1Lctf233", s)
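
For binary data taken straight from a dump, a bytes-based variant avoids the string and Base64 round trips. The sketch below is a minimal alternative, not the script above; the function name `rc4` is hypothetical, and the key and plaintext are a commonly published RC4 test vector.

```python
def rc4(key, data):
    """RC4 over bytes; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # pseudo-random generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())         # bbf316e8d940af0ad3
print(rc4(b"Key", ciphertext))  # b'Plaintext'
```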

TEA

The constant 0x9e3779b9 is derived from the golden ratio: it is $⌊ 2^{32} * \frac{\sqrt{5} - 1}{2} ⌋$

(Note: XTEA/XXTEA also use this constant)
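
Both the delta and the starting value of sum used for 32-round decryption can be recomputed in a couple of lines, which is useful for spotting modified variants:

```python
import math

# Fractional part of the golden ratio, scaled to 32 bits.
delta = int((1 << 32) * (math.sqrt(5) - 1) / 2)
print(hex(delta))                      # 0x9e3779b9

# Decryption starts from sum = delta * 32 (mod 2**32).
print(hex((delta * 32) & 0xFFFFFFFF))  # 0xc6ef3720
```

If a binary uses a different delta or round count, the initial sum seen in the decryption routine changes accordingly.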

Decryption script:

#include <stdio.h>
#include <stdint.h>
#define DELTA 0x9e3779b9
#define MX (((z>>5^y<<2) + (y>>3^z<<4)) ^ ((sum^y) + (key[(p&3)^e] ^ z)))

void btea (uint32_t* v,int n, uint32_t* k) { // however the 'n' is useless
	uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i;  /* set up */
	uint32_t delta=0x9e3779b9;                     /* a key schedule constant */
	uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3];   /* cache key */
	for (i=0; i<32; i++) {                         /* basic cycle start */
		v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
		v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
		sum -= delta;
	}                                              /* end cycle */
	v[0]=v0; v[1]=v1;
}

int main()
{
	uint32_t v[2]= {0x3e8947cb,0xcc944639};
	uint32_t w[2]= {0x31358388,0x3b0b6893};
	uint32_t x[2]= {0xda627361,0x3b2e6427};

	uint32_t const k[4]= {17477,16708,16965,17734};
	int n = 2; //The absolute value of n indicates the length of v, positive for encryption, negative for decryption
	// v is the data to be encrypted/decrypted, consisting of two 32-bit unsigned integers
	// k is the encryption/decryption key, consisting of four 32-bit unsigned integers, i.e., a 128-bit key
	btea(v, -n, k);
	printf("%x %x ",v[0],v[1]);
	btea(w, -n, k);
	printf("%x %x ",w[0],w[1]);
	btea(x, -n, k);
	printf("%x %x",x[0],x[1]);
	return 0;
}

IDEA

The 52 decryption subkeys are the inverses of the encryption subkeys: additive inverses modulo $2^{16}$ for the addition steps, and multiplicative inverses modulo ( $2^{16} + 1$ ) for the multiplication steps.

The subkeys should be used in the reverse order of the encryption key.

The decryption code is omitted. Please search for bouncycastle on your own.

BlowFish

Based on the Feistel network.

P-array (derived from the fractional part of Pi):

243f6a88h, 85a308d3h, 13198a2eh, 03707344h

Decryption code omitted.

AES (Rijndael)

Modes of operation include:

ECB (Electronic Code Book) mode
CBC (Cipher Block Chaining) mode
CTR (Counter) mode
CFB (Cipher Feedback) mode
OFB (Output Feedback) mode

Knowing these isn't very useful; you'll still have to try them one by one when the time comes.

S-Box:

~~Online decryption websites can only decrypt their own, not others'.~~

Decryption website: http://tool.chacuo.net/cryptaes

SM4

https://zhuanlan.zhihu.com/p/363900323

S-box:

Values of the system parameters $F K_{i}$ :

Specific values of the 32 fixed parameters $C K_{i}$ :

Refer to the link in the SM2 section for the decryption tool.

Public Key Encryption Algorithm - 3/1/2022

RSA

Generation of Public and Private Keys:
1. First, we arbitrarily choose two prime numbers $p$ and $q$ . Here, we take $p = 33333331$ and $q = 998244353$ , and compute $N = p q = 33274809437429843$ .
2. Using Euler's totient function $φ (x)$ (the number of positive integers less than or equal to $x$ that are coprime to $x$ ), we find $r = φ (N) = φ (p) φ (q) = (p - 1) (q - 1) = 33274808405852160$ .
3. Select an integer $e$ smaller than $r$ such that $e$ is coprime to $r$ . Then, compute the modular inverse $d$ of $e$ modulo $r$ , i.e., $e d \equiv 1 (\mod r)$ . Here, we take $e = 65537$ and obtain $d = 32095869712924673$ .
4. Destroy $p$ and $q$ .
Thus, we obtain the public key $(N, e)$ and the private key $(N, d)$ .
Encrypting Information:

Suppose B wants to send a message $m$ to A, and B knows the $N$ and $e$ generated by A. B converts $m$ into a positive integer $n$ smaller than $N$ using a pre-agreed format (such as Unicode, described below). Then, B encrypts $n$ into $c$ using the following formula:
$c \equiv n^{e} (\mod N)$
$c$ can be computed efficiently using fast exponentiation.
Decrypting Information:

After receiving the message from B, A can use their private key $d$ to decode it. A recovers $n$ from $c$ using the following formula:
$n \equiv c^{d} (\mod N)$
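
The whole flow can be replayed with the toy primes from the key generation steps. This is a textbook sketch without padding, so it is unsuitable for real use; `pow(e, -1, r)` requires Python 3.8 or later, and the sample message is an assumption.

```python
p, q = 33333331, 998244353
N = p * q
r = (p - 1) * (q - 1)         # Euler's totient of N
e = 65537
d = pow(e, -1, r)             # modular inverse of e modulo r

def encrypt(n):
    return pow(n, e, N)       # c = n^e mod N (fast exponentiation)

def decrypt(c):
    return pow(c, d, N)       # n = c^d mod N

n = int.from_bytes(b"hi RSA", "big")   # message encoded as an integer < N
c = encrypt(n)
print(decrypt(c).to_bytes(6, "big"))   # b'hi RSA'
```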

For information on attacks against RSA, which is a specialized topic in cryptography, you can refer to this article.

ElGamal

Key Pair Generation:
1. Select a large prime $p$ , a random number $g$ , and a random number $x$ such that $x \leq p - 2$ and $g < p$ .
2. Compute $y \equiv g^{x} (\mod p)$ .
3. The public key is $(y, g, p)$ , and the private key is $x$ .
Encryption and Decryption:
1. Select a random number $k$ such that $k \leq p - 2$ and $gcd (k, p - 1) = 1$ .
2. Compute $a \equiv g^{k} (\mod p)$ .
3. Compute $b \equiv y^{k} M (\mod p)$ , where $(a, b)$ is the ciphertext.
4. For decryption, compute $M \equiv b / a^{x} (\mod p)$ .
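
The encryption and decryption steps can be sketched with deliberately tiny numbers. The parameters p = 2579, g = 2 and x = 765 are toy values assumed for illustration, and the division by $a^{x}$ is carried out as multiplication by $a^{p - 1 - x}$ (Fermat's little theorem).

```python
import math
import random

p, g = 2579, 2        # toy public parameters (far too small for real use)
x = 765               # private key
y = pow(g, x, p)      # public key component

def encrypt(m):
    while True:       # pick k with gcd(k, p - 1) = 1
        k = random.randrange(1, p - 1)
        if math.gcd(k, p - 1) == 1:
            break
    a = pow(g, k, p)
    b = (pow(y, k, p) * m) % p
    return a, b

def decrypt(a, b):
    # b / a^x mod p, computed as b * a^(p-1-x) mod p
    return (b * pow(a, p - 1 - x, p)) % p

a, b = encrypt(1299)
print(decrypt(a, b))  # 1299
```

Because k is random, the same plaintext encrypts to a different (a, b) pair on every run.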
Signature:
1. Select a random number $k$ such that $k \leq p - 2$ and $gcd (k, p - 1) = 1$ .
2. Compute $a \equiv g^{k} (\mod p)$ .
3. Let the plaintext be $M$ , and find a solution $b$ that satisfies $x a + k b \equiv M (\mod p - 1)$ . It can be proven that such a $b$ is unique. The signature is $(a, b)$ .
4. To verify the signature, ensure that $y^{a} a^{b} \equiv g^{M} (\mod p)$ and $a < p$ .

Attacks on Discrete Logarithms: BSGS, Pollard-Rho, Index-Calculus Algorithm, Pohlig-Hellman Algorithm, etc.

If the same $k$ and private key $x$ are used for encrypting different plaintexts, there are specific attack methods.

DSA—3/2/2022

Used for signing, not for encryption/decryption.

Key pair generation:
1. $p$ is a prime number of $L$ bits. $64 ∣ L$ and $512 \leq L \leq 1024$ , $2^{L - 1} < p < 2^{L}$ .
2. $q$ is a prime factor of $p - 1$ , $2^{159} < q < 2^{160}$ .
3. $g \equiv h^{(p - 1) / q} (\mod p)$ , with $h < p - 1$ , $g > 1$ .
4. $x$ is the private key, $0 < x < q$ .
5. $y \equiv g^{x} (\mod p)$ . $p$ , $q$ , $g$ , and $y$ are public keys.
6. $k$ is a random number, $0 < k < q$ , to be discarded after use.
Signature generation:
1. Input plaintext $M$ , public keys $p$ , $g$ , $q$ , private key $x$ , and random number $k$ .
2. r = g**k % p % q
3. s = inv(k) * (SHA-1(M) + x*r) % q, where inv(k) is the modular multiplicative inverse of $k$ modulo $q$ .
4. The signature is $(r, s)$ .
Signature verification:
1. Input plaintext $M^{'}$ , public keys $p$ , $g$ , $q$ , $y$ , and signature $r^{'}$ , $s^{'}$ .
2. First, ensure $r^{'} < q$ and $s^{'} < q$ .
3. w = inv(s') % q
4. u1 = ((SHA-1(M')) * w) % q
5. u2 = (r' * w) % q
6. v = (g**u1 * y**u2) % p % q
7. If $v = r^{'}$ , the signature verification is successful.
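
The signing and verification steps can be exercised with toy parameters. The values q = 101, p = 7879, h = 3 and x = 75 are assumptions that ignore the size requirements above, and the full SHA-1 digest is simply used as an integer.

```python
import hashlib
import random

q = 101
p = 78 * q + 1                # 7879, a prime with q | p - 1
h = 3
g = pow(h, (p - 1) // q, p)   # element of order q
assert g > 1
x = 75                        # private key
y = pow(g, x, p)              # public key

def digest(message):
    return int.from_bytes(hashlib.sha1(message).digest(), "big")

def sign(message):
    while True:               # retry in the rare case r or s is 0
        k = random.randrange(1, q)
        r = pow(g, k, p) % q
        s = (pow(k, -1, q) * (digest(message) + x * r)) % q
        if r != 0 and s != 0:
            return r, s

def verify(message, r, s):
    if not (0 < r < q and 0 < s < q):
        return False
    w = pow(s, -1, q)
    u1 = (digest(message) * w) % q
    u2 = (r * w) % q
    v = (pow(g, u1, p) * pow(y, u2, p)) % p % q
    return v == r

r, s = sign(b"hello")
print(verify(b"hello", r, s))  # True
```

With such a small q, forged signatures are easy to find by brute force, so this only demonstrates the arithmetic.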

$x$ and $y$ need to be updated periodically for the same reasons as in ElGamal.

ECC with GF(p)——March 3, 2022

Due to the high mathematical knowledge requirements, it is mainly used in Crypto challenges and is less likely to be encountered in Reverse Engineering. Therefore, the specific principles are not provided here.

Algorithm principles from Wikipedia: https://en.wikipedia.org/wiki/Elliptic-curve_cryptography

Algorithm principles + Python implementation from Zhihu: https://zhuanlan.zhihu.com/p/101907402

Potentially useful ECC template:

import collections
import random

EllipticCurve = collections.namedtuple('EllipticCurve', 'name p a b g n h')

curve = EllipticCurve(
   'secp256k1',
   # Field characteristic.
   p=int(input('p=')),
   # Curve coefficients.
   a=int(input('a=')),
   b=int(input('b=')),
   # Base point.
   g=(int(input('Gx=')),
      int(input('Gy='))),
   # Subgroup order.
   n=int(input('k=')),
   # Subgroup cofactor.
   h=1,
)
# Modular arithmetic ##########################################################

def inverse_mod(k, p):
   """Returns the inverse of k modulo p.
  This function returns the only integer x such that (x * k) % p == 1.
  k must be non-zero and p must be a prime.
  """
   if k == 0:
       raise ZeroDivisionError('division by zero')
   if k < 0:
       # k ** -1 = p - (-k) ** -1 (mod p)
       return p - inverse_mod(-k, p)
   # Extended Euclidean algorithm.
   s, old_s = 0, 1
   t, old_t = 1, 0
   r, old_r = p, k

   while r != 0:
       quotient = old_r // r
       old_r, r = r, old_r - quotient * r
       old_s, s = s, old_s - quotient * s
       old_t, t = t, old_t - quotient * t
   gcd, x, y = old_r, old_s, old_t

   assert gcd == 1
   assert (k * x) % p == 1
   return x % p

# Functions that work on curve points #########################################

def is_on_curve(point):
   """Returns True if the given point lies on the elliptic curve."""
   if point is None:
       # None represents the point at infinity.
       return True
   x, y = point
   return (y * y - x * x * x - curve.a * x - curve.b) % curve.p == 0

def point_neg(point):
   """Returns -point."""
   assert is_on_curve(point)
   if point is None:
       # -0 = 0
       return None
   x, y = point
   result = (x, -y % curve.p)
   assert is_on_curve(result)
   return result

def point_add(point1, point2):
   """Returns the result of point1 + point2 according to the group law."""
   assert is_on_curve(point1)
   assert is_on_curve(point2)
   if point1 is None:
       # 0 + point2 = point2
       return point2
   if point2 is None:
       # point1 + 0 = point1
       return point1
   x1, y1 = point1
   x2, y2 = point2

   if x1 == x2 and y1 != y2:
       # point1 + (-point1) = 0
       return None
   if x1 == x2:
       # This is the case point1 == point2.
       m = (3 * x1 * x1 + curve.a) * inverse_mod(2 * y1, curve.p)
   else:
       # This is the case point1 != point2.
       m = (y1 - y2) * inverse_mod(x1 - x2, curve.p)

   x3 = m * m - x1 - x2
   y3 = y1 + m * (x3 - x1)
   result = (x3 % curve.p, -y3 % curve.p)
   assert is_on_curve(result)
   return result

def scalar_mult(k, point):
   """Returns k * point computed using the double-and-add algorithm."""
   assert is_on_curve(point)
   if k < 0:
       # k * point = -k * (-point)
       return scalar_mult(-k, point_neg(point))
   result = None
   addend = point
   while k:
       if k & 1:
           # Add.
           result = point_add(result, addend)
       # Double.
       addend = point_add(addend, addend)
       k >>= 1
   assert is_on_curve(result)
   return result

# Keypair generation and ECDHE ################################################
def make_keypair():
   """Derives the key pair from the scalar k entered above (not random)."""
   private_key = curve.n  # curve.n holds the user-supplied k, see the curve definition
   public_key = scalar_mult(private_key, curve.g)
   return private_key, public_key

private_key, public_key = make_keypair()
print("private key:", hex(private_key))
print("public key: (0x{:x}, 0x{:x})".format(*public_key))
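
Since the template reads every parameter interactively, the group law is easier to sanity-check on a tiny curve. Below is a condensed, self-contained sketch on the toy curve y^2 = x^3 + 2x + 2 over GF(17) with base point G = (5, 1), which has order 19. The curve, the constants, and the short function names are assumptions for illustration and are not part of the template above.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P); None stands for the point at infinity.
P, A, B = 17, 2, 2
G = (5, 1)

def on_curve(pt):
    if pt is None:
        return True
    x, y = pt
    return (y * y - x * x * x - A * x - B) % P == 0

def add(p1, p2):
    """Group law: returns p1 + p2."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                       # p1 + (-p1) = infinity
    if p1 == p2:
        m = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P)     # tangent slope
    else:
        m = (y2 - y1) * pow((x2 - x1) % P, -1, P)          # chord slope
    x3 = (m * m - x1 - x2) % P
    y3 = (m * (x1 - x3) - y1) % P
    return (x3, y3)

def mul(k, pt):
    """Double-and-add scalar multiplication for k >= 0."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

print(mul(2, G))   # (6, 3)
print(mul(3, G))   # (10, 6)
print(mul(19, G))  # None, because G has order 19
```

In a challenge the same two functions work unchanged once P, A, B, and G are replaced with the values recovered from the binary.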

SM2

A Chinese national standard public-key algorithm based on ECC.

SM2~SM4 encryption and decryption tool: https://github.com/ASTARCHEN/snowland-smx-python

Other Algorithms - 3/1/2022

CRC32

CRC32 is a checksum: it is only suitable for integrity checks such as file verification, not for encryption.

The key to spotting it is recognizing the crctab lookup table generated during initialization.

The algorithm is just a few lines of code:

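#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t crctab[256]; // unsigned, so that >> never sign-extends

// Build the lookup table for the reflected polynomial 0xEDB88320,
// the bit-reversed form of 0x04C11DB7 (the MSB-first variant shifts left
// and uses 0x04C11DB7 directly).
void gentable(void) {
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int j = 0; j < 8; j++) {
            if (crc & 1)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
        crctab[i] = crc;
    }
}

int main(void)
{
    // Data holds the bytes of your file; a sample string is used here.
    const unsigned char Data[] = "123456789";
    size_t Len = strlen((const char *)Data);

    gentable();
    uint32_t dwCRC = 0xFFFFFFFFu;
    for (size_t i = 0; i < Len; i++) {
        dwCRC = crctab[(dwCRC ^ Data[i]) & 0xFF] ^ (dwCRC >> 8);
    }
    dwCRC = ~dwCRC;
    printf("%08X\n", (unsigned)dwCRC); // CBF43926 for "123456789"
    return 0;
}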
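
For cross-checking a CRC32 routine recovered from a binary, the same table-driven algorithm is quick to reproduce in Python and compare against zlib.crc32. This is a minimal sketch; the names make_table, CRCTAB, and crc32 are my own. The first table entries (0x00000000, 0x77073096, 0xEE0E612C) are also a handy signature when hunting for the table in a disassembly.

```python
import zlib

def make_table():
    """Builds the 256-entry table for the reflected polynomial 0xEDB88320."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table

CRCTAB = make_table()

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF                      # initial value
    for byte in data:
        crc = CRCTAB[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF               # final inversion

print(hex(crc32(b"123456789")))           # 0xcbf43926
print(crc32(b"hello") == zlib.crc32(b"hello"))
```

If a challenge binary produces a different value for the same input, the polynomial, the initial value, or the final inversion has probably been altered.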

Base64

Since the encoding table might be changed during competitions, or even some logic of the algorithm might be modified, it is necessary to understand its implementation process.

Because $3 \times 8 = 4 \times 6$ and $64 = 2^{6}$ , the core idea of Base64 is to map three bytes of data to four bytes using a code table.

The mapping method is straightforward: list the 24 bits of the three bytes as a binary string (in big-endian order), then divide it into four groups of 6 bits each.

Each 6 bits can only represent values from 0 to 63, which correspond to the 64 characters in the Base64 table (array). These bits are replaced by their corresponding characters.

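To keep the encoded length a multiple of 4 when the input length is not a multiple of 3, the last partial 6-bit group is filled with zero bits, and each 6-bit group that has no input bits at all is written as = (note that this differs from a group whose value is 0, which maps to the first character of the table).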

An example is shown in the figure below:

Core code:

// i indexes the output res and j the input str, so len is the encoded (output) length.
// str should be unsigned char* so that the right shifts do not sign-extend.
for(i=0,j=0;i<len-2;j+=3,i+=4)
{
    res[i]=base64_table[str[j]>>2]; // Extract the first 6 bits of the first byte and find the corresponding character
    res[i+1]=base64_table[(str[j]&0x3)<<4 | (str[j+1]>>4)]; // Combine the last 2 bits of the first byte with the first 4 bits of the second byte and find the corresponding character
    res[i+2]=base64_table[(str[j+1]&0xf)<<2 | (str[j+2]>>6)]; // Combine the last 4 bits of the second byte with the first 2 bits of the third byte and find the corresponding character
    res[i+3]=base64_table[str[j+2]&0x3f]; // Extract the last 6 bits of the third byte and find the corresponding character
}
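
Because competitions often swap the table, it helps to keep a small encoder and decoder that take the table as a parameter. The sketch below follows the 3-byte to 4-character mapping described above; the function names and the swapped-case sample table are assumptions for illustration, and decoding simply translates the custom alphabet back to the standard one before calling the standard library.

```python
import base64

STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode(data: bytes, table: str = STD) -> str:
    out = []
    for j in range(0, len(data), 3):
        chunk = data[j:j + 3]
        # Zero-fill the chunk to 3 bytes and view it as one 24-bit big-endian number.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        chars = [table[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of c bytes yields c + 1 real characters; the rest becomes '='.
        out.append("".join(chars[:len(chunk) + 1]) + "=" * (3 - len(chunk)))
    return "".join(out)

def b64decode(text: str, table: str = STD) -> bytes:
    # Map the custom alphabet back to the standard one, then reuse the stdlib decoder.
    return base64.b64decode(text.translate(str.maketrans(table, STD)))

custom = STD.swapcase()                   # hypothetical modified table
print(b64encode(b"Man"))                  # TWFu
print(b64encode(b"Ma"))                   # TWE=
print(b64encode(b"flag{demo}", custom))
print(b64decode(b64encode(b"flag{demo}", custom), custom))
```

If the binary also changes the bit-packing logic rather than just the table, only the shift list inside b64encode needs adjusting.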

Common Encryption Library Interfaces and Their Identification - 3/3/2022

Miracl, FGInt, Crypto++, OpenSSL, and more.

Application of Encryption Algorithms in Software Protection - 3/3/2022

~~Many software products demonstrate that security and user experience are often contradictory.~~

Do not rely on self-designed algorithms.
Use mature and highly secure cryptographic algorithms whenever possible.
Regularly update encryption keys.
Update algorithms or security mechanisms periodically, if cost permits.
Strictly follow standard-recommended security parameters and use standardized security algorithms or protocols.
Examine self-designed security mechanisms from an attacker's perspective.
When using open-source cryptographic libraries, strip out any strings or prompts that would help an attacker identify them.
Stay updated with the latest advancements in cryptographic algorithms.

Windows Kernel Fundamentals

Kernel Theory Fundamentals - 1/4/2023

Virtual memory of user-mode programs is isolated from each other, while kernel-mode programs share a common virtual memory space. Hence, if a kernel driver crashes, it results in a blue screen.
User-mode programs cannot access kernel-mode memory, but the reverse is possible.

User-mode programs operate at privilege level R3, while kernel drivers run at R0 (the highest level).
On Windows x64, user-mode programs use the virtual address range 0x000'00000000 to 0x7FFF'FFFFFFFF; addresses from 0xFFFF8000'00000000 upward are reserved for kernel mode.
The Windows driver framework is divided into NT drivers, WDM drivers, and KMDF drivers.
Each driver object creates one or more device objects, and each device object contains a pointer to the next device object, forming a device chain.
Windows organizes devices in a tree structure known as the device tree. Nodes in the device tree are called "device nodes," and the root node is referred to as the "root device node." Typically, the root device node is depicted at the bottom of the device tree.
A device stack is made up of Filter Device Objects (Filter DO), a Function Device Object (FDO), and a Physical Device Object (PDO). The first device object created sits at the bottom of the device stack, and the most recently created one sits at the top.
Communication between R3 and R0 occurs through IRPs (similar to packets in network communication). IRPs are passed down the device stack.
IRQL levels are defined as follows (higher values indicate higher priority):

#if defined(_AMD64_) 
//
// Interrupt Request Level definitions
//

#define PASSIVE_LEVEL 0                 // Passive release level
#define LOW_LEVEL 0                     // Lowest interrupt level
#define APC_LEVEL 1                     // APC interrupt level
#define DISPATCH_LEVEL 2                // Dispatcher level
#define CMCI_LEVEL 5                    // CMCI handler level

#define CLOCK_LEVEL 13                  // Interval clock level
#define IPI_LEVEL 14                    // Interprocessor interrupt level
#define DRS_LEVEL 14                    // Deferred Recovery Service level
#define POWER_LEVEL 14                  // Power failure level
#define PROFILE_LEVEL 15                // timer used for profiling.
#define HIGH_LEVEL 15                   // Highest interrupt level

#endif

One common blue screen error code is IRQL_NOT_LESS_OR_EQUAL:

The IRQL_NOT_LESS_OR_EQUAL bug check has a value of 0x0000000A. This bug check indicates that Microsoft Windows or a kernel-mode driver accessed paged memory at an invalid address while at a raised interrupt request level (IRQL). The cause is typically a bad pointer or a pageability problem.
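
The user/kernel address split described earlier in this section is easy to turn into a quick check when reading debugger output. This is a minimal sketch that assumes the classic 48-bit layout; the function name classify and the explicit "non-canonical" middle range are my additions.

```python
USER_MAX = 0x00007FFFFFFFFFFF      # top of the user-mode range
KERNEL_MIN = 0xFFFF800000000000    # start of the kernel-mode range

def classify(addr: int) -> str:
    """Classifies a 64-bit virtual address on Windows x64 (48-bit layout assumed)."""
    if 0 <= addr <= USER_MAX:
        return "user"
    if KERNEL_MIN <= addr <= 0xFFFFFFFFFFFFFFFF:
        return "kernel"
    return "non-canonical"

for a in (0x00007FF6A1B20000, 0xFFFFF806451DA980, 0x0000800000000000):
    print(f"{a:#018x}: {classify(a)}")
```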

Important Core Data Structures - 1/5/2023

Kernel Objects

Common types include Dispatcher objects, I/O objects, process objects, and thread objects.

SSDT

https://www.ired.team/miscellaneous-reversing-forensics/windows-kernel-internals/glimpse-into-ssdt-in-windows-x64-kernel#finding-address-of-all-ssdt-routines

https://m0uk4.gitbook.io/notebooks/mouka/windowsinternal/ssdt-hook

The System Service Dispatch Table (SSDT) is simply an array of addresses of kernel routines on 32-bit operating systems, or an array of relative offsets to the same routines on 64-bit operating systems.

SSDT is the first member of the Service Descriptor Table kernel memory structure as shown below:

typedef struct tagSERVICE_DESCRIPTOR_TABLE {
    SYSTEM_SERVICE_TABLE nt; //effectively a pointer to Service Dispatch Table (SSDT) itself
    SYSTEM_SERVICE_TABLE win32k;
    SYSTEM_SERVICE_TABLE sst3; //pointer to a memory address that contains how many routines are defined in the table
    SYSTEM_SERVICE_TABLE sst4;
} SERVICE_DESCRIPTOR_TABLE;

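In x64, each SSDT entry is a signed 32-bit value: its upper 28 bits are an offset relative to the table base (KiServiceTable, which the first field of KeServiceDescriptorTable points to), and its low 4 bits are commonly described as the number of stack-passed arguments. The function address is recovered as follows:

FuncAddr = KiServiceTable + ((LONG)KiServiceTable[index] >> 4)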

SSDT lookup (WinDbg):

.foreach /ps 1 /pS 1 ( offset {dd /c 1 nt!KiServiceTable L poi(nt!KeServiceDescriptorTable+10)}){ r $t0 = ( offset >>> 4) + nt!KiServiceTable; .printf "%p - %y\n", $t0, $t0 }

SSDT (Shadow) struct:

struct SSDTStruct
{
    LONG* pServiceTable;
    PVOID pCounterTable;
#ifdef _WIN64
    ULONGLONG NumberOfServices;
#else
    ULONG NumberOfServices;
#endif
    PCHAR pArgumentTable;
};

Function Index to real function address:

readAddress = (ULONG_PTR)(ntTable[FunctionIndex] >> 4) + SSDT(Shadow)BaseAddress;

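SSDT (Shadow) lookup:

On x64 there are no symbols for this, so switch into the context of a GUI process (mspaint.exe here) and dump the table manually.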

0: kd> !process 0 0 mspaint.exe
PROCESS ffff850e48ee1080
    SessionId: 1  Cid: 0adc    Peb: 219f280000  ParentCid: 12c8
    DirBase: 28b00002  ObjectTable: ffffe5088ad08e80  HandleCount: 296.
    Image: mspaint.exe

0: kd> .process /p ffff850e48ee1080
Implicit process is now ffff850e`48ee1080
.cache forcedecodeuser done
    
0: kd> dps nt!KeServiceDescriptorTableShadow
fffff806`451da980  fffff806`45095570 nt!KiServiceTable       # SSDT base address
fffff806`451da988  00000000`00000000
fffff806`451da990  00000000`000001cf
fffff806`451da998  fffff806`45095cb0 nt!KiArgumentTable
fffff806`451da9a0  fffff528`64b6b000 win32k!W32pServiceTable # SSDT Shadow base address
fffff806`451da9a8  00000000`00000000
fffff806`451da9b0  00000000`000004da
fffff806`451da9b8  fffff528`64b6c84c win32k!W32pArgumentTable
fffff806`451da9c0  00000000`00111311
fffff806`451da9c8  00000000`00000000
fffff806`451da9d0  ffffffff`80000010
fffff806`451da9d8  00000000`00000000
fffff806`451da9e0  00000000`00000000
fffff806`451da9e8  00000000`00000000
fffff806`451da9f0  00000000`00000000
fffff806`451da9f8  00000000`00000000

0: kd> dd /c 1 win32k!W32pServiceTable l10
fffff528`64b6b000  ff972820                                  # GDI Function offset
fffff528`64b6b004  ff972940
fffff528`64b6b008  ff972a60
fffff528`64b6b00c  ff972b80
fffff528`64b6b010  ff972ca2
fffff528`64b6b014  ff972dc0
fffff528`64b6b018  ff972ee0
fffff528`64b6b01c  ff973000
fffff528`64b6b020  ff973120
fffff528`64b6b024  ff973240
fffff528`64b6b028  ff973363
fffff528`64b6b02c  ff973487
fffff528`64b6b030  ff9735a0
fffff528`64b6b034  ff9736c0
fffff528`64b6b038  ff9737e0
fffff528`64b6b03c  ff973900
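
To turn the raw entries in the dump above into addresses, the x64 formula can be applied by hand or with a few lines of Python. This is a minimal sketch; the function name decode_ssdt_entry is my own, and reading the low 4 bits as the count of stack-passed arguments follows the common description rather than anything shown in the dump itself.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def decode_ssdt_entry(table_base: int, entry: int):
    """Returns (routine address, low-nibble value) for one 32-bit x64 SSDT entry."""
    arg_count = entry & 0xF           # low 4 bits
    if entry & 0x80000000:            # reinterpret as a signed 32-bit value
        entry -= 1 << 32
    # Python's >> on a negative int is an arithmetic shift, like the (LONG) cast in C.
    address = (table_base + (entry >> 4)) & MASK64
    return address, arg_count

base = 0xFFFFF52864B6B000             # win32k!W32pServiceTable from the dump
for raw in (0xFF972820, 0xFF972940, 0xFF972CA2):
    addr, nargs = decode_ssdt_entry(base, raw)
    print(f"{raw:08x} -> {addr:016x} (low nibble: {nargs})")
```

The first entry decodes to fffff52864b02282, which can then be checked in WinDbg with u or ln.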

TEB

TEB (Thread Environment Block) stores frequently used per-thread data. It resides in user address space at a lower address than the PEB, and each thread in a process has its own TEB. On older 32-bit Windows, all TEBs of a process are laid out in a stack-like manner in linear memory starting at 0x7FFDE000, with each full TEB occupying 4KB and the region expanding downward. In user mode, the current thread's TEB sits in its own 4KB segment addressed through the CPU's FS register: FS:[0] is the start of the TEB, and FS:[0x18] holds a linear pointer to the TEB itself. In user-mode debugging, the WinDbg pseudo-register $thread (or $teb) can be used to obtain the TEB address.

FS:[000] Pointer to the SEH chain (ExceptionList)
FS:[004] Thread stack top (StackBase)
FS:[008] Thread stack bottom (StackLimit)
FS:[00C] SubSystemTib
FS:[010] FiberData
FS:[014] ArbitraryUserPointer
FS:[018] Pointer to the TEB itself
FS:[020] Process ID (PID)
FS:[024] Thread ID (TID)
FS:[02C] Pointer to thread local storage
FS:[030] PEB structure address (process structure)
FS:[034] Last error value

// Thread Environment Block (TEB)
typedef struct _TEB
{
    NT_TIB Tib;                             /* 00h */
    PVOID EnvironmentPointer;               /* 1Ch */
    CLIENT_ID Cid;                          /* 20h */
    PVOID ActiveRpcHandle;                  /* 28h */
    PVOID ThreadLocalStoragePointer;        /* 2Ch */
    struct _PEB *ProcessEnvironmentBlock;   /* 30h */
    ULONG LastErrorValue;                   /* 34h */
    ULONG CountOfOwnedCriticalSections;     /* 38h */
    PVOID CsrClientThread;                  /* 3Ch */
    struct _W32THREAD* Win32ThreadInfo;     /* 40h */
    ULONG User32Reserved[0x1A];             /* 44h */
    ULONG UserReserved[5];                  /* ACh */
    PVOID WOW32Reserved;                    /* C0h */
    LCID CurrentLocale;                     /* C4h */
    ULONG FpSoftwareStatusRegister;         /* C8h */
    PVOID SystemReserved1[0x36];            /* CCh */
    LONG ExceptionCode;                     /* 1A4h */
    struct _ACTIVATION_CONTEXT_STACK *ActivationContextStackPointer; /* 1A8h */
    UCHAR SpareBytes1[0x28];                /* 1ACh */
    GDI_TEB_BATCH GdiTebBatch;              /* 1D4h */
    CLIENT_ID RealClientId;                 /* 6B4h */
    PVOID GdiCachedProcessHandle;           /* 6BCh */
    ULONG GdiClientPID;                     /* 6C0h */
    ULONG GdiClientTID;                     /* 6C4h */
    PVOID GdiThreadLocalInfo;               /* 6C8h */
    ULONG Win32ClientInfo[62];              /* 6CCh */
    PVOID glDispatchTable[0xE9];            /* 7C4h */
    ULONG glReserved1[0x1D];                /* B68h */
    PVOID glReserved2;                      /* BDCh */
    PVOID glSectionInfo;                    /* BE0h */
    PVOID glSection;                        /* BE4h */
    PVOID glTable;                          /* BE8h */
    PVOID glCurrentRC;                      /* BECh */
    PVOID glContext;                        /* BF0h */
    NTSTATUS LastStatusValue;               /* BF4h */
    UNICODE_STRING StaticUnicodeString;     /* BF8h */
    WCHAR StaticUnicodeBuffer[0x105];       /* C00h */
    PVOID DeallocationStack;                /* E0Ch */
    PVOID TlsSlots[0x40];                   /* E10h */
    LIST_ENTRY TlsLinks;                    /* F10h */
    PVOID Vdm;                              /* F18h */
    PVOID ReservedForNtRpc;                 /* F1Ch */
    PVOID DbgSsReserved[0x2];               /* F20h */
    ULONG HardErrorDisabled;                /* F28h */
    PVOID Instrumentation[14];              /* F2Ch */
    PVOID SubProcessTag;                    /* F64h */
    PVOID EtwTraceData;                     /* F68h */
    PVOID WinSockData;                      /* F6Ch */
    ULONG GdiBatchCount;                    /* F70h */
    BOOLEAN InDbgPrint;                     /* F74h */
    BOOLEAN FreeStackOnTermination;         /* F75h */
    BOOLEAN HasFiberData;                   /* F76h */
    UCHAR IdealProcessor;                   /* F77h */
    ULONG GuaranteedStackBytes;             /* F78h */
    PVOID ReservedForPerf;                  /* F7Ch */
    PVOID ReservedForOle;                   /* F80h */
    ULONG WaitingOnLoaderLock;              /* F84h */
    ULONG SparePointer1;                    /* F88h */
    ULONG SoftPatchPtr1;                    /* F8Ch */
    ULONG SoftPatchPtr2;                    /* F90h */
    PVOID *TlsExpansionSlots;               /* F94h */
    ULONG ImpersionationLocale;             /* F98h */
    ULONG IsImpersonating;                  /* F9Ch */
    PVOID NlsCache;                         /* FA0h */
    PVOID pShimData;                        /* FA4h */
    ULONG HeapVirualAffinity;               /* FA8h */
    PVOID CurrentTransactionHandle;         /* FACh */
    PTEB_ACTIVE_FRAME ActiveFrame;          /* FB0h */
    PVOID FlsData;                          /* FB4h */
    UCHAR SafeThunkCall;                    /* FB8h */
    UCHAR BooleanSpare[3];                  /* FB9h */
} TEB, *PTEB;
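
The FS offsets listed earlier can be cross-checked against the struct layout by modelling just the first few fields. The sketch below assumes the 32-bit layout shown above and uses 32-bit integers in place of pointers so that it gives the same result on any host; the class names with the 32 suffix are my own.

```python
from ctypes import Structure, c_uint32, sizeof

class NT_TIB32(Structure):
    _fields_ = [
        ("ExceptionList", c_uint32),         # FS:[000]
        ("StackBase", c_uint32),             # FS:[004]
        ("StackLimit", c_uint32),            # FS:[008]
        ("SubSystemTib", c_uint32),          # FS:[00C]
        ("FiberData", c_uint32),             # FS:[010]
        ("ArbitraryUserPointer", c_uint32),  # FS:[014]
        ("Self", c_uint32),                  # FS:[018]
    ]

class CLIENT_ID32(Structure):
    _fields_ = [("UniqueProcess", c_uint32), ("UniqueThread", c_uint32)]

class TEB32_HEAD(Structure):
    """Only the first fields of the 32-bit TEB, enough to check the offsets."""
    _fields_ = [
        ("Tib", NT_TIB32),
        ("EnvironmentPointer", c_uint32),
        ("Cid", CLIENT_ID32),
        ("ActiveRpcHandle", c_uint32),
        ("ThreadLocalStoragePointer", c_uint32),
        ("ProcessEnvironmentBlock", c_uint32),
        ("LastErrorValue", c_uint32),
    ]

for name, _ in TEB32_HEAD._fields_:
    print(f"{name:28s} {getattr(TEB32_HEAD, name).offset:#05x}")
```

The same approach extends to the rest of the struct, or to a 64-bit layout by switching the pointer-sized fields to c_uint64.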

PEB

https://www.cnblogs.com/viwilla/p/5109966.html
The content is outdated: it describes 32-bit Windows. In 64-bit processes on Windows 10/11, the PEB pointer sits at offset 0x60 of the TEB (gs:[0x60]) rather than at fs:[0x30].

PEB (Process Environment Block) stores process information, and each process has its own PEB data. It resides in user address space. In Windows 2000, the address of the Process Environment Block is fixed for each process at 0x7FFDF000, which is within the user address space, allowing programs to access it directly.

The exact PEB address should be obtained from the 0x1b0 offset of the system's EPROCESS structure (this offset is specific to the Windows version). However, since EPROCESS is located in the system address space, accessing this structure requires ring0 privileges.

Alternatively, the PEB location can be retrieved from the TEB structure at offset 0x30. The FS segment register points to the current TEB structure:

mov eax, dword ptr fs:[0x30]

Or via the TEB pointer:

mov eax, dword ptr fs:[0x18] ; eax = pointer to the TEB
mov eax, dword ptr [eax+0x30] ; eax = pointer to the PEB

In user mode, the WinDbg command $proc can be used to obtain the PEB address.

//Process Environment Block
typedef struct _PEB
{
    UCHAR InheritedAddressSpace; // 00h
    UCHAR ReadImageFileExecOptions; // 01h
    UCHAR BeingDebugged; // 02h
    UCHAR Spare; // 03h
    PVOID Mutant; // 04h
    PVOID ImageBaseAddress; // 08h
    PPEB_LDR_DATA Ldr; // 0Ch
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters; // 10h
    PVOID SubSystemData; // 14h
    PVOID ProcessHeap; // 18h
    PVOID FastPebLock; // 1Ch
    PPEBLOCKROUTINE FastPebLockRoutine; // 20h
    PPEBLOCKROUTINE FastPebUnlockRoutine; // 24h
    ULONG EnvironmentUpdateCount; // 28h
    PVOID* KernelCallbackTable; // 2Ch
    PVOID EventLogSection; // 30h
    PVOID EventLog; // 34h
    PPEB_FREE_BLOCK FreeList; // 38h
    ULONG TlsExpansionCounter; // 3Ch
    PVOID TlsBitmap; // 40h
    ULONG TlsBitmapBits[0x2]; // 44h
    PVOID ReadOnlySharedMemoryBase; // 4Ch
    PVOID ReadOnlySharedMemoryHeap; // 50h
    PVOID* ReadOnlyStaticServerData; // 54h
    PVOID AnsiCodePageData; // 58h
    PVOID OemCodePageData; // 5Ch
    PVOID UnicodeCaseTableData; // 60h
    ULONG NumberOfProcessors; // 64h
    ULONG NtGlobalFlag; // 68h
    UCHAR Spare2[0x4]; // 6Ch
    LARGE_INTEGER CriticalSectionTimeout; // 70h
    ULONG HeapSegmentReserve; // 78h
    ULONG HeapSegmentCommit; // 7Ch
    ULONG HeapDeCommitTotalFreeThreshold; // 80h
    ULONG HeapDeCommitFreeBlockThreshold; // 84h
    ULONG NumberOfHeaps; // 88h
    ULONG MaximumNumberOfHeaps; // 8Ch
    PVOID** ProcessHeaps; // 90h
    PVOID GdiSharedHandleTable; // 94h
    PVOID ProcessStarterHelper; // 98h
    PVOID GdiDCAttributeList; // 9Ch
    PVOID LoaderLock; // A0h
    ULONG OSMajorVersion; // A4h
    ULONG OSMinorVersion; // A8h
    ULONG OSBuildNumber; // ACh
    ULONG OSPlatformId; // B0h
    ULONG ImageSubSystem; // B4h
    ULONG ImageSubSystemMajorVersion; // B8h
    ULONG ImageSubSystemMinorVersion; // BCh
    ULONG ImageProcessAffinityMask; // C0h
    ULONG GdiHandleBuffer[0x22]; // C4h
    PVOID ProcessWindowStation; // ???
} PEB, *PPEB;

Kernel Debugging Basics - 1/10/2023

See link: http://blog.junyu33.me/2023/01/10/winkernel_environ.html