(archived) OS Notes

Only the theoretical portion is provided for final exam review.

Computer System Overview

More focused on computer organization, not tested in the exam.

Fetch-Execute.
Interruption: program flow chart, ISR (Interrupt Handler) level.

Operating System Overview

OS: a program that controls the execution of application programs and acts as an interface between applications and the computer hardware.

Interrupts: Early computer models did not have this capability. This feature gives the OS more flexibility in relinquishing control to and regaining control from user programs.

Process Description and Control

Process

Interleave, Strategy, Deadlock.

Definition:

A program that is currently running (executing).
An instance of a program executing on a computer.
An entity that can be assigned to a processor and executed by it.
An active unit characterized by a set of executing instructions, a current state, and a collection of associated system resources.

Components:

ID
State

Threads

Thread: Lightweight process.

Linux does not have kernel threads, while Windows does. The advantages of not having kernel threads include being lightweight and easier to manage.

Concurrency

Terminology

Atomic operation
Critical section
Deadlock
Livelock
Mutual exclusion
Race condition
Starvation

Characteristics of single-program systems:

Isolation
Continuity
Reproducibility

Characteristics of multi-program systems:

Loss of isolation
Intermittency
Non-reproducibility

algorithm

Dekker

Prone to livelock.
Imposes a fixed execution order on processes, limiting flexibility.
Complex implementation, making it difficult to verify.

Peterson's Algorithm

Hardware Disabling

Disadvantages:

The processor is limited in its ability to interleave programs.
Disabling interrupts on one processor does not guarantee mutual exclusion in a multi-processor environment.

Special Instructions

Advantages: No limit on the number of processes, simple and easy to verify, supports multiple critical sections.

Disadvantages: Busy waiting, starvation, deadlock.

Semaphores

Terminology

Binary Semaphore
Counting Semaphore
Mutex
Strong Semaphore
Weak Semaphore

Operations

semwait: Request a resource / wait for a condition
semsignal: Release a resource / generate a condition
If both operations occur in the same process, it is called mutual exclusion; if they occur in different processes, it is called synchronization.
For semwait, handle synchronization first, then mutual exclusion.

Steps

Determine the number of processes
Analyze the nature of the problem
Define the semaphores and initialize their values
Call semwait and semsignal in the processes

Semaphore(a, b, c)
{
  a = 1;
  b = 2;
  c = 3;
}

void p_1...p_n()
{
  // do something
}

void main()
{
  Parbegin(p(1), p(2), p(3), ..., p(n));
}

Problems

Producer-Consumer Problem:

void producer() {
  while (1) {
    produce();
    semwait(empty);
    semwait(mutex);
    // put item in store
    semsignal(mutex);
    semsignal(product);
  }
}
void consumer() {
  while(1) {
    semwait(product);
    semwait(mutex);
    // take item from store
    semsignal(mutex);
    semsignal(empty);
    use();
  }
}

Reader-Writer Problem:

Mutual exclusion between readers and writers, and between writers and writers.

Mutual exclusion among readers (race condition).

Reader-Priority:

Semaphore mutex = 1; // File mutual exclusion
Semaphore read_mutex = 1; // Reader count mutual exclusion

int counter = 0;

void reader() {
  int m;
  semwait(read_mutex);
  counter++;
  semsignal(read_mutex);
  if (counter == 1) semwait(mutex); // the first to read, compete with writer
  read();
  semwait(read_mutex);
  counter--;
  semsignal(read_mutex);
  if (counter == 0) semsignal(mutex);
}

void writer() {
  semwait(mutex);
  write();
  semsignal(mutex);
}

Equal-Priority:

Semaphore mutex = 1; // File mutual exclusion
Semaphore read_mutex = 1; // Reader count mutual exclusion
Semaphore queue = 1; // Reader and writer queuing mutual exclusion

int readcount = 0;

void reader() {
  int m = 0;
  semwait(queue);
  semwait(read_mutex);
  readcount++;
  semsignal(read_mutex);
  if (readcount == 1) semwait(mutex);
  semsignal(queue);
  read();
  semwait(read_mutex);
  readcount--;
  semsignal(read_mutex);
  if (readcount == 0) semsignal(mutex);
}
void writer() {
  semwait(queue);
  semwait(mutex);
  write();
  semsignal(mutex);
  semsignal(queue);
}

Monitors

Composition

Local data
Condition variables
Waiting area (queue) cwait(cond) csignal(cond)
Procedures
Initialization code

Inherently mutually exclusive. (At most one active procedure at any given time.)

Message Passing

send(dest, msg)

receive(src, msg)

Inherently synchronous.

Synchronous blocking: You're hungry, ask your mom to cook, and then you wait continuously.

Synchronous non-blocking: You're hungry, ask your mom to cook, you go play games, and occasionally ask if the food is ready.

Asynchronous blocking: You're hungry, ask your mom to cook, you go play games, your mom tells you when the food is ready, and you go eat.

Asynchronous non-blocking: You're hungry, ask your mom to cook, you go play games, your mom tells you the food is ready and brings it to you.

Advantage: Cross-host communication, usable in network programming.

Deadlock & Starvation

There is no universally effective solution.

Why Deadlock Occurs

Limited resources
Improper order of resource requests and releases

Resources

Reusable resources (CPU, memory, databases, semaphores, etc.)
Consumable resources (interrupts, signals, etc.)

Requirements

Mutual exclusion
Hold and wait
No preemption
Circular wait (the first three are necessary conditions; adding this one makes it sufficient)

resolve

Allow Deadlock

Ostrich Algorithm (Ignore)
Deadlock Detection

Specific algorithm for deadlock detection:

Example:

Find a zero vector in the Allocation matrix.

Set w = v.

Look for a row vector in the Request matrix that is less than w, mark it, clear A, and update the w vector.

Repeat step 3 until no more comparisons can be made (deadlock) or all processes are cleared (no deadlock).

Deadlock Elimination (Terminate the process with the least CPU time used, least output, longest waiting time, or lowest priority)

Do Not Allow

Deadlock Prevention (Static)
- No mutual exclusion (cost issues)
- No hold and wait; request all resources at once (feasible, but cannot determine which resources a process needs)
- Preemption (feasible, requires ensuring that preempted processes can be easily saved and restored)
- Define a linear ordering of circular wait (feasible, but does not consider the actual needs of processes)
Deadlock Avoidance (Dynamic)
- Process initiation denial: for all j, $R_{j} \geq C_{(n + 1) j} + \sum_{i = 1}^{n} C_{i j}$ .
- Banker's Algorithm (drawbacks: maximum resources, isolated consideration, fixed number of allocated resources, child processes cannot exit, high time and space complexity)

Banker's Algorithm Details:

Example:

a) $6 + 2 + 0 + 4 + 1 + 1 + 1 = 15, 3 + 0 + 1 + 1 + 0 + 1 + 0 = 6$ — Simply verify all A, B, C, D.

b) $N e e d = C - A$

c) (Safety detection algorithm) Compare a row of $C - A$ with the $V$ vector. If $V$ is greater, update $V = V + A$ , until no more comparisons can be made (unsafe) or all processes are cleared (safe).

d) (Banker's algorithm) Add the new request to the $A$ matrix (in this case, add to P5 to get $[4, 2, 4, 4]$ ), then update the $V$ vector to $[3, 1, 2, 1]$ , and perform step c again.

Dining Philosopher's Problem

Initial Solution (Incorrect)

Problem: Possible deadlock (all five philosophers take one fork)

sem p[i] = 1 from 0 to 4
void philosopher(int i) {
  think();
  wait(p[i]);
  wait(p[(i+1)%5]);
  eat();
  signal(p[(i+1)%5]); // routine
  signal(p[i]);
}

Solution 1:

sem p[i] = 1 from 0 to 4;
sem room = 4; // only four philosophers won't have deadlock
void philosopher(int i) {
  think();
  wait(room);
  wait(p[i]);
  wait(p[(i+1)%5]);
  eat();
  signal(p[(i+1)%5]); // routine
  signal(p[i]);
  signal(room);
}

Solution 2:

sem p[i] = 1 from 0 to 4;
void philosopher(int i) {
  think();
  min = min(i, (i+1)%5); // confines philosopher 0 and 4 (won't eat together)
  max = max(i, (i+1)%5); // so at most 4 procs
  wait(p[min]);
  wait(p[max]);
  eat();
  signal(p[max]); 
  signal(p[min]);
}

Q: What about 2 dynamic algorithms?

(TODO)

Assume there are $m$ processes, and the maximum request for each process is $x_{i}$ . Then, a minimum of $\sum_{i = 1}^{m} (x_{i} - 1) + 1$ resources will not cause deadlock.

Memory Management

Fixed Partitioning, Dynamic Partitioning: Entirely loaded, non-partitioned.
Simple Paging, Simple Segmentation: Entirely loaded, partitioned.
Virtual Memory Paging, Virtual Memory Segmentation: Partially loaded, partitioned.

Fixed Partitioning

Pros: Simple, low operating system overhead.
Cons: Limits the number of active processes, small jobs cannot efficiently utilize partition space (internal fragmentation).

Dynamic Partitioning

Pros: More efficient memory utilization, no internal fragmentation
Cons: Requires compaction to address external fragmentation, low processor utilization

Generates external fragmentation (solved using compaction algorithms)

Placement algorithms (Best Fit, First Fit, Next Fit)

A process may occupy different memory locations during its lifecycle (due to compaction and swapping techniques).

Buddy System

Similar to a binary tree

Superior to fixed and dynamic partitioning, but internal fragmentation exists

Simple Paging

Definitions: Frame, Page, Segment

Each process has a page table, which indicates the location of the corresponding frame (physical) for each page (logical) of the process.

A logical address consists of a page number and an offset within that page. The page number is used to look up the frame number (first 6 bits) in the page table, which is then concatenated with the offset (10 bits) to form the physical address.

A small amount of internal fragmentation exists.

Simple Segmentation

Unlike process page tables which only maintain frame numbers, process segment tables maintain both the length and base address of segments.

A logical address consists of a segment number and an offset. The segment number is used to look up the corresponding base address (16 bits) in the segment table, which is then added to the offset (12 bits) to obtain the physical address.

External fragmentation exists.

Memory Management Requirements

Relocation (physical address, logical address, relative address, base register, limit register)
Protection (satisfied by hardware)
Sharing (allowing multiple processes to access the same part of memory)
Logical organization (programs are written using modules, which can be compiled separately and provide different protections)
Physical organization (programmers are unaware of the client's memory limitations)

Homework:

Correction: Due to a mechanism called relocation (if X was originally at that location and later freed, reallocating with malloc will still place it at the same location instead of shifting left), 7.6.a remains 8MB. The second question should be corrected to [3M, 8M].

Virtual Memory

Fixed partitioning, dynamic partitioning: fully loaded, non-partitioned.

Simple paging, simple segmentation: fully loaded, partitioned.

Virtual memory paging, virtual memory segmentation: partially loaded, partitioned.

Virtual Memory Paging

In simple terms, it involves moving a portion of memory to the hard disk and loading it back into memory when needed.

If the page to be read is located on the disk, a page fault interrupt is triggered, blocking the process. After the page is loaded from the disk into memory, another interrupt is generated to resume execution. (Thus, a single page fault results in two interrupts. If the interrupt time exceeds the instruction execution time, it can cause "thrashing.")

For 32-bit systems with 4GB of memory, the operating system maintains a 4KB root page table in memory (each entry is 4 bytes, totaling 1024 entries). Each entry in this root page table corresponds to a 4KB page table (each entry is 4 bytes, totaling 1024 entries per page table, resulting in 1,048,576 entries in total, occupying 4MB of virtual memory space). Each of these 4MB virtual memory entries maps to a 12-bit virtual memory address (thus, the 1,048,576 entries can map to the entire 4GB virtual memory space).

For pwners: This is why the last three bits of an address remain unchanged under ASLR. ASLR only randomizes at the page level, not the offset within a page.

The method for converting a virtual address to a physical address is as follows: First, use the first 10 bits as an offset into the root page table to locate the corresponding root page table entry. Then, determine whether the user page table (the 4MB one) corresponding to that root page table entry is in memory. If it is, use the next 10 bits to locate the corresponding user page table entry, and combine the page frame number from that entry with the offset to obtain the physical address. If it is not in memory, a page fault interrupt is triggered.

The Translation Lookaside Buffer (TLB) can cache some page number-to-frame number mappings, thereby reducing interrupts and improving memory access speed.

The relationship between page size and page fault rate first increases and then decreases (due to the principle of locality; the entire process resides within pages). The relationship between the number of page frames and the page fault rate is monotonically decreasing (if every page has a corresponding page frame, there would be no page faults).

Virtual Memory Segmentation

A virtual address consists of a page number and an offset. The method to convert it into a physical address is as follows: the page number is used as an offset to look up the corresponding segment number in the segment table, and the base address of that segment is added to the offset of the virtual address to obtain the physical address.

OS Level

Read Strategies

On-demand Paging: Allocate only when required. (Results in many page faults at startup.)
Pre-paging: Allocate a batch of pages at startup. (May include pages that are never accessed.)

Placement Strategy (Same as the Previous Placement Algorithm)

Determines the physical location where a process block resides.

Page Replacement Policies

Frame Locking: Pages that cannot be swapped out (e.g., kernel, etc.)

OPT: Replace the page that will be accessed farthest in the future (ideal method, minimizes page faults, requires future access prediction)
LRU: Least Recently Used (high overhead, requires timestamping each page)
FIFO: First-In-First-Out (simplest, replaces the page that has been in memory the longest)
CLK: Clock algorithm with a used bit (a compromise between LRU and FIFO)
PB: Prioritizes reducing I/O operation time

CLK Implementation Code:

pageSize = 3
pagePointer = 0
pages = [[0, 0] for i in range(pageSize)] # [page, used]

accessSeq = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]

def printPages():
    print('pagePointer:', pagePointer)
    for i in range(pageSize):
        print(pages[i][0], pages[i][1])

for newPage in accessSeq:
    hasPage = False
    for i in range(pageSize):
        if pages[i][0] == newPage:
            pages[i][1] = 1
            hasPage = True
            break

    if hasPage == True:
        printPages()
        continue
    
    times = 0
    for i in range(pageSize):
        times += 1
        if pages[pagePointer][1] == 0:
            pages[pagePointer][0] = newPage
            pages[pagePointer][1] = 1
            pagePointer += 1
            pagePointer %= pageSize
            break

        else:
            pages[pagePointer][1] = 0
            pagePointer += 1
            pagePointer %= pageSize

    if times == pageSize:
        pages[pagePointer][0] = newPage
        pages[pagePointer][1] = 1
        pagePointer += 1
        pagePointer %= pageSize

    printPages()

Resident Set Management

Fixed Allocation: The number of page frames allocated to a process is fixed.
Variable Allocation: More frames are allocated if the page fault rate is high; fewer are allocated if the page fault rate is low.
Local Replacement: Replace a page within the resident set of the faulting process.
Global Replacement: Replace any unlocked page in the entire memory.

Clearance Strategies

On-Demand Clearance: Pages are written back to auxiliary storage only when they are replaced.
Preemptive Clearance: Pages are written back to auxiliary storage in advance.

Load Control

Affects the number of processes residing in memory (system concurrency) and processor utilization.

As the level of concurrency increases, processor utilization also increases. Beyond a certain point, the average resident set size becomes insufficient, leading to an increase in interruptions and a subsequent decrease in processor utilization.

Processor Scheduling

Types

Long-term scheduling (job scheduling)
Medium-term scheduling (swapping out processes)
Short-term scheduling (system calls, I/O)
I/O scheduling: deciding which process can use an available I/O device.

Algorithm Design

Requirements

(Non)preemptive: A process can be dispatched even if the currently executing process has not yet finished.

Performance:

Turnaround time & response time
Throughput & deadline
Utilization

Fairness & starvation

algos

non-preemptive
- FCFS: first come first serve
  - not optimized for average turnaround time, throughput
  - no starvation but biased toward long processes
- SPN: shortest process next
  - scheduling repetitive tasks (OS doesn't know how long a process will execute)
  - optimized turnaround time, throughput
  - long process may starve, biased toward short processes
- HRRN: highest response ratio next
  - $R = \frac{w + \overset{―}{t}}{\overset{―}{t}}$
  - equally treats performance and fairness
  - avoids starvation of long processes to some extent
preemptive
- RR: round robin
  - depends on slice thickness
  - short slice means quick response but long turnaround, and vice versa.
- SRT: shortest remaining time
  - scheduling repetitive tasks (OS doesn't know how long a process will execute)
  - more optimized turnaround time, throughput
  - long process may starve, biased toward short processes
- MLFQ: multi-level feedback queue
  - one queue for each priority level, new process priority is 0 (maximum)
  - if released before interrupted, stay at same priority
  - if preempted/interrupted, move to the next priority

Real-time scheduling

EDF

Earliest Deadline First

for non-scheduled

RM

Rate Monotonic (higher rate/frequency first)

for scheduling

I/O

ALL OPERATIONS ARE NON-PREEMPTIVE!!

Buffering

Aim: Reduce process stuck time to improve efficiency $\to$ smooth I/O performance.

Type

Single buffer: Suitable for sequential data access, concurrent read/write not allowed.
Double buffer: Enables concurrent read/write, still stuck if $v_{w r i t e} \geq v_{r e a d}$ & a new I/O request occurs.
Circular buffer: Circular queue ( $> 2$ ).

Disk Scheduling

$T = T_{s e e k} + T_{r o t a t i o n} + T_{t r a n s f e r}$

transfer (fixed)

$T_{r} = \frac{1}{2 r}$

rotation (fixed)

$b$ stands for number of bytes in a sector
$N_{s}$ stands for number of sectors in a track
$B = N_{s} b$ stands for number of bytes in a track
$T_{d} = \frac{b}{B r} = \frac{1}{N_{s} r}$

Disk Scheduling Algorithms

Policies:

Priority (may lead to starvation for low-priority processes)
FIFO (not optimized for performance)
LIFO (not optimized, but suitable for transaction processing)
SSTF (Shortest Seek Time First, faster than FIFO)
SCAN (moves the arm in one direction, selects the closest request in the same direction, reverses direction upon reaching the end)

File

What

An abstraction to manage and access data.

Components

Field:

Basic data element
Contains a single value
Of fixed or variable length

Record:

Collection of related fields
Basic unit of application data
Of fixed or variable length

File:

Collection of related records
Managed as a single entity
Identified and referenced by a name

Block:

A continuous collection of data
Contains part of a file
Of fixed or variable length
Basic unit of file storage allocation
Basic unit of I/O buffering

requirements

Persistent
Shareable
Structured

File Operation

Basic: create, delete, open, close, read, write

Access Control

Typical levels:

None
Knowledge
Execution
Reading
Appending
Updating
Changing protection
Deletion

Access rights:

Owner or privileged user (root): Full rights, grant rights to others
Specific user: Individual user (associated with user ID)
User group: Set of users
Public

In *nix systems

Access modes: Read, Write, eXecute (RWX)

Access rights: Owner, group, public

Access control grant: rwxrwxrwx

File Organization

Criteria:

Short access time
Ease of update
Economy of storage
Simple maintenance
Reliability (x)

Pile File

Files are organized as not-fixed records

Easy to insert, difficult to search

Sequential File

Files are organized as fixed, ordered records
Has a key field
New records are placed in a log file

Performance is still low

Index Sequential

Add indexes to entries in a sequential file for quick searching
Two-step search: index and record
Overflow file

Indexed

Sequential file builds indexes for one key field
Index file builds indexes for multiple key fields
No constraints on record position, no overflow file

After an operation, all records need modification

Direct File or Hashed File

Indexed by a hash table, NOT sorted
Update and search for a single record is easy

Storage Management

Goal: Manage the disk space to support a stream of file operations

Two basic tasks:

File Allocation

FAT: File Allocation Table

Contiguous
- Good locality
- External fragmentation (improved by compression)
Chained
- Each block has a pointer to the next block
- Poor locality (improved by consolidation)
- High utilization and flexibility
Indexed (variable/block)
- FAT has a separate one-level index for each file
- The index contains an entry for each variable portion/block allocated to the file
- FAT contains the block number for the index block

Free Space Management

DAT: Disk Allocation Table
- $S = \frac{N}{8} = \frac{C}{8 s}$
- Bit table size (S), number of blocks (N), disk capacity (C), block size (s)
Chained
- No DAT needed
- Fragmentation
- Long allocation/recycle time
Indexed
Free Block List
- Store the list of available blocks on the disk
- Consumes a small portion of disk space
- Some parts can be kept in memory to speed up operations

Review Outline

2

Concept of an Operating System (OS), and its distinction from application programs
Functions and objectives of an operating system
Core components of an operating system, with a brief description of their functions
Evolution of operating systems (batch processing, time-sharing, real-time systems)
Concepts of time-sharing systems and batch processing systems, and their differences
Roles of interrupts and exceptions in time-sharing and batch processing systems
How to understand the operating system as a virtual machine

3

What hardware support (interrupts and exceptions) does the CPU need to implement processes?
What is a process, and what is its relationship and difference with a program?
The concept and function of the Process Control Block (PCB), and a description of its composition
Describe how a program becomes a process from the perspective of its lifecycle
What primitives are related to process control?
The five-state diagram of a process (diagram-related)
Explain process execution from three perspectives (discussion topic)
The process creation flow, combined with Linux's fork (one call, two returns)
The flow and timing of process switching (wait)

4

Concept of Threads, Differences Between Processes and Threads
Three Implementation Methods of Threads
Advantages and Disadvantages of ULT and KLT

5

Four problems that concurrency needs to solve (synchronization, mutual exclusion, starvation, deadlock)
Why synchronization and mutual exclusion mechanisms are introduced (race condition)
What solutions exist (software-based, hardware-based, system-based (monitors and semaphores))
Drawbacks of hardware-based solutions
Concept, essence, and implementation of semaphores
What is a monitor, and why is it introduced

6

Concept and Background of Deadlock
Conditions for Deadlock (Three Sufficient Conditions, Four Necessary and Sufficient Conditions)
Solutions to Deadlock (Prevention, Avoidance, Detection, Ignoring)
Relationship Between Deadlock and Starvation

7

Seven Methods of Storage Management
Similarities and Differences Between Fixed Partitioning, Paging, Dynamic Partitioning, and Segmentation
Structure and Usage of Segment Tables and Page Tables

8

Three Issues to Consider in Implementing Virtual Memory
Three Methods for Implementing Virtual Memory (Virtual Segmentation, Virtual Paging, Segmented Paging)
Page Fault Handling Process in Virtual Memory
Address Translation Calculations
Briefly Describe the Relationship Between Page Fault Rate, Page Size, and Number of Page Frames

9

Descriptions of Long, Medium, and Short-Term Scheduling (Graph-Related)

10

Background of IObuffer and Its Operational Mechanism

12

File System Implementation Approaches
At the logical level, there are five implementation methods.
At the physical level, there are three implementation methods.

DIY Notes (Chapter 9)

DIY Textbook: Link to Operating Systems: Three Easy Pieces: https://pages.cs.wisc.edu/~remzi/OSTEP/

Chapter 7: Introduction

Response Time—A New Metric

The STCF algorithm primarily optimizes turnaround time, an approach mainly targeted at early batch systems. However, for today's time-shared systems, response time is the key factor determining real-time performance.

The formula for response time is as follows:

T_{r e s p o n s e} = T_{f i r s t r u n} - T_{a r r i v a l}

Suppose three processes A, B, and C all enter the queue at time $t = 0$ , and each has an execution time of 5. The scheduling using STCF/SJF is illustrated in the figure below:

Thus, the average response time is:
${\overset{―}{T}}_{r e s p o n s e} = \frac{0 + 5 + 3}{3} = 5$

Round Robin

The Round-Robin algorithm is a preemptive algorithm that significantly improves the response time of each process. Instead of running a job to completion, RR runs it for a time slice (also known as a scheduling quantum). It is important to note that the length of the time slice must be a multiple of the timer interrupt period. For example, if the timer interrupts every 10 milliseconds, the time slice could be 10, 20, or any other multiple of 10ms.

Using the same example as before, the scheduling scenario with Round-Robin (with a time slice of 1 unit) is as follows:

Therefore, the average response time is:
${\overset{―}{T}}_{r e s p o n s e} = \frac{0 + 1 + 2}{3} = 1$
The improvement is significant.

Advantages:

Significantly reduces response time
Fair to processes

Disadvantages:

Slow turnaround time

Considering I/O Situations

Conclusion: When using non-preemptive scheduling algorithms (such as STCF), the time chunks of processes that frequently use I/O should be split apart to improve resource utilization.

Example: Both A and B are processes requiring 50ms, but A calls I/O every 10ms, taking 10ms each time.

Without splitting time chunks, total time used is 140ms:

With splitting time chunks, total time used is 100ms:

By treating each CPU burst as a separate job, the scheduler ensures that "interactive" processes run frequently. When these interactive jobs perform I/O, other CPU-intensive jobs can run, thereby making better use of the processor.

What to Do If the Future Cannot Be Predicted

The STCF/SJF algorithms have limited utility in real operating systems because the OS has little prior knowledge about each process. Potential improvements may include:

Attempting to enhance algorithms like Round-Robin, which do not require prior knowledge, such as by adjusting the length of time slices to balance response time and turnaround time.
Trying to predict the time required by processes.

Summary

In this chapter, we learned about:

Optimizing response time: Round-Robin
Handling I/O behavior in scheduling

Chapter 8: Multi-Level Feedback Queue

Problems MLFQ Attempts to Optimize

Optimizing turnaround time without prior knowledge of how long a job will run.
Reducing response time for interactive users while optimizing turnaround time.

Basic Concepts of MLFQ

MLFQ has multiple different queues, each assigned a different priority level. At any given time, a ready job resides in one of these queues, and MLFQ uses priority to decide which job to run.

Basic Rules of MLFQ

Rule 1: If Job A has a higher priority than Job B, run Job A.
Rule 2: If Job A and Job B have the same priority, run them alternately.
Rule 3: When a job enters the system, it is placed in the highest priority queue (the topmost queue).
Rule 4: Once a job uses up its time slice at its current priority level, its priority is decreased.
Rule 5: After a specific period S, all jobs are moved to the highest priority queue.

Specific Implementation of MLFQ

Step 1

When a job first enters, it is placed in the highest priority queue for execution.

Step 2

Each queue is assigned a specific quantum length. If a job consumes its entire quantum, it is demoted to the next lower priority queue.

The quantum length increases progressively from the highest priority queue to the lowest priority queue.
If a job voluntarily relinquishes the CPU (e.g., due to I/O), it remains at the same priority level upon resumption and continues using the remaining quantum.

Step 3

After a certain period, all jobs are moved back to the highest priority queue.

Issues with MLFQ

Starvation Problem: If there are "too many" interactive jobs in the system, they can collectively consume all CPU time.
Gaming the Scheduler: Processes may attempt to trick the scheduler to gain more resources.
Changing Behavior Over Time: A program's behavior may evolve, affecting its scheduling needs.

Summary of MLFQ

MLFQ does not require prior knowledge; it determines priority by observing execution behavior.
Its overall performance is close to that of SJF/STCF.
It is well-optimized for short, interactive tasks.
It remains fair to long-running tasks.

Objective: To ensure that each process receives a certain percentage of CPU time.

Lottery Scheduling

Each process holds a number of tickets, and the system randomly selects one ticket from all available tickets; the process that owns the selected ticket gets to execute.

According to the law of large numbers, as execution time increases, the percentage of execution time allocated to each process will eventually converge to the proportion of tickets it holds relative to the total number of tickets.

Basic Concepts

Currency: Each user can define their own ticket unit and allocate tickets to their subprocesses based on an exchange rate.
Transfer: A process can temporarily transfer its tickets to another process, effectively relinquishing its execution rights.
Inflation: In an environment where processes trust each other, a process can inflate its ticket count to gain more CPU time without notifying other processes.

Implementation

A random number generator is used to select the winning ticket, along with a data structure (such as a list) that records all processes in the system and the total number of tickets.

Assume we use a list to track processes. There are three processes: A, B, and C. Before making a scheduling decision, a random number (the winning ticket) is selected from the total number of tickets, say 400. Suppose the number 300 is chosen. Then, we traverse the linked list, using a simple counter to help identify the winner.

We traverse the process list from the beginning, adding the value of each ticket to the counter until the counter exceeds the winning number. At that point, the current list element corresponds to the winning process. For example, if the winning ticket is 300, we first add A's tickets: the counter increases to 100. Since 100 < 300, we continue. Then, the counter increases to 150 (B's tickets), still less than 300, so we continue. Finally, when the counter increases to 400 (greater than 300), we stop, and the current pointer indicates that C is the winner.

Advantages: Lightweight, requiring almost no state recording; fast execution.
Disadvantages: Cannot guarantee correct proportional allocation when job runtimes are very short.

Stride Scheduling

Eliminates randomness by replacing it with a fixed stride, resulting in a deterministic and fair allocation algorithm.

Steps

For example, if tickets for A, B, C are 100, 50, 250, respectively, set their strides as the inverse of their tickets (a simple approach is to divide a large number by each ticket). If this large number is 10000, then the strides for A, B, C are 100, 200, 40.
Initialize the total strides for A, B, C to 0.
Select the process with the smallest total stride and add its stride value to it.
Repeat step 3.

The final scheduling result is shown in the figure below:

Advantage: The percentage of execution time for each process exactly matches the proportion of its tickets (rather than gradually converging to this value).
Disadvantage: Inconvenient for adding new processes.

Linux CFS (Optional Topic)

For each process, balance efficiency and fairness by adjusting sched_latency and min_granularity (consider the drawbacks of too small or too large time slices in Round-Robin).
Adjust the weight of a process using the nice shell command. Specifically, the time slice for each process can be calculated using the following formula:

t i m e_s l i c e_{k} = \frac{w e i g h t_{k}}{\sum_{i = 0}^{n - 1} w e i g h t_{i}} \cdot s c h e d_l a t e n c y

Refer to the following definition for specific weight constants:

// the map between nice param to weight
static const int prio_to_weight[40] = {
/* -20 */ 88761, 71755, 56483, 46273, 36291,
/* -15 */ 29154, 23254, 18705, 14949, 11916,
/* -10 */  9548,  7620,  6100,  4904,  3906,
/* -5 */   3121,  2501,  1991,  1586,  1277,
/* 0 */    1024,   820,   655,   526,   423,
/* 5 */     335,   272,   215,   172,   137,
/* 10 */    110,    87,    70,    56,    45,
/* 15 */     36,    29,    23,    18,    15,
};

Track the current runtime of a program using vruntime (which may not be real time but can be understood as the stride mentioned above). Select the process with the smallest vruntime for the next run. Specifically, for every runtime amount of time executed, update vruntime using the following formula:

v r u n t i m e_{i} = v r u n t i m e_{i} + \frac{w e i g h t_{0}}{w e i g h t_{i}} \cdot r u n t i m e_{i}

Here, the value of weight_0 can be found in the code above and is 1024.

Use a red-black tree for process management. Specifically, this data structure ensures that the time complexity for process insertion, query (finding the process with the smallest vruntime), modification, and deletion is uniformly O(log n), as opposed to O(n) with a regular list.
For processes performing I/O operations or sleeping, the system sets their vruntime to the minimum vruntime among all other processes.

Summary

In this chapter, we learned about:

Two methods of proportional scheduling: lottery scheduling and stride scheduling.
Linux CFS: dynamic time slices, weights, red-black trees, and I/O handling.

(archived) OS Notes

Computer System Overview

Operating System Overview

Process Description and Control

Process

Threads

Concurrency

Terminology

algorithm

Dekker

Peterson's Algorithm

Hardware Disabling

Special Instructions

Semaphores

Terminology

Operations

Steps

Problems

Monitors

Composition

Message Passing

Deadlock & Starvation

Why Deadlock Occurs

Resources

Requirements

resolve

Allow Deadlock

Do Not Allow

Dining Philosopher's Problem

Memory Management

Fixed Partitioning

Dynamic Partitioning

Buddy System

Simple Paging

Simple Segmentation

Memory Management Requirements

Virtual Memory

Virtual Memory Paging

Virtual Memory Segmentation

OS Level

Read Strategies

Placement Strategy (Same as the Previous Placement Algorithm)

Page Replacement Policies

Resident Set Management

Clearance Strategies

Load Control

Processor Scheduling

Types

Algorithm Design

Requirements

algos

Real-time scheduling

EDF

RM

I/O

Buffering

Type

Disk Scheduling

transfer (fixed)

rotation (fixed)

Disk Scheduling Algorithms

File

What

Components

requirements

File Operation

directory

Single Level

Two-Level Structure

hierarchical

Access Control

File Organization

Pile File

Sequential File

Index Sequential

Indexed

Direct File or Hashed File

Storage Management

File Allocation

Free Space Management