现充|junyu33

(archived) Probability and statistics notes

Also using this as LaTeX practice.

Grade Distribution

Exam Score: 50%

Usual Performance: 50%

Features

Probability Section

Fundamentals of Probability Theory

Use $A, B, C$, etc., to denote events.

$\bar{A}$ denotes the complementary event of $A$ (being complementary is a sufficient condition for being mutually exclusive).

$\overline{A \cup B} = \bar{A}\bar{B}$ (De Morgan)

$A - B = A\bar{B} = A - AB$

$\overline{AB} = \bar{A} \cup \bar{B}$ (De Morgan)

$A \cup B = A \cup \bar{A}B$

The probability of event $A$ occurring given that event $B$ has occurred is $P(A \mid B) = \dfrac{P(AB)}{P(B)}$, where $P(B) > 0$.

Law of Total Probability: Let $B_1, \dots, B_m$ be a complete set of events; then $P(A) = \sum_{i=1}^{m} P(A \mid B_i) P(B_i)$.

Bayes' Theorem: Let $B_1, B_2, \dots, B_m$ be a complete set of events; then
$P(B_i \mid A) = \dfrac{P(B_i A)}{P(A)} = \dfrac{P(B_i) P(A \mid B_i)}{\sum_{j=1}^{m} P(A \mid B_j) P(B_j)}$,
i.e., Bayes' Theorem is a combination of the multiplication rule and the law of total probability.
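As a numerical sanity check, here is a minimal sketch in Python; the prior and likelihood numbers are made up for illustration:

```python
# Hedged example: the numbers below are invented for illustration.
# Complete set of events: B1 = "has condition", B2 = "does not".
priors = [0.01, 0.99]        # P(B_i)
likelihoods = [0.95, 0.05]   # P(A | B_i), where A = "test positive"

# Law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
p_a = sum(l * b for l, b in zip(likelihoods, priors))

# Bayes' theorem: P(B_1 | A) = P(B_1) P(A | B_1) / P(A)
posterior = priors[0] * likelihoods[0] / p_a
print(posterior)
```

Even with a 95% true-positive rate, the posterior stays small because the prior $P(B_1)$ dominates the total-probability denominator.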

If $P(AB) = P(A)P(B)$, then $A$ and $B$ are independent. In that case $\bar{A}$ and $B$, $A$ and $\bar{B}$, and $\bar{A}$ and $\bar{B}$ are also independent.

Binomial distribution probability: $P(k) = C_n^k p^k (1-p)^{n-k}$

Multinomial distribution: Suppose a random experiment has $k$ possible outcomes $C_1, \dots, C_k$, with probabilities $p_1, \dots, p_k$. In $N$ independent trials, let the random variables $x_1, \dots, x_k$ denote the number of occurrences of $C_1, \dots, C_k$, respectively. The probability that $C_1$ occurs $x_1$ times, $C_2$ occurs $x_2$ times, ..., $C_k$ occurs $x_k$ times is

$\dfrac{N!}{x_1! x_2! \cdots x_k!} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, where $\sum_{i=1}^{k} x_i = N$ and $\sum_{i=1}^{k} p_i = 1$.
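The formula can be checked with a short Python helper (the outcome probabilities below are made-up values):

```python
import math

def multinomial_pmf(xs, ps):
    """N!/(x1!...xk!) * p1^x1 ... pk^xk for counts xs and probabilities ps."""
    n = sum(xs)
    coef = math.factorial(n)
    for x in xs:
        coef //= math.factorial(x)
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

# k = 3 outcomes with made-up probabilities, N = 4 trials:
# 4!/(1! 1! 2!) * 0.2 * 0.3 * 0.5^2 = 12 * 0.015
print(multinomial_pmf([1, 1, 2], [0.2, 0.3, 0.5]))
```

With $k = 2$ this reduces to the binomial formula above, which is a convenient consistency check.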

Random Variables and Their Distributions (Memorize)

A random variable $X(\omega)$ is a function on the sample space $\Omega$: it maps each outcome $\omega \in \Omega$ to a real number.

The distribution function $F(x) = P(X \le x)$ has domain $\mathbb{R}$, is non-decreasing (with $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$), and is right-continuous.

The density function $f(x)$ satisfies $F(x) = \int_{-\infty}^{x} f(t)\,dt$.

When $f(x)$ is continuous, this gives $F'(x) = f(x)$. Also, $f(x) \ge 0$ and $\int_{-\infty}^{+\infty} f(x)\,dx = 1$.

Probability problems can be transformed into integrals of the probability density function: $P(X \in G) = \int_G f(x)\,dx$.

Non-continuous Distributions

Geometric and hypergeometric distributions are omitted.

Binomial distribution: $X \sim B(n, p)$, $P(X = k) = C_n^k p^k (1-p)^{n-k}$

The binomial distribution is unimodal. Since

$\dfrac{P\{X = k\}}{P\{X = k-1\}} = 1 + \dfrac{(n+1)p - k}{kq}$, the maximum is attained when $k$ is closest to $(n+1)p$.
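A quick Python check of the mode claim, with made-up parameters $n = 10$, $p = 0.3$ (so $(n+1)p = 3.3$):

```python
import math

def binom_pmf(n, p, k):
    """Binomial pmf C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3                       # made-up parameters; (n + 1) p = 3.3
# scan all k and find where the pmf is largest
mode = max(range(n + 1), key=lambda k: binom_pmf(n, p, k))
print(mode)                          # the integer just below (n + 1) p
```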

Poisson distribution: $X \sim P(\lambda)$, $P(X = k) = \dfrac{\lambda^k}{k!} e^{-\lambda}$

The Poisson probabilities are the normalized terms of the Taylor expansion of $e^\lambda$: the terms $\dfrac{\lambda^k}{k!}$ sum to $e^\lambda$.

Poisson's theorem: The binomial distribution converges to the Poisson distribution as $n \to \infty$, $p \to 0$ with $np \to \lambda$. In practice, for $X \sim B(n, p)$ with $n$ sufficiently large (e.g., $\ge 100$) and $p$ small, it can be approximated as $X \sim P(\lambda)$, where $\lambda = np$.
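A rough numerical illustration of the approximation (parameters are made up; `math.comb` computes $C_n^k$):

```python
import math

# Made-up parameters: n large, p small, λ = np moderate
n, p = 1000, 0.003
lam = n * p                          # λ = 3

k = 2
binom = math.comb(n, k) * p ** k * (1 - p) ** (n - k)   # exact B(n, p) pmf
poisson = lam ** k / math.factorial(k) * math.exp(-lam) # P(λ) approximation
print(binom, poisson)                # the two values are close
```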

Continuous Distributions

Uniform Distribution

$f(x) = \begin{cases} \frac{1}{b-a} & x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$, $X \sim U(a, b)$

Exponential Distribution

$f(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$, $X \sim e(\lambda)$

Gamma Distribution

$f(x) = \begin{cases} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$, $X \sim \Gamma(\alpha, \beta)$

$\Gamma(x) = \int_0^{+\infty} t^{x-1} e^{-t}\,dt$

Methods for Finding the Distribution of a Function of a Random Variable $Y = g(X)$

Taking the probability density function f(x) as an example:

For problems requiring case-by-case discussion:

Note: when adding the constant of integration $C$, use the continuity of the distribution function to determine it.

Multidimensional (Two-Dimensional) Random Variables and Their Distributions

Two-Dimensional Discrete Random Variable Distribution

It is essentially a table where the sum of all entries is 1. To find probabilities, simply add the probabilities at the corresponding positions.

Marginal Distributions:

Conditional Distribution Laws:

Two-Dimensional Continuous Random Variable Distribution

Equivalent to evaluating double integrals, with attention to the domain of integration.

Often, the property that the integral over the domain sums to 1 is used to find parameters, and then the double integral over a specified region is evaluated to compute probabilities.

Two-dimensional distribution function and density function:
$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du$

$P(x_1 < X \le x_2, y_1 < Y \le y_2) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)$

Marginal densities:

Conditional distribution functions:

Common question types:

One density, two conditionals, two marginals

  • Given one density, find the remaining four.
  • Given one conditional and its corresponding marginal, find the remaining three.

Given the two-dimensional density function $f(x, y)$ and $Z = g(X, Y)$, find the density function $f_Z(z)$ of $Z$:

  • Determine the effective region $R$ of $f(x, y)$.
  • Compute $F_Z(z) = \iint_{g(x,y) \le z,\ (x,y) \in R} f(x, y)\,dx\,dy$, paying attention to case analysis.
  • Differentiate with respect to $z$ to obtain the density function $f_Z(z)$.

Farewell to convolution

If $Z = \max(X, Y)$ and $X, Y$ are independent, then $F_Z(z) = F_X(z) F_Y(z)$.

If $Z = \min(X, Y)$ and $X, Y$ are independent, then $F_Z(z) = 1 - (1 - F_X(z))(1 - F_Y(z))$.
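A Monte-Carlo sketch of both identities, assuming $X, Y$ are independent $U(0,1)$ so that $F_X(z) = F_Y(z) = z$ on $[0,1]$:

```python
import random

random.seed(0)
N = 100_000
z = 0.5
maxs = mins = 0
for _ in range(N):
    x, y = random.random(), random.random()   # independent U(0, 1)
    maxs += max(x, y) <= z
    mins += min(x, y) <= z

print(maxs / N)   # F_max(z) = z^2 = 0.25
print(mins / N)   # F_min(z) = 1 - (1 - z)^2 = 0.75
```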

Mathematical Expectation

Definition of Mathematical Expectation

Expectation for discrete variables: $E(X) = \sum_{k=1}^{\infty} x_k p_k$

Expectation for continuous variables: $E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$

The mathematical expectation exists when the sum of the series (or the integral) is absolutely convergent.

For the two-dimensional case, it can also be calculated as:

$E(X) = \int_{-\infty}^{+\infty} x f_X(x)\,dx$

$E(Y) = \int_{-\infty}^{+\infty} y f_Y(y)\,dy$

Expectation of a Function of a Random Variable

$E(g(X)) = \sum_{k=1}^{\infty} g(x_k) p_k$ (discrete) or $\int_{-\infty}^{+\infty} g(x) f(x)\,dx$ (continuous)

$E(g(X, Y)) = \sum_i \sum_j g(x_i, y_j) p_{ij}$ (discrete) or $\iint g(x, y) f(x, y)\,dx\,dy$ (continuous)

Properties of Mathematical Expectation

$E(C) = C$

$E(C_1 X + C_2 Y) = C_1 E(X) + C_2 E(Y)$

If $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$

Variance

Definition of Variance

$D(X) = E(X - E(X))^2 = E(X^2) - E(X)^2$

Proof: let $E(X) = c$; then

$D(X) = E(X - c)^2 = E(X^2 - 2cX + c^2)$

$= E(X^2) - 2cE(X) + c^2$

$= E(X^2) - c^2 = E(X^2) - E(X)^2$

The standard deviation is $\sqrt{D(X)}$.

Properties of Variance

$D(C) = 0$

$D(aX) = a^2 D(X)$

When $X$ and $Y$ are independent, $D(X \pm Y) = D(X) + D(Y)$

When the $X_i$ are independent, $D\left(\sum_{i=1}^{n} c_i X_i\right) = \sum_{i=1}^{n} c_i^2 D(X_i)$

$D(X) = 0 \iff \exists\,c$ such that $P(X = c) = 1$, but this does not imply $X \equiv c$ (similarly, an event with probability 1 is not necessarily the certain event).

Coefficient of variation: $C_v = \dfrac{\sqrt{D(X)}}{|E(X)|}$

Common Distributions: Expectations and Variances (Memorize)

$E(\Gamma(\alpha, \beta)) = \dfrac{\alpha}{\beta}$, $D(\Gamma(\alpha, \beta)) = \dfrac{\alpha}{\beta^2}$

Raw Moments and Central Moments

Raw moments: $m_k = E(X^k)$

Central moments: $\mu_k = E\big((X - E(X))^k\big)$

Therefore, variance is the second-order central moment.

Covariance and Correlation Coefficient

Definition of Covariance

$\mathrm{Cov}(X, Y) = E\big((X - EX)(Y - EY)\big) = E(XY) - E(X)E(Y)$

The proof is similar to that of variance and is omitted here.

Properties of Covariance

Cov(X,X)=D(X)

Cov(X,Y)=Cov(Y,X)

Cov(X,a)=0

Cov(aX,bY)=abCov(X,Y)

Cov(X+Y,Z)=Cov(X,Z)+Cov(Y,Z)

D(X±Y)=D(X)+D(Y)±2Cov(X,Y)

If X and Y are independent, then Cov(X,Y)=0

Proof: This follows directly from the properties of expectation.

Consequently, if X and Y are independent, D(X±Y)=D(X)+D(Y)

Standardization of Random Variables

$X^* = \dfrac{X - E(X)}{\sqrt{D(X)}}$

It has an expectation of 0, a variance of 1, and is dimensionless.

Definition of Correlation Coefficient (Memorize)

The correlation coefficient $R(X, Y) = \mathrm{Cov}(X^*, Y^*) = \dfrac{1}{\sqrt{D(X)}} \cdot \dfrac{1}{\sqrt{D(Y)}} \cdot \mathrm{Cov}(X, Y)$

Obviously, X and Y cannot be constants.

To compute the correlation coefficient, five expectations are required: $E(X)$, $E(Y)$, $E(X^2)$, $E(Y^2)$, $E(XY)$

That is, $R(X, Y) = \dfrac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E(X)^2}\,\sqrt{E(Y^2) - E(Y)^2}}$
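A small sketch computing $R(X,Y)$ from the five expectations for a made-up discrete joint distribution:

```python
import math

# Made-up joint pmf of (X, Y): {(x, y): probability}, summing to 1
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

# the five required expectations
ex  = sum(p * x     for (x, y), p in pmf.items())
ey  = sum(p * y     for (x, y), p in pmf.items())
ex2 = sum(p * x * x for (x, y), p in pmf.items())
ey2 = sum(p * y * y for (x, y), p in pmf.items())
exy = sum(p * x * y for (x, y), p in pmf.items())

r = (exy - ex * ey) / (math.sqrt(ex2 - ex ** 2) * math.sqrt(ey2 - ey ** 2))
print(r)   # lies in [-1, 1]
```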

Properties of the Correlation Coefficient

$0 \le |R(X, Y)| \le 1$

When $R(X, Y) = 1$, $\exists\,t_0 > 0$ and a constant $b$ such that $P(Y = t_0 X + b) = 1$, meaning $X$ and $Y$ are perfectly positively correlated.

When $R(X, Y) = -1$, $\exists\,t_0 < 0$ and a constant $b$ such that $P(Y = t_0 X + b) = 1$, meaning $X$ and $Y$ are perfectly negatively correlated.

R(X,Y)=0 indicates that X and Y are uncorrelated, which is a necessary condition for the independence of X and Y.

To prove that $X$ and $Y$ are not independent, select appropriate intervals $I, J$ such that $P(X \in I, Y \in J) \ne P(X \in I)\,P(Y \in J)$.

Normal Distribution

Standard Normal Distribution

$X \sim N(0, 1)$: $\varphi(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$, $x \in \mathbb{R}$

Even function, bell-shaped curve.

$\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt$

$\Phi(0) = \frac{1}{2}$
$\Phi(x) + \Phi(-x) = 1$
These two properties are often used in exams when calculating probabilities; the answer (or table lookup) should be expressed in terms of $\Phi(x)$ rather than $\Phi(-x)$.

Normal Distribution

If $X \sim N(\mu, \sigma^2)$, then $\dfrac{X - \mu}{\sigma} \sim N(0, 1)$, and thus $F(x) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right)$.

Differentiating yields the density $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, $x \in \mathbb{R}$.

μ determines the horizontal shift of the graph (expectation), and σ determines the "peakedness" of the graph (standard deviation).

A linear combination of multiple independent normal distributions is still a normal distribution.

In particular, if they are all $N(\mu, \sigma^2)$, their average $\bar{X}$ is $N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Bivariate Normal Distribution

Special Case:

$(X, Y) \sim N(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2)$: the joint density is the product of the $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ densities (i.e., $X$ and $Y$ are independent).

General Case (Including Correlation Coefficient):

$(X, Y) \sim N(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2; r)$

$f(x, y) = \dfrac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - r^2}} \exp\!\left(-\dfrac{1}{2(1 - r^2)} \left[\left(\dfrac{x - \mu_1}{\sigma_1}\right)^2 - 2r\left(\dfrac{x - \mu_1}{\sigma_1}\right)\!\left(\dfrac{y - \mu_2}{\sigma_2}\right) + \left(\dfrac{y - \mu_2}{\sigma_2}\right)^2\right]\right)$

For the bivariate normal distribution, its marginal and conditional distributions are normal. Additionally, being uncorrelated is equivalent to being independent.

Natural Exponential Family

$f(x, \theta) = e^{\theta x - \varphi(\theta)} h(x)$

Among common distributions, all except the uniform distribution can be expressed in this form.

Its mean parameter (expectation $m$) is $\varphi'(\theta)$, and the variance function is $\varphi''(\theta)$.

Limit Theorems

Chebyshev's Inequality

$P(|X - E(X)| \ge \varepsilon) \le \dfrac{D(X)}{\varepsilon^2}$

$P(|X - E(X)| < \varepsilon) \ge 1 - \dfrac{D(X)}{\varepsilon^2}$
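An empirical sketch of the inequality using $X \sim e(1)$ (so $E(X) = D(X) = 1$); the sample size and $\varepsilon$ are arbitrary choices:

```python
import math
import random

random.seed(1)
N = 100_000
eps = 2.0
samples = [random.expovariate(1.0) for _ in range(N)]   # Exponential(1)

# empirical frequency of |X - E(X)| >= eps vs. the Chebyshev bound D(X)/eps^2
freq = sum(abs(x - 1.0) >= eps for x in samples) / N
bound = 1.0 / eps ** 2

print(freq, bound)   # the frequency stays below the bound
```

For this distribution the true probability is $P(X \ge 3) = e^{-3} \approx 0.05$, well under the (deliberately loose) bound $0.25$.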

Law of Large Numbers

Consider a sequence of random variables $\{X_n\}$ and their mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

If $\bar{X} - E(\bar{X}) \xrightarrow{P} 0$, then $\{X_n\}$ satisfies the Law of Large Numbers.

Central Limit Theorem

Let the $X_i$ be independent and identically distributed. Then $Z_n = \sum_{i=1}^{n} X_i$ approximately follows a normal distribution.

If $E(X_k) = \mu$ and $D(X_k) = \sigma^2$, then $\lim_{n \to \infty} P\!\left(\dfrac{n\bar{X} - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x)$.

It follows that $Z_n$ approximately follows $N(n\mu, n\sigma^2)$, and $\bar{X}$ approximately follows $N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Suppose $X_i \sim B(1, p)$, and let $Z_n = \sum_{i=1}^{n} X_i \sim B(n, p)$. Then

$\lim_{n \to \infty} P\!\left(\dfrac{Z_n - np}{\sqrt{npq}} \le x\right) = \Phi(x)$

That is, $Z_n$ approximately follows $N(np, npq)$
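A Monte-Carlo sketch of the normal approximation with made-up $n$ and $p$; `phi` is a helper for $\Phi$ built on `math.erf`:

```python
import math
import random

def phi(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

random.seed(2)
n, p = 400, 0.5          # made-up parameters
q = 1 - p
trials = 5_000
x = 1.0

hits = 0
for _ in range(trials):
    s = sum(random.random() < p for _ in range(n))     # one draw of B(n, p)
    hits += (s - n * p) / math.sqrt(n * p * q) <= x    # standardized value <= x
print(hits / trials, phi(x))         # both close to Φ(1)
```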

Statistics Section

Common Distributions

Chi-Square Distribution — Sum of Squares of Normals

The sum of squares of normally distributed variables $X_i \sim N(0, 1)$:

$\chi^2 = \sum_{i=1}^{n} X_i^2$ follows a chi-square distribution with $n$ degrees of freedom, denoted $\chi^2(n)$.

$\chi^2(n) = \Gamma\!\left(\dfrac{n}{2}, \dfrac{1}{2}\right)$, hence $E(\chi^2) = n$, $D(\chi^2) = 2n$.

The chi-square distribution satisfies additivity.

t-Distribution — Normal Divided by a Number

Let $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$ be independent. Then

$t = \dfrac{X}{\sqrt{Y/n}} \sim t(n)$

The distribution is similar to the normal distribution but has thicker tails.

F-Distribution — Ratio of Normal Sums of Squares

Let $X \sim \chi^2(n)$ and $Y \sim \chi^2(m)$ be independent. Then

$F = \dfrac{X/n}{Y/m} \sim F(n, m)$

$\dfrac{1}{F} \sim F(m, n)$

Common Statistical Measures

Sample Mean

$\bar{X} = \dfrac{1}{n} \sum_{i=1}^{n} X_i$

Sample Variance

$S^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ (note: the denominator is $n-1$, not $n$)
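A simulation sketch of why the $n-1$ denominator matters: averaged over many samples, $S^2$ lands near $\sigma^2$, while the $n$-denominator version $B_2$ is biased low (all parameters made up):

```python
import random

random.seed(3)
reps, n, sigma2 = 20_000, 5, 4.0     # samples from N(0, 4), made-up setup
avg_s2 = 0.0
avg_b2 = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    avg_s2 += ss / (n - 1)           # S^2, denominator n - 1
    avg_b2 += ss / n                 # B_2, denominator n
avg_s2 /= reps
avg_b2 /= reps

print(avg_s2, avg_b2)                # ≈ σ² = 4.0 versus ≈ (n-1)/n · σ² = 3.2
```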

Sampling Distribution Theorem

Case 1 (Known Variance)

Let the sample $X_1, X_2, \dots, X_n$ be drawn from $N(\mu, \sigma^2)$. Then,

$\chi^2 = \dfrac{(n-1)S^2}{\sigma^2} = \dfrac{1}{\sigma^2} \sum (X_i - \bar{X})^2 \sim \chi^2(n-1)$,

and $\bar{X}$ and $S^2$ are independent.

Case 2 (Known Mean)

The sample $X_1, X_2, \dots, X_n$ is drawn from $N(\mu, \sigma^2)$. Then,

$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$

Case 3 (Multiple Populations)

Probability and Statistics has successfully turned into a liberal art

If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, with independent samples of sizes $n_1$ and $n_2$, then:

$\bar{X} - \bar{Y} \sim N\!\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$

When $\sigma_1^2 = \sigma_2^2 = \sigma^2$:

$\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2)$

$\dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_w \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$

where $S_w^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, then:

$\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$

Point Estimation

Method of Moments Estimation

Set the sample mean $\bar{x}$ equal to the population expectation $m(\theta)$ (second moments can also be used), solve for the parameter $\theta$ in terms of the expectation $m$ to obtain $\theta(m)$, and then substitute $\bar{x}$ for $m$ to obtain $\hat{\theta} = \theta(\bar{x})$.

Maximum Likelihood Estimation

Discrete Case: Write the probability of the observed event as a function of the parameter $\theta$ (usually a product of discrete probabilities), and maximize it. Solve the log-likelihood equation $\dfrac{d \ln L(\theta)}{d\theta} = 0$ to obtain the corresponding parameter estimate $\hat{\theta}$.

Continuous Case: The function to be maximized is $\prod_{i=1}^{n} f(x_i, \theta)$, where the $x_i$ are treated as constants. The subsequent steps are the same.

The maximum likelihood estimate is a function of the observed values $x_i$ (e.g., $\max_i x_i$ for $U(0, \theta)$), while the maximum likelihood estimator is the same function of the random variables $X_i$ (e.g., $\max_i X_i$).
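A minimal sketch of the continuous-case recipe for $e(\lambda)$, where the log-likelihood equation gives $\hat{\lambda} = n / \sum x_i$ (the observations are made up):

```python
import math

# Exponential f(x; λ) = λ e^{-λx}:
# ln L(λ) = n ln λ - λ Σx_i, so d ln L / dλ = n/λ - Σx_i = 0 gives λ̂ = n / Σx_i
data = [0.8, 1.3, 0.5, 2.1, 0.9]     # made-up observations
lam_hat = len(data) / sum(data)
print(lam_hat)                       # n / Σx_i

def loglik(lam):
    """Log-likelihood of the exponential sample at rate lam."""
    return len(data) * math.log(lam) - lam * sum(data)

# λ̂ should beat nearby values of λ (log-likelihood is concave here)
assert loglik(lam_hat) >= loglik(lam_hat * 0.9)
assert loglik(lam_hat) >= loglik(lam_hat * 1.1)
```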

Criteria for Estimator Evaluation

Unbiased Estimator: An estimator is unbiased if the expected value of the estimate equals the true parameter, i.e., $E(\hat{\theta}) = \theta$; otherwise, it is biased.

Asymptotically Unbiased Estimator: An estimator is asymptotically unbiased if $\lim_{n \to \infty} E(\hat{\theta}) - \theta = 0$.

Efficiency Criterion: If $D(\hat{\theta}_1) \le D(\hat{\theta}_2)$ (both unbiased), then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$.

Consistency Criterion: If $\hat{\theta}_n \xrightarrow{P} \theta$, then $\hat{\theta}_n$ is a consistent estimator of $\theta$.

Mean Squared Error (MSE) Criterion: If $E(\hat{\theta}_1 - \theta)^2 \le E(\hat{\theta}_2 - \theta)^2$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ in terms of MSE.

Specifically, $B_2$ (the sample variance with denominator $n$) is more efficient than $S^2$ in terms of MSE.

Interval Estimation (Memorize)

Only for normal distribution

Two-Sided Confidence Interval

Estimate the $(1 - \alpha)$ confidence interval for $\mu$.

If $\sigma^2$ is known, it is $\left(\bar{X} - \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha/2},\ \bar{X} + \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha/2}\right)$.

If $\sigma^2$ is unknown, it is $\left(\bar{X} - \dfrac{S}{\sqrt{n}} t_{1-\alpha/2}(n-1),\ \bar{X} + \dfrac{S}{\sqrt{n}} t_{1-\alpha/2}(n-1)\right)$.

Estimate the $(1 - \alpha)$ confidence interval for $\sigma^2$.

$\left(\dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)},\ \dfrac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right)$
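The known-$\sigma^2$ two-sided interval as a sketch; the summary statistics are made up, and the quantile $u_{0.975} \approx 1.96$ is hard-coded rather than looked up in a table:

```python
import math

# Made-up summary statistics for a sample from N(μ, σ²) with σ known
xbar, sigma, n = 10.2, 2.0, 25
u = 1.96                             # u_{1-α/2} for α = 0.05 (normal quantile)

# (x̄ - σ/√n · u, x̄ + σ/√n · u)
half = sigma / math.sqrt(n) * u
ci = (xbar - half, xbar + half)
print(ci)
```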

One-Sided Confidence Intervals

Estimate the $(1 - \alpha)$ one-sided confidence limits for $\mu$.

If $\sigma^2$ is known, the one-sided lower confidence limit is $\bar{X} - \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha}$, and the upper limit is $\bar{X} + \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha}$.

If $\sigma^2$ is unknown, the one-sided lower confidence limit is $\bar{X} - \dfrac{S}{\sqrt{n}} t_{1-\alpha}(n-1)$, and the upper limit is $\bar{X} + \dfrac{S}{\sqrt{n}} t_{1-\alpha}(n-1)$.

Estimate the $(1 - \alpha)$ one-sided confidence limits for $\sigma^2$.

The one-sided lower confidence limit is $\dfrac{(n-1)S^2}{\chi^2_{1-\alpha}(n-1)}$, and the upper limit is $\dfrac{(n-1)S^2}{\chi^2_{\alpha}(n-1)}$.

Hypothesis Testing

Process

Proof by contradiction with a probabilistic nature

  1. First, state the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
  2. Under the assumption that $H_0$ holds, construct the distribution satisfied by the sample.
  3. Determine the rejection region $W$ based on the value of $\alpha$.
  4. Substitute the observed value $u$. If $u \in W$, reject the null hypothesis.
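The four steps above, sketched as a one-sample $u$-test with $\sigma$ known (all numbers are made up; the quantile $u_{1-\alpha/2} \approx 1.96$ is hard-coded):

```python
import math

# Step 1: H0: μ = μ0 vs H1: μ ≠ μ0, with σ known (made-up numbers)
mu0, sigma, n = 5.0, 1.5, 36
xbar = 5.6                           # observed sample mean

# Step 2: under H0, u = (x̄ - μ0)/(σ/√n) follows N(0, 1)
u = (xbar - mu0) / (sigma / math.sqrt(n))

# Steps 3-4: rejection region W = {|u| > u_{1-α/2}} for α = 0.05
u_crit = 1.96
reject = abs(u) > u_crit
print(u, reject)
```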

Type I and Type II Errors and Their Probabilities

The probability of a Type I error (rejecting $H_0$ when it is true) is $P_1 = P(W \mid H_0) \le \alpha$.

The probability of a Type II error (accepting $H_0$ when it is false) is $P_2 = P(\bar{W} \mid H_1)$.

P1 and P2 cannot be reduced simultaneously. However, by fixing one of them and increasing the sample size n, the other can be reduced. The choice of which one to reduce depends on the severity of their respective consequences.