现充|junyu33

(archived) Probability and statistics notes

Also using this as LaTeX practice.

Grade Distribution

Exam Score: 50%

Usual Performance: 50%

Features

Probability Section

Fundamentals of Probability Theory

Use $A, B, C$, etc., to denote events.

$\bar{A}$ denotes the complementary event of $A$ (being complementary is a sufficient condition for being mutually exclusive).

$\overline{A \cup B} = \bar{A}\bar{B}$ (De Morgan)

$A - B = A\bar{B} = A - AB$

$\overline{AB} = \bar{A} \cup \bar{B}$ (De Morgan)

$A \cup B = A \cup \bar{A}B$

The probability of event $A$ occurring given that event $B$ has occurred is $P(A \mid B) = \dfrac{P(AB)}{P(B)}$, where $P(B) > 0$.

Law of Total Probability: Let $B_1, \dots, B_m$ be a complete set of events; then $P(A) = \sum_{i=1}^{m} P(A \mid B_i) P(B_i)$.

Bayes' Theorem: Let $B_1, B_2, \dots, B_m$ be a complete set of events; then
$P(B_i \mid A) = \dfrac{P(B_i A)}{P(A)} = \dfrac{P(B_i) P(A \mid B_i)}{\sum_{j=1}^{m} P(A \mid B_j) P(B_j)}$,
i.e., Bayes' Theorem is a combination of the multiplication rule and the law of total probability.
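As a numerical sanity check, here is a minimal sketch in Python; the prior and likelihood numbers are made up for illustration:

```python
# Hedged example: the numbers below are invented for illustration.
# Complete set of events: B1 = "has condition", B2 = "does not".
priors = [0.01, 0.99]        # P(B_i)
likelihoods = [0.95, 0.05]   # P(A | B_i), where A = "test positive"

# Law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
p_a = sum(l * b for l, b in zip(likelihoods, priors))

# Bayes' theorem: P(B_1 | A) = P(B_1) P(A | B_1) / P(A)
posterior = priors[0] * likelihoods[0] / p_a
print(posterior)
```

Even with a 95% true-positive rate, the posterior stays small because the prior $P(B_1)$ dominates the total-probability denominator.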

If $P(AB) = P(A)P(B)$, then $A$ and $B$ are independent. In that case $\bar{A}$ and $B$, $A$ and $\bar{B}$, and $\bar{A}$ and $\bar{B}$ are also independent.

Binomial distribution probability: $P(k) = C_n^k p^k (1-p)^{n-k}$

Multinomial distribution: Suppose a random experiment has $k$ possible outcomes $C_1, \dots, C_k$, with probabilities $p_1, \dots, p_k$. In $N$ independent trials, let the random variables $x_1, \dots, x_k$ denote the number of occurrences of $C_1, \dots, C_k$, respectively. The probability that $C_1$ occurs $x_1$ times, $C_2$ occurs $x_2$ times, ..., $C_k$ occurs $x_k$ times is

$\dfrac{N!}{x_1! x_2! \cdots x_k!} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, where $\sum_{i=1}^{k} x_i = N$ and $\sum_{i=1}^{k} p_i = 1$.
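The formula can be checked with a short Python helper (the outcome probabilities below are made-up values):

```python
import math

def multinomial_pmf(xs, ps):
    """N!/(x1!...xk!) * p1^x1 ... pk^xk for counts xs and probabilities ps."""
    n = sum(xs)
    coef = math.factorial(n)
    for x in xs:
        coef //= math.factorial(x)
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

# k = 3 outcomes with made-up probabilities, N = 4 trials:
# 4!/(1! 1! 2!) * 0.2 * 0.3 * 0.5^2 = 12 * 0.015
print(multinomial_pmf([1, 1, 2], [0.2, 0.3, 0.5]))
```

With $k = 2$ this reduces to the binomial formula above, which is a convenient consistency check.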

Random Variables and Their Distributions (Memorize)

A random variable $X(\omega)$ is a function on the sample space $\Omega$: it maps each outcome $\omega \in \Omega$ to a real number.

The distribution function $F(x) = P(X \le x)$ has domain $\mathbb{R}$, is non-decreasing (with $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$), and is right-continuous.

The density function $f(x)$ satisfies $F(x) = \int_{-\infty}^{x} f(t)\,dt$.

When $f(x)$ is continuous, this gives $F'(x) = f(x)$. Also, $f(x) \ge 0$ and $\int_{-\infty}^{+\infty} f(x)\,dx = 1$.

Probability problems can be transformed into integrals of the probability density function: $P(X \in G) = \int_G f(x)\,dx$.

Non-continuous Distributions

Geometric and hypergeometric distributions are omitted.

Binomial distribution: $X \sim B(n, p)$, $P(X = k) = C_n^k p^k (1-p)^{n-k}$

The binomial distribution is unimodal. Since

$\dfrac{P\{X = k\}}{P\{X = k-1\}} = 1 + \dfrac{(n+1)p - k}{kq}$, the maximum is attained when $k$ is closest to $(n+1)p$.
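A quick Python check of the mode claim, with made-up parameters $n = 10$, $p = 0.3$ (so $(n+1)p = 3.3$):

```python
import math

def binom_pmf(n, p, k):
    """Binomial pmf C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3                       # made-up parameters; (n + 1) p = 3.3
# scan all k and find where the pmf is largest
mode = max(range(n + 1), key=lambda k: binom_pmf(n, p, k))
print(mode)                          # the integer just below (n + 1) p
```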

Poisson distribution: $X \sim P(\lambda)$, $P(X = k) = \dfrac{\lambda^k}{k!} e^{-\lambda}$

The Poisson probabilities are the normalized terms of the Taylor expansion of $e^\lambda$: the terms $\dfrac{\lambda^k}{k!}$ sum to $e^\lambda$.

Poisson's theorem: The binomial distribution converges to the Poisson distribution as $n \to \infty$, $p \to 0$ with $np \to \lambda$. In practice, for $X \sim B(n, p)$ with $n$ sufficiently large (e.g., $\ge 100$) and $p$ small, it can be approximated as $X \sim P(\lambda)$, where $\lambda = np$.
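A rough numerical illustration of the approximation (parameters are made up; `math.comb` computes $C_n^k$):

```python
import math

# Made-up parameters: n large, p small, λ = np moderate
n, p = 1000, 0.003
lam = n * p                          # λ = 3

k = 2
binom = math.comb(n, k) * p ** k * (1 - p) ** (n - k)   # exact B(n, p) pmf
poisson = lam ** k / math.factorial(k) * math.exp(-lam) # P(λ) approximation
print(binom, poisson)                # the two values are close
```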

Continuous Distributions

Uniform Distribution

$f(x) = \begin{cases} \frac{1}{b-a} & x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$, $X \sim U(a, b)$

Exponential Distribution

$f(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$, $X \sim e(\lambda)$

Gamma Distribution

$f(x) = \begin{cases} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$, $X \sim \Gamma(\alpha, \beta)$

$\Gamma(x) = \int_0^{+\infty} t^{x-1} e^{-t}\,dt$

Methods for Finding the Distribution of a Function of a Random Variable $Y = g(X)$

Taking the probability density function f(x) as an example:

For problems requiring case-by-case discussion:

Note: when adding the constant of integration $C$, use the continuity of the distribution function to determine it.

Multidimensional (Two-Dimensional) Random Variables and Their Distributions

Two-Dimensional Discrete Random Variable Distribution

It is essentially a table where the sum of all entries is 1. To find probabilities, simply add the probabilities at the corresponding positions.

Marginal Distributions:

Conditional Distribution Laws:

Two-Dimensional Continuous Random Variable Distribution

Equivalent to evaluating double integrals, with attention to the domain of integration.

Often, the property that the integral over the domain sums to 1 is used to find parameters, and then the double integral over a specified region is evaluated to compute probabilities.

Two-dimensional distribution function and density function:
$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du$

$P(x_1 < X \le x_2, y_1 < Y \le y_2) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)$

Marginal densities:

Conditional distribution functions:

Common question types:

One density, two conditionals, two marginals

  • Given one density, find the remaining four.
  • Given one conditional and its corresponding marginal, find the remaining three.

Given the two-dimensional density function $f(x, y)$ and $Z = g(X, Y)$, find the density function $f_Z(z)$ of $Z$:

  • Determine the effective region $R$ of $f(x, y)$.
  • Compute $F_Z(z) = \iint_{g(x,y) \le z,\ (x,y) \in R} f(x, y)\,dx\,dy$, paying attention to case analysis.
  • Differentiate with respect to $z$ to obtain the density function $f_Z(z)$.

Farewell to convolution

If $Z = \max(X, Y)$ and $X, Y$ are independent, then $F_Z(z) = F_X(z) F_Y(z)$.

If $Z = \min(X, Y)$ and $X, Y$ are independent, then $F_Z(z) = 1 - (1 - F_X(z))(1 - F_Y(z))$.
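A Monte-Carlo sketch of both identities, assuming $X, Y$ are independent $U(0,1)$ so that $F_X(z) = F_Y(z) = z$ on $[0,1]$:

```python
import random

random.seed(0)
N = 100_000
z = 0.5
maxs = mins = 0
for _ in range(N):
    x, y = random.random(), random.random()   # independent U(0, 1)
    maxs += max(x, y) <= z
    mins += min(x, y) <= z

print(maxs / N)   # F_max(z) = z^2 = 0.25
print(mins / N)   # F_min(z) = 1 - (1 - z)^2 = 0.75
```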

Mathematical Expectation

Definition of Mathematical Expectation

Expectation for discrete variables: $E(X) = \sum_{k=1}^{\infty} x_k p_k$

Expectation for continuous variables: $E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$

The mathematical expectation exists when the sum of the series (or the integral) is absolutely convergent.

For the two-dimensional case, it can also be calculated as:

$E(X) = \int_{-\infty}^{+\infty} x f_X(x)\,dx$

$E(Y) = \int_{-\infty}^{+\infty} y f_Y(y)\,dy$

Expectation of a Function of a Random Variable

$E(g(X)) = \sum_{k=1}^{\infty} g(x_k) p_k$ (discrete) or $\int_{-\infty}^{+\infty} g(x) f(x)\,dx$ (continuous)

$E(g(X, Y)) = \sum_i \sum_j g(x_i, y_j) p_{ij}$ (discrete) or $\iint g(x, y) f(x, y)\,dx\,dy$ (continuous)

Properties of Mathematical Expectation

$E(C) = C$

$E(C_1 X + C_2 Y) = C_1 E(X) + C_2 E(Y)$

If $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$

Variance

Definition of Variance

$D(X) = E(X - E(X))^2 = E(X^2) - E(X)^2$

Proof: let $E(X) = c$; then

$D(X) = E(X - c)^2 = E(X^2 - 2cX + c^2)$

$= E(X^2) - 2cE(X) + c^2$

$= E(X^2) - c^2 = E(X^2) - E(X)^2$

The standard deviation is $\sqrt{D(X)}$.

Properties of Variance

$D(C) = 0$

$D(aX) = a^2 D(X)$

When $X$ and $Y$ are independent, $D(X \pm Y) = D(X) + D(Y)$

When the $X_i$ are independent, $D\left(\sum_{i=1}^{n} c_i X_i\right) = \sum_{i=1}^{n} c_i^2 D(X_i)$

$D(X) = 0 \iff \exists\,c$ such that $P(X = c) = 1$, but this does not imply $X \equiv c$ (similarly, an event with probability 1 is not necessarily the certain event).

Coefficient of variation: $C_v = \dfrac{\sqrt{D(X)}}{|E(X)|}$

Common Distributions: Expectations and Variances (Memorize)

$E(\Gamma(\alpha, \beta)) = \dfrac{\alpha}{\beta}$, $D(\Gamma(\alpha, \beta)) = \dfrac{\alpha}{\beta^2}$

Raw Moments and Central Moments

Raw moments: $m_k = E(X^k)$

Central moments: $\mu_k = E\big((X - E(X))^k\big)$

Therefore, variance is the second-order central moment.

Covariance and Correlation Coefficient

Definition of Covariance

$\mathrm{Cov}(X, Y) = E\big((X - EX)(Y - EY)\big) = E(XY) - E(X)E(Y)$

The proof is similar to that of variance and is omitted here.

Properties of Covariance

Cov(X,X)=D(X)

Cov(X,Y)=Cov(Y,X)

Cov(X,a)=0

Cov(aX,bY)=abCov(X,Y)

Cov(X+Y,Z)=Cov(X,Z)+Cov(Y,Z)

D(X±Y)=D(X)+D(Y)±2Cov(X,Y)

If X and Y are independent, then Cov(X,Y)=0

Proof: This follows directly from the properties of expectation.

Consequently, if X and Y are independent, D(X±Y)=D(X)+D(Y)

Standardization of Random Variables

$X^* = \dfrac{X - E(X)}{\sqrt{D(X)}}$

It has an expectation of 0, a variance of 1, and is dimensionless.

Definition of Correlation Coefficient (Memorize)

The correlation coefficient $R(X, Y) = \mathrm{Cov}(X^*, Y^*) = \dfrac{1}{\sqrt{D(X)}} \cdot \dfrac{1}{\sqrt{D(Y)}} \cdot \mathrm{Cov}(X, Y)$

Obviously, X and Y cannot be constants.

To compute the correlation coefficient, five expectations are required: $E(X)$, $E(Y)$, $E(X^2)$, $E(Y^2)$, $E(XY)$

That is, $R(X, Y) = \dfrac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E(X)^2}\,\sqrt{E(Y^2) - E(Y)^2}}$
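A small sketch computing $R(X,Y)$ from the five expectations for a made-up discrete joint distribution:

```python
import math

# Made-up joint pmf of (X, Y): {(x, y): probability}, summing to 1
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

# the five required expectations
ex  = sum(p * x     for (x, y), p in pmf.items())
ey  = sum(p * y     for (x, y), p in pmf.items())
ex2 = sum(p * x * x for (x, y), p in pmf.items())
ey2 = sum(p * y * y for (x, y), p in pmf.items())
exy = sum(p * x * y for (x, y), p in pmf.items())

r = (exy - ex * ey) / (math.sqrt(ex2 - ex ** 2) * math.sqrt(ey2 - ey ** 2))
print(r)   # lies in [-1, 1]
```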

Properties of the Correlation Coefficient

$0 \le |R(X, Y)| \le 1$

When $R(X, Y) = 1$, $\exists\,t_0 > 0$ and a constant $b$ such that $P(Y = t_0 X + b) = 1$, meaning $X$ and $Y$ are perfectly positively correlated.

When $R(X, Y) = -1$, $\exists\,t_0 < 0$ and a constant $b$ such that $P(Y = t_0 X + b) = 1$, meaning $X$ and $Y$ are perfectly negatively correlated.

R(X,Y)=0 indicates that X and Y are uncorrelated, which is a necessary condition for the independence of X and Y.

To prove that $X$ and $Y$ are not independent, select appropriate intervals $I, J$ such that $P(X \in I, Y \in J) \ne P(X \in I)\,P(Y \in J)$.

Normal Distribution

Standard Normal Distribution

$X \sim N(0, 1)$: $\varphi(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$, $x \in \mathbb{R}$

Even function, bell-shaped curve.

$\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt$

$\Phi(0) = \frac{1}{2}$
$\Phi(x) + \Phi(-x) = 1$
These two properties are often used in exams when calculating probabilities; the answer (or table lookup) should be expressed in terms of $\Phi(x)$ rather than $\Phi(-x)$.

Normal Distribution

If $X \sim N(\mu, \sigma^2)$, then $\dfrac{X - \mu}{\sigma} \sim N(0, 1)$, and thus $F(x) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right)$.

Differentiating yields the density $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, $x \in \mathbb{R}$.

μ determines the horizontal shift of the graph (expectation), and σ determines the "peakedness" of the graph (standard deviation).

A linear combination of multiple independent normal distributions is still a normal distribution.

In particular, if they are all $N(\mu, \sigma^2)$, their average $\bar{X}$ is $N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Bivariate Normal Distribution

Special Case:

$(X, Y) \sim N(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2)$: the joint density is the product of the $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ densities (i.e., $X$ and $Y$ are independent).

General Case (Including Correlation Coefficient):

$(X, Y) \sim N(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2; r)$

$f(x, y) = \dfrac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - r^2}} \exp\!\left(-\dfrac{1}{2(1 - r^2)} \left[\left(\dfrac{x - \mu_1}{\sigma_1}\right)^2 - 2r\left(\dfrac{x - \mu_1}{\sigma_1}\right)\!\left(\dfrac{y - \mu_2}{\sigma_2}\right) + \left(\dfrac{y - \mu_2}{\sigma_2}\right)^2\right]\right)$

For the bivariate normal distribution, its marginal and conditional distributions are normal. Additionally, being uncorrelated is equivalent to being independent.

Natural Exponential Family

$f(x, \theta) = e^{\theta x - \varphi(\theta)} h(x)$

Among common distributions, all except the uniform distribution can be expressed in this form.

Its mean parameter (expectation $m$) is $\varphi'(\theta)$, and the variance function is $\varphi''(\theta)$.

Limit Theorems

Chebyshev's Inequality

$P(|X - E(X)| \ge \varepsilon) \le \dfrac{D(X)}{\varepsilon^2}$

$P(|X - E(X)| < \varepsilon) \ge 1 - \dfrac{D(X)}{\varepsilon^2}$
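An empirical sketch of the inequality using $X \sim e(1)$ (so $E(X) = D(X) = 1$); the sample size and $\varepsilon$ are arbitrary choices:

```python
import math
import random

random.seed(1)
N = 100_000
eps = 2.0
samples = [random.expovariate(1.0) for _ in range(N)]   # Exponential(1)

# empirical frequency of |X - E(X)| >= eps vs. the Chebyshev bound D(X)/eps^2
freq = sum(abs(x - 1.0) >= eps for x in samples) / N
bound = 1.0 / eps ** 2

print(freq, bound)   # the frequency stays below the bound
```

For this distribution the true probability is $P(X \ge 3) = e^{-3} \approx 0.05$, well under the (deliberately loose) bound $0.25$.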

Law of Large Numbers

Consider a sequence of random variables $\{X_n\}$ and their mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

If $\bar{X} - E(\bar{X}) \xrightarrow{P} 0$, then $\{X_n\}$ satisfies the Law of Large Numbers.

Central Limit Theorem

Let the $X_i$ be independent and identically distributed. Then $Z_n = \sum_{i=1}^{n} X_i$ approximately follows a normal distribution.

If $E(X_k) = \mu$ and $D(X_k) = \sigma^2$, then $\lim_{n \to \infty} P\!\left(\dfrac{n\bar{X} - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x)$.

It follows that $Z_n$ approximately follows $N(n\mu, n\sigma^2)$, and $\bar{X}$ approximately follows $N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Suppose $X_i \sim B(1, p)$, and let $Z_n = \sum_{i=1}^{n} X_i \sim B(n, p)$. Then

$\lim_{n \to \infty} P\!\left(\dfrac{Z_n - np}{\sqrt{npq}} \le x\right) = \Phi(x)$

That is, $Z_n$ approximately follows $N(np, npq)$
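A Monte-Carlo sketch of the normal approximation with made-up $n$ and $p$; `phi` is a helper for $\Phi$ built on `math.erf`:

```python
import math
import random

def phi(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

random.seed(2)
n, p = 400, 0.5          # made-up parameters
q = 1 - p
trials = 5_000
x = 1.0

hits = 0
for _ in range(trials):
    s = sum(random.random() < p for _ in range(n))     # one draw of B(n, p)
    hits += (s - n * p) / math.sqrt(n * p * q) <= x    # standardized value <= x
print(hits / trials, phi(x))         # both close to Φ(1)
```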

Statistics Section

Common Distributions

Chi-Square Distribution — Sum of Squares of Normals

The sum of squares of normally distributed variables $X_i \sim N(0, 1)$:

$\chi^2 = \sum_{i=1}^{n} X_i^2$ follows a chi-square distribution with $n$ degrees of freedom, denoted $\chi^2(n)$.

$\chi^2(n) = \Gamma\!\left(\dfrac{n}{2}, \dfrac{1}{2}\right)$, hence $E(\chi^2) = n$, $D(\chi^2) = 2n$.

The chi-square distribution satisfies additivity.

t-Distribution — Normal Divided by a Number

Let $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$ be independent. Then

$t = \dfrac{X}{\sqrt{Y/n}} \sim t(n)$

The distribution is similar to the normal distribution but has thicker tails.

F-Distribution — Ratio of Normal Sums of Squares

Let $X \sim \chi^2(n)$ and $Y \sim \chi^2(m)$ be independent. Then

$F = \dfrac{X/n}{Y/m} \sim F(n, m)$

$\dfrac{1}{F} \sim F(m, n)$

Common Statistical Measures

Sample Mean

$\bar{X} = \dfrac{1}{n} \sum_{i=1}^{n} X_i$

Sample Variance

$S^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ (note: the denominator is $n-1$, not $n$)
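A simulation sketch of why the $n-1$ denominator matters: averaged over many samples, $S^2$ lands near $\sigma^2$, while the $n$-denominator version $B_2$ is biased low (all parameters made up):

```python
import random

random.seed(3)
reps, n, sigma2 = 20_000, 5, 4.0     # samples from N(0, 4), made-up setup
avg_s2 = 0.0
avg_b2 = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    avg_s2 += ss / (n - 1)           # S^2, denominator n - 1
    avg_b2 += ss / n                 # B_2, denominator n
avg_s2 /= reps
avg_b2 /= reps

print(avg_s2, avg_b2)                # ≈ σ² = 4.0 versus ≈ (n-1)/n · σ² = 3.2
```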

Sampling Distribution Theorem

Case 1 (Known Variance)

Let the sample $X_1, X_2, \dots, X_n$ be drawn from $N(\mu, \sigma^2)$. Then,

$\chi^2 = \dfrac{(n-1)S^2}{\sigma^2} = \dfrac{1}{\sigma^2} \sum (X_i - \bar{X})^2 \sim \chi^2(n-1)$,

and $\bar{X}$ and $S^2$ are independent.

Case 2 (Known Mean)

The sample $X_1, X_2, \dots, X_n$ is drawn from $N(\mu, \sigma^2)$. Then,

$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$

Case 3 (Multiple Populations)

Probability and Statistics has successfully turned into a liberal art

If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, with independent samples of sizes $n_1$ and $n_2$, then:

$\bar{X} - \bar{Y} \sim N\!\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$

When $\sigma_1^2 = \sigma_2^2 = \sigma^2$:

$\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2)$

$\dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_w \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$

where $S_w^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, then:

$\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$

Point Estimation

Method of Moments Estimation

Set the sample mean $\bar{x}$ equal to the population expectation $m(\theta)$ (second moments can also be used), solve for the parameter $\theta$ in terms of the expectation $m$ to obtain $\theta(m)$, and then substitute $\bar{x}$ for $m$ to obtain $\hat{\theta} = \theta(\bar{x})$.

Maximum Likelihood Estimation

Discrete Case: Write the probability of the observed event as a function of the parameter $\theta$ (usually a product of discrete probabilities), and maximize it. Solve the log-likelihood equation $\dfrac{d \ln L(\theta)}{d\theta} = 0$ to obtain the corresponding parameter estimate $\hat{\theta}$.

Continuous Case: The function to be maximized is $\prod_{i=1}^{n} f(x_i, \theta)$, where the $x_i$ are treated as constants. The subsequent steps are the same.

The maximum likelihood estimate is a function of the observed values $x_i$ (e.g., $\max_i x_i$ for $U(0, \theta)$), while the maximum likelihood estimator is the same function of the random variables $X_i$ (e.g., $\max_i X_i$).
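A minimal sketch of the continuous-case recipe for $e(\lambda)$, where the log-likelihood equation gives $\hat{\lambda} = n / \sum x_i$ (the observations are made up):

```python
import math

# Exponential f(x; λ) = λ e^{-λx}:
# ln L(λ) = n ln λ - λ Σx_i, so d ln L / dλ = n/λ - Σx_i = 0 gives λ̂ = n / Σx_i
data = [0.8, 1.3, 0.5, 2.1, 0.9]     # made-up observations
lam_hat = len(data) / sum(data)
print(lam_hat)                       # n / Σx_i

def loglik(lam):
    """Log-likelihood of the exponential sample at rate lam."""
    return len(data) * math.log(lam) - lam * sum(data)

# λ̂ should beat nearby values of λ (log-likelihood is concave here)
assert loglik(lam_hat) >= loglik(lam_hat * 0.9)
assert loglik(lam_hat) >= loglik(lam_hat * 1.1)
```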

Criteria for Estimator Evaluation

Unbiased Estimator: An estimator is unbiased if the expected value of the estimate equals the true parameter, i.e., $E(\hat{\theta}) = \theta$; otherwise, it is biased.

Asymptotically Unbiased Estimator: An estimator is asymptotically unbiased if $\lim_{n \to \infty} E(\hat{\theta}) - \theta = 0$.

Efficiency Criterion: If $D(\hat{\theta}_1) \le D(\hat{\theta}_2)$ (both unbiased), then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$.

Consistency Criterion: If $\hat{\theta}_n \xrightarrow{P} \theta$, then $\hat{\theta}_n$ is a consistent estimator of $\theta$.

Mean Squared Error (MSE) Criterion: If $E(\hat{\theta}_1 - \theta)^2 \le E(\hat{\theta}_2 - \theta)^2$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ in terms of MSE.

Specifically, $B_2$ (the sample variance with denominator $n$) is more efficient than $S^2$ in terms of MSE.

Interval Estimation (Memorize)

Only for normal distribution

Two-Sided Confidence Interval

Estimate the $(1 - \alpha)$ confidence interval for $\mu$.

If $\sigma^2$ is known, it is $\left(\bar{X} - \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha/2},\ \bar{X} + \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha/2}\right)$.

If $\sigma^2$ is unknown, it is $\left(\bar{X} - \dfrac{S}{\sqrt{n}} t_{1-\alpha/2}(n-1),\ \bar{X} + \dfrac{S}{\sqrt{n}} t_{1-\alpha/2}(n-1)\right)$.

Estimate the $(1 - \alpha)$ confidence interval for $\sigma^2$.

$\left(\dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)},\ \dfrac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right)$
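The known-$\sigma^2$ two-sided interval as a sketch; the summary statistics are made up, and the quantile $u_{0.975} \approx 1.96$ is hard-coded rather than looked up in a table:

```python
import math

# Made-up summary statistics for a sample from N(μ, σ²) with σ known
xbar, sigma, n = 10.2, 2.0, 25
u = 1.96                             # u_{1-α/2} for α = 0.05 (normal quantile)

# (x̄ - σ/√n · u, x̄ + σ/√n · u)
half = sigma / math.sqrt(n) * u
ci = (xbar - half, xbar + half)
print(ci)
```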

One-Sided Confidence Intervals

Estimate the $(1 - \alpha)$ one-sided confidence limits for $\mu$.

If $\sigma^2$ is known, the one-sided lower confidence limit is $\bar{X} - \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha}$, and the upper limit is $\bar{X} + \dfrac{\sigma}{\sqrt{n}} u_{1-\alpha}$.

If $\sigma^2$ is unknown, the one-sided lower confidence limit is $\bar{X} - \dfrac{S}{\sqrt{n}} t_{1-\alpha}(n-1)$, and the upper limit is $\bar{X} + \dfrac{S}{\sqrt{n}} t_{1-\alpha}(n-1)$.

Estimate the $(1 - \alpha)$ one-sided confidence limits for $\sigma^2$.

The one-sided lower confidence limit is $\dfrac{(n-1)S^2}{\chi^2_{1-\alpha}(n-1)}$, and the upper limit is $\dfrac{(n-1)S^2}{\chi^2_{\alpha}(n-1)}$.

Hypothesis Testing

Process

Proof by contradiction with a probabilistic nature

  1. First, state the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
  2. Under the assumption that $H_0$ holds, construct the distribution satisfied by the sample.
  3. Determine the rejection region $W$ based on the value of $\alpha$.
  4. Substitute the observed value $u$. If $u \in W$, reject the null hypothesis.
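The four steps above, sketched as a one-sample $u$-test with $\sigma$ known (all numbers are made up; the quantile $u_{1-\alpha/2} \approx 1.96$ is hard-coded):

```python
import math

# Step 1: H0: μ = μ0 vs H1: μ ≠ μ0, with σ known (made-up numbers)
mu0, sigma, n = 5.0, 1.5, 36
xbar = 5.6                           # observed sample mean

# Step 2: under H0, u = (x̄ - μ0)/(σ/√n) follows N(0, 1)
u = (xbar - mu0) / (sigma / math.sqrt(n))

# Steps 3-4: rejection region W = {|u| > u_{1-α/2}} for α = 0.05
u_crit = 1.96
reject = abs(u) > u_crit
print(u, reject)
```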

Type I and Type II Errors and Their Probabilities

The probability of a Type I error (rejecting $H_0$ when it is true) is $P_1 = P(W \mid H_0) \le \alpha$.

The probability of a Type II error (accepting $H_0$ when it is false) is $P_2 = P(\bar{W} \mid H_1)$.

P1 and P2 cannot be reduced simultaneously. However, by fixing one of them and increasing the sample size n, the other can be reduced. The choice of which one to reduce depends on the severity of their respective consequences.