# Objectives

  • Understand the uniform distribution and the normal distribution
  • Know how to generate uniform and normal random numbers
  • Know how to use the q- functions (for example qnorm) to find quantiles of the normal distribution

# Uniform Distribution

Uniform Random Variable
A uniform random variable is a continuous random variable for which every outcome in an interval is equally likely.
  • Example: $X$ is a random real number drawn from $[0,1]$.

  • We can use runif(n) to generate n random numbers drawn from $[0,1]$.

n <- 10 # sample size
x <- runif(n) # generate n random numbers from the interval [0,1]
hist(x,probability=TRUE,col=gray(.9),main="uniform on [0,1]")
curve(dunif(x,0,1),add=TRUE,col = "red") # overlay the U[0,1] density

What happens if we increase the sample size?

n <- 1e6 # sample size
x <- runif(n) # generate n random numbers from the interval [0,1]
hist(x,probability=TRUE,col=gray(.9),main="uniform on [0,1]")
curve(dunif(x,0,1),add=TRUE,col = "red") # overlay the U[0,1] density
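One way to put a number on what the two histograms show is to measure how far the bin heights stray from the true density, which equals 1 on $[0,1]$. The sketch below is only an illustration: the seed, the choice of 10 equal-width bins, and the helper name max_bin_error are assumptions, not part of the original notes.

```r
set.seed(1)  # assumed seed, only so the run is reproducible

# Largest gap between the histogram bin heights and the U[0,1] density (which is 1)
max_bin_error <- function(n, bins = 10) {
  x <- runif(n)
  h <- hist(x, breaks = seq(0, 1, length.out = bins + 1), plot = FALSE)
  max(abs(h$density - 1))
}

err_small <- max_bin_error(10)    # small sample: bin heights are ragged
err_large <- max_bin_error(1e6)   # large sample: bin heights stay close to 1
c(small = err_small, large = err_large)
```

With the larger sample the maximum deviation should be far smaller, which is the flattening seen in the second histogram.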

  • Notation: $X\sim U[a,b]$

  • Probability Density Function (pdf):
    $f(x) = \frac{1}{b-a}$ for $a\leq x\leq b$

  • Cumulative Distribution Function (cdf):
    For $a\leq x\leq b$,

    $$F(x)= P(a \leq X \le x) = \int_a^x f(t)\,dt=\int_a^x \frac{1}{b-a}\,dt=\frac{x-a}{b-a}.$$

  • Expectation & Variance

$$\mu = E(X) =\int_a^b x\frac{1}{b-a}\,dx=\left.\frac{x^2}{2(b-a)}\right|_{a}^b=\frac{a+b}{2}$$

$$\sigma^2 = var(X) = E(X^2)-[E(X)]^2 = \int_a^b x^2\frac{1}{b-a}\,dx - \left(\frac{a+b}{2}\right)^2= \left.\frac{x^3}{3(b-a)}\right|_{a}^b-\frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}$$
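The closed forms for the mean and variance can be checked numerically. The following is a minimal sketch using integrate() from base R; the endpoints a = -2 and b = 3 and all variable names are assumptions chosen for illustration.

```r
a <- -2; b <- 3                      # assumed endpoints for illustration
f <- function(x) dunif(x, a, b)      # pdf of U[a, b]

mu_num  <- integrate(function(x) x * f(x), a, b)$value     # E(X) by numerical integration
ex2_num <- integrate(function(x) x^2 * f(x), a, b)$value   # E(X^2) by numerical integration
var_num <- ex2_num - mu_num^2                              # var(X) = E(X^2) - [E(X)]^2

c(mean_numeric = mu_num, mean_formula = (a + b) / 2)
c(var_numeric = var_num, var_formula = (b - a)^2 / 12)
```

The numeric and formula values should agree up to the integration tolerance.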

# Important R functions

  • For each distribution, R provides functions to generate random numbers and to evaluate the pdf (or the pmf, for a discrete distribution), the cdf, and quantiles.

  • The prefixes for these functions are:

| Prefix | Meaning |
|--------|---------|
| r | random number generation |
| d | probability density function or probability mass function |
| p | cumulative distribution function |
| q | quantiles |
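To make the four prefixes concrete, here is a small sketch that applies each one to the uniform distribution. The interval $[0,2]$, the seed, and the variable names are assumptions for illustration; the same pattern applies to rnorm, dnorm, pnorm, and qnorm.

```r
set.seed(42)                               # assumed seed for reproducibility
r_draws <- runif(3, min = 0, max = 2)      # r: three random draws from U[0,2]
d_val   <- dunif(0.5, min = 0, max = 2)    # d: density at 0.5, i.e. 1/(2-0) = 0.5
p_val   <- punif(0.5, min = 0, max = 2)    # p: P(X <= 0.5) = 0.25
q_val   <- qunif(p_val, min = 0, max = 2)  # q: inverts p, so this returns 0.5
```

Note that the q- function undoes the p- function: feeding a cumulative probability back into qunif recovers the original point.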

# Example 1

Suppose $X \sim U[-1,1]$.

n <- 1e6 # sample size
x <- runif(n,-1,1) # generate n random numbers from the interval [-1,1]
hist(x,probability=TRUE,col=gray(.9),main="uniform on [-1,1]") # probability=TRUE plots relative frequency (density)
curve(dunif(x,-1,1),add=TRUE,col = "red") # overlay the probability density function of X

punif(0.5, min = -1, max = 1) # P(-1 <= X <= 0.5)
[1] 0.75
qunif(0.5, min = -1, max = 1) # find the median of X
[1] 0
mean(x) # find the sample mean
[1] 0.0002108174
var(x) # find the sample variance
[1] 0.3333544
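The exact values from punif and qunif can also be compared with their sample counterparts. The sketch below assumes a seed and variable names of its own; it is a sanity check, not part of the original example.

```r
set.seed(2024)                 # assumed seed for reproducibility
x <- runif(1e6, -1, 1)

emp_cdf  <- mean(x <= 0.5)     # share of draws at or below 0.5
theo_cdf <- punif(0.5, -1, 1)  # exact P(X <= 0.5) = 0.75
emp_med  <- median(x)          # sample median, should be near qunif(0.5, -1, 1) = 0
theo_var <- (1 - (-1))^2 / 12  # (b - a)^2 / 12 = 1/3

c(emp_cdf = emp_cdf, theo_cdf = theo_cdf, emp_med = emp_med,
  sample_var = var(x), theo_var = theo_var)
```

The sample variance close to 1/3 matches the formula $(b-a)^2/12$ with $a=-1$ and $b=1$.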

# In-class Exercise: Uniform Distribution

  1. Suppose $X \sim U[-2,3]$. Generate random numbers with sample size $n = 10^6$.

    n <- 1e6 # sample size
    x <- runif(n, -2, 3) # generate n random numbers from the interval [-2,3]
  2. Draw a histogram of the values you generated, with relative frequency rather than frequency on the y-axis.

    hist(x, probability=TRUE, col=gray(.9), main="uniform on [-2,3]") # histogram of the relative frequency
  3. Find $P(X \le 0)$, $P(X \le 1)$, and $P(X \ge 1)$ (a little tricky).

    punif(0, min =-2, max = 3) # find the P( X <= 0 )
    [1] 0.4
    
    punif(1, min =-2, max = 3) # find the P( X <= 1 )
    [1] 0.6
    
    1 - punif(1, min = -2, max = 3) # P(X >= 1) = 1 - P(X <= 1)
    [1] 0.4
    
  4. Find Q1, the median, and Q3 of the random variable $X$. What are the expected value and the variance of $X$?

    qunif(0.25, min = -2, max = 3) # Q1
    [1] -0.75
    
    qunif(0.5, min = -2, max = 3) # median
    [1] 0.5
    
    qunif(0.75, min = -2, max = 3) # Q3
    [1] 1.75
    

    The expectation is $\frac{b+a}{2}=\frac{3+(-2)}{2}=0.5$. The variance is $\frac{(b-a)^{2}}{12}=\frac{(3-(-2))^{2}}{12}=\frac{25}{12}\approx 2.08333$.

  5. Find Q1, the median, Q3, the mean, and the variance of the sample you generated. Compare your results with the answers in part 4.

    quantile(x)
            0%        25%        50%        75%       100% 
    -1.9999994 -0.7476943  0.5001483  1.7515364  2.9999982
    
    mean(x)
    [1] 0.5008553
    
    var(x)
    [1] 2.081609
    

# Normal Distribution

  • The normal distribution (also called the Gaussian distribution) is a symmetric distribution that is centered at its mean and spreads out in both directions.

    Example: test scores for all ITM 514 students.

  • Notation: $X\sim \mathcal{N}(\mu, \sigma^2)$

    • $\mu$ is the mean of the distribution
    • $\sigma^2$ is the variance of the distribution
  • Probability Density Function (pdf):

    $$f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}},\,\,\,\text{for } -\infty<x<\infty.$$

  • CDF: The cdf of the normal distribution has no closed form, i.e., the integral below cannot be written in terms of elementary functions and has to be evaluated numerically.

$$P(X<x) = P(X \le x)=F(x) = \int_{-\infty}^x f(t)\,dt =\frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^x e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt$$

$$P(x_1<X<x_2) = \int_{x_1}^{x_2}f(t)\,dt = \frac{1}{\sqrt{2\pi}\sigma}\int_{x_1}^{x_2} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt = F(x_2) -F(x_1).$$
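Because the integral has no closed form, R evaluates it numerically. The sketch below compares a direct numerical integration of the density with pnorm; the standard normal parameters, the evaluation points 0.5 and 1.5, and the variable names are assumptions chosen for illustration.

```r
mu <- 0; sigma <- 1                                # assumed parameters
pdf_x <- function(t) dnorm(t, mean = mu, sd = sigma)

F_num   <- integrate(pdf_x, lower = -Inf, upper = 0.5)$value  # numeric F(0.5)
F_pnorm <- pnorm(0.5, mean = mu, sd = sigma)                  # built-in cdf

interval_num <- integrate(pdf_x, lower = 0.5, upper = 1.5)$value
interval_cdf <- pnorm(1.5, mu, sigma) - pnorm(0.5, mu, sigma) # F(x2) - F(x1)

c(F_num = F_num, F_pnorm = F_pnorm)
c(interval_num = interval_num, interval_cdf = interval_cdf)
```

In practice pnorm is the function to use; the integration is shown only to connect it with the formulas.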

# Example 2

Suppose $Z$ is the standard normal random variable, i.e., $Z \sim \mathcal{N}(0, 1)$.

n <- 1e6 # sample size
x <- rnorm(n, mean = 0, sd = 1) # generate n standard normal random numbers
hist(x, probability=TRUE, col=gray(.9), main="standard normal random numbers") # histogram of the relative frequency
curve(dnorm(x, mean = 0, sd = 1),add=TRUE,col = "red") # overlay the standard normal density

pnorm(0.5, mean =  0, sd = 1) # find the P(X <= 0.5) or P(X < 0.5)
[1] 0.6914625
qnorm(0.5, mean = 0, sd = 1) # find the median of X
[1] 0
mean(x) # find the sample mean
[1] -0.002525208
var(x)
[1] 1.002635

What is the relationship between a general normal random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ and the standard normal random variable? Standardizing $X$ gives

$$Z= \frac{X-\mu}{\sigma} \sim \mathcal{N}(0,1).$$

To calculate a probability involving $X\sim \mathcal{N}(\mu,\sigma^2)$, standardize the endpoints in the same way:

$$P(a\leq X\leq b) = P\left(\frac{a-\mu}{\sigma}\leq \frac{X-\mu}{\sigma}\leq \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma}\leq Z\leq \frac{b-\mu}{\sigma}\right).$$
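The standardization identity is easy to confirm in R. This is a small sketch with assumed values ($\mu = 2$, $\sigma = 5$, $a = 0$, $b = 6$) and assumed variable names.

```r
mu <- 2; sigma <- 5; a <- 0; b <- 6   # assumed values for illustration

# Direct computation on the X scale
p_direct <- pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)

# Same probability after standardizing the endpoints
z_a <- (a - mu) / sigma
z_b <- (b - mu) / sigma
p_standard <- pnorm(z_b) - pnorm(z_a)   # pnorm defaults to mean = 0, sd = 1

all.equal(p_direct, p_standard)
```

Both routes give the same probability, so standardizing is optional in R; it mainly matters when reading a printed table of the standard normal cdf.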

# Example 3

The achievement scores from a college entrance examination are normally distributed with mean 75 and standard deviation 10. What fraction of the scores lies between 80 and 90?

The scores satisfy $X\sim \mathcal{N}(75,10^2)$, so

$$P(80\leq X\leq 90) = P\left(\frac{80-75}{10}\leq Z\leq \frac{90-75}{10}\right) = P(0.5\leq Z\leq 1.5).$$

p1 <- pnorm(1.5, mean = 0, sd = 1) - pnorm(0.5, mean = 0, sd = 1) # P(0.5 <= Z <= 1.5) on the standardized scale
p1 # display the answer
[1] 0.2417303
p2 <- pnorm(90, mean = 75, sd = 10) - pnorm(80, mean = 75, sd = 10) # P(80 <= X <= 90) computed directly
p2 # display the answer
[1] 0.2417303

# Example 4

Find the value of $z_0$ such that $95\%$ of the standard normal $Z$ values lie between $-z_0$ and $z_0$; that is, $P(-z_0\leq Z\leq z_0) = 0.95$.

By the symmetry of the standard normal distribution, $P(Z \ge z_0) = P(Z \le -z_0)$, so

$$P(-z_0\le Z \le z_0) = P(Z \le z_0) -P(Z \le -z_0) = (1- P(Z \ge z_0)) - P(Z \le -z_0) = 1- 2P(Z <-z_0) = 0.95 \Rightarrow P(Z<-z_0) = 0.025.$$

How can we find $z_0$ in R?

z0 <- -qnorm(0.025, mean = 0, sd = 1) # z0 satisfies P(Z < -z0) = 0.025
z0 # display the answer
[1] 1.959964
check <- pnorm(z0, mean = 0, sd = 1) - pnorm(-z0, mean = 0, sd = 1) # verify P(-z0 <= Z <= z0)
check # display the answer
[1] 0.95
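The same reasoning works for any central probability, not just 95%. The sketch below wraps it in a helper; the function name symmetric_z and the three levels are assumptions for illustration.

```r
# z0 such that P(-z0 <= Z <= z0) = level, using P(Z < -z0) = (1 - level) / 2
symmetric_z <- function(level) {
  -qnorm((1 - level) / 2)
}

lvls <- c(0.90, 0.95, 0.99)            # assumed levels
z0 <- symmetric_z(lvls)
coverage <- pnorm(z0) - pnorm(-z0)     # should reproduce the requested levels
data.frame(level = lvls, z0 = z0, coverage = coverage)
```

The coverage column is a self-check: it recomputes the central probability from each $z_0$.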

# In-class Exercise

Suppose $Z \sim \mathcal{N}(0,1)$.
Can you find a $y>0$ such that $P(-y \le Z \le y) = 0.01$?

The critical value $z_{\alpha}$ of the standard normal distribution is the value on the measurement axis for which an area of $\alpha$ under the standard normal curve lies to the right of $z_{\alpha}$; that is, $P(Z \ge z_{\alpha}) = \alpha$.
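A critical value is an upper-tail quantile, which qnorm can return directly through its lower.tail argument. The sketch below is one possible way to compute it and to approach the exercise above; the helper name z_alpha and the chosen values of $\alpha$ are assumptions.

```r
# Upper-tail critical value: the point with area alpha to its right
z_alpha <- function(alpha) qnorm(alpha, lower.tail = FALSE)

z_05  <- z_alpha(0.05)                          # area 0.05 to the right
z_025 <- z_alpha(0.025)                         # area 0.025 to the right
right_area <- pnorm(z_05, lower.tail = FALSE)   # recovers alpha = 0.05

# Exercise: P(-y <= Z <= y) = 0.01 means P(Z <= y) = 0.5 + 0.01 / 2
y <- qnorm(0.5 + 0.01 / 2)
central_area <- pnorm(y) - pnorm(-y)            # recovers 0.01

c(z_05 = z_05, z_025 = z_025, y = y)
```

Here z_alpha(0.025) is the point that leaves 2.5% in each tail, so 95% of the standard normal values lie between its negative and itself.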

# In-class Exercise: Normal Distribution

  1. Suppose $X \sim \mathcal{N}(-1,3^2)$. Generate random numbers with sample size $n = 10^6$.

    #1 Create the sample
    n <- 1e6
    x <- rnorm(n, mean = -1, sd = 3)
  2. Draw a histogram of the values you generated, with relative frequency rather than frequency on the y-axis.

    #2 A histogram of x with the N(-1, 3^2) density overlaid
    hist(x, probability = TRUE)
    curve(dnorm(x, mean = -1, sd = 3), add = TRUE, col = "red")

  3. Find $P(X \le 0)$, $P(X \le 1)$, $P(X \ge 1)$, and $P(0\le X \le 1)$.

    #3 P(X <=0)
    pnorm(0, mean = -1, sd = 3)
    [1] 0.6305587
    
    # P(X <= 1)
    pnorm(1, mean = -1, sd = 3)
    [1] 0.7475075
    
    # P(X >= 1)
    1 - pnorm(1, mean = -1, sd = 3)
    [1] 0.2524925
    
    # P(0 <= X <=1)
    pnorm(1, mean = -1, sd = 3) - pnorm(0, mean = -1, sd = 3)
  4. Find Q1, the median, and Q3 of the random variable $X$. What are the expected value and the variance of $X$?

    #4 Q1
    qnorm(0.25, mean = -1, sd = 3)
    [1] -3.023469
    
    #4 Median
    qnorm(0.5, mean = -1, sd = 3)
    [1] -1
    
    #4 Q3
    qnorm(0.75, mean = -1, sd = 3)
    [1] 1.023469
    
  5. Find Q1, the median, Q3, the mean, and the variance of the sample you generated. Compare your results with the answers in part 4.

    #4 In theory the mean is -1 and the variance is 3^2 = 9
    #5 Q1, median, Q3 of the sample
    quantile(x)
             0%        25%        50%        75%       100% 
    -15.773765  -3.029712  -1.010223   1.014308  14.674138
    
    # Mean of the sample
    mean(x)
    [1] -1.008806
    
    #5 Variance of the sample
    var(x)
    [1] 8.978004
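To make the comparison between parts 4 and 5 easier to read, the theoretical and sample quartiles can be placed side by side. This is a sketch under assumptions: the seed, the data frame layout, and the column names are not from the original exercise, and a different seed gives slightly different sample values.

```r
set.seed(514)                          # assumed seed for reproducibility
x <- rnorm(1e6, mean = -1, sd = 3)
probs <- c(0.25, 0.5, 0.75)            # Q1, median, Q3

comparison <- data.frame(
  prob        = probs,
  theoretical = qnorm(probs, mean = -1, sd = 3),
  sample      = unname(quantile(x, probs))
)
comparison$abs_diff <- abs(comparison$theoretical - comparison$sample)
comparison
```

With a sample this large the differences should be small, in line with the hand comparison above.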
    

# Conclusion

The normal distribution is one of the most important distributions in statistics. We will come back to it when we discuss the Central Limit Theorem, confidence intervals, and hypothesis testing.
