There are many common families of probability distributions and we have discussed six so far. The discrete distributions include the discrete Uniform, Bernoulli, and Binomial. The continuous distributions include the continuous Uniform, Normal, and t.
This chapter provides a set of examples to show you how to compute probabilities from a few of these distributions in R.
R has four normal distribution functions: dnorm( ), pnorm( ), qnorm( ), and rnorm( ).
dnorm(x,mean,sd): probability density function (PDF)
- input: x is the value at which you want to evaluate the normal PDF
- output: a positive number since the PDF \(f(x)\) must be positive
- example: evaluate \(f(x)\)
pnorm(q,mean,sd): cumulative distribution function (CDF)
- input: q is the value for which you want to find the area below/above
- output: a probability
- example: compute \(P(X<q)\)
qnorm(p,mean,sd): quantile function
- input: p is a probability
- output: a real number since \(X\in(-\infty,\infty)\)
- example: find the value \(q\) such that \(P(X<q)=p\)
rnorm(n,mean,sd): random number generator
- input: n is the number of observations you want to generate
- output: a vector of n real numbers
- example: generate n independent \(N(\mu,\sigma^2)\) random variables
More information is also accessible in R if you type ?dnorm, ?pnorm, ?qnorm, or ?rnorm.
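As a rough sketch of how the four functions fit together, the checks below use the standard normal distribution. The evaluation point x = 1, the seed, and the sample size are arbitrary choices made for illustration, not values from the text.

```r
# How the four normal functions relate (standard normal case).
x <- 1

# pnorm() is the area under dnorm() to the left of x.
area <- integrate(dnorm, lower = -Inf, upper = x)$value
all.equal(area, pnorm(x), tolerance = 1e-4)   # TRUE

# qnorm() undoes pnorm(): it maps a probability back to the cut-off value.
all.equal(qnorm(pnorm(x)), x)                 # TRUE

# rnorm() produces draws whose empirical proportions approximate pnorm().
set.seed(1)              # arbitrary seed, used only for reproducibility
draws <- rnorm(1e5)
mean(draws < x)          # should be close to pnorm(1)
```

The last line gives a simulation-based estimate, so it will only agree with pnorm(1) to a couple of decimal places.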
To learn how to use these functions, we’ll start with a few exercises on the standard normal distribution which is a normal distribution with mean 0 and standard deviation of 1. We will then move on to the more general \(N(\mu,\sigma^2)\) distribution.
dnorm( )
When \(X\) is a continuous random variable, we know that \(P(X=x)=0\). Therefore, dnorm( ) does not return a probability, but rather the height of the PDF. Even though the height of the PDF is not a probability, we can still interpret density evaluations as the relative likelihood of observing a certain value \(x\).
PROBLEM 1: Let \(X\sim N(0,1)\). Is the value \(x=1\) or \(x=-0.5\) more likely to occur under this normal distribution?
dnorm(1, mean=0, sd=1)
[1] 0.2419707
dnorm(-0.5, mean=0, sd=1)
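[1] 0.3520653
Since the density at \(x=-0.5\) (0.3520653) is larger than the density at \(x=1\) (0.2419707), the value \(x=-0.5\) is relatively more likely under this distribution.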
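One way to see what dnorm( ) computes is to write the standard normal PDF, \(f(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\), by hand and compare. The helper name f below is introduced here purely for illustration.

```r
# Standard normal PDF written out by hand (f is a hypothetical helper name).
f <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)

all.equal(f(1),    dnorm(1,    mean=0, sd=1))   # TRUE
all.equal(f(-0.5), dnorm(-0.5, mean=0, sd=1))   # TRUE

# The ratio of the two heights expresses the relative likelihood directly.
ratio <- dnorm(-0.5, mean=0, sd=1) / dnorm(1, mean=0, sd=1)
ratio
```

A ratio above 1 says the density is higher at \(x=-0.5\) than at \(x=1\).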
pnorm( )
The pnorm( ) function is useful for evaluating probabilities of the form \(P(X\leq x)\) or \(P(X \geq x)\).
PROBLEM 2: If \(X\sim N(0,1)\), what is \(P(X<0)\)?
pnorm(0, mean=0, sd=1)
[1] 0.5
PROBLEM 3: If \(X\sim N(0,1)\), what is \(P(X<1)\)?
pnorm(1, mean=0, sd=1)
[1] 0.8413447
PROBLEM 4: If \(X\sim N(0,1)\), what is \(P(X>1)\)?
We have two ways of answering this question. First, we can recognize that \(P(X>1)=1-P(X\leq 1)\).
1-pnorm(1, mean=0, sd=1)
[1] 0.1586553
A second approach is to use the lower.tail option within the pnorm( ) function. When lower.tail=TRUE (the default), pnorm( ) returns the probability to the left of a given number \(x\); when lower.tail=FALSE, pnorm( ) returns the probability to the right of \(x\).
pnorm(1, mean=0, sd=1, lower.tail=FALSE)
[1] 0.1586553
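The two approaches can be checked against each other, as in the sketch below. The variable names left and right are illustrative, and the far-tail value 10 is an arbitrary choice used to show where the two approaches differ numerically.

```r
left  <- pnorm(1, mean=0, sd=1)                    # P(X <= 1)
right <- pnorm(1, mean=0, sd=1, lower.tail=FALSE)  # P(X > 1)

all.equal(right, 1 - left)                 # TRUE: complement rule
all.equal(right, pnorm(-1, mean=0, sd=1))  # TRUE: symmetry about the mean

# Far in the tail, subtracting from 1 loses all precision,
# while lower.tail=FALSE still returns a tiny positive number.
1 - pnorm(10, mean=0, sd=1)                  # 0
pnorm(10, mean=0, sd=1, lower.tail=FALSE)    # very small, but not 0
```

For moderate values the two approaches agree; lower.tail=FALSE is the safer habit when the tail probability is extremely small.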
PROBLEM 5: If \(X\sim N(0,1)\), what is \(P(0<X<1)\)?
pnorm(1, mean=0, sd=1) - pnorm(0, mean=0, sd=1)
[1] 0.3413447
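Because an interval probability is the area under the PDF between the two endpoints, the same answer can be approximated by numerical integration. This is only a cross-check using base R's integrate( ); the name area is illustrative.

```r
# Area under the standard normal PDF between 0 and 1.
# Extra arguments (mean, sd) are passed through to dnorm().
area <- integrate(dnorm, lower=0, upper=1, mean=0, sd=1)$value
area

# Agrees with the difference of two CDF values.
all.equal(area, pnorm(1, mean=0, sd=1) - pnorm(0, mean=0, sd=1),
          tolerance=1e-6)   # TRUE
```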
Once we understand how to use the pnorm( ) function to compute standard normal probabilities, extending the function to compute probabilities of any normal distribution is straightforward. All we have to do is change the mean and sd arguments.
Remember that the normal functions in R call for the standard deviation \(\sigma\), NOT the variance \(\sigma^2\)!
PROBLEM 6: If \(X\sim N(4,9)\), what is \(P(X<0)\)?
pnorm(0, mean=4, sd=3)
[1] 0.09121122
PROBLEM 7: If \(X\sim N(2,3)\), what is \(P(X>5)\)?
pnorm(5, mean=2, sd=sqrt(3), lower.tail=FALSE)
[1] 0.04163226
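Changing mean and sd is equivalent to standardizing by hand, \(Z=(X-\mu)/\sigma\), and using the standard normal. The sketch below repeats Problem 7 that way and also shows what happens if the variance is passed as sd by mistake; the variable names are illustrative.

```r
mu    <- 2
sigma <- sqrt(3)          # standard deviation, not the variance

z <- (5 - mu) / sigma     # standardized cut-off
by_hand <- pnorm(z, mean=0, sd=1, lower.tail=FALSE)
direct  <- pnorm(5, mean=mu, sd=sigma, lower.tail=FALSE)
all.equal(by_hand, direct)   # TRUE

# Common mistake: passing the variance (3) where sd is expected.
wrong <- pnorm(5, mean=mu, sd=3, lower.tail=FALSE)
wrong                        # noticeably different from 'direct'
```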
qnorm( )
Next, let’s use the qnorm( ) function to find quantiles of the normal distribution.
PROBLEM 8: If \(X\sim N(0,1)\), find the value \(q\) such that \(P(X<q)=0.05\).
qnorm(0.05, mean=0, sd=1)
[1] -1.644854
PROBLEM 9: If \(X\sim N(0,1)\), find the value \(q\) such that \(P(X>q)=0.025\). That is, \(q\) is the value such that 2.5% of the area under the standard normal PDF is to its right.
qnorm(0.025, mean=0, sd=1, lower.tail=FALSE)
[1] 1.959964
PROBLEM 10: If \(X\sim N(-4,2)\), find the value \(q\) such that \(P(X>q)=0.1\). That is, \(q\) is the value such that 10% of the area under the \(N(-4,2)\) PDF is to its right.
qnorm(0.1, mean=-4, sd=sqrt(2), lower.tail=FALSE)
[1] -2.187612
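A quantile can be sanity-checked by feeding it back into pnorm( ), and it can also be rebuilt from a standard normal quantile as \(q=\mu+\sigma z\). The sketch below does both for Problem 10; the variable names are illustrative.

```r
q <- qnorm(0.1, mean=-4, sd=sqrt(2), lower.tail=FALSE)

# Plugging q back into pnorm() recovers the upper-tail probability.
back <- pnorm(q, mean=-4, sd=sqrt(2), lower.tail=FALSE)
all.equal(back, 0.1)          # TRUE

# Same quantile from the standard normal: 10% in the upper tail
# means 90% in the lower tail.
q_std <- -4 + sqrt(2) * qnorm(0.9, mean=0, sd=1)
all.equal(q, q_std)           # TRUE
```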
rnorm( )
Finally, let’s use rnorm( ) to generate random samples of size \(n\) from a normal distribution.
PROBLEM 11: Generate \(n=20\) random variables from a standard normal distribution.
x = rnorm(20, mean=0, sd=1)
x
[1] -1.08578052 -2.51525266 0.50112721 -1.46349046 0.89933666 0.42076457
[7] -1.53133499 -0.37842473 0.15526855 0.79426753 0.36203937 0.47682469
[13] 1.05620180 0.96245182 -1.20061008 0.31344427 0.84498530 -0.50238712
[19] -1.11387644 -0.04362956
hist(x)
PROBLEM 12: Generate \(n=100\) random variables from a \(N(10,2)\) distribution.
x = rnorm(100, mean=10, sd=sqrt(2))
x
[1] 9.842401 9.405289 10.929030 10.990341 9.995606 9.937003 10.490203
[8] 10.058287 10.126005 7.742390 8.562545 8.028502 8.526251 12.251231
[15] 12.776681 12.073214 9.947080 8.808330 9.165551 10.281743 10.138185
[22] 11.595435 9.225106 12.241392 8.305476 9.586018 9.037704 11.437506
[29] 9.690265 8.381058 8.197428 11.551766 10.321125 10.802257 10.741454
[36] 10.367206 11.089124 9.658220 7.639096 10.252741 11.213879 10.220065
[43] 12.600090 9.331105 9.405717 9.995660 10.861830 7.630178 8.720515
[50] 9.718929 8.737404 11.819054 9.184134 9.167343 9.985605 10.467358
[57] 7.016921 11.966441 10.050908 8.021857 9.986673 10.990380 11.675890
[64] 11.491785 11.609429 10.126801 10.391337 7.240403 10.486688 8.027451
[71] 9.153793 9.947466 9.864073 9.830305 9.834425 9.512957 7.433951
[78] 10.529513 9.008647 7.873238 8.589169 6.652862 11.163475 12.174325
[85] 9.562299 9.721394 14.426347 9.737223 11.073780 10.193520 8.374574
[92] 9.403898 7.350218 11.392020 9.132813 9.356953 7.932916 10.487301
[99] 10.494659 6.736746
hist(x)
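Because rnorm( ) is random, rerunning the code above will produce different numbers each time. If reproducible output is wanted, one option is to call set.seed( ) first. The seed value 123 below is an arbitrary choice, and y is an illustrative name.

```r
set.seed(123)                         # arbitrary seed
x = rnorm(100, mean=10, sd=sqrt(2))

set.seed(123)                         # same seed again
y = rnorm(100, mean=10, sd=sqrt(2))
identical(x, y)                       # TRUE: same seed, same draws

# Sample summaries should land near the true mean (10) and variance (2),
# but will not match them exactly.
mean(x)
var(x)
```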
The Bernoulli and Binomial distributions are intimately related: a Binomial random variable corresponds to the number of successes in \(n\) independent Bernoulli trials. For example, consider flipping a coin. Each coin flip can be modelled as a Bernoulli\((p)\) random variable with probability of success (heads) equal to \(p\). If you flipped a coin \(n=10\) times and wanted to model the number of successes (heads) in \(n=10\) trials, that would be a Binomial(\(n,p\)) random variable.
R has four functions that can be used to compute both Bernoulli and Binomial probabilities: dbinom( ), pbinom( ), qbinom( ), and rbinom( ).
dbinom(x,size,prob): probability mass function (PMF)
- input: x is the number of successes, size is the number of trials \(n\), prob is the probability of success \(p\)
- output: a probability since \(0\leq P(X=x)\leq1\)
- example: evaluate \(P(X=x)\)
pbinom(q,size,prob): cumulative distribution function (CDF)
- input: q is the value for which you want to find the area below/above, size is the number of trials \(n\), prob is the probability of success \(p\)
- output: a probability
- example: evaluate \(P(X\leq q)\)
qbinom(p,size,prob): quantile function
- input: p is a probability, size is the number of trials \(n\), prob is the probability of success \(p\)
- output: a non-negative integer since \(X\in\{0,1,\dotsc,n\}\)
- example: find the smallest \(q\) s.t. \(P(X\leq q)\geq p\)
rbinom(n,size,prob): random number generator
- input: n is the number of observations you want to generate, size is the number of trials, prob is the probability of success \(p\)
- output: a vector of n non-negative integers
- example: generate n independent Binomial random variables, each with size trials and success probability prob
Note: These functions correspond to the Bernoulli distribution whenever size=1.
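As a quick illustration of the size=1 case, the Bernoulli PMF has only two values, \(P(X=1)=p\) and \(P(X=0)=1-p\). The value p = 0.2 below is an arbitrary choice for the sketch.

```r
p <- 0.2                                   # arbitrary success probability

dbinom(1, size=1, prob=p)                  # P(X = 1) = p
dbinom(0, size=1, prob=p)                  # P(X = 0) = 1 - p

# The two outcomes exhaust the sample space, so they sum to 1.
sum(dbinom(0:1, size=1, prob=p))
```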
More information is also accessible in R if you type ?dbinom, ?pbinom, ?qbinom, or ?rbinom.
dbinom( )
PROBLEM 13: If you flip a coin \(n=5\) times and in each flip the probability of heads is \(p=0.5\), what is the chance that you get 2 successes?
Here, our random variable \(X\) is the number of successes in \(n\) independent trials, so \(X\sim\text{Binomial}(n,p)\) with \(n=5\) and \(p=0.5\).
dbinom(2, size=5, prob=0.5)
[1] 0.3125
We can also check our answer using the Binomial probability mass function: \(P(X=x)={n\choose x}p^x(1-p)^{n-x}\).
choose(5,2)*0.5^2*(1-0.5)^(5-2)
[1] 0.3125
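Since dbinom( ) accepts a vector for x, the whole PMF can be evaluated in one call. The sketch below lists \(P(X=x)\) for every possible \(x\) when \(n=5\) and \(p=0.5\) and confirms that the probabilities sum to 1; the name probs is illustrative.

```r
# P(X = x) for every possible number of successes x = 0, 1, ..., 5.
probs <- dbinom(0:5, size=5, prob=0.5)
names(probs) <- 0:5
probs

sum(probs)      # 1: the PMF covers all possible outcomes
probs["2"]      # the answer to Problem 13
```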
pbinom( )
PROBLEM 14: If you flip a coin \(n=5\) times and in each flip the probability of heads is \(p=0.5\), what is the chance that you get at most 2 successes?
Now we want to find \(P(X\leq2)\). We know that \(P(X\leq2)=P(X=2)+P(X=1)+P(X=0)\), so we could again use the dbinom( ) function.
dbinom(2, size=5, prob=0.5) + dbinom(1, size=5, prob=0.5) + dbinom(0, size=5, prob=0.5)
[1] 0.5
The problem is that this approach becomes cumbersome as the number of trials increases. A more efficient approach is to recognize that \(P(X\leq2)\) takes the form of the CDF and use pbinom( ).
pbinom(2, size=5, prob=0.5)
[1] 0.5
PROBLEM 15: If you flip a coin \(n=100\) times and in each flip the probability of heads is \(p=0.25\), what is the chance that you get at most 20 successes?
pbinom(20, size=100, prob=0.25)
[1] 0.1488311
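The link between the PMF and the CDF can be checked directly: summing dbinom( ) over \(0,1,\dotsc,q\) should reproduce pbinom( ) at \(q\). The sketch below does this for Problems 14 and 15 using a vector of x values rather than adding terms one by one; the variable names are illustrative.

```r
# Problem 14: P(X <= 2) with n = 5, p = 0.5
small <- sum(dbinom(0:2, size=5, prob=0.5))
all.equal(small, pbinom(2, size=5, prob=0.5))       # TRUE

# Problem 15: P(X <= 20) with n = 100, p = 0.25
large <- sum(dbinom(0:20, size=100, prob=0.25))
all.equal(large, pbinom(20, size=100, prob=0.25))   # TRUE
```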
PROBLEM 16: If you flip a coin \(n=100\) times and in each flip the probability of heads is \(p=0.25\), what is the chance that you get at least 20 successes?
We have two ways to solve this problem. First, we can write \(P(X\geq 20)=1-P(X<20)=1-P(X\leq 19)\) where \(P(X<20)=P(X\leq 19)\) since \(X\) is discrete.
1-pbinom(19, size=100, prob=0.25)
[1] 0.9004696
Alternatively, we can use the lower.tail=FALSE option to tell R we want the probability greater than x. However, note that this is strictly greater than, so we must again remember that \(P(X\geq 20)=P(X>19)\).
pbinom(19, size=100, prob=0.25, lower.tail=FALSE)
[1] 0.9004696
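An off-by-one slip is easy to make here, so it can help to verify the answer by summing the PMF over \(20,21,\dotsc,100\). The sketch below also shows what is computed if 20 is passed instead of 19; the variable names are illustrative.

```r
# P(X >= 20): add up P(X = x) for x = 20, ..., 100.
at_least_20 <- sum(dbinom(20:100, size=100, prob=0.25))
all.equal(at_least_20,
          pbinom(19, size=100, prob=0.25, lower.tail=FALSE))   # TRUE

# Passing 20 instead gives P(X > 20), which leaves out P(X = 20).
more_than_20 <- pbinom(20, size=100, prob=0.25, lower.tail=FALSE)
all.equal(at_least_20 - more_than_20,
          dbinom(20, size=100, prob=0.25))                     # TRUE
```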
qbinom( )
PROBLEM 17: Suppose you flip a coin \(n=20\) times where each flip has a probability of heads equal to \(p=0.5\). Find the smallest value \(q\) such that the probability of getting at most \(q\) successes is at least 0.25.
qbinom(0.25, size=20, prob=0.5)
[1] 8
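Because the Binomial distribution is discrete, its CDF jumps from one value to the next and usually never equals a given probability exactly. As documented in ?qbinom, the function returns the smallest \(q\) whose cumulative probability reaches the target. The sketch below checks this for the answer above; the variable names are illustrative.

```r
q <- qbinom(0.25, size=20, prob=0.5)

below <- pbinom(q - 1, size=20, prob=0.5)   # P(X <= q - 1), still under 0.25
at_q  <- pbinom(q,     size=20, prob=0.5)   # P(X <= q), first value >= 0.25

c(below, at_q)
below < 0.25 && at_q >= 0.25                # TRUE
```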
rbinom( )
PROBLEM 18: Generate \(n=50\) Bernoulli\((p)\) random variables with \(p=0.2\).
x = rbinom(50, size=1, prob=0.2)
x
[1] 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
[39] 0 0 0 1 0 1 1 0 1 0 0 0
barplot(table(x))
PROBLEM 19: Generate 100 Binomial\((n,p)\) random variables with \(n=100\) trials and \(p=0.2\).
x = rbinom(100, size=100, prob=0.2)
x
[1] 25 18 18 19 15 25 19 24 25 21 27 17 19 18 18 14 25 17 18 20 20 13 16 13 20
[26] 17 19 23 13 21 27 17 22 15 23 16 23 20 21 26 20 17 20 23 21 25 18 16 16 27
[51] 24 23 17 20 20 22 20 23 16 17 20 21 19 14 24 16 19 18 18 20 24 18 19 16 19
[76] 23 17 19 14 20 27 24 20 16 15 17 13 19 16 20 14 25 21 15 14 24 18 19 16 15
barplot(table(x))
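The relationship between the two distributions can also be seen by simulation: adding up 100 Bernoulli draws gives one Binomial draw with size=100. The sketch below builds 100 such sums; the seed and the names flips and totals are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed, used only for reproducibility

# 100 rows, each holding 100 Bernoulli(0.2) trials.
flips <- matrix(rbinom(100 * 100, size=1, prob=0.2), nrow=100)

# Each row sum counts the successes in 100 trials, i.e. one Binomial(100, 0.2) draw.
totals <- rowSums(flips)

mean(totals)            # should be near n * p = 20
barplot(table(totals))  # shape should resemble the plot of rbinom(100, size=100, prob=0.2)
```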