dnorm(1, mean=0, sd=1)
[1] 0.2419707
There are many common families of probability distributions and we have discussed six so far. The discrete distributions include the discrete Uniform, Bernoulli, and Binomial. The continuous distributions include the continuous Uniform, Normal, and t.
This chapter provides a set of examples to show you how to compute probabilities from a few of these distributions in R.
R has four normal distribution functions: dnorm( )
, pnorm( )
, qnorm( )
, and rnorm( )
.
dnorm(x,mean,sd)
probability density function (PDF) - input:x
is the value at which you want to evaluate the normal PDF - output: a positive number since the PDF \(f(x)\) must be positive - example: evaluate \(f(x)\)
pnorm(q,mean,sd)
cumulative distribution function (CDF) - input:q
is the value for which you want to find the area below/above - output: a probability - example: compute \(P(X<q)\)
qnorm(p,mean,sd)
quantile function - input:p
is a probability - output: a real number since \(X\in(-\infty,\infty)\) - example: find the value \(q\) such that \(P(X<q)=p\)
rnorm(n,mean,sd)
random number generator - input:n
is the number of observations you want to generate - output: a vector of n real numbers - example: generate n independent \(N(\mu,\sigma^2)\) random variables
More information is also accessible in R if you type ?dnorm
, ?pnorm
, ?qnorm
, or ?rnorm
.
To learn how to use these functions, we’ll start with a few exercises on the standard normal distribution which is a normal distribution with mean 0 and standard deviation of 1. We will then move on to the more general \(N(\mu,\sigma^2)\) distribution.
dnorm
)When \(X\) is a continuous random variable, we know that \(P(X=x)=0\). Therefore, dnorm( )
does not return a probability, but rather the height of the PDF. Even though the height of the PDF is not a probability, we can still interpret density evaluations as the relatively likelihood of observing a certain value \(x\).
PROBLEM 1: Let \(X\sim N(0,1)\). Is the value \(x=1\) or \(x=-0.5\) more likely to occur under this normal distribution?
dnorm(1, mean=0, sd=1)
[1] 0.2419707
dnorm(-0.5, mean=0, sd=1)
[1] 0.3520653
pnorm
)The pnorm( )
function is useful for evaluating probabilities of the form \(P(X\leq x)\) or \(P(X \geq x)\).
PROBLEM 2: If \(X\sim N(0,1)\), what is \(P(X<0)\)?
pnorm(0, mean=0, sd=1)
[1] 0.5
PROBLEM 3: If \(X\sim N(0,1)\), what is \(P(X<1)\)?
pnorm(1, mean=0, sd=1)
[1] 0.8413447
PROBLEM 4: If \(X\sim N(0,1)\), what is \(P(X>1)\)?
We have two ways of answering this question. First, we can recognize that \(P(X>1)=1-P(X\geq 1)\).
1-pnorm(1, mean=0, sd=1)
[1] 0.1586553
A second approach is to use the lower.tail
option within the pnorm( )
function. When lower.tail=TRUE
then the pnorm( )
function returns the probability to the left of a given number \(x\) and if lower.tail=FALSE
then pnorm( )
returns the probability to the right of \(x\).
pnorm(1, mean=0, sd=1, lower.tail=FALSE)
[1] 0.1586553
PROBLEM 5: If \(X\sim N(0,1)\), what is \(P(0<X<1)\)
pnorm(1, mean=0, sd=1) - pnorm(0, mean=0, sd=1)
[1] 0.3413447
Once we understand how to use the pnorm( )
function to compute standard normal probabilities, extending the function to compute probabilities of any normal distribution is straightforward. All we have to do is change the mean
and sd
arguments.
Remember that the normal functions in R call for the standard deviation \(\sigma\), NOT the variance \(\sigma^2\)!
PROBLEM 6: If \(X\sim N(4,9)\), what is \(P(X<0)\)?
pnorm(0, mean=4, sd=3)
[1] 0.09121122
PROBLEM 7: If \(X\sim N(2,3)\), what is \(P(X>5)\)?
pnorm(5, mean=2, sd=sqrt(3), lower.tail=FALSE)
[1] 0.04163226
qnorm
)Next, let’s use the qnorm( )
function to find quantiles of the normal distribution.
PROBLEM 8: If \(X\sim N(0,1)\), find the value \(q\) such that \(P(X<q)=0.05\).
qnorm(0.05, mean=0, sd=1)
[1] -1.644854
PROBLEM 9: If \(X\sim N(0,1)\), find the value \(q\) such that \(P(X>q)=0.025\). That is, \(q\) is the value such that 2.5% of the area under the standard normal PDF is to its right.
qnorm(0.025, mean=0, sd=1, lower.tail=FALSE)
[1] 1.959964
PROBLEM 10: If \(X\sim N(-4,2)\), find the value \(q\) such that \(P(X>q)=0.1\). That is, \(q\) is the value such that 10% of the area under the \(N(-4,2)\) PDF is to its right.
qnorm(0.1, mean=-4, sd=sqrt(2), lower.tail=FALSE)
[1] -2.187612
rnorm
)Finally, let’s use rnorm( )
to generate random samples of size \(n\) from a normal distribution.
PROBLEM 11: Generate \(n=20\) random variables from a standard normal distribution.
= rnorm(20, mean=0, sd=1)
x x
[1] 0.91425791 -1.04958627 -0.93465403 0.19325380 -0.01823713 -0.86645369
[7] 0.63871980 -0.36962999 -0.43475418 -0.32424412 1.60020368 0.49327786
[13] 1.43441584 -0.78876451 0.30173806 0.08214899 -0.19969547 -1.04771367
[19] -0.18370675 -0.47987416
hist(x)
PROBLEM 12: Generate \(n=100\) random variables from a \(N(10,2)\) distribution.
= rnorm(100, mean=10, sd=sqrt(2))
x x
[1] 11.164460 7.929328 11.179405 8.442789 12.373270 13.401926 9.741446
[8] 8.115701 11.078015 10.069417 8.586165 9.757694 11.546480 10.702859
[15] 7.528640 8.491067 8.210394 10.115810 10.515493 10.387686 13.640579
[22] 8.160983 9.899891 10.368929 10.642952 6.823459 9.350985 8.447182
[29] 11.916924 10.787050 11.092626 11.272405 9.522846 10.015570 10.033683
[36] 9.189953 8.099563 7.760358 13.557964 9.905399 9.111984 8.341003
[43] 11.712643 7.689756 9.020224 12.640759 6.951699 9.412638 7.086170
[50] 8.268485 9.851610 11.758662 11.201609 13.633821 10.141695 9.738907
[57] 9.979552 11.420146 8.059217 11.456927 10.829690 9.594522 7.986263
[64] 10.746696 11.312949 9.622886 11.537176 7.548495 9.997797 9.787481
[71] 11.763016 10.036189 11.010225 7.933874 7.947485 10.684199 10.359082
[78] 12.162780 9.225362 8.881178 10.102968 6.136698 8.701801 9.142200
[85] 9.648291 11.063129 10.536869 10.850814 8.831312 10.358653 9.197854
[92] 9.373556 9.579944 12.025227 10.474767 9.070353 10.561072 11.444697
[99] 8.680786 10.359923
hist(x)
The Bernoulli and Binomial distributions are intimately related: a Binomial random variable corresponds to the number of successes in \(n\) independent Bernoulli trials. For example, consider flipping a coin. Each coin flip can be modelled as a Bernoulli\((p)\) random variable with probability of success (heads) equal to \(p\). If you flipped a coin \(n=10\) times and wanted to model the number of sucesses (heads) in \(n=10\) trials, that would be a Binomial(\(n,p\)) random variable.
R has four functions that can be used to compute both Bernoulli and Binomial probabilities: dbinom( )
, pbinom( )
, qbinom( )
, rbinom( )
.
dbinom(x,size,prob)
probability mass function (PMF) - input:x
is the number of successes,size
is the number of trials \(n\),prob
is the probability of success \(p\) - output: a probability since \(0\leq P(X=x)\leq1\) - example: evaluate \(P(X=x)\)
pbinom(q,size,prob)
probability distribution function (CDF) - input:q
is the value for which you want to find the area below/above,size
is the number of trials \(n\),prob
is the probability of success \(p\) - output: a probability - example: evaluate \(P(X\leq x)\)
qbinom(p,size,prob)
quantile function
- input:p
is a probability,size
is the number of trials \(n\),prob
is the probability of success \(p\) - output: a positive integer since \(X\in\{0,1,\dotsc,n\}\) - example: find \(q\) s.t. \(P(X\leq q)=p\)
rbinom(n,size,prob)
random number generator
- input:n
is the number of observations you want to generate,size
is the number of trials \(n\),prob
is the probability of success \(p\) - output: a vector of n positive integers - example: generate \(n\) independent Binomial\((n,p)\) random variables
Note: These functions correspond to the Bernoulli distribution whenever size=1
.
More information is also accessible in R if you type ?dbinom
, ?pbinom
, ?qbinom
, or ?rbinom
.
dbinom
)PROBLEM 13: If you flip a coin \(n=5\) times and in each flip the probability of heads is \(p=0.5\), what is the chance that you get 2 successes?
Here, our random variable \(X\) is the number of successes in \(n\) independent trials, so \(X\sim\text{Binomial}(n,p)\) with \(n=5\) and \(p=0.5\).
dbinom(2, size=5, prob=0.5)
[1] 0.3125
We can also check our answer using the Binomial probability mass function: \(P(X=x)={n\choose x}p^x(1-p)^{n-x}\).
choose(5,2)*0.5^2*(1-0.5)^(5-2)
[1] 0.3125
pbinom
)PROBLEM 14: If you flip a coin \(n=5\) times and in each flip the probability of heads is \(p=0.5\), what is the chance that you get at most 2 successes?
Now we want to find \(P(X\leq2)\). We know that \(P(X\leq2)=P(X=2)+P(X=1)+P(X=0)\), so we could again use the dbinom( )
function.
dbinom(2, size=5, prob=0.5) + dbinom(1, size=5, prob=0.5) + dbinom(0, size=5, prob=0.5)
[1] 0.5
The problem is that this approach becomes cumbersome as the number of trials increases. A more efficient approach is to recognize that \(P(X\leq2)\) takes the form of the CDF and use pnorm( )
.
pbinom(2, size=5, prob=0.5)
[1] 0.5
PROBLEM 15: If you flip a coin \(n=100\) times and in each flip the probability of heads is \(p=0.25\), what is the chance that you get at most 20 successes?
pbinom(20, size=100, prob=0.25)
[1] 0.1488311
PROBLEM 16: If you flip a coin \(n=100\) times and in each flip the probability of heads is \(p=0.25\), what is the chance that you get at least 20 successes?
We have two ways to solve this problem. First, we can write \(P(X\geq 20)=1-P(X<20)=1-P(X\leq 19)\) where \(P(X<20)=P(X\leq 19)\) since \(X\) is discrete.
1-pbinom(19, size=100, prob=0.25)
[1] 0.9004696
Alternatively, we can use the lower.tail=FALSE
option to tell R we want the probability greater than x. However, note that this is strictly greater than, so we must again remember than \(P(X\geq 20)=P(X>19)\).
pbinom(19, size=100, prob=0.25, lower.tail=FALSE)
[1] 0.9004696
qbinom
)PROBLEM 17: Suppose you flip a coin \(n=20\) times where each flip has a probability of heads equal to \(p=0.5\). Find the value \(q\) such that the probability of getting at most \(q\) successes is equal to 0.25.
qbinom(0.25, size=20, prob=0.5)
[1] 8
rbinom
)PROBLEM 18: Generate \(n=50\) Bernoulli\((p)\) random variables with \(p=0.2\).
= rbinom(50, size=1, prob=0.2)
x x
[1] 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0
[39] 0 0 0 0 1 1 0 0 1 0 0 0
barplot(table(x))
PROBLEM 19: Generate \(n=100\) Binomial\((n,p)\) random variables with \(p=0.4\).
= rbinom(100, size=100, prob=0.2)
x x
[1] 23 25 12 27 29 26 13 22 16 25 18 18 23 16 20 14 19 21 19 19 25 26 29 16 22
[26] 19 22 17 19 27 20 24 18 22 18 23 18 16 21 20 24 20 17 17 18 17 20 21 21 14
[51] 23 20 19 20 19 20 23 23 25 23 23 17 18 21 20 13 23 19 15 20 22 20 17 24 19
[76] 17 18 21 24 20 19 24 29 21 17 13 14 20 27 18 23 20 25 16 24 22 29 24 21 15
barplot(table(x))