MAST20006/MAST90057 – Module 2. Discrete Distributions
Module 2. Discrete Distributions
Chapter 2 in the textbook
Sophie Hautphenne and Feng Liu
The University of Melbourne
2023
Overview
1 Discrete random variables
2 Mathematical expectation
3 Mean, variance and standard deviation
4 Bernoulli trials and the binomial distribution
5 The moment-generating function
6 The Poisson distribution
1. Discrete random variables
Recall that a fundamental objective of probability theory is to find
the probability of a given event B in the sample (outcome) space S.
It can be difficult to describe and analyse S, and accordingly B, if
the elements of S are not numerical.
However, one often deals with situations where one can associate
with each sample point (outcome) s in S a numerical measurement
x; that makes life easier.
The numerical measurement x, when regarded as a function of the sample
point s, is called a random variable, and is denoted by X or X(s).
Definition 1
Given a random experiment with an outcome space S, a function X that
assigns to each element s in S a real number X(s) = x is called a
random variable (abbr. r.v.).
The range (or space) of X is the set of real numbers
{x : X(s) = x, s ∈ S}, where ‘s ∈ S’ means the element s belongs to the
set S.
Remarks: The range of X is often denoted as X(S) or SX.
Now each event (subset) B in S can be described by the subset
A := X(B) of real numbers assumed by the function (r.v.) X on B.
Note that A is a subset of SX but not of S, and that X(B) does
not specify B for a general X.
So X : S → SX ⊆ ℝ, such that s ↦ X(s) = x, and for every
A ⊆ SX, there exists B ⊆ S such that
A = X(B) = {x : x = X(s), s ∈ B}
and therefore B = {s ∈ S : X(s) ∈ A}.
Namely, for A ⊆ SX,
PX(A) = P(X ∈ A) = P({s ∈ S : X(s) ∈ A}) = P(B).
In particular,
PX(SX) = P(X ∈ SX) = P({s ∈ S : X(s) ∈ SX}) = P(S) = 1,
i.e. the probability of the range of X equals 1.
Assigning probability to A = X(B) ⊆ SX can be easier than
assigning probability to B ⊆ S, as A is of a numerical nature, while B
is not necessarily numerical.
Difficulties still remain:
1 How to assign a probability to a subset A = X(B) ⊆ SX?
2 How to define a r.v. X as a function of s ∈ S?
The response to 2) is determined by the problem under
consideration, and is not unique.
To answer 1) we will focus on the discrete sample space at this
stage.
If S is discrete, SX is also discrete. So we would be able to calculate
PX(A) for any subset A in SX if we have assigned a probability
to each element in SX.
(Remember there exists a B ⊆ S such that A = X(B).)
Specifically,
PX(A) = PX(X(B)) = P(B) = ∑_{s∈B} P({s}).
Also note
PX(A) = ∑_{x∈A} PX(x) = ∑_{x∈A} P(X = x).
Example 1. A marble is selected at random from a box containing 3 red,
4 yellow and 5 white marbles. The colour of the selected marble is
recorded.
The sample space is S = {R, Y, W}, and
P({R}) = 3/12, P({Y}) = 4/12, P({W}) = 5/12.
Define a random variable
X = X(s) = 1 if s = R; 2 if s = Y; 3 if s = W.
Then the space of X is SX = {1, 2, 3}.
For A = {1, 2} which is an event in SX , there exists an event B in
S where B = {R, Y } such that
X(B) = X({R, Y }) = {X(R), X(Y )} = {1, 2} = A.
Note that both A and B represent the event that the selected
marble is not white.
Now,
PX(A) = PX({1, 2}) = PX(1) + PX(2) = P(X = 1) + P(X = 2)
= P(s = R) + P(s = Y) = P({R, Y}) = P(B)
= 3/12 + 4/12 = 7/12.
Carefully read the above equation to make sure you understand
every step there.
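The event-probability computation above is mechanical once the pmf is tabulated. A minimal Python sketch (the slides use R; Python is used here for a self-contained check) reproduces the numbers from Example 1:

```python
from fractions import Fraction

# pmf of X from Example 1: X = 1 (red), 2 (yellow), 3 (white)
pmf = {1: Fraction(3, 12), 2: Fraction(4, 12), 3: Fraction(5, 12)}

def prob(event):
    """P_X(A) = sum of f(x) over x in A."""
    return sum(pmf[x] for x in event)

A = {1, 2}                 # "the selected marble is not white"
print(prob(A))             # 7/12
print(prob(pmf.keys()))    # 1  (total probability)
```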
The preceding discussion tells us that the set of probabilities
{P(X = x), x ∈ SX} is fundamental in that it determines the
probability of any event in SX.
We often write
f(x) := PX({x}) = P(X = x) for any x ∈ SX;
we call f(x) the probability mass function (pmf) of X.
Definition 2
The pmf f(x) of a discrete random variable X is a function that satisfies
the following properties:
1 f(x) > 0 for any x ∈ SX;
2 ∑_{x∈SX} f(x) = 1;
3 PX(A) = P(X ∈ A) = ∑_{x∈A} f(x), for any A ⊆ SX.
Remarks:
1 Provided that no confusion will be created, SX can simply be
rewritten as S (the sample space for X), and PX as P (or even just
Pr).
2 Note that P(X = x) = 0 if x ∉ SX. Therefore we define f(x) = 0
for any x ∉ SX.
3 If f(x) is constant on SX, we say X has a uniform distribution, or
f(x) is a uniform pmf. For example, “f(x) = 1/6, x = 1, 2, . . . , 6”
is a uniform pmf.
4 The pmf f(x) can be expressed in different ways: as a mathematical
formula, a table, a bar graph or a probability histogram. You can use
any one of these four forms (usually the simplest one for the given
situation) to express the pmf.
Example 2. Roll a four-sided die twice.
Let a random variable X equal the larger of the two face numbers
if they are different, and the common value if they are the same.
Thus the sample space is
S = {(d1, d2) : d1 = 1, 2, 3, 4; d2 = 1, 2, 3, 4}.
We have X = X(d1, d2) = max(d1, d2), and the space of X is
SX = {1, 2, 3, 4}.
It is not difficult to see that
P(X = 1) = P({(1, 1)}) = 1/16
P(X = 2) = P({(1, 2), (2, 1), (2, 2)}) = 3/16
P(X = 3) = P({(1, 3), (2, 3), (3, 3), (3, 1), (3, 2)}) = 5/16
P(X = 4) = P({(1, 4), (2, 4), (3, 4), (4, 4), (4, 1), (4, 2), (4, 3)}) = 7/16
Therefore, the pmf of X can either be given by the following table
x 1 2 3 4
f(x) = P(X = x) 1/16 3/16 5/16 7/16
or by the following mathematical formula
f(x) = P(X = x) = (2x − 1)/16, x = 1, 2, 3, 4,
or by a bar graph or probability histogram.
R commands used for creating the above graphs:
Example 3: The hypergeometric distribution.
Let X be the number of “defective items” (“D”) in a sample of n items
randomly drawn without replacement from a population consisting of
N1 D’s and N2 G’s (“good items”). The population has in total
N1 + N2 = N items.
Assume that each item in the population has the same chance to be
drawn.
Then the possible values that the discrete r.v. X can take, i.e. the
space of X, are SX = {x : x ≥ 0, x ≤ n, x ≤ N1 and n − x ≤ N2}.
We say X has a hypergeometric distribution Hyper(N1, N2, n), with
the pmf
f(x) = P(X = x) = C(N1, x) C(N2, n − x) / C(N, n), x ∈ SX.
Example 4: Capture-recapture experiment. Ten animals of a certain
species have been captured, tagged, and released to mix into their
population. Suppose the population consists of 80 such animals. A new
sample of 15 animals is to be selected.
What is the probability that 3 in the new sample will be tagged?
Let X be the number of tagged animals in the new sample.
Then X has a hypergeometric distribution
Hyper(N1 = 10, N2 = 70, n = 15).
Therefore f(3) = P(X = 3) = C(10, 3) C(70, 12) / C(80, 15).
In R, use dhyper(x, N1, N2, n) to compute the hypergeometric pmf.
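As a cross-check of the pmf formula, the R call dhyper(3, 10, 70, 15) can be mirrored directly with binomial coefficients; a Python sketch (the helper name dhyper is chosen to match R and is otherwise illustrative):

```python
from math import comb

def dhyper(x, N1, N2, n):
    """Hypergeometric pmf: C(N1,x) C(N2,n-x) / C(N1+N2, n)."""
    return comb(N1, x) * comb(N2, n - x) / comb(N1 + N2, n)

p3 = dhyper(3, 10, 70, 15)    # P(X = 3) for the capture-recapture example
# x can range over 0..10 here, since x <= N1 = 10 and n - x <= N2 always
total = sum(dhyper(x, 10, 70, 15) for x in range(0, 11))
print(p3, total)
```

The total is 1 (up to rounding), confirming property 2 of a pmf.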
2. Mathematical expectation
The pmf f(x), x ∈ SX, provides all the information about the
probability distribution of a random variable X.
Here we are interested in some numeric characteristics of X, which
are also numeric characteristics of f(x).
An important numeric characteristic is the mathematical
expectation of X.
Example 5. A young man devises a game. The game is to let the
participant cast a fair die and then receive a payment according to the
outcome:
He pays 1¢ if the event A = {1, 2, 3} occurs; 5¢ if B = {4, 5}
occurs; and 35¢ if C = {6} occurs.
It is easy to see that P(A) = 3/6, P(B) = 2/6 and P(C) = 1/6.
The average payment per cast is 1 × 3/6 + 5 × 2/6 + 35 × 1/6 = 8¢.
In the long run, this is how much is paid in one play (use the “long
term relative frequency” interpretation of probability!).
The charge per cast should be more than 8¢ if the young man
wants to make a profit from this game over the long term.
The above discussion can be formulated more formally:
Let X be the outcome of a cast.
The pmf of X is the uniform one given by f(x) = 1/6, x = 1, 2, . . . , 6.
In terms of the observed value x, the payment per cast is given by
the function
u(x) = 1 if x = 1, 2, 3; 5 if x = 4, 5; 35 if x = 6.
The mathematical expectation of the payment per cast is then
equal to
∑_{x=1}^{6} u(x)f(x) = 1 × 1/6 + 1 × 1/6 + 1 × 1/6 + 5 × 1/6 + 5 × 1/6 + 35 × 1/6
= 1 × 3/6 + 5 × 2/6 + 35 × 1/6 = 8.
Definition 3
Suppose f(x) is the pmf of a discrete random variable X with range SX,
and u(X) is a function of X (note that u(X) is also a r.v.).
If the summation
∑_{x∈SX} u(x)f(x), which is sometimes written as ∑_{SX} u(x)f(x),
exists, then the sum is called the mathematical expectation or the expected
value of the function u(X), and it is denoted by E[u(X)].
That is,
E[u(X)] = ∑_{x∈SX} u(x)f(x).
Remarks:
1 It is possible that E[u(X)] is different from u(x) for every x ∈ SX.
2 To be mathematically rigorous, the definition of E[u(X)] requires
that ∑_{x∈SX} |u(x)|f(x) converges and is finite (if SX is infinite, this
is a series).
3 There is another way to calculate E[u(X)]:
(a) Define Y = u(X); Y is also a random variable.
(b) Then find the pmf of Y,
i.e. g(y) := P(Y = y) = P(u(X) = y) = P(X ∈ u⁻¹({y})).
(c) Then E[u(X)] = E[Y] = ∑_{y∈SY} y g(y).
(d) So ∑_{x∈SX} u(x)f(x) = ∑_{y∈SY} y g(y).
Example 6. Let a r.v. X have the pmf f(x) = 1/3, x ∈ S = {−1, 0, 1}.
Let u(X) = X².
Then
E[u(X)] = E[X²] = ∑_{x∈S} x²f(x) = (−1)² × 1/3 + 0² × 1/3 + 1² × 1/3 = 2/3.
On the other hand, we can define Y = X².
Then P(Y = 0) = P(X = 0) = 1/3, and
P(Y = 1) = P(X = −1) + P(X = 1) = 2/3.
So the pmf of Y is
g(y) = 1/3 if y = 0; 2/3 if y = 1,
and the space of Y is SY = {0, 1}.
Hence E[Y] = ∑_{y∈SY} y g(y) = 0 × 1/3 + 1 × 2/3 = 2/3.
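The two routes of Remark 3 can be verified numerically for Example 6; a minimal Python sketch:

```python
from fractions import Fraction

f = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}  # pmf of X

# Route 1: E[u(X)] = sum of u(x) f(x), with u(x) = x^2
e1 = sum(x**2 * p for x, p in f.items())

# Route 2: build the pmf g of Y = X^2, then E[Y] = sum of y g(y)
g = {}
for x, p in f.items():
    g[x**2] = g.get(x**2, 0) + p
e2 = sum(y * p for y, p in g.items())

print(e1, e2)   # both are 2/3
```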
In conclusion, we saw that there are two ways to compute E[u(X)],
and that in Example 6, E[u(X)] = E[Y] = 2/3.
Some useful properties of the mathematical expectation:
Theorem 1
When it exists, the mathematical expectation E satisfies the following
properties:
(a) If c is a constant, E(c) = c.
(b) If c is a constant and u is a function, E[c u(X)] = c E[u(X)].
(c) If c1 and c2 are constants and u1 and u2 are functions, then
E[c1 u1(X) + c2 u2(X)] = c1 E[u1(X)] + c2 E[u2(X)].
(d) Generalising part (c) above: E[∑_{i=1}^{k} ci ui(X)] = ∑_{i=1}^{k} ci E[ui(X)].
Example 7. Let X have the pmf f(x) = x/10, x = 1, 2, 3, 4. Then
E(X) = ∑_{x=1}^{4} x · x/10 = (1 + 4 + 9 + 16)/10 = 3,
E(X²) = ∑_{x=1}^{4} x² · x/10 = (1 + 8 + 27 + 64)/10 = 10,
E[X(5 − X)] = 5E(X) − E(X²) = 15 − 10 = 5.
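The three expectations of Example 7 follow from the definition and linearity; a quick Python check:

```python
from fractions import Fraction

f = {x: Fraction(x, 10) for x in (1, 2, 3, 4)}   # f(x) = x/10

def E(u):
    """E[u(X)] = sum of u(x) f(x) over the range of X."""
    return sum(u(x) * p for x, p in f.items())

EX = E(lambda x: x)               # (1 + 4 + 9 + 16)/10 = 3
EX2 = E(lambda x: x**2)           # (1 + 8 + 27 + 64)/10 = 10
EX5X = E(lambda x: x * (5 - x))   # 5 E(X) - E(X^2) = 5, by linearity
print(EX, EX2, EX5X)
```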
Example 8. Let u(x) = (x − b)², where b is an unknown constant.
Suppose E[(X − b)²] exists. Find the value of b for which E[(X − b)²]
is minimal.
First write g(b) = E[(X − b)²] = E(X²) − 2bE(X) + b².
Then g′(b) = −2E(X) + 2b.
Set g′(b) = 0 and solve for b. It follows that b = E(X).
Since g″(b) = 2 > 0, E[X] is the value of b that minimises
E[(X − b)²].
That is, E[(X − E(X))²] ≤ E[(X − b)²] for any b.
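The minimisation can be illustrated numerically: g(b) = E[(X − b)²] equals Var(X) + (b − E(X))², so it is smallest at b = E(X). A sketch using the pmf of Example 7 as an illustrative choice:

```python
# g(b) = E[(X - b)^2], minimised at b = E(X)
f = {x: x / 10 for x in (1, 2, 3, 4)}        # f(x) = x/10, so E(X) = 3
EX = sum(x * p for x, p in f.items())

def g(b):
    return sum((x - b) ** 2 * p for x, p in f.items())

bs = [EX + d / 10 for d in range(-20, 21)]   # grid of b values around E(X)
best = min(bs, key=g)
print(best, EX)                              # the minimiser is E(X) itself
```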
Example 9: The expectation of a hypergeometric random variable.
Let X have a hypergeometric distribution Hyper(N1, N2, n), with the
pmf given by
f(x) = P(X = x) = C(N1, x) C(N2, n − x) / C(N, n),
where x ≥ 0, x ≤ n, x ≤ N1, n − x ≤ N2.
Then we can show that
E(X) = ∑_{x∈S} x · C(N1, x) C(N2, n − x) / C(N, n) = nN1/N.
This agrees with the intuition: the number of ‘defective’ items in
the sample is expected to be equal to the sample size n multiplied
by N1/N, the proportion of ‘defective’ items in the population.
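The identity E(X) = nN1/N can be checked in exact arithmetic for specific parameters; a Python sketch (parameter values illustrative):

```python
from math import comb
from fractions import Fraction

def hyper_mean(N1, N2, n):
    """E(X) computed directly from the hypergeometric pmf."""
    N = N1 + N2
    lo, hi = max(0, n - N2), min(n, N1)
    return sum(Fraction(x * comb(N1, x) * comb(N2, n - x), comb(N, n))
               for x in range(lo, hi + 1))

print(hyper_mean(10, 70, 15))    # 15/8, which equals n*N1/N = 15*10/80
print(hyper_mean(20, 180, 30))   # 3,    which equals 30*20/200
```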
3. Mean, variance and standard deviation
For a discrete r.v. X with pmf f(x) and space
SX = {u1, u2, . . . , uk}, the expectation is
E(X) = ∑_{x∈SX} x f(x) = u1 f(u1) + u2 f(u2) + . . . + uk f(uk).
The expectation can be regarded as a weighted mean of
u1, u2, . . . , uk, where the weights are f(u1), f(u2), . . . , f(uk).
For this reason, we also call E(X) the mean of the random variable
X, and also denote E(X) by the Greek letter μ.
In summary,
μ := E(X) = ∑_{x∈SX} x f(x) = u1 f(u1) + u2 f(u2) + . . . + uk f(uk).
A third name for E(X) is the first moment of X, as the expression
of E(X) has an interpretation as a moment in mechanics.
Similarly we call E(X²) the second moment of X.
Generally, for k ≥ 1, we call E(X^k) the k-th moment of X (about
the origin).
E[(X − μ)^k] is called the k-th moment of X about the mean μ
(central moment).
Statisticians find it valuable to compute E[(X − μ)²] (the second
moment about the mean), because
E[(X − μ)²] = ∑_{x∈SX} (x − μ)² f(x)
= (u1 − μ)² f(u1) + (u2 − μ)² f(u2) + . . . + (uk − μ)² f(uk)
is the weighted mean of the squares of the differences
u1 − μ, u2 − μ, . . . , uk − μ, which measures the variability of X
about its mean.
For this reason, we call E[(X − μ)²] the variance of X (or of the
pmf of X).
We also use σ² or Var(X) to denote the variance, i.e.
σ² := Var(X) = E[(X − μ)²].
We call σ := √(E[(X − μ)²]) the standard deviation of X (or of
the pmf of X).
The following property is useful:
σ² = Var(X) = E[(X − μ)²] = E[X²] − μ² = E[X²] − (E[X])².
Example 10. Let the pmf of X be defined as f(x) = x/6, x = 1, 2, 3.
Then
The mean of X is μ = E(X) = 1 × 1/6 + 2 × 2/6 + 3 × 3/6 = 7/3.
The second moment of X is
E(X²) = 1² × 1/6 + 2² × 2/6 + 3² × 3/6 = 6.
The variance of X is
σ² = Var(X) = E(X²) − μ² = 6 − (7/3)² = 5/9.
The standard deviation of X is σ = √Var(X) = √(5/9) ≈ 0.745.
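Example 10's computations, repeated in exact arithmetic:

```python
from fractions import Fraction
from math import sqrt

f = {x: Fraction(x, 6) for x in (1, 2, 3)}    # f(x) = x/6

mu = sum(x * p for x, p in f.items())         # mean: 7/3
EX2 = sum(x**2 * p for x, p in f.items())     # second moment: 6
var = EX2 - mu**2                             # variance: 5/9
sd = sqrt(var)                                # standard deviation ~ 0.745
print(mu, EX2, var, round(sd, 3))
```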
Example 11. Suppose the pmf of X is given by
x −1 0 1
fX(x) 1/3 1/3 1/3
It is easy to find that the mean of X is μX = 0, and the variance of
X is σ²X = 2/3.
Suppose the pmf of Y is given by
y −2 0 2
fY(y) 1/3 1/3 1/3
It is easy to find that the mean of Y is μY = 0, and the variance of
Y is σ²Y = 8/3.
We see that Y = 2X, μY = 2μX, σ²Y = 2²σ²X and σY = 2σX.
In general, if Y = aX + b where a and b are constants, and Y and X are
two random variables, then we have the following:
a) μY = aμX + b;
b) σ²Y = a²σ²X and σY = |a|σX.
Example 12. If X has a discrete uniform distribution on the first m
positive integers, i.e. f(x) = 1/m for x = 1, 2, . . . , m, then
μ = (m + 1)/2 and σ² = (m² − 1)/12.
Example 12 (cont.).
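The worked details of this slide are not reproduced here. For a discrete uniform distribution on {1, . . . , m} with f(x) = 1/m, the standard results are μ = (m + 1)/2 and σ² = (m² − 1)/12; the following Python sketch verifies them for a few values of m:

```python
from fractions import Fraction

def discrete_uniform_stats(m):
    """Mean and variance of the uniform pmf f(x) = 1/m on x = 1, ..., m."""
    f = Fraction(1, m)
    mu = sum(x * f for x in range(1, m + 1))
    var = sum(x**2 * f for x in range(1, m + 1)) - mu**2
    return mu, var

for m in (4, 6, 10):
    mu, var = discrete_uniform_stats(m)
    assert mu == Fraction(m + 1, 2)
    assert var == Fraction(m**2 - 1, 12)
print("mu = (m+1)/2 and var = (m^2-1)/12 verified")
```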
Example 13: Empirical distribution, sample mean and sample variance.
Consider performing a random experiment n times, which gives n
observations of a r.v. X: x1, x2, . . . , xn; this is referred to as a sample
from the distribution of X.
It is possible that some values in the sample are the same, but we do
not worry about that at this time.
Often we don’t know the probability distribution of X. But we can
(artificially) assign a probability 1/n to each of x1, x2, . . . , xn. The
distribution determined by these equal probabilities is called the
empirical distribution, since it is determined by a particular sample
x1, x2, . . . , xn acquired in an experiment.
That is, the pmf for the empirical distribution is
femp(x) = 1/n, x = x1, x2, . . . , xn.
The mean of femp(x) is
∑_{i=1}^{n} xi femp(xi) = (1/n) ∑_{i=1}^{n} xi = x̄,
which is just the sample mean of the data x1, x2, . . . , xn.
Likewise, the variance of the empirical distribution is (n − 1)/n
times the sample variance of the data defined as
s² := 1/(n − 1) ∑_{i=1}^{n} (xi − x̄)².
This example shows us the relationship between the mean and
variance of the empirical distribution and the sample mean and
sample variance of the data.
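The relationship can be seen on a small data set (values chosen arbitrarily for illustration): the variance of the empirical distribution equals (n − 1)/n times s².

```python
# Illustrative sample of n = 5 observations
data = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(data)

xbar = sum(data) / n                              # sample mean = empirical mean
emp_var = sum((x - xbar)**2 for x in data) / n    # variance of femp (weights 1/n)
s2 = sum((x - xbar)**2 for x in data) / (n - 1)   # sample variance s^2

print(xbar, emp_var, s2)    # emp_var equals (n-1)/n * s2
```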
Example 14: The mean and variance of a hypergeometric distribution.
Let X have a hypergeometric distribution Hyper(N1, N2, n), with the pmf
f(x) = C(N1, x) C(N2, n − x) / C(N, n),
where x ≥ 0, x ≤ n, x ≤ N1, n − x ≤ N2.
One can show that E(X) = nN1/N and
Var(X) = n (N1/N)(N2/N)(N − n)/(N − 1).
4. Bernoulli trials and the binomial distribution
A random experiment with the properties below is called a binomial
experiment:
1 Each such experiment consists of n trials, with n being fixed in advance.
2 Each of the n trials has only two possible outcomes which are denoted by
‘success’ (S) and ‘failure’ (F). A trial of this type is called a Bernoulli
trial.
3 The n trials are independent of each other. That is, the outcome of one
trial does not affect the probability of occurrence of the outcome of other
trials.
4 The probability of ‘success’ (denoted by p) is the same for all the n trials.
Let Xi be a random variable associated with the i-th Bernoulli trial,
defined by Xi(success) = 1 and Xi(failure) = 0.
Xi is called a Bernoulli random variable.
The pmf of Xi is given by
f(xi) = p^{xi} (1 − p)^{1−xi}, xi = 0, 1,
and
μi = E(Xi) = p,
σ²i = Var(Xi) = p(1 − p).
Example 15. A coin is flipped independently five (n = 5) times. Call the
outcome H (heads) a “success” and T (tails) a “failure”. Then this
is a binomial experiment.
Example 16. Suppose among 20 goblets in a box 2 have cosmetic flaws.
Now randomly take 10 goblets from the box without replacement. For
a selected goblet we are interested in whether it has any cosmetic flaws.
Then this is not a binomial experiment, because the outcomes of
the 10 trials are not independent of each other (it is a
hypergeometric experiment).
If the 10 goblets are taken with replacement, then the experiment
is a binomial one.
Example 17. Suppose 10% of a stock of 10,000 goblets have defects,
and 10 goblets are randomly taken without replacement for inspection.
Then the outcomes of the 10 trials are not independent of each other,
but the dependence is so weak that it can be ignored.
Therefore, Properties 1–4 of a binomial experiment are
approximately satisfied, and the experiment can be approximately
modelled by a binomial experiment.
In general, if an experiment involves a ‘without replacement’ sampling but
the sample size (number of trials) is < 5% of the population size, then
the experiment can be analysed as though it were a binomial experiment.
Example 18. A company that produces fine crystal knows from
experience that 10% of its goblets have cosmetic flaws and must be
classified as “seconds”. Now a sample of 10 goblets is randomly taken
from the production line for inspection. Knowing that the objective is just
to see whether any of them has any cosmetic flaws, this experiment is
approximately a binomial one.
In a binomial experiment, we often are interested in the total
number of ‘successes’, denoted by X, in the n Bernoulli trials.
We then call X a binomial random variable, and say that X has a
binomial distribution, denoted as X ∼ b(n, p), where n and p are
parameters indicating the number of Bernoulli trials and the
probability of ‘success’ in each trial respectively.
Note that we are not interested in the order of occurrences of the
‘successes’ for a binomial distribution.
The possible values of X are 0, 1, 2, . . . , n.
X = X1 + X2 + . . . + Xn, i.e. the sum of the n Bernoulli r.v.’s.
Each Bernoulli r.v. Xi has a special binomial distribution
Xi ∼ b(1, p).
Next we proceed to find the pmf and other characteristics of a binomial
r.v. X.
When n = 3, the probability for each possible outcome of X is given
below:
X Outcome Probability
3 SSS p³
2 SSF, SFS, FSS p²(1 − p) each
1 SFF, FSF, FFS p(1 − p)² each
0 FFF (1 − p)³
From the above table we see that the pmf of b(3, p) is
P(X = 0) = (1 − p)³,
P(X = 1) = 3p(1 − p)²,
P(X = 2) = 3p²(1 − p),
P(X = 3) = p³,
which can be equivalently expressed as
P(X = x) = C(3, x) p^x (1 − p)^{3−x}, x = 0, 1, 2, 3,
where the binomial coefficient C(n, x) gives the number of ways of selecting x
positions for the x ‘successes’ in the n trials.
In general, the pmf for a binomial distribution b(n, p) is
f(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x}, for x = 0, 1, 2, . . . , n.
Sometimes, it is of interest to find P(X ≤ x), the probability that
at most x ‘successes’ are obtained from the n Bernoulli trials in a
binomial experiment.
We call the function defined by F(x) := P(X ≤ x) the cumulative
distribution function (or simply the distribution function) of X,
abbreviated as cdf of X.
For a r.v. X having a binomial distribution b(n, p), the cdf is
F(x) = ∑_{k≤x} C(n, k) p^k (1 − p)^{n−k}; the mean is μ = np and the
variance is σ² = np(1 − p).
Remark: One can use the relation between binomial and Bernoulli
r.v.’s to find that
μX = E(X) = E(X1) + E(X2) + . . . + E(Xn) = p + . . . + p = np.
Example 18. Probability bar graphs for several binomial distributions
with different n and p values are shown below:
R commands for creating the above plots:
Example 19. Suppose the probability of germination of a beet seed is
0.8, and 10 seeds are planted. Let X be the number of seeds that
germinate. Assume independence of germination of one seed from that of
another seed. Then
P(X = 8) = C(10, 8)(0.8)⁸(0.2)² ≈ 0.302.
That is, the probability that 8 seeds germinate is 0.302.
P(X ≤ 8) = ∑_{k=0}^{8} C(10, k)(0.8)^k(0.2)^{10−k}, or
P(X ≤ 8) = 1 − P(X ≥ 9) = 1 − 10(0.8)⁹(0.2) − (0.8)¹⁰ ≈ 0.624.
That is, the probability of no more than 8 germinations is 0.624.
μ = E(X) = np = 10 × 0.8 = 8.
That is, on average 8 seeds are expected to germinate.
σ² = Var(X) = np(1 − p) = 10 × 0.8 × 0.2 = 1.6.
P(6 ≤ X < 9) = P(X < 9) − P(X ≤ 5) = P(X ≤ 8) − P(X ≤ 5)
≈ 0.624 − 0.033 = 0.591.
That is, the probability that at least 6 but fewer than 9 seeds
germinate is 0.591.
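The numbers above can be reproduced with small helpers mirroring R's dbinom and pbinom (helper names chosen to match R; Python is used here for a self-contained check):

```python
from math import comb

def dbinom(x, n, p):
    """Binomial pmf P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pbinom(x, n, p):
    """Binomial cdf P(X <= x)."""
    return sum(dbinom(k, n, p) for k in range(x + 1))

n, p = 10, 0.8
print(round(dbinom(8, n, p), 3))                    # 0.302
print(round(pbinom(8, n, p), 3))                    # 0.624
print(n * p, round(n * p * (1 - p), 1))             # mean 8.0, variance 1.6
print(round(pbinom(8, n, p) - pbinom(5, n, p), 3))  # 0.591
```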
What is the probability that 3 out of the 10 seeds do not germinate?
P(3 do not germinate) = P(X = 7) = C(10, 7)(0.8)⁷(0.2)³ ≈ 0.201.
Alternatively, let Y be the number of non-germinations. Then
Y ∼ b(10, 0.2). So
P(3 do not germinate) = P(Y = 3) = C(10, 3)(0.2)³(0.8)⁷ ≈ 0.201.
Suppose there are 1000 pots and 10 beet seeds are planted in each
pot, with the probability of germination of each seed still being 0.8.
The number of germinations in each pot is to be recorded.
What will the 1000 records look like?
These 1000 recordings would look like 1000 observations of a
b(10, 0.8) random variable.
We can use R to simulate 1000 observations, plot their histogram and
compare the histogram with the pmf of b(10, 0.8).
(In the plot, the heights of the dots give the pmf of X.)
R commands for creating the above plot:
R commands for the pmf, cdf and random number generation of the
binomial distribution:
dbinom(x, size, prob)
pbinom(q, size, prob)
rbinom(n, size, prob)
Type ‘help(dbinom)’ in R for more information.
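The 1000-pot simulation can also be sketched without R: rbinom(1000, 10, 0.8) is imitated by summing Bernoulli draws (the seed and layout below are illustrative choices):

```python
import random
from collections import Counter
from math import comb

random.seed(1)  # fixed seed so the run is reproducible

def binom_draw(n, p):
    """One b(n, p) observation as a sum of n Bernoulli draws."""
    return sum(random.random() < p for _ in range(n))

records = [binom_draw(10, 0.8) for _ in range(1000)]  # 1000 pots
counts = Counter(records)

# Observed relative frequencies should be close to the b(10, 0.8) pmf
for x in range(11):
    pmf = comb(10, x) * 0.8**x * 0.2**(10 - x)
    print(x, counts.get(x, 0) / 1000, round(pmf, 3))
```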
Example 20: Comparison of binomial and hypergeometric distributions.
Suppose among 200 goblets in a box 20 have defects.
1 Randomly take 30 goblets from the box with replacement. Let X
be the number of defective goblets selected. It is easy to see that
X ∼ b(n = 30, p = 20/200 = 0.1).
2 Randomly take 30 goblets from the box without replacement. Let
Y be the number of defective goblets selected. Then
Y ∼ Hyper(N1 = 20, N2 = 180, n = 30).
Example 20 (cont.)
We have learned that both distributions have the same mean:
E(X) = np = 3 and E(Y) = nN1/N = 3.
A comparison of the pmf’s of X and Y is given below:
It can be shown that when n and p = N1/N are fixed but N tends
to infinity, the hypergeometric distribution becomes very close to the
corresponding binomial distribution.
5. The moment-generating function
Mean, variance and standard deviation are important characteristics
of a distribution.
But it can be difficult to calculate E(X) and Var(X) directly, e.g.
when X is binomial.
Here we introduce a function of t, called the moment-generating
function, which helps to generate the moments, including the mean
and variance, of a distribution.
Definition 4
Let X be a discrete random variable with pmf f(x) and range (or space)
S. If there is a positive number h such that
E(e^{tX}) = ∑_{x∈S} e^{tx} f(x) is finite for t = ±h
(and hence for −h < t < h), then the function of t defined by
M(t) := E(e^{tX}) (or MX(t) := E(e^{tX}))
is called the moment-generating function (mgf) of X (or of the
distribution of X).
Example 21. Consider a random variable X with the following pmf:
x b1 b2 b3 . . .
f(x) = P(X = x) f(b1) f(b2) f(b3) . . .
The mgf of X is M(t) = ∑_i e^{t bi} f(bi) = f(b1)e^{t b1} + f(b2)e^{t b2} + f(b3)e^{t b3} + . . .
When t = 0, M(0) = ∑_i f(bi) = 1.
Example 22. If X has the mgf M(t) =
Examples 21 and 22 show that the mgf can be derived from the
pmf, and vice versa.
The pmf uniquely determines the mgf, and it has been proved that the
mgf also uniquely determines the pmf.
That is, X and Y have the same pmf, fX(x) = fY(x), if and only if
they have the same mgf, MX(t) = MY(t).
We see that the mgf, like the pmf, provides another tool for
describing the distribution of a r.v.
However, note that fX(x) = fY(x) or MX(t) = MY(t) does not
imply X = Y.
Another issue is that the mgf may not exist for some r.v.’s, while the
pmf always exists (for discrete r.v.’s, of course).
Example 23. Suppose the mgf of X is M(t) = (e^t/2) / (1 − e^t/2), t < ln(2).
We show how Taylor’s expansion can help to find the pmf of X.
This mgf does not have the form given in Examples 21 and 22,
which allowed us to find the pmf easily.
Note the Maclaurin (or Taylor) series expansion of (1 − z)^{−1} is
(1 − z)^{−1} = 1 + z + z² + z³ + . . . , −1 < z < 1.
Therefore,
M(t) = (e^t/2)(1 − e^t/2)^{−1} = e^t/2 + (e^t/2)² + (e^t/2)³ + . . . = ∑_{x=1}^{∞} (1/2)^x e^{tx},
when e^t/2 < 1 and thus t < ln(2).
From the above expansion, P(X = x) = (1/2)^x.
So the pmf of X is f(x) = (1/2)^x, x = 1, 2, 3, . . .
Now we proceed to see how the mgf and moments are related.
Differentiating M(t) = ∑_{x∈S} e^{tx} f(x) term by term r times gives
M^{(r)}(t) = ∑_{x∈S} x^r e^{tx} f(x), so that
M^{(r)}(0) = ∑_{x∈S} x^r f(x) = E(X^r).
In particular,
μ = M′(0) and σ² = M″(0) − [M′(0)]².
In order to make use of the above technique to find the moments of
X, the mgf M(t) needs to have a closed form rather than the
expansion form.
Example 24. The pmf of the binomial distribution is known to be
f(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x} = [n!/(x!(n − x)!)] p^x (1 − p)^{n−x},
for x = 0, 1, 2, . . . , n.
Thus the corresponding mgf is
M(t) = E(e^{tX}) = ∑_{x=0}^{n} C(n, x) (pe^t)^x (1 − p)^{n−x} = [(1 − p) + pe^t]^n,
from the binomial expansion of (a + b)^n with a = 1 − p and b = pe^t.
Example 24 (cont.).
The first two derivatives of M(t) are
M′(t) = n[(1 − p) + pe^t]^{n−1}(pe^t)
M″(t) = n(n − 1)[(1 − p) + pe^t]^{n−2}(pe^t)² + n[(1 − p) + pe^t]^{n−1}(pe^t)
So
μ = E(X) = M′(0) = np,
E(X²) = M″(0) = n(n − 1)p² + np, and
σ² = Var(X) = E(X²) − [E(X)]²
= n(n − 1)p² + np − (np)²
= np(1 − p).
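The derivative computations can be sanity-checked with finite differences (the parameter values below are illustrative):

```python
from math import exp

n, p = 10, 0.3   # illustrative binomial parameters

def M(t):
    """Binomial mgf M(t) = ((1-p) + p e^t)^n."""
    return ((1 - p) + p * exp(t)) ** n

h = 1e-5
M1 = (M(h) - M(-h)) / (2 * h)           # ~ M'(0)  = np             = 3
M2 = (M(h) - 2 * M(0) + M(-h)) / h**2   # ~ M''(0) = n(n-1)p^2 + np = 11.1
print(round(M1, 4), round(M2, 3))
```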
Bernoulli distribution
When n = 1, the binomial distribution becomes the Bernoulli
distribution, with mgf M(t) = (1 − p) + pe^t.
It is easy to see M′(t) = M″(t) = M^{(3)}(t) = . . . = pe^t.
So E(X) = E(X²) = E(X^k) = p for any k = 1, 2, 3, . . . for the
Bernoulli distribution.
Negative binomial distribution:
Consider observing a sequence of i.i.d. Bernoulli trials until exactly r
successes occur.
Let the r.v. X be the number of trials needed to obtain r successes,
i.e. X is the trial number on which the r-th success is observed.
Writing q = 1 − p, it can be seen that
P(X = x)
= P(r − 1 successes in the first x − 1 trials)
× P(success in the x-th trial)
= C(x − 1, r − 1) p^{r−1} q^{x−r} × p.
Thus the pmf of X is
f(x) = C(x − 1, r − 1) p^r (1 − p)^{x−r} = C(x − 1, r − 1) p^r q^{x−r}, x = r, r + 1, r + 2, . . .
We say that X has a negative binomial distribution, i.e.
X ∼ NB(r, p).
The reason it is called the negative binomial is that the pmf is
similar to each term in the Maclaurin series expansion of the
binomial function 1 − w raised to the negative exponent −r, that is,
(1 − w)^{−r} = ∑_{k=0}^{∞} C(r + k − 1, k) w^k, −1 < w < 1.
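The NB(r, p) pmf can be checked to be a proper distribution numerically by truncating the infinite sum (parameter values illustrative):

```python
from math import comb

def nb_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x (x = r, r+1, ...)."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 3, 0.4
# the tail beyond x = 400 is negligible for these parameters
total = sum(nb_pmf(x, r, p) for x in range(r, 400))
print(round(total, 6))              # ~ 1
print(round(nb_pmf(5, 3, 0.4), 5))  # C(4,2) * 0.4^3 * 0.6^2 = 0.13824
```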
Example 27. Probability bar graphs for several negative binomial
distributions with different r and p values are shown below:
Finding the mgf using all moments
We have seen how to obtain the moments from the mgf. We can also find
the mgf using all the moments, based on the following result:
If the Maclaurin series expansion for a mgf M(t) exists, then
M(t) = M(0) + M′(0)t + M″(0)t²/2! + . . . = 1 + ∑_{k=1}^{∞} E(X^k) t^k/k!,
because M(0) = 1, M′(0) = E(X), and M^{(k)}(0) = E(X^k),
k = 1, 2, 3, . . .
Example 28. Suppose the moments of X are given by
E(X^k) = 0.8, k = 1, 2, 3, . . .
Find the mgf of X and further the pmf of X.
M(t) = 1 + ∑_{k=1}^{∞} 0.8 t^k/k! = 1 + 0.8(e^t − 1) = 0.2 + 0.8e^t
= 0.2e^{0t} + 0.8e^{1t}.
Therefore P(X = 0) = 0.2 and P(X = 1) = 0.8. This means that
X has a Bernoulli distribution with p = 0.8.
6. The Poisson distribution
The Poisson (pronounced 'pwa-sohn'; a French surname that also means
'fish') distribution is used to model the number of
occurrences of particular events in a variety of phenomena.
Examples include:
the number of fish caught in a catch;
the number of phone calls arriving at a switchboard between 9 and
10 am;
the number of insurance claims during a year;
the number of car accidents in a city during a day;
the number and pattern of bomb drops over London in World War II;
the number of customers entering a specific shop during one day;
etc.
These counts are random variables that (exactly or approximately)
possess the three properties in the following definition.
Definition 5
Let the number of changes (or events) that occur in a given "continuous
interval" be counted. We have a Poisson process with parameter λ > 0
if the following are satisfied:
(a) The numbers of changes occurring in non-overlapping intervals are
independent.
(b) The probability of exactly one change in a short interval of length h
is approximately λh.
(c) The probability of two or more changes in a short interval of length
h is much smaller than h.
Suppose we have a Poisson process, and let X denote the number of
changes in an interval of unit length.
We proceed to find the value P (X = x), where x is a nonnegative
integer.
We do this by calculating P (X = x) as the limit of binomial
probabilities.
We first partition the unit interval into n subintervals of equal length
1/n.
If n is sufficiently large, the probability that x changes occur in this
unit interval is basically the same as the probability of finding x of
the n subintervals each containing exactly one change and the other
subintervals containing no changes.
By (b) and (c), each subinterval contains either one change or no
change ; the probability of one change in a subinterval is
approximately λ(1/n), and the probability of no change in a
subinterval is approximately 1 − λ(1/n).
Thus, observing occurrence or nonoccurrence of a change in a
subinterval is a Bernoulli trial.
By (a), we have n independent Bernoulli trials here ; the probability
of ‘a change’ in each trial is λ/n.
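The convergence of these binomial probabilities can be illustrated numerically; a sketch with λ = 2 and x = 3 (both arbitrary choices):

```python
from math import comb, exp, factorial

lam, x = 2.0, 3

def binom_pmf(x, n, p):
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Limiting Poisson probability: lam^x e^(-lam) / x!
poisson = lam**x * exp(-lam) / factorial(x)

# b(n, lam/n) probabilities approach the Poisson value as n grows.
for n in (10, 100, 1000, 10000):
    print(n, binom_pmf(x, n, lam / n))
print("limit:", poisson)
```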
Hence,

P(X = x) = \lim_{n\to\infty} \binom{n}{x}\left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots

We say that X has a Poisson distribution with parameter λ, i.e.
X \overset{d}{=} Poi(λ), with pmf f(x) = \frac{\lambda^x e^{-\lambda}}{x!} and mgf

M(t) = e^{\lambda(e^t - 1)}, \quad -\infty < t < \infty.

The first and second derivatives of the mgf, and accordingly the
mean and variance, are:

M'(t) = \lambda e^t e^{\lambda(e^t-1)}; so the mean of X is
\mu = E(X) = M'(0) = \lambda.

M''(t) = (\lambda e^t)^2 e^{\lambda(e^t-1)} + \lambda e^t e^{\lambda(e^t-1)}; so
E(X^2) = M''(0) = \lambda^2 + \lambda.

So the variance of X is \sigma^2 = \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda.
That shows that the parameter λ of a Poisson process can be
interpreted as the mean number of occurrences in a unit length
interval.
We often call λ the rate of occurrence (or intensity) parameter.
That is, if Y is the number of occurrences in an interval of length T
with rate of occurrence λ, then Y \overset{d}{=} Poi(λT) with
E(Y) = Var(Y) = λT.
Both the mean and the variance of a Poi(λ) distribution equal λ.
Determining a value for λ is the key step in calculating Poisson
probabilities.
Example 29. Let X be the number of requests for assistance
received by a towing service during the peak-hour period (7 am to 9 am).
Suppose the average number of calls is 50 per hour. That is,
E(X) = λ = 2 × 50 = 100 calls, so X \overset{d}{=} Poi(100).
1 What is the probability that 120 calls will be received during the
peak-hour period?
2 Find the probability that at most 108 calls will be received during
the 7 am to 9 am period.
3 What is the probability that no calls will be received during a 5-minute
break in this peak-hour period?
Let Y denote the number of calls received during the break.
Then Y \overset{d}{=} Poi(50 × 5/60) = Poi(25/6), and
P(Y = 0) = e^{-25/6} ≈ 0.0155.
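As a sketch of the computations (not the lecture's worked solution; the pmf is evaluated on the log scale because 100^120/120! overflows naive arithmetic):

```python
from math import exp, lgamma, log

lam = 100.0  # X ~ Poi(100) over the 7 am to 9 am period

def poisson_pmf(x, lam):
    """Poisson pmf evaluated on the log scale to avoid overflow."""
    return exp(x * log(lam) - lam - lgamma(x + 1))

p1 = poisson_pmf(120, lam)                         # 1: P(X = 120)
p2 = sum(poisson_pmf(x, lam) for x in range(109))  # 2: P(X <= 108)
lam_break = 50 * 5 / 60                            # 5-minute break: Poi(25/6)
p3 = exp(-lam_break)                               # 3: P(Y = 0)
print(p1, p2, p3)
```

Equivalently, dpois(120, 100), ppois(108, 100) and dpois(0, 50*5/60) in R.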
R can be used to compute Poisson probabilities.
The R functions for the pmf, the cdf and random number
generation of the Poisson distribution are:
dpois(x, lambda)
ppois(q, lambda)
rpois(n, lambda)
Type 'help(dpois)' in R for more information.
Example 30. Suppose there are 300 misprints in a 500-page book. A
misprint is equally likely to occur on any page, and each page can contain
zero, one or more misprints.
What is the probability that 3 misprints are found on a specified page?
The rate of occurrence parameter is λ = 300/500 = 0.6, i.e. 0.6 misprints
per page.
Let X be the number of misprints on a specified page. Then
X \overset{d}{=} Poi(0.6).
Therefore, P(X = 3) = \frac{0.6^3 e^{-0.6}}{3!} = 0.0198.
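A quick check of this value (equivalent to dpois(3, 0.6) in R):

```python
from math import exp, factorial

p = 0.6**3 * exp(-0.6) / factorial(3)
print(round(p, 4))  # → 0.0198
```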
Example 31. To see the effect of λ on the pmf of a Poisson distribution,
we plot here probability bar graphs of the pmf f(x) for 6
different values of λ.
Approximating binomial by Poisson
From the derivation at the beginning of this section, we see that a
Poisson distribution can be used to approximate probabilities for a
binomial distribution when n is large and p is small.
Namely, the binomial probability from b(n, p) can be approximated by the
corresponding Poisson probability from Poi(np).
Example 32. A manufacturer of Christmas tree light bulbs knows that
2% of its bulbs are defective. Let X be the number of defective bulbs in
a box of 100 of these bulbs. Assuming independence among defectives,
X \overset{d}{=} b(100, 0.02), which is approximately Poi(2). For instance,
the probability of at most 3 defective bulbs in a box is
P(X ≤ 3) ≈ \sum_{x=0}^{3} \frac{2^x e^{-2}}{x!} = 0.857, while the exact
binomial value is 0.859.
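The binomial probability and its Poisson approximation can be compared numerically; a sketch, assuming the quantity of interest is P(X ≤ 3) (consistent with the 0.859 figure on this slide):

```python
from math import comb, exp, factorial

n, p = 100, 0.02
lam = n * p  # = 2

# Exact binomial P(X <= 3) versus the Poi(2) approximation.
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(4))
approx = sum(lam**x * exp(-lam) / factorial(x) for x in range(4))
print(round(exact, 3), round(approx, 3))  # → 0.859 0.857
```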
Plots of the pmfs of the binomial b(100, 0.02) and Poisson Poi(2)
distributions are given below: