Lecture 1

Didn't have the time to go back and fill in notes for this lecture - if you feel like contributing you're welcome to!

Lecture 2

The Counting Principle: Consider a process of \(r\) stages. Suppose that:

  1. There are \(n_1\) possible results at the first stage.
  2. For every possible result at the first stage, there are \(n_2\) possible results at the second stage.
  3. More generally, for any sequence of possible results at the first \(i - 1\) stages, there are \(n_i\) possible results at the \(i^{th}\) stage.
  • Then, the total number of possible results of the \(r\)-stage process is \[ n_1n_2 \cdots n_r \]

Big Idea: When a process consists of a sequence of stages, the total number of outcomes is found by multiplying the number of possible choices at each stage.

Example: A pixel in a digital image is represented by 8 bits. How many encodings are possible for one pixel?

\[ 2^8 = 256 \]
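
As a quick sanity check, here is a minimal Python sketch (my own, not from the lecture) that enumerates every 8-bit encoding and confirms the counting-principle answer:

```python
from itertools import product

# An 8-bit pixel is an 8-stage process with 2 choices (0 or 1) at each stage.
encodings = list(product([0, 1], repeat=8))
print(len(encodings))  # 256, matching 2**8
```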

Mutually Exclusive Counting: Suppose that \(X_1, \dots, X_t\) are sets and that the \(i^{th}\) set \(X_i\) has \(n_i\) elements. If \(\{X_1, \dots, X_t\}\) is a pairwise disjoint family, the number of possible elements that can be selected from \(X_1\) or \(X_2\) or \( \dots \) or \(X_t\) is \[ n_1 + n_2 + \dots + n_t \]

Big Idea: Because the set family is pairwise disjoint, we can think of these sets like bubbles on a Venn Diagram that don't overlap. Counting total elements across these sets is the same as just adding the number of elements from each.

Example: A six person committee Alice, Ben, Connie, Dolph, Egbert, and Francisco is to select a chairperson, secretary, and treasurer.
  1. In how many ways can this be done? \[ 6\cdot5\cdot4=120 \]
  2. In how many ways can this be done if Alice or Ben must be chairman? \[ 5\cdot4 + 5\cdot 4 = 40 \]
  3. How many ways if Egbert holds one of the offices? \[ \begin{align} 5\cdot4 + 5\cdot4 + 5\cdot4 &= 60\\ 3(5\cdot4) &= 60 \end{align} \]
  4. How many ways if both Dolph and Francisco each hold one of the offices? \[ 4\cdot3\cdot2=24 \]
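
The committee counts above are small enough to verify by brute force. A minimal Python check of my own, enumerating every (chair, secretary, treasurer) assignment:

```python
from itertools import permutations

people = ["Alice", "Ben", "Connie", "Dolph", "Egbert", "Francisco"]
# Each ordered 3-permutation is one (chair, secretary, treasurer) assignment.
slates = list(permutations(people, 3))

print(len(slates))                                             # 120
print(sum(s[0] in ("Alice", "Ben") for s in slates))           # 40
print(sum("Egbert" in s for s in slates))                      # 60
print(sum("Dolph" in s and "Francisco" in s for s in slates))  # 24
```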

Inclusion-Exclusion Principle for Two Sets: \[ |X \cup Y| = |X| + |Y| - |X \cap Y| \]

Big Idea: If two sets intersect, adding the cardinality of both individually would double-count the overlap so it's subtracted away.

Permutation: A permutation of \(n\) distinct elements \(x_1, \dots, x_n\) is an ordering of the \(n\) elements \(x_1, \dots, x_n\). There are \(n!\) permutations of \(n\) elements.

Example: ABCDEF Permutations
  1. How many permutations of the letters ABCDEF are there with the substring DEF?
    • Consider DEF to be a single element: \[ 4! = 24 \]
  2. How many permutations of the letters ABCDEF are there with DEF together in any order?
    • Number of ways to order DEF is \(3!\). The number of permutations with DEF together is \[ 3!4! = 144 \]
  3. How many ways can you seat 6 people around a round table? (if everybody moves \(n\) seats to the left or right, that is considered the same seating) \[ 5! \]

R-Permutation: An r-permutation of \(n\) distinct elements \(x_1, \dots, x_n\) is an ordering of an r-element subset of \(x_1, \dots, x_n\). The number of r-permutations of n distinct elements is \[ P(n, r) = \frac{n!}{(n-r)!} \]

Example: How many ways can we select a chairperson, vice-chairperson, secretary, and treasurer from a group of 10 persons?

\[ P(10, 4) = 10 \cdot 9 \cdot 8 \cdot 7 \]

Example: In how many ways can seven distinct Martians and five distinct Neptunians wait in line if no two Neptunians stand together?
  • The Martians can stand in any order, so there are \(7!\) options for them. Because no two Neptunians may stand together, each Neptunian must occupy a different gap around the Martians; the 7 Martians create 8 such positions (6 gaps between them plus the two ends). This means there are \(P(8, 5)\) possible placements of the Neptunians.

\[ 7! \cdot P(8, 5) = 33,868,800 \]
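
Below is a hedged Python sketch of my own: it evaluates the formula directly and also brute-forces a smaller instance (3 Martians, 2 Neptunians) to check the gap argument:

```python
from itertools import permutations
from math import factorial, perm

# Direct evaluation of the formula from the notes: 7! * P(8, 5).
print(factorial(7) * perm(8, 5))  # 33868800

# Brute-force check of the gap argument on a smaller case:
# 3 Martians and 2 Neptunians with no two Neptunians adjacent.
people = ["M1", "M2", "M3", "N1", "N2"]

def no_adjacent_neptunians(line):
    return all(not (a.startswith("N") and b.startswith("N"))
               for a, b in zip(line, line[1:]))

brute = sum(no_adjacent_neptunians(line) for line in permutations(people))
print(brute, factorial(3) * perm(4, 2))  # both print 72
```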

Combination: An r-combination of \(X\) is an unordered selection of r elements of \(X\). The number of r-combinations of \(n\) distinct objects is \[ C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{(n-r)!r!} \]

  • Notation: \(C(n, r) = \binom{n}{r} \)
Example: In how many ways can we select a committee of two women and three men from a group of 5 distinct women and 6 distinct men?

\[ C(5, 2) \cdot C(6, 3) = 200 \]

Example: How many 8-bit strings contain exactly 4 ones?

\[ C(8, 4) = 70 \]
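
A short Python check (my own, not from the lecture) of the two combination examples, using math.comb and a brute-force count of the bit strings:

```python
from itertools import combinations
from math import comb

print(comb(5, 2) * comb(6, 3))  # 200 committees of 2 women and 3 men
print(comb(8, 4))               # 70

# Brute force: an 8-bit string with exactly four 1s is determined by
# choosing the positions of the 1s.
print(len(list(combinations(range(8), 4))))  # 70
```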

Lecture 3

Sample Space & Events

Consider an experiment whose outcome is not predictable with certainty. The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by \(\Omega\).

Example: What is the sample space if the outcome of an experiment consists of the determination of the sex of a newborn child? What about the order of finish in a race among 7 horses numbered 1-7?

\[ \Omega = \{ g, b \}\\ \Omega = \{ \text{All } 7! \text{ permutations of } (1, 2, 3, 4, 5, 6, 7) \} \]

Any subset \(A\) of the sample space is known as an event.

The event \(A \cup B\) occurs if either \(A\) or \(B\) occurs.

For any two events \(A\) and \(B\), we define the event \(AB\) as the intersection of \(A\) and \(B\), or \(A \cap B\).

The event consisting of no outcomes is denoted as \(\emptyset\).

If \(AB = \emptyset\), then \(A\) and \(B\) are said to be mutually exclusive.

We define the event which consists of all outcomes in \(A_n\) for at least one value of \(n = 1, 2, \dots\) as: \[ A = \bigcup_{n=1}^\infty A_n \]

We define the event which consists of those outcomes in all events \(A_n\) for \(n=1, 2, \dots\) as: \[ A = \bigcap_{n=1}^\infty A_n \]

We define \(A^c\) as the complement of \(A\). It consists of all outcomes in \(\Omega\) that are not in \(A\).

If all outcomes in \(A\) are also in \(B\), then we write \(A \subset B\).

If \(A\) contains exactly the same outcomes as \(B\), then \(A = B\).

Set Operations

\[ \left(\bigcup_{i=1}^n A_i\right)^c = \bigcap_{i=1}^n A_i^c \] In other words, the complement of the union of all \(A_i\) is the intersection of all of the complements of each \(A_i\).

\[ \left(\bigcap_{i=1}^n A_i\right)^c = \bigcup_{i=1}^n A_i^c \] In other words, the complement of the intersection of all \(A_i\) is the union of all of the complements of each \(A_i\).

Axioms of Probability

We can define the probability of an event occurring by its relative frequency in \(n\) repetitions of an experiment as \(n \to \infty\).

Define \(P(A)\) as the probability that event \(A\) occurs and \(n_A\) as the number of times in the first \(n\) repetitions of an experiment that the event \(A\) occurs.

\[P(A) = \lim_{n\to\infty}{\frac{n_A}{n}} \] If all outcomes are equally likely, \[P(A) = \frac{\text{number of elements in } A}{\text{number of possible outcomes}}\]

Nonnegativity: \(P(A) \geq 0\) for every event \(A\).

Additivity: if \(A\) and \(B\) are two disjoint events, then the probability of their union satisfies \[ P(A \cup B) = P(A) + P(B) \]

For a countable sequence of disjoint events \(A_1, A_2, \dots\), this generalizes to \[ P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^{\infty} P\left(A_i\right) \]

In other words, the probability of the union of countably many disjoint events is the sum of their individual probabilities.

Normalization: The probability of the entire sample space \(\Omega\) is equal to \(1\).

\[ P(\Omega) = 1 \]

Propositions

  1. \(P(A^c)=1-P(A)\) for every event \(A\).
  2. If \(A \subset B\), then \(P(A) \leq P(B)\).
  3. \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)

*Note: proposition 3 can be generalized to any number of events.

Lecture 4

Conditional Probability

Conditional Probability: The conditional probability that \(A\) has occurred given that \(B\) has occurred with \(P(B) > 0\) is defined as \[ P(A|B) = \frac{P(A\cap B)}{P(B)} \]

Rather than thinking about \(P(A|B)\) within universe \(\Omega\), think of \(B\) as replacing \(\Omega\).

\(A\) can be thought of as an event that occurs within the universe \(B\).

If the possible outcomes are finitely many and equally likely, then \[ P(A|B) = \frac{\text{number of elements in }A\cap B}{\text{number of elements in }B} \]

Example: A bin contains 25 light bulbs, 5 of which are in good condition (last at least a month), 10 of which are partially defective (works initially for 2 days), and 10 which are totally defective. Given that a bulb is lit initially for 2 days, what is the probability that it will still be working after a week?
  • Because the bulb works initially for 2 days, we know it's not totally defective. Call the set of totally defective bulbs \(D\); we are conditioning on its complement \(D^c\). \(P(D^c)=\frac{15}{25}\) because \(\frac{10}{25}\) of the bulbs are totally defective.

  • We're looking for the probability that a bulb is perfectly good (\(G\)) given that it's not totally defective. The intersection of the non-defective bulbs and the perfectly good bulbs is \(\frac{5}{25}\) - in other words, 5 of the remaining bulbs are in good condition.

\[ P(G|D^c) = \frac{P(G\cap D^c)}{P(D^c)} = \frac{\frac{5}{25}}{\frac{15}{25}}=\frac{1}{3} \]
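
The conditional probability can also be checked by direct counting. A minimal Python sketch of my own (the labels are assumptions for illustration):

```python
# 5 good, 10 partially defective, 10 totally defective bulbs.
bulbs = ["good"] * 5 + ["partial"] * 10 + ["dead"] * 10

# Conditioning on D^c: the bulb is not totally defective.
not_dead = [b for b in bulbs if b != "dead"]
print(sum(b == "good" for b in not_dead) / len(not_dead))  # 0.333... = 1/3
```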

The Law of Total Probability

Let \(A\) and \(B\) be events. We may express \(A\) as \[ A = (A \cap B) \cup (A \cap B^c) \]

We can also say that \[ \begin{align} P(A) &= P(AB) + P(AB^c)\\ &=P(A|B)P(B) + P(A|B^c)P(B^c) \end{align} \]

If the sample space \(\Omega\) can be partitioned into \(n\) mutually exclusive events \(B_1, B_2, \dots, B_n\), then \[ \begin{align} P(A) &= P(A \cap B_1) + P(A \cap B_2) + \dots + P(A \cap B_n)\\ &=P(A|B_1)P(B_1) + \dots + P(A|B_n)P(B_n) \end{align} \]

Bayes' Formula

We can find the conditional probability \(P(A_i|B)\) in terms of \(P(B|A_i)\) using the law of total probability and Bayes' rule.

Let \(A_1, A_2, \dots, A_n\) be disjoint sets that form a partition of the sample space, and assume that \(P(A_i) > 0\) for all \(i\). Then, for any event \(B\) with \(P(B) > 0\), we have \[ \begin{align} P(A_i|B) &= \frac{P(A_i)P(B|A_i)}{P(B)}\\ &=\frac{P(A_i)P(B|A_i)}{P(A_1)P(B|A_1) + \dots + P(A_n)P(B|A_n)} \end{align} \]

Example: A bowl contains three apples and four oranges. Bob walks by and randomly selects a fruit. Steve walks by 10 minutes later and also randomly selects a fruit from the bowl.

Let \(B_a\) be the event that Bob chooses an apple, \(B_o\) the event that Bob chooses an orange, \(S_a\) the event that Steve chooses an apple, and \(S_o\) the event that Steve chooses an orange.

What is the probability \(P(S_a)\) that Steve selects an apple? Use the law of total probability.

\[ \begin{align} P(B_a) &= \tfrac{3}{7}\\ P(B_o) &= \tfrac{4}{7}\\ P(S_a | B_a) &= \tfrac{2}{6} = \tfrac{1}{3}\\ P(S_a | B_o) &= \tfrac{3}{6} = \tfrac{1}{2}\\ P(S_a) &= P(S_a | B_a)P(B_a) + P(S_a | B_o)P(B_o)\\ &= \tfrac{1}{3}\cdot\tfrac{3}{7} + \tfrac{1}{2}\cdot\tfrac{4}{7} = \tfrac{3}{7} \end{align} \]

If Steve selected an orange, what's the probability that Bob had also selected an orange?

\[ P(B_o | S_o) = \frac{P(S_o|B_o)P(B_o)}{P(S_o|B_o)P(B_o) + P(S_o|B_a)P(B_a)} = \frac{\left(\tfrac{1}{2}\right)\left(\tfrac{4}{7}\right)}{\left(\tfrac{1}{2}\right)\left(\tfrac{4}{7}\right) + \left(\tfrac{2}{3}\right)\left(\tfrac{3}{7}\right)} = \frac{1}{2} \]
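
A small exact-arithmetic check of both answers (my own sketch, using Python's fractions module):

```python
from fractions import Fraction as F

# Bob's pick from 3 apples and 4 oranges.
P_Ba, P_Bo = F(3, 7), F(4, 7)
# Steve's pick given what Bob took (6 fruits remain in the bowl).
P_Sa_Ba, P_Sa_Bo = F(2, 6), F(3, 6)
P_So_Ba, P_So_Bo = F(4, 6), F(3, 6)

# Law of total probability, then Bayes' formula.
P_Sa = P_Sa_Ba * P_Ba + P_Sa_Bo * P_Bo
P_So = 1 - P_Sa
P_Bo_So = P_So_Bo * P_Bo / P_So
print(P_Sa, P_Bo_So)  # 3/7 1/2
```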

Independent Events: Events \(A\) and \(B\) are said to be independent if

\[ \begin{align} P(A | B) &= \tfrac{P(AB)}{P(B)} = P(A)\text{, or}\\ P(AB)&=P(A)P(B) \end{align} \]

Lecture 5 - Random Variables

Random Variables

A random variable is a real-valued function of the outcome of an experiment.

A function of a random variable defines another random variable.

We can associate with each random variable certain "averages" of interest, such as the mean and variance.

A random variable can be conditioned on an event or on another random variable.

There is a notion of independence of a random variable from an event or from another random variable.

Discrete Random Variables

A random variable is called discrete if its range (the set of values that it can take) is either finite or countably infinite.

A discrete random variable is a real-valued function of the outcome of an experiment that can take a finite or countably infinite number of values.

A discrete random variable has an associated probability mass function (PMF), which gives the probability of each numerical value that the random variable can take.

A function of a discrete random variable defines another discrete random variable, whose PMF can be obtained from the PMF of the original random variable.

Probability Mass Functions

For a discrete random variable \(X\), we denote the PMF of \(X\) as \(p_X\).

If \(x\) is any real number, the probability mass of \(x\), denoted \(p_X(x)\), is the probability of the event \(\{ X=x \}\) consisting of all outcomes that give rise to a value of \(X\) equal to \(x\): \[ p_X(x) = P(\{ X = x \}) \]

For each possible value \(x\) of \(X\),

  1. Collect all the possible outcomes that give rise to the event \(\{ X = x \}\).
  2. Add their probabilities to obtain \(p_X(x)\).
  3. Note that \[ \sum_x{p_X(x)}=1 \]
  4. For any set \(S\) of possible values of \(X\), we have \[ P(X \in S) = \sum_{x \in S}{p_X(x)} \]

For example, consider two tosses of a fair coin and let \(X\) be the number of heads. \[ p_X(x) = \begin{cases} \frac{1}{4} \text{ if } x \in \{ 0, 2 \}\\ \frac{1}{2} \text{ if } x = 1\\ 0 \text{ otherwise} \end{cases} \]

If \(X\) is the number of heads obtained on two independent tosses of a fair coin, the probability of at least one head is \[ P(X > 0) = \sum_{x=1}^{2}p_X(x) = \frac{1}{2}+\frac{1}{4}=\frac{3}{4} \]

The Bernoulli Random Variable

Suppose that a trial/experiment has an outcome that can be classified as either a success or a failure.

Let \(X\) be \(1\) if the outcome is a success and let \(X\) be \(0\) if the outcome is a failure. The PMF is: \[ \begin{align} p(0) &= P[X=0]=1-p\\ p(1) &= P[X=1]=p \end{align} \]

The Binomial Random Variable

Repeat independent Bernoulli trials, each with success probability \(p\), a total of \(n\) times, and let \(X\) be the number of successes.

\(X\) is said to be binomial with parameters \((n, p)\).

The Bernoulli RV can be considered binomial \((1, p)\).

We use the binomial distribution to find \(k\) successes in \(n\) independent trials.

\[ p(k \text{ successes in } n \text{ trials}) = \binom{n}{k}p^k\left(1-p\right)^{n-k} \]

Example: Flip a fair coin 10 times. What is the probability that 6 heads appear?

\[ \begin{align} p(6\text{ successes in } 10 \text{ trials}) &= \binom{10}{6}\left(\tfrac{1}{2}\right)^6\left(\tfrac{1}{2}\right)^4\\ &=\binom{10}{6}\left(\tfrac{1}{2}\right)^{10} \end{align} \]
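
Evaluating the binomial probability numerically, in a quick sketch of my own:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(6, 10, 0.5))  # 0.205078125
```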

The Poisson Random Variable

Suppose the probability mass function of a random variable \(X\) is given by \[ p_X(k) = \frac{c\lambda^k}{k!}, \quad k = 0, 1, 2, \dots, \text{ where } \lambda > 0 \]

Knowing that all probabilities sum to 1, we must have \[ c\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}=1 \]

Using the fact that the above series is the Taylor series expansion of \(e^\lambda\), we know that \(c=e^{-\lambda}\).

Suppose we want to know \(P(X=0)\) and \(P(X>2)\). \[ \begin{align} P(X=0) &= e^{-\lambda}\\ P(X>2) &= 1-P(X=0)-P(X=1)-P(X=2)\\ &=1-e^{-\lambda}-\lambda e^{-\lambda}-\frac{\lambda^2e^{-\lambda}}{2} \end{align} \]

We'll see that the Poisson RV is closely related to the Exponential RV. The parameter \(\lambda\) in the exponential RV is a rate (mean rate), whereas the parameter \(\lambda\) in the Poisson is typically a number (mean number). To alleviate the confusion, the Poisson RV parameter is frequently referred to as \(\alpha\). \[ p_X(k) = e^{-\alpha}\frac{\alpha^k}{k!}, \quad k = 0, 1, 2, \dots \]
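
A short sketch of my own that evaluates the Poisson PMF and checks the quantities above (the mean alpha = 2 is an illustrative assumption, not from the lecture):

```python
from math import exp, factorial

def poisson_pmf(k, alpha):
    """P(X = k) for a Poisson random variable with mean alpha."""
    return exp(-alpha) * alpha ** k / factorial(k)

alpha = 2.0  # illustrative value only
print(poisson_pmf(0, alpha))                             # e^{-alpha}
print(1 - sum(poisson_pmf(k, alpha) for k in range(3)))  # P(X > 2)
print(sum(poisson_pmf(k, alpha) for k in range(100)))    # ~1.0 (normalization)
```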

The Geometric Random Variable

The number of Bernoulli trials \(k\) required until the first success is geometrically distributed. \[ p_X(k) = (1-p)^{k-1}p, \quad k=1, 2, \dots \]

Example:

If the probability of a station transmitting a packet successfully is \(\frac{1}{3}\), what is the probability that 2 transmissions are needed?

\[ p_X(2) = \left(1-\tfrac{1}{3}\right)\left(\tfrac{1}{3}\right) = \tfrac{2}{9} \]

Modified Geometric Random Variable

Suppose that we wanted the number of failures up until a success.

This is the modified geometric RV.

\[ p_X(k) = (1-p)^kp, \quad k=0, 1, 2, \dots \]

Example:

If the probability of a station transmitting a packet successfully is \(\tfrac{1}{3}\), what is the probability that there were 4 failures before a success?

\[ \left(\tfrac{2}{3}\right)^4\left(\tfrac{1}{3}\right) = \tfrac{16}{243} \]
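
Both PMFs are easy to evaluate directly; here is a minimal sketch of my own checking the two packet-transmission answers:

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k), k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def modified_geometric_pmf(k, p):
    """P(k failures occur before the first success), k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

p = 1 / 3
print(geometric_pmf(2, p))           # 2/9 ~ 0.2222
print(modified_geometric_pmf(4, p))  # 16/243 ~ 0.0658
```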

Functions of Random Variables

Given a random variable \(X\), one may generate other random variables by applying various transformations on \(X\).

Let the random variable \(X\) be the temperature in degrees Celsius and consider the transformation \(Y = 1.8X + 32\), which gives the temperature in degrees Fahrenheit.

In this example, \(Y\) is a linear function of \(X\) of the form \[ Y = g(X) = aX + b \] where \(a\) and \(b\) are scalars.

We may also consider nonlinear functions of the general form \(Y = g(X)\).

To obtain \(p_Y(y)\) for any \(y\), we add the probabilities of all values of \(x\) such that \(g(x) = y\): \[ p_Y(y) = \sum_{\{ x|g(x)=y \}}{p_X(x)} \]

Let \(Y = |X|\) and let us apply the preceding formula for the PMF to the case where \[ p_X(x) = \begin{cases} \frac{1}{9} \text{ if } x \text{ is an integer in } [-4, 4]\\ 0 \text{ otherwise} \end{cases} \]

The possible values of \(Y\) are \(y=0, 1, 2, 3, 4\).

We have \(p_Y(0) = p_X(0) = \frac{1}{9}\).

Note that two values of \(X\) correspond to each \(y=1, 2, 3, 4\). For example: \[ p_Y(1) = p_X(-1) + p_X(1) = \frac{2}{9} \]

Thus, the PMF of Y is \[ p_Y(y) = \begin{cases} \frac{2}{9} \text{ if } y=1, 2, 3, 4\\ \frac{1}{9} \text{ if } y=0\\ 0 \text{ otherwise} \end{cases} \]

For another related example, let \(Z = X^2\).

To obtain the PMF of \(Z\), we can view it either as the square of the random variable \(X\) or as the square of the random variable \(Y\).

If we let \(p_Z(z) = \sum_{\{ y|y^2=z \}}{p_Y(y)}\), we obtain \[ p_Z(z) = \begin{cases} \frac{2}{9} \text{ if } z=1, 4, 9, 16\\ \frac{1}{9} \text{ if } z=0\\ 0 \text{ otherwise} \end{cases} \]
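
The PMF-of-a-function rule is mechanical to apply in code. A small exact-arithmetic sketch of my own for \(Y = |X|\) and \(Z = X^2\):

```python
from fractions import Fraction as F

# X is uniform on the integers -4, ..., 4.
p_X = {x: F(1, 9) for x in range(-4, 5)}

def pmf_of_function(pmf, g):
    """PMF of g(X): sum pmf(x) over all x with g(x) = y."""
    out = {}
    for x, px in pmf.items():
        out[g(x)] = out.get(g(x), F(0)) + px
    return out

p_Y = pmf_of_function(p_X, abs)              # 1/9 at 0, 2/9 at 1, 2, 3, 4
p_Z = pmf_of_function(p_X, lambda x: x * x)  # 1/9 at 0, 2/9 at 1, 4, 9, 16
print(p_Y[0], p_Y[1], p_Z[16])               # 1/9 2/9 2/9
```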

Expectation, Mean, and Variance

We define the expected value (also called the expectation or the mean) of a random variable \(X\) with PMF \(p_X(x)\) by \[ E[X] = \sum_{x}{xp_X(x)} \]

Consider two independent coin tosses, each with a \(\tfrac{3}{4}\) probability of a head, and let \(X\) be the number of heads obtained. This is a binomial random variable with parameters \(n = 2\) and \(p = \tfrac{3}{4}\). Its PMF is \[ p_X(k) = \begin{cases} \begin{align} &\left(\tfrac{1}{4}\right)^2 &\text{if } k=0\\ &2\cdot\left(\tfrac{1}{4}\right)\cdot\left(\tfrac{3}{4}\right) &\text{if } k=1\\ &\left(\tfrac{3}{4}\right)^2 &\text{if } k=2\\ &0 &\text{otherwise}\\ \end{align} \end{cases} \]

So, the mean is

\[ \begin{align} E[X] &= 0 \cdot \left(\tfrac{1}{4}\right)^2 + 1 \cdot \left(2 \cdot \tfrac{1}{4} \cdot \tfrac{3}{4}\right) + 2 \cdot \left(\tfrac{3}{4}\right)^2\\ &= \tfrac{24}{16}\\ &= \tfrac{3}{2} \end{align} \]

Moments & Variance

The \(n^{th}\) moment of the random variable \(X\), denoted as \(E[X^n]\) is the expected value of the random variable \(X^n\).

The most important quantity associated with a random variable \(X\) other than the mean is its variance, which is denoted by \(\text{var}(X)\) and is defined as the expected value of the random variable \(\left(X-E[X]\right)^2\).

Note that this means the variance is always nonnegative.

The variance provides a measure of dispersion of \(X\) around its mean. Another measure of dispersion is the standard deviation of \(X\), which is defined as the square root of the variance and is denoted by \(\sigma_X\). \[ \sigma_X = \sqrt{\text{var}(X)} \]

Expected Value Rule for Random Variable Functions

Let \(X\) be a random variable with PMF \(p_X(x)\) and let \(g(X)\) be a real-valued function of \(X\). Then, the expected value of the random variable \(g(X)\) is given by \[ E[g(X)] = \sum_{x}{g(x)p_X(x)} \]

Let \(X\) be a random variable and let \(Y = aX+b\), where \(a\) and \(b\) are given scalars. Then, \(E[Y] = aE[X] + b\) and \(\text{var}(Y) = a^2\text{var}(X)\).

The variance of a random variable in terms of the moments expression is \[ \text{var}(X) = E[X^2] - \left(E[X]\right)^2 \]

Again, consider two independent coin tosses, each with a \(\tfrac{3}{4}\) probability of a head. Let \(X\) be the number of heads obtained. This is a binomial random variable with parameters \(n=2\) and \(p=\tfrac{3}{4}\). Its PMF is \[ p_X(k) = \begin{cases} \begin{align} &\left(\tfrac{1}{4}\right)^2 &\text{if } k=0\\ &2\cdot\left(\tfrac{1}{4}\right)\cdot\left(\tfrac{3}{4}\right) &\text{if } k=1\\ &\left(\tfrac{3}{4}\right)^2 &\text{if } k=2\\ &0 &\text{otherwise}\\ \end{align} \end{cases} \]

The first two moments and variance are \[ \begin{align} E[X] &= 0 \cdot \left(\tfrac{1}{4}\right)^2 + 1 \cdot \left(2 \cdot \tfrac{1}{4} \cdot \tfrac{3}{4}\right) + 2 \cdot \left(\tfrac{3}{4}\right)^2 = \tfrac{24}{16} = \tfrac{3}{2}\\ E[X^2] &= 0 \cdot \left(\tfrac{1}{4}\right)^2 + 1^2 \cdot \left(2 \cdot \tfrac{1}{4} \cdot \tfrac{3}{4}\right) + 2^2 \cdot \left(\tfrac{3}{4}\right)^2 = \tfrac{42}{16} = \tfrac{21}{8}\\ \text{var}(X) &= \tfrac{21}{8} - \left(\tfrac{3}{2}\right)^2 = \tfrac{3}{8} \end{align} \]
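
A quick exact-arithmetic check of the two moments and the variance (my own sketch):

```python
from fractions import Fraction as F

# PMF of the number of heads in two tosses with P(head) = 3/4.
p_X = {0: F(1, 4) ** 2, 1: 2 * F(1, 4) * F(3, 4), 2: F(3, 4) ** 2}

mean = sum(k * p for k, p in p_X.items())
second_moment = sum(k ** 2 * p for k, p in p_X.items())
print(mean, second_moment, second_moment - mean ** 2)  # 3/2 21/8 3/8
```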

Discrete Uniform Random Variable

What are the mean and variance of the roll of a fair six-sided die? Its PMF is \[ p_X(k) = \begin{cases} \tfrac{1}{6} \text{ if } k=1, 2, 3, 4, 5, 6\\ 0 \text{ otherwise} \end{cases} \]

\[ \begin{align} E[X] &= 3.5\\ \text{var}(X) &= E[X^2]-\left(E[X]\right)^2\\ &= \tfrac{1}{6}\left(1^2+2^2+3^2+4^2+5^2+6^2\right) - (3.5)^2\\ &= \tfrac{35}{12} \end{align} \]

The above random variable is a special case of a discrete uniformly distributed random variable, or discrete uniform for short.

The discrete uniform random variable has a PMF of the form \[ p_X(k) = \begin{cases} \tfrac{1}{b-a+1} \text{ if } k = a, a+1, \dots, b\\ 0 \text{ otherwise} \end{cases} \] where \(a, b \in \mathbb{Z}, a < b\).

The mean and variance are \[ \begin{align} E[X] &= \frac{a+b}{2}\\ \text{var}(X) &= E[X^2] - \left(E[X]\right)^2 = \frac{(b-a+1)^2-1}{12} \end{align} \]
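
The closed-form mean and variance can be checked against the defining sums; a minimal sketch of my own using the fair die (a = 1, b = 6):

```python
from fractions import Fraction as F

a, b = 1, 6  # a fair die is discrete uniform on {1, ..., 6}
values = range(a, b + 1)
p = F(1, b - a + 1)

mean = sum(k * p for k in values)
var = sum(k ** 2 * p for k in values) - mean ** 2
print(mean, var)                                 # 7/2 35/12
print(F(a + b, 2), F((b - a + 1) ** 2 - 1, 12))  # closed forms agree
```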

Lecture 6

Continuous Random Variables

Continuous Random Variable: A random variable \(X\) is called continuous if its probability law can be described in terms of a nonnegative function \(f_X\), called the probability density function of \(X\), or PDF for short.

A PDF satisfies \[ P(X \in B) = \int_B {f_X(x)}dx \]

for every subset \(B\) of the real line. In particular, the probability that the value of \(X\) falls within an interval is \[ P(a \leq X \leq b) = \int_a^b{f_X(x)}dx \]

Note: \[ P(a \leq X \leq b) = P(a < X < b) = P(a \leq X < b) = P(a < X \leq b) \]

Big Idea: The probability at an individual point on a continuous distribution is 0, so the equality of \(a\) or \(b\) is unimportant.

Note that to qualify as a PDF, a function \(f_X\) must be nonnegative, i.e. \(f_X(x) \geq 0\) for every \(x\), and must also satisfy the normalization equation \[ \int_{-\infty}^\infty{f_X(x)}dx = P(-\infty < X < \infty) = 1 \]

Continuous Uniform Random Variable

Consider a random variable \(X\) that takes values in an interval \([a, b]\) and assume that all subintervals of the same length are equally likely. We refer to this type of random variable as uniform or uniformly distributed. Its PDF has the form \[ f_X(x) = \begin{cases} c \text{ if } a \leq x \leq b\\ 0 \text{ otherwise} \end{cases} \]

This means that we can calculate the value of c via the following: \[ 1 = \int_a^b{c}dx = c\int_a^b{dx} = c(b-a) \] In other words, \[ c = \frac{1}{b-a} \]

Piecewise Constant PDF

Example: Alvin's driving time to work is between 15 and 20 minutes if the day is sunny, and between 20 and 25 minutes if the day is rainy, with all times being equally likely in each case.

Assume that a day is sunny with probability \(\tfrac{2}{3}\) and rainy with probability \(\tfrac{1}{3}\).

What is the PDF of the driving time viewed as a random variable \(X\)?

We interpret the statement that "all times are equally likely" in the sunny and rainy cases to mean that the PDF of \(X\) is constant in each of the intervals \([15, 20]\) and \([20, 25]\).

Furthermore, since these two intervals contain all possible driving times, the PDF should be zero everywhere else.

\[ f_X(x) = \begin{cases} c_1 \text{ if } 15 \leq x \leq 20\\ c_2 \text{ if } 20 \leq x \leq 25\\ 0 \text{ otherwise} \end{cases} \]

where \(c_1, c_2\) are some constants.

We can determine these constants by using the probabilities of a sunny and of a rainy day.

\[ \begin{align} \frac{2}{3} &= P(\text{sunny day}) = \int_{15}^{20}{f_X(x)}dx = \int_{15}^{20}{c_1}dx = 5c_1\\ \frac{1}{3} &= P(\text{rainy day}) = \int_{20}^{25}{f_X(x)}dx = \int_{20}^{25}{c_2}dx = 5c_2\\ \end{align} \]

So that \[ c_1 = \frac{2}{15}\quad c_2 = \frac{1}{15} \]

A PDF can be arbitrarily large

Consider a random variable \(X\) with PDF \[ f_X(x) = \begin{cases} \tfrac{1}{2\sqrt{x}} \text{ if } 0 < x \leq 1\\ 0 \text{ otherwise} \end{cases} \]

Even though \(f_X(x)\) becomes infinitely large as \(x\) approaches zero, this is still a valid PDF because \[ \int_{-\infty}^\infty{f_X(x)}dx = \int_0^1{\frac{1}{2\sqrt{x}}}dx = \sqrt{x}\Big|_0^1 = 1 \]

Summary of PDF Properties

Let \(X\) be a CRV with PDF \(f_X\). \[ \begin{gather} f_X(x) \geq 0 \text{ for all } x\\\\ \int_{-\infty}^\infty{f_X(x)}dx = 1\\\\ \text{If } \delta \text{ is very small, then } P(x \leq X \leq x+\delta) \approx f_X(x) \cdot \delta\\\\ \text{For any subset } B \text{ of the real line, } P(X \in B) = \int_B{f_X(x)}dx \end{gather} \]

Expectation

The expected value or mean of a CRV \(X\) is defined by \[ E[X] = \int_{-\infty}^\infty{xf_X(x)}dx \] We also have \[ E[g(X)] = \int_{-\infty}^\infty{g(x)f_X(x)}dx \] The variance of \(X\) is defined by \[ \text{var}(X) = E[(X - E[X])^2] = \int_{-\infty}^\infty{(x - E[X])^2f_X(x)}dx \] We have \[ 0 \leq \text{var}(X) = E[X^2] - (E[X])^2 \] If \(Y=aX+b\) where \(a\) and \(b\) are given scalars, then \[ E[Y] = aE[X] + b \quad \text{var}(Y) = a^2\text{var}(X) \]

Cumulative Distribution Functions

The CDF of a random variable \(X\) is denoted by \(F_X\) and provides the probability \(P(X \leq x)\). In particular, for every \(x\) we have

\[ F_X(x) = P(X \leq x) = \begin{cases} \sum_{k \leq x}{p_X(k)} \text{ if } X \text{ discrete}\\\\ \int_{-\infty}^x{f_X(t)}dt \text{ if } X \text{ continuous} \end{cases} \]

CDF Properties

The CDF \(F_X\) of a random variable \(X\) is defined by \[ F_X(x) = P(X \leq x) \text{ for all x} \]

  • \(F_X\) is monotonically nondecreasing: if \(x \leq y\), then \(F_X(x) \leq F_X(y)\)

  • \(F_X(x)\) tends to 0 as \(x \to -\infty\), and to 1 as \(x \to \infty\).

  • If \(X\) is discrete, then \(F_X\) has a piecewise constant and staircase-like form.

  • If \(X\) is continuous, then \(F_X\) has a continuously increasing form.

  • If \(X\) is discrete and takes integer values, the PMF and the CDF can be obtained from each other by summing or differencing: \[ \begin{gather} F_X(k) = \sum_{i=-\infty}^{k}{p_X(i)}\\\\ p_X(k) = P(X \leq k) - P(X \leq k-1) = F_X(k) - F_X(k-1) \end{gather} \] for all integers \(k\).

Conditional PDF and Expectation

The conditional PDF \(f_{X|A}\) of a continuous random variable \(X\) given an event \(A\) with \(P(A) > 0\) satisfies \[ P(X \in B|A) = \int_{B}{f_{X|A}(x)}dx \]

If \(A\) is a subset of the real line with \(P(X \in A) > 0\), then \[ P(X \in B|X \in A) = \frac{P(X \in B, X \in A)}{P(X \in A)} = \frac{\int_{A \cap B}{f_X(x)}dx}{P(X \in A )} \]

The resulting formula is \[ f_{X|A}(x) = \begin{cases} \frac{f_X(x)}{P(X \in A)} \text{ if } x \in A\\ 0 \text{ otherwise} \end{cases} \]

and \[ P(X \in B|X \in A) = \int_{B}{f_{X|A}(x)}dx \]

for any set \(B\).

The corresponding conditional expectation is defined by \[ E[X|A] = \int_{-\infty}^\infty{xf_{X|A}(x)}dx \]

The expected value rule remains valid: \[ E[g(X)|A] = \int_{-\infty}^\infty{g(x)f_{X|A}(x)}dx \]

If \(A_1, A_2, \dots, A_n\) are disjoint events with \(P(A_i) > 0\) for each \(i\) that form a partition of the sample space, then \[ f_X(x) = \sum_{i=1}^{n}P(A_i)f_{X|A_i}(x) \]

Law of total expectation: \[ \begin{align} E[X] &= \sum_{i=1}^{n}{P(A_i)E[X|A_i]}\\ E[g(X)] &= \sum_{i=1}^{n}{P(A_i)E[g(X)|A_i]} \end{align} \]

Multiple Continuous Random Variables

Two continuous random variables associated with a common experiment are jointly continuous and can be described in terms of a joint PDF \(f_{X,Y}\) if \(f_{X,Y}\) is a nonnegative function that satisfies \[ P((X, Y) \in B) = \int_{(x, y) \in B}{\int{f_{X,Y}(x, y)}dx}dy \]

For every subset \(B\) of the two-dimensional plane.

If \(B\) is a rectangle of the form \(B = [a, b] \times [c, d]\), we have \[ P(a \leq X \leq b, c \leq Y \leq d) = \int_{c}^{d}{\int_{a}^{b}{f_{X,Y}(x, y)}dx}dy \]

We have the normalization property \[ \int_{-\infty}^\infty{\int_{-\infty}^\infty{f_{X,Y}(x, y)}dx}dy = 1 \]

Marginal Distributions

Given the joint PDF \(f_{X,Y}(x, y)\), the marginals are: \[ \begin{align} f_X(x) = \int_{-\infty}^\infty{f_{X,Y}(x, y)}dy\\ f_Y(y) = \int_{-\infty}^\infty{f_{X,Y}(x, y)}dx\\ \end{align} \]

Given the joint random variables of three variables \(f_{X,Y,Z}(x, y, z)\), the marginals are: \[ \begin{align} f_X(x) = \int_{-\infty}^\infty{\int_{-\infty}^\infty{f_{X,Y,Z}(x,y,z)}dy}dz\\ f_Y(y) = \int_{-\infty}^\infty{\int_{-\infty}^\infty{f_{X,Y,Z}(x,y,z)}dx}dz\\ f_Z(z) = \int_{-\infty}^\infty{\int_{-\infty}^\infty{f_{X,Y,Z}(x,y,z)}dx}dy\\ \end{align} \]

Expectation

If \(X\) and \(Y\) are jointly continuous random variables, and \(g\) is some function, then \(Z = g(X, Y)\) is also a random variable. \[ E[g(X, Y)] = \int_{-\infty}^\infty{\int_{-\infty}^\infty{g(x,y)f_{X,Y}(x,y)}dx}dy \]

As an important special case, for any scalars \(a, b\), we have \[ E[aX + bY] = aE[X] + bE[Y] \]

Conditioning

Let \(X\) and \(Y\) be continuous random variables with joint PDF \(f_{X,Y}\). For any fixed \(y\) with \(f_Y(y) > 0\), the conditional PDF of \(X\) given that \(Y=y\) is defined by \[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \]

The variable \(y\) is usually a fixed number and \(f_{X|Y}(x|y)\) is a function of the single variable \(x\).

Note: \[ \int_{-\infty}^\infty{f_{X|Y}(x|y)}dx = 1 \]

So for any fixed \(y\), \(f_{X|Y}(x|y)\) is a legitimate PDF.

Conditional Expectation: \[ \begin{align} E[X|Y = y] &= \int_{-\infty}^\infty{xf_{X|Y}(x|y)}dx\\ E[g(X)|Y=y] &= \int_{-\infty}^\infty{g(x)f_{X|Y}(x|y)}dx \end{align} \]

Example:

Let \(X\) be exponentially distributed with mean 1. Once we observe the experimental value \(x\) of \(X\), we generate a normal random variable \(Y\) with zero mean and variance \(x+1\). What is the joint PDF of \(X\) and \(Y\)?

We have:

\[ f_X(x) = e^{-x} \text{ for } x \geq 0 \text{ and } f_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi(x+1)}}e^\frac{{-y^2}}{2(x+1)} \]

The joint distribution is \[ f_{X,Y}(x,y) = f_X(x)f_{Y|X}(y|x) = e^{-x}\frac{1}{\sqrt{2\pi(x+1)}}e^\frac{{-y^2}}{2(x+1)} \] for all \(x \geq 0\) and all \(y\).

Inference and the Continuous Bayes' Rule

In many situations, we have a model of an underlying but unobserved phenomenon represented by a random variable \(X\) with PDF \(f_X\), and we make noisy measurements \(Y\), modeled by a conditional PDF \(f_{Y|X}\).

Once the experimental value of \(Y\) is measured, what information does this provide on the unknown value of \(X\)?

Recall that when we used Bayes' rule for discrete distributions, we can go in both directions.

\[ f_Xf_{Y|X} = f_{X,Y} = f_Yf_{X|Y} \]

We have the expression \[ f_{X|Y}(x|y) = \frac{f_X(x)f_{Y|X}(y|x)}{\int_{-\infty}^\infty{f_X(t)f_{Y|X}(y|t)}dt} \]

Example:

A lightbulb is known to have an exponentially distributed lifetime \(Y\). However, the company has been experiencing quality control problems. On any given day, the parameter \(\lambda\) of the PDF of \(Y\) is actually a random variable, uniformly distributed in the interval \([1, \tfrac{3}{2}]\). We test a lightbulb and record the experimental value \(y\) of its lifetime. What can we say about the underlying parameter \(\lambda\)?

We model the parameter \(\lambda\) as a uniform random variable \(\Lambda\) with PDF: \[ f_\Lambda(\lambda) = 2 \text{ for } 1 \leq \lambda \leq \tfrac{3}{2} \] All available information about \(\Lambda\) is contained in the conditional PDF \(f_{\Lambda|Y}(\lambda|y)\), which is given by:

\[ f_{\Lambda|Y}(\lambda|y) = \frac{2\lambda e^{-\lambda y}}{\int_{1}^{\frac{3}{2}}{2te^{-ty}}dt} \text{ for } 1 \leq \lambda \leq \frac{3}{2} \]

Lecture 7

Joint CDFs

The joint CDF of two random variables \(X\) and \(Y\) is defined as \[ F_{X,Y}(x,y) = P(X\leq x, Y\leq y) = \int_{-\infty}^{x}{\int_{-\infty}^{y}{f_{X,Y}(s, t)}dt}ds \]

Example

\[ F_{X,Y}(x,y)= \begin{cases} \frac{xy}{25} \text{ if } 0 \leq x, y \leq 5\\ 0 \text{ otherwise} \end{cases} \]

The sample space is the square \((0, 0)\) to \((5, 5)\).

Find \(P(X \leq 4, Y \leq 3)\). \[ P(X \leq 4, Y \leq 3) = F_{X,Y}(4,3)=\frac{12}{25} \]

The joint PDF \(f_{X,Y}(x,y)\) can be derived from the joint CDF by taking the derivatives with respect to both \(x\) and \(y\). \[ f_{X,Y}(x,y) = \frac{\partial^2}{\partial x\partial y}F_{X,Y}(x,y) \]

Example: Let the joint CDF be

\[ F_{X,Y}(x,y) = \begin{cases} \frac{xy}{25} \text{ if } 0 \leq x, y \leq 5\\ 0 \text{ otherwise } \end{cases} \]

The PDF will be

\[ \frac{\partial^2}{\partial x \partial y} \cdot \frac{xy}{25} = \frac{1}{25} \]

Marginal Distributions

The marginal distributions \(f_X(x)\) and \(f_Y(y)\) can be calculated as follows: \[ \begin{align} f_X(x) &= \int_{-\infty}^{\infty}{f_{X,Y}(x,y)}dy\\ f_Y(y) &= \int_{-\infty}^{\infty}{f_{X,Y}(x,y)}dx \end{align} \]

The expectation of a function of a joint random variable is:

\[ E[g(X,Y)] = \int_{-\infty}^\infty{\int_{-\infty}^\infty{g(x,y)f_{X,Y}(x,y)}dx}dy \]

Conditional PDF

The conditional PDF of a continuous random variable \(X\), given an event \(A\) with \(P(A) > 0\), is defined as a nonnegative function \(f_{X|A}\) satisfying

\[ P(X \in B|A) = \int_B{f_{X|A}(x)}dx \]

for any subset \(B\) of the real line. When the conditioning event is of the form \(\{X \in A\}\),

\[ f_{X|\{X \in A\}}(x) = \begin{cases} \frac{f_X(x)}{P(X \in A)} \text { if } x \in A\\ 0 \text{ otherwise } \end{cases} \]

Example:

The time \(T\) until a new light bulb burns out is an exponential random variable with parameter \(\lambda\). Ariadne turns the light on, leaves the room, and when she returns \(t\) time units later, finds the light bulb still on, which corresponds to the event \(A = \{T > t\}\). Let \(X\) be the additional time until the light bulb burns out. What is the conditional CDF of \(X\) given event \(A\)?

\[ \begin{align} P(X > x|A) &= P(T > t + x|T > t)\\ &= \frac{P(T > t + x,T > t)}{P(T>t)}\\ &= \frac{P(T > t+x)}{P(T>t)}\\ &= \frac{e^{-\lambda(t+x)}}{e^{-\lambda t}}\\ &= e^{-\lambda x} \end{align} \]
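
A Monte Carlo sanity check of this memorylessness property (my own sketch; the values of lambda, t, and x are arbitrary assumptions):

```python
import math
import random

lam, t, x = 0.5, 2.0, 1.5  # illustrative values only
random.seed(0)
samples = [random.expovariate(lam) for _ in range(200_000)]

# Estimate P(T > t + x | T > t) and compare with e^{-lambda x}.
survived = [s for s in samples if s > t]
cond = sum(s > t + x for s in survived) / len(survived)
print(cond, math.exp(-lam * x))  # both close to 0.472
```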

Conditioning

For multiple random variables, if we condition on a positive probability event of the form \(C=\{(X,Y) \in A\}\), we have

\[ f_{XY|C}(x,y) = \begin{cases} \frac{f_{X,Y}(x,y)}{P(C)} \text{ if } (x,y) \in A\\ 0 \text{ otherwise } \end{cases} \]

In this case, the conditional PDF of \(X\) given this event can be obtained from the formula

\[ f_{X|C}(x) = \int_{-\infty}^\infty{f_{XY|C}(x,y)}dy \]

The above two formulas allow us to obtain the conditional PDF of a random variable \(X\) when the conditioning event is not of the form \(\{X \in A\}\) but is instead defined in terms of multiple random variables.

Let \(X\) and \(Y\) be continuous random variables with joint PDF \(f_{X,Y}\). For any \(y\) with \(f_Y(y) > 0\), the conditional PDF of \(X\) given that \(Y=y\) is defined by \[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \]

Example:

Ben throws a dart at a circular target of radius \(r\). We assume that he always hits the target, and that all points of impact \((x, y)\) are equally likely so that the joint PDF of the random variables \(X\) and \(Y\) is uniform. Since the area of the circle is \(\pi r^2\), we have:

\[ f_{X,Y}(x,y) = \begin{cases} c=\frac{1}{\text{area}} \text{ if } (x, y) \text{ is in the circle}\\ 0 \text{ otherwise} \end{cases} \]

In other words, \[ f_{X,Y}(x,y) = \begin{cases} \frac{1}{\pi r^2} \text{ if } x^2+y^2 \leq r^2\\ 0 \text{ otherwise} \end{cases} \]

To calculate the conditional PDF \(f_{X|Y}(x|y)\), first find the marginal PDF \(f_Y(y)\). For \(|y| \leq r\), it is given by

\[ \begin{align} f_Y(y) &= \int_{-\infty}^\infty{f_{X,Y}(x,y)}dx\\ &= \frac{1}{\pi r^2}\int_{x^2+y^2 \leq r^2}dx\\ &= \frac{1}{\pi r^2} \int_{-\sqrt{r^2-y^2}}^{\sqrt{r^2-y^2}}dx\\ &= \frac{2}{\pi r^2}\sqrt{r^2-y^2} \text{ if } |y| \leq r \end{align} \]

Note that the marginal PDF \(f_Y\) is not uniform. The conditional PDF is:

\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{\frac{1}{\pi r^2}}{\frac{2}{\pi r^2}\sqrt{r^2-y^2}} = \frac{1}{2\sqrt{r^2-y^2}} \text{ if } x^2+y^2 \leq r^2 \]

Assignment 2, #8

Poker dice is played by simultaneously rolling 5 dice. Show each of the following.

  1. \(P(\text{no two alike}) = 0.0926\) \[ \frac{\binom{6}{5}\cdot 5!}{6^5} = \frac{5}{54} \approx 0.0926 \]
Explanation

There are \(\binom{6}{5}\) ways to choose which 5 distinct numbers appear and \(5!\) ways to arrange them across the five dice. This is taken out of \(6^5\) possible rolls for five dice.

  2. \(P(\text{one pair}) = 0.4630\) \[ \frac{\binom{6}{1}\binom{5}{2}\cdot 3!\binom{5}{3}}{6^5} = \frac{3600}{6^5} \approx 0.4630 \]
Explanation

There are \(\binom{6}{1}\) choices for which number has a pairing and \(\binom{5}{2}\) ways to choose that pair from a roll of 5 dice. There are \(\binom{6-1}{3}\) ways to choose the remaining values (one value from the original 6 is off-limits) with a total of \(3!\) orderings, again taken out of \(6^5\) possible events.

  3. \(P(\text{two pairs}) = 0.2315\) \[ \frac{\binom{6}{2}\binom{5}{2}\binom{3}{2}\binom{4}{1}}{6^5} = \frac{1800}{6^5} \approx 0.2315 \]
Explanation

There are \(\binom{6}{2}\) ways to choose the two numbers that make up each pair. From here, we have \(\binom{5}{2}\) ways to choose the first pair (choosing 2 dice from 5) and \(\binom{3}{2}\) ways to choose the second (choosing another 2 dice from remaining 3).

From here our only remaining choice is the value of the final die, choosing 1 value from 4 remaining options, or \(\binom{4}{1}\).

  4. \(P(\text{three alike}) = 0.1543\) \[ \frac{\binom{6}{1}\binom{5}{3}\cdot 2!\binom{5}{2}}{6^5} = \frac{1200}{6^5} \approx 0.1543 \]
Explanation

There are \(\binom{6}{1}\) ways to choose which number makes up our three-alike and \(\binom{5}{3}\) ways to select three dice from five. \(2!\binom{6-1}{2}\) accounts for the selection of the remaining two from five potential values with \(2!\) potential orderings.

  5. \(P(\text{full house}) = 0.0386\) \[ \frac{\binom{6}{1}\binom{5}{3}\binom{5}{1}}{6^5} = \frac{300}{6^5} \approx 0.0386 \]
Explanation

There are \(\binom{6}{1}\) ways to select which value has a three-alike pairing and \(\binom{5}{3}\) ways to select that pairing from five dice. Since the remaining two dice must have the same value and one value from the original 6 is off-limits, we're selecting one value from the remaining five, or \(\binom{6-1}{1}\).

  6. \(P(\text{four alike}) = 0.0193\) \[ \frac{\binom{6}{1}\binom{5}{4}\binom{5}{1}}{6^5} = \frac{150}{6^5} \approx 0.0193 \]
Explanation

There are again \(\binom{6}{1}\) ways to decide which value will make up our four-alike and \(\binom{5}{4}\) ways to select four dice from five. \(\binom{6-1}{1}\) accounts for the ways to select a value for the remaining die.

  7. \(P(\text{five alike}) = 0.0008\) \[ \frac{\binom{6}{1}\binom{5}{5}}{6^5} = \frac{6}{6^5} \approx 0.0008 \]
Explanation

Given \(\binom{6}{1}\) ways to select which value makes up the five-alike, there's only \(\binom{5}{5} = 1\) way for all five dice to show that value.
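
All seven probabilities can be verified at once by enumerating the \(6^5 = 7776\) equally likely rolls. A brute-force Python sketch of my own:

```python
from collections import Counter
from itertools import product

def classify(roll):
    """Name the poker-dice hand by its sorted pattern of repeat counts."""
    pattern = tuple(sorted(Counter(roll).values(), reverse=True))
    return {
        (1, 1, 1, 1, 1): "no two alike",
        (2, 1, 1, 1): "one pair",
        (2, 2, 1): "two pairs",
        (3, 1, 1): "three alike",
        (3, 2): "full house",
        (4, 1): "four alike",
        (5,): "five alike",
    }[pattern]

tally = Counter(classify(roll) for roll in product(range(1, 7), repeat=5))
for hand, count in sorted(tally.items(), key=lambda kv: -kv[1]):
    print(f"{hand:13s} {count:5d}  {count / 6 ** 5:.4f}")
```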