Random Experiment

A random experiment is an experiment that satisfies the following conditions:

  • it is repeatable
  • several mutually exclusive outcomes are possible
  • the outcome is determined by chance

Typical examples of random experiments are the roll of a die or the flip of a coin.

Sample Space

A random experiment can have a number of different outcomes; e.g. the outcome of rolling a fair six-sided die is a number between 1 and 6. The set of all possible outcomes of an experiment is called the sample space and is denoted by $\Omega$. So the set of possible outcomes of the experiment of rolling a die is:

$\Omega = \{1, 2, 3, 4, 5, 6\}$

Each individual outcome (1 or 2 or 3 ..) is called an elementary event.


An event is any subset of the sample space. For our example of rolling a die, an event could be the occurrence of an even number. Then the event, let’s call it $A$, would be:

$A = \{2, 4, 6\}$

We say that event A occurs when one of the elementary events that constitute it occurs. So event A above occurs when we roll a 2, a 4, or a 6. Although we use a singular term (“event A occurred”), we refer to a set of possible elementary events.

Event Space

All possible events of a sample space, i.e. all possible combinations of the elementary events of a sample space, make up the event space. E.g. for the die roll example we could form the events

$A = \{1, 2\}$, $B = \{1, 3, 5\}$, $C = \{2, 4, 6\}$, $D = \{3, 4\}$

all of the above events, labeled A–D, are part of the event space.
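The event space can also be enumerated programmatically. A minimal sketch in Python (the variable names are mine, not from the text) that builds the power set of the die’s sample space:

```python
from itertools import combinations

# Sample space of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

# The event space is the set of ALL subsets of omega (its power set),
# including the empty set and omega itself.
event_space = [set(c) for r in range(len(omega) + 1)
               for c in combinations(sorted(omega), r)]

print(len(event_space))  # 2**6 = 64 possible events
```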

Sets in Probability

We have already stated that events are subsets of the sample space $\Omega$. Therefore we can apply set theory. For example, the sets $A$ and $B$ from above contain one common elementary event, namely 1. Sets $A$ and $D$, however, are disjoint. We can nicely visualize this using Venn diagrams:

todo: put here an image that shows set A and B and how they overlap

and another image where they are disjoint (A and D)
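Set relations between events map directly onto Python’s built-in sets. A small sketch, assuming for illustration the sets $A=\{1,2\}$, $B=\{1,3,5\}$, and $D=\{3,4\}$, so that A and B share only the element 1 while A and D are disjoint:

```python
# Illustrative die-roll events (assumed, not fixed by the text):
A = {1, 2}
B = {1, 3, 5}   # shares exactly the element 1 with A
D = {3, 4}      # has no element in common with A

print(A & B)             # intersection of A and B: {1}
print(A.isdisjoint(D))   # True -> A and D are disjoint
```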

We will resume this topic a bit later.


We can assign a number to our event $A$ which describes our belief about how likely the occurrence of $A$ is. This number is called the probability of the occurrence of $A$ and is written as:

$P(A) = a$

where $a$ is the probability. The probability is not just any number, but must satisfy certain conditions. These conditions are defined by the Probability Axioms of Kolmogoroff.

Probability Axioms of Kolmogoroff

  1. $P(A) \geq 0$
  2. $P(A\cup B) = P(A) + P(B)$ for mutually exclusive events A and B
  3. $P(\Omega)=1$

Axiom 1 tells us that a probability is always 0 or positive (nonnegativity). Axiom 3 tells us that we distribute a total mass of 1 over all the elementary events in the sample space. This can be nicely visualized with a graph here (todo). And finally, axiom 2 tells us that the probabilities of events (recall that events are simply subsets of elementary events) can simply be added. But be careful: this holds only for mutually exclusive events. In case of overlapping events you have to subtract the overlap: $P(A\cup B) = P(A) + P(B) - P(A\cap B)$ (todo: make a figure here).
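The axioms can be checked mechanically for a fair die, where each elementary event gets mass 1/6. A sketch (the concrete events A and B are illustrative choices):

```python
from fractions import Fraction

# Uniform mass of 1/6 on each elementary event of a fair die (axiom 3:
# the total mass of 1 is distributed over the sample space).
p = {face: Fraction(1, 6) for face in range(1, 7)}

def P(event):
    """Probability of an event = sum of the masses of its elementary events."""
    return sum(p[face] for face in event)

A = {2, 4, 6}   # illustrative event: an even number
B = {1, 3}      # illustrative event, disjoint from A

assert P(A) >= 0                        # axiom 1: nonnegativity
assert P(A | B) == P(A) + P(B)          # axiom 2: additivity for disjoint events
assert P(set(range(1, 7))) == 1         # axiom 3: P(Omega) = 1
```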

Naive definition of probability

Before the idea that probability is something obeying a set of axioms, there was another definition: the number of favorable outcomes divided by the number of possible outcomes. E.g. if event $A$ in case of a die roll is $\{1, 2\}$, then $P(A)$ (the probability that a 1 or a 2 is the outcome) is 2 (because 1 and 2 are the two outcomes we are interested in) divided by 6 (because with a six-sided die you can obtain 6 different outcomes), so $P(A) = 2/6 = 1/3$.
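The naive definition is pure counting, which is easy to mirror with exact fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # possible outcomes of one die roll
A = {1, 2}                   # favorable outcomes

# Naive definition: favorable outcomes / possible outcomes.
p_A = Fraction(len(A), len(omega))
print(p_A)   # 1/3
```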

This however works only under 2 VERY strong restrictions:

  1. All outcomes of the experiment are equally likely
  2. there is a finite number of outcomes (finite sample space); an example of an infinite sample space is flipping a coin until the first head appears, since the number of flips needed can be any positive integer

Because of these restrictions it was not that useful. However, sometimes it still is. And since it deals with numbers of outcomes, it is very helpful to know how to count them. Sometimes the numbers are so high that you can’t really count them directly; you need a shortcut, a formula that helps you compute the number of outcomes.


What we most often use is

|                                     | without replacement                        | with replacement    |
|-------------------------------------|--------------------------------------------|---------------------|
| order does not matter (combination) | $\binom{n}{k} = \frac{n!}{k!(n-k)!}$       | $\binom{n+k-1}{k}$  |
| order does matter (variation)       | $\frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$  | $n^k$               |

Note, $\frac{n!}{(n-k)!}$ counts all the possible ordered arrangements of $k$ items drawn from a set of $n$ items. When order does NOT matter, you start from the $\frac{n!}{(n-k)!}$ arrangements but have to take out all those that can no longer be distinguished, which amounts to dividing by $k!$. So the binomial coefficient is just the formula for “order matters, without replacement” divided by $k!$.
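Python’s standard library exposes exactly these counting functions (`math.comb` and `math.perm`, available since Python 3.8), so the table can be verified directly, here for n = 6 and k = 2:

```python
from math import comb, perm, factorial

n, k = 6, 2

# Order matters, without replacement: n! / (n-k)!
assert perm(n, k) == factorial(n) // factorial(n - k) == 30

# Order does not matter, without replacement: divide the count above
# by the k! orderings that can no longer be distinguished.
assert comb(n, k) == perm(n, k) // factorial(k) == 15

# Order matters, with replacement: n**k
assert n ** k == 36

# Order does not matter, with replacement: multiset coefficient C(n+k-1, k)
assert comb(n + k - 1, k) == 21
```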

Conditional Probability

Often we are interested in the occurrence of an event B after another event, A, has already occurred. Example: roll a 6 after you have already rolled a 6 before. We call this the probability of B given A and write it as:

$P(B\mid A)$
Using the relative-frequency idea, we can define the conditional probability by putting the probability that both events occur into relation to the probability of the event on which we condition:

$P(B\mid A) = \frac{P(B\cap A)}{P(A)}$
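The formula can be checked by enumerating the 36 equally likely outcomes of two die rolls; as an illustrative choice, let A be “first roll is a 6” and B “second roll is a 6”:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a die twice.
omega = list(product(range(1, 7), repeat=2))

A = {w for w in omega if w[0] == 6}   # first roll is a 6
B = {w for w in omega if w[1] == 6}   # second roll is a 6

def P(event):
    return Fraction(len(event), len(omega))

# P(B|A) = P(B and A) / P(A)
p_B_given_A = P(A & B) / P(A)
print(p_B_given_A)   # 1/6
```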

Joint Probability

This follows from the formula for conditional probability:

$P(A\cap B) = P(B\mid A) \cdot P(A)$

Note that when A and B are disjoint, the conditional probability is 0. When $P(A)$ is 0, the conditional probability is not defined.
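Continuing with two die rolls (A = “6 on the first roll”, B = “6 on the second roll”, an illustrative choice), the product rule can be verified by computing $P(B\mid A)$ directly on the restricted sample space:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two die rolls
A = {w for w in omega if w[0] == 6}            # 6 on the first roll
B = {w for w in omega if w[1] == 6}            # 6 on the second roll

def P(event):
    return Fraction(len(event), len(omega))

# P(B|A) computed directly by restricting the sample space to A ...
p_B_given_A = Fraction(len(A & B), len(A))

# ... reproduces the joint probability via the product rule.
assert p_B_given_A * P(A) == P(A & B) == Fraction(1, 36)
```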

Independence

When

$P(A\cap B) = P(B) \cdot P(A)$

we say that A and B are independent of each other. Note that this means that $P(B\mid A) = P(B)$. In other words, the occurrence of A has no influence on the occurrence of B.

Note, independence does NOT mean that A and B are disjoint sets.

On the contrary, if A and B were disjoint, then B could never occur once A has occurred, so the occurrence of A would contain a lot of information about the occurrence of B. For example, with $A=\{1,3,5\}$ and $B=\{2,4,6\}$ and a single die roll, B can never occur once A has occurred. An urn example makes this even clearer: one urn contains only white balls, the other only black balls. When you draw a black ball, you know that the event of having chosen the first urn is impossible.

To show that independence does not imply disjoint sets, here is an example where events are independent but overlap:

Experiment: two consecutive coin flips, $\Omega = \{HH, HT, TH, TT\}$. Let event A be $\{HH, TH\}$ and event B be $\{TT, TH\}$ (todo: make a Venn diagram that shows the overlap TH).


$P(A) = 1/2$, $P(B) = 1/2$, and $P(A\cap B) = P(\{TH\}) = 1/4$. Thus,

$P(A) \cdot P(B) = 1/2 \cdot 1/2 = 1/4 = P(A\cap B)$

so events A and B are independent, even though their event sets overlap.
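The coin-flip example can be verified by enumeration:

```python
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]   # two consecutive fair coin flips
A = {"HH", "TH"}                   # second flip shows heads
B = {"TT", "TH"}                   # first flip shows tails

def P(event):
    return Fraction(len(event), len(omega))

assert P(A) == P(B) == Fraction(1, 2)
# The joint probability factorizes -> A and B are independent ...
assert P(A & B) == P(A) * P(B) == Fraction(1, 4)
# ... even though the event sets overlap in TH.
assert A & B == {"TH"}
```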


Joint Probabilities

What is it?

In axiom 2 we saw $P(A\cup B)$. This describes the probability that event $A$ OR event $B$ occurs. We can also assign a probability to $P(A\cap B)$. This is the probability that event A AND event B occur. For example, let our random experiment be to roll a die twice and note down the result of the first and the second roll. The elementary events we can obtain in this experiment are 36 different ones: $\Omega=\{(1,1), (1,2), \dots, (6,6)\}$. Let’s say we are interested in the event, call it C, that we roll a 6 both times, thus $C=\{(6,6)\}$. We can construct two other events A and B, where A is the event that we obtain a 6 when rolling the die the first time and B is the event that we obtain a 6 the second time. Thus $P(C) = P(A\cap B)$. According to the naive definition this probability is 1/36, since there are 36 possible elementary events and exactly one of them belongs to $C$.
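A quick enumeration of the two-roll experiment confirms both the OR and the AND probability (the union also illustrates the subtraction of the overlap from axiom 2):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # 36 outcomes of two rolls
A = {w for w in omega if w[0] == 6}            # 6 on the first roll
B = {w for w in omega if w[1] == 6}            # 6 on the second roll

def P(event):
    return Fraction(len(event), len(omega))

print(P(A | B))   # OR:  6/36 + 6/36 - 1/36 = 11/36
print(P(A & B))   # AND: 1/36
```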

How to compute it?

In the case above we knew the value of $P(A\cap B)$, but what if we are not so lucky? How can we compute it if we only know $P(A)$ and $P(B)$?

For that we must introduce the term of Conditional Probability

So I thought that when the events A and B are independent of each other, that would mean that there is no overlap between A and B, or in other words

$A\cap B = \emptyset$

But in this case

$P(A\mid B) = 0$

Yet we know that when A and B are independent, $P(A\mid B) = P(A)$.

So what is my mistake? My mistake was to assume that statistical independence is given when A and B do not overlap. But that is wrong! In fact the opposite is true: disjoint events (with positive probability) are never independent, because the occurrence of one rules out the other. Conditional probability is a way to quantify the likelihood that an outcome which was observed to belong to B also belongs to some other given event A (Bertsekas). So we put the probability of the elements that are in A AND B into relation to the probability of the elements that are in B. When A and B are independent, this ratio equals $P(A)$: the occurrence of B gives us no additional information on the occurrence of A.

Law of total probability

For a partition $A_1, A_2, \dots, A_n$ of the sample space $\Omega$ (the $A_i$ are mutually exclusive and together cover $\Omega$):

\[\begin{equation} P(B) = \sum_{i=1}^{n} P(B\mid A_i)\,P(A_i) \end{equation} \tag{1}\label{eq:eq1} \]
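A sketch of the law using the urn example from above (urn contents and the uniform urn choice are illustrative assumptions): pick one of two urns at random, then draw a ball; what is the probability of drawing a black ball?

```python
from fractions import Fraction

# Partition of Omega: which urn was chosen (assumed uniform choice).
p_urn = {"urn1": Fraction(1, 2), "urn2": Fraction(1, 2)}

# Conditional probability of drawing a black ball, given the urn
# (illustrative contents: urn1 only white balls, urn2 only black balls).
p_black_given = {"urn1": Fraction(0), "urn2": Fraction(1)}

# Law of total probability: P(B) = sum_i P(B | A_i) * P(A_i)
p_black = sum(p_black_given[urn] * p_urn[urn] for urn in p_urn)
print(p_black)   # 1/2
```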
