Conditional probability
This article defines some terms which characterize probability distributions of two or more variables.
Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written P(A|B), and is read "the probability of A, given B".
Joint probability is the probability of two events in conjunction; that is, it is the probability of both events occurring together. The joint probability of A and B is written P(A ∩ B) or P(A, B).
Marginal probability is then the unconditional probability P(A) of the event A; that is, the probability of A, regardless of whether event B did or did not occur. If B can be thought of as the event of a random variable X having a given outcome, the marginal probability of A can be obtained by summing (or integrating, more generally) the joint probabilities over all outcomes for X. For example, if there are two possible outcomes for X with corresponding events B and B', this means that
P(A) = P(A ∩ B) + P(A ∩ B'). This is called marginalization.
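As an illustration of marginalization (not part of the original article), the short Python sketch below sums a small joint distribution over the two possible outcomes for X; the numbers in the table are invented for the example.

```python
# Marginalization: sum the joint probabilities over all outcomes of the other variable.
# The joint probabilities below are invented for illustration.
joint = {
    ("A", "B"):   0.10,   # P(A and B)
    ("A", "B'"):  0.20,   # P(A and B')
    ("A'", "B"):  0.30,
    ("A'", "B'"): 0.40,
}

# Marginal probability of A: P(A) = P(A ∩ B) + P(A ∩ B')
p_A = sum(p for (a, b), p in joint.items() if a == "A")
print(p_A)  # ≈ 0.3
```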
In these definitions, note that there need not be a causal or temporal relation between A and B. A may precede B or vice versa or they may happen at the same time. A may cause B or vice versa or they may have no causal relation at all. Notice, however, that causal and temporal relations are informal notions, not belonging to the probabilistic framework. They may apply in some examples, depending on the interpretation given to events.
Conditioning of probabilities, i.e. updating them to take account of (possibly new) information, may be achieved through Bayes' theorem. In such conditioning, the probability of A given only initial information I, P(A|I), is known as the prior probability. The updated conditional probability of A, given I and the outcome of the event B, is known as the posterior probability, P(A|B,I).
Introduction
Consider the simple scenario of rolling two fair six-sided dice, labelled die 1 and die 2. Define the following three events:
- A: Die 1 lands on 3.
- B: Die 2 lands on 1.
- C: The dice sum to 8.
The prior probability of each event describes how likely the outcome is before the dice are rolled, without any knowledge of the roll's outcome. For example, die 1 is equally likely to fall on each of its 6 sides, so P(A) = 1 / 6. Similarly P(B) = 1 / 6. Likewise, of the 6 × 6 = 36 possible ways that two dice can land, just 5 result in a sum of 8 (namely 2 and 6, 3 and 5, 4 and 4, 5 and 3, and 6 and 2), so P(C) = 5 / 36.
Some of these events can both occur at the same time; for example, events A and C can happen at the same time, in the case where die 1 lands on 3 and die 2 lands on 5. This is the only one of the 36 outcomes where both A and C occur, so its probability is 1/36. The probability of both A and C occurring is called the joint probability of A and C and is written P(A ∩ C), so P(A ∩ C) = 1/36. On the other hand, if die 2 lands on 1, the dice cannot sum to 8, so P(B ∩ C) = 0.
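These prior and joint probabilities can be verified by brute-force enumeration of the 36 outcomes; the Python sketch below is an illustrative check, not part of the original article.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (die1, die2) of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an outcome (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 3          # die 1 lands on 3
B = lambda o: o[1] == 1          # die 2 lands on 1
C = lambda o: o[0] + o[1] == 8   # the dice sum to 8

print(prob(A), prob(B), prob(C))         # 1/6 1/6 5/36
print(prob(lambda o: A(o) and C(o)))     # P(A ∩ C) = 1/36
print(prob(lambda o: B(o) and C(o)))     # P(B ∩ C) = 0
```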
Now suppose we roll the dice and cover up die 2, so we can only see die 1, and observe that die 1 landed on 3. Given this partial information, the probability that the dice sum to 8 is no longer 5/36; instead it is 1/6, since die 2 must land on 5 to achieve this result. This is called the conditional probability, because it is the probability of C under the condition that A is observed, and is written P(C|A) = 1/6, which is read "the probability of C given A". Similarly, P(C|B) = 0, since if we observe that die 2 landed on 1, we already know the dice cannot sum to 8, regardless of what die 1 shows.
On the other hand, if we roll the dice and cover up die 2, and observe die 1, this has no impact on the probability of event B, which only depends on die 2. We say events A and B are statistically independent, or just independent, and in this case P(B|A) = P(B) = 1/6.
In other words, the probability of B occurring after observing that die 1 landed on 3 is the same as before we observed die 1.
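Conditioning can be viewed as restricting attention to the outcomes in which the observed event occurred. The following sketch (again illustrative, with the same dice setup) computes P(C|A), P(C|B) and P(B|A) that way.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (die1, die2) pairs
A = lambda o: o[0] == 3                            # die 1 lands on 3
B = lambda o: o[1] == 1                            # die 2 lands on 1
C = lambda o: o[0] + o[1] == 8                     # the dice sum to 8

def cond_prob(event, given):
    """P(event | given): among the outcomes where `given` holds,
    the fraction where `event` also holds."""
    restricted = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))

print(cond_prob(C, A))  # 1/6 -- die 2 must land on 5
print(cond_prob(C, B))  # 0   -- a 1 on die 2 rules out a sum of 8
print(cond_prob(B, A))  # 1/6 -- equals P(B): A and B are independent
```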
Intersection events and conditional events are related by the formula
P(C|A) = P(A ∩ C) / P(A).
In this example, we have
1/6 = (1/36) / (1/6).
As noted above, P(B|A) = P(B), so by this formula
P(B|A) = P(A ∩ B) / P(A) = P(B).
On multiplying across by P(A),
P(A ∩ B) = P(A) P(B).
In other words, if two events are independent, their joint probability is the product of the prior probabilities of each event occurring by itself.
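A quick arithmetic check of this relation, using the probabilities computed above (an illustrative sketch, not part of the article):

```python
from fractions import Fraction

P_A  = Fraction(1, 6)    # P(A): die 1 lands on 3
P_B  = Fraction(1, 6)    # P(B): die 2 lands on 1
P_AC = Fraction(1, 36)   # P(A ∩ C), from the enumeration above
P_AB = Fraction(1, 36)   # P(A ∩ B): die 1 shows 3 and die 2 shows 1

# P(C|A) = P(A ∩ C) / P(A)
print(P_AC / P_A)         # 1/6
# Independence of A and B: P(A ∩ B) = P(A) * P(B)
print(P_AB == P_A * P_B)  # True
```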
Definition
Given a probability space (Ω, F, P) and two events A, B ∈ F with P(B) > 0, the conditional probability of A given B is defined by
P(A|B) = P(A ∩ B) / P(B).
If P(B) = 0, then P(A|B) is undefined, or at any rate irrelevant.
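On a finite sample space the definition can be applied directly. The sketch below is a minimal illustration, assuming the space is represented as a dictionary mapping sample points to probabilities and events as sets of sample points; the function name `conditional` is made up for the example.

```python
from fractions import Fraction

def conditional(P, A, B):
    """P(A|B) = P(A ∩ B) / P(B) for events A, B given as sets of sample points.
    P maps each sample point to its probability. Raises an error if P(B) = 0,
    matching the convention that P(A|B) is then undefined."""
    p_B = sum(P[w] for w in B)
    if p_B == 0:
        raise ValueError("P(A|B) is undefined when P(B) = 0")
    p_AB = sum(P[w] for w in A & B)
    return p_AB / p_B

# Toy probability space: a single fair die.
P = {w: Fraction(1, 6) for w in range(1, 7)}
even = {2, 4, 6}
at_least_4 = {4, 5, 6}
print(conditional(P, even, at_least_4))  # 2/3
```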
Statistical independence
Two random events A and B are statistically independent if and only if P(A ∩ B) = P(A) P(B).
Thus, if A and B are independent, then their joint probability can be expressed as a simple product of their individual probabilities.
Equivalently, for two independent events A and B with non-zero probabilities, P(A|B) = P(A) and P(B|A) = P(B).
In other words, if A and B are independent, then the conditional probability of A, given B is simply the individual probability of A alone; likewise, the probability of B given A is simply the probability of B alone.
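The product criterion can be tested mechanically on a finite space. The sketch below (an illustration; the helper names `prob` and `independent` are invented here) confirms that A and B from the dice example are independent while A and C are not.

```python
from fractions import Fraction
from itertools import product

def prob(P, A):
    """Probability of event A (a set of sample points) under distribution P."""
    return sum(P[w] for w in A)

def independent(P, A, B):
    """True iff P(A ∩ B) = P(A) P(B)."""
    return prob(P, A & B) == prob(P, A) * prob(P, B)

# Two fair dice again: sample points are (die1, die2) pairs.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
A = {w for w in P if w[0] == 3}          # die 1 lands on 3
B = {w for w in P if w[1] == 1}          # die 2 lands on 1
C = {w for w in P if w[0] + w[1] == 8}   # dice sum to 8

print(independent(P, A, B))  # True
print(independent(P, A, C))  # False: P(A ∩ C) = 1/36 but P(A)P(C) = 5/216
```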
Mutual exclusivity
Two events A and B are mutually exclusive if and only if A ∩ B = ∅. Then P(A ∩ B) = 0.
Therefore, if P(B) > 0, then P(A|B) is defined and equal to 0.
Other considerations
- If B is an event and P(B) > 0, then the function Q defined by Q(A) = P(A|B) for all events A is a probability measure; a small numerical check of this is sketched after this list.
- Many models in data mining can calculate conditional probabilities, including decision trees and Bayesian networks.
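As a sketch of the first point above, the following code checks two of the probability axioms for Q(·) = P(·|B) on the two-dice space: Q assigns total mass 1 and is additive over disjoint events. (This is an illustration, not part of the original article.)

```python
from fractions import Fraction
from itertools import product

P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
B = {w for w in P if w[1] == 1}           # conditioning event: die 2 lands on 1

def Q(A):
    """Q(A) = P(A|B) on the finite two-dice space."""
    return sum(P[w] for w in A & B) / sum(P[w] for w in B)

omega = set(P)                             # the whole sample space
E1 = {w for w in P if w[0] <= 2}           # die 1 shows 1 or 2
E2 = {w for w in P if w[0] >= 5}           # die 1 shows 5 or 6 (disjoint from E1)

print(Q(omega) == 1)                       # total mass is 1
print(Q(E1 | E2) == Q(E1) + Q(E2))         # additive over disjoint events
```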
The conditional probability fallacy
The conditional probability fallacy is the assumption that P(A|B) is approximately equal to P(B|A). The mathematician John Allen Paulos discusses this in his book Innumeracy (p. 63 et seq.), where he points out that it is a mistake often made even by doctors, lawyers, and other highly educated non-statisticians. It can be overcome by describing the data in actual numbers rather than probabilities.
The relation between P(A|B) and P(B|A) is given by Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B).
In other words, one can only assume that P(A|B) is approximately equal to P(B|A) if the prior probabilities P(A) and P(B) are also approximately equal.
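A minimal sketch of this relation (the function `bayes` is invented for illustration):

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# P(A|B) and P(B|A) coincide only when P(A) and P(B) are (approximately) equal.
print(bayes(p_b_given_a=0.99, p_a=0.01, p_b=0.0198))  # ≈ 0.5, cf. the screening example below
```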
An example
In the following constructed but realistic situation, the difference between P(A|B) and P(B|A) may be surprising, but is at the same time obvious.
In order to identify individuals having a serious disease in an early curable form, one may consider screening a large group of people. While the benefits are obvious, an argument against such screenings is the disturbance caused by false positive screening results: If a person not having the disease is incorrectly found to have it by the initial test, they will most likely be quite distressed until a more careful test shows that they do not have the disease. Even after being told they are well, their lives may be affected negatively.
The magnitude of this problem is best understood in terms of conditional probabilities.
Suppose 1% of the group suffer from the disease, and the rest are well. Choosing an individual at random,
- P(disease) = 1% = 0.01 and P(well) = 99% = 0.99.
Suppose that when the screening test is applied to a person not having the disease, there is a 1% chance of getting a false positive result, i.e.
- P(positive | well) = 1%, and P(negative | well) = 99%.
Finally, suppose that when the test is applied to a person having the disease, there is a 1% chance of a false negative result, i.e.
- P(negative | disease) = 1% and P(positive | disease) = 99%.
Now, one may calculate the following:
The fraction of individuals in the whole group who are well and test negative:
P(well ∩ negative) = P(well) × P(negative | well) = 99% × 99% = 98.01%.
The fraction of individuals in the whole group who are ill and test positive:
P(disease ∩ positive) = P(disease) × P(positive | disease) = 1% × 99% = 0.99%.
The fraction of individuals in the whole group who have false positive results:
P(well ∩ positive) = P(well) × P(positive | well) = 99% × 1% = 0.99%.
The fraction of individuals in the whole group who have false negative results:
P(disease ∩ negative) = P(disease) × P(negative | disease) = 1% × 1% = 0.01%.
Furthermore, the fraction of individuals in the whole group who test positive:
P(positive) = P(well ∩ positive) + P(disease ∩ positive) = 0.99% + 0.99% = 1.98%.
Finally, the probability that an individual actually has the disease, given that the test result is positive:
P(disease | positive) = P(disease ∩ positive) / P(positive) = 0.99% / 1.98% = 50%.
In this example, it should be easy to relate to the difference between P(positive|disease) (which is 99%) and P(disease|positive) (which is 50%): the first is the conditional probability that an individual who has the disease tests positive; the second is the conditional probability that an individual who tests positive actually has the disease. With the numbers chosen here, the last result is likely to be deemed unacceptable: half the people testing positive are actually false positives.
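The whole screening calculation can be reproduced in a few lines; the sketch below uses the article's numbers and exact fractions.

```python
from fractions import Fraction

p_disease = Fraction(1, 100)             # prevalence: P(disease) = 1%
p_well = 1 - p_disease                   # P(well) = 99%
p_pos_given_well = Fraction(1, 100)      # false positive rate: P(positive|well) = 1%
p_pos_given_disease = Fraction(99, 100)  # P(positive|disease) = 99%

# Total probability of a positive test result.
p_pos = p_pos_given_disease * p_disease + p_pos_given_well * p_well

# Bayes' theorem: P(disease|positive) = P(positive|disease) P(disease) / P(positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_pos)                # 99/5000 (= 1.98%)
print(p_disease_given_pos)  # 1/2     (= 50%)
```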
Conditioning on a random variable
There is also a concept of the conditional probability of an event given a random variable. Such a conditional probability is a random variable in its own right.
Suppose X is a random variable that can be equal either to 0 or to 1. As above, one may speak of the conditional probability of any event A given the event X = 0, and also of the conditional probability of A given the event X = 1. The former is denoted P(A|X = 0) and the latter P(A|X = 1). Now define a new random variable Y, whose value is P(A|X = 0) if X = 0 and P(A|X = 1) if X = 1. That is,
Y = P(A|X = 0) if X = 0, and Y = P(A|X = 1) if X = 1.
This new random variable is the conditional probability of the event A given the random variable X:
Y = P(A|X).
According to the law of total probability, the expected value of Y is just the marginal (or "unconditional") probability of A:
E(P(A|X)) = P(A).
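As a hedged illustration of this fact, the sketch below reuses the dice example from the introduction, taking X to be the indicator that die 1 lands on 3 and A to be the event that the dice sum to 8; it checks that E[P(A|X)] equals P(A) = 5/36.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] + o[1] == 8        # event A: the dice sum to 8
X = lambda o: 1 if o[0] == 3 else 0   # random variable X: indicator that die 1 shows 3

def p_A_given(x):
    """P(A | X = x): conditional probability of A on the outcomes where X = x."""
    matching = [o for o in outcomes if X(o) == x]
    return Fraction(sum(1 for o in matching if A(o)), len(matching))

# Law of total probability: E[P(A|X)] = sum over x of P(A|X = x) * P(X = x).
expected_Y = sum(p_A_given(x) * prob(lambda o: X(o) == x) for x in (0, 1))
print(expected_Y, prob(A))   # both 5/36
```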
More generally still, it is possible to speak of the conditional probability of an event given a sigma-algebra. See conditional expectation.
See also
- Likelihood function
- Posterior probability
- Probability theory
- Monty Hall problem
- Prosecutor's fallacy
- Conditional expectation
- Conditional probability distribution
- Bayes' Theorem