Parameters and Statistics
Parameters are the defining characteristics of a population, e.g. average household size, average income, etc. A normal distribution, for example, is defined by two parameters: its mean and its standard deviation (ref. formula for the normal distribution).
Statistics are calculated from samples and are used to estimate these parameters. However, there are also non-parametric statistics, which are used when no defining parameters are assumed for the population.
In testing of hypothesis, we are interested in
Deductive Logic
Formal logic is also called deductive logic. One of the laws in logic that is relevant to testing of hypothesis is:
If A implies B (A -> B), then not B implies not A (~B -> ~A).
["A -> B" can be read as "If A is true then B is also true".]
["~B -> ~A" can be read as "If B is false then A is also false".]
For example, if the statement "If it rains, this floor will get wet" is true, then if this floor is not wet, it is not raining.
However, if A -> B, it is not necessarily true that ~A -> ~B. In the above example, if it is not raining, the floor may still be wet (e.g. someone pours water onto it).
Furthermore, if A -> B, it is also not necessarily true that B -> A. In the above example, if the floor is wet, it does not necessarily mean that it is raining.
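The three claims above can be checked mechanically by listing every combination of true/false for A and B. The following is only a minimal sketch; the helper names implies and always_follows are made up for this illustration.

```python
from itertools import product

def implies(p, q):
    """Material implication: 'p -> q' is false only when p is true and q is false."""
    return (not p) or q

def always_follows(conclusion):
    """True if `conclusion` holds in every truth assignment where A -> B holds."""
    return all(
        conclusion(a, b)
        for a, b in product([True, False], repeat=2)
        if implies(a, b)
    )

contrapositive = always_follows(lambda a, b: implies(not b, not a))  # ~B -> ~A
inverse = always_follows(lambda a, b: implies(not a, not b))         # ~A -> ~B
converse = always_follows(lambda a, b: implies(b, a))                # B -> A

print(contrapositive, inverse, converse)
```

Only ~B -> ~A survives the check. The other two fail on the case "A false, B true", which is the "not raining, but the floor is wet" situation.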
Logic of hypothesis testing
In real life, we rarely find situations where (A -> B) can fully apply, because we are always dealing with incomplete data (such as having samples instead of studying the whole population). Hypothesis testing works out like this:
If A is true, the probability that B occurs is low; but we have observed that B occurred, therefore we conclude that A is probably false [this is called rejecting the hypothesis]; or
If A is true, the probability that B occurs is not low; so although we have observed that B occurred, we cannot conclude that A is false [this is called not rejecting the hypothesis].
Null Hypothesis
The null hypothesis is the hypothesis on which we base our calculation of the probabilities of random variables, in order to test whether a certain assertion about a parameter is correct.
For example, having observed 7 "6" in an experiment of casting a dice 20 times, we would like to test whether the dice is fair (i.e. balanced or unbiased). To test this, we start with a null hypothesis so that we can calculate probabilities:
Ho: The dice is fair, i.e. Probability of having a "6" in one throw is 1/6. [P(6) = 1/6]
Based on the above null hypothesis, we can calculate the probabilities of various events and determine whether we can reject the hypothesis.
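If the dice is fair, in casting the dice 20 times, the probabilities of observing exactly 0, 1, 2, ... "6" are:
P(0 "6") = (5/6)^20 = 0.026084
P(1 "6") = 20C1 (1/6) (5/6)^19 = 0.104336
P(2 "6") = 20C2 (1/6)^2 (5/6)^18 = 0.198239
etc.
[The running totals are P(1 or fewer "6") = 0.130420 and P(2 or fewer "6") = 0.328659.]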
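The binomial probabilities under Ho can be reproduced with a few lines of Python. This is a sketch using only the standard library; the function name binom_pmf is chosen here for illustration. It prints both the probability of exactly k "6" and the running total of k or fewer "6", since the two are easy to confuse.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 1 / 6  # 20 throws, P(6) = 1/6 under Ho
cumulative = 0.0
for k in range(3):
    prob = binom_pmf(k, n, p)
    cumulative += prob
    print(f"k={k}: P(exactly k) = {prob:.6f}, P(k or fewer) = {cumulative:.6f}")
```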
To test the hypothesis, we ask "What is the probability of observing 7 or more "6" in casting a dice 20 times?", i.e. P(7 or more "6") = 1 - P(6 or fewer "6") = 0.0371. [The reason we ask about 7 or more "6" instead of exactly 7 "6" is that we suspect the dice is biased, and if so we would expect to observe more "6". Moreover, the probability of observing exactly 7 "6", or exactly any particular number of "6", is usually very small.] Since, if Ho is true, the probability of observing 7 or more "6" is this low (< 0.05, by convention), we can reject the null hypothesis.
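The tail probability used in this test can be computed by summing the binomial terms from 7 up to 20. The sketch below assumes the conventional 0.05 cut-off mentioned above; the names binom_tail and ALPHA are illustrative.

```python
from math import comb

def binom_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), by summing the upper tail."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

ALPHA = 0.05  # conventional significance level
p_value = binom_tail(7, 20, 1 / 6)  # 7 or more "6" in 20 throws, under Ho

print(f"P(7 or more sixes in 20 throws) = {p_value:.4f}")
print("reject Ho" if p_value < ALPHA else "do not reject Ho")
```

Summing the upper tail directly avoids subtracting a cumulative probability from 1, which makes the "7 or more" reasoning easier to follow in the code.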
Alternate Hypothesis (Ha or H1)
Literally, it is the hypothesis that we will accept when the null hypothesis is rejected. In most cases, it is the hypothesis that expresses what we suspect to be wrong with Ho. In the above example, we would have:
H1: P(6) > 1/6
Type I and Type II error
| | Do not reject Ho | Reject Ho |
| Ho is true | OK | Type I error |
| Ho is false | Type II error | OK |
A Type I error is rejecting Ho when it is in fact true. Its probability is denoted α.
A Type II error is not rejecting Ho when it is in fact false. Its probability is denoted β.
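Both error probabilities can be worked out for the dice example, as a rough sketch. It assumes the decision rule "reject Ho when 7 or more "6" appear in 20 throws". The Type II error probability depends on how biased the dice really is, so the value P(6) = 1/3 used below is purely hypothetical and chosen only for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N_THROWS = 20
CRITICAL = 7      # assumed rule: reject Ho when 7 or more sixes are observed
P_NULL = 1 / 6    # P(6) if Ho is true
P_BIASED = 1 / 3  # hypothetical P(6) for a biased dice (an assumption)

# Type I error: Ho is true, but the count lands in the rejection region (7..20).
alpha = sum(binom_pmf(k, N_THROWS, P_NULL) for k in range(CRITICAL, N_THROWS + 1))

# Type II error: Ho is false, but the count stays below the rejection region (0..6).
beta = sum(binom_pmf(k, N_THROWS, P_BIASED) for k in range(CRITICAL))

print(f"Type I error probability  = {alpha:.4f}")
print(f"Type II error probability = {beta:.4f}")
```

Raising the critical count makes a Type I error less likely but a Type II error more likely, so the two cannot both be reduced without collecting more data.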