Hypothesis Testing

A statistical hypothesis test is a method of making inferences about one or more populations using sample data. It helps us handle uncertainty, minimise subjectivity and manage the risk of decision errors. For example, we can use a hypothesis test to help answer the following questions with a stated level of confidence:

  • Is process cycle time meeting customer expectations?
  • Is product quality at one factory better than the other?
  • Has the process improved or deteriorated?

Forming Hypotheses

A hypothesis test will have two opposing hypotheses, the null hypothesis and the alternative hypothesis.

The null hypothesis (H0)
  • usually states that some parameter of a population, such as the mean, is not different from a specified value or from that of another population
  • is assumed to be true unless sufficient evidence indicates the contrary
  • is not proven true; you simply fail to disprove it
The alternative hypothesis (H1)
  • states that the null hypothesis is wrong
  • can also specify the direction of the difference

Types of Hypothesis Test

The null hypothesis usually specifies that a parameter equals a specific value. Depending on what you want to find out, you can choose an appropriate alternative hypothesis from the three options as shown in the following table.

To determine if the mean is...    Alternative hypothesis
Different from test mean          Two-tailed: H1: μ ≠ test mean
Less than test mean               One-tailed (left-tailed): H1: μ < test mean
Greater than test mean            One-tailed (right-tailed): H1: μ > test mean
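
To make the three alternatives concrete, here is a minimal Python sketch that computes the p-value for each one. It assumes a one-sample z-test with a known population standard deviation, which the text does not specify; the function name z_test_p_value and the numbers (a sample mean of 9.2 against a test mean of 10) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_test_p_value(sample_mean, test_mean, sigma, n, alternative="two-tailed"):
    """P-value of a one-sample z-test (assumes sigma is known)."""
    # Standardised distance between the sample mean and the test mean.
    z = (sample_mean - test_mean) / (sigma / n ** 0.5)
    cdf = NormalDist().cdf(z)
    if alternative == "two-tailed":      # H1: mu != test mean
        return 2 * min(cdf, 1 - cdf)
    if alternative == "left-tailed":     # H1: mu < test mean
        return cdf
    if alternative == "right-tailed":    # H1: mu > test mean
        return 1 - cdf
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical data: 25 observations, sample mean 9.2, test mean 10, sigma 2.
p_two = z_test_p_value(9.2, 10.0, sigma=2.0, n=25, alternative="two-tailed")
p_left = z_test_p_value(9.2, 10.0, sigma=2.0, n=25, alternative="left-tailed")
p_right = z_test_p_value(9.2, 10.0, sigma=2.0, n=25, alternative="right-tailed")
print(p_two, p_left, p_right)
```

With the same data, the left-tailed p-value is half the two-tailed one, while the right-tailed p-value is close to 1 because the sample mean lies below the test mean.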

Results of Hypothesis Test

The following table gives a summary of possible results of any hypothesis test.

                      Decision
Truth                 Reject H0         Do not reject H0
H0 is true            Type I Error      Right Decision
H1 is true            Right Decision    Type II Error

Type I Error

A type I error occurs when H0 is wrongly rejected, that is, rejected when it is in fact true. For example, in a clinical trial of a new drug, a type I error would occur if we concluded that the new drug produced a different effect when in fact there was no difference from the existing ones. A type I error is often considered to be more serious, and therefore more important to avoid, than a type II error. The hypothesis test procedure is therefore built to ensure that there is a 'low' probability of rejecting the null hypothesis wrongly. The probability of a type I error is also known as the alpha risk.

Type II Error

A type II error occurs when H0 is false but is not rejected. In the new drug clinical trial example, a type II error would occur if it was concluded that the new drug did not produce a different effect from the existing ones when in fact it did. Such an error is most often due to a sample size too small to reveal that the null hypothesis is false. The probability of a type II error is also known as the beta risk, and the power of the test is 1 minus the beta risk.
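
The two error rates can be estimated by simulation. The sketch below is only an illustration under stated assumptions: normally distributed data, a two-tailed z-test with known standard deviation, and hypothetical values (test mean 10, standard deviation 2, sample size 25, a true mean of 11 when H0 is false). None of these names or numbers come from the text.

```python
import random
from statistics import NormalDist, mean


def rejects_h0(sample, test_mean, sigma, alpha=0.05):
    """Two-tailed z-test with known sigma: True if H0 is rejected."""
    z = (mean(sample) - test_mean) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha


def rejection_rate(true_mean, test_mean=10.0, sigma=2.0, n=25,
                   trials=2000, seed=1):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if rejects_h0(sample, test_mean, sigma):
            rejections += 1
    return rejections / trials


# H0 is true: every rejection is a type I error (estimate of alpha risk).
alpha_hat = rejection_rate(true_mean=10.0)
# H0 is false: every rejection is a right decision (estimate of power).
power_hat = rejection_rate(true_mean=11.0)
beta_hat = 1 - power_hat  # estimate of beta risk
print(alpha_hat, beta_hat, power_hat)
```

The estimated alpha risk should land near the chosen significance level of 0.05, while the beta risk depends on how far the true mean is from the test mean.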


Important Concepts


Sample Size

Depending on the level of acceptable risk and the sensitivity required of the test, an appropriate sample size must be taken. As sample size increases, beta risk falls and the power of the test increases. However, in determining the sample size, you should also consider practical limitations of cost, time and resources.
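
The effect of sample size on power can be sketched numerically. The example below assumes a two-tailed z-test with known standard deviation; the function name power_two_tailed_z and the values (a true shift of 1 unit, standard deviation 2) are hypothetical.

```python
from statistics import NormalDist


def power_two_tailed_z(shift, sigma, n, alpha=0.05):
    """Power of a two-tailed z-test when the true mean is off by `shift`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True shift expressed in standard errors of the sample mean.
    delta = shift / (sigma / n ** 0.5)
    # Probability that the test statistic falls in either rejection region.
    return (1 - std_normal.cdf(z_crit - delta)) + std_normal.cdf(-z_crit - delta)


sample_sizes = (10, 25, 50, 100)
powers = [power_two_tailed_z(shift=1.0, sigma=2.0, n=n) for n in sample_sizes]
for n, power in zip(sample_sizes, powers):
    print(f"n={n:3d}  power={power:.3f}  beta risk={1 - power:.3f}")
```

Under these assumptions, power rises steadily with n, so a target power can be used to pick the smallest sample size that is still affordable.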

Significance Level

The significance level, α, of a hypothesis test is the limit set on the probability of wrongly rejecting H0. It is set by the investigator in consideration of the consequence of such an error. If the investigator wants to avoid making a false claim, the significance level should be set low, bearing in mind that a lower significance level makes a real difference harder to detect. A typical value for the significance level is 0.05, but you can choose higher or lower values depending on the sensitivity required for the test and the consequences of incorrectly rejecting H0.

P-value

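The p-value is the probability, assuming H0 is true, of obtaining a sample result at least as extreme as the one actually observed. A small p-value means the observed data would be unlikely if H0 were true. It is often loosely described as the risk of wrongly rejecting H0, but it is not the probability that H0 is true. If the p-value is less than or equal to the significance level, you can reject H0.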

Confidence Interval

A confidence interval is a range of likely values for a population parameter (e.g. mean or standard deviation), calculated from sample data. Like a hypothesis test, it can be used to make inferences about the population. For example, if the test value is not within a 95% confidence interval, you can reject H0 at the 0.05 significance level in a two-tailed test. It can also be used to evaluate the reliability of an estimated population parameter. For example, an estimate may not be precise if the interval is very wide.
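
The link between a confidence interval and a two-tailed test can be shown with a short Python sketch. It assumes a z-based interval for the mean with known standard deviation; the function name z_confidence_interval and the data (sample mean 10.9, standard deviation 2, 25 observations, test mean 10) are hypothetical.

```python
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean (assumes sigma is known)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width


low, high = z_confidence_interval(sample_mean=10.9, sigma=2.0, n=25)
test_mean = 10.0
# Reject H0 at the 0.05 level if the test mean lies outside the 95% interval.
reject_h0 = not (low <= test_mean <= high)
print(f"95% CI: ({low:.2f}, {high:.2f}); reject H0: {reject_h0}")
```

Here the test mean of 10 falls below the lower limit, so H0 would be rejected; a wider interval (larger standard deviation or smaller sample) could have contained it.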


Logic of Hypothesis Test

Hypothesis testing starts by assuming that the null hypothesis is true. The test then determines how different the sample is from what the null hypothesis predicts. If the sample is too different to be explained by chance at the chosen significance level, then H0 is rejected.

Putting it in simple terms, hypothesis testing is about the ‘Signal’ to ‘Noise’ ratio. Signal is the change or difference we are trying to detect and Noise is the inherent variability in the system. If the ratio is small, then there is no detectable signal; it is just noise (pure chance). On the other hand, if the ratio is large, there are significant changes or differences over and above the noise. Therefore, larger signals are easier to detect, while larger noise makes signals more difficult to detect. For instance, it is easier to prove that a cycle time has reduced from 10 days to 5 days than from 10 days to 9 days. In addition, it is easier to detect changes if the variation is small (low noise).
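
The signal-to-noise idea can be written as a simple ratio. In this sketch the noise is taken to be the standard error of the sample mean, which is one common choice and an assumption here; the function name signal_to_noise, the standard deviations and the sample size of 4 are hypothetical, while the cycle times are the ones used in the text.

```python
def signal_to_noise(observed_mean, baseline_mean, sigma, n):
    """Ratio of the observed change to the standard error of the mean."""
    signal = observed_mean - baseline_mean   # change we want to detect
    noise = sigma / n ** 0.5                 # variability of the sample mean
    return signal / noise


# Cycle time falls from 10 days to 5 days: large signal.
big_change = signal_to_noise(5.0, 10.0, sigma=2.0, n=4)
# Cycle time falls from 10 days to 9 days: small signal, same noise.
small_change = signal_to_noise(9.0, 10.0, sigma=2.0, n=4)
# Same small change, but with less variation in the process.
low_noise = signal_to_noise(9.0, 10.0, sigma=0.5, n=4)
print(big_change, small_change, low_noise)
```

The larger the ratio is in absolute value, the easier the change is to detect; reducing the variation makes even the one-day change stand out.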


How to Conduct a Hypothesis Test

The steps are similar for all hypothesis tests.

  1. Establish the hypotheses (H0 and H1)
  2. Calculate the test statistic and its p-value
  3. Conclude based on the significance level. If P ≤ α, reject H0; if P > α, fail to reject H0.

Caution

Although a hypothesis test establishes statistical significance, we also need to assess practical significance by looking at the size of the difference between the sample average and the hypothesized mean. It is also important to remember that statistical significance is affected by sample size, so statistical and practical significance don’t necessarily go together.
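
A short Python sketch can illustrate how sample size alone can turn a trivial difference into a statistically significant one. It assumes a two-tailed z-test with known standard deviation; the function name two_tailed_p and the numbers (a difference of 0.05 days, standard deviation 2 days) are hypothetical.

```python
from statistics import NormalDist


def two_tailed_p(difference, sigma, n):
    """Two-tailed z-test p-value for an observed mean difference."""
    z = difference / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same 0.05-day difference, observed with two very different sample sizes.
p_small_sample = two_tailed_p(difference=0.05, sigma=2.0, n=25)
p_large_sample = two_tailed_p(difference=0.05, sigma=2.0, n=40_000)
print(f"n=25:     p = {p_small_sample:.3f}")
print(f"n=40000:  p = {p_large_sample:.2e}")
```

With 40,000 observations the difference is statistically significant at the 0.05 level, yet a change of 0.05 days may still be too small to matter in practice.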