Sampling

In most statistical methods, samples are used to study and make inferences about a population. Sampling is the process of selecting observations from a population so that, by studying the sample, inferences can be made about the population as a whole. It is important because it reduces the cost and time needed to collect data for analysis. In other words, a sample is like a ‘viewing glass’ onto the population. A sample should resemble the population if the sample size is large enough and the sampling is done in a random manner. In general, the larger the sample size, the smaller the risk of making an incorrect inference about the population.

The central limit theorem is the foundation for many statistical procedures. In the case of sampling, the theorem suggests that the distribution of sample averages tends towards a normal distribution, provided there is a large enough number of random samples and each sample contains a large enough number of observations. The key application of this theorem here is that the mean of the sample means tends to the population mean, and the variance of the sample means tends towards the population variance divided by the sample size.
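The behaviour of sample means can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50 and the 2,000 repetitions are assumed values chosen for the example, and `simulate_sample_means` is a hypothetical helper name. It draws repeated random samples from a clearly non-normal population and compares the mean and variance of the sample means with the population mean and with the population variance divided by the sample size.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Assumed population for illustration: skewed (exponential) values, not normal.
_population_rng = random.Random(42)
population = [_population_rng.expovariate(1.0) for _ in range(20_000)]

sample_size = 50
means = simulate_sample_means(population, sample_size, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)

# The two values in each pair should be close to one another.
print("population mean:         ", round(pop_mean, 3))
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("population variance / n: ", round(pop_var / sample_size, 4))
print("variance of sample means:", round(statistics.variance(means), 4))
```

Plotting a histogram of `means` should also show a roughly bell-shaped curve, even though the population itself is strongly skewed.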

How to do sampling

Samples should be taken at random, so that each observation has an equal probability of being included in the sample. The following sections cover some of the key considerations for sampling.
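As a minimal sketch of equal-probability selection, the example below draws a simple random sample without replacement using Python's standard `random.sample`. The list of call IDs and the helper name `simple_random_sample` are assumptions made up for the illustration.

```python
import random


def simple_random_sample(observations, sample_size, seed=None):
    """Pick sample_size distinct observations, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(observations, sample_size)


# Hypothetical population: 1,000 recorded calls identified by number.
call_ids = list(range(1, 1001))
sample = simple_random_sample(call_ids, sample_size=50, seed=1)

print(len(sample), "calls selected, for example:", sorted(sample)[:5])
```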

Population Definition

Before taking samples, it is important to correctly define the population of concern, based on a focused problem definition. The population should be defined around where the problem or issue is. For instance, in an opinion poll, the population should not include people who are not eligible to vote. Key considerations when defining the population include:

In which area does the problem or issue arise?
Who is affected by the problem or issue?
When is the problem or issue happening?
Where is the problem or issue coming from?

Stratification

Stratified sampling should be used if the population has a number of distinct categories. This ensures that particular groups within the population are adequately represented in the sample. Typically, the strata chosen should have means that differ substantially from one another, so that variance is minimised within strata and maximised between strata.
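One common way to carry this out is proportional allocation, where the same fraction is sampled at random from every stratum. The sketch below assumes that approach; the customer segments, the 10% fraction and the helper name `stratified_sample` are hypothetical and serve only to show that a small group still appears in the sample.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, stratum_of, fraction, seed=None):
    """Sample the same fraction from every stratum (at least one record each)."""
    rng = random.Random(seed)

    # Group the records by the stratum they belong to.
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    # Draw a simple random sample inside each stratum.
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population with three customer segments of very different sizes.
customers = (
    [("retail", i) for i in range(900)]
    + [("corporate", i) for i in range(90)]
    + [("government", i) for i in range(10)]
)

picked = stratified_sample(customers, stratum_of=lambda r: r[0], fraction=0.1, seed=7)
print(Counter(segment for segment, _ in picked))
```

With a plain random sample of 100 customers, the ten government customers could easily be missed altogether; stratifying guarantees that each segment is represented.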

Systematic Sampling

Random sampling is useful for collecting historical data. To collect new data, systematic sampling can be used. For example, when collecting data in a call centre, one in every five calls can be selected for a quality check.
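The call-centre example can be sketched as follows. The random starting point within the first interval is an assumption added here (it is a common way to avoid always picking the same position), and the call labels and the helper name `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(items, interval, seed=None):
    """Select every interval-th item, starting at a random offset below interval."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    start = random.Random(seed).randrange(interval)
    return items[start::interval]


# Hypothetical log of 100 incoming calls, in order of arrival.
calls = [f"call-{n:03d}" for n in range(1, 101)]

# One in every five calls is selected for a quality check.
checked = systematic_sample(calls, interval=5, seed=3)
print(len(checked), "calls selected, starting with", checked[0])
```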

Other considerations

It may be better to collect samples across a longer period of time than to collect a lot of samples within the same period
Unstable processes may require bigger samples
Processes with short cycle time may require frequent sampling