Random Sampling

When conducting statistical studies, it is important to have a "random" sample of subjects so as not to create a bias.

For example, you conduct a survey to determine the favorite TV program viewed by 8th grade students, but you only survey the football team. Your survey then shows the favorite TV program to be a sports program.

Your survey is biased. The students answering your survey were a "select" group of students, not a random group of students. A random group would give you a result more representative of ALL 8th graders.

Definition: A simple random sample is a subset of the statistical population in which each member of the population has an equal chance of being chosen.

Random Samples: No favoritism is shown. The selection is purely by chance.
Random sampling makes it likely (though it cannot guarantee) that the sample chosen is representative of the population,
and it ensures that the sample is selected in an unbiased way.

An example of a random sample for the TV survey mentioned above could be
surveying 8th grade students as they enter the school building (provided every student is equally likely to be asked).

Other methods of random sampling may include drawing names from a hat, assigning and drawing numbers, using random number generators, and using a random number table.
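The difference between a "select" group and a random group can be sketched with a random number generator. The example below is a minimal illustration, not real survey data: the roster of 200 students, the 40 football players, and every favorite-program value are invented assumptions. It uses Python's standard `random` module to draw a simple random sample and compares it with a survey of the football team only.

```python
import random

# Hypothetical roster: 200 eighth graders, the first 40 are on the football team.
# All values here are invented for illustration only.
population = []
for i in range(200):
    on_team = i < 40
    if on_team or i % 8 == 0:
        favorite = "sports"
    elif i % 2 == 0:
        favorite = "comedy"
    else:
        favorite = "drama"
    population.append({"id": i, "football": on_team, "favorite": favorite})


def share_of_sports(students):
    """Fraction of the given students whose favorite program is sports."""
    return sum(s["favorite"] == "sports" for s in students) / len(students)


# Biased ("select") sample: only the football team is surveyed.
biased_sample = [s for s in population if s["football"]]

# Simple random sample: every student has the same chance of being drawn.
rng = random.Random(2024)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, k=50)

print("all students:       ", share_of_sports(population))
print("football team only: ", share_of_sports(biased_sample))
print("random sample of 50:", share_of_sports(random_sample))
```

In this made-up roster, 30% of all students prefer sports programs, while the football-only survey reports 100%. The random sample will usually land much closer to the true share, although any single random draw can still miss it by chance.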


Population Data versus Sample Data

When dealing with statistical data, it is important to distinguish between
"population" data sets and "sample" data sets.

Definition: A population data set contains all members of a specified group (the entire list of possible data values). [Formulas such as variance and standard deviation divide by the count n.]
Example: The population may be "ALL people living in the US."

Definition: A sample data set contains a part, or a subset, of a population. The size of a sample is always less than the size of the population from which it is taken. [Formulas such as variance and standard deviation divide by n - 1.]
Example: The sample may be "SOME people living in the US."

When working with statistics, it is important to know if you are working with an entire population (where you have ALL of the possible data), or if you are working with only a sample (a part) of the data.
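The choice between the count n and the count n - 1 shows up most clearly in variance and standard deviation. The sketch below uses Python's standard `statistics` module, where `pvariance` and `pstdev` divide by n (population) and `variance` and `stdev` divide by n - 1 (sample). The four exam scores are hypothetical values chosen only to keep the arithmetic easy.

```python
import statistics

scores = [70, 80, 90, 100]  # hypothetical exam scores, for illustration only

# Treating the scores as an ENTIRE population: divide by n.
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

# Treating the scores as a SAMPLE of a larger population: divide by n - 1.
samp_var = statistics.variance(scores)
samp_sd = statistics.stdev(scores)

print("population variance:", pop_var, " standard deviation:", round(pop_sd, 2))
print("sample variance:    ", round(samp_var, 2), " standard deviation:", round(samp_sd, 2))
```

For these scores the mean is 85 and the squared deviations sum to 500, so the population variance is 500/4 = 125 and the sample variance is 500/3, or about 166.67. The sample versions are always slightly larger for the same data.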

Let's take a look at identifying "sample data" versus "population data".
Directions: For the following problems, decide if the situation is dealing with a "population" data set, or with a "sample" data set. Explain your decision.

1. Mrs. Smith wants to do a statistical analysis on students' final examination scores in her math class for the past year. Should she consider her data to be a population data set or a sample data set?

2. A group of students surveys 100 students from their freshman class to determine the number of pets in each student's household. The group plans to compute statistical findings on their data and generalize these findings to the homes of all freshman students. Should the group consider their data to be a population data set or a sample data set?


Use "population" when:
1. you know you have the ENTIRE population.
2. you have a sample of a larger population, but you are only interested in this sample (and you will not be generalizing your findings to the entire larger population).



Use "sample" when:
1. you have a sample of a larger population, and you wish to generalize your findings from this sample to the entire larger population from which this sample was taken. The sample will be used as an estimate of the population.

Hint:
Some questions will clearly state whether you are working with a population or a sample. If no statement is present, ask yourself if the statistical findings will be used to describe a larger group.
If the answer is yes, you are working with a sample.
Real-world statisticians primarily work with sample situations,
since real-world populations can be overwhelmingly large.



Avoiding Biased Data

Statistical bias happens when "favoritism" in the data collection process
(or the reporting process) occurs, resulting in misleading results.

When dealing with data, it is possible that different statistical studies,
concerning the same issue, can arrive at very different results.

For example, one study showed that students who reviewed for the SAT examination for five or more hours scored in the top 10% of the students taking the test in the spring of 2017. A second study, also relating to the spring of 2017, showed that reviewing for the SAT examination did not result in scores in the top 10%.

How is this possible? Shouldn't statistics always arrive at the same results concerning the same issue? Not necessarily.

Statistics can be influenced by a multitude of factors. In the case of the SAT examination, it may be the case that the populations used in the studies were different. It may have been the case that the students participating in the first study were all honor students, whereas the students in the second study were students of varying ability levels.

The manner in which statistics are reported may accidentally (or intentionally) support a specific desired result. For example, a drug company publishes in a magazine a study that found positive results from the use of their knee-joint supplement, but does not publish a study that found negative, or even dangerous, results from use of the same supplement. Consumer beware!



Due to these influencing factors, it is important to understand how to avoid bias when using statistical data in your research, as well as how to recognize research which may be exhibiting bias.

Here are a few of the questions to ask yourself when dealing with data:

1. Who is collecting the data?
Does the group collecting the data have an interest in the final results? For example, a drug company funding research on the safety of their product may result in unreliable findings, due to a conflict of interest. The research should have been carried out by a third party that was not connected to, or paid by, the drug company.

2. How was the data collected?
Was the sample (the group used for the study) truly representative of the population (the much larger group to whom the findings are directed)? Was the sample group chosen at random? Statistical bias can be reduced by using random statistical samples. For example, 300 people, upon exiting the last Twilight Saga movie, were asked to respond to a survey about the types of literature enjoyed by Americans. Only 50 people completed the survey. Since only 50 of the 300 people responded (about 17%), this was an insufficient number to be a representative sample, and those who responded may differ from those who declined. In addition, since it may be the case that the majority of the Twilight Saga movie-goers were teenagers, the sample may have already been biased toward a particular genre of literature.

3. Were reliable measuring instruments used?
If measuring devices were used in the collection of the data, were the devices reliable, accurate and appropriate for the study? For example, a digital scale used for measuring weights may not be sufficiently accurate for items of extremely small weight, thus yielding unreliable data for certain items in the sample.

4. What was the sample size of the study?
How many people, or items, were studied? Was there a sufficient number of people (items) in the study to generalize the findings to the entire targeted population? For example, a survey asks high school students in the French club whether they drink black coffee. Since there are only 15 students in the French club, there are too few participants to generalize the survey's findings to all high school students.
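One rough way to see why 15 participants is too few is to estimate how far a sample percentage could be from the true percentage. The sketch below is not part of the original lesson: it uses the common textbook approximation for the 95% margin of error of a proportion, z * sqrt(p(1 - p) / n) with z = 1.96, and assumes a simple random sample and the worst case p = 0.5. The function name `margin_of_error` is a hypothetical helper, and the approximation is crude for very small n.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n (an assumption the
    French club survey would not satisfy) and uses the normal
    approximation, which is only rough for small n.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (15, 100, 1000):
    # Express the margin of error as a percentage of respondents.
    print(f"n = {n:4d}: about +/- {100 * margin_of_error(n):.1f} percentage points")
```

With 15 participants the estimate could be off by roughly 25 percentage points, compared with roughly 3 points for 1000 participants. A large sample alone does not remove bias, though: surveying only French club members would remain unrepresentative of all high school students no matter how many of them were asked.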

5. When did the study take place?
Is the study (and the data) current, or did it occur decades ago? It may be the case that developments since the date of the survey will contradict its findings. For example, decades ago, it was felt that second-hand cigarette smoke was not harmful. Further research concluded that this was not true, and that second-hand cigarette smoke contains cancer-causing agents and is unsafe. Several laws now restrict smoking in public places.

Statistics analyze data to discover the truth. The truth, however, may be based upon the context of the data, the size of the sample, and the conditions under which the data were collected and reported. It is best to be skeptical (or at least, keep an open mind) when using and reading statistical data.

