Objectives for Unit Three
Percentiles, Percentile Ranks and Central Tendency
1. Know the meaning of, use for,
and recognize examples of a percentage, percentile, and percentile rank.
A percentage is a number expressed as a percent of
a standard. The standard is either "perfect" or "everyone."
Examples include "scoring 84% on a test (84% of perfect)" and "64%
passed the test (64% of the students)." Percentages are used to describe
individual cases (84% on a test) or the distribution as a whole (64% of
students).
A percentile is a point that divides a distribution. For example, the 50th percentile divides a distribution into the top and bottom halves. Percentiles are used to describe characteristics of distributions. For example, reporting the 50th percentile of a distribution gives a measure of central tendency (the median).
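The idea of a percentile as a cut point can be sketched in Python. This is a minimal sketch under an assumption: `percentile` is a hypothetical helper that uses linear interpolation between the two nearest ranked scores, which is only one of several common conventions, so other software may return slightly different cut points.

```python
def percentile(scores, p):
    """Return the p-th percentile (0 <= p <= 100) of a list of scores.

    Assumed convention: linear interpolation between the two nearest
    ranked scores. Other conventions give slightly different values.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ranked = sorted(scores)
    position = (len(ranked) - 1) * p / 100  # 0-based fractional position
    lower = int(position)
    upper = min(lower + 1, len(ranked) - 1)
    fraction = position - lower
    return ranked[lower] + (ranked[upper] - ranked[lower]) * fraction


scores = [40, 45, 48, 52, 60]
print(percentile(scores, 50))  # 48.0, the median of these five scores
```

With an even number of scores, the 50th percentile falls halfway between the two middle scores, matching the usual definition of the median.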
A percentile rank is a number that indicates the percentage of cases in a distribution that fall below a given score. Percentile ranks are used to describe individual cases. For example, standardized test scores are usually reported as percentile ranks. When you take the Graduate Record Examination, you get a percentile rank for each part of the test. A percentile rank of 99 indicates that 99% of the persons in the reference group (norm group) scored below the score that you received.
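A percentile rank can be computed directly from the definition above: count the reference-group scores below a given score and express the count as a percentage. This is a minimal sketch; `percentile_rank` is a hypothetical helper, and it counts only scores strictly below (some testing programs instead add half of the tied scores).

```python
def percentile_rank(score, reference_scores):
    """Percentage of reference-group scores strictly below `score`."""
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    below = sum(1 for s in reference_scores if s < score)
    return 100 * below / len(reference_scores)


norm_group = list(range(1, 101))       # hypothetical norm group: scores 1..100
print(percentile_rank(100, norm_group))  # 99.0: 99% of the group scored below 100
```

For reporting, the result would then be rounded to an integer as described in objective 4.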
2. Know the reference points
needed to interpret a percentage, percentile, and percentile rank.
To interpret a percentage, it is
frequently helpful to know the maximum possible value or the number
of cases (subjects) used. For example, if you get heads 100% of
the time when you flip a coin, it is helpful to know whether you flipped
it 1 time or 100 times. If you scored 100% on a test, it is helpful to know
whether there was 1 question or 100 questions.
To interpret a percentile, it is helpful to know the maximum possible score. For example, if the 50th percentile on a test is 48, it would be helpful to know whether the maximum possible score is 50 or 100.
To interpret a percentile rank, it is essential to know the characteristics of the reference group to which the score is being compared. For example, if you were a senior math major in college, receiving a percentile rank of 99 (top of the group) on a standardized college math test would be impressive if the group to which you were compared were other senior math majors, but much less impressive if the group were college seniors in general.
3. Know the measurement scale
used for a percentage, percentile, and percentile rank and the types of
statistical analysis appropriate for each.
Percentages are measured on a ratio
scale no matter what the scale of the original variable is. If you said
that 30% of a group is male and 70% female, the percentages are ratio even
though gender is nominal (30% is half as much as 60%). Percentages have a
meaningful zero, which means no cases. There are no limitations on using
percentages in statistical analyses.
Percentiles are measured on the same scale as the original variable. They are not normally used in computing other statistics.
Percentile ranks are reported on an ordinal scale, which means that they are not appropriate for most statistical analyses. You should not compute the mean of a set of percentile ranks or a correlation between two sets of percentile ranks. Percentile ranks are normally used only to describe individual cases, not group characteristics.
4. Know the values that are appropriate
for percentile ranks.
Percentile ranks from 1 to 99 are
always reported as integers (no decimals). Occasionally values below 1
and above 99 are reported with decimal points. Values of 0 and 100 are
not used.
5. Know the meaning of a quartile.
There are three quartiles (1st,
2nd, and 3rd) that divide a distribution into quarters (top 25% of cases,
next 25%, etc.). The 1st quartile is the 25th percentile, the 2nd quartile
is the 50th percentile (the median), and the 3rd quartile is the 75th
percentile.
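Because quartiles are just the 25th, 50th, and 75th percentiles, they can be obtained with the standard library's `statistics.quantiles` (Python 3.8 or later). This is a sketch under one assumption: `method="inclusive"` (linear interpolation) is chosen here for illustration, and other interpolation methods give slightly different cut points. The helper name `quartiles` is hypothetical.

```python
import statistics


def quartiles(scores):
    """Return (Q1, Q2, Q3): the 25th, 50th and 75th percentiles.

    Uses statistics.quantiles with method="inclusive" (linear
    interpolation); the default "exclusive" method can differ slightly.
    """
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q1, q2, q3


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles(scores))          # (3.0, 5.0, 7.0)
print(statistics.median(scores))  # 5, the same point as Q2
```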
6. Know the meaning of deviation
score.
A deviation score is the distance
of a score from some point. If no indication is given of what the reference
point is, it is assumed to be the mean of the distribution.
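A deviation score is a simple subtraction, so a short sketch makes the default reference point explicit. `deviation_scores` is a hypothetical helper; as in the text, it uses the mean when no reference point is given.

```python
def deviation_scores(scores, reference=None):
    """Distance of each score from a reference point (default: the mean)."""
    if not scores:
        raise ValueError("scores must not be empty")
    if reference is None:
        reference = sum(scores) / len(scores)
    return [x - reference for x in scores]


print(deviation_scores([2, 4, 6, 8]))               # [-3.0, -1.0, 1.0, 3.0], around the mean 5.0
print(deviation_scores([2, 4, 6, 8], reference=4))  # [-2, 0, 2, 4]
```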
7. Know the characteristics of
the mean, median, and mode.
The mean is the point at the "mathematical
center" or "balance point" of the distribution. It usually does not correspond
to an actual score. The sum of the deviations around the mean equals zero.
The sum of the squared deviations around the mean is a minimum (less than
around any other point).
The median is the point dividing the distribution in half. It frequently does not correspond to an actual score in the distribution. The median is the point around which the sum of the absolute values of the deviations is a minimum (less than any other point).
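The minimum properties of the mean and median described above can be checked numerically. This sketch uses a hypothetical set of five scores and searches a grid of candidate points (an illustration, not a proof): the point with the smallest sum of squared deviations is the mean, and the point with the smallest sum of absolute deviations is the median.

```python
import statistics

scores = [1, 2, 2, 3, 9]            # hypothetical raw scores
mean = statistics.mean(scores)      # 3.4
median = statistics.median(scores)  # 2


def sum_squared_deviations(point):
    return sum((x - point) ** 2 for x in scores)


def sum_absolute_deviations(point):
    return sum(abs(x - point) for x in scores)


# Deviations around the mean sum to zero (apart from floating-point rounding).
deviation_total = sum(x - mean for x in scores)

# Search candidate points from 1.0 to 9.0 in steps of 0.1.
candidates = [i / 10 for i in range(10, 91)]
best_for_squares = min(candidates, key=sum_squared_deviations)     # 3.4, the mean
best_for_absolutes = min(candidates, key=sum_absolute_deviations)  # 2.0, the median
```

The extreme score of 9 pulls the squared-deviation minimum (the mean) upward, while the absolute-deviation minimum (the median) stays at 2.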
The mode is the score that is the most common (highest frequency) in the distribution. If two or more scores share the highest frequency, there are two or more modes (bimodal, trimodal, etc. distributions).
8. Know how to compute the mean
from raw scores.
The mean is computed by summing
the scores and dividing by N, the number of scores.
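The computation is a direct translation of "sum the scores and divide by N." `mean_of` is a hypothetical helper name used only for this sketch.

```python
def mean_of(scores):
    """Sum the scores and divide by N, the number of scores."""
    if not scores:
        raise ValueError("cannot take the mean of an empty list")
    return sum(scores) / len(scores)


print(mean_of([84, 90, 75, 91]))  # 85.0
```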
9. Know how to compute the median
from raw scores and from a frequency distribution.
If there is an odd number of scores,
the median is the middle score when the scores are ranked from highest
to lowest. If there is an even number of scores, the median is halfway
between the middle two scores. For example, if there are 11 scores, the
median is the 6th score from the bottom or top of the distribution; if
there are 10 scores, the median is halfway between the 5th and 6th scores
from the bottom or top of the distribution. In the following frequency
distribution, which has 17 scores, the median is the 9th score from the
bottom (or top). It is a 2: the eight 1s occupy positions 1 through 8, so
the 9th and 10th scores from the bottom are both 2.
X (score)    f (frequency)
4            3
3            4
2            2
1            8
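The two rules above (middle score for odd N, halfway between the middle two for even N) can be sketched as follows. The helper names are hypothetical, and the frequency-distribution version assumes discrete scores: it simply expands the table back into ranked raw scores rather than using a grouped-data interpolation formula.

```python
def median_from_raw(scores):
    """Median by the odd/even rule applied to ranked scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]                      # e.g. 11 scores: the 6th score
    return (ranked[middle - 1] + ranked[middle]) / 2  # e.g. 10 scores: 5th and 6th


def median_from_frequency(freq):
    """Median of a frequency distribution given as {score: frequency}."""
    # Expand the table into ranked raw scores, then reuse the raw-score rule.
    scores = [x for x, f in sorted(freq.items()) for _ in range(f)]
    return median_from_raw(scores)


table = {4: 3, 3: 4, 2: 2, 1: 8}    # the frequency distribution above (N = 17)
print(median_from_frequency(table))  # 2
```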
10. Know how to compute the mode
from raw scores and from a frequency distribution.
The mode is the score that occurs
most frequently. In a frequency distribution, it is the score
with the largest frequency. In the example in the previous objective, the
mode is 1, since there are eight 1s, which is the most common score.
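Finding the mode means finding the largest frequency and returning every score that has it, which also covers bimodal and trimodal distributions. The helper names are hypothetical; `collections.Counter` builds the frequency table from raw scores.

```python
from collections import Counter


def modes_from_frequency(freq):
    """Return every score whose frequency equals the largest frequency."""
    if not freq:
        raise ValueError("frequency table must not be empty")
    highest = max(freq.values())
    return sorted(x for x, f in freq.items() if f == highest)


def modes_from_raw(scores):
    """Tally the raw scores into a frequency table, then find the modes."""
    return modes_from_frequency(Counter(scores))


print(modes_from_frequency({4: 3, 3: 4, 2: 2, 1: 8}))  # [1]
print(modes_from_raw([1, 2, 2, 3, 3]))                 # [2, 3], a bimodal distribution
```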
11. Know situations when the
mean, median, and mode are preferred.
The mean is preferred when a precise
measure of the group is desired and every score is important. It is usually
the best description of the total group for research purposes. In most
populations the mean, median, and mode are similar (and in a normal
distribution they are identical), which is another reason the mean is
the preferred statistic for research. It is not appropriate to use the
mean when there are extreme cases in the distribution that should not
influence the description.
The median is preferred in this case (when there are extreme scores to be ignored). The median is not as sensitive to changes in scores as the mean and therefore is not as good for a precise description of the group. The median is a better indicator of the "typical" person in the group, since half of the scores are above it and half are below it.
The mode is only preferred when the most common score is desired, when there is more than one mode, or with nominal data when the most common category is desired.
12. Know the effect changes in
scores or extreme scores have on the mean, median and mode.
If one score is added to a distribution,
the mean will change unless the new score is equal to the mean. If the
new score is an extreme score, the mean may change a great deal. The median
may change, but only slightly, no matter what the new score is. The
mode will probably not change unless the new score is a value whose
frequency is equal to, or one less than, the frequency of the existing mode.
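The effects described above can be seen by adding one score to a small, hypothetical distribution and recomputing all three measures with the standard library (`statistics.multimode` requires Python 3.8 or later). `describe` is a hypothetical helper for this sketch.

```python
import statistics


def describe(scores):
    """Return (mean, median, list of modes) for a list of scores."""
    return (statistics.mean(scores),
            statistics.median(scores),
            statistics.multimode(scores))


scores = [2, 3, 3, 4, 5, 7]  # hypothetical: mean 4, median 3.5, mode 3

original = describe(scores)
with_extreme = describe(scores + [60])  # mean jumps to 12; median moves to 4; mode stays 3
with_mean = describe(scores + [4])      # mean stays 4; median 4; modes become 3 and 4
```

Adding an extreme score triples the mean but moves the median only half a point. Adding a 4, whose frequency was one less than that of the mode, leaves the mean unchanged and turns the distribution bimodal.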
13. Know the stability of the
mean, median and mode.
The mean is the most stable of
the three sample measures for estimating the population parameter.
If means, medians, and modes were computed repeatedly from small
samples drawn from a larger population of scores, the variability of the
sample means would be smaller than the variability of the medians
or the variability of the modes.
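The repeated-sampling idea above can be simulated. This is a sketch under stated assumptions: the population is hypothetical (10,000 integer scores drawn from a roughly normal distribution with mean 50 and standard deviation 10), each sample has 15 scores, and `sampling_spread` is a hypothetical helper. It relies on Python 3.8 or later, where `statistics.mode` returns the first mode encountered instead of raising an error on ties.

```python
import random
import statistics


def sampling_spread(reps=2000, n=15, seed=1):
    """Standard deviations of sample means, medians and modes over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Hypothetical population: integer scores, roughly normal (mean 50, SD 10).
    population = [round(rng.gauss(50, 10)) for _ in range(10_000)]
    means, medians, modes = [], [], []
    for _ in range(reps):
        sample = rng.sample(population, n)
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
        modes.append(statistics.mode(sample))  # first mode if there are ties
    return (statistics.stdev(means),
            statistics.stdev(medians),
            statistics.stdev(modes))


sd_means, sd_medians, sd_modes = sampling_spread()
```

In a run like this, the spread of the sample means should come out smallest and the spread of the sample modes largest, which is what "most stable" means here.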
14. Know the symbols for population
and sample means.
The sample mean is called "bar-X"
and is a capital X with a bar over it. The population mean is called "mu"
and is the Greek letter M (µ).
15. Know the value of the term
"average."
The term "average" is not a term
used in statistics or in research. It is a non-professional term used to
indicate central tendency. The mean, median, or mode may each be the best
indicator of central tendency ("average"), but each means something
different and each has disadvantages. The statistic used must therefore be
named in research communication, and the word "average" should not be used.
16. Know how to use the mean
and median to estimate the shape of a distribution.
If the mean is much higher or lower
than the median, it suggests either a few extreme scores or a skewed distribution.
The mean is above the median in a positively skewed distribution and below
the median in a negatively skewed distribution, because the extreme scores in the
tail of the distribution pull the mean in that direction. The length of
the tail has no effect on the median.
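Comparing the mean with the median gives a quick hint about the direction of skew. This is a heuristic sketch, not a formal test of skewness: `skew_hint` and its `tolerance` parameter (how far apart, in score units, the two must be before a skew is suggested) are hypothetical.

```python
import statistics


def skew_hint(scores, tolerance=0.0):
    """Suggest the direction of skew by comparing the mean with the median."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean - median > tolerance:
        return "positive"  # a long right tail pulls the mean up
    if median - mean > tolerance:
        return "negative"  # a long left tail pulls the mean down
    return "none suggested"


short_tail = [1, 2, 2, 3, 3, 3, 4, 20]
long_tail = [1, 2, 2, 3, 3, 3, 4, 200]
print(skew_hint(short_tail))  # positive (mean 4.75, median 3.0)
print(statistics.median(short_tail), statistics.median(long_tail))  # 3.0 3.0
```

Stretching the tail from 20 to 200 moves the mean from 4.75 to 27.25 but leaves the median at 3.0.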
17. Be able to estimate the mean
and median from a general description of scores.
If no information is known about
the shape of the distribution, the median is best estimated by taking a
point halfway between the highest and lowest point, excluding all extreme
points.
The mean is then adjusted up or down from this point based on whether most scores lie above or below the estimated median and where the extreme scores are located.
18. Know the meaning and uses
for unweighted and weighted means.
Unweighted and weighted means are
means of groups rather than means of cases. An unweighted mean of two or
more groups is the mean of the group means, ignoring the number of subjects
in each group. The weighted mean of two or more groups is equivalent to the
mean that would be computed if group membership were ignored and the mean
of all the cases were computed. In computing the weighted mean, the number
of cases in each group weights that group's mean when the group means
are combined.
The unweighted mean is useful when each group is equally important (the number of cases in the groups is not related to the importance of the groups). The weighted mean is useful when each case is equally important and each group is not equally important.
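The difference between the two kinds of group means can be sketched with two hypothetical groups: 10 cases with a mean of 80 and 30 cases with a mean of 60. The helper names are hypothetical, and each group is represented as an assumed `(n, group_mean)` pair.

```python
def unweighted_mean(groups):
    """Mean of the group means; group sizes are ignored."""
    return sum(m for _, m in groups) / len(groups)


def weighted_mean(groups):
    """Each group mean weighted by its number of cases."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


groups = [(10, 80.0), (30, 60.0)]  # hypothetical (n, group mean) pairs
print(unweighted_mean(groups))     # 70.0
print(weighted_mean(groups))       # 65.0

# The weighted mean equals the mean of all cases with group membership ignored.
all_cases = [80.0] * 10 + [60.0] * 30
print(sum(all_cases) / len(all_cases))  # 65.0
```

The weighted mean (65) lies closer to the mean of the larger group (60) than the unweighted mean (70) does.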
19. Know how to estimate unweighted
and weighted means.
The unweighted mean is estimated
in the same way as the mean is estimated. It is simply the mean of the
group means. The sample size of each group is ignored.
To estimate the weighted mean, first
the unweighted mean is estimated and then an adjustment is made, taking
into consideration the size of the groups. The means of the larger groups
are more important (the weighted mean will be closer to the mean of the
larger groups than the smaller groups).