Outliers

- They’re extreme values that don’t really fit the pattern
- The rules for what counts as an outlier are given to you in the question
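- It’s usually a multiple of the inter-quartile range (IQR) beyond the quartiles, e.g. anything below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR
- On box plots, if outliers have been defined, they are marked as crosses, and the box’s whiskers only go out as far as the outlier boundaries (some conventions stop the whiskers at the most extreme value that isn’t an outlier instead)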
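
Here’s a rough Python sketch of how an outlier rule like that could be checked. The multiplier of 1.5 is only an assumption (use whatever the question gives you), the function names are made up for illustration, and `method="inclusive"` is just one of several quartile conventions, so it may not match the one in your textbook.

```python
from statistics import quantiles

def outlier_fences(data, k=1.5):
    """Return (lower, upper) boundaries: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is an assumed multiplier; the exam question normally states it.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, k=1.5):
    """List every value lying outside the outlier boundaries."""
    low, high = outlier_fences(data, k)
    return [x for x in data if x < low or x > high]

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]
print(outlier_fences(data))  # boundaries the whiskers must stay inside
print(find_outliers(data))   # values that would be marked as crosses
```

With this data the quartiles come out as 5 and 9, so the boundaries are -1 and 15 and only the 30 would be marked as a cross.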

Box (and whisker) Plots

- Box Plots illustrate the position of the quartiles of some data
- They only have an x-axis… there’s no y-axis
- They look like a box, divided in two by a line somewhere
- They also have whiskers, which extend out to the highest and lowest values in the data set (or, if there are outliers… see above)
- To draw one, mark the lower quartile, median and upper quartile as vertical lines on a suitable horizontal scale and join the two quartile lines to make the box; then mark the ends of the whiskers and draw horizontal lines out to them from the edges of the box
- Drawing two box plots on the same scale is a great way to compare the quartiles, range and inter-quartile range of two sets of data
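
As a sketch of what you’d actually be comparing, the snippet below collects the values that go on a box plot for two data sets. The data, the names `class_a` / `class_b` and the `box_plot_summary` helper are all invented for illustration, no outlier rule is applied, and the quartiles use Python’s "inclusive" convention, which is an assumption that may differ from your textbook’s method.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Collect the values you would mark on a box plot (no outlier rule applied)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }

# Hypothetical data sets, to be drawn on the same horizontal scale
class_a = [1, 3, 5, 7, 9]
class_b = [2, 4, 5, 6, 20]

for name, scores in (("A", class_a), ("B", class_b)):
    print(name, box_plot_summary(scores))
```

Here both sets have the same median, but B has the bigger range (because of the 20) and the smaller inter-quartile range, which is exactly the kind of comparison two box plots on one scale make obvious.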

Skew

- Skew is… hard to explain
- It’s how tipped / biased a set of data is towards higher or lower numbers
- It’s easy to visualise on a histogram or a box plot
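- If the median is closer to the lower quartile, the data is positively skewed (most of the data is bunched at the low end, with a long tail stretching towards the high end)
- If the median is closer to the upper quartile, the data is negatively skewed (most of the data is bunched at the high end, with a long tail stretching towards the low end)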
- Of course, it is possible for data to be completely symmetrical and have no skew at all
- If you’re asked which measures of location and dispersion it would be best to use for a data set, it’s best to say the median and inter-quartile range if the data is skewed (extreme values affect them much less), and the mean and standard deviation if the data isn’t skewed

How To Tell Skew

- You can get a clue from the relative positions of the averages…
- If mean < median < mode, you’ve got negative skew
- If mode < median < mean, you’ve got positive skew
- The median is usually the middle average: if the mode is on one side of it, the mean is normally on the other side, and vice versa
- If Q3 – Q2 > Q2 – Q1, you’ve got positive skew
- If Q2 – Q1 > Q3 – Q2, you’ve got negative skew
- There’s also another method of working out skew, that a question will sometimes ask you to use:
- (Mean – Median) ÷ Standard deviation (sometimes the mode is used in place of the median, and some versions multiply the top by 3)
- Don’t memorize it, though, because the question will provide it for you (from what I’ve seen in past papers, anyway…)
- A result of 0 means no skew, a positive result means positive skew and a negative result means negative skew; the further from 0, the stronger the skew (yeah, this can go outside the range of -1 to 1, unlike the correlation coefficient r)
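
The quartile test and the formula above can be sketched in Python like this. The function names and data are made up for illustration, and using the population standard deviation (`pstdev`) with no multiplier on the top is an assumption; go with whatever version the question hands you.

```python
from statistics import mean, median, pstdev

def quartile_skew(q1, q2, q3):
    """Compare the gaps either side of the median (Q2)."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    if upper_gap > lower_gap:
        return "positive"
    if lower_gap > upper_gap:
        return "negative"
    return "none"

def skew_coefficient(data):
    """(mean - median) / standard deviation.

    Assumes the population standard deviation and no factor of 3.
    Fails with a division by zero if every value is the same.
    """
    return (mean(data) - median(data)) / pstdev(data)

print(quartile_skew(10, 12, 20))           # median nearer the lower quartile
print(quartile_skew(10, 18, 20))           # median nearer the upper quartile
print(skew_coefficient([1, 2, 3, 4, 10]))  # mean 4 is above median 3
print(skew_coefficient([1, 2, 3, 4, 5]))   # symmetrical data
```

The first two lines print "positive" and "negative"; the third gives a positive coefficient (about 0.32) because the 10 drags the mean above the median, and the last gives 0 because the data is symmetrical.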

