Skew, Box Plots and Outliers – AS Maths Revision – Statistics (S1)

Outliers

  • They’re extreme values that don’t really fit the pattern
  • The rules for what counts as an outlier are given to you in the question
  • Usually a value counts as an outlier if it lies more than a certain multiple of the inter-quartile range (often 1.5 × IQR) below the lower quartile or above the upper quartile
  • On box plots, if outliers have been defined, they are marked as crosses, and the box’s whiskers only go out to the outlier boundaries
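
As a rough illustration of the kind of rule a question might give, here is a minimal Python sketch. It assumes the rule is "more than k × IQR outside the quartiles" with k = 1.5, and that the quartiles have already been worked out by whatever method your exam board uses. The function names and the sample data are made up for this example.

```python
def outlier_bounds(q1, q3, k=1.5):
    """Return the (lower, upper) outlier boundaries.

    Assumes the rule "more than k inter-quartile ranges outside
    the quartiles"; the real rule comes from the question.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """List every value that falls outside the outlier boundaries."""
    low, high = outlier_bounds(q1, q3, k)
    return [x for x in data if x < low or x > high]


# Made-up data; the quartiles 17 and 23 are taken as given.
data = [2, 15, 17, 18, 20, 21, 23, 24, 45]
print(outlier_bounds(17, 23))
print(find_outliers(data, 17, 23))
```

If the question uses a different multiple (or a different rule altogether), only `k` or the two boundary formulas need to change.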

Box (and whisker) Plots

  • Box Plots illustrate the position of the quartiles of some data
  • They only have an x-axis… there’s no y-axis
  • They look like a box, divided in two by a line somewhere
  • They also have whiskers, which extend out to the highest and lowest values in the data set (or, if outliers have been defined, only as far as the outlier boundaries, as described above)
  • To draw one, plot the lower quartile, median and upper quartile as vertical lines on a suitable horizontal scale and join the quartile lines up into a box, then mark where the whiskers end and draw horizontal lines out to those points from the edges of the box
  • Drawing two box plots on the same scale is a great way to compare the quartiles, range and inter-quartile range of two sets of data
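
The values you need before drawing a box plot can be collected like this. It is only a sketch: it assumes the 1.5 × IQR outlier rule, takes the quartiles as already calculated, and follows the convention described above of stopping the whiskers at the outlier boundaries (some textbooks stop them at the most extreme value that isn't an outlier instead). The function name and data are hypothetical.

```python
def box_plot_values(data, q1, median, q3, k=1.5):
    """Collect the numbers needed to draw a box plot by hand.

    Assumption: a whisker stops at the outlier boundary when there
    are outliers on that side, otherwise at the extreme data value.
    """
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {
        "whisker_low": max(min(data), low),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": min(max(data), high),
        "outliers": [x for x in data if x < low or x > high],
    }


# Made-up data sets sharing the same quartiles, for comparison.
with_outliers = [2, 15, 17, 18, 20, 21, 23, 24, 45]
no_outliers = [10, 15, 17, 18, 20, 21, 23, 24, 30]
print(box_plot_values(with_outliers, 17, 20, 23))
print(box_plot_values(no_outliers, 17, 20, 23))
```

Running it on two data sets side by side gives the same comparison of quartiles, range and inter-quartile range that two box plots on one scale would show.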

Skew

  • Skew is… hard to explain
  • It’s how lopsided a set of data is: whether the values trail off further towards the higher or the lower numbers
  • It’s easy to visualise on a histogram or a box plot
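  • If the median is closer to the lower quartile, the data is positively skewed (most of the data bunches up at the low end, with a long tail stretching towards the high end)
  • If the median is closer to the upper quartile, the data is negatively skewed (most of the data bunches up at the high end, with a long tail stretching towards the low end)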
  • Of course, it is possible for data to be completely symmetrical and have no skew at all
  • If you’re asked which measures of location and dispersion it would be best to use for a data set, say the median and inter-quartile range if the data is skewed (they aren’t dragged around by extreme values), and the mean and standard deviation if the data isn’t skewed
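
The quartile comparison above boils down to seeing which side of the box is wider. A minimal sketch, with a made-up function name and made-up numbers:

```python
def quartile_skew(q1, q2, q3):
    """Classify skew by comparing the two halves of the box.

    q2 is the median. A wider upper half (median nearer Q1) means
    positive skew; a wider lower half means negative skew.
    """
    upper = q3 - q2
    lower = q2 - q1
    if upper > lower:
        return "positive"
    if lower > upper:
        return "negative"
    return "symmetrical"


print(quartile_skew(10, 12, 20))  # median close to the lower quartile
print(quartile_skew(10, 18, 20))  # median close to the upper quartile
print(quartile_skew(17, 20, 23))  # median exactly in the middle
```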

How To Tell Skew

  • You can get a clue from the relative positions of the averages…
  • If mean < median < mode, you’ve got negative skew
  • If mode < median < mean, you’ve got positive skew
  • The median usually sits between the other two averages: if the mode is on one side of it, the mean tends to be on the other side, and vice versa (treat this as a rule of thumb rather than a guarantee)
  • If Q3 – Q2 > Q2 – Q1, you’ve got positive skew
  • If Q2 – Q1 > Q3 – Q2, you’ve got negative skew
  • There’s also another method of working out skew, which a question will sometimes ask you to use:
  • (Mean – Median) ÷ standard deviation (the mode is sometimes used in place of the median, and some versions multiply the top by 3)
  • Don’t memorize it, though, because the question will provide it for you (from what I’ve seen in past papers, anyway…)
  • A result of 0 means no skew, positive results mean positive skew and negative results mean negative skew; the further from 0, the stronger the skew (and yes, this can go outside the range of -1 to 1, unlike the correlation coefficient r)
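
Here is a small sketch of the (mean – median) ÷ standard deviation calculation. It assumes the divide-by-n form of the standard deviation (`statistics.pstdev`) and uses made-up data; always use whichever formula the question actually gives you. The function name is hypothetical.

```python
import statistics


def skew_coefficient(data):
    """(mean - median) / standard deviation.

    Assumption: population standard deviation (divide by n).
    Some versions of the formula multiply the top by 3 or use
    the mode instead of the median; this sketch does neither.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)
    return (mean - median) / sd


# Made-up data: one large value drags the mean above the median.
lopsided = [1, 2, 2, 3, 12]
balanced = [1, 2, 3, 4, 5]
print(round(skew_coefficient(lopsided), 3))  # positive, so positive skew
print(skew_coefficient(balanced))            # mean equals median, so 0
```

The sign tells you the direction of the skew, which matches the mode < median < mean check for the lopsided data (mode 2, median 2, mean 4).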

About Matt

I like writing, filmmaking, programming and gaming, and prefer creating media to consuming it. On the topic of consumption, I'm also a big fan of eating.