This is a post that’s been rattling round in my brain for months now, and I’m finally getting it down in words. Firstly, a huge shout-out to Anna Martin (@) for all the help, suggestions and proofing she’s given so kindly to make sure this all makes sense!
Right, this has all got soooo long that I’ve split it into three parts: (1) sampling variation, (2) introducing confidence intervals, and (3) comparing two groups.
Our NZ Stats curriculum is awesome and has a clear learning progression for developing (and assessing) students’ sample-to-population inferential understandings. This progression is based on work from Maxine Pfannkuch, Chris Wild, Pip Arnold and others (for example Pfannkuch, M. J., Wild, C. J., & Parsonage, R. (2012). A conceptual pathway to confidence intervals. ZDM – The International Journal of Mathematics Education, 44 (7), 899-911; Arnold, P., Pfannkuch, M., Wild, C. J., Regan, M., & Budgett, S. (2011). Enhancing students’ inferential reasoning: From hands-on to “movies”. Journal of Statistics Education, 19 (2), 1-32) and is about having clear visual cues to develop and reinforce student understanding. We are looking at what we can say about what is happening for two groups back in the population, based on samples, generally focusing on the median and mean.
I’ve popped a summary together here of how I view the learning progressions from Year 10 to Year 13, including new ideas at each level, what needs reinforcing, and “watch-fors” (Year 10, Level 1 Multivariate data 91035, Level 2 Inference 91264, Level 3 Formal Inference 91582). Please let me know if it’s helpful, ask for clarification, play spot the errors etc. But… after teaching these progressions again this year, across all three levels (and still going!) there are some common themes and sticking points for students (and teachers). Here’s my thinking on it.
Sampling variation: The variation in a sample statistic from sample to sample.
This is a BIG IDEA concept that we start developing in Year 10 with our students, then repeat and build upon for the next three years. Initially I look at this in a one-sample situation with students. Key teaching activities and tools are listed below (with links where possible):
- Students in class each take a random sample of size n (usually about 30) from the same population (a bag of data cards each). After drawing a dot plot and box plot and calculating summary statistics (usually the median), they use their sample to answer the investigative (summary) question – for example “What is the median height of doozers from Fraggle Rock?”. Students notice that everyone’s samples are similar but different – they SEE sampling variation. Then collect summary statistics (usually lower quartile, median, upper quartile) from each sample into a class graph, so students can SEE sampling variation in the summary statistics too. Links for these activities can be found here (Yr10 Karekare College population), here (Yr12, Kiwis population), here (Yr13, Pugs-in-Costumes-on-the-internet populations or Doozers).
These are some samples from Karekare College – our next step would be to create box plots above the dot plots.
- Chris Wild has animations here which track the median and middle 50% for repeated samples, reinforcing what you have done by hand.
- iNZight has dynamic plots that collect the distribution of means or medians from repeated sampling.
- Finally (and most often!) I do a lot of hand-waving…. firstly the simple situation of one sample, where my hand is capturing the sample median from repeated sampling (that is – it’s tracking the equivalent of the blue line in Wild’s animations above).
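If you want to back up the card-sampling activity with a quick simulation, here is a minimal sketch in Python. It is not one of the tools linked above: the population of 500 “doozer heights” is made up (roughly normal, centred on 150), and the function name `sample_medians` is my own. It takes repeated random samples of size 30 and shows that the sample medians are similar but different.

```python
import random
import statistics


def sample_medians(population, sample_size, num_samples, seed=0):
    """Take repeated random samples (without replacement) and return each sample's median."""
    rng = random.Random(seed)
    return [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population: 500 made-up heights, NOT real classroom data.
pop_rng = random.Random(1)
population = [round(pop_rng.gauss(150, 10), 1) for _ in range(500)]

# 25 "students" each take a sample of size 30 from the same population.
medians = sample_medians(population, sample_size=30, num_samples=25)

print("Population median:", statistics.median(population))
print("Sample medians range from", min(medians), "to", max(medians))
```

Each value in `medians` plays the role of one student’s sample median on the class graph; the gap between the smallest and largest is the sampling variation students see.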
The impact of sample size (NZ Curriculum Level 6 (Year 11-ish) onwards)…
The activities above can all be adapted to compare sampling variation with different sample sizes. This year I also had students using the Kiwis population in Excel, combined with random numbers to take larger samples. The Excel file is here if you want it (full lesson details are here near the bottom of the page), and my class collection of repeated samples of different sizes is pictured below. In this picture, the middle 50% (box) is shown: the green section runs from the lower quartile to the median, the median is in pink, and the blue section runs from the median to the upper quartile. This clearly shows that sampling variation decreases as the sample size gets larger. Please note that there are 700 kiwis in our population, so a sample of 400 is a large proportion of the population.
Hand-waving sampling variation with a change in sample size:
- Take your left hand, have it track the change in sample medians for repeated sampling of size 30. Got it? Okay, now…
- Take your right hand (don’t stop that left hand!), and have it track the change in sample medians for repeated sampling of size 300…
- Okay – you’re laughing (or your students are) as it’s quite tricky to have your hands waving differently but… the point is (really easily, visually, laughably) reinforced that with larger samples the sampling variation is reduced.
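For anyone who prefers a screen to waving both hands, the same comparison can be sketched in Python. This is only an illustration under my own assumptions: the 700 “kiwi weights” are invented (the population size matches the Kiwis population mentioned above, but the values do not come from it), the function name `median_spread` is hypothetical, and I use the standard deviation of the sample medians as one possible measure of how much they jump around.

```python
import random
import statistics


def median_spread(population, sample_size, num_samples=200, seed=0):
    """Standard deviation of the sample medians across repeated random samples."""
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(medians)


# Hypothetical population of 700 made-up weights in kg (NOT the real Kiwis data).
pop_rng = random.Random(2)
kiwi_weights = [round(pop_rng.gauss(2.5, 0.6), 2) for _ in range(700)]

spread_30 = median_spread(kiwi_weights, sample_size=30)    # the "left hand"
spread_300 = median_spread(kiwi_weights, sample_size=300)  # the "right hand"

print("Spread of sample medians, n = 30: ", round(spread_30, 3))
print("Spread of sample medians, n = 300:", round(spread_300, 3))
```

The samples are taken without replacement, so with n = 300 out of 700 each sample covers a large share of the population, which shrinks the variation even further than sample size alone would.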