Forgetting is necessary for learning, desirable difficulties and the need to dissociate learning and performance
How many questions should I give students to work on after my instruction? Should I group all the questions together or space them over time? Should I ‘block’ questions on the same topic together or mix them with questions from other topic areas? Is ‘over-learning’ an efficient strategy for boosting student outcomes? Should I always be using high-frequency formative assessment techniques to guide my instruction? Is it possible to measure the learning that has happened in a lesson and, if so, how? What does best practice look like in an assessing-without-levels world? When we talk about progress, are ‘rapid’ and ‘sustained’ compatible or mutually exclusive?
We live in times of enforced, but relatively unguided change where schools are asking themselves questions about the fundamentals of pedagogy, learning and assessment. As I work on the evolution of my own department’s schemes of work (and the pedagogy I want these to promote) the above questions and more have been at the forefront of my thinking.
The beauty and intellectual intrigue of trying to understand learning stem from many sources: the difficulty of defining it; the complexity, and often the counter-intuitive nature, of the strategies that create conditions to nurture it; and the impossible, yet relentless, focus on trying to evaluate and optimise it quantitatively. Every teacher has their view, and I’ve often found these views differ more widely among experienced colleagues than among those new to the profession. This is not a criticism, quite the opposite: it is the result of reflective thought, after sufficient time and experience, realising that ‘the fundamentals’ they learned during their training rest on boggy ground. In my own training, AFL was the non-negotiable silver bullet for effective learning in the classroom. Now that I’ve had a few years working with AFL, I’ve experienced how it can be a double-edged sword if its subtleties are not appreciated. I’ve seen many teachers (myself included in the early years) and government initiatives mistake students’ instantaneous performance for learning through misunderstanding AFL’s limitations. Debate is healthy and useful, but the plural of anecdote is not data. Many middle and senior leaders, in part because government policy has challenged them to do so, are currently searching for self-evident teaching truths on which to rebuild their systems and pedagogy.
The profession has duly looked to the academic education research community for inspiration and authority with which to distil effective practice from the vast, turbulent, cyclical ocean of fashionable ideas and possibilities. The emergence of Tom Bennett’s ResearchEd community is a natural consequence of numerous teachers simultaneously dipping their toes into educational research findings and wanting to collaborate. I am one of those teachers, and I write to share some significant CPD I have undertaken over the last year in search of potential answers to the questions at the start of this article.
I became interested in the role of memory in maths education following a visit to King Solomon Academy, where I met Kris Boulton. I learned how this school, and its maths department under the leadership of Bruno Reddy, had designed a maths curriculum that subsequently resulted in their first cohort achieving 95% A*-C. The pedagogy they developed within their department drew on numerous academic sources relating to cognitive science. One such source was the work of Robert Bjork, Distinguished Professor of Psychology at the University of California, Los Angeles. I have read his work myself over the last year and am beginning to be able to apply it in my own practice and my department’s.
Bjork is known for a framework, The New Theory of Disuse, that conceptualises learning in terms of memory. The work builds on research by Thorndike in the early twentieth century and the observation that learned information fades over time. Bjork’s new theory supersedes Thorndike’s original Theory of Disuse because research showed that memories do not disappear completely; instead, they only become inaccessible over time. The New Theory of Disuse proposes that anything learned (a memory representation) can be thought of as having two strengths: storage strength and retrieval strength.
Storage strength– reflects how inter-associated a given representation is with other memory representations. It is the depth of learning. Once accumulated, storage strength is never lost: the information remains in memory, as evidenced by recognition, priming and, especially, relearning.
Retrieval strength– current ease of access. It is how primed or active an item’s representation is as a consequence of recency or current cues. Information in memory, no matter how over-learned, becomes inaccessible after a long enough period of disuse. Retrieval strength falls over time; recalling the memory representation builds it back up.
Put simply, when we learn something, the depth to which we have learned it never recedes. Deep learning stays deep. However, unless we regularly recall it, the learning becomes increasingly inaccessible as time goes on.
You could therefore plot any memory representation on a two-dimensional plane of storage vs retrieval strength. Examples of typical representations in the four quadrants of the plane would be:
Low-storage/ high-retrieval– What you had for lunch yesterday. You remember it because it is recent, but you’ll soon forget it because you haven’t linked it to other memory representations: it wasn’t that important to you.
Low-storage/ low-retrieval– What you had for lunch on this day eight months ago. As above, but retrieval strength is now also low because time has elapsed. The memory has become almost inaccessible.
High-storage/ high-retrieval– Your children’s birthdays. They mean a lot to you (they are connected to many other memories) and you recall them regularly.
High-storage/ low-retrieval– The names of people in your Year 1 primary school class. You have forgotten them because you haven’t recalled them recently, but shown a list you could pick them out (storage strength hasn’t been lost and retrieval strength can be quickly rebuilt with recall).
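The two indices can be pictured as coordinates on that plane. The sketch below is a minimal illustration, assuming a hypothetical 0-1 scale for both strengths and an arbitrary 0.5 cut-off between ‘low’ and ‘high’; Bjork’s theory does not assign numbers like these, and the names `MemoryRepresentation` and `quadrant` are invented for the example.

```python
from dataclasses import dataclass

# Assumption: both strengths are placed on a 0-1 scale and 0.5 separates
# "low" from "high". These numbers are illustrative, not part of Bjork's theory.
THRESHOLD = 0.5


@dataclass
class MemoryRepresentation:
    label: str
    storage: float    # depth of learning; never lost once accumulated
    retrieval: float  # current ease of access; falls with disuse


def quadrant(m: MemoryRepresentation) -> str:
    """Name the storage/retrieval quadrant a memory representation falls in."""
    storage = "high-storage" if m.storage >= THRESHOLD else "low-storage"
    retrieval = "high-retrieval" if m.retrieval >= THRESHOLD else "low-retrieval"
    return f"{storage}/{retrieval}"


# Hypothetical values for the four examples in the text.
examples = [
    MemoryRepresentation("lunch yesterday", storage=0.1, retrieval=0.9),
    MemoryRepresentation("lunch on this day eight months ago", storage=0.1, retrieval=0.05),
    MemoryRepresentation("children's birthdays", storage=0.9, retrieval=0.9),
    MemoryRepresentation("Year 1 classmates' names", storage=0.8, retrieval=0.1),
]

for m in examples:
    print(f"{m.label}: {quadrant(m)}")
```

Only the relative positions matter here: moving an item between quadrants means changing one strength without necessarily changing the other.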
We obviously want students to hold memory representations of the material they learn in the high-storage/ high-retrieval quadrant. The argument for a mastery curriculum, rather than a spiral-based KS3/4 curriculum, fits within this framework because of the time it creates for deep learning. If storage strength only ever accumulates, teaching once and deeply (but recalling regularly) makes more sense than teaching twice and shallowly (and having little time left for recall). Single five-year curricula are becoming more popular, and Bruno Reddy was amongst the first well-known practitioners in the UK to adopt the format.
Having discussed The New Theory of Disuse in terms of memory representations, the next step is to consider its implications for understanding what real ‘learning and progress’ look like in a maths classroom.
Corresponding to storage vs retrieval strength, there is a time-honoured distinction in academic learning research, stretching back to the early twentieth century, between learning and performance.
Performance (retrieval strength)– what we can see, observe and measure at the current time in the maths classroom. It’s what I can see in students’ books or on their mini-whiteboards when I’m asking them a question similar to what I’ve just taught them to answer.
Learning (storage strength)– what I have to try to infer rather than what I can measure. The question of whether learning has happened is: “Have the relatively permanent changes happened that will support performance in the long term?” Learning judgements focus on both retention and transfer (to different applications and contexts).
There is a severe danger that current retrieval strength (performance) will be interpreted as storage strength (learning). Bjork’s and others’ work shows that current performance is often a very poor indicator of whether learning has happened. The dissociation between the two can result from factors such as predictability, or cues that are present now but won’t be later. These can prop up performance and give the impression that rapid learning has happened when it hasn’t. In a 2014 talk at Harvard University, Bjork cited relatively old research showing that there can be considerable learning in the absence of performance. More recent research has shown the converse: you can have considerable increases in performance with virtually no resulting learning.
It is my belief that many experienced teachers understand the difference between performance and learning, and its importance. I would, however, like to raise the question of whether some contemporary systems and common practices fail to dissociate the two. These include lesson observation, work sampling, AFL (if its limitations are not understood) and, potentially, assessment without levels. Before I elaborate further on this point, though, it is important to understand in more detail the interaction between retrieval and storage strength (the two are interdependent), and also research that has observed this interaction in real-world classrooms.
Ebbinghaus was the first to publish on how forgetting helps learning: frequent recall builds retrieval strength (and storage strength, although Ebbinghaus didn’t distinguish between the two), which then slows the rate of forgetting. Subsequent research, however, has shown in more detail how storage strength and retrieval strength are interrelated. When you recall a memory representation, both its retrieval and its storage strength increase, and the size of each increase depends on their relative strengths at the time of recall. Increments in retrieval strength are a decreasing function of the item’s current retrieval strength, but an increasing function of its current storage strength: the deeper you have learned something previously, the faster you ‘relearn’ it. Conversely, the higher the current retrieval strength of a memory representation, the smaller the increments in storage strength (i.e. learning). Forgetting therefore becomes necessary to reach a new level of learning. Something that is completely accessible (high retrieval strength) is completely unlearnable (its storage strength cannot be raised), in the sense of reaching a level of learning above that already attained. In other words, if material is memorised by rote through repetition that is too frequent, the ‘learning’ gains (storage strength increments) rapidly shrink. Conditions that reduce retrieval strength while building storage strength can therefore enhance learning. Bjork calls these desirable difficulties because they prevent retrieval strength growing too quickly, which would reduce learning gains. In summary, because forgetting enables learning, conditions of instruction that appear to create difficulties for the learner, slowing the rate of apparent learning, often optimise long-term retention and transfer, whereas conditions of instruction that make performance improve rapidly often fail to support long-term retention and transfer.
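To make the direction of these relationships concrete, here is a deliberately simple toy model. The update rules and the `gain` constant are assumptions chosen only so that the increments move in the directions described above (storage gains shrink as retrieval strength rises; retrieval gains shrink with retrieval strength but grow with storage strength). They are not Bjork’s equations, and `recall` is a hypothetical name.

```python
# Toy model only: the functional forms and constants are assumptions that
# reproduce the qualitative claims in the text, NOT Bjork's own equations.


def recall(storage: float, retrieval: float, gain: float = 0.3) -> tuple[float, float]:
    """Return (storage, retrieval) after one successful recall.

    retrieval is on a 0-1 scale; storage is unbounded and only ever grows.
    """
    # Storage increment shrinks as current retrieval strength rises:
    # a fully accessible item (retrieval == 1) gains no storage strength.
    d_storage = gain * (1.0 - retrieval)
    # Retrieval increment shrinks as retrieval rises but grows with storage,
    # so deeply learned items are "relearned" faster. The factor stays below 1,
    # so retrieval never exceeds 1.
    d_retrieval = (1.0 - retrieval) * (storage + 1.0) / (storage + 2.0)
    return storage + d_storage, retrieval + d_retrieval


# Same storage strength, recalled while still easy to access vs after some forgetting.
easy = recall(storage=1.0, retrieval=0.9)
after_forgetting = recall(storage=1.0, retrieval=0.3)
print("storage gain when recall is easy:  ", round(easy[0] - 1.0, 3))
print("storage gain after some forgetting:", round(after_forgetting[0] - 1.0, 3))
```

With these made-up constants, recalling after some forgetting yields a storage gain of 0.21 against 0.03 for recall while the item is still highly accessible; the absolute numbers mean nothing, only the ordering does.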
One such desirable difficulty is the Spacing Effect. Given a constant number of questions for students to complete, they will have better long-term recall if you space the practice out with intervals (of the order of days) in between, rather than using massed practice, in which they would do all of the questions in one session. Massed practice is advantageous only if you are measuring performance (short-term) rather than long-term learning. With massed practice your students will appear to be learning rapidly compared with spaced practice, where performance (retrieval strength) will be lower. However, with spaced practice, storage strength (learning) grows faster than with massed practice. Bjork cites research by Douglas Rohrer, Professor of Psychology at the University of South Florida, that has demonstrated the long-term learning benefits of spacing over massing specifically within the context of maths education.
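The comparison in these studies holds the total amount of practice constant and changes only how it is distributed over time. A minimal sketch of that idea, assuming a hypothetical `practice_plan` helper and placeholder dates and question counts:

```python
from datetime import date, timedelta


def practice_plan(start: date, n_questions: int, sessions: int, gap_days: int) -> list[tuple[date, int]]:
    """Split a fixed number of questions across practice sessions.

    sessions=1 is massed practice; sessions>1 with gap_days>0 is spaced practice.
    The total number of questions is the same either way.
    """
    if sessions < 1:
        raise ValueError("need at least one session")
    base, extra = divmod(n_questions, sessions)
    plan = []
    for i in range(sessions):
        # Spread any remainder over the earliest sessions.
        count = base + (1 if i < extra else 0)
        plan.append((start + timedelta(days=i * gap_days), count))
    return plan


# Hypothetical example: 12 questions, all at once vs three sessions a week apart.
massed = practice_plan(date(2024, 1, 8), n_questions=12, sessions=1, gap_days=0)
spaced = practice_plan(date(2024, 1, 8), n_questions=12, sessions=3, gap_days=7)
for day, count in spaced:
    print(day.isoformat(), count)
```

Keeping `n_questions` fixed is the point of the design: any difference in later recall then comes from the spacing, not from extra practice.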
In a 2007 paper, The shuffling of mathematics problems improves learning, Rohrer & Taylor published the results of two experiments. One looked at the performance of undergraduate (non-maths specialist) students who were taught a maths topic unfamiliar to them. Half of the students (spacers) did spaced practice (over two weeks), whilst the others (massers) did the same number of questions as massed practice in a single session. All students sat an assessment one week after their last practice session. The spacers outperformed the massers, scoring 74% vs 49% accuracy on the assessment.
Rohrer goes further in his research, attempting to understand whether there is an optimum interval between spaced practice sessions for maximising long-term recall. In a 2008 paper, Spacing effects in learning: a temporal ridgeline of optimal retention, Cepeda, Rohrer and colleagues published experimental results, synthesised into mathematical functions that give recall success in terms of both the study gap and the test delay. The two variables interact: there is no single study gap that produces optimal recall, as the optimum varies with the test delay. However, certain generalisations can be made. Spacing at single-day intervals is too short: retrieval strength is boosted too high, too fast, and storage strength growth is quickly limited. If the test is a reasonably long time in the future, say 200-300 days, the optimum spacing is approximately 20 days. Over shorter test delays, say 70 days, the optimum spacing is approximately 10 days. For practitioners working within the constraints of the real world, fortnightly recall seems a practical conclusion to adopt.
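A fortnightly recall routine is easy to plan mechanically. The sketch below assumes a hypothetical `recall_dates` helper and treats 14 days as a default that can be changed; since the research suggests the best gap depends on how far away the test is, a single fixed interval is a simplification.

```python
from datetime import date, timedelta


def recall_dates(taught: date, exam: date, interval_days: int = 14) -> list[date]:
    """Return recall dates every interval_days after teaching, strictly before the exam."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    dates = []
    d = taught + timedelta(days=interval_days)
    while d < exam:
        dates.append(d)
        d += timedelta(days=interval_days)
    return dates


# Hypothetical dates: a topic taught on 2 September, assessed on 14 October.
for d in recall_dates(date(2024, 9, 2), date(2024, 10, 14)):
    print(d.isoformat())
```

For a longer test delay, passing a larger `interval_days` (for example 20) would move the schedule towards the longer gaps described above.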
Rohrer also tested whether over-learning benefits long-term retention and transfer. Over-learning means continuing to give students significantly more practice questions after they can already answer them correctly. In short, whilst there were short-term performance gains, long-term learning was no better. In other words, if students answer 5 questions correctly, they will retain their learning just as well as if they answer 30 similar questions. Over-learning is an ineffective use of time.
The implications for lesson design and curriculum planning are clear, but I will hold back from elaborating on them until I have discussed other desirable difficulties in due course.
However, before we go any further, it would be appropriate to discuss the omnipresent mantra of ‘rapid and sustained progress’ in the context of the research findings above. Quite simply, rapid and sustained progress is an oxymoron! If you raise performance (retrieval strength) rapidly, you sacrifice possible learning (storage strength) and sustainability. If students perform highly on a topic too quickly, the automaticity they attain limits the possible long-term learning (storage strength) gains, because the storage strength increment is a decreasing function of the current retrieval strength. Lessons that show the most progress (higher performance) result in sub-optimal retention and transferability (learning). To maximise long-term learning we need to limit performance (retrieval strength gains) in order to optimise storage strength gains. Lessons need to be high-challenge, preventing automaticity from forming too early. This builds storage strength, which then ensures subsequent recall events will see large gains in retrieval strength. Do grade 1 practitioners, whose classes show the most progress in lessons, always get the best student outcomes when it comes to exam time? Do you know ‘solid grade 2’ practitioners whose classes do just as well, if not better, in exams? I’ve been aware of this in general terms for a while now, but the realisation that retrieval and storage gains are negatively correlated provides more clarity: the ‘grade 2’ teachers who don’t strive for rapid in-lesson progress have students with more sustained learning. There are no quick fixes to long-term learning, surprise, surprise. In fact, it’s worse than that: the findings of this research show that shortcuts are actually destructive to learning. Buy rapid performance today, pay with sustainability in 12 months’ time! The implications for lesson observations are considerable.
Ofsted no longer grade individual lessons. There is a movement within many schools at present to stop grading lesson observations. This seems logical given Bjork’s message that performance needs to be dissociated from learning. Using one to infer the other has been shown to be unreliable. By observing the performance gains of students in lessons, can we infer reliably the size of the learning gains? Can the observer accurately predict what the students are going to retain and be able to apply to a different context six months after the lesson? Having considered this for months, I cannot think of any way in which learning in any observed lesson can be reliably and accurately measured if we, as the research advises us, dissociate it from performance.
Ending the grading of lessons could have advantages, because non-judgemental debrief conversations could focus on strategies for maximising learning rather than performance, the latter being common in a lesson-grading culture driven by the push from good to outstanding. Without grading, conversations about moving from good to outstanding could instead centre on maximising learning (long-term retention and transfer), rather than on ways to raise performance rapidly over short time intervals, to the detriment of learning: “What are you going to do going forwards to ensure that what was covered in today’s lesson becomes learned, i.e. that there is long-term retention and transfer of the material?” At the very least, if progress is going to be graded, a ‘progress over time’ judgement is much more desirable than a ‘progress in a lesson’ grading.
Back to Bjork’s desirable difficulties. Using high-frequency, low-stakes testing specifically as a learning event (even without feedback) has been shown to have significant positive effects on storage strength gains. The low-stakes element is critical to the effectiveness of this desirable difficulty. Study-test-test-test is more effective than study-study-study-test.
Another highly effective desirable difficulty that increases storage strength gains is contextual interference, one example of which is interleaving. Most sets of practice questions that we give students are blocked by topic: we teach them a strategy and they answer a series of questions, all of which require that same strategy. Interleaving means that, instead of blocking, you give students a mixture of questions on different topics (today’s topic and earlier ones). Research into the effects of interleaving in maths education indicates that if my students are learning lots of things, I will maximise long-term retention and transfer (learning) by arranging instruction to maximise the possible interference between them, i.e. by not blocking. This is counter-intuitive but well researched. Rohrer believes that one reason interleaving is effective is that it gives students experience in selecting a strategy for answering questions. In blocked sets of questions the same strategy is used repeatedly, so students only gain experience in executing the already-modelled strategy. Transfer requires students to select strategies before executing them. If all they ever do is blocked activities, they never experience the need to select strategies until assessment time.
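Because interleaving reuses the same questions in a different order, it can be sketched as a reordering step. This is one possible approach, not Rohrer’s procedure: the hypothetical `interleave` function mixes questions across topics, keeps each topic’s questions in their original order, and avoids placing two questions from the same topic next to each other whenever another topic still has questions left. It uses a random order rather than a fixed A-B-C cycle, on the assumption that a predictable pattern could let students infer the strategy from a question’s position instead of selecting it.

```python
import random
from typing import Optional


def interleave(blocked: dict[str, list[str]], seed: Optional[int] = None) -> list[tuple[str, str]]:
    """Turn blocked question sets into one mixed list of (topic, question) pairs."""
    rng = random.Random(seed)
    # Copy the lists so the caller's blocked sets are left untouched.
    remaining = {topic: list(qs) for topic, qs in blocked.items() if qs}
    order: list[tuple[str, str]] = []
    last: Optional[str] = None
    while remaining:
        # Exclude the previous topic unless it is the only one left.
        choices = [t for t in remaining if t != last] or list(remaining)
        # Weight by questions left, reducing the chance one topic is left over at the end.
        weights = [len(remaining[t]) for t in choices]
        topic = rng.choices(choices, weights=weights)[0]
        order.append((topic, remaining[topic].pop(0)))
        if not remaining[topic]:
            del remaining[topic]
        last = topic
    return order


# Hypothetical blocked sets from a textbook-style exercise.
blocked = {
    "area": ["A1", "A2", "A3"],
    "ratio": ["R1", "R2", "R3"],
    "angles": ["G1", "G2"],
}
for topic, question in interleave(blocked, seed=42):
    print(topic, question)
```

Passing a `seed` makes the mixed order reproducible, so the same interleaved sheet can be regenerated for a whole class.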
In the second experiment reported in that 2007 paper, Rohrer & Taylor had students learning to use and apply different maths formulae to solve problems. Some students did blocked practice; others did interleaved questions. When measured at the end of the lesson(s), students who did blocked practice outperformed the interleaved practice students, with an accuracy of 89% vs 60%. However, and very importantly, when tested after a one-week delay the percentages were 20% vs 63% respectively! The interleaved practice students outperformed the blocked practice students roughly 3:1 on a delayed assessment. The lessons that would have been judged to show the most rapid progress resulted in significantly less sustained learning; the lessons in which performance was lower resulted in triple the learning gains. In numerous other studies, Rohrer and others have replicated these findings and have shown why it is imperative to separate learning from performance and not use one to infer the other. If we are talking about the difference between good and outstanding lessons, we must take a long-term perspective, evaluating and discussing strategies and pedagogy that enhance learning rather than performance.
Desirable difficulties are desirable because responding to them successfully engages processes that support storage strength gains. They limit gains in retrieval strength and prevent learning becoming automatic too early, which would limit potential further increases in storage strength. However, they become undesirable difficulties if the learner is not equipped to respond to them successfully. For example, a student with low working memory will struggle if questions early in a new topic are interleaved and they are trying to select between numerous potential strategies at once. In this case, if interleaving places them in cognitive overload and they lack the required self-control and resilience, it will have a negative effect. Optimal storage strength gains require sub-optimal performance during lessons, and students (and teachers) need to be comfortable with this and remain motivated to get the benefit. This is a considerable challenge. I’ve always found showing students their progress to be a good way to boost their intrinsic motivation. Making lessons more challenging through desirable difficulties, knowing that students’ performance will be lower if we are maximising learning gains, is a hard sell to students who are motivated by seeing rapid performance gains.
Within the community of teachers currently reading educational research, learning styles have become something of an in-joke. They are repeatedly cited as an example of a seemingly intuitive idea that isn’t supported by research evidence. Bjork explains that they rest on the meshing assumption: if learning is aligned to a learner’s preferred format, it is easier to acquire, and so more of it will be accumulated. The meshing assumption, that easier learning results in more learning, is false for the reasons already discussed: limiting retrieval strength gains with desirable difficulties maximises storage strength (learning) gains. Learning styles are the opposite of a desirable difficulty. They are a quick win for performance, and consequently a loss for learning.
If Bjork’s work is accepted as relevant and applicable to secondary maths education, the following implications seem to me to follow logically:
- Maximal learning (storage strength gains) requires limiting retrieval strength gains, particularly early on in instruction, through desirable difficulties such as interleaving, spacing and high-frequency/ low-stakes testing. Nearly all maths textbooks have massed, blocked question sets. This needs to change. As Bjork points out, the same questions could be used; only their ordering needs to change. Students need to experience mixed-topic question sets regularly, not just during assessments. This gives them the necessary experience in selecting strategies as well as executing them.
- Rapid and sustained progress is an oxymoron: the two are inversely related. We should concern ourselves with understanding how to generate sustained progress, and this should be the focus of pedagogical practice, discussions, interventions and performance management systems.
- We should dissociate learning and performance. This means not using performance measures to infer learning gains. Learning cannot be measured within the time-frame of a lesson. Samples of students’ books do not tell you what students have learned, only their in-lesson performance that day. Using AFL to assess concepts currently being taught only measures performance, not learning, so don’t use success on an exit ticket as proof of long-term retention and transfer (learning) of that lesson’s material. Using AFL to guide instruction within lessons is the right thing to do, but don’t use it to infer learning that has occurred within the current lesson. Assessment of learning, rather than performance, should feature a time delay since the material was last covered and/or be set in a different context, thus including a measure of transferability. In an assessing-without-levels world we must assess in a time-delayed, transfer-focussed way if we are to avoid repeating past mistakes (APP etc.) in which formative assessment judgements of learning were based on performance.
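The time-delay part of that last point can be sketched as a simple filter over when each topic was last taught. The `delayed_topics` helper, the 14-day minimum and the dates are hypothetical; the sketch covers only the time delay, not the transfer element, which depends on writing questions set in unfamiliar contexts.

```python
from datetime import date


def delayed_topics(last_covered: dict[str, date], assessment: date, min_delay_days: int = 14) -> list[str]:
    """Topics last covered at least min_delay_days before the assessment date.

    Topics taught more recently are excluded, because success on them would
    mostly reflect current retrieval strength (performance) rather than learning.
    """
    return sorted(
        topic
        for topic, covered in last_covered.items()
        if (assessment - covered).days >= min_delay_days
    )


# Hypothetical scheme-of-work dates.
last_covered = {
    "fractions": date(2024, 9, 9),
    "ratio": date(2024, 10, 7),
    "angles": date(2024, 9, 23),
}
print(delayed_topics(last_covered, assessment=date(2024, 10, 14)))
```

Here "ratio", taught a week before the assessment, is held back until a later assessment point.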
Finally, it should be reiterated that there is a natural tension between the motivation boost of students seeing high performance gains and the reality that slower ones lead to better learning. As Bjork puts it:
“If someone gave me a new course and said ‘do everything you know how to do to make students’ long-term memorisation of key concepts the best’, I could give that a big try; or if they said, ‘do everything you know how to do to get the highest course ratings’, I know something about that too; but what’s awful is that they would not be the same course. They would be quite different courses”.
There is clearly a need for students to understand the idea of desirable difficulties, to some degree, if they are to be comfortable with lower performance in harder lessons today, the benefits of which won’t pay off for many months or years. Many cultural and systemic expectations about what effective lessons look like may need reconsidering too. Teaching that facilitates outstanding student learning is different from teaching that facilitates outstanding student performance.
For more info on Robert Bjork’s work see: Go Cognitive
For Doug Rohrer’s publications see here.