Online learning has become an integral part of modern student life, with technological advancements granting students access to a vast amount of educational content. There have also been many recent efforts to personalize practice according to the needs and abilities of individual students. These developments provide an opportunity to reduce educational inequality, an especially relevant concern given the current importance of remote learning during COVID.
Many researchers have investigated how to design online learning systems so that they can incorporate cognitive science principles known to improve learning. Some of these critical principles include spacing practice over time (rather than cramming) and self-testing. Additionally, it has been found that imposing some difficulty can benefit learning. But how much spacing? What content should be practiced next? How difficult should it be? Answers to these questions have been vague. A common answer in research and with popular learning systems (e.g., Duolingo) has been to have students practice whatever they are about to forget. In other words, high difficulty is encouraged. But even if we ignore the motivational effects of such a strategy, is it actually the most efficient approach?
Our research study, "Optimizing practice scheduling requires quantitative tracking of individual item performance," showed how optimally efficient practice could be automatically scheduled for the student using a quantitative model of learning and a difficulty threshold. We hypothesized that practicing the hardest items would not be a universally optimal strategy for one simple reason: harder items are more likely to be answered incorrectly, which is typically more time-consuming because the student must review corrective feedback.
To determine what difficulty was optimal, we developed a quantitative model of learning to track student learning that accounted for the effects of practice and spacing. We then simulated how much students would learn after completing practice sessions set at many different difficulty thresholds. For instance, how much would a student learn if we had the student practice whatever content the model predicted they had an 80% chance of correctly answering (the difficulty threshold here being 80%)?
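To make the idea of a difficulty threshold concrete, here is a minimal sketch of how a scheduler could pick the next item to practice. The exponential forgetting curve, the 600-second time constant, and the `Item`/`next_item` names are illustrative assumptions for this sketch, not the model from our paper:

```python
import math

class Item:
    """One piece of study content (e.g., a vocabulary pair)."""
    def __init__(self, name):
        self.name = name
        self.last_practiced = None   # timestamp of the last review, if any
        self.strength = 1.0          # higher strength -> slower forgetting

    def p_recall(self, now):
        """Toy predicted probability of recalling this item right now:
        exponential decay since the last review (assumed form)."""
        if self.last_practiced is None:
            return 0.0               # never practiced before
        elapsed = now - self.last_practiced
        return math.exp(-elapsed / (600.0 * self.strength))

def next_item(items, now, threshold=0.80):
    """Schedule the item whose predicted recall is closest to the
    difficulty threshold (here, an 80% chance of answering correctly)."""
    return min(items, key=lambda it: abs(it.p_recall(now) - threshold))
```

A real learner model would also estimate spacing effects and per-student parameters; the scheduling step, however, reduces to exactly this kind of comparison against a threshold.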
We simulated the outcomes of this approach with thousands of simulated students learning Japanese-English vocabulary at each difficulty level between 0 and 99%. What we found was counterintuitive, and unlike previous recommendations offered by researchers and learning technology companies (e.g., Duolingo). Introducing a small amount of difficulty (e.g., a 90% recall-probability threshold) was better than a large amount (e.g., 40%). Part of the reason is that correct answers happened much faster, so it was more productive to practice many easy trials (and work up to harder content) than to focus on the harder content first (and perhaps never get to the easy items). Students' learning-per-second was much better overall when practice was scheduled at low difficulty (a high recall-probability threshold). Finally, we tested the predictions of the simulation with real student participants who practiced at several difficulty thresholds (including the optimal one) and verified our prediction that lower difficulty did indeed lead to superior memory recall on a final test that students completed a few days later.
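The time-cost asymmetry behind this result can be illustrated with a short simulation. The per-trial times below (3 seconds for a correct answer, 12 seconds for an error plus feedback review) are made-up numbers for illustration; the point is only that cheaper correct trials let more practice fit into a fixed session:

```python
import random

# Assumed time costs (illustrative, not our measured values):
T_CORRECT = 3.0    # seconds for a correct trial
T_ERROR = 12.0     # seconds for an error plus reviewing corrective feedback

def trials_completed(threshold, session_seconds=600.0, seed=0):
    """Count how many practice trials fit in one session when every
    trial is scheduled at the given predicted-recall threshold, so each
    trial is answered correctly with that probability."""
    rng = random.Random(seed)
    elapsed, trials = 0.0, 0
    while elapsed < session_seconds:
        correct = rng.random() < threshold
        elapsed += T_CORRECT if correct else T_ERROR
        trials += 1
    return trials
```

Under these assumed costs, a session at a 90% threshold averages about 3.9 seconds per trial versus 8.4 seconds at a 40% threshold, so the easier schedule completes roughly twice as many trials in the same ten minutes.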
A major takeaway from our study is that considering efficiency is vital to optimally designing educational technology. Students have limited time to study, and thus the total time practice costs is a vital consideration. In fact, we found that practicing efficiently had an even larger effect on memory than spacing usually does! Practicing according to a specific difficulty threshold also answers the question: "What should the student practice next?" Our paper also provided a general roadmap for how to implement this approach in future research and educational systems. There are 3 steps: 1) collect a dataset purely for fitting the learner model (one that introduces a variety of practice contexts so the model is generalizable and properly estimates spacing effects), 2) simulate practice with a variety of difficulty thresholds, and finally 3) test how well the simulation predictions bear out with an experiment.
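The three steps above can be sketched as a pipeline. Everything here is a toy stand-in: the function names, the `(correct, seconds)` log format, and the time-cost-only predictor are invented for illustration and are far simpler than an actual learner model:

```python
def fit_learner_model(practice_log):
    """Step 1 (toy): fit a 'model' to a dataset collected for fitting.
    Here that just means estimating the average time cost of correct
    trials versus error trials from (correct, seconds) records."""
    correct_times = [s for ok, s in practice_log if ok]
    error_times = [s for ok, s in practice_log if not ok]
    return {"t_correct": sum(correct_times) / len(correct_times),
            "t_error": sum(error_times) / len(error_times)}

def predicted_trials(model, threshold, session_seconds=600.0):
    """Step 2 (toy): predict practice completed per session at one
    difficulty threshold, using only the fitted time costs."""
    expected_time = (threshold * model["t_correct"]
                     + (1.0 - threshold) * model["t_error"])
    return session_seconds / expected_time

def pick_threshold(practice_log, thresholds):
    """Return the threshold the simulation predicts is best. Step 3 is
    then a real experiment verifying that prediction against others."""
    model = fit_learner_model(practice_log)
    return max(thresholds, key=lambda th: predicted_trials(model, th))
```

A full implementation would replace `predicted_trials` with simulated learning (not just trial counts), but the fit-simulate-verify structure is the same.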
Critically, the optimal difficulty threshold will differ across learning contexts. Sometimes, corrective feedback may be especially beneficial for mastering a topic, and thus getting it wrong may be more efficient. Reflecting on why you got something wrong matters to differing degrees, depending on the topic (not so much for vocabulary learning, but quite a bit for Anatomy and Physiology). We are currently evaluating our approach with nursing students learning Anatomy and Physiology content and have indeed found that higher difficulty can be more efficient. We hope our work offers a roadmap for how to optimally schedule practice within online educational systems.
I like where this research is heading, Luke. Am I correct to summarise that learning through spacing, feedback, and self-correction is indeed individually practiced, and that context is a critical factor? Would you still recommend the use of spaced learning tools like Quickstudy or SuperMemo?
Hi Dan! Yes, I believe your summary is correct. I'll add that the time cost is a big part of why context is so important. Spaced testing with feedback is definitely effective, especially when personalized to the student. Context (e.g., the type of learning materials) is indeed a big factor for how that spacing and testing should be implemented. For instance, more complex content in which answering test questions is more time consuming (like Anatomy and Physiology) may mean that the relative speed advantage between correct and incorrect answers I describe is lessened, and that practicing at higher difficulty is more efficient (as opposed to fairly low difficulty being more efficient with word-pair learning). So spacing would still be good, but the amount of spacing that is optimal (and thus how much difficulty is imposed) would be different depending on context.
As for recommending things like SuperMemo, they are still typically better than the alternatives. People have a hard time scheduling their own practice efficiently, and SuperMemo helps do that for you. So I imagine it is still superior to a fixed schedule of practice (e.g., repeat every X minutes) or ad hoc methods a student may devise for themselves (e.g., sorting items into different piles a la the Leitner method). I don't believe SuperMemo considers time cost differences between correct and incorrect answers, however, so it is probably suboptimal. Please reach out if you would like to chat more!