Dr. Benjamin Wiggins uses time-tested exam-creation methods to address language barriers, low-level thinking, poor study skills, and other test problems.
Manager of Instruction, University of Washington
PhD in Learning Sciences, MS in Molecular and Cellular Biology, BS in Biochemistry
For many years after I began teaching, no matter how much my ability progressed, traditional “surprise” exams were always my weakest link. My own education was punctuated with surprise exams, so I was immediately familiar with the stresses inherent to timed, high-pressure testing on challenges that I had to see for the first time and analyze immediately. Worldwide, most STEM education is assessed in this format: The instructor writes the exam, the students see the exam for the first time as they sign their names to it, and a combination of instructions, metadata, and question texts must be understood simultaneously.
In fact, my frequent early mistakes in the classroom were overshadowed by the rancor that came from giving and grading exams. Even as I progressed to some degree of mastery in teaching, it was still the interactions and communication around testing that were overrepresented in the negative comments on post-course evaluation … and the few sour grapes left in my memory after an otherwise great class. It seemed that some of my most inspired questions fell flat for at least a few students.
“Are exams destined for unhappiness?” I wondered. “Is this just part of the job?”
After much introspection, research, and experimentation, I’ve learned a better way, and it has completely changed my classroom. Below, I have laid out my problems with surprise exams, the set of solutions that I’ve found, the steps I now follow (which I call the Public Exam system), and the consistent (though preliminary) data that tells me that it seems to be working.
Course description: For students intending to take advanced courses in the biological sciences or enroll in preprofessional programs. Metabolism and energetics, structure and function of biomolecules, cell structure and function, animal development. Second course in a three-quarter series.
Identifying the problems with “surprise” exams
After working with more than 10,000 students, I classified the problems I was encountering into five specific categories that I wanted to solve, and I identified the questions each problem generated for me:
Problem 1: Ignoring language difficulties
Exams are hard to write but even harder to read, and small mistakes can mean massive divergences between what I want students to do and the task that they perceive. Without professional editors, my mistakes were passed on to student scores. Worryingly, these lost points seemed most often to hurt students whose first language was not English (including both American and international students) and students for whom the “language of college” was less prevalent in their upbringing.
Surprise exams also transform study guides, lecture outlines, and learning goals into documents that are puzzles for information extraction; if you can read them correctly, they are a guide to what will show up on the exam. There is a long history of research literature into the difficulty (if not impossibility) of separating content difficulty from language difficulty (for example, Abedi, 2006).
My question: How can I write a precise, accessible, and more readable exam?
Problem 2: Promoting lower-level thinking
Tasks in any field can be created along a spectrum of complexity (Anderson et al., 2001). Within the constraints of time, space, and gradability, it is far easier to write low-level questions for exams. But college students should be practicing and demonstrating high-level cognitive skills! The problems that keep students too excited to sleep at night are probably never multiple choice. My exams weren’t randomly sampling from molecular biology, but they were actively communicating the nature of science as a realm of memorization and rote, inflexible knowledge.
Beyond the hour of the exam itself, I was pushing students to practice a regurgitation style of thinking; their study sessions were frequently based around flash cards, when what I really want is for students to think deeply about the world and create solutions to interesting problems.
My question: Without changing the logistical constraints, how do I get more high-level thinking out of my exam?
Problem 3: Creating unproductive stress levels
While we know that humans perform best under some stress, higher levels of stress can be caustic both to learning and to students’ identity development and belonging in a field (Vogel & Schwabe, 2016). Overstressed students are well represented among those who drop out not only of a major but of college studies altogether. Worse yet, these students are disproportionately from underrepresented groups. Especially with a generation of college students often described as the most anxious ever, I want to use whatever methods I can to maintain rigor while decreasing the negative impacts of stress.
My question: How can I provide opportunities for students to perform under fair and useful amounts of stress?
Problem 4: Leaving students feeling disenfranchised by test scores
If students are going to maximize their academic potential, then they need to feel that the evaluations of their abilities are real and meaningful. Instead, my students felt disenfranchised—like the exam was a piece of paper that didn’t accurately reflect their understanding and didn’t match my teaching; it was just an exercise that we all had to muddle through. Some students may be able to brush this off as busywork, but what about students for whom college is a novel culture? What are we telling them? Is that the best I can do? What I really want is a challenge that students rise to meet. If I can build questions that students are proud to be able to answer, then the rest of my teaching will align better with building skills that will help them in their own careers and lives.
My question: How can I involve students in the examination process in a real and meaningful way?
Problem 5: Ignoring real-world constraints on exam development
As mentioned earlier, time and energy constraints are all too real for professionals in all fields, including education. While a perfect world would allow time (and extra pay) for excellent exam writing, the reality is that no change in this area would be possible if it were not sustainable. When I train young instructors, I know that any solution that requires extra time is likely to be a complete nonstarter. I knew I would have to pick and choose my approaches based on what I could fit into a realistic time allotment.
My question: How could I improve my exams on a week-to-week basis without increasing the time and energy required?
Seeking solutions—and a better system
With these five problems (and questions) in mind, I started looking for solutions.
First, I studied research literature in a wide range of fields, some of the best being Schwartz et al., 2016; Sawyer, 2005; Pellegrino et al., 2001; Darling-Hammond, 1994; and NRC, 2014. (See complete citations in “References” at the end of this article.)
Even more usefully, I started asking for ideas from expert teachers across the college and K–12 spectrum. In bits and pieces over the years, I cobbled together the Public Exam system.
This system addresses all five of the problems and questions outlined above: To improve readability and comprehension of the test questions, it involves the prerelease of exam material (for study purposes) and enlists students and peers as exam editors. To inspire higher-level studying and rigorous performance under productive amounts of stress, it presents thought challenges before the exam and shifts the perception of the exams from “trials” toward “conversations” about the subject matter. And I have noticed widespread improvements in the time and effort involved in exam development and grading, as well as students’ feelings about the accuracy of the assessments.
(While this particular article is not a journal-submitted peer-reviewed piece, my research into my classrooms and students is ethically guided by the university human subjects review board and approved under UW IRB#44438.)
Not a single part of this process is new or revolutionary, but it has profoundly changed my teaching experience and the feedback I get from students in every class. I hope that by describing it here, I am helping to pay forward my debt to the teachers (and many, many students) who helped me stumble uphill to a much better place.
An Overview of the Public Exam system
Here are the steps that I go through in my exam system today.
1. Review exam topics and learning objectives
To assign appropriate numbers of points to the topics on which the class has spent the most time and effort, I use a simple worksheet like this one.
Between iterations of the course, this serves as a check for myself to make sure my course isn’t drifting away from my primary learning goals.
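The worksheet logic can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual worksheet: it assumes points are split in proportion to class hours per topic, and the function name `allocate_points`, the topic names, and the hours are all hypothetical.

```python
def allocate_points(hours_by_topic, total_points):
    """Split total_points across topics in proportion to class time spent.

    Assumption: "time and effort" is approximated by class hours alone.
    """
    total_hours = sum(hours_by_topic.values())
    # Exact (possibly fractional) share for each topic.
    exact = {t: total_points * h / total_hours for t, h in hours_by_topic.items()}
    # Start from the rounded-down share, then hand leftover points to the
    # topics with the largest fractional remainders so the total is preserved.
    points = {t: int(v) for t, v in exact.items()}
    leftover = total_points - sum(points.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - points[t], reverse=True)
    for topic in by_remainder[:leftover]:
        points[topic] += 1
    return points


# Hypothetical hours, with topics borrowed from the course description.
hours = {
    "Metabolism and energetics": 6,
    "Biomolecules": 4,
    "Cell structure and function": 3,
    "Animal development": 2,
}
points = allocate_points(hours, 100)
for topic, pts in points.items():
    print(f"{topic}: {pts} points")
```

Comparing the result against the points on a draft exam is one quick way to spot a topic that has drifted out of proportion between iterations of the course.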
2. Write the Public Exam
The Public Exam is, in essence, an incomplete version of the exam. The Public Exam will be distributed to students in lieu of a traditional study guide or test-topic list. In this pre-exam, students will see roughly 60–70% of all the terms that will be on the actual exam, and the document will be in the same format as the actual exam. By making some of the topical coverage obvious, this document works far better than a study guide to push students to focus on the topics I consider most crucial.
To make it more challenging than a traditional fill-in-the-blank study guide, however, I leave out key information in each section. The missing information varies, but it typically leaves room for students to study toward the format of the exam instead of simply trying to memorize everything. In other words, students have a much clearer sense of the types of tasks they will be asked to complete and can prepare for the content more directly. For example, a student who knows they will be asked to fill in a blank in a correct sentence can practice with lots of sentences and lots of missing words that are relevant to the course; they do not need to waste time preparing diagrams, memorizing multiple-choice answers, or rehearsing any other kind of task that will not appear on the exam.
Here are my approaches to creating Public Exam questions:
Provide data to interpret, but withhold the question. Offering formatted data without explaining how students will be asked to interpret it pushes them to think creatively about multiple possible meanings of the data.
Provide a table/graph and a question—but withhold the data. Providing an example, minus the numbers that will populate the table/graph on the exam, pushes students to take one lens on the data and be able to read and respond to it in real time.
Provide a list of possible questions on a broad topic. When you want students to know a broad range of information but cannot test on all of it, this guides students to prepare for the entire list of questions while you only need to grade a single answer.
Create a scenario, but withhold the question. This allows students time and space to read deeply into a novel situation, case study, or diagram, which would not be as feasible (or even possible) within the constraints of a timed exam. Many courses already use this case method for class time: Students dig into the facts and figures of a case and get dirty with specific questions and answers. In this format, we are asking students to do most of the digging before the exam and then show that they understand the key concepts on paper.
Provide the full text of a challenging/creative question. Occasionally, I will give a question to the students in its entirety. This is typically reserved for questions requiring long thought, outside research, or responses about ethical or moral issues for which I want students to generate study conversations and come up with their own best answer. I expect and hope that students will pre-write answers to put their best, most thoughtful foot forward.
Provide multiple-choice questions, withholding various sections. I use three options:
- Provide the directions and topic—but withhold all question-and-answer text. Supplying the directions for the multiple-choice section of the actual test ahead of time will relieve some of the in-moment reading load during testing. It also guides students in studying by identifying the topic areas they should cover.
- Provide the question—but withhold some/all of the answers. This gives students an opportunity to think creatively in an open-ended way, while the actual exam question might be simpler to deduce from a short list.
- Provide the answers—but withhold some/all of the question. This challenges students to compare and contrast the various answers for a range of possible questions, leading them to understand each choice more deeply.
3. Enlist a peer editor (optional)
When possible, I send my draft of the Public Exam to a peer editor. Teaching assistants, former students, faculty colleagues, and staff can all be extremely helpful for content editing. Seeking peer editors outside of your teaching team can be a great way to spread knowledge of what is happening in your class throughout a larger department. We do not worry about exam secrecy or security, because this entire version will be available to students well ahead of the exam. However, this step is not strictly necessary, as all of the students will have an opportunity to provide edits to the Public Exam in step 5, below.
4. Provide the Public Exam online
The Public Exam is made available to students roughly one week before the actual exam is scheduled. Students get to go through the stressful first few minutes of reading an important exam long before the timer starts and the assessment of their performance begins.
5. Assign a task using the Public Exam
To provide students with another opportunity to perform and demonstrate academic rigor in a lower-stress environment, I assign a short online task over the weekend prior to the actual exam. (This task is often included as part of an online Reading Quiz that they would already be doing as homework.) To receive full participation points, students peruse the Public Exam and answer just one of three prompts on the class’s learning management system website:
Option 1: Find an error or problem in the Public Exam. This can be a typo, a factual mistake, a formatting issue, or anything else you think could be improved. Note the error or problem clearly and give an example of an improvement.
Note that students must indicate the number of the question. In my experience, there will always be something that students find problematic, no matter how precisely I write an exam in my own language. If you don’t have this problem, then I am simply jealous!
Option 2: Write a new question for the Public Exam. Read the Public Exam and create a new question to fill in one of the “[withheld]” areas. Simple memorization-level questions are not as good as more thoughtful questions that require deeper understanding, so try to make your question difficult and/or combined with other topics. Include a correct answer.
Students must indicate the number of the question that they used as their starting point.
Option 3: Simply indicate that you do not want to do Option 1 or 2. We are giving this option so that you can choose for yourself whether or not this process is helpful. We think reading the Public Exam thoroughly is a good idea, but this is your decision to make and you’ll get credit either way.
For this option, students simply check a box to indicate their assent and are given full credit for Option 3. While doing any work here is completely optional, over the last four years I have had ~32% of students choose Option 1, ~44% of students choose Option 2, and ~24% of students choose Option 3. Unsurprisingly, my initial data suggests that students who engage in Option 1 or 2 tend to perform better on exams, although this correlation is relatively weak and might be explained by less-related student habits.
6. Review students’ edits
When I return to my desk on Monday morning, I have ~750 anonymous responses in a sortable Google Forms spreadsheet. Students are extremely rigorous and motivated editors of their own exam, so the grammar and writing of the document improve far more rapidly than they would have had I tried to edit it entirely on my own.
Students’ edit suggestions often give me wording choices that I would not have come up with myself. Additionally, many edits help to reduce “correct but difficult” writing, including gerunds, run-ons, and nested clauses that tend to trip up developing bilingual students.
I go through the edits for each question blissfully quickly, as student responses will group around problematic areas, and I don’t need to address every one of 20 comments that clearly identify one of my boneheaded mistakes.
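The grouping described here is what a sortable spreadsheet already provides, but the same triage can be sketched in Python. This is only an illustration under stated assumptions: the record layout (`option`, `question`, `text`) and the function name `triage_edits` are hypothetical and do not reflect any particular Google Forms export.

```python
from collections import Counter


def triage_edits(responses):
    """Count Option 1 error reports per question number, most-reported first."""
    counts = Counter(r["question"] for r in responses if r["option"] == 1)
    return counts.most_common()


# Hypothetical responses; in practice these would come from the form export.
responses = [
    {"option": 1, "question": 4, "text": "Typo in the axis label"},
    {"option": 1, "question": 4, "text": "Axis label is misspelled"},
    {"option": 1, "question": 4, "text": "Units are missing on the y-axis"},
    {"option": 1, "question": 7, "text": "Directions are ambiguous"},
    {"option": 2, "question": 2, "text": "New question about enzyme kinetics"},
    {"option": 3, "question": None, "text": ""},
]

for question, count in triage_edits(responses):
    print(f"Question {question}: {count} report(s)")
```

Questions at the top of the list are the ones where many students flagged the same area, so a single fix there likely resolves a whole cluster of comments.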
Need to present more study material? If so, you might find it efficient to comb through student suggestions for Option 2 and provide that document as a study resource.
7. Provide a revised Public Exam
At least a few days before the actual exam, I provide a second (online) version of the Public Exam with meaningful changes highlighted in green, while small edits (that don’t change meaning) remain in black. This allows students to quickly scan for major changes. For students who had edit suggestions that were not resolved (often because they formed a minority viewpoint on the best wording for a particular task), this alerts them that they have a few days to work with study partners and get clarity on the writing they didn’t understand.
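A first pass at sorting changes into small and meaningful can be roughed out with Python's standard `difflib` module. This is a sketch under loud assumptions: text similarity is only a crude stand-in for "changes meaning", the 0.9 threshold is arbitrary, and the function name `classify_change` and the sample questions are hypothetical. The instructor still makes the final call on what gets highlighted.

```python
from difflib import SequenceMatcher


def classify_change(old, new, threshold=0.9):
    """Label a question as 'unchanged', 'minor', or 'major' between versions.

    Assumption: a high character-level similarity suggests a small edit.
    The threshold is an arbitrary illustrative value, not a tested one.
    """
    if old == new:
        return "unchanged"
    similarity = SequenceMatcher(None, old, new).ratio()
    return "minor" if similarity >= threshold else "major"


# Hypothetical question texts keyed by question number.
version_1 = {
    1: "Name the organelle where the citric acid cycle occurs.",
    2: "Explain how ATP is produced in the mitochondria.",
    3: "Describe one step of glycolysis.",
}
version_2 = {
    1: "Name the organelle where the citric acid cycle occurs.",
    2: "Explain how ATP is produced in the mitochondrion.",
    3: "Predict what happens to glycolysis if oxygen is removed from the cell.",
}

labels = {q: classify_change(version_1[q], version_2[q]) for q in version_1}
for question, label in labels.items():
    print(f"Question {question}: {label}")
```

Questions labeled "major" are candidates for highlighting, while "minor" ones are worth a quick glance in case a one-word change alters the task.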
At a wider scope, I think this method gives students a tangible example of the iterative, communal process that is fundamental to academic and problem-based work. Especially in the biological sciences, the idea of a lone figure in a lab having an aha! moment is now widely understood to be extremely unlikely at best. Science is done by working together for incremental improvement, not by sitting behind the “Professor” nametag and never making a mistake. Through this collaborative process, the exam becomes a bit more “human” to students—and they begin to feel it is more relevant to and reflective of their understanding and work.
8. Revisit and revise the actual exam
As students are studying by reviewing the second version of the Public Exam, I finish a version of the actual exam and send it to colleagues. (In my case, I am lucky to have talented graduate teaching assistants and staff to help.) I ask for their written answers as if they were students; this is far more likely to clarify my errors than simply asking for peer editing.
Based on those answers, I focus on adding information to the actual exam that was withheld from student editors in the Public Exam, and I make the difficult decisions needed to keep the exam to a manageable length.
Interestingly, I have found that graduate TAs generally complete exams (in introductory biology) in ~1.4x the amount of time that the general class will take. This discrepancy exists for several reasons: because TAs aren’t actively studying the material, because they haven’t pored over the Public Exam, and because they are double-checking answers that they might not have time to reread in a timed exam.
My goal is to write exams for which the first student will walk out of the door in half the given time, which typically results in 10–15% (or less) of the class still working up to the deadline.
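The ~1.4x observation gives a rough way to turn TA completion times into an estimate for the class. The sketch below assumes that ratio carries over to a given exam, which the text reports only for introductory biology; the function name, the TA times, and the 80-minute exam period are hypothetical. It estimates a typical completion time only and says nothing about when the first student will leave.

```python
from statistics import mean

# Approximate ratio of TA time to class time reported in the text.
TA_TO_CLASS_RATIO = 1.4


def estimate_class_minutes(ta_minutes, ratio=TA_TO_CLASS_RATIO):
    """Estimate typical student completion time from TA completion times."""
    return mean(ta_minutes) / ratio


# Hypothetical TA completion times and exam period, in minutes.
ta_times = [63, 70, 77]
exam_period = 80

estimate = estimate_class_minutes(ta_times)
fraction_of_period = estimate / exam_period
print(f"Estimated typical completion: {estimate:.0f} minutes")
print(f"Share of the exam period used: {fraction_of_period:.0%}")
```

If the estimate sits close to the full exam period, that is a signal to trim questions before the actual exam is finalized.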
9. Schedule review sessions (optional)
If possible, I like to schedule review sessions where we use some of the student-suggested questions as examples to work through as practice.
10. Administer the actual exam
Finally, I give the actual exam and let students perform to their best. The exam key is made available online immediately after the exam, so that motivated students can complete their own feedback cycles quickly for best learning.
Here are a few examples of pairs of Public Exams, along with the resulting final version.
Observed outcomes of the Public Exam system
To truly understand the effectiveness of this system, I would want to hire a small, grant-supported team of education researchers and dive in for two to three years. In the meantime, here is the cursory and anecdotal evidence that I can provide.
Personal feedback is more positive
The tenor of exam-related feedback has completely changed in my classes. Students used to tell me that my class was fun except for the exams, but now they come back years later and explain how they are working in jobs and being paid to do things that they see as very similar to what they did on Public Exams.
On a week-to-week basis, I hear students’ excitement about the problems they are thinking through, whereas before it was simply nervousness about knowing all of the topics. I’ve heard many students mention proudly their own role in creating the exam (especially if they see that I made an edit that they suggested). Also, the inevitable frustrations with scores have changed largely from conversations about how the exam was unfair to how they want to improve their own performance. It isn’t that exams in other courses are unfair, but in comparison, the Public Exam stands out as being relevant and understandable.
Finally (and I have no way to prove that this is related), I am being asked for letters of recommendation from far more students of color, female students, and students who are first-generation college students than ever before.
Course evaluations have improved
My anonymous online evaluations have changed, too. Keeping firmly in mind that student evaluations have proven to be biased in serious ways (Falkoff, 2018), the improvements across time for my own evaluations have been noticeable and consistent. Before, virtually every comment that focused on exams and exam grades was negative or critical in nature. Now, comments are virtually 50:50 positive/negative (as scored by an independent reader).
Out of more than 700 comments on my latest evaluations, students who mentioned the impact of the Public Exam system on their own studying or habits were twice as likely to be positive as negative. Again, these are initial numbers and should not be considered proof at the level of peer-reviewed research.
In comparison with publicly available parts of student evaluations for other courses at my university, I can say that the following trends have held true of students in my Public Exam classes. They report:
- A higher overall satisfaction with grading methods
- The highest Challenge and Engagement index
- A very low level of “wasted time” (when asked how much time was spent on the course and how much of that time was valuable)
Interestingly, this high percentage of “valuable time” exceeds that of other introductory-level courses and matches well with very small senior-level labs.
Exams take equal (or less!) time to create
Having worked with both new and established instructors for many years, I know that no change is feasible without being sustainable. I have charted my hours, and the following table represents the time expenditure for an average Public Exam compared to my previous “surprise” exams. This is not a completely fair comparison, as I have likely become more efficient overall through the years, but I hope it shows that the creation of Public Exams is not massively costlier in terms of time (or effort).
|Preparation Task|“Surprise” Exam|Public Exam|
|---|---|---|
|Initial organization|1–2 hours|1–2 hours|
|First draft: Writing|5–6 hours (full questions)|4–5 hours (incomplete questions)|
|First draft: Editing|3–4 hours|1–2 hours (using student edits)|
|Final draft: Writing|2–4 hours|2 hours (using peer suggestions)|
|Final draft: Editing|2–3 hours|1 hour|
|Total time:|13–19 hours|9–12 hours|
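The totals in the table can be checked by summing the low and high ends of each range. The short Python sketch below does that using the hours from the table; the variable and function names are illustrative only.

```python
# (low, high) hours per preparation task, taken from the table above.
surprise_exam = [(1, 2), (5, 6), (3, 4), (2, 4), (2, 3)]
public_exam = [(1, 2), (4, 5), (1, 2), (2, 2), (1, 1)]


def total_range(tasks):
    """Sum the low and high ends of each task's time range."""
    low = sum(lo for lo, _ in tasks)
    high = sum(hi for _, hi in tasks)
    return low, high


surprise_total = total_range(surprise_exam)
public_total = total_range(public_exam)
print(f"Surprise exam: {surprise_total[0]}-{surprise_total[1]} hours")  # 13-19 hours
print(f"Public Exam: {public_total[0]}-{public_total[1]} hours")        # 9-12 hours
```

Comparing the low ends and the high ends separately, the Public Exam column comes out 4 to 7 hours lower, subject to the caveat that the comparison is not completely fair.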
Grading is easier overall
Because students have more time to read directions, I can assign questions whose answers require some formatting on the part of the students to make them easier to grade.
Because students can see the format of the exam beforehand, they are able to become familiar with the desired format of answers, as well. As a result, when given the actual exam, students follow the meta-rules of answering questions much more often. This has simplified the creation of rubrics, reduced the number of penalty points given for noncompliance, and generally decreased graders’ effort and stress.
An open invitation to educators
In summation, the Public Exam system is a way to involve students in the exam process both in creation and preparation for their performance. It helps to deepen the level of thought in the classroom overall by aligning high-level questions with high-level work. I hope it makes my grading more equitable by lowering some noncontent barriers to success, and it puts students on a path to success with an appropriate level of stress in their brains and on my calendar.
I applaud anyone who has read this far; you must really care about your own teaching to sit through this long-winded description. My hope is that the information might shorten someone else’s path toward better assessment methods for their own classroom environment.
If you try some or all of these methods, I would love to hear about it. Good luck, and thank you for thinking deeply and creatively about assessing your next generation of students.
References
Abedi, J. “Language Issues in Item Development.” In S. M. Downing and T. M. Haladyna (eds.), Handbook of Test Development. Lawrence Erlbaum Associates Publishers, 2006, pp. 377–398.
Anderson, L.W., et al. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives (abridged edition). Longman, 2001.
Darling-Hammond, L. “Performance-Based Assessment and Educational Equity.” Harvard Educational Review, vol. 64, no. 1, 1994, p. 5.
Falkoff, M. “Why We Must Stop Relying on Student Ratings of Teaching.” The Chronicle of Higher Education, http://www.chronicle.com/article/Why-We-Must-Stop-Relying-on/243213. Accessed April 2018.
National Research Council (NRC). Developing Assessments for the Next Generation Science Standards. National Academies Press, 2014.
Pellegrino, J.W., Chudowsky, N., and Glaser, R. Knowing What Students Know: The Science and Design of Educational Assessment. National Academy Press, 2001.
Sawyer, R.K. (ed.). The Cambridge Handbook of the Learning Sciences. Cambridge University Press, 2005.
Schwartz, D.L., Tsang, J.M., and Blair, K.P. The ABCs of How We Learn: 26 Scientifically Proven Approaches, How They Work, and When to Use Them. WW Norton & Company, 2016.
Vogel, S., and Schwabe, L. “Learning and Memory Under Stress: Implications for the Classroom.” npj Science of Learning, vol. 1, 2016, p. 16011.