Designing the Grade as a Target to Strive For

October 30, 2024

At the end of the semester, I calculate each of my students’ final grades. Soon, prospective employers or admissions officers will consult these grades in deciding whether to offer my students opportunities. Transcript readers presumably interpret grades as evidence of students’ academic proficiency. Yet we as professors have a huge degree of discretion in designing the assessments and aggregation procedures that produce the grade we report, and our choices make a difference in our students’ final grades. Complicating things further, grading may aim at goals distinct from measuring proficiency: grades are routinely used to reward behaviors that tend to promote learning (even when those behaviors fail to yield fruit), and to reward growth, effort, and contributions to others’ learning. This uneasy combination of purposes raises questions about what normative ideals and constraints should guide the process of assigning grades.

I propose that we think of proficiency-measurement as an external constraint on grading schemes, rather than their central objective. Instead of designing the grade as a maximally accurate proficiency-o-meter, I propose that we think of the grading scheme as analogous to the score in a game: a tool to help structure students’ motivation. Drawing on Thi Nguyen’s conception of striving play, I call this way of approaching grades the striving framework for assessment. In Nguyen’s view, game players engage in striving play when they adopt the temporary ends defined by the game’s rules (say, accumulating tokens) for the sake of some other purpose that is served by pursuing those ends (fun, relaxation, or connection). Likewise, I argue that our students may adopt the temporary ends defined by the grading scheme (accumulating points) for the sake of an external purpose (learning). Just as an effective game designer uses the scoring system to shape players’ motivation in ways that support the external purposes that players strive for, an effective course designer uses the class’s grading scheme to shape students’ motivations in a way that supports their learning. In both cases, the scoring system is successful if it creates an internal motivational structure that promotes participants’ attainment of their external goals.

The striving framework offers promise and peril. On the side of the promise, it can inspire novel assessment methods and grading structures, and it can help guide our choices among existing assessment designs. On the side of peril, it may seem to betray students’ and transcript-readers’ trust by harnessing the motivational power of the grade (which presupposes its accuracy as a proficiency measure) while undermining the accuracy of that signal. In what follows, I hope to persuade you that this approach to grading can be valuable, honest, and fair.

Promise: three sample assessments

I begin by describing three assignment designs that I use in my undergraduate philosophy courses. Each of them sacrifices some accuracy in measuring students’ proficiency in order to structure students’ motivation more effectively.

Specifications grading for a thesis-defense essay. In specifications grading, students get credit for an assignment only if they achieve a specified threshold of success; otherwise, they must try again. This assessment structure helps us avoid familiar pitfalls in grading a thesis-defense essay. When a thesis-defense essay is graded with a standard rubric, students may “game” the assignment and earn decent grades for low-quality work: by setting up an ineffective objection, for example, a student can make it easier to craft a compelling response. The alternative of holistic grading avoids this problem but sacrifices the clarity of expectations encoded in a rubric. Specifications grading combines the explicit, articulated criteria of the point-value rubric with rigorous holistic demands. It orients students’ motivations toward the combination of features that make an essay successful since they know they get credit only if they write an essay that succeeds in multiple respects simultaneously. Yet specifications grading sacrifices accuracy: a student who has fallen short of the assignment’s specifications (and therefore earns a zero and a chance to try again) has surely demonstrated some philosophical proficiency in his essay. Does the zero in the gradebook accurately measure his proficiency? No. But the striving framework counsels us not to let this kind of measurement-focused reasoning drive our design choices. Characteristically for the striving framework, specifications grading sacrifices retrospective accuracy for the sake of a better prospective motivational structure.

The group project. The challenges of assessing group work fairly are familiar: when all team members are given the same grade, some students are tempted to shirk, but if we assess each team member’s contributions separately, we undermine the team’s collaboration. The striving framework suggests designing group assessments that incentivize students to confront shirking behavior. In the design I use, teams are told in advance that they will be graded as a group unless one or more of their members fails to contribute, in which case team members can ask to be assessed individually. To avail themselves of that option, however, they must demonstrate a bona fide effort to hold their shirking teammates accountable. Once again, in this assessment structure, instructors commit to assigning grades that they know to be “inaccurate” in certain cases: if it becomes clear that one member of a team did no work but her teammates did not try to hold her accountable, members of the team are given an inaccurate group grade. The justification for committing to inaccurate grading in those cases is that this commitment prospectively motivates students to hold one another accountable, spurring them to practice this critical skill.

The collective action game. In applied ethics courses, simulations can help students connect what they are learning with their own practical reasoning. Using the striving framework, I developed a classroom activity that aims to do this: on several occasions throughout the semester, I give students a set number of bonus points, and I give them the option to add the points directly to their own grades or invest them in a pooled account. Pooled points are multiplied by two and divided evenly among all students in the class. Students are then given an opportunity to try to persuade their classmates to pool their points before any “investment decisions” are made.

The grade points awarded in this game do not even purport to measure students’ learning: students who adopt a self-serving strategy always gain the most points. However, students’ eagerness to speak during these activities, their deployment of philosophical concepts in those conversations, and their enthusiastic feedback on anonymous teaching evaluations all lead me to believe that this game is highly conducive to their learning.
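To see why the self-serving strategy always comes out ahead, it may help to work through the arithmetic of a single round. The sketch below is only an illustration: the function name `game_payoffs`, the class size of four, and the endowment of 10 bonus points are hypothetical, and it assumes each student may pool any amount from zero up to the full endowment, a detail the description above does not settle.

```python
def game_payoffs(contributions, endowment, multiplier=2):
    """Return each student's bonus points after one round of the game.

    contributions[i] is the number of points student i puts in the pool
    (assumed here to be anywhere from 0 to the full endowment).
    """
    n = len(contributions)
    if n == 0:
        return []
    for c in contributions:
        if not 0 <= c <= endowment:
            raise ValueError("each contribution must be between 0 and the endowment")
    # Pooled points are multiplied, then split evenly among ALL students,
    # including those who contributed nothing.
    share = multiplier * sum(contributions) / n
    # Each student keeps whatever they did not pool, plus an equal share.
    return [endowment - c + share for c in contributions]


if __name__ == "__main__":
    # Hypothetical class of four, 10 bonus points each.
    print(game_payoffs([10, 10, 10, 0], endowment=10))   # [15.0, 15.0, 15.0, 25.0]
    print(game_payoffs([10, 10, 10, 10], endowment=10))  # [20.0, 20.0, 20.0, 20.0]
    print(game_payoffs([0, 0, 0, 0], endowment=10))      # [10.0, 10.0, 10.0, 10.0]
```

In any class with more than two students, each pooled point returns only 2/n of a point to the student who pooled it, so holding back always pays more individually. Yet under these assumed numbers, a class in which everyone pools earns twice as much per student as a class in which no one does. That tension is what gives students something to argue about.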

The peril: a need for fairness guardrails

With the striving framework in view, we can consider the normative question it raises: is it fair to assign grades designed to motivate learning rather than measure achievement? After all, a college course is not a game. Whereas game points are disposable ends (Nguyen’s term for goals that matter to participants only within the game), grades are instrumentally valuable: they can open doors of opportunity. And the reason grades open doors is that transcript readers interpret grades as a measure of academic proficiency. In subordinating proficiency assessment to other goals, the striving framework might seem dishonest.

I find this concern compelling, but overstated. Seasoned instructors know that there is a great deal of arbitrariness in the design of grading schemes, even when they aim only to measure proficiency. Because grades are somewhat arbitrary, we should not treat their accuracy as sacrosanct, especially at the margins. Instead, we can adopt the more realistic expectation that grades should be “accurate enough.” Here, then, is a rough-and-ready proposal for operationalizing a threshold notion of accuracy. Grades should satisfy the following three parameters:

Parameter 1: Appropriately-prepared students must have a reasonable chance of success. A student who has satisfied the prerequisites for a given course and who invests the expected time and effort should have a reasonable chance of earning a good grade in that course (relative to the local academic culture’s standards). This parameter would not be met by a course with no listed prerequisites that actually presupposes a strong background in metaethics, for example. It is also not met in a course built using the striving framework where one’s path to success can be closed off by bad luck.

Parameter 2: Apart from academic aptitude and organizational skills, students’ unchosen characteristics should have a negligible impact on their grades. Responsible instructors do their best to ensure that characteristics like race, gender, class, and disability will not impact students’ grades, directly or indirectly (say, through racist, sexist, or ableist conduct of teammates on a group project). In making design choices that reduce the advantages conferred by unchosen characteristics, we improve the fairness of our grading schemes.

Parameter 3: Course grades should not grossly mislead transcript readers about students’ proficiency. Although traditional grades are a noisy measure of proficiency, large grade differences (say, between a C and an A) do represent meaningful differences in proficiency. The striving framework should not be permitted to destroy the informational value of these coarse-grained differences. To illustrate, consider this lightly fictionalized case, inspired by a true story: a professor offers an automatic A in the course to any student who produces a viral video on a philosophical topic. Even if this challenge motivates students to strive for the grand prize in a way that is conducive to learning, it also clearly abdicates any effort to use the grade as a coarse-grained signal of proficiency in the event that a student succeeds. Therefore, this type of assignment is ruled out.

Let’s take stock. The striving framework raises a concern about fairness and honesty because it harnesses grades’ motivational power while undermining a precondition of their motivational power (the expectation that grades measure proficiency). To mitigate the concern, I proposed that an assessment scheme must ensure that (1) appropriately-prepared students have a reasonable chance of success; (2) the impact of students’ unchosen characteristics is minimized; and (3) the informational value that grades do have is preserved. Provided the grading scheme remains within these parameters, neither students nor transcript readers have grounds to complain of dishonesty or unfairness. Within these parameters, I argue that a great deal of valuable experimentation is possible, including the three assessment structures I outlined above.

There is a difference between adding some noise to an already noisy signal and embracing gross mischaracterization. The parameters outlined here permit factors apart from our perceptions of students’ proficiency to affect final grades only on the margins, where differences in grades do not express any meaningful difference in students’ proficiency anyway. We should not be so protective of grades’ noisy signals that we foreclose pedagogically valuable approaches to assessment.

Ronni Gura Sadovsky

Dr. Ronni Gura Sadovsky is assistant professor of philosophy at Trinity University, where she teaches courses on contemporary issues in ethics, philosophy of law, and political philosophy. Her scholarship focuses on the way that groups use informal social norms to regulate behavior and to promote their conceptions of justice.
