Chat GPT and Student Writing: Some Practical Reflections

January 23, 2023

Image by Alexandra_Koch from Pixabay

ChatGPT: What it is, how it works, the challenge it presents

Developed by OpenAI, ChatGPT is an AI text generator that uses a large language model (LLM) to create responses to queries. In many ways it is like your phone’s autocomplete function—when you type a sequence of words into your phone, the autocomplete makes a statistical guess, based on its existing database, of what word should come next. ChatGPT, put very simply, is like an autocomplete taken to a higher level: put in a prompt such as “who invented radar?” or “write a funny limerick about George Washington” and it will generate a complete response to the prompt.
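To make the autocomplete comparison concrete, here is a minimal sketch of a "statistical guess" about the next word. It is a toy that only counts which word most often follows another in a tiny sample text. The function names and the sample corpus are invented for illustration, and a large language model like the one behind ChatGPT is vastly more sophisticated than this.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def suggest(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sample text standing in for the phone's "existing database".
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(suggest(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

The guess depends entirely on what the counts contain: a word that never appeared with a follower in the sample text yields no suggestion at all.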

ChatGPT’s ability to do this is the result of both human involvement and AI self-critique. First, people created two-way dialogues where one person provided prompts, such as a question or request, and the other responded in an appropriate way. These model conversations were combined with an existing dataset from a prior OpenAI system and used to fine-tune a model from the GPT-3.5 series. ChatGPT was then asked to produce multiple responses to given prompts, which human trainers ranked from best to worst. The resulting rankings were used to train a reward model, which scores possible responses to any given prompt and is used to tune ChatGPT further. As a result of this development process, ChatGPT can critique and refine its own responses to prompts based on inferences from how humans composed and ranked responses to various queries, to the point where it can compose and evaluate its own responses independently.
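As a rough illustration of how best-to-worst rankings can become training signal, the sketch below turns one ranked list of responses into "preferred versus rejected" pairs, a common way of preparing data for a reward model. The function name and the sample responses are hypothetical, and this is not OpenAI's code.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first item of each
    pair is always the one ranked higher.
    """
    return list(combinations(ranked_responses, 2))


# Hypothetical responses to one prompt, ordered best to worst by a person.
ranked = ["clear answer", "vague answer", "wrong answer"]
for preferred, rejected in ranking_to_pairs(ranked):
    print(f"prefer {preferred!r} over {rejected!r}")
```

A reward model trained on many such pairs learns to assign higher scores to the kinds of responses people preferred, which is what lets it rank new responses that no person has rated.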

Even at its first release, ChatGPT proved remarkably effective at developing plausible-sounding responses to countless prompts. It could generate poetry, write jokes, and compose complete student essays. This last ability is the point of concern for educators in general, including philosophy instructors. If ChatGPT can produce a unique, untraceable (ChatGPT’s responses are unique and do not exist outside of it), passable philosophy essay in seconds, then cheating becomes incredibly easy.

Does it work? The best way to tell is to try ChatGPT yourself, at https://openai.com/blog/chatgpt/. (Note that the disclaimers indicate your inputs themselves will be used to improve ChatGPT.) As an example, here is its response to a prompt I gave it, one found in countless philosophy courses:

Prompt: “Explain Descartes’ three arguments for skepticism”

ChatGPT’s response, generated in under one minute: “René Descartes is considered the father of modern Western philosophy and is best known for his formulation of the “Cogito, ergo sum” or “I think, therefore I am” argument. He also developed three arguments for skepticism, which are:

The Dream Argument: Descartes argues that there is no way to tell the difference between waking experience and dream experience, and therefore we cannot trust our senses to give us accurate information about the world.

The Evil Demon Argument: Descartes suggests that it is possible that an all-powerful deceiver is manipulating our perceptions and beliefs, making it impossible for us to trust any of our knowledge.

The Argument from Doubt: Descartes argues that we can only be certain of things that we can clearly and distinctly perceive, and that since we can doubt the accuracy of our senses and the existence of external objects, we can only be certain of our own existence as a thinking being.

These arguments are not meant to prove that knowledge is impossible, but rather to show that our knowledge is uncertain and that we must be cautious in accepting things as true. Descartes believed that by subjecting all of our beliefs to rigorous scrutiny, we could arrive at a foundation of indubitable truths upon which we could build our knowledge.”

The response isn’t perfect by any means, but it certainly seems enough for a passing grade in an introductory philosophy class!

The concern, then, is obvious. And, to make a point I will repeat later: ChatGPT is young. It will get better; it is set up to continually improve as it processes more data. What, then, is the grader of philosophy essays to do?

In the short time that ChatGPT has been public, a wide range of suggestions for how to work with or around it has been proposed. The remainder of this post will sort these responses into general categories and explore each. Since ChatGPT is so new, much of what is said here may need to be modified in the future. Even so, there is plenty worth saying at this early stage.

First response: ChatGPT cannot do X, so make assignments that require X

This response relies upon the fact that ChatGPT has limitations, sometimes severe. For instance, OpenAI itself acknowledges that ChatGPT sometimes says things that are utterly false, or generates citations that do not exist. As such, perhaps requiring certain factual information in student essays will force students to look beyond ChatGPT. Other suggestions include using materials specific to the class, assigning esoteric materials, requiring prompts to have a personal element, and so on.

While one might reply to each of these suggestions in turn, I believe a general reply can be made to almost all of them: should we say, “ChatGPT cannot do X,” or, “ChatGPT cannot do X yet”? My reply echoes Alan Turing in “Computing Machinery and Intelligence,” where he discusses the argument that computers cannot think because they lack some specific capacity or another. After considering many such capacities, he offers a general response that is worth quoting:

“I believe [these responses] are mostly founded on the principle of scientific induction. A man has seen thousands of machines in his lifetime. From what he sees of them he draws a number of general conclusions . . . . Naturally he concludes that these are necessary properties of machines in general. Many of these limitations are associated with the very small storage capacity of most machines.”

In short, we need to stop looking at where technology has been, and look at where it’s going. We have only known ChatGPT for a few weeks, prior to which it was trained either on past data designed for an earlier program or by a limited number of human trainers. To assume that it will not become more accurate, better mimic personal responses (which it can already do; just try it!), or learn more esoteric materials (which is just a matter of collecting more data), is to misunderstand how the model works. The result is a race against an AI that never sleeps, always gathering more information so as to produce more acceptable results. If there is anything ChatGPT cannot do, the default response should be, “When will it be able to?” Perhaps there are some things it will never do, but as a matter of general approach, this is a steeply uphill strategy. (And to head off a related response: yes, you can tell ChatGPT to meet specific formatting requirements like word count, headings, and so on.)

Second response: Call in the ChatGPT hounds

Another way to fight ChatGPT directly is to use programs designed to detect essays written by an AI like ChatGPT. Several options already exist. For instance, GPTZero (https://gptzero.me/) determined that the text on Descartes above was “most likely to be AI generated.” Will these solve the problem?

One clear advantage of these programs is that, in the race to outsmart ChatGPT, they do the racing for us—they can themselves improve over time through machine learning. They can also potentially be used efficiently: while one currently has to submit individual sections of text for analysis, versions that can analyze essays en masse would be easy to build. Institutions might even require student essays to go through bot checkers in a manner similar to Turnitin. (That said, Turnitin is itself controversial for various reasons, such as privacy and copyright issues.)

The most important question, of course, is how good these detectors are. Because they are even newer than ChatGPT, there are no rigorous data yet on their rates of false positives or false negatives. Given that they seem to latch onto general patterns of text rather than content, they seem likely to be effective so long as ChatGPT follows consistent patterns in its text generation. However, Turing’s point also applies here: ChatGPT will get better with time, and that could include its responses becoming more “natural” and less formulaic in the ways these programs seek out. In theory, the snoopers can get better at detection as well. The result is an AI cat-and-mouse game. This is a better result than the previous response since the snoopers will presumably be working as hard as ChatGPT. That the snoopers will have the upper hand is not obvious, however, especially if students take further steps such as changing some of the language to make the essays less bot-like. (Of course, that takes effort itself, and so one might wonder why the students don’t just write an essay at that point. Alas, some students go to great lengths in order to avoid going to moderate lengths for an assignment.)

Third response: The Calculator analogy

When pocket calculators first appeared, people suddenly no longer needed to memorize mnemonics or pull out the abacus to do arithmetic—the tools would spit out an answer for them. (And unlike ChatGPT, you could always trust the answer!) Over time, however, math pedagogy found ways to not merely work around, but work with calculators. Math is not merely that which is done on a calculator. In many math courses today, calculators are not merely used but required, considered time-saving tools that let students focus on the more challenging, abstract, and interesting mathematical work.

Put another way: if you can’t beat ‘em, join ‘em. Why not use ChatGPT itself in class? One frequently proposed example is to have ChatGPT generate a response to a prompt, then have students edit or critique that response, or even revise it. Students thus develop their writing and editing skills in a way that uses ChatGPT ethically rather than opposing it.

Students certainly get genuine value out of critiquing written responses, both their own and others’. Further, editing and critique are central to what philosophers do, whether on our own work or the work of others (such as in refereeing journal articles). However, I see at least two important issues with this analogy:

First, editing and critiquing existing words is not all students are supposed to do in a philosophy class. Students are also supposed to write and develop skills in writing, which can only be done by writing. To go back to the calculator analogy: Calculators are valuable, but if you want to teach someone arithmetic, they won’t learn it by just using a calculator. There are reasons to still teach without a calculator. First, students may not always have one handy (though admittedly, with cellphones they usually do). Second and more importantly, without that knowledge students won’t understand what the calculator is doing or how higher math is built on it. This limits their ability to use the calculator, understand why certain things don’t work, move back and forth between foundational and higher-order math, etc. Foundational knowledge is not trivial; after all, children are still expected to learn arithmetic!

Second, the analogy ultimately falls apart. We might think of the math done with calculators as simply undergirding some other, distinct higher-level mathematical activity that is what really interests us. When we reach the higher level, we don’t want to “waste time” doing the things a calculator does, so using it saves us time. This assumes that there is a relatively clear distinction between “calculator math” as one type of mathematical activity and “abstract math” as another.

This is not the case for philosophical writing. Even the most abstract philosophical writing and reasoning, to be done at all, must engage in the actual construction of words, sentences, and so on. There is no clear separation of “higher” and “lower” levels, such that one is clearly and easily automatable while the other can just be done on top of it. At what point does generating responses to language, through language, stop being central and become something to be put aside in favor of other, higher-level activities? Students must be able to develop responses (written or verbal) in their introduction to philosophy classes as well as in senior seminars, dissertation defenses, and professional talks. Generating words in response to prompts is, simply put, what we do! Those skills cannot be put aside in favor of some other philosophical activity.

It can be granted that editing is an essential skill. But it’s not the only essential one, and arguably not the most important one. There’s no way around the basic capacity of (apologies to J. L. Austin) knowing how to do things with words.

Fourth response: Raise the standards

One response sometimes given when reading a ChatGPT response—take, as an example, the one above about Descartes—is that it’s just not good enough, so there’s no problem.

The first form of this response is, “I would never give that essay an A.” To this, the response is simple: No one said this was supposed to get an A. Most likely, the students using ChatGPT to cheat (or at least, many of them) are not aiming for an A. They are aiming to pass. Granted that it won’t earn you a doctorate: would the essay above, by the average standards for the courses we see most of our students in, pass? I’d be surprised if it did not. Again, it may not be perfect, but to be frank it’s better than many other essays I’ve given passing grades to!

The second response would be frankly disappointed by my apparently sub-basement-level standards. It would say, “No such essay would pass in my class!” I don’t know who this instructor is, but either I envy those students’ average abilities, or they have my sympathies. The example essay above is, again, not perfect, but it’s hardly nonsense or garbage. To say that it should fail as an answer to the above prompt seems, to me at least, pretty unreasonable. How many other students would not pass under such standards? How high a standard should be used? Such questions can reasonably be debated, but I think it’s clear one must have pretty high standards to say that essays like the one above should just be failed in most or all cases. Remember: most of the students that most philosophy instructors see are in Philosophy 101 or Introduction to Ethics, not advanced seminars or honors sections.

Fifth response: Change our teaching methods

More dramatic solutions generally involve shifting away from the classic out-of-class formal essay. One solution is to try different assignments or grading formats. New approaches, such as specifications grading, have flourished in recent years. One might employ “micro-assignments” of variable formats, incorporate drafting and revising in various ways, or require students to create multi-format projects. The possibilities are limited only by pedagogical ingenuity. One might even see multiple-choice exams rise in popularity—though note that you could put the questions into ChatGPT and get (possibly totally wrong) answers.

There is no simple blanket response; given the huge range of possibilities, the obvious answer is, “It depends on the approach and how it’s applied.” There are pros and cons to specifications grading, to extensive drafting, to multimedia projects, and so on. Many of these may have pedagogical benefits beyond being harder for ChatGPT to emulate. At the same time, many may have substantial drawbacks. They may be more work-intensive on the instructor’s part, not possible online, or lead to accessibility issues. What is required is a serious evaluation of different assignment types in philosophy, a worthy but huge endeavor. I can only say that I hope we as a profession will collectively take this endeavor on.

There is at least one general observation that can be made for many of the possibilities: Turing’s point, that we need to distinguish between what technology can do and what it will be able to do, remains. ChatGPT, for instance, is fully capable of producing a piece of writing and then offering a critique and revision of it. It is, again, unwise to bet against the flexibility of ChatGPT, not to mention the bots of the future.

Sixth response: The nuclear option: In-class work only

One counter to the point just made about ChatGPT’s ability to edit its own work is that revisions of drafts are often done as in-class activities. For obvious reasons, it’s harder to use ChatGPT during class itself, in particular if technology use is limited or prohibited during the activity. If, for instance, students must bring in hard copies and revise them by hand, they can’t just put the text into ChatGPT.

Push this point further: given how easily ChatGPT can be used for an assignment, why allow any opportunity for it to enter the picture? Here we come to the most extreme solution: Make every assignment in-class and with no technology outside of pen(cil) and paper. This need not be limited to the classic in-class written essay exam; one can include in-class participation, short assignments done at different times during class, group activities, and so on.

Shifting assignments to the classroom alone does not eliminate ChatGPT, since students may just open it on their phones or computers (though unless they have a printer handy, they’ll have to do a lot of copying down). For the greatest effectiveness, this approach would have to be combined with at least some restrictions on technology use, such as a ban on technology during exam periods. In short, the most drastic, but perhaps only fool-proof, way to get ChatGPT out of the picture is to get technology out of the picture.

If nothing else, the solution is effective—no tech, no ChatGPT! The costs are high, however—very high.

First, removing technology from the classroom, even on a limited basis, poses potential accessibility issues. At the least, there should be room for technology accommodations. Limitations on accommodations, in turn, can lead to issues such as exposing students who request accommodations, nervousness about requesting them, supporting a regime that requires proof of one’s need, and so on. In short, even limited technology restrictions can lead to ethical issues that must be thought through carefully.

Second, in giving up on all out-of-class writing, one is giving up a lot. There is certainly value in being able to think and write on the spot. But there is certainly also value to spending long periods of time thinking through, drafting, and reworking a sustained piece of writing on a topic that involves perusing a range of sources, chewing over one’s views, and working out the best possible answer framed in the best possible way. One could argue that the baby is being thrown out with the bathwater here.

There’s also the plain fact that this approach is impossible in online courses. In sum, this approach has clear advantages, but also clear and heavy costs to be aware of.

Where things stand

Ideally, one could find an assignment type that it would be impossible for ChatGPT, or any bot, to emulate, with no pedagogical downside. Then again, ideally, students wouldn’t be tempted by ChatGPT in the first place. Any solution is a matter of weighing costs and benefits.

My own views on this topic, which are still developing and open to change, are that our overall approach to assignments will have to evolve, probably dramatically. We are in the infancy of genuine chatbots; they will get better, much better, and soon. Trying to come up with crafty prompts, or placing our trust in bot checkers, is naive. We cannot take the craft of writing out of our courses, and simply raising the standards ignores the possibility that these bots will produce better essays than those of students who ought to pass. In my more optimistic moments, I hope that this issue will lead us to continue developing new pedagogies and approaches that will not just navigate ChatGPT, but improve philosophy pedagogy as a whole. As that goes on, however, I also anticipate a substantial increase in in-class work across philosophy courses. Only time will tell, as the age of the AI content generators has only just begun.

Derek O'Connell

Derek O’Connell is Assistant to the Department Chair in the Department of Philosophy at Illinois State University, where his roles include instructor and academic advisor. His current interests center on philosophical pedagogy and philosophy of education.

1 COMMENT

H. E. Baber January 23, 2023 At 3:16 pm

I’m fed up with policing–and with this ongoing arms race: students find new ways to cheat, I find new ways to catch them, so they find other ways to cheat, etc. I changed my intro syllabus. Instead of a term paper they do at home, which is IMHO the best thing pedagogically, they’re going to write their term papers long-hand in class. Pity that all students get punished for what a minority do, but that’s life. I will not put myself out to use ChatGPT in ‘creative’ ways or work harder because (some) students are jerks.
