1 Introduction
There are many ways that generative AI, such as ChatGPT or AI image generators, relates to social epistemology, not least of which is that generative AI has the potential to exacerbate the pollution and degradation of our information ecosystem. But, could generative AI systems also serve as social epistemic models, helping us better understand the causes of problematic social epistemic structures and evaluate proposed solutions to challenging social epistemic problems? In this post we want to pursue this possibility, to gesture at how some recent results in the field of generative AI can be leveraged in the area of social epistemology.
In what follows, we take a close look at the phenomenon of model collapse. When generative AI systems consume their own outputs, they have been shown to deteriorate rapidly, quickly evolving to generate nonsensical and incoherent outputs. We want to suggest that this phenomenon might inform our views about so-called epistemic bubbles and echo chambers and open up bridges between social epistemology and computer science.
2 Epistemic Bubbles and Echo Chambers
Before getting to LLMs and model collapse, let's start with the social-epistemological issue of epistemic bubbles and echo chambers. Probably the most widely read piece of philosophical work on these issues is Thi Nguyen's "Echo Chambers and Epistemic Bubbles," or his popular piece "Escape the Echo Chamber," which covers similar ground. With the recognition that much other work has been done in this space, including much in response to Nguyen's own, we're going to simply summarize Nguyen's work on this issue, since it is likely the most familiar to readers and because it serves our broader purposes nicely. Our overall goal isn't to show that the phenomenon of model collapse decisively supports or undermines particular views about the nature, source, or best response to epistemic bubbles and echo chambers, but to showcase the potential for exploring these issues through developments in generative AI.
Nguyen’s work is an attempt to distinguish epistemic bubbles from echo chambers as distinct social epistemological phenomena with different causes, different vulnerabilities to dissolution, and carrying different implications in terms of individuals’ obligations to avoid, escape, or resist their respective hold on those inside of them.
A person enters or becomes trapped in an epistemic bubble with respect to some set of beliefs when their information ecosystem lacks sufficient sources of information relevant to those beliefs. Bubbles can arise due to a lack of care in seeking out information or through more subtle mechanisms. For example, algorithmic personalization on social media can simply omit sources of information because the algorithm predicts we won’t want to see it or engage with it (or because it doesn’t cause more platform engagement and ad clicks). An individual might be in an epistemic bubble with regard to their beliefs about, for example, which of many coffee chains is most popular worldwide. A person from Boston who has never left New England, only hangs out with friends who also grew up in Boston, etc. might, on the basis of their daily observations and social media feed, form the belief that Dunkin Donuts is the most popular coffee chain on the planet when it is (inexplicably!) Starbucks.
An echo chamber differs from an epistemic bubble. Those who are within an echo chamber don't simply come to form false beliefs or have a mistaken set of confidences across beliefs. Instead, the very tools they use to update beliefs or form confidences are corrupted. People in echo chambers do not treat evidence that favors or disfavors their beliefs neutrally but, for example, treat counter-evidence as confirmatory evidence. If our young Bostonian were in an echo chamber, then pointing out that there are (again, inexplicably) nearly triple the number of Starbucks locations as Dunkin locations globally would not change their view but could be interpreted as evidence that people from the cult of Starbucks were trying to defeat Dunkin once and for all.
How do people find themselves in echo chambers? It isn’t simply by omission of information. According to Nguyen, echo chambers often result from bad actors. These actors engage in a process like indoctrination, working to corrupt the epistemic structures of their targets. Other times, individuals create their own echo chambers while attempting to achieve epistemic coherence. That is to say, our young Bostonian might be indoctrinated into his beliefs by the cult of Dunkin or might simply adopt bad epistemic norms in response to observations that would upset his currently stable web of belief.
Because epistemic bubbles and echo chambers differ in how they interact with our belief-forming and updating tools, they also differ in their fragility. Epistemic bubbles are, on Nguyen's framing, relatively fragile. While it is often very easy to remain inside our bubbles without intentional intervention, only glancing at our personalized news feeds, the introduction of relevant information can pop our epistemic bubbles in a straightforward way. The Bostonian in a bubble Googles global stats on the number of Starbucks versus Dunkin locations and revises their beliefs. Obviously, this wouldn't work for the Bostonian in an echo chamber. For Nguyen, this means that those in echo chambers might have diminished responsibility for escaping their epistemic situation compared to those in an epistemic bubble. It also means that something more drastic than adding information is required to help those stuck in echo chambers.
Nguyen proposes the "social epistemic reboot" as a tool for responding to echo chambers. A social epistemic reboot doesn't simply provide new information or ask an individual to reassess all their beliefs one by one from within their current epistemic ecosystem. Instead, it places them in a new social epistemic environment that disrupts their usual mechanisms for interacting with information flows. For example, a real-world social epistemic reboot for many people involves going away for college and having to form new bonds and learn to intake information under a new set of social norms, background beliefs, and information flows.
3 Large Language Models and Model Collapse
It is safe to assume that readers are familiar with ChatGPT and perhaps some close cousins like Gemini, Claude, or LLaMA. These are all instances of Large Language Models (LLMs). Models like these are built on the 'generative pre-trained transformer' architecture (the 'GPT' in 'ChatGPT') and belong to the broader category that is now often just called 'generative AI.' These models sit within an even broader family of machine learning models.
For those who are unfamiliar with how machine learning works, it can be useful to contrast machine learning with what you might think of as traditional methods of creating programs, algorithms, or models. Models and algorithms are sets of instructions that help transform some form of input into some form of output. Think of a recipe for a cake as a kind of algorithm where the inputs are the raw ingredients, methods, temperatures, etc., and the output is a cake. Lots of computational systems you interact with rely on algorithms or models that were created in the same way that we might create a recipe for a cake. Someone with the relevant expertise specifies in advance how any given input will be transformed into an output. When you’ve got your email program open, there is a recipe in the background that takes your key presses as inputs and shows the letters they represent as pixels on your screen as an output. You click the button on your mouse while hovering over ‘send’ and the output might be a visual indicator that you’ve clicked ‘send’ while at the same time that information is being transmitted as a message that will appear in your recipient’s inbox. We can imagine that a team of people has done all the work specifying how any given input in your email program corresponds to the particular outputs that they do.
In contrast, machine learning offloads the work of mapping inputs to outputs to a learning algorithm. To illustrate, consider how machine learning might be used to build a spam filter. Instead of someone trying to write rules for deciding whether an email is legitimate or spam, the machine learning approach might start with a large set of labeled data in which legitimate emails and spam emails are each marked as such. The learning algorithm then outputs a model, a set of instructions based on that training data, that can be used to classify future emails. So, the learning algorithm might learn that random capitalization or certain grammatical patterns are highly correlated with spam and yield a model that classifies future emails with similar patterns as spam. This summary of machine learning is a bit rough and ready, skipping over important details. For example, in addition to the collection of training data used to build a model, a separate set of data is typically reserved for testing the model, to check that it generalizes beyond its training data rather than overfitting to it, not to mention the manual tuning often needed to improve a model's performance. We've also only described what is called supervised learning, in which models are created from labeled data. By contrast, the self-supervised learning involved in pre-training generative AI models carries out the learning process on data without human-provided labels. Still, we hope this conveys at an abstract level how machine learning models are developed.
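To make the contrast concrete, here is a minimal sketch in Python of both approaches to spam filtering. Everything in it is invented for illustration: the hand-written rule, the four emails, and their labels are stand-ins, and a real filter would be trained on vastly more data.

```python
# A toy contrast between hand-written rules and learned rules for spam
# filtering. The rule, the emails, and the labels are invented illustrations.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Traditional approach: a person specifies the input-to-output mapping by hand.
def rule_based_is_spam(email_text: str) -> bool:
    suspicious_phrases = ["win big", "free $$$", "act now"]
    return any(phrase in email_text.lower() for phrase in suspicious_phrases)

# Machine learning approach: a learning algorithm infers the mapping from
# labeled examples (1 = spam, 0 = legitimate).
emails = [
    "Meeting moved to 3pm, agenda attached",
    "WIN BIG!!! Claim your FREE $$$ prize now",
    "Act now to unlock exclusive cash rewards",
    "Can you review the draft before Friday?",
]
labels = [0, 1, 1, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)  # the classification "recipe" is learned, not hand-written

print(rule_based_is_spam("Act now for FREE $$$"))             # True, by explicit rule
print(model.predict(["free cash rewards, claim now"]))        # likely flagged as spam (1)
print(model.predict(["Notes from Friday's review meeting"]))  # likely legitimate (0)
```

The point of the sketch is only the division of labor: in the first approach a person writes the classification rule; in the second, a person supplies labeled examples and the learning algorithm produces the rule.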
In 2017, researchers at Google released a paper, "Attention Is All You Need," outlining an advancement in machine learning that serves as the basis for contemporary generative AI. Again roughly, they described a technique built around so-called "attention heads" that keep track of relationships among the elements of, for example, a string of text (represented as a sequence of tokens corresponding to words, word pieces, punctuation, and the like). One attention head might track relationships tied to word order, another those tied to punctuation, and so on. Applying these attention heads enabled the creation of models that can take plain-text inputs and generate sentences that conform to our expectations for answers in terms of grammar, content, and the like. Essentially, generative AI is enabled by advancements in machine learning that allow for better recognition of relationships between elements within the training data, optimizing for outputs that meet our expectations about grammar and content given certain types of inputs (whether requests for text, images, or the like). Of course, there is a lot more going on under the hood, including additional training aimed at precluding certain kinds of outputs for social and ethical reasons.
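For readers who want a bit more than the verbal gloss, here is a minimal sketch of the core operation behind those attention heads, scaled dot-product attention, written with NumPy. The tiny random "token" vectors are placeholders we made up; an actual transformer derives its queries, keys, and values from learned projections and runs many heads in parallel across many layers.

```python
# A minimal sketch of scaled dot-product attention, the operation at the heart
# of the 2017 transformer paper. The random "token" vectors are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per token
    return weights @ V                              # weighted mix of the value vectors

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 made-up "tokens", each an 8-dimensional vector

# In a real transformer, Q, K, and V come from separate learned linear projections
# of the token representations; we reuse the raw vectors to keep the sketch short.
output = scaled_dot_product_attention(tokens, tokens, tokens)
print(output.shape)  # (4, 8): one context-mixed vector per token
```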
With a rough and ready mental model of generative AI in hand, we can now turn to what is known as model collapse. Machine learning isn't perfect, far from it. One challenge is that the data a system is trained on can lead the learning algorithm to latch onto a pattern that interferes with successful prediction, classification, or other output. We've already mentioned the problem of overfitting. For example, if we trained a model on labeled legitimate and spam emails but all our spam examples came from the email address "spam@spam.com", the resulting model might classify emails as spam solely on the basis of whether they come from that address. The pattern it has identified doesn't allow for good classification or prediction beyond the training set.
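The following sketch illustrates that kind of shortcut with invented data: every spam message in the tiny training set shares a single sender token, and the resulting classifier leans on that token rather than on anything that would generalize to spam from other senders.

```python
# A toy illustration of a spurious pattern: in this invented training set,
# every spam message comes from the same sender ("spamco"), so the learner
# can lean on that single token instead of anything generalizable.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_emails = [
    "sender spamco free prize claim now",
    "sender spamco win cash today",
    "sender alice notes from today's meeting",
    "sender bob free lunch in the kitchen",
]
train_labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_emails)
clf = LogisticRegression().fit(X, train_labels)

# Which token does the model weight most heavily toward "spam"?
vocab = vectorizer.get_feature_names_out()
print(vocab[np.argmax(clf.coef_[0])])  # we'd expect the sender token, 'spamco'

# The same spam content from two senders: dropping the shortcut token lowers
# the predicted spam probability even though the message text is unchanged.
probe = ["sender spamco win free cash prize", "sender dealsnet win free cash prize"]
print(clf.predict_proba(vectorizer.transform(probe))[:, 1])
```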
Model collapse is a phenomenon in which a generative model's outputs are degraded by training it on model-generated data (e.g., a new version of ChatGPT trained on the outputs of previous versions of ChatGPT). In some recently published work, computer scientists found that when models are successively trained on the outputs of previous generative AI models, each offspring model's outputs become worse and worse and can eventually, over a relatively small number of generations, degrade completely into nonsense. In the case of a text model, the result might be a model that outputs gibberish; in the case of an image model, one that fails to output recognizable images.
Model collapse, like overfitting, results from identifying inappropriate patterns in data; the impact of those patterns on performance then gets compounded over successive rounds of training. You can think about it this way: a first language model is trained on a corpus of fully human-generated text which, let's assume, contains no incoherent sentences or bad grammar or the like. This model, Alpha, learns patterns from the data that allow it to generate novel sentences that, for the most part, conform to our expectations for responses to our queries. In other words, it does a good job of generating acceptable text outputs. However, the statistical correlations it has learned that enable this aren't perfect. The model might occasionally output ungrammatical sentences or sentences that make no sense. Interestingly, it might also output fully grammatical, coherent sentences that are based on bad, ungeneralizable patterns, essentially getting the right answer for the wrong reasons. In such cases, we wouldn't want the model to rely on that pattern generally. We now train a second language model, Beta, except instead of using only the corpus of fully human-generated text, we include as training data the outputs of Alpha, including the incoherent or ungrammatical sentences and those that seem good but rely on patterns that are not generalizable. Since those sentences are part of the corpus, the learning algorithm treats the relevant sequences of text as good data to learn patterns from. Beta is thus the result of a learning process in which a portion of the data corrupts the model. Over successive iterations, the corrupting influence becomes overwhelming. In a relatively small number of generations, we might arrive at a model, Grue, which is completely incoherent. (As it happens, in the actual results cited above, a model 10 generations in produced outputs consisting almost entirely of the expression '@-@ tailed jackrabbits.')
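The dynamic can be mimicked in miniature without any neural networks. The sketch below is a toy statistical analogue of the Alpha-to-Grue story, not a reproduction of the cited results: each "generation" estimates token frequencies from the previous generation's synthetic corpus and then samples a new corpus from that estimate. A token type that fails to appear in one generation has zero estimated probability and can never return, so the vocabulary can only shrink, and with a long-tailed distribution it shrinks quickly. The vocabulary size, corpus size, and Zipf-like frequencies are arbitrary choices made for illustration.

```python
# A toy analogue of model collapse: each generation "trains" on the previous
# generation's outputs by estimating token frequencies, then generates a new
# corpus from that estimate. Tail token types vanish and cannot come back.
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 1000      # arbitrary illustrative vocabulary of token types
sample_size = 5000     # arbitrary corpus size per generation
true_probs = 1.0 / np.arange(1, vocab_size + 1)  # Zipf-like long tail
true_probs /= true_probs.sum()

# Generation 0 trains on "human-generated" data drawn from the true distribution.
corpus = rng.choice(vocab_size, size=sample_size, p=true_probs)

for generation in range(10):
    counts = np.bincount(corpus, minlength=vocab_size)
    model_probs = counts / counts.sum()  # the "trained model"
    print(f"generation {generation}: distinct token types = {np.count_nonzero(counts)}")
    # The next generation trains only on this model's outputs.
    corpus = rng.choice(vocab_size, size=sample_size, p=model_probs)
```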
One solution to model collapse is deceptively simple: minimize the amount of model-generated data in the training data! But this is easier said than done. Model collapse isn't just an idle worry that can only be artificially induced. As more generative AI outputs populate the internet, even if new models trained on such data don't fully collapse, there is more and more risk of degradation in the base training materials for future models. And model collapse need not be accidental! Perhaps you've heard about Nightshade, described by its creators as an "offense tool" for artists to deploy against AI generators. This tool generates data meant to reinforce bad patterns in the images used to train image generators. This is a case of intentionally creating what are called "adversarial examples," examples that take advantage of a model's reliance on mistaken patterns, and feeding them to the model as if they were good examples worth training on. One way to improve future models is to train on more and more data, but to avoid degradation or collapse, data scientists must find ways of maintaining a healthy proportion of human-generated to AI-generated data. As a larger and larger proportion of available text, image, audio, and other data is generated by generative models, this becomes difficult, especially in the absence of reliable tools for detecting AI-generated data and in the presence of intentional attempts to degrade models.
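To see why the proportion of human-generated data matters, the toy sketch above can be extended so that each generation trains on a mix of freshly drawn "human" data and model-generated data. The 50/50 split below is an arbitrary illustrative choice, not a recommendation; the point is only that keeping original-distribution data in the mix lets rare token types re-enter the corpus, so diversity tends to stabilize rather than steadily erode.

```python
# Extending the earlier toy sketch: each generation trains on a mix of fresh
# "human" data (drawn from the original distribution) and model-generated data.
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 1000
sample_size = 5000
human_fraction = 0.5   # arbitrary illustrative mixing ratio
true_probs = 1.0 / np.arange(1, vocab_size + 1)
true_probs /= true_probs.sum()

corpus = rng.choice(vocab_size, size=sample_size, p=true_probs)

for generation in range(10):
    counts = np.bincount(corpus, minlength=vocab_size)
    model_probs = counts / counts.sum()
    print(f"generation {generation}: distinct token types = {np.count_nonzero(counts)}")
    n_human = int(sample_size * human_fraction)
    corpus = np.concatenate([
        rng.choice(vocab_size, size=sample_size - n_human, p=model_probs),  # model output
        rng.choice(vocab_size, size=n_human, p=true_probs),                 # fresh human data
    ])
```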
4 Escape the Collapse?
By our lights, there are some striking similarities between echo chambers and model collapse. In both, the intake of certain kinds of information corrupts the norms, heuristics, or inference patterns deployed when taking in new information, interfering with the capacity to generate appropriate responses to that information. To the extent that these phenomena are similar, studying model collapse might help us formulate and evaluate hypotheses about echo chambers and their relationship to other social epistemic phenomena like epistemic bubbles (and vice versa), or inform solutions to the problem of echo chambers and other deleterious social epistemic phenomena on the basis of what we learn about how to prevent model collapse. To illustrate how thinking about model collapse could inform our thinking about social epistemic phenomena, let's revisit some of Nguyen's views about epistemic bubbles and echo chambers.
First, let's reconsider Nguyen's views about the causes of echo chambers. Here, a comparison with the phenomenon of model collapse seems to speak to one mechanism by which Nguyen thinks individuals might fall into an echo chamber even absent bad actors intentionally cultivating them: coherence seeking. While echo chambers might often be the result of bad actors reinforcing bad epistemic norms, Nguyen notes that some individuals might enter echo chambers simply in virtue of trying to make new evidence cohere with strongly held existing beliefs. If someone is certain that Dunkin is the most popular coffee chain, then any contrary evidence must be reinterpreted, and that might require adopting an epistemic heuristic by which the source of that information is rejected as unreliable or seen as part of a conspiracy. The mechanism of corruption in model collapse is structurally very similar. The learning algorithm takes the training data as ground truth, a source of information that it must build its patterns of inference or classification around. The resulting patterns of inference are, internally, as good as those of a model that isn't undergoing corruption. It is the data, and the treatment of the data as sacrosanct, that is the problem.
Of course, the ways in which norms, heuristics, or inference patterns are corrupted likely differ between echo chambers and collapsed models. An echo chamber is defined by the way in which our epistemic patterns are corrupted: we come to discount counter-evidence and allow it instead to reinforce existing beliefs and the epistemic norms of the echo chamber itself. Still, the analogy with model collapse supports the idea that a polluted information ecosystem can be a significant source of epistemic corruption. Perhaps what’s distinctive about echo chambers is the role bad actors play in shaping the form of that corruption.
Perhaps more importantly, to the extent that the mechanism of human epistemic norm corruption and model corruption are similar, we can perhaps learn something about how much pollution an information ecosystem can tolerate before human epistemic norms are corrupted. Generative models don’t collapse immediately, and so perhaps as we learn more about the proportion and structure of good data to bad and how that relates to model collapse, we can simultaneously learn something about what a healthy information ecosystem looks like for us or use it as a first step in developing metrics for measuring one dimension of information ecosystem health.
A second interaction between model collapse and social epistemic phenomena concerns how the former bears on how we conceive of epistemic bubbles. One worry about epistemic bubbles grounded in the analogy with model collapse is that they are ephemeral rather than fragile. By this, we mean that epistemic bubbles might be real but might quickly dissipate, becoming something else because the bubble itself causes a form of epistemic norm corruption. When model collapse is intentionally induced, the learning algorithm is subjected to information that is known to contain generated data. However, the worries about model collapse in the real world involve training sets that are not known to include generated data. Someone might train a model on some random subset of internet data and, as noted above, the nature of that data sample itself can have a corrupting influence on the resulting model. What if the information bubbles we happen to find ourselves in are functionally similar? If a non-representative information ecosystem, one that lacks important information that reinforces appropriate epistemic norms, heuristics, and inference patterns, can ultimately corrupt good epistemic practices, then falling into the wrong epistemic bubble could quickly lead to different epistemic trouble, trouble that is not immediately responsive to new information.
Perhaps the most promising and hopeful outcome of thinking about the relationship between model collapse and social epistemic phenomena like epistemic bubbles and echo chambers is that tools we develop to respond to model collapse might inform approaches to preventing or remediating problematic information ecosystems. For example, if (some) epistemic bubbles are ephemeral rather than fragile, we might use generative models as a test case for combating the resulting corruption. Perhaps we could study how much good information it takes to recover from various states of model collapse to gain some insights about how to remediate those that end up with corrupted epistemic norms in virtue of exposure to systematic information filtering. Another possible lesson: as we noted, one solution to model collapse is to ensure a high proportion of good (human-generated) data to bad (model-generated). This can be done either by finding larger and larger stores of good data or by finding ways to filter bad data from good. To the extent that model collapse mimics falling into an echo chamber, this might support the potential effectiveness of various policies of censorship or content labeling. Studying model collapse might also help validate Nguyen’s proposal about the need for a social epistemic reboot to combat echo chambers. If it is extremely difficult to remediate a collapsed model without starting fresh and retraining from scratch, then that might speak to the need for similar efforts in the case of humans. Or, it could be that as computer and data scientists uncover new ways to avoid model collapse or remediate collapsed models, we gain insights into possible interventions in human information ecosystems.
5 Conclusion
We don’t intend to suggest that models subject to model collapse are in echo chambers or that people are just large language models, nothing so crude and reductive. Nor, again, are we trying to defend a particular account of problematic epistemic phenomena. Instead, we want to gesture at an area of investigation that brings together phenomena in computer and data science and in human information environments. The analogy we draw between model collapse and echo chambers surely needs refinement and validation, but to the extent that we can refine and validate that analogy, learning models might become valuable testing grounds for otherwise difficult-to-test hypotheses about social epistemic phenomena, not to mention what computer and data scientists might learn from philosophers and social scientists studying our information ecosystem.