In the introduction to Mladen Dolar’s A Voice and Nothing More he recounts the story of a very early speaking machine.2 Mladen Dolar, A Voice and Nothing More (Cambridge: MIT Press, 2006), 7–9. Die Sprech-Maschine was a mechanical device conceived by Wolfgang von Kempelen in 1769, and eventually built and toured in 1783–84.3 The “Kempelen” speaking machine, accessed September 28, https://artsandculture.google.com/exhibit/the-kempelen-speaking-machine-leibniz-association/4wIC7hLe64FKJA?hl=en. The machine was able to simulate a human-like voice speaking French, Italian, and Latin phrases. Kempelen was also the inventor of a chess-playing automaton, later known as the Mechanical Turk.4 The Mechanical Turk is also, unironically, the name of Amazon’s crowd-sourcing labour model, where real people are paid small amounts to do menial tasks as requested by service users. One of these tasks is the classification and verification of machine learning datasets and model output. Kate Crawford, Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence (New Haven: Yale University Press, 2021), 63–69. The Mechanical Turk was essentially a hoax: a small person inside the machine used a complex contraption of mirrors and controls to play chess while remaining completely hidden. It appeared as a machine with human intelligence but was in fact a human. Die Sprech-Maschine, by contrast, appeared as a box with bellows but performed a genuine technical feat: simulated speech. During a performance tour for both machines, die Sprech-Maschine performed as the opening act for the Mechanical Turk, warming up the crowd for the sleight of hand. The obvious mechanical nature of the speaking machine lent plausibility to the chess-playing thinking machine, allowing the audience to immerse themselves in the illusion.5 Dolar, A Voice and Nothing More, 7–9.
I imagine an audience rapt by the display, confused and amazed by these machines that so believably simulated human behaviour. I had a similar experience in the early 1990s during my first encounter with a speaking machine, not at a large exhibition but in my own basement.
As a young computer enthusiast, I wanted all the latest and most interesting hardware. I saved up for a Sound Blaster sound card, which promised realistic audio (a thing home computers up to that point didn’t really do). It also included a small demo program called Dr. Sbaitso, a virtual therapist. I could type questions or statements into its simple interface, and it would display and speak answers. I spent hours probing its “intelligence,” wondering what types of questions it could answer and how it sounded so human.
Dr. Sbaitso is one program in the long evolution of computational language.6 Language models and interfaces go back to the mid-1960s with the famous ELIZA chatbot created by Joseph Weizenbaum. Dr. Sbaitso is modelled after ELIZA’s DOCTOR script, which imitates a “Rogerian therapist.” The history of ELIZA is well documented by Jeff Shrager on his ELIZAGEN website. ELIZAGEN, accessed September 3, 2021, https://sites.google.com/view/elizagen-org/about. Current advanced language models can write articles, answer questions, mimic voices, participate in real-time chat, generate stories and characters, and more. You’ve likely encountered them through your phone’s autocomplete suggestions, Facebook chatbots, social media posts, customer service portals, art and creative writing practices, and maybe in some places you don’t even realize.
When we encounter these generated texts and avatars, what are we actually reading and what are they saying? Is the encounter with a machinic other or something more mundane? Are contemporary machine learning language systems more like the Mechanical Turk, die Sprech-Maschine, or something in between?
As artist and writer Allison Parrish outlines in her talk about computer-generated text and poetry, a language model is a system that assigns probabilities to parts of language in order to predict what might come next after a given word or phrase.7 Allison Parrish, "Language models can only write poetry," August 13, 2021, https://posts.decontextualize.com/language-models-poetry. In order to build this predictive statistical model, the system derives probabilities from an existing dataset, a process referred to as training. Training starts with the analysis of a body of text, from which the system extracts tokens (letters, words, syllables, phrases, sentences, or even longer segments). As the program iterates through sections of the corpus, the model is essentially asking: which patterns of letters or words appear after other patterns, and with what probability? The resulting probability weights are stored, and then used to generate new text.
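The training-and-generation loop Parrish describes can be sketched as a toy word-level Markov model. This is a minimal illustration of deriving token probabilities from a corpus and sampling from them, not how GPT-scale systems work (those use neural networks rather than lookup tables), and the corpus string here is an invented example.

```python
import random
from collections import defaultdict, Counter

def train(text, n=2):
    """Count which token follows each (n-1)-token context in the corpus."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def generate(counts, seed, length=10):
    """Extend the seed by sampling next tokens in proportion to their counts."""
    out = list(seed)
    for _ in range(length):
        context = tuple(out[-len(seed):])
        followers = counts.get(context)
        if not followers:  # context never seen in training; stop generating
            break
        choices, weights = zip(*followers.items())
        out.append(random.choices(choices, weights=weights)[0])
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran"
model = train(corpus, n=2)
print(generate(model, seed=("the",), length=5))
```

After training, the model knows, for instance, that “the” is followed by “cat” twice and “mat” once, so generation after “the” picks “cat” about two-thirds of the time. Scaled up from one sentence to billions of tokens, this is the basic statistical idea behind predictive text.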
The robustness of this type of model increases with the size of the training corpus.8 Parrish, “Language models can only write poetry.” The more data you have, the more tokens and probabilities are derived, and the more nuanced the model becomes. This presents a problem of scale—it takes a lot of computational power to derive probabilities from a very, very large dataset of text. Enter machine learning.
GPT-2 and GPT-3, developed by OpenAI, are two of the most commonly used language models today. They produce reliably high-quality text output, sometimes hard to distinguish from human writing. GPT-3, the newest and most advanced model, is trained on 499 billion tokens.9 Chuan Li, "OpenAI's GPT-3 Language Model: A Technical Overview," June 3, 2020, https://lambdalabs.com/blog/demystifying-gpt-3. In order to compile that much written language, OpenAI scraped the public web, full texts from books, Wikipedia, and more.10 Li, “OpenAI's GPT-3 Language Model: A Technical Overview.”
Since GPT-2 and GPT-3 are trained on a massive collection of human-written text, we encounter ourselves and our culture reprocessed and modelled in their output. What appears more like a technical wonder, a speaking machine, might actually be a Mechanical Turk with all of us squeezed inside, unaware of the controls—appearing to the audience of ourselves as a thinking machine.
In Dolar’s description, die Sprech-Maschine paves the way for belief in the Mechanical Turk. People were more willing to believe that a thinking machine was possible because a speaking machine was possible. In some ways this same teleology is at play as we encounter increasingly sophisticated machine learning models. When we read something written by GPT-3 we are reading our own words processed, reconfigured, and reflected back to us. Although there is no human secretly typing the exact generated text behind the scenes, there is also no magic at work. Humans wrote the training text, humans selected the data that makes up the corpus, and humans verified the outputs of the model as it learned. The model doesn’t produce new ideas, words, or meanings. At best, language-generating systems produce idiosyncrasies that a human projects meaning onto through interpretation and aesthetic taste—the meaning emerges solely in the mind of the reader.
Like the Mechanical Turk, what appears as magic is informed by a cultural belief that machines can think—a belief reinforced by popular media, entertainment, and the very real technical sophistication of the systems at play. As audiences did while watching Kempelen’s performance, we see thinking in machine learning because we’ve been primed to believe it. The true inner workings are mystified and hidden behind intellectual property laws, dense mathematics, and industry secrecy that obscure and abstract human labour, data, investments, and biases.11 Crawford, Atlas of AI, 211–13.
As Herbert Marcuse wrote, “people recognize themselves in their commodities; they find their soul in their automobile, hi-fi set, split-level home, kitchen equipment,”12 Herbert Marcuse, One-Dimensional Man: Studies in the Ideology of Advanced Industrial Society (Boston: Beacon Press, 1991), 11. and now also in the algorithms and complex models that consume our culture and refract it back to us. A sense of recognition reinforces the notion that there must be some sort of intelligence and meaning residing in the system, that it is somehow “smart.” The type of pattern matching we see in language-generating models can appear supernatural. In reality, rather than a single hidden person playing chess, this is millions or billions of people all at once, mediated through data collection, analysis, and complex modelling. We believe it partly because we see ourselves in it.
Language has the power to move, to incite, to enact, to create shared realities. Poetry, stories, essays, news, tweets—these are all places where we collectively create the world. Encountering this eerie similarity in AI-generated texts results in the feeling of meeting an other—it seems human, it uses our languages and our ideas, our worlds. But it is not some other form of intelligence, it is just us. This is both magical and mundane.
See Connections ⤴