By early 2023, large language models (LLMs) were taking the world by storm. Arguably, ChatGPT led the revolution. The interactive chatbot allows users to make comments, ask questions, submit requests, or enter into dialogue with the computer program. It is a kind of generative AI, which means that after training on enormous stores of data, it can produce something new that reads fairly convincingly — and eerily — as though it were created by a human.
Despite its ability to mimic human verbiage, ChatGPT was trained to do a straightforward job: use probability and its training data to predict the text most likely to follow a given sequence of words. That ability could make it useful for people who work with text, says computer scientist Mark Finlayson of Florida International University. “It’s very good at generating generic, middle school-level English, and that’s a good starting point for 80 percent of what people write in their day-to-day lives,” he says.
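To get a feel for what that job amounts to, here is a minimal, hypothetical sketch of next-word prediction in Python. It simply counts which word most often follows each word in a tiny made-up corpus and picks the most probable one; real LLMs learn these probabilities with neural networks trained on billions of words, but the underlying prediction task is the same idea.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count which word tends to
# follow each word in a tiny corpus, then pick the most frequent follower.
# (Real LLMs learn such probabilities with neural networks, not raw counts.)
corpus = "the cat sat on the mat and the cat slept on the mat".split()

followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = followers[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> 'cat'
print(predict_next("on"))   # -> 'the'
```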
By the end of January, barely two months after its online debut, ChatGPT had racked up 100 million users, according to analysts at the financial firm UBS. That caught everyone — including OpenAI, the company behind the technology — by surprise.
“We really underestimated the impact it would have,” says Andrew Mayne, who helped test ChatGPT and develop applications at OpenAI. For comparison, the social media platform TikTok took nine months to reach that user milestone, and Twitter (now X) needed more than five years. (ChatGPT broke the record, but it was broken in turn by Threads, a social media program built to rival Twitter, which in July gained 100 million users in only five days.)
What were those users doing with this new AI tool? Anything they could think of. Unethical college students had it write their papers, while aspiring creators prompted the program to generate song lyrics, poems, recipes, short stories, and fan fiction.
ChatGPT also demonstrated unexpected talents, such as solving math problems (though not always correctly), writing computer code, and performing other tasks that seemed to have little to do with its training data. “We started to see it doing things that we did not explicitly train it to do,” says Mayne.
ChatGPT produces text, but other generative AI tools produce music, images, videos, or other media — the source of much misinformation, mischief, and trouble. AI is not always trustworthy; these programs can produce nonsensical or factually inaccurate statements (or images) that are nonetheless packaged in a convincing way. They can also amplify inequalities and societal or racial biases from the training data, or generate art or music that imitates a human creator (and may be shared, wittingly or unwittingly, by tens of millions of people online).
Opportunity for Error
Last February, Google unveiled its own chatbot, called Bard, but in its first public demonstration it made an embarrassing factual mistake. (It reported that the James Webb Space Telescope had captured the first image of an exoplanet, but that feat was achieved in 2004 by the Very Large Telescope.) In April, an earworm called “Heart on My Sleeve” began to circulate online, reported to be a collaboration between the musicians Drake and The Weeknd. Except that it wasn’t: “Heart on My Sleeve” was a musical deepfake. An anonymous creator trained a generative AI program to convincingly mimic the singers.
With so many applications — and opportunities for error — the rise of these tools has ignited interest, debate, anxiety and excitement.
“For the first time you can really talk to a computer,” says Bill Marcellino, a sociolinguist and behavioral scientist at the RAND Corporation, the research organization and think tank based in Santa Monica, California. “That’s radical.”
The quest to build an AI system dates back at least to the 1960s and a system called ELIZA, designed by Joseph Weizenbaum, a computer science pioneer at MIT. It was a kind of mechanical therapist that used keywords from a user’s input to generate responses, but it gave the appearance of carrying on an informal conversation.
It was the type of program — maybe the first of its kind — that could even attempt the Turing Test. Named for the British mathematician Alan Turing, the test is a way to gauge the capabilities of an AI system. If the person conversing with the system can’t tell whether it’s human or machine, then the system passes. ELIZA didn’t. Experts disagree on whether more recent AI tools like ChatGPT pass the test, or whether the Turing Test even remains a useful metric.
“LLMs have the ability to do something that looks like reasoning,” Marcellino says. But that’s not the same as human thinking. “I don’t want to make claims about actual intelligence.”
Importantly, ELIZA was rule-based, which means it responded mechanically to the user’s input. The newer LLMs don’t follow a set of hand-written rules; instead, they learn statistical patterns from their training data and use those patterns to predict new text (or images, or music).
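For a sense of how different those two approaches are, here is a hypothetical, ELIZA-style sketch in Python: a handful of hand-written rules map keywords in the user’s input to canned replies, with no probabilities and no learning involved — the opposite of the statistical prediction shown earlier.

```python
import re

# Minimal ELIZA-style responder: hand-written rules map keywords in the
# user's input to canned responses. No learning, no probability — which
# is the key contrast with modern LLMs.
RULES = [
    (r"\bmother\b|\bfather\b|\bfamily\b", "Tell me more about your family."),
    (r"\bI feel (.+)", "Why do you feel {0}?"),
    (r"\bI am (.+)", "How long have you been {0}?"),
]

def respond(user_input):
    for pattern, template in RULES:
        match = re.search(pattern, user_input, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # default reply when no rule matches

print(respond("I feel anxious about work"))  # -> Why do you feel anxious about work?
print(respond("Nice weather today"))         # -> Please go on.
```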
Generative AI programs are typically based on artificial neural networks, which analyze data and find connections among inputs (which words often appear together, for example). They took a major leap forward in 2017, when Google unveiled the transformer, a kind of neural network approach that can quickly identify patterns and connections between individual inputs. For example, it weighs how every word in a passage relates to every other word within a fixed window of text.
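The sketch below is a simplified, toy version of that attention step, written in Python with made-up word vectors: every word is compared against every other word, and the resulting weights decide how much each word draws on the others. A real transformer adds learned projections, many attention heads, and dozens of stacked layers, but the all-pairs comparison is the core idea.

```python
import numpy as np

# Stripped-down sketch of transformer-style self-attention with random,
# made-up word vectors (a real model learns its vectors and projections).
rng = np.random.default_rng(0)
words = ["the", "cat", "sat"]
d = 4                                    # size of each word's vector
vectors = rng.normal(size=(len(words), d))

scores = vectors @ vectors.T / np.sqrt(d)        # all-pairs similarity
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)    # softmax: each row sums to 1

attended = weights @ vectors   # each word becomes a weighted mix of all words

for word, row in zip(words, weights):
    print(word, np.round(row, 2))  # how much attention each word pays to the others
```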
The transformer revolutionized language models. It enabled them to quickly find connections between words in enormous datasets. ChatGPT was originally powered by GPT-3.5, an LLM trained on a dataset that included text from books, articles, and the internet, totaling an estimated 300 billion words. GPT-4, which was released in March, performed even better. (One metric: When ChatGPT took the Uniform Bar Exam, the test taken by those seeking a license to practice law, the model scored in the bottom 10 percent. When GPT-4 took the test, it reached the top 10 percent.)
Potential Good
The field is continuing to evolve, says Marcellino. Researchers are looking for ways to build smaller, more nimble models that harness the potential of ChatGPT, applying the tool to medicine, the military, and more. He and his team, for example, have built a chatbot for use with the lingo of the U.S. Army.
The potential good that could come from generative AI programs is matched — and often overshadowed — by the increased risks, to everything from cybersecurity to copyright infringement, from identity theft to national security. Those threats already exist; the question is whether LLMs could increase their reach. More regulation could help: Health care researchers have called for increased government oversight, for example, to ensure that the use of LLMs does not cause harm and that it protects the privacy of patient data.
Finlayson predicts that these new tools will boost efficiency among workers, but doesn’t think they’ll obviate the need for humans — at least in most fields.
“ChatGPT has no sense of what’s right or wrong, what’s correct or factual. It’s not anchored to the real world,” he says. “It’s moving very quickly, and human ingenuity continues to be needed in very serious ways, to address these problems that have developed. I think we will rise to that challenge.”
This story was originally published in our January/February 2024 issue.