When it comes to artificial intelligence chatbots, bigger is usually better.
Large language models like ChatGPT and Bard, which generate fluent, conversational text, improve as they are fed more data. Every day, bloggers take to the internet to explain how the latest advances — apps that summarize articles, AI-generated podcasts, fine-tuned models that can answer any question about professional basketball — will “change everything”.
But building bigger and more capable AI requires processing power that few companies possess, and there is growing concern that a small group of them, including Google, Meta, OpenAI, and Microsoft, will exercise near-total control over the technology.
Larger language models are also more difficult to understand. They are often described as “black boxes”, even by the people who design them, and prominent figures in the field have expressed unease that AI's aims may ultimately not align with our own. If bigger is better, it is also more opaque and more exclusive.
In January, a group of young academics working in natural language processing — the branch of AI focused on linguistic understanding — issued a challenge to try to shift this paradigm. The group called for teams to create functional language models using data sets less than one-ten-thousandth the size of those used by the most sophisticated large language models. A successful mini model would be nearly as capable as a high-end model but much smaller, more accessible, and more compatible with humans. The project is called the BabyLM Challenge.
“We are challenging people to think small and focus more on building efficient systems that more people can use,” said Aaron Mueller, a computer scientist at Johns Hopkins University and an organizer of BabyLM.
Alex Warstadt, a computer scientist at ETH Zurich and another organizer of the project, added, “The challenge puts questions about human language learning, rather than ‘How big can we make our models?’, at the center of the conversation.”
Large language models are neural networks designed to predict the next word in a given sentence or phrase. They are trained for this task on enormous collections of words gathered from transcripts, websites, novels, and newspapers. A typical model makes guesses based on example phrases and then adjusts itself depending on how close it comes to the correct answer.
By repeating this process over and over, a model builds a map of how words relate to one another. In general, the more words a model is trained on, the better it becomes; each phrase gives the model context, and more context translates into a more detailed impression of what each word means. OpenAI's GPT-3, released in 2020, was trained on 200 billion words; DeepMind's Chinchilla, released in 2022, was trained on a trillion.
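To make the idea of next-word prediction concrete, here is a deliberately simplified, hypothetical sketch. Real models like GPT-3 are neural networks adjusted by gradient descent rather than word-count tables, but the sketch shows the same basic loop: look at which words followed which during training, then guess the most likely continuation — and the more text the model sees, the sharper those guesses become.

```python
# A toy, hypothetical illustration of next-word prediction.
# Real large language models are neural networks trained with gradient descent;
# this sketch merely counts which word follows which, to show how more training
# text yields richer context for each word.
from collections import Counter, defaultdict

def train(words):
    """Count, for every word, how often each other word follows it."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Guess the continuation seen most often during training."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat chased the dog".split()
model = train(corpus)
print(predict_next(model, "the"))  # prints 'cat': it followed 'the' most often
```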
For Ethan Wilcox, a linguist at ETH Zurich, the fact that something non-human can generate language presents an interesting opportunity: Could AI language models be used to study how humans learn language?
For example, nativism, an influential theory that traces back to the early work of Noam Chomsky, holds that humans learn language quickly and efficiently because they have an innate understanding of how language works. But language models learn language quickly too, and apparently without any innate understanding of how language works – so perhaps nativism doesn't hold up.
The trouble is that language models learn very differently from humans. Humans have bodies, social lives, and rich sensations. We can smell mulch, feel the vanes of feathers, bump into doors, and taste peppermint. From the start, we encounter simple spoken words and syntax that are often not represented in writing. So, Dr. Wilcox concluded, a computer that generates language after being trained on trillions of written words can tell us only so much about our own linguistic processes.
But if a language model is exposed only to words young humans encounter, it might interact with language in ways that can answer certain questions we have about our own abilities.
So, along with half a dozen colleagues, Dr. Wilcox, Dr. Mueller, and Dr. Warstadt put together the BabyLM Challenge to try to push language models a little closer to human understanding. In January, they sent out a call for teams to train language models on the same number of words a 13-year-old human encounters — roughly 100 million. Submitted models will be tested on how well they generate and capture the nuances of language, and a winner will be announced.
Eva Portelance, a linguist at McGill University, encountered the challenge on the day of its announcement. Dr. Portelance's research straddles the often blurred line between computer science and linguistics. The first forays into AI, in the 1950s, were driven by a desire to model human cognitive capacities on computers; the basic unit of information processing in AI is the “neuron,” and early language models in the 1980s and '90s were directly inspired by the human brain.
But as processors grew more powerful and companies began working toward marketable products, computer scientists realized that it was often easier to train a language model on enormous amounts of data than to force it into a psychologically informed structure. As a result, Dr. Portelance said, “they give us humanlike text, but there is no connection between us and how they function.”
For scientists interested in understanding how the human mind works, these giant models offer limited insight. And because they require tremendous processing power, few researchers have access to them. “Only a small number of highly resourced industrial laboratories are capable of training models with billions of parameters on trillions of words,” Dr. Wilcox said.
“Or even load them,” Dr. Mueller added. “This makes research in the field feel a little less democratic these days.”
The BabyLM Challenge, Dr. Portelance said, can be seen as a step away from the arms race toward ever-larger language models, and a step toward more accessible, more intuitive AI.
The potential of such a research program has not been lost on the larger industrial laboratories. Sam Altman, the chief executive of OpenAI, said recently that further increasing the size of language models would not yield the same improvements seen over the past few years. And companies like Google and Meta have also been investing in research on more efficient language models, informed by human cognitive structures. After all, a model that can generate language when trained on less data could potentially be scaled up as well.
Whatever advantages a successful BabyLM might confer, for those behind the challenge the goals are more academic and abstract. Even the prize underscores the impractical. “Just pride,” said Dr. Wilcox.