From search engines to voice assistants, computers are getting better at understanding what we mean. This is thanks to language processing programs that understand vast numbers of words without being explicitly told what they mean. Such programs infer meaning through statistics instead. New research reveals that this computational approach can assign different types of information to a single word, much like the human brain does.
The study, published April 14 in the journal Nature Human Behaviour, was co-led by Gabriel Grand, an electrical engineering and computer science graduate student at MIT’s Computer Science and Artificial Intelligence Laboratory, and Idan Blank PhD ’16, an assistant professor at the University of California, Los Angeles. It was directed by Ev Fedorenko, a cognitive neuroscientist at the McGovern Institute for Brain Research who studies how the human brain uses and understands language, and Francisco Pereira at the National Institute of Mental Health. Fedorenko says the wealth of knowledge her team was able to uncover within computational language models shows just how much we can learn about the world from language alone.
The research team began analyzing statistics-based language processing models in 2015, when the approach was new. Such models derive meaning by analyzing how often pairs of words co-occur in text and using those relationships to assess the similarity of words’ meanings. For example, such a program might conclude that “bread” and “apple” are more similar to each other than either is to “notebook,” because “bread” and “apple” often appear near words like “eat” and “snack,” whereas “notebook” does not.
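The intuition behind co-occurrence-based similarity can be made concrete with a toy example. The Python sketch below uses invented co-occurrence counts (real models are trained on enormous corpora) to represent each word by the company it keeps, then compares words with cosine similarity:

```python
import numpy as np

# Toy co-occurrence counts: how often each target word appears near
# each context word in some hypothetical corpus (all values invented
# for illustration).
context_words = ["eat", "snack", "write", "page"]
counts = {
    "bread":    np.array([42.0, 18.0, 1.0, 0.0]),
    "apple":    np.array([35.0, 25.0, 0.0, 1.0]),
    "notebook": np.array([1.0, 0.0, 30.0, 28.0]),
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for similar contexts, near 0.0 for unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(counts["bread"], counts["apple"]))     # high: shared food contexts
print(cosine(counts["bread"], counts["notebook"]))  # low: different contexts
```

Because “bread” and “apple” keep similar company, their count vectors point in similar directions; “notebook” points elsewhere.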
These models were clearly good at measuring the overall similarity between words. But most words carry many kinds of information, and how similar two words are depends on which of their qualities is being assessed. “Humans can come up with all these different mental scales to organize their comprehension of words,” explains Grand, a former undergraduate researcher in the Fedorenko lab. For example, “Dolphins and crocodiles may be similar in size, but one is much more dangerous than the other,” he says.
Grand and Blank, who was then a graduate student at the McGovern Institute, wanted to know whether the models captured the same nuances. And if they did, how was the information organized?
To see how the information in such models stacked up against humans’ comprehension of language, the team first asked human volunteers to score words along many different scales: Were the concepts those words conveyed big or small, safe or dangerous, wet or dry? Then, after mapping where people placed different words along these scales, they examined whether the language processing models did the same.
Grand explains that distributional semantic models use co-occurrence statistics to organize words into a huge, multidimensional matrix. The more similar two words are, the closer they sit within that space. The dimensions of the space are vast, and there is no intrinsic meaning built into its structure. “There are hundreds of dimensions in these word embeddings, and we don’t know which dimension means what,” he says. “We’re really trying to look into this black box and ask, ‘Is there structure in here?’”
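To see what that black box looks like in practice, here is a minimal sketch using the gensim library and its downloadable GloVe vectors (an illustrative choice of embeddings, not necessarily the models used in the study):

```python
import gensim.downloader as api

# Download a small set of pretrained GloVe word embeddings (~66 MB).
# The model choice here is illustrative only.
model = api.load("glove-wiki-gigaword-50")

# Each word is a point in a 50-dimensional space; none of the 50
# dimensions has a labeled meaning.
print(model["dolphin"].shape)               # (50,)

# Words that keep similar company sit close together in that space.
print(model.most_similar("bread", topn=3))
```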
Specifically, to assess whether the models contained the semantic scales the volunteers had used, the researchers looked at where words in the space lined up along vectors defined by the extremes of those scales. For example, where did “dolphin” and “tiger” fall on a line running from “big” to “small”? And were they closer to “dangerous” or to “safe”?
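This projection step can be sketched in a few lines. Continuing with the same illustrative GloVe vectors, a scale’s axis is taken here as the difference between the embeddings of its two extreme words, and each word is scored by its projection onto that axis; this is a simplification of the study’s actual method, which combines several word pairs per scale:

```python
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")

def project(word, low, high):
    """Score `word` along the axis running from `low` to `high`.

    Larger values mean the word sits closer to the `high` end.
    """
    axis = model[high] - model[low]
    axis /= np.linalg.norm(axis)          # unit-length axis
    return float(model[word] @ axis)      # projection onto the axis

# A size scale: where do the animals fall between "small" and "big"?
for w in ("dolphin", "tiger"):
    print(w, "size:", round(project(w, "small", "big"), 3))

# A danger scale: the same animals, a different mental scale.
for w in ("dolphin", "tiger"):
    print(w, "danger:", round(project(w, "safe", "dangerous"), 3))
```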
Across more than 50 world categories and semantic scales, they found that the model organized words much as the human volunteers had. Dolphins and tigers, for instance, were judged similar in size but far apart on scales measuring danger or wetness. The model had organized words in ways that represented many different kinds of meaning, and it had done so based entirely on word co-occurrences.
That, Fedorenko says, tells us something about the power of language. “The fact that we can recover so much of this rich semantic information from these simple word co-occurrence statistics suggests that this is one very powerful source for learning about things we may not even have direct perceptual experience with.”