April 30, 2024

We ask a lot of ourselves as babies. Somehow, we must grow from sensory blobs into mobile, rational, attentive communicators in just a few years. Here you are, a baby without a vocabulary, in a room cluttered with toys and stuffed animals. You pick up a Lincoln Log, and your caretaker tells you, “This is a ‘log.’” Eventually, you come to understand that “log” does not refer strictly to this particular brown plastic cylinder or to brown plastic cylinders in general, but to brown plastic cylinders that embody the characteristics of felled, denuded tree parts, which are also, of course, “logs.”

There has been much research and heated debate around how babies accomplish this. Some scientists have argued that most of our language acquisition can be explained by associative learning, as we relate sounds to sensibilia, much like dogs associate the sound of a bell with food. Others claim that there are features built into the human mind that have shaped the forms of all language and are crucial to our learning. Still others contend that toddlers build their understanding of new words on top of their understanding of other words.

This discourse advanced on a recent Sunday morning, as Tammy Kwan and Brenden Lake delivered blackberries from a bowl into the mouth of their 1-year-old daughter, Luna. Luna was dressed in pink leggings and a pink tutu, with a silicone bib around her neck and a soft pink hat on her head. A lightweight GoPro-type camera was attached to the front.

“Babooga,” she said, pointing a round finger at the berries. Kwan gave her the rest, and Lake looked at the empty bowl, amused. “That’s like $10,” he said. A light on the camera blinked.

For an hour each week over the past 11 months, Lake, a psychologist at New York University whose research focuses on human and artificial intelligence, has been attaching a camera to Luna and recording things from her point of view as she plays. His goal is to use the videos to train a language model using the same sensory input that a toddler is exposed to — a LunaBot, so to speak. By doing so, he hopes to create better tools for understanding both AI and ourselves.

“We see this research as finally making that link, between those two areas of study,” Lake said. “You can finally put them in dialogue with each other.”

There are many roadblocks to using AI models to understand the human mind. The two are starkly different, after all. Modern language and multimodal models, such as OpenAI’s GPT-4 and Google’s Gemini, are assembled on neural networks with little built-in structure and have improved mostly as a result of increased computing power and larger training data sets. Meta’s most recent large language model, Llama 3, is trained on more than 10 trillion words; an average 5-year-old is exposed to more like 300,000.

Such models can analyze pixels in images but are unable to taste cheese or berries or feel hunger, important kinds of learning experiences for children. Researchers can try their best to turn a child’s full sensory stream into code, but crucial aspects of their phenomenology will inevitably be missed. “What we’re seeing is only the residue of an active learner,” said Michael Frank, a psychologist at Stanford University who for years has been trying to capture the human experience on camera. His lab is working with more than 25 children around the country, including Luna, to record their experiences at home and in social settings.

Humans are also not mere data receptacles, as neural nets are, but intentional animals. Everything we see, every object we touch, every word we hear couples with the beliefs and desires we have in the moment. “There is a deep relationship between what you’re trying to learn and the data that come in,” said Linda Smith, a psychologist at Indiana University. “These models just predict. They take whatever is put into them and make the next best step.” While you might be able to emulate human intentionality by structuring training data — something Smith’s lab has been attempting to do recently — the most competent AI models, and the companies that make them, have long been geared toward efficiently processing more data, not making more sense out of less.

There is also a more conceptual issue, which stems from the fact that the abilities of AI systems can seem quite human, even though they arise in nonhuman ways. Recently, dubious claims of consciousness, general intelligence and sentience have emerged from industry labs at Google and Microsoft after the release of new models. In March, Claude 3, the newest model from an AI research startup called Anthropic, stirred up debate when, after analyzing a random sentence about pizza toppings hidden in a long list of unrelated documents, it expressed the suspicion that it was being tested. Such reports often smell like marketing ploys rather than objective scientific projects, but they highlight our eagerness to attribute scientific meaning to AI.