How AI learned new words through baby's 'eyes and ears'
Scientists have made a breakthrough in teaching an artificial intelligence (AI) model to learn words using 61 hours of audiovisual footage from a toddler's head-mounted camera. The study, published in Science, suggests that learning a language may be simpler than previously thought. Brenden Lake, the study's senior author and an associate professor at New York University, said, "We showed, for the first time, that you can train an AI model to learn words through the eyes and ears of a single child."
AI model
The research team used a machine-learning model with two parts: a vision encoder that processes images and a text encoder that processes written language. They fed the model still frames from the toddler's camera footage along with text transcribed from the accompanying audio. This pairing allowed the AI to associate words with the objects and scenes in view, much like how children learn.
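The article does not specify the training objective, but pairing a vision encoder with a text encoder is commonly done with a CLIP-style contrastive loss. The PyTorch sketch below is a hypothetical illustration of that setup, not the authors' actual code; the encoder modules, embedding normalization, and temperature value are all assumptions.

```python
# Hypothetical two-encoder contrastive setup; the study's real
# architecture and hyperparameters may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoEncoderModel(nn.Module):
    def __init__(self, vision_encoder: nn.Module, text_encoder: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # maps image -> embedding
        self.text_encoder = text_encoder      # maps utterance tokens -> embedding

    def forward(self, images, utterances):
        # Project both modalities into a shared space and L2-normalize.
        img_emb = F.normalize(self.vision_encoder(images), dim=-1)
        txt_emb = F.normalize(self.text_encoder(utterances), dim=-1)
        return img_emb, txt_emb

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Each frame is pulled toward its co-occurring utterance and pushed
    # away from utterances paired with other frames in the batch.
    logits = img_emb @ txt_emb.T / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```

Trained this way, a frame and the utterance spoken over it end up close together in the shared embedding space, which is what would let the model later match a word like "ball" to the right image.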
Training method
To train the model, the researchers used more than 60 hours of video and audio captured by a lightweight head-mounted camera, worn intermittently by the child from the age of six months until shortly after their second birthday. Over this 19-month period, the device amassed more than 600,000 video frames paired with over 37,500 transcribed utterances from people nearby. Together, the ambient speech and video frames offered a window into a growing child's daily experience.
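One plausible way to pair frames with utterances, sketched below under the assumption that each transcribed utterance carries start and end timestamps, is to match every frame to whatever was being said when it was captured. The data structures and field names here are illustrative, not drawn from the study.

```python
# Illustrative pairing of timestamped frames with transcribed utterances.
# Assumes each utterance record carries start/end times in seconds; the
# study's actual preprocessing may differ.
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    start: float  # seconds from recording start
    end: float

def pair_frames_with_utterances(frame_times: list[float],
                                utterances: list[Utterance]) -> list[tuple[int, str]]:
    """Return (frame_index, utterance_text) pairs for frames captured
    while the utterance was being spoken."""
    pairs = []
    for u in utterances:
        for i, t in enumerate(frame_times):
            if u.start <= t <= u.end:
                pairs.append((i, u.text))
    return pairs

# Example: frames sampled once per second, one utterance spanning 1.0-2.5 s.
frames = [0.0, 1.0, 2.0, 3.0]
speech = [Utterance("look at the ball", 1.0, 2.5)]
print(pair_frames_with_utterances(frames, speech))
# [(1, 'look at the ball'), (2, 'look at the ball')]
```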
Testing and results
To test the model, the researchers used a method similar to those used to assess word learning in children: the AI was shown four images from the training set and asked to pick out a named object, such as a ball. The baby cam-trained model chose correctly 61.6% of the time, well above the 25% expected from random guessing. It also identified objects in new images that never appeared in the toddler's recordings, showing it could generalize what it had learned. The paper's lead author, Wai Keen Vong, said, "We were quite surprised by that."
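A four-alternative forced-choice test of this kind is straightforward to express in code. The sketch below reuses the hypothetical two-encoder model from earlier and assumes each trial arrives as already-tokenized word tensors and a stack of four candidate images; it illustrates the evaluation idea, not the paper's actual test harness.

```python
# Sketch of a four-alternative forced-choice evaluation: given a word,
# pick the matching image out of four candidates. With four options,
# random guessing yields 25% accuracy.
import torch

@torch.no_grad()
def forced_choice_accuracy(model, trials):
    """trials: iterable of (word_tokens, four_images, correct_index)."""
    correct, total = 0, 0
    for word, images, answer in trials:
        # images: (4, C, H, W); word: (T,) token ids for one utterance.
        img_emb, txt_emb = model(images, word.unsqueeze(0))
        sims = (img_emb @ txt_emb.T).squeeze(-1)  # similarity of each image to the word
        correct += int(sims.argmax().item() == answer)
        total += 1
    return correct / total
```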
Implications for human language learning
This study challenges the idea that children need complex, specialized mechanisms to learn words efficiently. Jessica Sullivan, an associate professor at Skidmore College who was not involved in the research, said, "Now I see that, in at least one case, it is possible." Still, the study does not prove how children learn words; it shows what is possible for machines, and possibly for humans. Factors beyond pattern recognition likely play a role in human learning, according to Linda Smith, a professor at Indiana University Bloomington.
Potential impact on AI development
The research could help create AI models that learn more like humans do and point to new approaches to machine learning. Current large language models (LLMs) require vast amounts of data for training; the study suggests that, with the right kind of data, the gap between machine and human learning could shrink significantly. Lake said, "I was surprised how much today's AI systems are able to learn when exposed to quite a minimal amount of data of the sort a child actually receives when they are learning a language."