Artificial intelligence models fall short of predicting social interactions


A skill critical for systems to effectively navigate the real world.

A study from Johns Hopkins University finds that humans outperform AI at understanding social interactions in motion, an essential skill for technologies such as self-driving cars and assistive robots. Current AI struggles to recognize human intentions, such as whether a pedestrian is about to cross the street or whether two people are engaged in conversation. The researchers suggest the problem is rooted in how AI models are built, which leaves them unable to fully grasp social dynamics.

To compare AI models with human perception, researchers had people watch short video clips and rate how well they understood the social interactions depicted. The videos showed people engaging with each other, doing activities side by side, or acting independently. The researchers then asked more than 350 AI models, spanning language, video, and image processing, to predict how humans would judge the videos and how their brains would respond.

Large language models were evaluated on human-written captions of the clips to see how well they understood social dynamics. Human participants generally agreed with one another on how they interpreted the social interactions in the videos, but AI models struggled, regardless of their training data or size. Video models failed to accurately describe the actions in the clips, and even image models analyzing still frames couldn’t reliably detect whether people were communicating.

Language models performed better at predicting human behavior, whereas video models were more effective at estimating brain activity during video viewing. Neither result closes the gap: AI still falls well short of humans in understanding social dynamics as they unfold. Scientists believe this limitation stems from AI’s design, as current models are patterned after the part of the human brain that processes static images rather than the region that interprets dynamic social scenes.

The study suggests that AI still cannot fully mimic how humans naturally perceive and respond to social interactions.

Journal Reference: Kathy Garcia, Emalie McMahon, Colin Conwell, Michael Bonner, Leyla Isik. Modeling dynamic social vision highlights gaps between deep learning and humans.
