Some artificial intelligence models are struggling to learn the old principle, “Correlation does not equal causation.” And while that’s not a reason to abandon AI tools, a recent study should remind programmers that even reliable versions of the technology are still prone to bouts of weirdness—like claiming knee X-rays can prove someone drinks beer or eats refried beans.
Artificial intelligence models do much more than generate (occasionally accurate) text responses and (somewhat) realistic videos. Truly well-made tools are already helping medical researchers parse troves of datasets to discover new breakthroughs, accurately forecast weather patterns, and assess environmental conservation efforts. But according to a study published in the journal Scientific Reports, algorithmic “shortcut learning” remains a problem: it can produce results that are highly accurate and, at the same time, deeply misleading.
Researchers at Dartmouth Health recently trained medical AI models on more than 25,000 knee X-rays provided by the National Institutes of Health’s Osteoarthritis Initiative. They then essentially worked backwards, tasking the deep learning programs with finding patterns that predicted nonsensical traits, such as which patients drank beer or ate refried beans, which, as the study authors explain, is patently absurd.
“The models are not uncovering a hidden truth about beans or beer hidden within our knees,” they write.
At the same time, however, the team explains that these predictions aren’t the result of “mere chance.” The underlying issue is what’s known as algorithmic shortcutting: rather than learning anything about the trait in question, a deep learning model latches onto easily detectable, but irrelevant or misleading, patterns that happen to correlate with it.
“Shortcutting makes it trivial to create models with surprisingly accurate predictions that lack all face validity,” they warn.
The shortcut variables the algorithms relied on, for example, included clinically irrelevant factors such as differences in X-ray machine models and the equipment’s geographic locations.
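To see how easily this can happen, consider a minimal, hypothetical sketch in Python. It is an illustration, not the study’s actual pipeline: the dataset, the “drinks beer” label, the site-to-label correlation, and the use of scikit-learn’s LogisticRegression in place of a deep network are all invented here. The “image” features below carry no information about beer at all, only a made-up calibration offset tied to the scanning site, yet the classifier looks impressively accurate for as long as site and label happen to be correlated.

```python
# Hypothetical illustration of shortcut learning (not the study's code).
# The label ("drinks beer") is assigned at random, so the image features
# genuinely contain no signal about it. But the scanning site is spuriously
# correlated with the label, and the features carry a per-site calibration
# offset, an easy shortcut for the classifier to exploit.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 4000

def make_split(p_site_given_label):
    """Simulate patients; the argument is P(site A | drinks beer)."""
    drinks_beer = rng.integers(0, 2, n)
    site_a = rng.random(n) < np.where(drinks_beer, p_site_given_label,
                                      1 - p_site_given_label)
    # Ten "image" features: pure noise plus a site-specific intensity offset.
    features = rng.normal(size=(n, 10)) + site_a[:, None] * 0.8
    return features, drinks_beer

# Training data and one test set share the spurious correlation.
X_train, y_train = make_split(0.85)
X_test_shortcut, y_test_shortcut = make_split(0.85)
# A second test set breaks the site/label link entirely.
X_test_clean, y_test_clean = make_split(0.5)

model = LogisticRegression().fit(X_train, y_train)
print("Accuracy while the shortcut holds:",
      accuracy_score(y_test_shortcut, model.predict(X_test_shortcut)))
print("Accuracy once the shortcut is broken:",
      accuracy_score(y_test_clean, model.predict(X_test_clean)))
```

On the confounded test set the toy model scores roughly 85 percent; on the deconfounded one it drops to a coin flip. The confounders the study describes, such as machine model or clinical site, can play exactly this role at a much larger scale.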
“These models can see patterns humans cannot, but not all patterns they identify are meaningful or reliable,” Peter Schilling, an orthopaedic surgeon, Dartmouth Health assistant professor of orthopaedics, and the study’s senior author, said in a statement on December 9th. “It’s crucial to recognize these risks to prevent misleading conclusions and ensure scientific integrity.”
An additional, ongoing problem is that there doesn’t seem to be an easy fix for AI shortcut learning. Attempts to address these biases were only “marginally successful,” according to Monday’s announcement.
“This goes beyond bias from clues of race or gender,” said Brandon Hill, a machine learning scientist and study co-author. “We found the algorithm could even learn to predict the year an X-ray was taken. It’s pernicious; when you prevent it from learning one of these elements, it will instead learn another it previously ignored.”
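Hill’s whack-a-mole description can be illustrated with a second hypothetical sketch, again invented for this article rather than drawn from the study: here the toy data contains two nuisance signals, scanning site and acquisition year, both spuriously correlated with a label the images say nothing about. Scrubbing the site-related features barely helps, because the model simply switches to the year signal.

```python
# Hypothetical illustration of the "whack-a-mole" effect (not the study's code).
# Two nuisance signals, scanning site and acquisition year, are both spuriously
# correlated with a random label. Removing the features that carry the site
# signal barely helps: the classifier falls back on the year signal instead.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 4000
label = rng.integers(0, 2, n)
site = (rng.random(n) < np.where(label, 0.85, 0.15)).astype(float)
year = (rng.random(n) < np.where(label, 0.80, 0.20)).astype(float)

X = rng.normal(size=(n, 10))
X[:, :5] += site[:, None] * 0.8   # first five features carry the site artifact
X[:, 5:] += year[:, None] * 0.8   # last five features carry the year artifact

split = n // 2
def fit_and_score(features):
    clf = LogisticRegression().fit(features[:split], label[:split])
    return accuracy_score(label[split:], clf.predict(features[split:]))

print("All features (site shortcut available):", fit_and_score(X))
print("Site features removed (year shortcut remains):", fit_and_score(X[:, 5:]))
```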
According to Hill, these problems can potentially lead human experts to trust “some really dodgy claims” made by AI models. To Schilling, Hill, and their colleagues, this means that although predictive deep learning programs have their uses, the burden of proof needs to be far higher when they are used in settings such as medical research. Hill likens working with the AI to dealing with an extraterrestrial lifeform, one that researchers are all too tempted to anthropomorphize.
“It is incredibly easy to fall into the trap of presuming that the model ‘sees’ the same way we do,” he says. “In the end it doesn’t. It learned a way to solve the task given to it, but not necessarily how a person would. It doesn’t have logic or reasoning as we typically understand it.”