ChatGPT falls short in heart risk assessment

Despite ChatGPT's reported ability to pass medical exams, new research indicates it would be unwise to rely on it for some health assessments, such as whether a patient with chest pain needs to be hospitalized.


In a study involving thousands of simulated cases of patients with chest pain, ChatGPT provided inconsistent conclusions, returning different heart risk assessment levels for the exact same patient data. The generative AI system also failed to match the traditional methods physicians use to judge a patient's cardiac risk.

The findings were published in the journal PLOS ONE.

"ChatGPT was not acting in a consistent manner," said Dr. Thomas Heston, lead author and a researcher with Washington State University's Elson S. Floyd College of Medicine. "Given the exact same data, ChatGPT would give a score of low risk, then the next time an intermediate risk, and occasionally it would go as far as giving a high risk."

The authors believe the problem is likely due to the level of randomness built into the current version of the software, GPT-4, which helps it vary its responses to simulate natural language.

This same randomness, however, does not work well for healthcare uses that require a single, consistent answer, Heston said. "We found there was a lot of variation, and that variation in approach can be dangerous," he said. "It can be a useful tool, but I think the technology is going a lot faster than our understanding of it, so it's critically important that we do a lot of research."
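The randomness Heston describes can be illustrated with a minimal temperature-sampling sketch. This is not the study's method or ChatGPT's actual implementation; the labels, logits, and function names below are hypothetical stand-ins showing why sampled generation can return different answers for identical input, while greedy (temperature-zero) decoding cannot:

```python
import math
import random

def softmax(logits, temperature):
    # Scale logits by temperature before normalizing; higher
    # temperature flattens the distribution, increasing randomness.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick(logits, temperature, rng):
    # temperature == 0 means greedy decoding: always the top logit.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs)[0]

# Hypothetical "risk level" tokens standing in for a model's output.
labels = ["low", "intermediate", "high"]
logits = [1.2, 1.0, 0.4]   # the same patient data every time

greedy = [labels[pick(logits, 0, random.Random(i))] for i in range(5)]
sampled = [labels[pick(logits, 1.0, random.Random(i))] for i in range(200)]

print(greedy)        # the identical answer on every run
print(set(sampled))  # several different answers for the same input
```

With sampling enabled, repeated queries on unchanged input can land on "low" one time and "intermediate" or "high" the next, which is the inconsistency the study observed; a clinical tool needing one reproducible answer would instead use deterministic decoding.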