OpenAI has released its latest reasoning models, o3 and o4-mini, aiming to improve reasoning performance and give users clearer responses to their prompts. However, internal testing indicates that these models hallucinate at higher rates than their predecessors.
Both new models stand out for their high hallucination rates: on the PersonQA benchmark, o3 hallucinated in 33% of responses and o4-mini in 48%, according to OpenAI's internal evaluations, compared with 16% for the older o1 and 14.8% for o3-mini.
Experts have conjectured that the reinforcement learning methods used to train these models could amplify hallucinations. The sharp rise in hallucination rates raises the question of how much AI-generated material can be trusted, especially in fields where accurate knowledge is critical. OpenAI has acknowledged the problem and stated that further research is needed to understand why hallucination rates climb as reasoning models scale up.
That is an important step, since the first challenge is determining whether AI systems can be trusted at all. The o3 and o4-mini models may be more advanced in their reasoning, but the increase in hallucinations highlights the trade-off between innovation and accuracy. Further research and development will be needed to make AI technologies more reliable.