New ChatGPT Bots Emerge, Promise to Reduce Hallucinations


OpenAI has announced GPT-4, the next evolution of everyone's favorite chatbot, ChatGPT. On top of a more advanced language model that "shows human-level performance in various professional and academic tests," the new version accepts image input and promises stricter refusal behavior to thwart your unwarranted requests.

However, the accompanying GPT-4 Technical Report (PDF) warns that the new model is still quite capable of what the researchers call "hallucinations." This sounds perfectly safe.

What the researchers call hallucinations is the new model's tendency, like its predecessor's, to "produce content that is meaningless or untrue in relation to some sources."

The researchers do state that "GPT-4 was trained to reduce the model's hallucinatory tendencies by leveraging data from prior models such as ChatGPT." In other words, GPT-4 was trained not only on its own earlier malfunctions, but also on human assessments.

"We collected real-world data that was flagged as untrue, reviewed it, and created "fact" sets for it where possible. This was used to evaluate model generations in relation to the "factual" set, facilitating human evaluation.

This process seems to be quite helpful for closed topics, but for broader strokes, the chatbot still has problems. As noted in the paper, GPT-4 is 29% better than GPT-3.5 at avoiding "closed domain" hallucinations, but only 19% better at avoiding "open domain" hallucinations.

ITNEXT, discussing the difference between open-domain and closed-domain QA, explains that closed-domain QA systems "provide answers based on limited information within a specific domain or knowledge base" and are best suited to specific, limited information needs, while open-domain QA systems "provide answers based on the vast amount of information available on the Internet."
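In chatbot terms, closed-domain means handing the model a passage and telling it to stay inside it, while open-domain lets it answer from whatever it absorbed during training. Here is a rough sketch of the difference, written against the openai Python client from around GPT-4's launch; the context passage, questions, and prompts are invented for this example.

```python
# Illustrative sketch only: closed-domain vs. open-domain prompting, using the
# openai Python client (v0.27-era API). Everything in the prompts is made up.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

CONTEXT = "The GPT-4 Technical Report warns that the model can still 'hallucinate' facts."

# Closed-domain: the model is told to answer only from the supplied passage.
closed_domain = [
    {"role": "system", "content": "Answer using only the provided context. "
                                  "If the answer is not in the context, say you don't know."},
    {"role": "user", "content": f"Context: {CONTEXT}\n\nQuestion: What does the report warn about?"},
]

# Open-domain: no reference text, so the model answers from whatever it learned in training.
open_domain = [
    {"role": "user", "content": "What does the GPT-4 Technical Report warn about?"},
]

for messages in (closed_domain, open_domain):
    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    print(reply.choices[0].message.content)
```

The first prompt gives the model something it can be checked against; the second leaves it free to improvise, which is exactly where hallucinations are hardest to catch.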

In other words, GPT-4 could still lie to us.

Of course, users will resent chatbots feeding them false information, but that is not the biggest problem. The bigger issue is overconfidence: the tendency to hallucinate, the paper states, "can be particularly harmful as the persuasiveness and credibility of the model increases, leading to excessive trust by the user."

"Paradoxically, hallucinations can become more dangerous as models become more truthful. It is natural to trust a previously accurate source of information, but as they say, "a broken clock is right twice a day."

Excessive trust becomes especially problematic when chatbots are integrated into the automated systems that help our society make decisions. That can set up a feedback loop leading to a "decline in the overall quality of information."

"It is crucial to recognize that models are not always accurate in acknowledging their limitations, as seen in the tendency to hallucinate."

Problems aside, developers seem quite optimistic about the new model, at least according to the GPT-4 overview on the OpenAI site.

"We have found and fixed some bugs and improved the theoretical basis. As a result, GPT-4 training (at least for us!) is has stabilized to an unprecedented degree.

The meltdowns we've been hearing about have mostly come through Bing's ChatGPT integration, but we'll see how long the optimism lasts if it starts gaslighting users again.

GPT-4 is available now to ChatGPT Plus subscribers, but even paying customers should expect the service to be "severely capacity-limited."
