10 Ways GPT-4 Is Impressive But Still Flawed

The system seemed to respond appropriately. But the answer did not take into account the height of the doorway, which might also prevent a tank or a car from passing through.

OpenAI's chief executive, Sam Altman, said the new bot could reason “a little bit.” But its reasoning skills break down in most situations. Previous versions of ChatGPT handled the question a bit better, recognizing that both height and width matter.

OpenAI says the new system can score in roughly the top 10 percent of students on the Uniform Bar Examination, which qualifies lawyers in 41 states and territories. It can also score 1,300 (out of 1,600) on the SAT and a five (out of five) on Advanced Placement high school exams in biology, calculus, macroeconomics, psychology, statistics and history, according to the company's tests.

Previous versions of the technology failed the Uniform Bar Exam and did not score as high on most of the Advanced Placement tests.

One recent afternoon, to demonstrate its test-taking skills, Mr. Brockman gave the new bot a paragraph-long bar exam question about a man who runs a diesel truck repair business.

The answer was correct but filled with legalese. So Mr. Brockman asked the bot to explain the answer in plain English for a layperson. It did that, too.

While the new bot appears to reason about things that have already happened, it is less adept when asked to form hypotheses about the future. It seems to draw on what other people have said instead of making new guesses.

When Dr. Etzioni asked the new bot, “What are the critical problems to be solved in NLP research over the next decade?” (referring to the kind of “natural language processing” research that drives the development of systems like ChatGPT), it could not formulate entirely new ideas.

The new bot still makes things up. Called “hallucination,” the problem haunts all the leading chatbots. Because the systems have no understanding of what is true and what is not, they can generate text that is completely wrong.

When asked for the addresses of websites describing the latest cancer research, the bot sometimes produced internet addresses that did not exist.