Looking for Signs of Intelligence in Chatbots
Published: June 10, 2026 at 08:00 PM
News Article

Content
A research team led by Hector Zenil of King’s College London has introduced a new framework for evaluating artificial superintelligence and published its findings in Nature Communications. The study challenges the assumption that recent advances in large language models represent a leap toward general intelligence, noting that newer versions actually scored lower on measures of abstraction and prediction than their predecessors. The assessment arrives amid growing excitement over AI capabilities, including the case of amateur researcher Liam Price, who used OpenAI’s ChatGPT to solve the decades-old Erdős Problem #1196.
Zenil argues that traditional benchmarks often measure how well machines behave like humans rather than how effectively they process data. His team defined superintelligence as a system capable of flawlessly abstracting key features and making predictions wherever randomness allows. Testing the models on abstraction, inverse problem-solving, and sequence generation, the researchers found that the systems struggle once complexity increases beyond their training data. This suggests that current models may be patching together prior knowledge rather than engaging in deeper logical understanding.
The discussion extends to the potential risks and applications of such technology in scientific fields such as medicine and climate change modeling. While chatbots have mastered language, Zenil warns that relying solely on these tools could leave scientists unable to fully comprehend the results that automated systems generate. The discussion also covers neurosymbolic computation, which merges deep learning with symbolic logic and could potentially bridge the gap between intuition and formal reasoning. Ultimately, the study highlights the tension between accelerating AI development and preserving the ability to verify and understand its outputs.