AI is learning to think

According to an article published on the website of the UK's Financial Times on November 19 by Yoshua Bengio, professor of computer science at the University of Montreal in Canada and founder of the Quebec Artificial Intelligence Institute, the text of which follows:
AI can learn to think before it speaks
Advances in reasoning will lead to substantially improved capabilities in mathematics and science
Yoshua Bengio
Lack of internal deliberation abilities — thinking, in other words — has long been considered one of the main weaknesses of artificial intelligence. The scale of a recent advance in this by ChatGPT creator OpenAI is a point of debate within the scientific community. But it leads many of my expert colleagues and me to believe that there is a chance that we are on the brink of bridging the gap to human-level reasoning.
Researchers have long argued that traditional neural networks — the leading approach to AI — align more with “system 1” cognition. This corresponds to direct or intuitive answers to questions (such as when automatically recognising a face). Human intelligence, on the other hand, also relies on “system 2” cognition. This involves internal deliberation and enables powerful forms of reasoning (like when solving a maths problem or planning something in detail). It allows us to combine pieces of knowledge in coherent but novel ways.
OpenAI’s advance, which has not yet been fully released to the public, is based on a form of AI with internal deliberation made with their o1 large language model (LLM).
Better reasoning would address two major weaknesses of current AI: poor coherence of answers and a poor ability to plan and achieve long-term goals. The former is important in scientific uses and the latter is essential to create autonomous agents. Both could enable important applications.
The principles behind reasoning have been at the heart of AI research in the 20th century. An early example of success was DeepMind’s AlphaGo, the first computer system to beat human champions at the ancient Asian game of Go in 2015, and more recently AlphaProof, which engages with mathematical subjects. Here, neural networks learn to predict the usefulness of an action. Such “intuitions” are then used to plan by efficiently searching possible sequences of actions.
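As a concrete illustration of that last idea, here is a minimal Python sketch of planning by searching over action sequences under the guidance of a learned value estimate. It is only a sketch under stated assumptions, not DeepMind's actual method: the value function is a hand-coded stand-in for a trained network, and the toy task (reaching a target integer with two operations) is invented for illustration.

```python
# Illustrative sketch only: "plan by searching action sequences, guided by a
# learned value estimate", in the spirit of the approach described above. The
# value function is a hand-coded stand-in for a trained neural network, and the
# toy task (reach a target integer using two operations) is invented here.
import heapq

ACTIONS = {"add1": lambda x: x + 1, "double": lambda x: x * 2}

def value_estimate(state: int, target: int) -> float:
    # Stand-in for a network's "intuition": higher means more promising.
    return -abs(target - state)

def plan(start: int, target: int, max_steps: int = 20):
    """Best-first search over action sequences, ordered by the value estimate."""
    frontier = [(-value_estimate(start, target), start, [])]  # (priority, state, actions so far)
    seen = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == target:
            return path
        if len(path) >= max_steps:
            continue
        for name, fn in ACTIONS.items():
            nxt = fn(state)
            if nxt not in seen and nxt <= 4 * target:  # crude bound to keep the toy search finite
                seen.add(nxt)
                heapq.heappush(frontier, (-value_estimate(nxt, target), nxt, path + [name]))
    return None

if __name__ == "__main__":
    print(plan(3, 20))  # prints one sequence of actions that maps 3 to 20
```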
However, AlphaGo and AlphaProof involve very specialised knowledge (of the game of Go and specific mathematical domains respectively). What remains unclear is how to combine the breadth of knowledge of modern LLMs with powerful reasoning and planning abilities.
There have been some advancements. Already, LLMs come up with better answers to complex questions when asked to produce a chain of thought leading to their answer.
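As a rough illustration of chain-of-thought prompting, the sketch below contrasts a direct prompt with one that asks the model to reason step by step first. The `generate` function is a hypothetical placeholder for whatever LLM completion call is available; only the prompt construction is the point.

```python
# Illustrative sketch of chain-of-thought prompting. `generate` is a hypothetical
# placeholder for an LLM completion call, not a real API.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM call here")

def answer_directly(question: str) -> str:
    # "System 1"-style: ask for the answer straight away.
    return generate(f"Question: {question}\nAnswer:")

def answer_with_chain_of_thought(question: str) -> str:
    # Ask the model to lay out intermediate steps before committing to an answer,
    # which is what "a chain of thought leading to their answer" refers to.
    prompt = (
        f"Question: {question}\n"
        "Work through the problem step by step, showing your reasoning, and only "
        "then give the final answer on a line starting with 'Answer:'."
    )
    return generate(prompt)
```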
OpenAI’s new “o” series pushes this idea further, and requires far more computing resources, and therefore energy, to do so. With a very long chain of thought it is trained to “think” better.
We thus see a new form of computational scaling appear. Not just more training data and larger models but more time spent “thinking” about answers. This leads to substantially improved capabilities in reasoning-heavy tasks such as mathematics, computer science and science more broadly.
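One published way to spend extra inference-time compute is self-consistency: sample several independent chains of thought and take a majority vote over their final answers. The sketch below shows that idea, assuming a hypothetical `sample_chain_of_thought` helper; OpenAI has not published o1's actual training or inference recipe, so this is only an analogy for the compute-for-accuracy trade-off.

```python
# Illustrative sketch of trading extra inference-time compute for accuracy via
# self-consistency (majority vote over sampled chains of thought). This is not
# OpenAI's unpublished o1 recipe; `sample_chain_of_thought` is a hypothetical
# placeholder for an LLM sampling call that returns (reasoning, final_answer).
from collections import Counter

def sample_chain_of_thought(question: str) -> tuple[str, str]:
    raise NotImplementedError("plug in your own LLM sampling call here")

def answer_by_majority_vote(question: str, n_samples: int = 16) -> str:
    answers = []
    for _ in range(n_samples):
        _reasoning, final_answer = sample_chain_of_thought(question)
        answers.append(final_answer.strip())
    # More samples mean more time spent "thinking" (and more energy),
    # but a more reliable vote over the candidate answers.
    return Counter(answers).most_common(1)[0][0]
```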
For example, whereas OpenAI’s previous model GPT-4o scored only about 13 per cent on the 2024 AIME, a qualifying exam for the United States Mathematical Olympiad, o1 reached an 83 per cent mark, placing it among the top 500 students in the country.
If these efforts succeed, there are major risks to consider. We don’t yet know how to align and control AI reliably. For example, the evaluation of o1 showed an increased ability to deceive humans — a natural consequence of improving goal-reaching skills. It is also concerning that the ability of o1 to help create biological weapons has crossed OpenAI’s own risk threshold from low to medium. This is the highest acceptable level according to the company (which may have an interest in keeping concerns low).
Unlocking reasoning and agency are believed to be the main milestones on the road to human-level AI, also known as artificial general intelligence. There are therefore powerful economic incentives for large companies racing towards this goal to cut corners on safety.
o1 is likely to be only a first step. Although it does well at many reasoning and mathematical tasks, it looks like long-term planning has still not been achieved. o1 struggles on more complex planning tasks, suggesting that there is still work to be done to achieve the kind of autonomous agency sought by AI companies.
But with improved programming and scientific abilities, it is to be expected that these new models could accelerate research on AI itself. This could get it to human-level intelligence faster than anticipated. Advances in reasoning abilities make it all the more urgent to regulate AI models in order to protect the public.