Can ChatGPT win a Fields Medal? - FT Chinese (FT中文网)

Can ChatGPT win a Fields Medal?

New AI models could soon pose a threat to the world’s top mathematicians
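The ability of new AI models to solve hard maths problems is improving rapidly. Their performance on extremely challenging, brand-new problems already poses a threat to the world's top mathematicians.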

The writer is a science commentator

When Yang-Hui He, a fellow at the London Institute for Mathematical Sciences, received an invitation to an all-expenses paid weekend held in Berkeley, California, last month, it was a no-brainer. The trip would afford the Oxford university lecturer, an expert in algebraic geometry and string theory, insider access to a potentially historic moment for his discipline.

Plus, the brief sounded fun: working with other top mathematicians to find out if the most advanced AI models, when confronted with brand new problems, could rival or exceed the collaborative reasoning abilities of the best human minds. The answer? The machines did better than expected. “I’m not saying we felt existentially threatened but there was a general feeling of awe,” He told me. He also flew back $1,500 richer after dreaming up a problem that stumped the AI. 

Using AI to crack maths puzzles is not new. In early 2024, Google DeepMind unveiled technology that could hold its own in high-school student maths competitions. But interacting with the latest AI models last month felt more “like working with a very, very good graduate student”.

This moment could potentially change the profession. While the prospect of a machine securing a Fields Medal — widely regarded as mathematics’ equivalent of a Nobel Prize — still feels reassuringly distant, one can envision an unsettling future in which graduate maths programmes are pruned, university departments are shuttered, and the torch of Pythagoras and Euclid passed to a faceless silicon successor.

The weekend in mid-May was organised by Epoch AI, a US-based non-profit organisation that benchmarks AI capabilities. In an initiative set up last autumn called FrontierMath, Epoch paid professional mathematicians to submit novel problems, along with their solutions, proofs and derivations, that could be used to challenge AI models.

These specially crafted conundrums, earning their creators up to $1,000 apiece and graded into three tiers of difficulty (including undergraduate and research level), were collected by Epoch via the secure messaging app Signal, so that they could not be inadvertently included in AI training data scraped from the internet. By April this year, Scientific American reported, an OpenAI model had confounded expectations by solving around a fifth of them.

And so it was time for tier 4 challenges: super-tricky problems that would take top academics weeks or months to solve collaboratively — and designed to resist AI guesswork or brute force number-crunching. Thirty academic experts, including He, met at Epoch’s Berkeley offices to brainstorm some new problems in person. Again, secrecy prevailed: lunches and dinners were brought in; attendees signed non-disclosure agreements and He recalled needing security cards to visit the toilets.

The full results of how the AI model performed on 50 tier 4 problems are yet to be disclosed. But He was struck by how much the tech has improved since 2022, when “ChatGPT couldn’t even find the tenth digit of seven divided by 13 . . . now it’s beginning to do something more intelligent.”

He explained how the AI, called o4-mini, was able to solve some of the problems in minutes, writing mathematical scripts and drawing on external specialist software. Most impressive, he said, were detailed literature searches, turning up obscure but critical papers and coding shortcuts. Another attendee, Ken Ono, a University of Virginia mathematician and freelance consultant for Epoch, called the results “frightening”. 

The project is not without controversy: in January, Epoch apologised for initially failing to disclose OpenAI’s financial backing of FrontierMath, leading to suspicions that the company’s AI models, including o4-mini, would have favoured access to some of the unseen maths problems used for benchmarking.

AI models cannot yet tackle the hardest maths challenges. Even so, one can imagine the next generation of machines thinning out the next generation of human mathematicians. That could shrink the pool from which future Fields medallists are drawn; there might be fewer hopefuls to attack famous unsolved problems like the Riemann Hypothesis, one of six carrying a $1mn bounty.

While the use of prime numbers in encryption shows the practical use of mathematics, there is something quite profound about living in a universe filled with dazzling concepts like zero, infinity and imaginary numbers. Perhaps fretting over whether the addition of AI might subtract from this human endeavour is not that irrational after all.

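Copyright notice: This article is the copyright of FT Chinese (FT中文网). No organisation or individual may reproduce, copy or otherwise use all or part of this article without permission; infringement will be pursued.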
