Hacker News Chinese Digest

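Article Overview

A study tested how well three AI models, Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-5, solve Google reCAPTCHA v2. Claude performed best with a 60% success rate, followed by Gemini at 56% and GPT-5 at only 28%. Among the CAPTCHA types, Static challenges were the easiest to solve and Cross-tile challenges the hardest: every model's success rate on Cross-tile challenges stayed below 2%.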

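Article Summary

Title: Evaluating how well mainstream AI models solve CAPTCHAs

Background: Many websites use CAPTCHAs to distinguish human users from automated traffic. This study tested how three mainstream AI models, Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-5, perform at solving Google reCAPTCHA v2.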

Key findings:

1. Overall performance:
- Claude Sonnet 4.5 had the highest success rate (60%)
- Gemini 2.5 Pro came second (56%)
- GPT-5 performed worst (28%)

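CAPTCHA type analysis: the tests covered three CAPTCHA types:

Static: a static 3x3 grid
Reload: clicked images are dynamically replaced with new ones
Cross-tile: a 4x4 grid in which an object may span multiple tiles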

Results by model:

| Model | Static | Reload | Cross-tile |
|-------|--------|--------|------------|
| Claude | 47.1% | 21.2% | 0% |
| Gemini | 56.3% | 13.3% | 1.9% |
| GPT-5 | 22.7% | 2.1% | 1.1% |
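To make the per-type figures easier to compare, they can be loaded into a small Python structure. This is a minimal sketch: the dictionary layout and the names `RATES` and `best_per_type` are illustrative assumptions, and only the percentages come from the table above.

```python
# Per-type success rates (%) copied from the table above.
# The dictionary layout and all names here are illustrative assumptions.
RATES = {
    "Claude": {"Static": 47.1, "Reload": 21.2, "Cross-tile": 0.0},
    "Gemini": {"Static": 56.3, "Reload": 13.3, "Cross-tile": 1.9},
    "GPT-5": {"Static": 22.7, "Reload": 2.1, "Cross-tile": 1.1},
}


def best_per_type(rates):
    """Return {captcha_type: (model, rate)} for the top model in each column."""
    best = {}
    for model, by_type in rates.items():
        for captcha_type, rate in by_type.items():
            # Keep the highest rate seen so far for this CAPTCHA type.
            if captcha_type not in best or rate > best[captcha_type][1]:
                best[captcha_type] = (model, rate)
    return best


if __name__ == "__main__":
    for captcha_type, (model, rate) in best_per_type(RATES).items():
        print(f"{captcha_type}: {model} ({rate}%)")
```

Read this way, Gemini leads on Static and Cross-tile, Claude leads on Reload, and every model's rate falls from Static to Reload to Cross-tile.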

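Differences in model behavior:

GPT-5's poor results were mainly caused by over-reasoning and repeated self-correction
GPT-5 produced more "thinking" output than the other models (see Figure 3)
In dynamic interfaces, GPT-5 was prone to inefficient behavior such as repeated clicks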

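CAPTCHA difficulty analysis:

All models performed worst on Cross-tile CAPTCHAs
The models generally showed a "rectangular selection bias" and struggled to identify objects that span tiles
Because Reload CAPTCHAs change dynamically, models easily fell into failure loops on them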
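One way to picture the "rectangular selection bias" is a selector that always clicks the smallest rectangle of tiles covering an object, even when the object itself fills an irregular set of tiles. The sketch below illustrates that idea only; it is an assumption about how the bias could play out, not the study's own analysis, and the tile coordinates and the `bounding_rectangle` helper are hypothetical.

```python
def bounding_rectangle(tiles):
    """Return every (row, col) tile inside the smallest rectangle covering `tiles`."""
    rows = [r for r, _ in tiles]
    cols = [c for _, c in tiles]
    return {
        (r, c)
        for r in range(min(rows), max(rows) + 1)
        for c in range(min(cols), max(cols) + 1)
    }


# Hypothetical object occupying an L-shaped set of tiles on a 4x4 grid.
true_tiles = {(1, 0), (2, 0), (2, 1), (2, 2)}

# A selector biased toward rectangles would pick the whole bounding box.
selected = bounding_rectangle(true_tiles)

# Tiles that were selected even though the object does not cover them.
extra = selected - true_tiles

if __name__ == "__main__":
    print("selected:", sorted(selected))
    print("wrongly selected:", sorted(extra))
```

In this made-up case the rectangle contains six tiles while the object covers four, so two tiles are selected by mistake.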

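Implications:

1. More reasoning does not necessarily produce better results
2. Real-time decision-making matters as much as deep reasoning
3. Current agent architectures have limitations in dynamic interfaces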

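Experimental method:

1. Browser automation tests were run with the Browser Use framework
2. Each model attempted up to 5 different CAPTCHA challenges
3. Tests ran on Google's official demo page to avoid cross-origin issues
4. In total, 75 trials were run and 388 individual attempts were recorded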
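The totals in step 4 imply a little over five recorded attempts per trial on average. A quick check of that arithmetic, using only the two reported totals (the variable names are illustrative):

```python
# Totals reported in the experimental method above.
TOTAL_TRIALS = 75
TOTAL_ATTEMPTS = 388

# Average number of recorded attempts per trial.
attempts_per_trial = TOTAL_ATTEMPTS / TOTAL_TRIALS

if __name__ == "__main__":
    print(f"{attempts_per_trial:.2f} attempts per trial on average")
```

The summary does not say how recorded attempts map onto the limit of 5 challenges in step 2, so this average should be read as a rough indicator only.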

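(Note: image links and some technical details from the original article have been omitted; the core data and conclusions are retained.)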

Comment Summary

The comments are summarized below:

1. AI's ability to solve CAPTCHAs

Supporting view: AI solves CAPTCHAs very well and may even outperform humans.
- "Google Gemini is tied for the best and is the cheapest way to solve Google's reCAPTCHA." (comment 2)
- "I hypothesize that these AI agents are all likely higher than human performance now." (comment 12)
Skeptical view: AI performs poorly on complex CAPTCHAs (such as Cross-tile tasks), which suggests its reasoning ability is limited.
- "Cross-tile performance was 0-2%... Seems to really highlight how far these things are from reasoning or human level intelligence." (comment 13)
- "Would performance improve if the tiles were stitched together and fed to a vision model?" (comment 11)

2. Usefulness and accessibility of CAPTCHAs

Criticism: CAPTCHAs are unfriendly to people with disabilities and act as a tool of exclusion.
- "People with cognitive, sight, motor impairments are at a severe disadvantage... You can add as many aria labels as you like but if you're relying on captchas, you are not accessible." (comment 18)
- "They are not the solution. I don't know what is, but this aint it." (comment 18)
Doubts about usefulness: CAPTCHAs may no longer reliably tell humans and AI apart.
- "The real Turing test is whether your AI passes all the various flavors of captcha checks." (comment 15)
- "If you don't want bots to read content, don't put it online, you're just inconveniencing real people now." (comment 8)

3. Technical details and outlook

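Performance assessment: some users were surprised by the results, having expected reCAPTCHA to collapse almost entirely against these models.
- "I'm surprised how well it holds. I expected close-to-total collapse." (comment 5)
- "It'll be a matter of time I guess, but still." (comment 5)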
Future trends: CAPTCHAs may gradually be phased out, or may need more sophisticated techniques to keep up.
- "Models will get better at solving captchas in the near future. IMHO, the real concern is cheap captcha solving services." (comment 9)
- "Will be interesting to see how Gemini 3 does later this year." (comment 2)

4. Other views

Doubts about the data: some users felt the experiment's sample was too small for the conclusions to be representative.
- "3 models only, can we really call that a benchmark?" (comment 6)
User experience: human users also struggle when solving CAPTCHAs.
- "Sometimes I get stuck on an endless loop of buses and fire hydrants." (comment 3)
- "Is it assumed that humans perform 100% against this captcha? Because being one of those humans it’s been closer to 50% for me." (comment 17)