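Article Abstract
The study tested the ability of three AI models, Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-5, to solve Google reCAPTCHA v2. Claude performed best with a 60% success rate, followed by Gemini at 56% and GPT-5 at only 28%. Across CAPTCHA types, static challenges were the easiest to solve and cross-tile challenges the hardest: every model's success rate on cross-tile challenges was below 2%.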
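Article Summary
Title: Benchmarking Mainstream AI Models on Solving CAPTCHAs
Background: Many websites use CAPTCHAs to distinguish human users from automated traffic. This study tested how three mainstream AI models, Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-5, perform at solving Google reCAPTCHA v2 challenges.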
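Key findings:
1. Overall performance:
- Claude Sonnet 4.5 had the highest success rate (60%)
- Gemini 2.5 Pro came second (56%)
- GPT-5 performed worst (28%)
2. CAPTCHA type analysis: the test covered three CAPTCHA types:
- Static: a 3x3 grid of static images
- Reload: clicked images are dynamically replaced
- Cross-tile: a 4x4 grid in which an object may span multiple tiles

Success rate by model and CAPTCHA type:

| Model | Static | Reload | Cross-tile |
|--------|--------|--------|------------|
| Claude | 47.1% | 21.2% | 0% |
| Gemini | 56.3% | 13.3% | 1.9% |
| GPT-5 | 22.7% | 2.1% | 1.1% |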
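To make the per-type comparison easier to check, the short Python sketch below re-encodes the table and derives the best model for each CAPTCHA type. The names `RATES`, `best_model_per_type`, and `mean_rate_per_type` are illustrative and not from the study, and the unweighted mean across models is only a rough aggregate, not a figure the article reports.

```python
# Per-type success rates (%) copied from the table in the summary.
RATES = {
    "Claude": {"Static": 47.1, "Reload": 21.2, "Cross-tile": 0.0},
    "Gemini": {"Static": 56.3, "Reload": 13.3, "Cross-tile": 1.9},
    "GPT-5": {"Static": 22.7, "Reload": 2.1, "Cross-tile": 1.1},
}


def best_model_per_type(rates):
    """Return the model with the highest success rate for each CAPTCHA type."""
    types = next(iter(rates.values())).keys()
    return {t: max(rates, key=lambda m: rates[m][t]) for t in types}


def mean_rate_per_type(rates):
    """Unweighted mean across models (illustrative only, not from the study)."""
    types = next(iter(rates.values())).keys()
    return {
        t: round(sum(r[t] for r in rates.values()) / len(rates), 1)
        for t in types
    }


if __name__ == "__main__":
    print(best_model_per_type(RATES))
    # {'Static': 'Gemini', 'Reload': 'Claude', 'Cross-tile': 'Gemini'}
    means = mean_rate_per_type(RATES)
    print(min(means, key=means.get))  # Cross-tile
```

Read this way, the table shows that no single model leads on every type, while cross-tile challenges are the hardest for all three.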
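3. Model behavior differences:
- GPT-5's poor performance was mainly due to over-reasoning and repeated self-correction
- GPT-5 produced more "thinking" output than the other models (see Figure 3)
- In dynamic interfaces, GPT-5 was prone to inefficient behavior such as repeated clicks
4. CAPTCHA difficulty analysis:
- All models performed worst on cross-tile challenges
- Models generally showed a "rectangular selection bias" and struggled to identify objects that span tiles
- The dynamic nature of reload challenges made models prone to failure loops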
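The "rectangular selection bias" can be illustrated with a small Python sketch. It assumes, purely for illustration, that the bias means selecting the bounding rectangle of tiles around an object instead of only the tiles the object actually touches; the summary does not define the term more precisely. The tile coordinates and function names are hypothetical.

```python
GRID = 4  # cross-tile challenges use a 4x4 grid, per the summary


def bounding_rectangle(tiles):
    """Smallest axis-aligned block of tiles that covers every tile in `tiles`."""
    rows = [r for r, _ in tiles]
    cols = [c for _, c in tiles]
    return {
        (r, c)
        for r in range(min(rows), max(rows) + 1)
        for c in range(min(cols), max(cols) + 1)
    }


def selection_errors(true_tiles, selected):
    """Tiles the selection missed, and tiles it selected without cause."""
    return {"missed": true_tiles - selected, "extra": selected - true_tiles}


# Hypothetical object running diagonally across the grid as (row, col) tiles.
TRUE_TILES = {(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)}

if __name__ == "__main__":
    rect = bounding_rectangle(TRUE_TILES)
    errors = selection_errors(TRUE_TILES, rect)
    print(len(rect))                 # 9
    print(sorted(errors["extra"]))   # [(0, 1), (0, 2), (1, 2), (2, 0)]
    print(sorted(errors["missed"]))  # []
```

In this toy case a rectangular selection covers every correct tile but also picks four tiles the object never touches, which is the kind of over-selection a non-rectangular object would provoke.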
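Implications:
1. More reasoning does not necessarily yield better performance
2. Real-time decision-making matters as much as deep reasoning
3. Current agent architectures have limitations in dynamic interfaces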
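Methodology:
1. Browser automation tests were run with the Browser Use framework
2. Each model could attempt up to 5 different CAPTCHA challenges
3. Tests ran on Google's official demo page to avoid cross-origin issues
4. 75 trials were run in total, with 388 individual attempts recorded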
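Because the study records both trials and individual attempts, success rates depend on how the attempt log is grouped. The sketch below shows one way such a log could be aggregated in Python. The `Attempt` record, its field names, and the sample rows are hypothetical; none of them come from the study's data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One logged CAPTCHA attempt (hypothetical schema)."""
    model: str
    trial_id: int
    captcha_type: str  # "static", "reload", or "cross-tile"
    solved: bool


def success_rate_by(attempts, key):
    """Group attempts with `key` and return the solved fraction per group."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for attempt in attempts:
        group = key(attempt)
        totals[group] += 1
        wins[group] += int(attempt.solved)
    return {group: wins[group] / totals[group] for group in totals}


# Made-up rows for illustration only.
SAMPLE = [
    Attempt("model-a", 1, "static", True),
    Attempt("model-a", 1, "reload", False),
    Attempt("model-a", 2, "static", True),
    Attempt("model-a", 2, "cross-tile", False),
    Attempt("model-b", 1, "static", False),
    Attempt("model-b", 1, "static", True),
]

if __name__ == "__main__":
    print(success_rate_by(SAMPLE, key=lambda a: a.captcha_type))
    # {'static': 0.75, 'reload': 0.0, 'cross-tile': 0.0}
    print(success_rate_by(SAMPLE, key=lambda a: a.model))
    # {'model-a': 0.5, 'model-b': 0.5}
```

Passing a different `key` function changes the grouping without touching the counting logic, so the same log can yield per-model and per-type rates.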
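(Note: image links and some technical details from the original article have been omitted; the core data and conclusions are retained.)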
Comment Summary
The following summarizes the comments:
1. AI's ability to solve CAPTCHAs
- Supporting view: AI performs very well at solving CAPTCHAs and may even exceed humans.
  - "Google Gemini is tied for the best and is the cheapest way to solve Google's reCAPTCHA." (Comment 2)
  - "I hypothesize that these AI agents are all likely higher than human performance now." (Comment 12)
- Skeptical view: AI performs poorly on complex CAPTCHAs (such as cross-tile tasks), which suggests its reasoning ability is limited.
  - "Cross-tile performance was 0-2%... Seems to really highlight how far these things are from reasoning or human level intelligence." (Comment 13)
  - "Would performance improve if the tiles were stitched together and fed to a vision model?" (Comment 11)
2. Usefulness and accessibility of CAPTCHAs
- Criticism: CAPTCHAs are unfriendly to people with disabilities and act as an exclusionary tool.
  - "People with cognitive, sight, motor impairments are at a severe disadvantage... You can add as many aria labels as you like but if you're relying on captchas, you are not accessible." (Comment 18)
  - "They are not the solution. I don't know what is, but this aint it." (Comment 18)
- Doubts about usefulness: CAPTCHAs may no longer be able to distinguish humans from AI effectively.
  - "The real Turing test is whether your AI passes all the various flavors of captcha checks." (Comment 15)
  - "If you don't want bots to read content, don't put it online, you're just inconveniencing real people now." (Comment 8)
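3. Technical details and outlook
- Performance assessment: Some commenters were surprised by how well reCAPTCHA held up, having expected it to collapse almost completely.
  - "I'm surprised how well it holds. I expected close-to-total collapse." (Comment 5)
  - "It'll be a matter of time I guess, but still." (Comment 5)
- Future trends: Models are expected to keep improving, and cheap CAPTCHA-solving services are seen as the more pressing concern, so CAPTCHAs may gradually be phased out or need more sophisticated techniques.
  - "Models will get better at solving captchas in the near future. IMHO, the real concern is cheap captcha solving services." (Comment 9)
  - "Will be interesting to see how Gemini 3 does later this year." (Comment 2)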
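4. Other views
- Doubts about the data: Some commenters felt that testing only three models was too little for the conclusions to be representative.
  - "3 models only, can we really call that a benchmark?" (Comment 6)
- User experience: Human users also struggle when solving CAPTCHAs.
  - "Sometimes I get stuck on an endless loop of buses and fire hydrants." (Comment 3)
  - "Is it assumed that humans perform 100% against this captcha? Because being one of those humans it’s been closer to 50% for me." (Comment 17)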