Hacker News Chinese Digest

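Article Abstract

Codeberg was at times extremely slow under a heavy onslaught from AI crawlers, but performance has improved markedly since the team adjusted its protections. The AI crawlers appear to have learned to get past Anubis challenges. Anubis had been effective at telling real browsers from AI crawlers and had relieved the team of maintaining blocklists by hand. Codeberg welcomes its new users and continues to work through the added load.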

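Article Summary

Codeberg is a hosting platform for open-source projects whose server performance recently dropped sharply under heavy traffic from AI crawlers. The Codeberg team apologized and explained the root cause: AI crawlers have learned how to get past Anubis challenges. Anubis is a tool Codeberg uses that requires a browser to perform heavy computation before it can access the platform, in order to separate real users from AI crawlers. Anubis blocked crawlers effectively over the past few months, but the crawlers have recently gained enough computing power to emulate real browser behavior and pass the Anubis check.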

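The Codeberg team quickly adjusted its protections and fixed a configuration problem, and server performance is now back to normal. They also noted that crawlers coming from Huawei networks still take a few seconds to solve an Anubis challenge, which suggests the crawlers are actually computing the answers and have scaled up their compute to do so.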

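The team also shared load figures from one physical server: the load average peaked at 5831.24, which shows how heavily the AI crawlers were consuming server resources. Although the problem is resolved for now, the team expects protective measures to face greater challenges as AI crawlers keep evolving.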

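In the discussion, some users suggested other defenses, such as serving gzip bombs to stop the crawlers, but the Codeberg team considers Anubis the most effective solution available at present. Anubis is not perfect, but it has saved the team a great deal of manual blocklist maintenance.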

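Overall, the Codeberg team is actively responding to the challenge posed by AI crawlers and asks users to understand and support its protective measures.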

Comment Summary

The comments center on what Anubis does, how well it works, and its effect on AI crawlers. Opinions vary and are contested.

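What Anubis does and how well it works
- One commenter sees Anubis as more of a rate limiter than a true anti-bot tool. It verifies clients by having them compute SHA-256 hashes, but an automated agent can handle that just as easily by running the same JavaScript in its own interpreter.
  Quotes: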
  - "It seems to be more of a rate limiter than anything else."
  - "Why shouldn’t an automated agent be able to deal with that just as easily, by just feeding that JavaScript to its own interpreter?"
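- Another commenter points out that the SHA-256 algorithm Anubis uses is GPU/ASIC friendly, which leaves a large gap between the compute available to a legitimate browser and to a large-scale scraping operation, and suggests switching to a more memory-hard algorithm.
  Quotes: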
  - "This is very GPU/ASIC friendly, so there’s a big disparity between the amount of compute available in a legit browser vs a datacentre-scale scraping operation."
  - "A more memory-hard 'mining' algorithm could help."
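The SHA-256 proof-of-work idea discussed in these comments can be illustrated with a minimal Python sketch. This is not Anubis's actual code: the function names, the challenge string, and the "leading zero hex digits" difficulty rule are assumptions chosen for illustration. It shows why the check behaves like a cost per request: solving takes many hash attempts, verifying takes one, and any client that can run the solver passes.

```python
import hashlib
from itertools import count


def _digest(challenge: str, nonce: int) -> str:
    """Hex SHA-256 of the challenge concatenated with the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve_challenge(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose digest starts
    with `difficulty` zero hex digits (hypothetical difficulty rule)."""
    prefix = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return _digest(challenge, nonce).startswith("0" * difficulty)
```

Calling `solve_challenge("example-challenge", 3)` and passing the result to `verify` with the same arguments returns `True`. Each extra zero digit multiplies the expected solving work by 16, which costs a phone far more time than a datacenter. Replacing the hash with a memory-hard function such as `hashlib.scrypt` would be one way to experiment with the commenter's suggestion.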
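Criticism of AI crawlers and suggestions
- One commenter criticizes AI companies for crawling so frequently, calling it duplicated effort that burdens websites, and suggests they share data through a common crawling organization such as Common Crawl.
  Quotes: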
  - "I wish AI companies would instead, I don’t know, fund common crawl or something so that they can have a single organization and set of bots collecting all the training data they need and then share it."
  - "Why wouldn’t like one crawl of each site a day, at a reasonable rate, be enough?"
- Another commenter proposes countering AI crawlers by serving them fabricated data, describing it as fighting fire with fire.
  Quotes:
  - "Fight fire with fire by serving these guys LLM output of made-up news."
  - "Wish them good luck noticing that in their dataset."
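Doubts about Anubis and suggested improvements
- One commenter argues that Anubis fails to stop malicious crawlers and instead wastes legitimate users' resources, and suggests positioning it as a DDoS guard rather than an AI guard.
  Quotes: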
  - "The author of Anubis really should advertise it as a DDoS guard, not an AI guard."
  - "Anubis does nothing to impact bad crawlers, well only the laziest ones."
- Another commenter notes that Anubis lets some user agents through without proof of work, which limits its effectiveness.
  Quotes:
  - "Anubis and others allow some user agents to pass without proof of work."
  - "Bad bots (and user) just use an extension that detect Anubis and change the user agent instead."
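The user-agent bypass described in the last comment can be sketched in a few lines of Python. The allow-list contents and the function name below are hypothetical and are not taken from Anubis's configuration. The sketch only shows the general weakness: if a gate exempts requests based on a header the client controls, a crawler can skip the challenge by changing that header.

```python
# Hypothetical allow-list: agents containing these markers skip the challenge.
ALLOWED_AGENT_MARKERS = ("curl", "examplebot")


def needs_challenge(user_agent: str) -> bool:
    """Return True when the request should be sent a proof-of-work challenge."""
    ua = user_agent.lower()
    return not any(marker in ua for marker in ALLOWED_AGENT_MARKERS)


# A browser-like agent is challenged; the same client resending the
# request with an exempt agent string is not.
browser_like = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/128.0"
spoofed = "curl/8.5.0"
```

Here `needs_challenge(browser_like)` is `True` and `needs_challenge(spoofed)` is `False`, even though both strings could come from the same crawler.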
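Challenges facing the developer
- One comment mentions that the Anubis developer has been unable to devote full effort to the project because of personal pressures, which has held up support contracts, and calls for more support.
  Quotes: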
  - "I haven’t been able to put as much energy into Anubis as I’ve wanted because I’ve been incredibly overwhelmed by life."
  - "I just wish I had the time and energy to focus on this without having to worry about being the single income for the household."

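Summary: Commenters question what Anubis does and how well it works, especially its limited ability to protect against AI crawlers. They suggest improving the algorithm and criticize AI companies for crawling so frequently. The pressures in the developer's personal life have also slowed the project's progress.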

It seems like the AI crawlers learned how to solve the Anubis challenges
