Hacker News Chinese Digest


Miasma: A tool to trap AI web scrapers in an endless poison pit

Original article | HN discussion | 2026-03-29 22:39:10

Article Summary

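"miasma" is an open-source tool that disrupts AI web scrapers by feeding them an endless loop of fake data, keeping them from harvesting real page content. It is written in Rust, is published as a crate with managed dependencies, and runs quality checks through continuous integration. The project carries an explicit "No AI" label, in keeping with its goal of protecting website content from AI scraping.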

Article Overview

Project name: Miasma - an endless poison-pit trap for AI web scrapers

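Overview: Miasma is an open-source tool that helps website owners push back against large-scale web scraping by AI companies. It serves malicious scrapers "poisoned" training data together with self-referential links, so a scraper that follows them is caught in an endless loop.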

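Core features:
1. Draws on a "poison fountain" data source to send scrapers useless or harmful data
2. Generates self-referential links that keep scrapers looping indefinitely
3. High-performance design with a very small memory footprint
4. Works behind a reverse proxy (e.g., Nginx)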
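The self-referential links described above can be illustrated with a small Rust sketch. This is not Miasma's code: the function `render_trap_page`, the `/trap` prefix in the example, and the use of a xorshift generator for link IDs are all assumptions made for illustration, and `poison_text` stands in for whatever the poison fountain returns (assumed to be HTML-safe already). Every generated link points back under the same prefix, so a crawler that keeps following links never leaves the trap.

```rust
/// Minimal xorshift64 PRNG (Marsaglia) so the sketch needs only std.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical: renders one trap page. `poison_text` is assumed to be
/// HTML-safe already; every link points back under `prefix`.
fn render_trap_page(prefix: &str, links_per_page: usize, seed: u64, poison_text: &str) -> String {
    let base = prefix.trim_end_matches('/');
    // `| 1` keeps the state non-zero; xorshift would otherwise emit zeros forever.
    let mut rng = XorShift64(seed | 1);
    let mut html = String::from("<!DOCTYPE html>\n<html><body>\n<p>");
    html.push_str(poison_text);
    html.push_str("</p>\n");
    for _ in 0..links_per_page {
        // A fresh pseudo-random path segment for each self-referential link.
        html.push_str(&format!("<a href=\"{}/{:016x}\">more</a>\n", base, rng.next_u64()));
    }
    html.push_str("</body></html>\n");
    html
}
```

Because the output depends only on the seed, a server could (hypothetically) derive the seed from the request path so each trap URL always renders the same page.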

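Technical details:
- Written in Rust
- Installable via cargo or as a prebuilt binary
- Configurable connection limit (default: 500 concurrent requests)
- Optional gzip compression to cut bandwidth costs
- Memory usage of roughly 1 MB per request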
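A cap on concurrent requests, like the 500-request default above, can be enforced with an atomic counter and an RAII guard. The std-only Rust sketch below assumes that requests over the cap are simply rejected (for example with HTTP 503); the summary does not say how Miasma actually handles overflow, and `ConnectionLimiter` and `Permit` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical concurrency cap: at most `max` permits exist at once.
pub struct ConnectionLimiter {
    active: AtomicUsize,
    max: usize,
}

/// RAII guard: dropping it frees the slot, even on early return or unwind.
pub struct Permit<'a> {
    limiter: &'a ConnectionLimiter,
}

impl ConnectionLimiter {
    /// `const` so the limiter can live in a `static` shared by all handlers.
    pub const fn new(max: usize) -> Self {
        ConnectionLimiter { active: AtomicUsize::new(0), max }
    }

    /// Claims a slot if one is free; `None` means the cap is reached and the
    /// caller should reject the request (e.g. with HTTP 503; an assumption).
    pub fn try_acquire(&self) -> Option<Permit<'_>> {
        let mut current = self.active.load(Ordering::Relaxed);
        loop {
            if current >= self.max {
                return None;
            }
            // Compare-and-swap so two threads cannot both take the last free slot.
            match self.active.compare_exchange_weak(
                current,
                current + 1,
                Ordering::Acquire,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(Permit { limiter: self }),
                Err(observed) => current = observed,
            }
        }
    }

    /// Number of requests currently holding a permit.
    pub fn active(&self) -> usize {
        self.active.load(Ordering::Relaxed)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        self.limiter.active.fetch_sub(1, Ordering::Release);
    }
}
```

A server could declare `static LIMITER: ConnectionLimiter = ConnectionLimiter::new(500);` and hold the returned `Permit` in a named binding (such as `_permit`) until the response is written. If the roughly 1 MB-per-request figure holds, a cap of 500 bounds that portion of memory at about 500 MB.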

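Usage:
1. Embed a link in your pages and hide it with CSS
2. Configure Nginx to proxy a specific path to Miasma
3. Start the Miasma service and set the link prefix
4. Use robots.txt to keep well-behaved crawlers out of the trap path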
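Steps 1, 2 and 4 all revolve around a single path prefix: ordinary pages carry a CSS-hidden link into the prefix, the reverse proxy forwards anything under that prefix to Miasma, and robots.txt disallows the same prefix so crawlers that honor it never enter. The Rust sketch below only generates those pieces. The helper names, the `/trap/` prefix and the link text are hypothetical, and `should_proxy_to_trap` merely mimics an Nginx prefix `location` match; it does not reproduce the project's actual Nginx configuration.

```rust
/// Normalizes a prefix to "/name/" form; an empty prefix becomes "/".
fn normalize_prefix(prefix: &str) -> String {
    let trimmed = prefix.trim_matches('/');
    if trimmed.is_empty() {
        "/".to_string()
    } else {
        format!("/{}/", trimmed)
    }
}

/// Step 1 (hypothetical markup): a link hidden from human visitors with CSS.
fn hidden_trap_link(prefix: &str) -> String {
    format!(
        "<a href=\"{}\" style=\"display:none\" aria-hidden=\"true\" tabindex=\"-1\">archive</a>",
        normalize_prefix(prefix)
    )
}

/// Step 2: mimics an Nginx prefix `location` match deciding what gets proxied.
fn should_proxy_to_trap(path: &str, prefix: &str) -> bool {
    path.starts_with(normalize_prefix(prefix).as_str())
}

/// Step 4: robots.txt rule asking every compliant crawler to skip the trap.
fn robots_txt(prefix: &str) -> String {
    format!("User-agent: *\nDisallow: {}\n", normalize_prefix(prefix))
}
```

Normalizing to a trailing slash keeps an unrelated path such as `/trapdoor` from matching the `/trap` prefix.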

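Configuration options include:
- Server port (default: 9999)
- Maximum number of concurrent connections
- Link prefix path
- Number of self-referential links per page
- Poison source URL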
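A configuration struct covering these options might look like the sketch below. Only the port default (9999) and the connection-cap default (500) come from the summary. The `key=value` input format and the key names are hypothetical (they are not Miasma's real command-line flags), and the other three settings are treated as required because no defaults are given for them.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical settings mirroring the options listed above.
#[derive(Debug, PartialEq)]
pub struct TrapConfig {
    pub port: u16,              // documented default: 9999
    pub max_connections: usize, // documented default: 500
    pub link_prefix: String,    // required here; no default assumed
    pub links_per_page: usize,  // required here; no default assumed
    pub poison_source: String,  // required here; no default assumed
}

/// Looks up a setting that has no default.
fn required<'a>(map: &HashMap<&str, &'a str>, key: &str) -> Result<&'a str, String> {
    map.get(key).copied().ok_or_else(|| format!("missing required setting `{key}`"))
}

/// Parses an optional setting, falling back to `default` when absent.
fn parse_or<T: FromStr>(map: &HashMap<&str, &str>, key: &str, default: T) -> Result<T, String> {
    match map.get(key) {
        Some(v) => v.parse::<T>().map_err(|_| format!("invalid value for `{key}`: {v}")),
        None => Ok(default),
    }
}

/// Builds a `TrapConfig` from hypothetical `key=value` strings.
pub fn parse_config(pairs: &[&str]) -> Result<TrapConfig, String> {
    let mut map: HashMap<&str, &str> = HashMap::new();
    for pair in pairs {
        // Split at the first '=' only, so URL values may contain '='.
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got {pair:?}"))?;
        map.insert(key.trim(), value.trim());
    }
    let links_per_page = required(&map, "links_per_page")?
        .parse::<usize>()
        .map_err(|_| "links_per_page must be a non-negative integer".to_string())?;
    Ok(TrapConfig {
        port: parse_or(&map, "port", 9999u16)?,
        max_connections: parse_or(&map, "max_connections", 500usize)?,
        link_prefix: required(&map, "link_prefix")?.to_string(),
        links_per_page,
        poison_source: required(&map, "poison_source")?.to_string(),
    })
}
```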

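The project also stresses that:
- AI-generated contributions are not accepted
- Human-written bug reports and feature requests are welcome
- Full development documentation and configuration guides are provided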

By pushing back creatively against AI data scraping, the project gives content creators one way to protect their original work.

Comment Summary

Overview of the discussion:

Reactions to the project's name

"-1 for the name" (comment 1)
"My asthmar / I'm assuming this is a reference to Lord of the flies" (comment 13)

Concerns about AI data scraping

"many new AI company don't seem to respect any decision made by the person who owns the website" (comment 2)
"If you have a public website, they are already stealing your work" (comment 10)

Doubts about whether the approach works

"Is there any evidence or hints that these actually work?" (comment 5)
"Can't the LLMs just ignore or spoof their user agents anyway?" (comment 6)

Concern that it will fuel a technical arms race

"This is ultimately just going to give them training material for how to avoid this crap" (comment 11)
"It's essentially an arms race, with the little folks getting crushed" (comment 17)

Alternative approaches proposed

"Why not simply blacklist or rate limit those bot IP's?" (comment 7)
"Or you can block bots with these" (comment 16)

Views that the project is counterproductive

"Seems counterproductive" (comment 9)
"consider that you might be the baddies" (comment 14)

Comparisons to spam

"This is essentially machine-generated spam" (comment 17)
"The irony of machine-generated slop to fight machine-generated slop" (comment 17)

Skepticism about projects of this kind

"These projects are the new 'To-Do List' app" (comment 3)
"Isn't posting projects like this the most visible way to report a bug" (comment 4)