Hacker News Chinese Digest

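Article Summary

Researchers tried running 16 Claude AI instances in parallel to build, without human intervention, a C compiler capable of compiling the Linux kernel. The project took nearly 2,000 sessions and cost $20,000, and produced 100,000 lines of code. The experiment explored how to design a test harness that lets a team of AI agents work autonomously, the mechanisms for parallel collaboration, and the limitations of this approach.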

Title: Building a C Compiler with a Team of Parallel Claudes

Author: Nicholas Carlini, a researcher on the security team

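Core content:

Note: Non-core material from the original article, including specific code snippets, fine-grained implementation details, and the acknowledgments, has been condensed.

The following summarizes the comments, balancing the different viewpoints and preserving key quotes: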

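The technical achievement deserves credit
- Commenters see compiling the Linux kernel as a major breakthrough, far beyond the earlier failed browser project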
  "This is like a working version of the Cursor blog... much more impressive than a browser" (OsrsNeedsf2P)
  "building a such a complex project like a C compiler on a 20k $ budget in full autonomy is quite impressive" (epolanski)
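The experiment was rigorously designed
- Commenters emphasize the clean-room implementation, multi-architecture support, and testing against real projects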
  "clean-room implementation... can build Linux 6.9 on x86, ARM, and RISC-V" (NitpickLawyer)
  "passes the developer's ultimate litmus test: it can compile and run Doom" (btown)

Code quality is poor
- Commenters point out that the generated code is less efficient even than GCC at -O0 and is hard to maintain
  "Worse than '-O0' takes skill... an equivalent of which one man can produce in under two weeks" (dmitrygr)
  "why x9? who knows?!" (dmitrygr)
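Suspected copying from training data
- Commenters question whether the so-called "clean-room implementation" is genuine, arguing that it relies on existing compiler knowledge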
  "Calling it clean room... when Anthropic stole all open source is laughable" (hrgadyx)
  "obviously it can regurgitate things that are nearly identical to already existing code" (jcalvinowens)
Limited practical value
- Commenters argue that the $20k cost is too high and that the output has no practical use
  "You could hire a dev in India for $1k —- or pay $20k for a buggy mess" (fxtentacle)
  "Microsoft... all solving the wrong problems, your problems not the collective ones" (trilogic)

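Technical potential and limitations coexist
- Commenters acknowledge the breakthrough while pointing out the limits of current model capabilities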
  "it's a cool little experiment... nearly reached the limits of Opus’s abilities" (NitpickLawyer)
  "while these agentic systems can do amazing things... you hit diminishing returns" (btown)
Full functionality still needs verification
- Commenters repeatedly ask whether the compiled kernel actually boots
  "Nothing in the post about whether the compiled kernel boots" (sho_hn)
  "it can compile the linux kernel, but does it boot?" (owenpalmer)

A proposal to develop a programming language better suited to LLMs: "design a perfect programming language for LLM coding" (small_model)
A request to publish the experiment details: "All prompts used... The structure of the agent team" (akrauss)

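Summary: The experiment won recognition as a technical proof of concept, but its code quality, originality, and social value remain strongly contested, reflecting both the capability limits of current AI when generating complex systems and the ethical disputes surrounding it.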