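Hacker News Chinese Digest

Article Abstract

Claude Sonnet 4 now supports a context window of 1 million tokens, which significantly improves its ability to handle long texts. The update gives it an edge in understanding and generating complex content and suits it to a wider range of applications.

Article Summary

Claude Sonnet 4 Launches a Million-Token Context Window

On August 12, 2025, Anthropic released the latest version of Claude Sonnet 4 with a 1-million-token context window, which the article describes as enough to process the entire Harry Potter series in a single prompt. The update improves Claude's performance on long-text tasks.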

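Tests:

Long-text analysis: Scenes from two movies were hidden in 1 million tokens of context, and Claude was asked to find and analyze them. Claude performed well on speed and accuracy; compared with Google's Gemini 2.5 Pro and Gemini 2.5 Flash, it was faster and hallucinated less (that is, it produced fewer incorrect statements).
Code analysis: The entire codebase of Every's content management system (about 250,000 tokens), plus filler code, was used to test Claude's code analysis. Claude trailed Gemini slightly in speed and accuracy and fell short on some details.
AI Diplomacy: In the AI Diplomacy game, Claude performed strongly and outperformed the other models, especially with unoptimized prompts.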

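Conclusion:

Claude Sonnet 4 performs well on long-context tasks, particularly in speed and in reducing hallucinations. For text and code tasks that require detailed analysis, however, the Gemini models still have the advantage. In addition, Claude is priced at $6 per 1 million input tokens, versus $2.50 for Gemini Pro and $0.30 for Gemini Flash, so Gemini is more competitive on price.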
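
To make the price gap concrete, here is a minimal sketch that turns the per-million input prices quoted above into a cost for a given prompt size. It assumes a flat per-token rate and ignores output tokens and any tiered pricing, neither of which the summary covers; the function name `input_cost` and the 1,000,000-token prompt are illustrative choices, not anything from the article.

```python
# Input prices in USD per 1M tokens, as quoted in the summary above.
PRICE_PER_MILLION_INPUT = {
    "Claude Sonnet 4": 6.00,
    "Gemini 2.5 Pro": 2.50,
    "Gemini 2.5 Flash": 0.30,
}


def input_cost(model: str, tokens: int) -> float:
    """Return the input-side cost in USD, assuming a flat per-token rate."""
    return PRICE_PER_MILLION_INPUT[model] * tokens / 1_000_000


if __name__ == "__main__":
    prompt_tokens = 1_000_000  # hypothetical prompt that fills the whole window
    for model in PRICE_PER_MILLION_INPUT:
        print(f"{model}: ${input_cost(model, prompt_tokens):.2f} per prompt")
```

At these quoted rates, a single full-window prompt costs roughly 20 times more on Claude Sonnet 4 than on Gemini 2.5 Flash, before any output tokens are counted.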

Overall, Claude Sonnet 4 is worth considering for long-context tasks, especially where fast and accurate responses are needed.

Comment Summary

Performance comparison of Sonnet-4 and Gemini-2.5-Flash
- One commenter noted that Sonnet-4 is faster than Gemini-2.5-Flash at long context, even though Gemini runs on fast TPUs.
- Quotes:
  - "So sonnet-4 is faster than gemini-2.5-flash at long context. That is surprising. Especially since Gemini runs on those fast TPUS."
  - "i’m really curious how well they perform with a long chat history. i find that gemini often gets confused when the context is long enough."
1M-token context capacity
- One commenter doubted the 1M-token capacity claim, arguing that it cannot hold all of the Harry Potter books.
- Quotes:
  - "I really doubt you can fit all Harry Potter books in 1M tokens."
  - "Claude Sonnet 4 now supports 1M tokens of context."
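
The commenter's doubt can be checked with back-of-the-envelope arithmetic. The sketch below is only that: both inputs are assumptions that do not come from the article or the thread. The word count (about 1.08 million words for the seven books) is a commonly cited figure, and 1.3 tokens per English word is a rough rule of thumb that varies by tokenizer.

```python
# Both constants are ASSUMPTIONS for illustration, not values from the source.
ASSUMED_SERIES_WORDS = 1_084_000   # commonly cited total for the seven books
ASSUMED_TOKENS_PER_WORD = 1.3      # rough heuristic; real tokenizers differ

CONTEXT_WINDOW = 1_000_000         # window size stated in the article


def estimated_tokens(word_count: int, tokens_per_word: float) -> int:
    """Estimate a token count from a word count and an assumed ratio."""
    return round(word_count * tokens_per_word)


def fits(word_count: int, tokens_per_word: float,
         window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the estimated token count fits inside the window."""
    return estimated_tokens(word_count, tokens_per_word) <= window


if __name__ == "__main__":
    tokens = estimated_tokens(ASSUMED_SERIES_WORDS, ASSUMED_TOKENS_PER_WORD)
    print(f"Estimated tokens: {tokens:,}")
    print("Fits in window:", fits(ASSUMED_SERIES_WORDS, ASSUMED_TOKENS_PER_WORD))
```

Under these assumptions the estimate lands well above 1 million tokens, which would support the commenter's skepticism; a different tokenizer ratio or word count could change the result.
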
LLM data compression
- One commenter suggested data compression as a contest between LLMs, testing their ability to produce compact notes and then answer questions based on those notes.
- Quotes:
  - "IMO, a good contest between LLMs would be data compression. Each LLM is given the same pile of text, and then asked to create compact notes that fit into N pages of text."
Free access to advanced AI models
- One commenter stressed that users can use the latest models, including Gemini 2.5 Pro, for free through Google AI Studio, with advanced features such as the 1M context window.
- Quotes:
  - "What people seem to miss very hard is that they get interactive chat mode of all the models, including the best and newest (Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash Lite and older) totally for free."
  - "You really get a very good AI for nothing."

Claude vs. Gemini: Testing on 1M Tokens of Context
