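Hacker News Chinese Digest

Article Abstract

The article introduces a project that tracks the popularity of AI coding models on Hacker News and how users feel about them. Each day the project fetches 200 top posts, filters for those related to AI coding, uses a Gemini model to identify the models mentioned in the comments and the sentiment toward each, and logs the results to a Google Sheet for review.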

Article Summary

Title: HN SOTA: Model Popularity | Live Updates from HN

State of the Art of Coding Models

According to Hacker News commenters

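AI-assisted coding is moving fast. To keep up with it, we track how often coding models are mentioned in Hacker News comments and how users feel about them. The data is updated daily.

The daily pipeline works as follows:
1. Fetch the 200 hottest posts from the past 24 hours via the Hacker News API.
2. Use an LLM to filter for posts whose titles relate to LLMs or coding (up to 50), on the assumption that these are the most likely to contain relevant discussion.
3. For each post, send its title and comments to a Gemini model, asking it to identify the models mentioned, using the OpenRouter model list as the reference, and to rate each comment's sentiment toward each model.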
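
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes that "hottest" means highest score, that items carry the `type`, `time`, `score`, and `title` fields used by Hacker News API items, and that the two LLM calls are supplied as plain callables (`is_relevant` and `classify` are hypothetical names).

```python
from datetime import datetime, timedelta, timezone


def select_top_posts(items, now, limit=200):
    """Step 1: keep stories from the last 24 hours, highest score first.

    Ranking by score is an assumption; the source only says "hottest".
    """
    cutoff = (now - timedelta(hours=24)).timestamp()
    recent = [
        it for it in items
        if it.get("type") == "story" and it.get("time", 0) >= cutoff
    ]
    recent.sort(key=lambda it: it.get("score", 0), reverse=True)
    return recent[:limit]


def filter_relevant(posts, is_relevant, cap=50):
    """Step 2: keep at most `cap` posts whose title passes the filter.

    `is_relevant` stands in for the LLM title filter (hypothetical).
    """
    kept = []
    for post in posts:
        if len(kept) >= cap:
            break
        if is_relevant(post.get("title", "")):
            kept.append(post)
    return kept


def analyze_posts(posts, classify):
    """Step 3: collect one row per (comment, model) pair.

    `classify` stands in for the Gemini call (hypothetical); it is
    expected to yield (comment_id, model, sentiment) tuples for a post.
    """
    rows = []
    for post in posts:
        for comment_id, model, sentiment in classify(post):
            rows.append({
                "post_id": post["id"],
                "comment_id": comment_id,
                "model": model,
                "sentiment": sentiment,
            })
    return rows
```

Passing the LLM steps in as callables keeps the selection logic testable without network access or API keys.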

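To make the process and its results easy to audit, debug, and occasionally sanity-check, all results are logged to a Google Sheet. There you can see the IDs of the comments that mention a given model, along with the sentiment the model assigned to each of those comments.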
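
One way to picture the logged data is one row per comment and model. The sketch below writes such rows as CSV text, which could then be imported into a spreadsheet. The column names are assumptions, since the source does not describe the sheet layout, and no Google Sheets API is used here.

```python
import csv
import io

# Hypothetical column layout; the real sheet may differ.
FIELDS = ["date", "post_id", "comment_id", "model", "sentiment"]


def rows_to_csv(rows):
    """Serialize analysis rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```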

To open a comment directly, append its ID to https://news.ycombinator.com/item?id=.
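
As a small illustration, a helper along these lines builds the link from a comment ID (the function name `comment_url` is made up for this example):

```python
HN_ITEM_URL = "https://news.ycombinator.com/item?id="


def comment_url(comment_id):
    """Return the Hacker News URL for a comment ID given as int or str."""
    # int() rejects anything that is not a plain integer ID.
    return f"{HN_ITEM_URL}{int(comment_id)}"
```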

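Top 10 Models by Popularity

Shows the total number of mentions and the user sentiment for each model, as a 10-day rolling aggregate (April 23, 2026 to May 1, 2026).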
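
A rolling aggregate of this kind could be computed as sketched below. The row shape (`date`, `model`, `sentiment`) and the inclusive window boundaries are assumptions made for illustration; the source does not say how the window is defined.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def rolling_summary(rows, end, days=10):
    """Count mentions per model and sentiment over the last `days` days.

    The window is assumed to be inclusive of `end` and to span exactly
    `days` calendar days.
    """
    start = end - timedelta(days=days - 1)
    summary = defaultdict(Counter)
    for row in rows:
        if start <= row["date"] <= end:
            summary[row["model"]][row["sentiment"]] += 1
    return {model: dict(counts) for model, counts in summary.items()}


def top_models(summary, n=10):
    """Return up to `n` model names, ordered by total mentions."""
    return sorted(
        summary,
        key=lambda model: sum(summary[model].values()),
        reverse=True,
    )[:n]
```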

The proportion bars top out at 100%.
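
One plausible reading is that each bar splits a model's mentions into sentiment shares that sum to 100%. Under that assumption, the shares could be derived from raw counts like this:

```python
def sentiment_shares(counts):
    """Convert sentiment counts into percentage shares of the total.

    Returns an empty dict when there are no mentions, to avoid
    dividing by zero.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```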

For detailed results, see the linked Google Sheet.

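Comment Summary

Key points:

1. Model performance and user reception

Claude has the most mentions but also draws considerable negative sentiment, mainly over API pricing and server problems. By comparison, GPT-5.5 receives more positive feedback.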
- "Claude is currently taking the #1 spot in mentions, it carries a lot of negative sentiment due to API pricing policies and frequent server downtime."
- "GPT is definitely better in terms of sheer code-writing capability."
GPT is stronger at code generation but can corrupt text when generating Chinese or Korean.
- "GPT actually has quite a few issues with text corruption when generating in Korean or Chinese."
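Open-source models (Qwen, DeepSeek) are viewed positively because they help avoid vendor lock-in, although some users are biased against Chinese models.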
- "Qwen and DeepSeek are open-source, so they are largely mentioned in the context of guarding against vendor lock-in."
Gemini is considered nearly unusable.
- "Gemini is pretty much unusable."

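2. Methodology and data issues

Commenters argue that the current method relies only on mention counts and sentiment analysis, and reflects neither technical capability nor actual usage.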
- "This article seems to define 'start of the art' as 'popular', without any bearing on the technical abilities."
Sentiment classification may be noisy, which could distort the results.
- "How noisy is the sentiment classification? Feels like that could skew results a lot."
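Commenters suggest improving the method, for example by looking at comments that directly compare models or by adding more data sources.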
- "A saner methodology would be to see comments that compare two models."
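
The comparison-based idea could be prototyped on rows of the kind the project already logs. The sketch below groups comment IDs by the pairs of models they mention. The row fields are assumptions, and this only finds co-mentions; it does not judge which model a comment prefers.

```python
from collections import defaultdict
from itertools import combinations


def comparison_comments(rows):
    """Map each pair of models to the comment IDs that mention both."""
    models_by_comment = defaultdict(set)
    for row in rows:
        models_by_comment[row["comment_id"]].add(row["model"])

    pairs = defaultdict(list)
    for comment_id, models in models_by_comment.items():
        # Sort so that each pair has one canonical ordering.
        for a, b in combinations(sorted(models), 2):
            pairs[(a, b)].append(comment_id)
    return dict(pairs)
```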

3. Visualization and user experience

The chart is hard to read, and the model names are difficult to make out.
- "Please fix your graph so the names of the models are readable."
The page design has many problems, such as a poor color scheme and too much JavaScript.
- "Terrible color scheme...shitloads of JS...unreadable X axis labels."

4. Other suggestions

Add a time dimension to show how sentiment changes.
- "Graph this over time to see how sentiment changes."
Users would like combined figures per model maker and an option to filter out neutral comments.
- "Show combined measurements of model makes...Another toggle to remove the neutral section?"
Some users are interested in local inference and niche models.
- "I may be in the minority of HN commenters exploring models for local inference."

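5. A positive trend for open-source models

The positive reception of open-source models such as Qwen and DeepSeek came as a surprise and points to the advantage of being open source.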
- "What a win it is for open source that qwen and kimi show up on this at all."
- "The fact that they are viewed this positively shows that being open-source is a massive advantage."

6. Criticism of commercial models

Users are unhappy with the attitude of Claude 4.7, finding it overconfident and hard to steer.
- "The attitude of 4.7 is what is more problematic...Difficult to enforce checking stuff before answering."
Some suspect that companies may be using bots to manipulate sentiment.
- "Companies are deploying bots to shift sentiment around their products."

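Summary:

The comments center on model performance, user sentiment, methodological flaws, and visualization problems. Claude is criticized for pricing and stability; GPT is rated better at code generation but has multilingual issues; open-source models are praised for avoiding lock-in. Users suggest improving the data analysis and chart design, and they note the positive trend for open-source models. Some question whether sentiment about commercial models is genuine and whether the sentiment analysis is accurate.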

Original post: Show HN: State of the Art of Coding Models, According to Hacker News Commenters