Hacker News Summary

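Article Abstract

We asked the major AI models the same charged questions on politics, economics, society, and related topics, repeating each test with web search turned off, and mapped where each model actually leans. Most models lean left, though to different degrees and not uniformly.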

Article Summary

Article title: Political bias in AI · Where the AI models stand

Core content:

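The project measures political leaning by repeatedly asking every mainstream AI model a set of charged questions on politics, economics, speech, and society, with web search turned off. Each model's repeated answers form a "cloud" that shows the range of its positions at a glance. The result is a map of where the models actually lean, with data drawn from the models themselves rather than from anything they fetch from the web.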

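Why it matters: Millions of people now ask these models about the news, debates, and even how to vote, and a model's leaning subtly shapes the answers it gives. The study finds that most models lean in the same direction, but to different degrees, and less cleanly than people might expect.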

Data overview: June 2026 · 6 models · 4,400 responses · no web search

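Analysis dimensions:
* Horizontal axis (economic): from left (economic intervention) to right (free market).
* Vertical axis (social): from bottom (libertarian) to top (authoritarian).
* Reading the chart: each "cloud" shows the spread of one model's repeated answers. The closer a cloud sits to the center, the more neutral the model.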
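
One way a model's "cloud" might be reduced to a few numbers is sketched below. It assumes each answer has already been scored as an (economic, social) pair on a scale from -1 to 1. That scale, the sample scores, and the name `summarize_cloud` are illustrative assumptions, not details taken from the project.

```python
import math
from statistics import mean, pstdev


def summarize_cloud(points):
    """Summarize one model's (economic, social) scores.

    Assumed convention: negative economic = left, positive = right;
    negative social = libertarian, positive = authoritarian.
    """
    econ = [p[0] for p in points]
    soc = [p[1] for p in points]
    centroid = (mean(econ), mean(soc))
    return {
        "centroid": centroid,
        # Distance of the cloud's center from the origin (0, 0).
        "distance_from_center": math.hypot(*centroid),
        # Per-axis standard deviation: how wide the cloud is.
        "spread": (pstdev(econ), pstdev(soc)),
    }


# Placeholder scores, not measured values.
sample = [(-0.4, -0.2), (-0.2, -0.4), (-0.3, -0.3)]
summary = summarize_cloud(sample)
print(summary)
```

A small `distance_from_center` with a small `spread` would correspond to a tight cloud near the middle of the chart.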

Key findings:
* 4 of the 6 models lean left.
* Most right-leaning model: Grok
* Most stable model: Gemini

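Model rankings and comparisons:
* Ranking: models are ordered by how far their position sits from the center, and are also assessed for stability and strength of leaning.
* Points of divergence: lists the specific questions that split the models most, and shows how strongly each model leans on each one.
* Closest reference figures: each model's position is matched to real-world political figures (such as Bernie Sanders, Trump, and Xi Jinping), using reference data from the CHES 2024 and V-Dem expert surveys.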
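
The ranking and the reference-figure matching can both be read as distance computations on the two-axis chart. The sketch below assumes plain Euclidean distance, which the summary does not confirm. All coordinates and the labels `model_a`, `figure_x`, and so on are placeholders, not measured values.

```python
import math


def rank_by_distance(positions):
    """Order model names from closest to the center (0, 0) to farthest."""
    return sorted(positions, key=lambda name: math.hypot(*positions[name]))


def nearest_reference(position, references):
    """Return the reference label with the smallest Euclidean distance."""
    return min(references, key=lambda label: math.dist(position, references[label]))


# Placeholder coordinates as (economic, social) pairs.
models = {"model_a": (-0.5, -0.2), "model_b": (0.1, 0.0), "model_c": (0.3, 0.4)}
references = {"figure_x": (-0.6, -0.3), "figure_y": (0.4, 0.5)}

ranking = rank_by_distance(models)
matches = {name: nearest_reference(pos, references) for name, pos in models.items()}
print(ranking)
print(matches)
```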

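Say-do consistency analysis: The project compares each model's self-described position with its measured result. For example:
* Grok: describes itself as neutral, but measures 0.36 units to the right of what it claims.
* Claude: describes itself as neutral, but measures 0.34 units to the left of what it claims.
* ChatGPT & Llama: describe themselves as neutral, but measure left-leaning.
* DeepSeek & Gemini: describe themselves as neutral, and their measured results are also close to the center.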
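
The comparison above amounts to a signed difference between a measured and a stated position. A minimal sketch follows, assuming that "neutral" corresponds to 0.0 on the economic axis and that positive values point right. The `tolerance` threshold and both function names are hypothetical.

```python
def say_do_gap(stated, measured):
    """Signed gap: positive means the model measures further right than it claims."""
    return measured - stated


def describe_gap(gap, tolerance=0.05):
    """Turn a signed gap into a short description (tolerance is an assumed cutoff)."""
    if abs(gap) <= tolerance:
        return "matches its self-description"
    direction = "right" if gap > 0 else "left"
    return f"{abs(gap):.2f} units further {direction} than claimed"


# The two gaps quoted in the summary, with "neutral" assumed to be 0.0.
print(describe_gap(say_do_gap(0.0, 0.36)))   # 0.36 units further right than claimed
print(describe_gap(say_do_gap(0.0, -0.34)))  # 0.34 units further left than claimed
```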

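Methodology in brief:
* Each model is asked the same set of open-ended questions repeatedly, with no web search and no system prompt.
* A neutral classifier scores every raw answer for stance, degree of evasion, refusal type, and emotionally loaded language.
* The final coordinates are weighted means with 95% confidence intervals. All raw answers are stored permanently so that the results can be reproduced.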
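
The summary does not say how the weighted mean or its confidence interval is computed. The sketch below shows one common approximation: a weighted mean with a normal-approximation interval based on the effective sample size. The per-answer weights, the sample scores, and the name `weighted_mean_ci` are assumptions for illustration.

```python
import math


def weighted_mean_ci(scores, weights, z=1.96):
    """Weighted mean with an approximate 95% confidence interval.

    Uses the (biased) weighted variance and the effective sample size
    n_eff = (sum w)^2 / sum(w^2). This is an approximation, not
    necessarily the project's own method.
    """
    total = sum(weights)
    mean = sum(w * x for x, w in zip(scores, weights)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(scores, weights)) / total
    n_eff = total ** 2 / sum(w ** 2 for w in weights)
    half_width = z * math.sqrt(variance / n_eff)
    return mean, (mean - half_width, mean + half_width)


# Placeholder economic-axis scores and hypothetical per-answer weights.
scores = [-0.4, -0.2, -0.3, -0.1]
weights = [1.0, 0.5, 1.0, 0.5]
mean_econ, ci = weighted_mean_ci(scores, weights)
print(mean_econ, ci)
```

With equal weights this reduces to the ordinary mean and the usual standard-error interval, which is a quick sanity check on the formula.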

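Frequently asked questions:
* Purpose: to measure where the major AI models stand on charged questions about politics, economics, speech, and society.
* What makes it different: each model's result is shown as a "cloud" rather than a single point, to show how much its answers vary; the full question bank and scoring weights are published; the stability of each model's answers is measured; and refusals to answer are counted as data points.
* What is tested: the model's own weights (its built-in leaning), not the internet. Web search is off by default to keep online information from interfering.
* Project stance: the project is descriptive, not prescriptive. It reports what the models say and does not judge who is right.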

Comment Summary

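1. Political bias and LLM objectivity

Viewpoint: The political bias of LLMs is often overlooked but could have far-reaching effects. People assume that a nuanced answer from an LLM represents the whole picture, which is not always the case.
- Quote: dwoosley: "Political bias of LLMs is something not talked about much... could have a big impact on the next decade."
- Quote: dwoosley: "People seem to think that because an LLM gave a nuanced answer that it means it gave the WHOLE picture… and that’s not always the same thing."
Viewpoint: The classification depends on how "left" and "right" are defined, so the study mostly measures the difference between the model's bias and the investigator's.
- Quote: mrhottakes: "The outcome is entirely dependent on how the responses to 'politically charged questions' are graded as left vs. right."
- Quote: mrhottakes: "You're mostly just examining a delta in biases between the model and the investigator."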

2. Limitations of the political compass

Viewpoint: The political compass is too simplistic to capture the complexity of personal politics, and its "neutral" position is itself subjective.
- Quote: giancarlostoro: "The political compass always felt like the wrong tool to convey something as nuanced as personal politics."
- Quote: vrganj: "This wrongly assumes... there is such a thing as a 'center' or an 'unbiased' position."
- Quote: throw4847285: "The political compass is terrible, full stop. It is a meme... stupid is simple and simple is viral."
Viewpoint: The chart design may mislead. Grok's position is visually exaggerated, while ChatGPT was actually measured as further left than Grok is right.
- Quote: Cakez0r: "The logos have faint grey lines... conspicuously making Grok look very far right... they measured Chatgpt being further to the left than Grok is to the right."

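3. Disputes over specific models and cases

Viewpoint: Labeling DeepSeek as "center" is inaccurate, because its answers on Chinese history (such as the Tiananmen Square events) are clearly controlled.
- Quote: aucisson_masque: "Go ask deepseek about tiananmen square."
- Quote: jazz9k: "How can deepseek 'lean center'? If you’ve ever asked it about real Chinese history, you know this isn’t true."
Viewpoint: Grok's right-leaning position may have been imposed on it, and its high "bending under pressure" score raises questions.
- Quote: EColi: "Interesting how high Grok scored for 'bending under pressure'... how is an llm trained to hold its position?"
- Quote: breakyerself: "Grok is well known to have right wing views directly forced on it."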

4. Expectations for the role of LLMs

Viewpoint: Users want LLMs to present multiple sides of an issue rather than mirror their own bias or that of a particular camp.
- Quote: uyzstvqs: "I absolutely do not want an LLM to mirror my opinions... I need it to give me all relevant sides to an argument."
- Quote: IgorPartola: "I asked Claude to present... the best versions of the conservative and liberal pitches... quite instructive."
Viewpoint: LLMs should be treated as tools rather than intelligent agents. Their "opinions" come from training data and should be handled with caution.
- Quote: godshatter: "Treating these things like they have any actual intelligence is a big problem waiting to happen."
- Quote: Terr_: "It's a mistake to consider the difference itself as nefarious... We humans are the ones over-anthropomorphizing."

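5. Methodology and data issues

Viewpoint: The study disabled reasoning, which lowers its value as a benchmark, and the model version information is not transparent.
- Quote: spongebobstoes: "Reasoning is the state of the art, and disabling it reduces the value of this research."
- Quote: spongebobstoes: "A brand is not a model, and models change quickly."
Viewpoint: The top-level chart does not match the more detailed survey results further down, which deserve closer analysis.
- Quote: plorg: "The more granulated 'survey' data lower down looks not just much different but more interesting."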

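6. The gap between political reality and LLMs

Viewpoint: Modern politics is essentially the 1% against everyone else, and the left-right divide is outdated.
- Quote: samat: "Real politics is 1% versus everyone... This left vs right divide... is absolutely divide and control tactics."
Viewpoint: LLMs' reluctance to characterize specific figures (such as Trump) reveals their bias.
- Quote: crumpled: "I haven't encountered a chatbot yet that is willing to recognize DJT as a fascist... That tells me a lot."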

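Summary: Commenters broadly question the validity of the political compass, argue that LLM bias stems from training data and from how answers are classified, and want models to provide balanced information rather than take positions. Specific cases (DeepSeek, Grok) are contested, and methodological flaws (disabled reasoning, opaque model versions) draw criticism. Some comments stress the complexity of real politics and argue that the left-right divide no longer applies.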

Political bias in AI: Where the AI models stand